Webhooks
Instead of polling the job status endpoint, provide a webhook URL when submitting a crawl. WebExtract sends a POST request to that URL the moment the crawl job reaches a terminal state.
Configuring a webhook
Add a webhookUrl to your crawl submission. The URL must be publicly reachable over HTTPS.
{
  "url": "https://docs.acme.com",
  "limit": 100,
  "formats": ["markdown"],
  "webhookUrl": "https://yourapp.com/webhooks/crawl"
}
Webhook payload
WebExtract sends a JSON payload via POST with a Content-Type: application/json header.
{
  "event": "crawl.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "pagesStored": 87,
  "pagesTotal": 92,
  "creditsUsed": 92,
  "timestamp": "2026-03-15T10:47:00Z"
}
Receiving webhooks
Your endpoint must respond with a 2xx status within 10 seconds. Non-2xx responses are logged as warnings but do not cause retries.
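const express = require('express');
const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

app.post('/webhooks/crawl', (req, res) => {
  const { event, jobId } = req.body;
  // Acknowledge first so slow processing cannot exceed the 10-second limit.
  res.sendStatus(200);
  if (event === 'crawl.completed') {
    // fetch pages and process them
    processJob(jobId).catch((err) => console.error(`Job ${jobId} failed:`, err));
  }
});
Security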
Use HTTPS only
WebExtract blocks webhook delivery to non-HTTPS URLs and private/internal IP addresses.
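Since delivery to such URLs is blocked, it can help to check a webhook URL on your side before submitting a crawl. The following is a minimal sketch, not WebExtract's actual validation logic: the function names `isPrivateIPv4` and `isAcceptableWebhookUrl` are hypothetical, and the check covers only the scheme, `localhost`, and IPv4 literals in the common private, loopback, and link-local ranges. It does not resolve DNS names or inspect IPv6 addresses.

```javascript
const net = require('node:net');

// Hypothetical client-side pre-check; WebExtract's own rules may differ.
function isPrivateIPv4(host) {
  const [a, b] = host.split('.').map(Number);
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 (loopback)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16 (link-local)
  );
}

function isAcceptableWebhookUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'https:') return false; // HTTPS only
  if (url.hostname === 'localhost') return false;
  if (net.isIPv4(url.hostname)) return !isPrivateIPv4(url.hostname);
  return true; // a DNS name; this sketch does not resolve it
}
```

For example, `isAcceptableWebhookUrl('https://yourapp.com/webhooks/crawl')` passes the check, while an `http://` URL or `https://192.168.1.10/hook` does not.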
Validate the jobId
Always look up the jobId against your own records before processing. Never trust the payload blindly.
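One way to do this lookup is sketched below, assuming your application records each job ID at submission time. The `submittedJobs` map and the functions `recordSubmittedJob` and `isKnownJob` are hypothetical names used for illustration. An in-memory map stands in for whatever store you actually use.

```javascript
// Hypothetical in-memory record of jobs this application submitted.
// A real application would more likely keep these in a database.
const submittedJobs = new Map();

// Call this when a crawl submission returns its job ID.
function recordSubmittedJob(jobId) {
  submittedJobs.set(jobId, { submittedAt: Date.now() });
}

// Returns true only for payloads that reference a job we submitted.
function isKnownJob(payload) {
  return (
    payload !== null &&
    typeof payload === 'object' &&
    typeof payload.jobId === 'string' &&
    submittedJobs.has(payload.jobId)
  );
}
```

In a handler, a payload for which `isKnownJob(req.body)` returns false can be acknowledged and then ignored, so that unknown job IDs never reach your processing code.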
Idempotent handlers
Duplicate deliveries are unlikely, but design your handler so that it is safe if called twice for the same job.
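A minimal sketch of one such design follows, assuming the job ID is a sufficient deduplication key. The `processedJobs` set and the `handleOnce` function are hypothetical. The set lives in memory and is lost on restart, so a persistent store with a unique constraint on the job ID would be more robust in practice.

```javascript
// Hypothetical in-memory set of job IDs that have already been handled.
const processedJobs = new Set();

// Runs `work` at most once per jobId.
// Resolves to true if the work ran, false if the call was a duplicate.
async function handleOnce(jobId, work) {
  if (processedJobs.has(jobId)) return false; // duplicate delivery, skip
  processedJobs.add(jobId); // claim before awaiting so concurrent calls are skipped
  try {
    await work(jobId);
    return true;
  } catch (err) {
    processedJobs.delete(jobId); // release the claim so a later call can retry
    throw err;
  }
}
```

With this in place, a handler could call `handleOnce(jobId, processJob)` instead of calling `processJob(jobId)` directly.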