Guide

Webhooks

Instead of polling the job status endpoint, provide a webhook URL when submitting a crawl. WebExtract sends a POST request to that URL the moment the crawl job reaches a terminal state.

Configuring a webhook

Add a webhookUrl field to your crawl submission body. The URL must be publicly reachable over HTTPS.

Crawl submission with webhook
{
  "url": "https://docs.acme.com",
  "limit": 100,
  "formats": ["markdown"],
  "webhookUrl": "https://yourapp.com/webhooks/crawl"
}
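In Node.js, the submission above can be sketched as follows. The API base URL (`https://api.webextract.example/v1/crawl`) and the `Authorization` header are placeholders, not documented values — substitute the endpoint and auth scheme from your own account:

```javascript
// Build the crawl submission body; webhookUrl is the only webhook-specific field.
function buildCrawlSubmission(targetUrl, webhookUrl, limit = 100) {
  return {
    url: targetUrl,
    limit,
    formats: ['markdown'],
    webhookUrl,
  };
}

// Hypothetical endpoint and auth header — check your WebExtract account for the real values.
async function submitCrawl(apiKey, body) {
  const res = await fetch('https://api.webextract.example/v1/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```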

Webhook payload

WebExtract sends a JSON payload via POST with a Content-Type: application/json header.

Payload (crawl.completed)
{
  "event": "crawl.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "pagesStored": 87,
  "pagesTotal": 92,
  "creditsUsed": 92,
  "timestamp": "2026-03-15T10:47:00Z"
}
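Before acting on a delivery, it is worth checking that the body actually has the shape shown above. A minimal validator, using only the fields from the example payload (additional event types may carry other fields):

```javascript
// Returns true if the payload has the required crawl-webhook fields
// with the expected types; rejects anything else before processing.
function isValidCrawlPayload(payload) {
  return (
    payload !== null &&
    typeof payload === 'object' &&
    typeof payload.event === 'string' &&
    typeof payload.jobId === 'string' &&
    typeof payload.status === 'string' &&
    Number.isInteger(payload.pagesStored) &&
    Number.isInteger(payload.pagesTotal)
  );
}
```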

Receiving webhooks

Your endpoint must respond with a 2xx status within 10 seconds. Non-2xx responses are logged as warnings but do not cause retries.

Example handler (Node.js)
const express = require('express');
const app = express();
app.use(express.json()); // parse the JSON payload into req.body

app.post('/webhooks/crawl', async (req, res) => {
  const { event, jobId, pagesStored } = req.body;

  if (event === 'crawl.completed') {
    // fetch pages and process them
    await processJob(jobId);
  }

  res.sendStatus(200);
});

Security

Use HTTPS only

WebExtract blocks webhook delivery to non-HTTPS URLs and private/internal IP addresses.

Validate the jobId

Always look up the jobId against your own records before processing — never trust the payload blindly.
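One way to do this is to record every jobId you submit and only process webhooks that match. A sketch using an in-memory Map (in production this would be a database table keyed by jobId):

```javascript
// Hypothetical store of jobs this app actually submitted.
const submittedJobs = new Map(); // jobId -> { url, submittedAt }

// Call this when you submit a crawl and receive a jobId back.
function recordSubmission(jobId, url) {
  submittedJobs.set(jobId, { url, submittedAt: Date.now() });
}

// Returns the job record, or null if we never submitted this jobId —
// in which case the webhook should be ignored.
function lookupJob(jobId) {
  return submittedJobs.get(jobId) ?? null;
}
```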

Idempotent handlers

Duplicate deliveries are rare, but design your handler so that processing the same job twice is safe.
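A simple guard is to track which jobs have already been handled and make a repeat delivery a no-op. Sketched here with an in-memory Set; a real deployment would use a database with a unique constraint on jobId so the guard survives restarts:

```javascript
// Jobs that have already been processed.
const processedJobs = new Set();

// Runs `process` at most once per jobId; returns false for duplicates.
function handleOnce(jobId, process) {
  if (processedJobs.has(jobId)) return false; // duplicate delivery — skip
  processedJobs.add(jobId);
  process(jobId);
  return true;
}
```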

Related