JSON Extraction
Describe the data you want in plain English, and WebExtract returns it as structured JSON for every crawled page: product prices, article titles, author names, review scores, or any other information visible on the page.
How it works
You write a prompt
Describe the data structure you want extracted in natural language.
AI reads the page
Each crawled page is analyzed against your prompt using an LLM.
Structured JSON returned
Results are stored as extractedJson on each page object.
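The three steps above can be sketched in Python. This is a minimal illustration, not WebExtract's client library: the helper names `build_crawl_request` and `collect_extracted` are hypothetical, and so is the shape of `sample_pages` beyond the `extractedJson` key (the `url` key on each page and the use of null for a page with no data are assumptions). How the request is sent is not covered by this guide, so that step is left out.

```python
import json


def build_crawl_request(url: str, prompt: str) -> dict:
    """Build a request body using the fields this guide describes."""
    return {
        "url": url,
        "formats": ["json"],
        "jsonOptions": {"prompt": prompt},
    }


def collect_extracted(pages: list[dict]) -> list[dict]:
    """Return the extractedJson value of every page that has one."""
    return [
        page["extractedJson"]
        for page in pages
        if page.get("extractedJson") is not None
    ]


request_body = build_crawl_request(
    "https://store.example.com",
    "Extract the product name, price in USD, and whether it is in stock.",
)
print(json.dumps(request_body, indent=2))

# Hypothetical crawl result: the page shape here is an assumption,
# apart from the extractedJson key named in this guide.
sample_pages = [
    {
        "url": "https://store.example.com/keyboard",
        "extractedJson": {
            "name": "Wireless Mechanical Keyboard",
            "price": 89.99,
            "inStock": True,
        },
    },
    {"url": "https://store.example.com/about", "extractedJson": None},
]

records = collect_extracted(sample_pages)
print(records)
```

Keeping the payload builder separate from the result reader makes it easy to swap in whatever HTTP client or SDK you actually use to run the crawl.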
Example: Product data
Crawl an e-commerce site and extract product names, prices, and availability from every page.
{
  "url": "https://store.example.com",
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract the product name, price in USD, and whether it is in stock.",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "product",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "price": { "type": "number" },
            "inStock": { "type": "boolean" }
          }
        }
      }
    }
  }
}

Example extractedJson for one page:

{
  "name": "Wireless Mechanical Keyboard",
  "price": 89.99,
  "inStock": true
}

Prompt tips
Be specific about field names
Say "extract the product name as productName" rather than just "extract the name".
Use response_format for strict shapes
Providing a JSON schema ensures consistent output across all pages.
Combine with markdown format
Request both "markdown" and "json" to get the full page text plus structured data.
Null fields are normal
If a field isn't found on a page, it will be null — not an error.
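Two of the tips above, strict shapes and null fields, can be checked on the client side once results come back. The following is a minimal sketch, assuming a flat schema with only string, number, and boolean fields like the product example in this guide. `check_record` and `JSON_TYPES` are hypothetical names for illustration, not part of WebExtract, and this is not a full JSON Schema validator.

```python
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
}

# Map the JSON Schema type names used above to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of type problems; an empty list means the record fits."""
    problems = []
    for field, spec in schema["properties"].items():
        value = record.get(field)
        if value is None:
            # A null (or missing) field means the data was not found
            # on the page. That is normal, so it is not reported.
            continue
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numbers.
        wrong_bool = spec["type"] == "number" and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


good = {"name": "Wireless Mechanical Keyboard", "price": 89.99, "inStock": True}
partial = {"name": None, "price": "89.99", "inStock": None}

print(check_record(good, PRODUCT_SCHEMA))
print(check_record(partial, PRODUCT_SCHEMA))
```

A check like this is mainly useful when you skip response_format and rely on the prompt alone, since output shapes are then more likely to vary between pages.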