Exporting Data
All crawled pages are stored permanently in your account. Export them at any time in four formats — no lock-in, no expiry, no re-crawl required.
Export formats
- .json: Array of page objects with all fields. Best for programmatic processing. Use with: APIs, JavaScript, Python.
- .jsonl: One JSON object per line. Ideal for streaming large datasets or LLM fine-tuning. Use with: LLM training, data pipelines.
- .csv: Spreadsheet-compatible, with URL, status, title, and metadata columns. No HTML/Markdown. Use with: Excel, Google Sheets, analytics.
- .zip: One .md file per page, named by URL slug. Includes all content, ready for docs pipelines. Use with: RAG, knowledge bases, docs.
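The two JSON-based formats differ mainly in how you parse them. A minimal sketch, using inline sample strings in place of downloaded export files:

```javascript
// Sample contents standing in for downloaded .json and .jsonl exports.
const jsonExport = '[{"url":"https://example.com/","title":"Home"}]';
const jsonlExport =
  '{"url":"https://example.com/","title":"Home"}\n' +
  '{"url":"https://example.com/docs","title":"Docs"}\n';

// .json: one parse call loads the whole array into memory.
const fromJson = JSON.parse(jsonExport);

// .jsonl: parse line by line — each line is an independent object,
// so large exports can be streamed instead of loaded at once.
const fromJsonl = jsonlExport
  .trim()
  .split("\n")
  .map((line) => JSON.parse(line));
```

For very large .jsonl exports you would read the file with a streaming line reader rather than splitting a string, but the per-line parse is the same.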
Export from the dashboard
Open any completed crawl job in your dashboard. Click the Export dropdown in the top-right corner and select your format. The file downloads immediately.
Export sizes
Dashboard exports include all pages for jobs with up to 100 pages. For larger jobs, use the API to paginate and process pages in batches.
Export via API
For large crawls or automated pipelines, paginate through pages using the list endpoint and process them in your own code.
```javascript
// Collect every page from a crawl job, 100 at a time.
let cursor = 0;
const allPages = [];

while (true) {
  const res = await fetch(
    `/api/crawl/${jobId}/pages?limit=100&cursor=${cursor}`,
    { headers: { Authorization: `Bearer ${key}` } }
  );
  const { pages, cursor: next } = await res.json();
  allPages.push(...pages);
  if (pages.length < 100) break; // short batch means we've reached the end
  cursor = next; // advance to the next batch
}
```

Each page in the list response omits heavy fields (html, markdown, extractedJson) by default. Fetch GET /api/crawl/:id/pages/:pageId to get full content for individual pages.