WebExtract

Web data extraction platform · Now in beta

Turn any website into structured data.
Stored forever.

Submit a URL and get back clean Markdown, HTML, or AI-extracted JSON from every page — permanently stored, ready to export or pipe into any LLM pipeline.

Start crawling free · See how it works

No account required · 1 page · Clean Markdown output

∞ pages never expire

3 output formats

1¢ per page crawled

Free tier — no card required

◆ Permanent Storage ◆ LLM-Ready Markdown ◆ JSON Extraction ◆ HTML Output ◆ Webhook Delivery ◆ CSV Export ◆ API-First ◆ Credit-Based Pricing ◆ Incremental Crawls ◆ Pattern Filtering ◆ robots.txt Compliant ◆ Sitemap Discovery

What you get

Your data.
Your format. Forever.

Every feature is available via REST API — no SDK, no lock-in, any language.

01 Storage

Permanent Page Storage

Raw crawl results expire after 14 days. WebExtract stores every page indefinitely in your account — searchable, exportable, always there.

02 Clean Output

LLM-Ready Markdown

Every page is cleaned, structured, and returned as Markdown — stripped of nav, ads, and boilerplate. Pipe directly into any LLM or RAG pipeline.

03 AI Extraction

Structured JSON Extraction

Describe what data you want in plain English. WebExtract extracts structured objects from every page — titles, prices, authors, summaries, anything.
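
As a rough sketch of what such a request might look like: the body below reuses the `url`, `limit`, and `formats` fields from the crawl example on this page, while the `'json'` format value and the `prompt` field are assumed names chosen for illustration, not confirmed API parameters.

```javascript
// Hypothetical request body for AI extraction.
// `url`, `limit`, and `formats` appear in the crawl example on this page;
// the 'json' format value and the `prompt` field are assumed names.
function buildExtractionRequest(url, prompt, limit = 10) {
  if (!prompt || !prompt.trim()) {
    throw new Error('An extraction prompt is required');
  }
  return {
    url,
    limit,
    formats: ['json'],
    prompt: prompt.trim(), // plain-English description of the data you want
  };
}

const extractionBody = buildExtractionRequest(
  'https://docs.acme.com',
  'Extract the title, author, and a one-sentence summary of each page.'
);
console.log(JSON.stringify(extractionBody, null, 2));
```

Check the API docs for the actual field names before relying on this shape.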

04 Smart Discovery

Sitemap + Link Discovery

Discovers pages from both sitemaps and inline links simultaneously. Configurable depth, include/exclude patterns, and external link control.
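
To make include/exclude patterns concrete, here is a small local sketch of how such filtering could behave. It assumes simple glob patterns in which `*` matches any run of characters; WebExtract's actual pattern syntax is not described here, and `globToRegExp` and `filterUrls` are hypothetical helpers.

```javascript
// Illustrative only: the service's real pattern syntax is not documented here.
// This sketch assumes simple globs where `*` matches any run of characters.
function globToRegExp(glob) {
  const escaped = glob
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Keep a URL if it matches at least one include pattern (or none are given)
// and matches no exclude pattern.
function filterUrls(urls, { include = [], exclude = [] } = {}) {
  const inc = include.map(globToRegExp);
  const exc = exclude.map(globToRegExp);
  return urls.filter((u) =>
    (inc.length === 0 || inc.some((re) => re.test(u))) &&
    !exc.some((re) => re.test(u))
  );
}

const discovered = [
  'https://docs.acme.com/guide/intro',
  'https://docs.acme.com/guide/changelog',
  'https://docs.acme.com/blog/launch',
];
console.log(filterUrls(discovered, {
  include: ['https://docs.acme.com/guide/*'],
  exclude: ['*/changelog'],
}));
// → [ 'https://docs.acme.com/guide/intro' ]
```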

05 Async-Native

Webhook on Completion

Provide a webhook URL and receive a POST the moment your crawl finishes. Full results included — no polling loop, no waiting.
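
On the receiving side, the core of a webhook handler could look roughly like this. The payload shape is an assumption: it mirrors the `{ pages: [...] }` object from the inline crawl example on this page, and the real webhook body may differ. `handleCrawlWebhook` is a hypothetical helper name.

```javascript
// Sketch of a webhook receiver's core logic. The payload shape is an
// assumption: it mirrors the `{ pages: [...] }` object shown in the inline
// crawl example on this page, which the real webhook may not match.
function handleCrawlWebhook(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { status: 400, message: 'Body is not valid JSON' };
  }
  if (!payload || !Array.isArray(payload.pages)) {
    return { status: 422, message: 'Expected a `pages` array' };
  }
  // Do the real work here (index, embed, store); keep the response fast.
  const urls = payload.pages.map((page) => page.url);
  return { status: 200, message: `Received ${urls.length} pages`, urls };
}

console.log(handleCrawlWebhook('{"pages":[{"url":"https://docs.acme.com/"}]}'));
```

In a Node or serverless endpoint, you would pass the raw request body to this function and reply with the returned status code.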

06 Portable Data

Export Anytime

Download your crawled pages as JSON, JSONL, CSV, or Markdown. Your data, your format — not locked in any proprietary store.
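
For a sense of what two of these formats contain, the sketch below converts page objects to JSONL and CSV locally. The `url` and `markdown` fields follow the crawl example on this page; the exact columns of WebExtract's own CSV export are not specified there, so the column list is an assumption.

```javascript
// Local illustration of two of the export formats named above.
// The column list is an assumption, not the service's documented layout.
function toJsonl(pages) {
  // One JSON object per line.
  return pages.map((page) => JSON.stringify(page)).join('\n');
}

function csvCell(value) {
  const text = String(value ?? '');
  // Quote cells containing commas, quotes, or newlines; double inner quotes.
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(pages, columns = ['url', 'markdown']) {
  const rows = pages.map((page) => columns.map((c) => csvCell(page[c])).join(','));
  return [columns.join(','), ...rows].join('\n');
}

const samplePages = [
  { url: 'https://docs.acme.com/', markdown: '# Welcome\nHello, world' },
];
console.log(toJsonl(samplePages));
console.log(toCsv(samplePages));
```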

How it works

Submit a URL

POST your target URL with config — page limit, output formats, URL patterns.

Pages Discovered

WebExtract finds every page via sitemaps and links, renders JS, and captures content.

Stored Permanently

Every crawled page is persisted to your account — no expiry, no data loss.

Use Anywhere

Fetch inline results, receive a webhook, or export as JSON, CSV, or Markdown.

POST /api/crawl 201

// one call — get everything back
const res = await fetch('https://api.webextract.dev/api/crawl', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer cai_sk_••••',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://docs.acme.com',
    limit: 50, formats: ['markdown']
  })
});

// results inline — no polling needed
const { pages } = await res.json();
// pages[0] → { url, markdown, metadata }
// stored in your account forever ✦
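
In production code you would likely want an explicit JSON content type and a check on the response status. The following is a minimal sketch under stated assumptions: the endpoint and body fields come from the example above, while the error handling, the `WEBEXTRACT_API_KEY` variable name, and the injectable `fetchImpl` parameter (handy for testing) are illustrative additions, not part of the documented API.

```javascript
// Build the fetch options for a crawl request, with an explicit JSON
// content type alongside the bearer token.
function buildCrawlRequest(apiKey, config) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(config),
  };
}

// Send the request and return the `pages` array, throwing on non-2xx replies.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function crawl(apiKey, config, fetchImpl = fetch) {
  const res = await fetchImpl(
    'https://api.webextract.dev/api/crawl',
    buildCrawlRequest(apiKey, config)
  );
  if (!res.ok) {
    throw new Error(`Crawl request failed with HTTP ${res.status}`);
  }
  const { pages = [] } = await res.json();
  return pages;
}

// Usage (needs a real key, so it is left commented out):
// const pages = await crawl(process.env.WEBEXTRACT_API_KEY, {
//   url: 'https://docs.acme.com', limit: 50, formats: ['markdown'],
// });
```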

Pricing

Pay for what you crawl

1 page crawled = 1 credit consumed. Nothing else.

Free

$0/mo

500 pages / month

✦ 500 pages/month
✦ Markdown + HTML output
✦ API access
✦ Permanent storage

Start Free

Starter

$9/mo

5,000 pages / month

✦ 5,000 pages/month
✦ All output formats
✦ Webhooks
✦ JSON AI extraction
✦ Priority support

Get Started

Pro

$29/mo

25,000 pages / month

✦ 25,000 pages/month
✦ Everything in Starter
✦ Team workspace
✦ SLA guarantee

Go Pro

All plans include permanent storage, API access, and webhook support.
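
Because one page costs one credit, choosing a plan is simple arithmetic. The sketch below copies the plan figures from the table above; the selection logic and the helper names `cheapestPlanFor` and `centsPerPage` are illustrative only and ignore feature differences between plans.

```javascript
// Plan figures are copied from the pricing table above.
const PLANS = [
  { name: 'Free', pricePerMonth: 0, pagesPerMonth: 500 },
  { name: 'Starter', pricePerMonth: 9, pagesPerMonth: 5000 },
  { name: 'Pro', pricePerMonth: 29, pagesPerMonth: 25000 },
];

// 1 page crawled = 1 credit, so a plan fits if its monthly page
// allowance covers the pages you expect to crawl.
function cheapestPlanFor(pagesPerMonth) {
  const fitting = PLANS.filter((p) => p.pagesPerMonth >= pagesPerMonth);
  fitting.sort((a, b) => a.pricePerMonth - b.pricePerMonth);
  return fitting[0] ?? null; // null when no listed plan is large enough
}

// Effective price per page, in cents, if the whole allowance is used.
function centsPerPage(plan) {
  return (plan.pricePerMonth * 100) / plan.pagesPerMonth;
}

console.log(cheapestPlanFor(3000).name);        // → Starter
console.log(centsPerPage(PLANS[1]).toFixed(2)); // → 0.18
console.log(centsPerPage(PLANS[2]).toFixed(3)); // → 0.116
```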

Get started today

Start extracting.
Your first 500 pages are free.

No credit card. No setup. API key ready in seconds — start crawling immediately.

Create free account · View API docs
