Web data extraction platform · Now in beta

Turn any website into structured data.
Stored forever.

Submit a URL and get back clean Markdown, HTML, or AI-extracted JSON from every page — permanently stored, ready to export or pipe into any LLM pipeline.

No account required · 1 page · Clean Markdown output

pages never expire
3 output formats
per page crawled
Free tier — no card required

What you get

Your data.
Your format. Forever.

Every feature is available via REST API — no SDK, no lock-in, any language.

01 Storage

Permanent Page Storage

Raw crawl results expire after 14 days. WebExtract stores every page indefinitely in your account — searchable, exportable, always there.

02 Clean Output

LLM-Ready Markdown

Every page is cleaned, structured, and returned as Markdown — stripped of nav, ads, and boilerplate. Pipe directly into any LLM or RAG pipeline.
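
As a rough illustration of the "pipe into a RAG pipeline" step, the sketch below splits one page's Markdown into heading-delimited chunks ready for embedding. The `chunkMarkdown` helper and the sample string are hypothetical and not part of the WebExtract API; it also treats any line starting with `#` as a heading, so Markdown containing fenced code with `#` comments would need extra handling.

```javascript
// Hypothetical helper: split Markdown into chunks at each heading line.
function chunkMarkdown(markdown) {
  const chunks = [];
  let current = { heading: null, lines: [] };

  // Push the chunk in progress unless it is empty preamble.
  const flush = () => {
    const hasText = current.lines.some((line) => line.trim() !== '');
    if (current.heading !== null || hasText) {
      chunks.push({ heading: current.heading, text: current.lines.join('\n').trim() });
    }
  };

  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a new heading closes the previous chunk
      current = { heading: match[2].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return chunks;
}

// Sample input standing in for a page's `markdown` field.
const sample = '# Install\nRun the installer.\n\n## Configure\nSet the API key.';
const chunks = chunkMarkdown(sample);
```

Each chunk keeps its heading alongside the text, which tends to help retrieval by giving every embedded passage some context.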

03 AI Extraction

Structured JSON Extraction

Describe what data you want in plain English. WebExtract extracts structured objects from every page — titles, prices, authors, summaries, anything.
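
A plain-English extraction request might look something like the sketch below. The `url`, `limit`, and `formats` fields mirror the crawl example on this page; the `prompt` field name and the `'json'` format value are assumptions made for illustration, so check the API reference for the real names.

```javascript
// Assumption: `prompt` and the 'json' format value are hypothetical names.
function buildExtractionRequest(url, prompt, apiKey) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url, limit: 10, formats: ['json'], prompt }),
  };
}

const request = buildExtractionRequest(
  'https://docs.acme.com',
  'Extract the title, author, and a one-sentence summary of each page.',
  'YOUR_API_KEY'
);
// To send it: fetch('https://api.webextract.dev/api/crawl', request)
```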

04 Smart Discovery

Sitemap + Link Discovery

Discovers pages from both sitemaps and inline links simultaneously. Configurable depth, include/exclude patterns, and external link control.
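
To make include/exclude patterns concrete, here is a minimal sketch of how glob-style patterns could decide which discovered URLs get crawled. The config field names (`maxDepth`, `include`, `exclude`, `allowExternal`) and the matching rule are assumptions for illustration; WebExtract's actual pattern syntax may differ.

```javascript
// Hypothetical config shape; the real field names may differ.
const discovery = {
  maxDepth: 3,
  include: ['https://docs.acme.com/guides/*'],
  exclude: ['*/changelog/*'],
  allowExternal: false,
};

// Convert a simple glob ('*' matches any run of characters) to a RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Keep a URL if it matches some include pattern and no exclude pattern.
function shouldCrawl(url, config) {
  const matches = (patterns) => patterns.some((p) => globToRegExp(p).test(url));
  return matches(config.include) && !matches(config.exclude);
}
```

Under these assumed rules, `https://docs.acme.com/guides/setup` is kept, while anything under a `/changelog/` path is skipped even when it also matches an include pattern.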

05 Async-Native

Webhook on Completion

Provide a webhook URL and receive a POST the moment your crawl finishes. Full results included — no polling loop, no waiting.
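
On the receiving side, a webhook handler could be as small as the sketch below. It assumes the POST body is JSON with a `pages` array shaped like the inline API response (`{ url, markdown, metadata }`); the real payload is not documented here and may differ, and `handleCrawlWebhook` is a hypothetical name.

```javascript
// Assumption: the webhook body is JSON containing a `pages` array.
function handleCrawlWebhook(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    // Not valid JSON: reject so the sender can tell delivery failed.
    return { status: 400, saved: [] };
  }
  if (!payload || !Array.isArray(payload.pages)) {
    return { status: 400, saved: [] };
  }
  // Replace this with real persistence; here we only collect the URLs.
  const saved = payload.pages.map((page) => page.url);
  return { status: 200, saved };
}
```

Call it from whatever POST route your server exposes and reply with the returned `status`. Keeping the handler a pure function makes it easy to test without starting a server.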

06 Portable Data

Export Anytime

Download your crawled pages as JSON, JSONL, CSV, or Markdown. Your data, your format — not locked in any proprietary store.
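
For a sense of how the formats differ, the sketch below converts an array of pages to JSONL and CSV on the client side. It is not the export endpoint itself, which is not shown on this page; the helper names and sample data are hypothetical.

```javascript
// JSONL: one JSON object per line.
function toJsonl(pages) {
  return pages.map((page) => JSON.stringify(page)).join('\n');
}

// Quote a CSV field, doubling embedded quotes (RFC 4180 style).
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// CSV: a header row followed by one row per page.
function toCsv(pages, columns) {
  const header = columns.map(csvField).join(',');
  const rows = pages.map((page) =>
    columns.map((column) => csvField(page[column] ?? '')).join(',')
  );
  return [header, ...rows].join('\n');
}

// Sample data standing in for crawled pages.
const samplePages = [{ url: 'https://docs.acme.com/', markdown: '# Say "hi"' }];
const jsonl = toJsonl(samplePages);
const csv = toCsv(samplePages, ['url', 'markdown']);
```

JSONL suits streaming and line-by-line processing, while CSV is convenient for spreadsheets but needs the quoting shown here once page content contains commas or quotes.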

How it works
1. Submit a URL

POST your target URL with config — page limit, output formats, URL patterns.

2. Pages Discovered

WebExtract finds every page via sitemaps and links, renders JS, and captures content.

3. Stored Permanently

Every crawled page is persisted to your account — no expiry, no data loss.

4. Use Anywhere

Fetch inline results, receive a webhook, or export as JSON, JSONL, CSV, or Markdown.

POST /api/crawl · 201
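// One call returns everything
const res = await fetch('https://api.webextract.dev/api/crawl', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer cai_sk_••••',
    // Declare the body as JSON; fetch defaults string bodies to text/plain
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://docs.acme.com',
    limit: 50,
    formats: ['markdown']
  })
});

// Fail loudly on a non-2xx response instead of parsing an error body
if (!res.ok) throw new Error(`Crawl failed: ${res.status}`);

// Results arrive inline, so no polling is needed
const { pages } = await res.json();
// pages[0] → { url, markdown, metadata }
// stored in your account forever ✦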

Pricing

Pay for what you crawl

1 page crawled = 1 credit consumed. Nothing else.

Free

$0/mo

500 pages / month

  • 500 pages/month
  • Markdown + HTML output
  • API access
  • Permanent storage
Start Free

Starter

$9/mo

5,000 pages / month

  • 5,000 pages/month
  • All output formats
  • Webhooks
  • JSON AI extraction
  • Priority support
Get Started

Pro

$29/mo

25,000 pages / month

  • 25,000 pages/month
  • Everything in Starter
  • Team workspace
  • SLA guarantee
Go Pro

All plans include permanent storage, API access, and webhook support.
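
Because one page is one credit, picking a plan comes down to a single comparison against your monthly page count. The sketch below uses the page limits and prices listed above; the `cheapestPlanFor` helper is hypothetical and only illustrates the arithmetic.

```javascript
// Plans as listed above, ordered from cheapest to most expensive.
const PLANS = [
  { name: 'Free', pagesPerMonth: 500, pricePerMonth: 0 },
  { name: 'Starter', pagesPerMonth: 5000, pricePerMonth: 9 },
  { name: 'Pro', pagesPerMonth: 25000, pricePerMonth: 29 },
];

// 1 page crawled = 1 credit, so the page count is the credit count.
// Returns the first (cheapest) plan that covers the volume, or null.
function cheapestPlanFor(pagesPerMonth) {
  return PLANS.find((plan) => plan.pagesPerMonth >= pagesPerMonth) ?? null;
}
```

For example, 1,200 pages a month exceeds the Free tier's 500 and lands on Starter, while anything above 25,000 returns `null` because no listed plan covers it.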

Get started today

Start extracting.
Your first 500 pages are free.

No credit card. No setup. API key ready in seconds — start crawling immediately.