Markdown Output
WebExtract converts web pages into clean, structured Markdown, automatically stripping navigation menus, footers, cookie banners, ads, and other boilerplate. The result can be passed directly to an LLM, a RAG pipeline, or a document store.
Requesting Markdown
Include "markdown" in the formats array. If you omit the formats field entirely, Markdown is still included in the output by default.
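As a minimal sketch using only Python's standard library, the helper below assembles a request like the one shown next. The endpoint URL is a hypothetical placeholder, since this page does not give the real one, and no authentication header is added because none is documented here. Leaving `formats` out of the body relies on the Markdown default described above.

```python
import json
import urllib.request
from typing import List, Optional

# Hypothetical placeholder: the real WebExtract endpoint is not shown on this page.
API_URL = "https://api.webextract.example/v1/extract"


def build_request(url: str, limit: Optional[int] = None,
                  formats: Optional[List[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a page extraction."""
    body: dict = {"url": url}
    if limit is not None:
        body["limit"] = limit
    # Omit "formats" to fall back on the default output, which includes Markdown.
    if formats is not None:
        body["formats"] = formats
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Mirrors the request body shown on this page.
req = build_request("https://example.com/blog", limit=20, formats=["markdown"])
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and parse the reply with `json.load`; the response schema is not documented on this page, so inspect it before relying on specific fields.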
Request
{
"url": "https://example.com/blog",
"limit": 20,
"formats": ["markdown"]
}
What gets stripped and what is kept
✕ Navigation bars & menus
✕ Header & footer boilerplate
✕ Cookie consent banners
✕ Sidebar widgets
✕ Social share buttons
✕ Advertisement blocks
✓ Main article content
✓ Code blocks (preserved)
✓ Tables (converted to Markdown)
✓ Images (rendered as alt text)
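To make the idea concrete, here is a toy filter built on Python's `html.parser`. It is not WebExtract's implementation: it simply drops everything inside a hypothetical set of boilerplate tags and replaces each image with its alt text. Real cookie banners, sidebars, and ad blocks are usually plain `<div>` elements, so a production extractor needs class, id, or layout heuristics that this sketch leaves out.

```python
from html.parser import HTMLParser

# Hypothetical boilerplate tags for illustration; their entire contents are dropped.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainContentText(HTMLParser):
    """Collect text outside boilerplate tags and turn <img> into its alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a skipped element
        self.parts: list = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)   # keep the image only as its alt text

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def main_text(html: str) -> str:
    """Return the surviving text fragments joined by single spaces."""
    parser = MainContentText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For example, feeding it a page with a `<nav>` menu, a `<main>` article containing an image, and a `<footer>` returns only the article text followed by the image's alt text.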
Output example
page.markdown (truncated)
# Getting Started with the API
This guide walks you through your first API call.
Authentication uses Bearer tokens in the request header.
## Prerequisites
- An active API key
- curl or any HTTP client
## First Request
```bash
curl -X POST https://api.example.com/data
```
JavaScript rendering
For pages that load their content with client-side JavaScript (for example, React or Angular single-page apps), enable JavaScript rendering with "render": true. Rendering adds processing time but captures content that only appears after the page's scripts run.
Note: Rendering is slower and consumes more processing resources, so enable it only for SPAs. Most documentation sites and blogs serve their content as static HTML and don't require it.
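If you are unsure whether a page needs rendering, one rough check is to look at its raw, unrendered HTML: an SPA shell typically contains scripts but almost no visible text (often just an empty mount point such as `<div id="root">`). The Python sketch below implements that heuristic. The 200-character threshold and the helper names are assumptions for illustration, not WebExtract behavior.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Count visible text characters and <script> tags in raw, unrendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.in_code = 0   # depth inside <script>/<style>, whose bodies are not visible text
        self.scripts = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_code += 1
        if tag == "script":
            self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.in_code > 0:
            self.in_code -= 1

    def handle_data(self, data):
        if self.in_code == 0:
            self.chars += len(data.strip())


def looks_like_spa_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present but little server-rendered text suggests an SPA."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.scripts > 0 and counter.chars < min_text_chars


def request_body(url: str, static_html: str) -> dict:
    """Hypothetical helper: set "render" only when the static HTML looks empty."""
    return {
        "url": url,
        "formats": ["markdown"],
        "render": looks_like_spa_shell(static_html),
    }
```

Here `static_html` is the page as fetched without rendering (for example with `urllib.request.urlopen`). The check errs toward leaving rendering off, which keeps requests fast for ordinary blogs and documentation sites.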