Markdown Output

WebExtract converts any web page into clean, structured Markdown — automatically stripping navigation menus, footers, cookie banners, ads, and boilerplate. The result is ready to pipe directly into any LLM, RAG pipeline, or document store.

Requesting Markdown

Include "markdown" in the formats array. If you omit the formats field entirely, Markdown is still returned as part of the default output.

Request
{
  "url": "https://example.com/blog",
  "limit": 20,
  "formats": ["markdown"]
}
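As an illustration, the request body above could be built and sent from Python roughly as follows. This is a minimal sketch, not the official client: the endpoint URL, the Bearer-style Authorization header, and the helper names build_payload and send are assumptions made for the example, so substitute the values from your own account.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real URL for your account.
API_URL = "https://api.webextract.example/v1/extract"


def build_payload(url, limit=None, formats=None):
    """Build the JSON request body. Optional fields are left out when not given,
    so omitting formats falls back to the service's default output."""
    payload = {"url": url}
    if limit is not None:
        payload["limit"] = limit
    if formats is not None:
        payload["formats"] = list(formats)
    return payload


def send(payload, api_key):
    """POST the payload and return the decoded JSON response.
    The Bearer auth scheme here is an assumption, not documented behavior."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the request shown above.
payload = build_payload("https://example.com/blog", limit=20, formats=["markdown"])
print(json.dumps(payload, indent=2))
```

Only build_payload runs when the script is executed; send is defined but not called, so nothing goes over the network until you supply a real endpoint and key.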

What gets stripped

Removed:
Navigation bars & menus
Header & footer boilerplate
Cookie consent banners
Sidebar widgets
Social share buttons
Advertisement blocks

Kept:
Main article content
Code blocks (preserved)
Tables (converted)
Images (as alt text)

Output example

page.markdown (truncated)
# Getting Started with the API

This guide walks you through your first API call.
Authentication uses Bearer tokens in the request header.

## Prerequisites

- An active API key
- curl or any HTTP client

## First Request

```bash
curl -X POST https://api.example.com/data
```
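Because the output keeps its heading structure, one common way to prepare it for a RAG pipeline is to split it into sections at each heading. The sketch below shows one possible approach and is not part of WebExtract itself; the function name split_by_headings and the returned dictionary shape are assumptions for illustration. It skips lines inside fenced code blocks so that a shell comment starting with # is not mistaken for a heading.

```python
FENCE = "`" * 3  # the three-backtick marker that opens and closes a code block


def split_by_headings(markdown):
    """Split Markdown into sections, one per heading.

    Returns a list of {"heading": str or None, "body": str}. Text that appears
    before the first heading is returned with heading None.
    """
    sections = []
    heading, lines = None, []
    in_fence = False

    def flush():
        body = "\n".join(lines).strip()
        # Skip the empty preamble that exists before any heading is seen.
        if heading is not None or body:
            sections.append({"heading": heading, "body": body})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
        # An ATX heading is one or more '#' followed by a space, outside a fence.
        is_heading = (
            not in_fence
            and line.startswith("#")
            and line.lstrip("#").startswith(" ")
        )
        if is_heading:
            flush()
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Applied to the example output above, this would produce one section per heading, with the fenced curl command kept intact inside the body of the "First Request" section.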

JavaScript rendering

For pages that load content dynamically (React SPAs, Angular apps), enable JavaScript rendering with "render": true. This adds processing time but ensures all content is captured.

Note: Rendering is slower and consumes more processing resources. Enable it only for pages that build their content client-side; most documentation sites and blogs don't require it.
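Since rendering costs extra time, one possible strategy is to request the page without it first and retry with "render": true only when the result looks empty. The sketch below assumes a caller-supplied fetch function that takes a request payload and returns the Markdown string; that function, the name extract_with_fallback, and the 200-character threshold are all illustrative assumptions rather than documented behavior.

```python
from typing import Callable


def extract_with_fallback(
    payload: dict,
    fetch: Callable[[dict], str],
    min_chars: int = 200,  # arbitrary cutoff for "suspiciously little content"
) -> str:
    """Try a plain request first; retry with rendering if the result is too short."""
    # First attempt: send the payload without the render flag.
    plain = {k: v for k, v in payload.items() if k != "render"}
    markdown = fetch(plain)
    if len(markdown.strip()) >= min_chars:
        return markdown

    # Fallback: the page probably builds its content with JavaScript.
    return fetch({**plain, "render": True})
```

The threshold is a heuristic and worth tuning for your own content: a short but legitimate page would trigger an unnecessary rendered retry, which is slower but otherwise harmless.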
