Guide

Markdown Output

WebExtract converts any web page into clean, structured Markdown, automatically stripping navigation menus, footers, cookie banners, ads, and other boilerplate. The result is ready to pipe directly into an LLM, a RAG pipeline, or a document store.

Requesting Markdown

To request Markdown explicitly, include "markdown" in the formats array. If you omit the formats field entirely, Markdown is still included in the default output.

Request

{
  "url": "https://example.com/blog",
  "limit": 20,
  "formats": ["markdown"]
}
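As a rough illustration, the request body above could be assembled in Python as follows. This is a minimal sketch: the helper name build_markdown_request is hypothetical and not part of WebExtract, and the endpoint and authentication are left out because this page does not specify them.

```python
import json


def build_markdown_request(url, limit, formats=("markdown",)):
    """Build a request body shaped like the example above (hypothetical helper)."""
    body = {"url": url, "limit": limit}
    # Passing formats=None omits the field, relying on the documented
    # behaviour that Markdown is part of the default output.
    if formats is not None:
        body["formats"] = list(formats)
    return body


payload = build_markdown_request("https://example.com/blog", limit=20)
print(json.dumps(payload, indent=2))
```

Sending the payload is then a matter of POSTing it as JSON with whichever HTTP client you already use.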

What gets stripped

✕ Navigation bars & menus
✕ Header & footer boilerplate
✕ Cookie consent banners
✕ Sidebar widgets
✕ Social share buttons
✕ Advertisement blocks

✓ Main article content
✓ Code blocks (preserved)
✓ Tables (converted)
✓ Images (as alt text)

Output example

page.markdown (truncated)

# Getting Started with the API

This guide walks you through your first API call.
Authentication uses Bearer tokens in the request header.

## Prerequisites

- An active API key
- curl or any HTTP client

## First Request

```bash
curl -X POST https://api.example.com/data
```
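Because the output keeps headings and code blocks intact, it can be split into sections before being fed to a RAG pipeline. The sketch below is one possible approach and not part of WebExtract: split_markdown_sections is a hypothetical helper that splits on ATX headings (lines starting with #) and skips # lines that fall inside fenced code blocks.

```python
FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable


def split_markdown_sections(markdown):
    """Split Markdown into (heading, body) pairs, ignoring '#' inside code fences."""
    sections = []
    heading = ""  # text before the first heading is stored under an empty heading
    body = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        is_heading = (
            not in_fence
            and line.startswith("#")
            and line.lstrip("#").startswith(" ")
        )
        if is_heading:
            # Close the previous section if it has a heading or any text.
            if heading or any(item.strip() for item in body):
                sections.append((heading, "\n".join(body).strip()))
            heading = line.lstrip("#").strip()
            body = []
        else:
            body.append(line)
    if heading or any(item.strip() for item in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


sample = "\n".join([
    "# Getting Started with the API",
    "",
    "This guide walks you through your first API call.",
    "",
    "## Prerequisites",
    "",
    "- An active API key",
    "",
    "## First Request",
    "",
    FENCE + "bash",
    "# a comment inside a fence is not a heading",
    "curl -X POST https://api.example.com/data",
    FENCE,
])

for section_heading, section_body in split_markdown_sections(sample):
    print(section_heading, "->", len(section_body), "chars")
```

Each (heading, body) pair can then be embedded or stored as its own chunk, with the heading kept as metadata.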

JavaScript rendering

For pages that load content dynamically (React SPAs, Angular apps), enable JavaScript rendering by setting "render": true in the request. This adds processing time, but it captures content that only appears after scripts run.

Note: Rendering is slower and consumes more processing resources. Use it only for pages that need it, such as SPAs; most documentation sites and blogs don't require it.
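One way to keep rendering off by default is to request the page without it first and retry with "render": true only when the Markdown comes back nearly empty. The sketch below assumes this strategy; it is not a documented WebExtract feature. The fetch argument is a hypothetical callable that takes a request body and returns the page's Markdown as a string, and the min_chars threshold is an arbitrary illustrative value.

```python
def fetch_with_render_fallback(fetch, url, min_chars=200):
    """Return (markdown, used_render), retrying with rendering if output looks empty.

    `fetch` is a hypothetical callable: request body (dict) -> Markdown (str).
    """
    body = {"url": url, "formats": ["markdown"]}
    markdown = fetch(body)
    if len(markdown.strip()) >= min_chars:
        return markdown, False
    # Little or no content came back: the page may be rendered client-side.
    rendered = fetch({**body, "render": True})
    return rendered, True


def fake_fetch(body):
    """Stand-in for a real API call: this fake SPA only yields content when rendered."""
    if body.get("render"):
        return "# Dashboard\n\n" + "content " * 50
    return ""


markdown, used_render = fetch_with_render_fallback(fake_fetch, "https://example.com/app")
print("rendered:", used_render, "| length:", len(markdown))
```

The trade-off is one extra request for pages that turn out to need rendering, in exchange for avoiding the rendering cost on static pages.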
