For agent clients that speak the Model Context Protocol, WebReaper ships an MCP server. It surfaces the same scraping capabilities as tools, so an MCP-only client can scrape, map, crawl, and extract without shelling out to the CLI itself.

There are two servers, same tools, different transports:

WebReaper.Mcp speaks stdio, the transport local clients use when they spawn the server as a child process (Cursor, Claude Desktop, Copilot Studio).
WebReaper.Mcp.AspNetCore speaks Streamable HTTP, for clients that connect to a URL instead. The headline case is n8n, whose MCP Client node is URL-only and cannot reach a stdio server.

What it provides

Both servers expose WebReaper's core operations as MCP tools:

scrape a page to clean Markdown
map a site to discover its URLs
extract structured data with a JSON schema
extract_with_prompt structured data by a natural-language prompt, with an LLM
extract_inferred structured data with no schema: an LLM infers the schema once, then WebReaper extracts deterministically (cheaper and more consistent than a prompt across similarly shaped pages)
crawl a whole site: a bounded on-domain sweep returning one Markdown record per page

The extract_with_prompt and extract_inferred tools call an LLM, so they need an OpenAI-compatible endpoint configured on the MCP host: set WEBREAPER_LLM_MODEL and WEBREAPER_LLM_BASE_URL (for example https://api.openai.com/v1, or http://localhost:11434/v1 for a local Ollama), with the API key in WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). Both take an optional per-call model parameter that overrides WEBREAPER_LLM_MODEL for that one call. The other tools need no setup.

crawl is a single long blocking call bounded by maxPages (default 50, hard cap 1000). It emits MCP progress notifications per page for clients that render them (Claude Desktop, Cursor); a blocking client like n8n just waits for the result. For a large site, prefer map to list the URLs and then scrape each, so every call stays short.

Use it with n8n

Run the HTTP server. The container image bakes a headless Chromium, so browser=true works out of the box:

docker run -p 8080:8080 -e WEBREAPER_MCP_TOKEN=your-secret \
  ghcr.io/alex-on-ai/webreaper-mcp-http:latest

WEBREAPER_MCP_TOKEN is required here: the server binds all interfaces in the container, and it refuses to start on a non-loopback interface without a token, so an unauthenticated public scraper is structurally prevented.

Then add an MCP Client Tool node in your n8n workflow:

Server Transport: HTTP Streamable (not the deprecated SSE)
MCP Endpoint URL: the server URL (for example http://webreaper-mcp:8080 when both run on the same Docker network, or http://host.docker.internal:8080)
Authentication: Bearer, value = your WEBREAPER_MCP_TOKEN

n8n fetches the tool list automatically. Attach the node to an AI Agent, or call a tool directly and iterate over results with n8n's own nodes (for example map, then a loop that scrapes each URL).

To share one browser pool instead of launching Chromium per call, set WEBREAPER_CDP_URL to an external CDP endpoint (a browserless sidecar); WEBREAPER_MCP_MAX_CONCURRENT_BROWSERS caps concurrent launches otherwise.

When to reach for it

If you are using Claude Code, the bundled Claude Code skill is the simpler path: it routes plain-language requests to the CLI directly. The MCP servers are the right adapter for clients that talk to tools over MCP, the stdio one for local process-spawning clients and the HTTP one for URL-based clients like n8n. The scope is single-tenant, self-run; a hosted multi-tenant endpoint is future work.

Version pinning

The MCP satellites are built on the preview ModelContextProtocol SDK. Because it is a preview dependency, pin the package version explicitly when you ship, so a later preview release does not change behavior under you:

dotnet add package WebReaper.Mcp --version <pinned-version>

Pinning here is a deliberate recommendation, not a formality: preview SDKs move quickly, and an explicit version keeps your integration reproducible.