MCP server
Expose WebReaper as MCP tools over stdio or Streamable HTTP, for Cursor, Claude Desktop, Copilot Studio, and n8n.
For agent clients that speak the Model Context Protocol, WebReaper ships an MCP server. It surfaces the same scraping capabilities as tools, so an MCP-only client can scrape, map, crawl, and extract without shelling out to the CLI itself.
There are two servers, same tools, different transports:
WebReaper.Mcpspeaks stdio, the transport local clients use when they spawn the server as a child process (Cursor, Claude Desktop, Copilot Studio).WebReaper.Mcp.AspNetCorespeaks Streamable HTTP, for clients that connect to a URL instead. The headline case is n8n, whose MCP Client node is URL-only and cannot reach a stdio server.
What it provides
Both servers expose WebReaper's core operations as MCP tools:
scrapea page to clean Markdownmapa site to discover its URLsextractstructured data with a JSON schemaextract_with_promptstructured data by a natural-language prompt, with an LLMextract_inferredstructured data with no schema: an LLM infers the schema once, then WebReaper extracts deterministically (cheaper and more consistent than a prompt across similarly shaped pages)crawla whole site: a bounded on-domain sweep returning one Markdown record per page
The extract_with_prompt and extract_inferred tools call an LLM, so they need
an OpenAI-compatible endpoint configured on the MCP host: set WEBREAPER_LLM_MODEL
and WEBREAPER_LLM_BASE_URL (for example https://api.openai.com/v1, or
http://localhost:11434/v1 for a local Ollama), with the API key in
WEBREAPER_LLM_API_KEY (or OPENAI_API_KEY). Both take an optional per-call
model parameter that overrides WEBREAPER_LLM_MODEL for that one call. The
other tools need no setup.
crawl is a single long blocking call bounded by maxPages (default 50,
hard cap 1000). It emits MCP progress notifications per page for clients that
render them (Claude Desktop, Cursor); a blocking client like n8n just waits for
the result. For a large site, prefer map to list the URLs and then scrape
each, so every call stays short.
Use it with n8n
Run the HTTP server. The container image bakes a headless Chromium, so
browser=true works out of the box:
docker run -p 8080:8080 -e WEBREAPER_MCP_TOKEN=your-secret \
ghcr.io/alex-on-ai/webreaper-mcp-http:latestWEBREAPER_MCP_TOKEN is required here: the server binds all interfaces in the
container, and it refuses to start on a non-loopback interface without a token,
so an unauthenticated public scraper is structurally prevented.
Then add an MCP Client Tool node in your n8n workflow:
- Server Transport: HTTP Streamable (not the deprecated SSE)
- MCP Endpoint URL: the server URL (for example
http://webreaper-mcp:8080when both run on the same Docker network, orhttp://host.docker.internal:8080) - Authentication: Bearer, value = your
WEBREAPER_MCP_TOKEN
n8n fetches the tool list automatically. Attach the node to an AI Agent, or call
a tool directly and iterate over results with n8n's own nodes (for example map,
then a loop that scrapes each URL).
To share one browser pool instead of launching Chromium per call, set
WEBREAPER_CDP_URL to an external CDP endpoint (a browserless sidecar);
WEBREAPER_MCP_MAX_CONCURRENT_BROWSERS caps concurrent launches otherwise.
When to reach for it
If you are using Claude Code, the bundled Claude Code skill is the simpler path: it routes plain-language requests to the CLI directly. The MCP servers are the right adapter for clients that talk to tools over MCP, the stdio one for local process-spawning clients and the HTTP one for URL-based clients like n8n. The scope is single-tenant, self-run; a hosted multi-tenant endpoint is future work.
Version pinning
The MCP satellites are built on the preview ModelContextProtocol SDK. Because
it is a preview dependency, pin the package version explicitly when you ship, so
a later preview release does not change behavior under you:
dotnet add package WebReaper.Mcp --version <pinned-version>Pinning here is a deliberate recommendation, not a formality: preview SDKs move quickly, and an explicit version keeps your integration reproducible.