MCP server that fetches web-page content through a Playwright headless browser, offering Readability extraction, Markdown/HTML output and HTTP/SSE transports.
https://github.com/jae-jae/fetcher-mcp
Most web scrapers break when they hit modern web applications. You know the drill: empty responses from React apps, missing content from lazy-loaded sections, and authentication walls that block your automation.
Fetcher MCP solves this with a Playwright-powered browser that actually executes JavaScript, lets you get past login walls in debug mode, and extracts clean content automatically.
Real Browser Execution: Unlike curl or basic scrapers, Fetcher MCP drives a real Chromium instance (headless by default) through Playwright. JavaScript runs, dynamic content loads, and you get the page users actually see.
Smart Content Extraction: A built-in Readability algorithm strips away navigation, ads, and page clutter, leaving just the main article content as clean Markdown or HTML.
Parallel Processing: The fetch_urls tool fetches multiple URLs concurrently, so a batch finishes far faster than requesting the same pages one after another.
Authentication Support: Debug mode keeps the browser window open so you can manually log in, then continue fetching authenticated content. Perfect for sites behind login walls.
Anti-Bot Handling: Configurable navigation waits and timeouts help with sites that redirect through CAPTCHA or verification pages, the kind of flow that breaks traditional scrapers.
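Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request, and options like the navigation waits are just keys in its `arguments`. The sketch below builds such a request for `fetch_url`. The argument names `timeout`, `waitForNavigation`, and `navigationTimeout` are assumptions here; confirm them against the tool schema the server reports via `tools/list` before relying on them.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Wrap an MCP tool invocation in a JSON-RPC 2.0 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical arguments for a site that bounces through a verification page.
# The option names below are ASSUMED; check the server's tools/list output.
request = build_tool_call(
    1,
    "fetch_url",
    {
        "url": "https://example.com/article",
        "timeout": 30000,            # page-load timeout in milliseconds
        "waitForNavigation": True,   # wait for a redirect after the first load
        "navigationTimeout": 10000,  # stop waiting for that redirect after 10 s
    },
)

print(json.dumps(request, indent=2))
```

In practice your MCP client (Claude Desktop or an SDK) builds this envelope for you; the sketch only shows where per-request options end up.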
AI Training Data Collection: Fetch clean article content from news sites, blogs, and documentation without fighting JavaScript frameworks or authentication requirements.
# Batch fetch multiple articles for training data
npx -y fetcher-mcp
# Then use fetch_urls with your URL list
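The server handles fetch_urls concurrency internally. As a rough illustration of why batching helps, here is the general bounded-concurrency pattern in plain asyncio. It is not Fetcher MCP's implementation: `fake_fetch` is a hypothetical stand-in for a browser page load, and `max_concurrency` is an illustrative knob.

```python
import asyncio
import time


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real page fetch: each "request" takes 0.2 s.
    await asyncio.sleep(0.2)
    return f"content of {url}"


async def fetch_all(urls: list[str], max_concurrency: int = 4) -> list[str]:
    # The semaphore caps how many fetches are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.com/post/{i}" for i in range(8)]
start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start
# Two waves of four 0.2 s fetches: roughly 0.4 s, versus about 1.6 s one at a time.
print(f"{len(results)} pages in {elapsed:.2f} s")
```

The cap matters more for browser-based fetching than for plain HTTP: every concurrent fetch is a real page with its own memory cost, so an unbounded fan-out over hundreds of URLs can exhaust the machine.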
Competitive Intelligence: Monitor competitor sites that use complex JavaScript for content updates. Get reliable, consistent data extraction even when they change their frontend stack.
Documentation Aggregation: Pull content from multiple technical documentation sites, many of which are now built with modern JavaScript frameworks that break traditional scrapers.
Content Research Workflows: Research teams can fetch content from subscription sites by using debug mode for authentication, then automate the rest of the content extraction.
Skip the complex Playwright installation dance:
# Run directly via npx (no global install required)
npx -y fetcher-mcp
# First time? Install the browser
npx playwright install chromium
For Claude Desktop integration, add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
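If your config already lists other servers, the `fetcher` entry goes alongside them inside `mcpServers` rather than replacing the whole object. A small merge helper makes that explicit. `add_fetcher_server` and the `other` server entry are hypothetical, and reading or writing the actual file is left out because its location differs by OS.

```python
import json
from typing import Any


def add_fetcher_server(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a Claude Desktop config with a 'fetcher' server entry added.

    Other servers are left untouched; an existing 'fetcher' entry is replaced.
    """
    merged = json.loads(json.dumps(config))  # deep copy (config is plain JSON data)
    servers = merged.setdefault("mcpServers", {})
    servers["fetcher"] = {"command": "npx", "args": ["-y", "fetcher-mcp"]}
    return merged


# Hypothetical existing config with one unrelated server already configured.
existing = {"mcpServers": {"other": {"command": "npx", "args": ["-y", "hypothetical-server"]}}}
updated = add_fetcher_server(existing)
print(json.dumps(updated, indent=2))
```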
Besides the default stdio transport, Fetcher can run as a standalone HTTP server:
npx -y fetcher-mcp --transport=http --port=3000
This exposes two endpoints: /mcp for MCP's Streamable HTTP transport and /sse for the older SSE transport, so clients built for either can connect.
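Any HTTP client can talk to the /mcp endpoint. Under the MCP Streamable HTTP transport (2025-03-26 spec revision), a client opens a session by POSTing a JSON-RPC `initialize` request and must accept either a JSON or an SSE response. This stdlib-only sketch builds that request without sending it; the client name and version are placeholders, and the URL assumes the `--port=3000` command above.

```python
import json
import urllib.request

MCP_URL = "http://localhost:3000/mcp"  # assumes the --port=3000 command above


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) the JSON-RPC 'initialize' call a client sends first."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholders
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_initialize_request()
# With the server running, you could send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.headers.get("Mcp-Session-Id"))
```

If the server returns an `Mcp-Session-Id` header, include it on every later request, and send a `notifications/initialized` notification before calling tools.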
Fetcher MCP ships with resource optimization enabled: by default it blocks images, stylesheets, and media. Pages load faster and use less bandwidth, while the JavaScript that renders the content still runs.
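Fetcher MCP's own blocking code isn't reproduced here, but a common way to do this in Playwright is to intercept every request and abort those whose resource type is non-essential. This sketch targets Playwright's Python sync API (`page.route`, `route.request.resource_type`, `route.abort`, `route.continue_`); the blocked set mirrors the types listed above, and the function names are hypothetical.

```python
from typing import Any

# Playwright resource types treated as non-essential for text extraction.
BLOCKED_TYPES = frozenset({"image", "stylesheet", "media"})


def should_block(resource_type: str) -> bool:
    return resource_type in BLOCKED_TYPES


def block_heavy_resources(page: Any) -> None:
    """Abort image/stylesheet/media requests; scripts, documents and XHR pass through."""

    def handle(route: Any) -> None:
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    page.route("**/*", handle)


# Usage (requires `pip install playwright` and `playwright install chromium`):
#
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser = p.chromium.launch()
#       page = browser.new_page()
#       block_heavy_resources(page)
#       page.goto("https://example.com")
#       html = page.content()
#       browser.close()
```

Scripts are deliberately left alone: blocking them would bring back the empty-page problem the browser is there to solve.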
The Readability extraction handles most content sites automatically, but you can disable it for sites where you need the full HTML structure.
Production deployment is straightforward:
docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest
The container handles all Playwright dependencies and browser installation automatically.
Debug mode lets you interact with the browser directly:
npx -y fetcher-mcp --debug
Perfect for handling complex authentication flows, solving CAPTCHAs manually, or debugging why certain sites aren't working as expected.
Bottom Line: If you're building applications that need reliable web content extraction, Fetcher MCP handles the JavaScript execution and content cleaning that basic scrapers can't. It's the difference between spending hours debugging scraper failures and having content extraction that just works.
Try it now with npx -y fetcher-mcp: no global install and no configuration files, just a one-time npx playwright install chromium if the browser isn't already present.