puppeteer-vision-mcp
Puppeteer Vision MCP Server
This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to markdown format using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookies, captchas, and other interactive elements automatically.
Now easily runnable via npx!
Features
- Scrapes webpages using Puppeteer with stealth mode
- Uses AI-powered interaction to automatically handle:
  - Cookie consent banners
  - CAPTCHAs
  - Newsletter or subscription prompts
  - Paywalls and login walls
  - Age verification prompts
  - Interstitial ads
  - Any other interactive elements blocking content
- Extracts main content with Mozilla's Readability
- Converts HTML to well-formatted Markdown
- Special handling for code blocks, tables, and other structured content
- Accessible via the Model Context Protocol
- Option to view browser interaction in real-time by disabling headless mode
- Easily consumable as an `npx` package
Quick Start with NPX
The recommended way to use this server is via `npx`, which ensures you're running the latest version without needing to clone the repository or install it manually.
- Prerequisites: Ensure you have Node.js and npm installed.
- Environment Setup: The server requires an `OPENAI_API_KEY`. You can provide this and other optional configurations in two ways:
  - `.env` file: Create a `.env` file in the directory where you will run the `npx` command.
  - Shell Environment Variables: Export the variables in your terminal session.

  Example `.env` file or shell exports:

  ```
  # Required
  OPENAI_API_KEY=your_api_key_here

  # Optional (defaults shown)
  # VISION_MODEL=gpt-4.1
  # API_BASE_URL=https://api.openai.com/v1   # Uncomment to override
  # TRANSPORT_TYPE=stdio                     # Options: stdio, sse, http
  # USE_SSE=true                             # Deprecated: use TRANSPORT_TYPE=sse instead
  # PORT=3001                                # Only used in sse/http modes
  # DISABLE_HEADLESS=true                    # Uncomment to see the browser in action
  ```

- Run the Server: Open your terminal and run:

  ```bash
  npx -y puppeteer-vision-mcp-server
  ```

  - The `-y` flag automatically confirms any prompts from `npx`.
  - This command will download (if not already cached) and execute the server.
  - By default, it starts in `stdio` mode. Set `TRANSPORT_TYPE=sse` or `TRANSPORT_TYPE=http` for the HTTP server modes.
Using as an MCP Tool with NPX
This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator. Here's an example configuration snippet:
```json
{
  "mcpServers": {
    "web-scraper": {
      "command": "npx",
      "args": ["-y", "puppeteer-vision-mcp-server"],
      "env": {
        "OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE",
        // Optional:
        // "VISION_MODEL": "gpt-4.1",
        // "API_BASE_URL": "https://api.example.com/v1",
        // "TRANSPORT_TYPE": "stdio", // or "sse" or "http"
        // "DISABLE_HEADLESS": "true" // To see the browser during operations
      }
    }
    // ... other MCP servers
  }
}
```
When configured this way, the MCP orchestrator will manage the lifecycle of the `puppeteer-vision-mcp-server` process.
Environment Configuration Details
Regardless of how you run the server (NPX or local development), it uses the following environment variables:
- `OPENAI_API_KEY`: (Required) Your API key for accessing the vision model.
- `VISION_MODEL`: (Optional) The model to use for vision analysis.
  - Default: `gpt-4.1`
  - Can be any model with vision capabilities.
- `API_BASE_URL`: (Optional) Custom API endpoint URL.
  - Use this to connect to alternative OpenAI-compatible providers (e.g., Together.ai, Groq, Anthropic, local deployments).
- `TRANSPORT_TYPE`: (Optional) The transport protocol to use.
  - Options: `stdio` (default), `sse`, `http`
    - `stdio`: Direct process communication (recommended for most use cases)
    - `sse`: Server-Sent Events over HTTP (legacy mode)
    - `http`: Streamable HTTP transport with session management
- `USE_SSE`: (Optional, deprecated) Set to `true` to enable SSE mode over HTTP.
  - Deprecated: use `TRANSPORT_TYPE=sse` instead.
- `PORT`: (Optional) The port for the HTTP server in SSE or HTTP mode.
  - Default: `3001`
- `DISABLE_HEADLESS`: (Optional) Set to `true` to run the browser in visible mode.
  - Default: `false` (the browser runs in headless mode).
Communication Modes
The server supports three communication modes:
- stdio (Default): Communicates via standard input/output.
  - Perfect for direct integration with LLM tools that manage processes.
  - Ideal for command-line usage and scripting.
  - No HTTP server is started. This is the default mode.
- SSE mode: Communicates via Server-Sent Events over HTTP.
  - Enable by setting `TRANSPORT_TYPE=sse` in your environment.
  - Starts an HTTP server on the specified `PORT` (default: 3001).
  - Use when you need to connect to the tool over a network.
  - Connect to: `http://localhost:3001/sse`
- HTTP mode: Communicates via Streamable HTTP transport with session management.
  - Enable by setting `TRANSPORT_TYPE=http` in your environment.
  - Starts an HTTP server on the specified `PORT` (default: 3001).
  - Supports full session management and resumable connections.
  - Connect to: `http://localhost:3001/mcp`
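For illustration, the minimal TypeScript sketch below connects to a server running in `http` mode using the `@modelcontextprotocol/sdk` client. The import paths and class names reflect recent versions of the TypeScript SDK and may differ in yours; the client name is arbitrary.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Assumes the server was started with TRANSPORT_TYPE=http on the default port.
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:3001/mcp"));
const client = new Client({ name: "http-example-client", version: "1.0.0" });

await client.connect(transport);
console.log(await client.listTools()); // the listing should include scrape-webpage
await client.close();
```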
Tool Usage (MCP Invocation)
The server provides a `scrape-webpage` tool.
Tool Parameters:
- `url` (string, required): The URL of the webpage to scrape.
- `autoInteract` (boolean, optional, default: true): Whether to automatically handle interactive elements.
- `maxInteractionAttempts` (number, optional, default: 3): Maximum number of AI interaction attempts.
- `waitForNetworkIdle` (boolean, optional, default: true): Whether to wait for the network to be idle before processing.
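As a sketch of a full invocation, the snippet below spawns the server over stdio with the TypeScript MCP SDK and calls `scrape-webpage`. The target URL is purely illustrative, and the SDK import paths may vary by version.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a child process (stdio transport, the default mode).
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "puppeteer-vision-mcp-server"],
  env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY ?? "" },
});

const client = new Client({ name: "scrape-example", version: "1.0.0" });
await client.connect(transport);

// Invoke the scrape-webpage tool with an illustrative URL.
const result = await client.callTool({
  name: "scrape-webpage",
  arguments: {
    url: "https://example.com/article",
    autoInteract: true,
    maxInteractionAttempts: 3,
    waitForNetworkIdle: true,
  },
});

console.log(result); // markdown in content[0].text, plus metadata

await client.close();
```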
Response Format:
The tool returns its result in a structured format:
- `content`: An array containing a single text object with the raw markdown of the scraped webpage.
- `metadata`: Contains additional information:
  - `message`: Status message.
  - `success`: Boolean indicating success.
  - `contentSize`: Size of the content in characters (on success).
Example Success Response:
```json
{
  "content": [
    {
      "type": "text",
      "text": "# Page Title\n\nThis is the content..."
    }
  ],
  "metadata": {
    "message": "Scraping successful",
    "success": true,
    "contentSize": 8734
  }
}
```
Example Error Response:
```json
{
  "content": [
    {
      "type": "text",
      "text": ""
    }
  ],
  "metadata": {
    "message": "Error scraping webpage: Failed to load the URL",
    "success": false
  }
}
```
How It Works
AI-Driven Interaction
The system uses vision-capable AI models (configurable via `VISION_MODEL` and `API_BASE_URL`) to analyze screenshots of web pages and decide on actions like clicking, typing, or scrolling to bypass overlays and consent forms. This process repeats up to `maxInteractionAttempts` times.
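The function name, prompt, and action format below are illustrative rather than taken from the project's source; this is a minimal sketch of one iteration of such a loop, assuming the `openai` client and a Puppeteer `page`.

```typescript
import OpenAI from "openai";
import type { Page } from "puppeteer";

// Uses OPENAI_API_KEY, and API_BASE_URL if a custom endpoint is configured.
const openai = new OpenAI({ baseURL: process.env.API_BASE_URL });

// One illustrative iteration: screenshot the page, ask a vision model what to do,
// then perform the suggested action. The prompt and JSON shape are hypothetical.
async function interactOnce(page: Page): Promise<void> {
  const screenshot = await page.screenshot({ encoding: "base64" });

  const response = await openai.chat.completions.create({
    model: process.env.VISION_MODEL ?? "gpt-4.1",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: 'If an overlay blocks the content, reply with JSON like {"action":"click","selector":"..."}; otherwise reply {"action":"none"}.',
          },
          { type: "image_url", image_url: { url: `data:image/png;base64,${screenshot}` } },
        ],
      },
    ],
  });

  const decision = JSON.parse(response.choices[0].message.content ?? '{"action":"none"}');
  if (decision.action === "click" && decision.selector) {
    await page.click(decision.selector);
  }
}
```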
Content Extraction
After interactions, Mozilla's Readability extracts the main content, which is then sanitized and converted to Markdown using Turndown with custom rules for code blocks and tables.
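For orientation, here is a stripped-down sketch of that pipeline, assuming the same libraries the project depends on (`jsdom`, `@mozilla/readability`, `sanitize-html`, `turndown`); the helper name and sanitization options are illustrative.

```typescript
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import sanitizeHtml from "sanitize-html";
import TurndownService from "turndown";

// Illustrative helper: extract the readable article from raw HTML and convert it to Markdown.
function htmlToMarkdown(html: string, url: string): string {
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();
  if (!article?.content) {
    throw new Error("Readability could not extract an article");
  }

  // Drop scripts, styles, and other unwanted markup before conversion.
  const clean = sanitizeHtml(article.content, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat(["img", "pre", "code", "table", "tr", "td", "th"]),
  });

  const turndown = new TurndownService({ headingStyle: "atx", codeBlockStyle: "fenced" });
  return `# ${article.title ?? "Untitled"}\n\n${turndown.turndown(clean)}`;
}
```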
Installation & Development (for Modifying the Code)
If you wish to contribute, modify the server, or run a local development version:
- Clone the Repository:

  ```bash
  git clone https://github.com/djannot/puppeteer-vision-mcp.git
  cd puppeteer-vision-mcp
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

- Build the Project:

  ```bash
  npm run build
  ```

- Set Up Environment: Create a `.env` file in the project's root directory with your `OPENAI_API_KEY` and any other desired configurations (see "Environment Configuration Details" above).

- Run for Development:

  ```bash
  npm start   # Starts the server using the local build
  ```

  Or, for automatic rebuilding on changes:

  ```bash
  npm run dev
  ```
Customization (for Developers)
You can modify the behavior of the scraper by editing:
- `src/ai/vision-analyzer.ts` (`analyzePageWithAI` function): Customize the AI prompt.
- `src/ai/page-interactions.ts` (`executeAction` function): Add new action types.
- `src/scrapers/webpage-scraper.ts` (`visitWebPage` function): Change Puppeteer options.
- `src/utils/markdown-formatters.ts`: Adjust Turndown rules for Markdown conversion (see the sketch below).
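As an example of the last point, a custom Turndown rule can be registered as shown below. The rule name and behavior are illustrative and not taken from `markdown-formatters.ts`.

```typescript
import TurndownService from "turndown";

const turndown = new TurndownService({ codeBlockStyle: "fenced" });

// Hypothetical rule: render <mark> elements as bold text instead of plain text.
turndown.addRule("highlight", {
  filter: "mark",
  replacement: (content) => `**${content}**`,
});

console.log(turndown.turndown("<p>This is <mark>important</mark>.</p>"));
// -> "This is **important**."
```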
Dependencies
Key dependencies include:
- `@modelcontextprotocol/sdk`
- `puppeteer`, `puppeteer-extra`
- `@mozilla/readability`, `jsdom`
- `turndown`, `sanitize-html`
- `openai` (or compatible API for vision models)
- `express` (for SSE mode)
- `zod`