crawl4ai-mcp-server


Crawl4ai MCP Server provides web crawling capabilities using crawl4ai, with markdown output suitable for LLMs.

Overview

The Crawl4ai MCP Server facilitates web crawling tasks by leveraging the crawl4ai platform. It outputs crawled data in markdown format, making it suitable for integration with large language models (LLMs). The server is built on Node.js and requires access to a running crawl4ai instance. It supports crawling multiple URLs in a single request and returns the content with proper citations, formatted in markdown. The server is configured through environment variables, allowing customization of the API URL and, if needed, an authentication token. It also includes error handling for common issues such as invalid URLs, authentication failures, and network connectivity problems.

Features

  • Web Crawling: Crawl web pages and retrieve content in markdown format with citations.
  • Markdown Output: Outputs crawled data in markdown format, suitable for LLM integration.
  • Error Handling: Includes mechanisms to handle common errors such as invalid URLs and network issues.
  • Authentication Support: Supports optional authentication for secure access to the crawl4ai API.
  • Development Mode: Offers a development mode with auto-rebuild for easier testing and debugging.

MCP Tools

  • crawl_urls: Crawl web pages and get markdown content with citations. Requires a list of URLs.
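A crawl_urls invocation might look like the following. The exact argument schema is an assumption based on the tool description above; only the requirement of a list of URLs is stated in the source.

```json
{
  "name": "crawl_urls",
  "arguments": {
    "urls": [
      "https://example.com",
      "https://example.org/docs"
    ]
  }
}
```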

Usage with Different Platforms

Node.js

Clone the repository, install dependencies, and build:

git clone https://github.com/Kirill812/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
npm install
npm run build

Configuration

Add the server to your MCP client configuration:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "node",
      "args": [
        "/path/to/crawl4ai-mcp-server/build/index.js"
      ],
      "env": {
        "CRAWL4AI_API_URL": "http://127.0.0.1:11235",
        "CRAWL4AI_AUTH_TOKEN": "your-auth-token"
      }
    }
  }
}
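The sketch below illustrates how a server like this might resolve the two environment variables from the configuration above. The default URL mirrors the example config; the Bearer authorization scheme is an assumption, not confirmed by the source.

```javascript
// Resolve configuration from the environment (variable names match the
// "env" block in the MCP config above; the default URL and the Bearer
// scheme are assumptions for illustration).
const apiUrl = process.env.CRAWL4AI_API_URL ?? "http://127.0.0.1:11235";
const authToken = process.env.CRAWL4AI_AUTH_TOKEN; // optional

// Build request headers; attach Authorization only when a token is set.
const headers = { "Content-Type": "application/json" };
if (authToken) {
  headers["Authorization"] = `Bearer ${authToken}`;
}
```

If CRAWL4AI_AUTH_TOKEN is unset, requests are sent without an Authorization header, matching the optional-authentication behavior described above.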

Frequently Asked Questions

What should I do if I encounter a timeout error?

Try reducing the number of URLs per request to avoid timeout errors.
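One way to reduce URLs per request is to split a long list into fixed-size batches and issue one crawl_urls call per batch. A minimal JavaScript sketch (the batch size of 3 is an arbitrary example, not a documented limit):

```javascript
// Split a URL list into fixed-size batches so each crawl_urls call
// stays small enough to finish before the timeout.
function chunkUrls(urls, batchSize = 3) {
  const batches = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push(urls.slice(i, i + batchSize));
  }
  return batches;
}

// Four URLs with a batch size of 3 yield two batches: [3 URLs, 1 URL].
const batches = chunkUrls(
  [
    "https://a.example",
    "https://b.example",
    "https://c.example",
    "https://d.example",
  ],
  3
);
```

Each resulting batch can then be passed as the urls argument of a separate crawl_urls call.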

How can I ensure my authentication token is valid?

Verify the token with the crawl4ai API service and ensure it has not expired.

What happens if a website blocks the crawling request?

The service will automatically handle retries with different user agents to bypass blocks.