mcp-qdrant-docs
mcp-qdrant-docs is a TypeScript-based server that scrapes and indexes website content into a Qdrant vector database. It allows users to query scraped content using natural language through dynamically named tools. The server can be configured and run using command-line options and environment variables.
mcp-qdrant-docs MCP Server
A Model Context Protocol server
This is a TypeScript-based MCP server that scrapes website content, indexes it into a Qdrant vector database, and provides a tool to answer questions about the indexed content.
使用例
ask_hono_docs: Hono.devのドキュメント内容について質問できます
ask_reactrouter_docs: ReactRouter.comのドキュメント内容について質問できます
ask_gradio_docs: Gradio.appのLLMsドキュメントについて質問できます
Features
Tool: ask_<doc name>_docs
- Name: Dynamically generated based on the
DOCS_URL
or--start-url
(e.g.,ask_reactrouter_docs
). - Functionality: Allows users to ask natural language questions about the content scraped from the specified website.
- Process:
- On startup (or if
force-reindex
is specified), the server scrapes the target website. - The scraped content is processed, chunked, and embedded using a sentence transformer model.
- These embeddings and content chunks are stored in a Qdrant collection specific to the website.
- When the tool is called with a query, the server embeds the query, searches Qdrant for relevant chunks, and returns the found content as the answer.
- On startup (or if
- Input: A natural language query (string).
- Output: Text containing the relevant content chunks found in the Qdrant index.
Installation
To use with Claude Desktop, add the server config:
On MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
npm i -g @kazuph/mcp-qdrant-docs
Using npx
The recommended way to run this server is using npx
within your MCP client configuration (e.g., Claude Desktop's claude_desktop_config.json
). This avoids the need for global installation.
Example claude_desktop_config.json
using npx
:
{
"mcpServers": {
"react-router-docs": { // A unique name for this server instance
"command": "npx",
"args": [
"@kazuph/mcp-qdrant-docs", // The command registered in package.json bin
// Optional: Add command-line arguments here if needed
// "--start-url", "https://some-default-url.com/",
// "--debug"
],
// Optional: Set environment variables for configuration
"env": {
"DOCS_URL": "https://reactrouter.com/",
"QDRANT_URL": "http://your-qdrant-instance:6333",
"COLLECTION_NAME": "react-router-docs", // Base name for the collection
"EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
// "DEBUG": "true" // Alternative way to enable debug logging
}
}
// You can add more server instances for different documentation sites here
}
}
Command-Line Options
When running the server directly (e.g., using npx mcp-qdrant-docs
or npm run dev --
), you can use the following command-line options. These options override corresponding environment variables if both are set.
--start-url <url>
or-s <url>
:- Required (if
DOCS_URL
env var is not set). - The starting URL of the website to scrape.
- Overrides the
DOCS_URL
environment variable.
- Required (if
--limit <number>
or-l <number>
:- Maximum number of pages to scrape.
- Default:
300
.
--match <pattern>
or-m <pattern>
:- URL path patterns (prefix match) to limit scraping. Can be specified multiple times.
- Example:
--match /docs/ --match /api/
- Default: Scrapes all pages under the
start-url
domain.
--force-reindex
:- Force re-scraping and re-indexing even if the Qdrant collection already exists.
- Default:
false
.
--collection-name <name>
or-c <name>
:- Base name for the Qdrant collection. The final collection name will be
<base_name>-<sanitized_hostname>
. - Overrides the
COLLECTION_NAME
environment variable. - Default:
docs-collection
.
- Base name for the Qdrant collection. The final collection name will be
--qdrant-url <url>
:- URL of the Qdrant instance.
- Overrides the
QDRANT_URL
environment variable. - Default:
http://localhost:6333
.
--embedding-model <model_name>
:- Name of the sentence transformer model to use for embeddings (from Hugging Face or local).
- Overrides the
EMBEDDING_MODEL
environment variable. - Default:
Xenova/all-MiniLM-L6-v2
.
--debug
:- Enable detailed debug logging.
- Overrides the
DEBUG
environment variable (if set totrue
). - Default:
false
.
--help
or-h
:- Show the help message listing all options.
Example using command-line options:
npx @kazuph/mcp-qdrant-docs --start-url https://example-docs.com/ --collection-name my-docs --limit 50 --debug
Configuration Priority:
The server uses the following priority for settings:
- Command-line arguments: (e.g.,
--start-url
,--collection-name
) - Highest priority. - Environment variables: (e.g.,
DOCS_URL
,COLLECTION_NAME
) - Used if command-line arguments are not provided. - Default values: (Defined within the code) - Lowest priority.
Example: Adding React Router Documentation
To add a server instance specifically for querying React Router documentation, add the following entry to your mcpServers
configuration (e.g., in claude_desktop_config.json
):
{
"mcpServers": {
// ... other servers ...
"react-router-docs": {
"command": "npx", // Or the direct path if not installed globally
"args": [
"@kazuph/mcp-qdrant-docs"
// No need to specify --start-url etc. if using env vars
],
"env": {
"DOCS_URL": "https://reactrouter.com/",
"QDRANT_URL": "http://your-qdrant-instance:6333", // Replace with your Qdrant URL
"COLLECTION_NAME": "react-router-docs", // Base name, will become 'react-router-docs-reactrouter_com'
"EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2" // Or your preferred model
// "DEBUG": "true" // Enable debug logs if needed
}
}
// ... other servers ...
}
}
Resulting Tool:
Once this server instance is running and connected to your MCP client, it will provide a tool named similar to ask_reactrouter_docs
(not ask_reactrouter_com_docs
).
- Tool Name:
ask_<doc name>_docs
(e.g.,ask_reactrouter_docs
) - Description: Ask a question about the content of the site specified by
DOCS_URL
(or--start-url
). - Input: A natural language query about the documentation.
The server will automatically scrape the site (if the collection doesn't exist or --force-reindex
is used), index the content into the specified Qdrant collection (react-router-docs-reactrouter_com
in this example), and then use the index to answer your queries via the provided tool (ask_reactrouter_docs
).