# Databricks MCP Server
This is a Model Context Protocol (MCP) server for executing SQL queries against Databricks using the Statement Execution API. It retrieves data by performing SQL requests via the Databricks API. When used in agent mode, it can iterate over a series of requests to perform complex tasks, and it works even better when coupled with Unity Catalog metadata.
## Features
- Execute SQL queries on Databricks
- List available schemas in a catalog
- List tables in a schema
- Describe table schemas
- List views in a catalog and schema
- Get SQL definitions of views
## Setup
### System Requirements
- Python 3.10+
- If you plan to install via `uv`, ensure it's installed
### Installation
1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/mcp-databricks-server.git
   cd mcp-databricks-server
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or if using `uv`:

   ```bash
   uv pip install -r requirements.txt
   ```
3. Set up your environment variables:

   **Option 1: Using a `.env` file (recommended)**

   Create a `.env` file with your Databricks credentials:

   ```
   DATABRICKS_HOST=your-databricks-instance.cloud.databricks.com
   DATABRICKS_TOKEN=your-databricks-access-token
   DATABRICKS_SQL_WAREHOUSE_ID=your-sql-warehouse-id
   ```

   **Option 2: Setting environment variables directly**

   ```bash
   export DATABRICKS_HOST="your-databricks-instance.cloud.databricks.com"
   export DATABRICKS_TOKEN="your-databricks-access-token"
   export DATABRICKS_SQL_WAREHOUSE_ID="your-sql-warehouse-id"
   ```
You can find your SQL warehouse ID in the Databricks UI under SQL Warehouses.
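For reference, here is a minimal sketch of how the server can pick these variables up at startup, assuming the `python-dotenv` pattern implied by the dependency list (the variable names match the ones above; the structure is illustrative):

```python
import os

from dotenv import load_dotenv

# Load variables from a .env file in the working directory, if present.
# Variables already set in the environment take precedence by default.
load_dotenv()

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
DATABRICKS_SQL_WAREHOUSE_ID = os.environ["DATABRICKS_SQL_WAREHOUSE_ID"]
```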
### Troubleshooting Setup Issues
#### Missing Dependencies

If you encounter a `ModuleNotFoundError` when starting the server, ensure all dependencies are installed:

```bash
pip install databricks-sdk httpx python-dotenv mcp
```

(`asyncio` is part of the Python standard library and does not need to be installed separately.)
#### Authentication Issues

If you encounter authentication errors, check the following (a quick credential check is sketched below):

- Verify your token has the correct permissions in Databricks
- Ensure the host URL is correct (it should include the full domain)
- Check that your SQL warehouse ID is valid and active
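To confirm that the host and token are valid at all, you can call the workspace directly with the `databricks-sdk` dependency. A minimal sketch, assuming the environment variables from the setup section are set (the SDK reads `DATABRICKS_HOST` and `DATABRICKS_TOKEN` automatically):

```python
from databricks.sdk import WorkspaceClient

# With no arguments, WorkspaceClient picks up DATABRICKS_HOST and
# DATABRICKS_TOKEN from the environment.
w = WorkspaceClient()

# A successful call proves the host URL and token are valid.
me = w.current_user.me()
print(f"Authenticated as: {me.user_name}")
```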
#### SQL Warehouse Not Running

If queries fail to execute:

- Ensure your SQL warehouse is running in the Databricks UI
- The warehouse may take a few minutes to start if it was idle

You can also check and start the warehouse programmatically, as sketched below.
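A sketch of such a check using the `databricks-sdk` dependency (the warehouse ID comes from the `DATABRICKS_SQL_WAREHOUSE_ID` variable configured earlier; treat the exact calls and enum import as assumptions to verify against your installed SDK version):

```python
import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import State

w = WorkspaceClient()
warehouse_id = os.environ["DATABRICKS_SQL_WAREHOUSE_ID"]

# Fetch the current state of the warehouse.
warehouse = w.warehouses.get(id=warehouse_id)
print(f"Warehouse state: {warehouse.state}")

# Start it if it is stopped; result() blocks until it is running.
if warehouse.state == State.STOPPED:
    w.warehouses.start(id=warehouse_id).result()
    print("Warehouse started")
```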
## Permissions Requirements
Before using this MCP server, ensure that:

1. **SQL Warehouse Permissions**: The user associated with the provided token must have appropriate permissions to access the specified SQL warehouse. You can configure warehouse permissions in the Databricks UI under SQL Warehouses > [Your Warehouse] > Permissions.

2. **Token Permissions**: The personal access token should carry the minimum permissions necessary for the required operations. It is strongly recommended to:
   - Create a dedicated token specifically for this application
   - Grant read-only permissions where possible to limit security risks
   - Avoid using tokens with workspace-wide admin privileges

3. **Data Access Permissions**: The user associated with the token must have appropriate permissions to access the catalogs, schemas, and tables that will be queried.
To set SQL warehouse permissions via the Databricks REST API, you can use:

- `GET /api/2.0/sql/permissions/warehouses/{warehouse_id}` to check current permissions
- `PATCH /api/2.0/sql/permissions/warehouses/{warehouse_id}` to update permissions
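For example, checking the current permissions with the `httpx` dependency (a sketch only; the endpoint path is the one listed above, and the response shape should be verified against the Databricks REST API documentation):

```python
import os

import httpx

host = os.environ["DATABRICKS_HOST"].removeprefix("https://")
token = os.environ["DATABRICKS_TOKEN"]
warehouse_id = os.environ["DATABRICKS_SQL_WAREHOUSE_ID"]

# GET the current access-control settings for the warehouse.
response = httpx.get(
    f"https://{host}/api/2.0/sql/permissions/warehouses/{warehouse_id}",
    headers={"Authorization": f"Bearer {token}"},
)
response.raise_for_status()
print(response.json())
```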
For security best practices, consider regularly rotating your access tokens and auditing query history to monitor usage.
## Running the Server
### Standalone Mode
To run the server in standalone mode:

```bash
python main.py
```
This will start the MCP server using stdio transport, which can be used with Agent Composer or other MCP clients.
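For orientation, a minimal sketch of what a stdio-based MCP entry point looks like with the `mcp` library (illustrative only; see `main.py` for the actual server and its registered tools):

```python
from mcp.server.fastmcp import FastMCP

# Create the server; tools are registered on this instance.
mcp = FastMCP("databricks")

@mcp.tool()
def ping() -> str:
    """Trivial example tool; the real server exposes SQL tools instead."""
    return "pong"

if __name__ == "__main__":
    # stdio transport is what MCP clients such as Agent Composer or
    # Cursor use when they spawn the server as a subprocess.
    mcp.run(transport="stdio")
```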
### Using with Cursor
To use this MCP server with Cursor, you need to configure it in your Cursor settings:

1. Create a `.cursor` directory in your home directory if it doesn't already exist
2. Create or edit the `mcp.json` file in that directory:

   ```bash
   mkdir -p ~/.cursor
   touch ~/.cursor/mcp.json
   ```

3. Add the following configuration to the `mcp.json` file, replacing the directory path with the actual path to where you've installed this server:
   ```json
   {
     "mcpServers": {
       "databricks": {
         "command": "uv",
         "args": [
           "--directory",
           "/path/to/your/mcp-databricks-server",
           "run",
           "main.py"
         ]
       }
     }
   }
   ```
   If you're not using `uv`, you can use `python` instead:
   ```json
   {
     "mcpServers": {
       "databricks": {
         "command": "python",
         "args": [
           "/path/to/your/mcp-databricks-server/main.py"
         ]
       }
     }
   }
   ```
4. Restart Cursor to apply the changes
Now you can use the Databricks MCP server directly within Cursor's AI assistant.
## Available Tools
The server provides the following tools:
- `execute_sql_query(sql: str) -> str`: Execute a SQL query and return the results
- `list_schemas(catalog: str) -> str`: List all available schemas in a specific catalog
- `list_tables(schema: str) -> str`: List all tables in a specific schema
- `describe_table(table_name: str) -> str`: Describe a table's schema
- `list_views(catalog_name: str, schema_name: str) -> str`: List all views in a specific catalog and schema
- `get_view_definition(view_name: str) -> str`: Get the SQL DDL definition for a specific view
## Example Usage
In Agent Composer or other MCP clients, you can use these tools like:

```python
execute_sql_query("SELECT * FROM my_schema.my_table LIMIT 10")
list_schemas("my_catalog")
list_tables("my_catalog.my_schema")
describe_table("my_catalog.my_schema.my_table")
list_views("my_catalog", "my_schema")
get_view_definition("my_catalog.my_schema.my_view")
```
## Use Case: Migrating Views to dbt
This server is particularly useful for migrating views from Databricks to dbt:

1. Use `list_views` to discover all views in your catalog and schema
2. For each view, use `get_view_definition` to retrieve the SQL definition
3. Transform the SQL as needed and save it into your dbt project structure

A sketch of this workflow is shown below.
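As an illustration, an MCP client script could drive steps 2 and 3 programmatically. A sketch using the `mcp` client API over stdio (the view name and target directory are placeholders, and any SQL transformation is omitted):

```python
import asyncio
from pathlib import Path

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def export_view(view_name: str, dbt_models_dir: Path) -> None:
    """Fetch one view's DDL through the MCP server and save it as a dbt model."""
    params = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # get_view_definition returns the view's SQL DDL as text content.
            result = await session.call_tool(
                "get_view_definition", {"view_name": view_name}
            )
            ddl = "".join(
                block.text for block in result.content if block.type == "text"
            )
            # Save into the dbt project; transform the SQL here as needed.
            model_file = dbt_models_dir / f"{view_name.split('.')[-1]}.sql"
            model_file.write_text(ddl)

if __name__ == "__main__":
    asyncio.run(export_view("my_catalog.my_schema.my_view", Path("models/staging")))
```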
## Handling Long-Running Queries
The server is designed to handle long-running queries by polling the Databricks API until the query completes or times out. The default timeout is 10 minutes (60 retries at 10-second intervals), which can be adjusted in the `dbapi.py` file if needed. The pattern is sketched below.
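For reference, a minimal sketch of this polling pattern against the Statement Execution API; the constants mirror the defaults described above, but the helper name and structure are illustrative rather than the actual `dbapi.py` code:

```python
import os
import time

import httpx

MAX_RETRIES = 60            # 60 retries...
POLL_INTERVAL_SECONDS = 10  # ...at 10-second intervals = 10 minutes total

def wait_for_statement(statement_id: str) -> dict:
    """Poll a submitted statement until it finishes or the timeout is exceeded."""
    host = os.environ["DATABRICKS_HOST"].removeprefix("https://")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    url = f"https://{host}/api/2.0/sql/statements/{statement_id}"

    for _ in range(MAX_RETRIES):
        response = httpx.get(url, headers=headers)
        response.raise_for_status()
        result = response.json()
        state = result["status"]["state"]
        if state == "SUCCEEDED":
            return result
        if state in ("FAILED", "CANCELED", "CLOSED"):
            raise RuntimeError(f"Statement ended in state {state}")
        time.sleep(POLL_INTERVAL_SECONDS)  # PENDING or RUNNING: keep waiting
    raise TimeoutError("Statement did not complete within the timeout")
```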
## Dependencies
- `databricks-sdk`: For interacting with the Databricks Workspace API
- `httpx`: For making HTTP requests to the Databricks API
- `python-dotenv`: For loading environment variables from a `.env` file
- `mcp`: The Model Context Protocol library
- `asyncio`: For asynchronous operations (part of the Python standard library)
## Development
### Adding New Tools
If you need to add new tools:

1. Add new functions to `dbapi.py` to interact with the Databricks API
2. Register the new tools in `main.py` by adding them to the `mcp_tools` list
3. Update the documentation in this README

A sketch of what a new tool function might look like is shown below.
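For instance, a hypothetical `dbapi.py`-style function for listing catalogs (the name, signature, and endpoint choice are illustrative; follow whatever pattern the existing functions in `dbapi.py` use, and remember to register the tool in `main.py`):

```python
import os

import httpx

async def list_catalogs() -> str:
    """Hypothetical new tool: list all catalogs visible to the token.

    Uses the Unity Catalog REST endpoint GET /api/2.1/unity-catalog/catalogs.
    """
    host = os.environ["DATABRICKS_HOST"].removeprefix("https://")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://{host}/api/2.1/unity-catalog/catalogs", headers=headers
        )
        response.raise_for_status()
        catalogs = response.json().get("catalogs", [])
        return "\n".join(c["name"] for c in catalogs)
```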
### Testing
To test your changes:

1. Make sure your `.env` file is properly configured
2. Run the server:

   ```bash
   python main.py
   ```

3. Try using the tools with an MCP client (a smoke-test sketch follows)
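A minimal smoke test using the `mcp` client library over stdio (a sketch; the tool name and argument come from the Available Tools list above):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the server as a subprocess and talk to it over stdio.
    params = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Confirm the expected tools are registered.
            tools = await session.list_tools()
            print("Tools:", [tool.name for tool in tools.tools])
            # Exercise one tool end to end.
            result = await session.call_tool("list_schemas", {"catalog": "my_catalog"})
            print(result.content)

if __name__ == "__main__":
    asyncio.run(main())
```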