data-dictionary-mcp

Data Dictionary MCP is a Model Context Protocol server that transforms database tables into Wikipedia-style data dictionaries. It coordinates AI agents to analyze, describe, and verify database structures, and accepts input in formats such as JSON, CSV, and plain text.

Data Dictionary MCP

A Model Context Protocol (MCP) server that coordinates AI agents to transform database tables into Wikipedia-style data dictionaries.

Overview

The Data Dictionary MCP project automates the conversion of database tables, supplied as JSON, CSV, or plain-text files, into comprehensive, human-readable data dictionaries. It uses the Model Context Protocol (MCP) to coordinate AI agents that analyze, describe, and verify database structures.

Features

  • Multi-Format Support: Process JSON, CSV, and Plain Text files (with more formats planned)
  • AI-Powered Analysis: Generate field descriptions and identify relationships
  • MCP Integration: Coordinate AI agents using the Model Context Protocol
  • Schema Extraction: Extract database schemas from various formats into a unified representation
  • Wikipedia-Style Output: Present data dictionaries in a familiar, accessible format
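
To make the "unified representation" idea concrete, here is a minimal sketch of schema extraction from CSV and JSON input using only the standard library. The `FieldSchema` class, the function names, and the type-inference heuristic are assumptions made for illustration; they are not taken from the project's source.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class FieldSchema:
    """Hypothetical unified representation of a single field."""
    name: str
    inferred_type: str


def infer_type(values):
    """Guess a coarse type from sample values (illustrative heuristic only)."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "unknown"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # Compare on text so CSV strings and JSON numbers are treated alike.
    texts = [str(v) for v in present]
    for type_name, caster in (("integer", int), ("number", float)):
        try:
            for text in texts:
                caster(text)
        except ValueError:
            continue
        return type_name
    return "string"


def extract_csv_schema(text):
    """Read CSV text with a header row and return one FieldSchema per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    names = reader.fieldnames or []
    return [FieldSchema(n, infer_type([row[n] for row in rows])) for n in names]


def extract_json_schema(text):
    """Read a JSON array of objects and return one FieldSchema per key."""
    records = json.loads(text)
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)  # keep first-seen key order
    return [FieldSchema(n, infer_type([r.get(n) for r in records])) for n in names]


if __name__ == "__main__":
    print(extract_csv_schema("id,name,price\n1,Widget,9.99\n2,Gadget,12\n"))
    print(extract_json_schema('[{"id": 1, "active": true}]'))
```

Both extractors return the same list-of-`FieldSchema` shape, so later stages would not need to know which input format a table came from.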

Project Status

This project is in active development. See the Project Roadmap for details.

Getting Started

Prerequisites

  • Python 3.9+
  • Git
  • pip or poetry for dependency management

Installation

  1. Clone the repository:

    git clone https://github.com/jonahkeegan/data-dictionary-mcp.git
    cd data-dictionary-mcp
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Run the application:

    python src/main.py
    

Project Structure

data-dictionary-mcp/
├── docs/                  # Documentation
├── src/                   # Source code
│   ├── mcp/               # MCP server components
│   ├── analyzers/         # Format analyzers
│   ├── agents/            # Agent coordination
│   └── dictionary/        # Dictionary generation
├── tests/                 # Test suite
├── memory-bank/           # Cline memory bank
├── .gitignore
├── .clinerules            # Cline rules
├── README.md
└── requirements.txt

Project Roadmap

Milestone 1: MCP Server Foundation and Format Analyzers

  • Implement MCP server with basic tool definitions
  • Develop format analyzers for JSON, CSV, and Plain Text
  • Create schema extraction system
  • Implement unit tests for core components
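
As a rough illustration of what a "basic tool definition" involves, the sketch below registers one tool with MCP-style metadata (`name`, `description`, and a JSON Schema under `inputSchema`) and dispatches calls to it. It uses only the standard library and leaves out the JSON-RPC transport entirely; a real server would more likely build on an MCP SDK. The tool name `list_csv_columns` and the helper functions are hypothetical.

```python
import csv
import io
import json

# Registry mapping tool name -> definition metadata and handler function.
TOOLS = {}


def tool(name, description, input_schema):
    """Register a function as a tool together with its MCP-style metadata."""
    def decorator(func):
        TOOLS[name] = {
            "definition": {
                "name": name,
                "description": description,
                "inputSchema": input_schema,
            },
            "handler": func,
        }
        return func
    return decorator


@tool(
    name="list_csv_columns",  # hypothetical tool name
    description="Return the column names found in CSV text.",
    input_schema={
        "type": "object",
        "properties": {"csv_text": {"type": "string"}},
        "required": ["csv_text"],
    },
)
def list_csv_columns(csv_text):
    """Return the header row of the CSV text, or an empty list if there is none."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def list_tools():
    """Return the definitions a client would see when listing tools."""
    return [entry["definition"] for entry in TOOLS.values()]


def call_tool(name, arguments):
    """Run a registered tool and wrap its result as text content."""
    entry = TOOLS.get(name)
    if entry is None:
        return {
            "content": [{"type": "text", "text": f"Unknown tool: {name}"}],
            "isError": True,
        }
    result = entry["handler"](**arguments)
    return {
        "content": [{"type": "text", "text": json.dumps(result)}],
        "isError": False,
    }


if __name__ == "__main__":
    print(list_tools())
    print(call_tool("list_csv_columns", {"csv_text": "id,name\n1,a\n"}))
```

Keeping the metadata next to the handler means the tool listing and the dispatch table cannot drift apart, which also makes the registry easy to unit test.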

Milestone 2: AI Agent Coordination and Field Description

  • Implement agent coordination system
  • Develop field description generation
  • Create task distribution and result aggregation
  • Add integration tests
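
Task distribution and result aggregation could look something like the following sketch, which fans field names out to a thread pool and merges the answers into one mapping. The `describe_field` function is a stand-in that returns placeholder text; in the real system it would presumably call an AI agent. All names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def describe_field(field_name):
    """Stand-in for an AI agent call; returns a placeholder description."""
    readable = field_name.replace("_", " ")
    return {
        "field": field_name,
        "description": f"Placeholder description of {readable}.",
    }


def describe_fields(field_names, max_workers=4):
    """Distribute one task per field, then aggregate results into a dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input names.
        results = list(pool.map(describe_field, field_names))
    return {item["field"]: item["description"] for item in results}


if __name__ == "__main__":
    for name, text in describe_fields(["user_id", "created_at"]).items():
        print(f"{name}: {text}")
```

Threads suit this shape of work because each task would mostly wait on a network response; the aggregation step stays a plain dictionary build regardless of how the tasks are executed.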

Milestone 3: Content Verification and Publishing

  • Implement content validation
  • Develop Wikipedia-style formatting
  • Create export capabilities
  • Add end-to-end tests
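
The validation and formatting steps might be sketched as below. This assumes that "Wikipedia-style" output means MediaWiki table markup, which the roadmap does not state, and the entry keys (`name`, `type`, `description`) are hypothetical.

```python
REQUIRED_KEYS = ("name", "type", "description")


def validate_entries(entries):
    """Return a list of problems; an empty list means every entry passed."""
    problems = []
    for index, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if value is None or not str(value).strip():
                problems.append(f"entry {index}: missing {key}")
    return problems


def to_wikitable(table_name, entries):
    """Render entries as a MediaWiki section containing a wikitable."""
    lines = [
        f"== {table_name} ==",
        '{| class="wikitable"',
        "! Field !! Type !! Description",
    ]
    for entry in entries:
        lines.append("|-")  # row separator
        lines.append(
            f"| {entry['name']} || {entry['type']} || {entry['description']}"
        )
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    entries = [
        {"name": "id", "type": "integer", "description": "Primary key."},
    ]
    if not validate_entries(entries):
        print(to_wikitable("users", entries))
```

The sketch does not escape pipe characters inside descriptions, so a value containing `|` would break the table; a fuller implementation would need to handle that before export.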

Milestone 4: User Interface and Deployment

  • Develop web interface
  • Implement search capabilities
  • Add user feedback system
  • Create deployment infrastructure

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source; see the repository for the license terms.