gwas-catalog-mcp

This project is an MCP server offering a programmatic interface to the GWAS Catalog REST API, enabling access to genetic study data for research purposes. It handles large result sets with in-memory and file-based storage options. Its data retrieval endpoints are under active development.

GWAS Catalog MCP Server

Overview

This MCP server provides a programmatic interface to the GWAS Catalog REST API, enabling access to GWAS study, variant, trait, and association data. The server handles large result sets automatically by providing both in-memory results and file-based storage options.

Status

🚧 Under Active Development 🚧

This project is currently under active development. Features and APIs may change without notice.

Dependencies

  • uv
  • mcp[cli]
  • fastmcp
  • requests

Directory Structure

.
├── server.py             # Main FastMCP server entrypoint
├── utils.py              # Utility functions
├── pyproject.toml        # Project metadata and dependencies
├── README.md             # Usage and documentation
├── tests/                # Test suite and test data
│   ├── run_tests.py
│   ├── input/
│   └── output/
│       ├── success/
│       └── error/
└── ...

Setup and Running

Install dependencies

uv sync

Activate the virtual environment

. .venv/bin/activate

Run the MCP server

uv run server.py

Run tests

python tests/run_tests.py

MCP Tool Specification

Tool name

  • GWAS_catalog

Common Parameters

Most tools support the following common parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_items_in_memory | int | 5000 | Maximum number of items to return in memory |
| force_to_file | bool | False | Force writing results to file regardless of size |
| output_dir | str | "/tmp" | Directory for file output when results exceed limit |
| force_no_file | bool | False | Never write results to file |
| remove_links | bool | True | Remove '_links' fields from API responses |
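How these parameters interact is not spelled out above, so the following is only a minimal sketch of one plausible decision rule, not the server's actual code. The helper name `choose_storage` is hypothetical, and giving `force_no_file` precedence over `force_to_file` is an assumption.

```python
def choose_storage(n_items, max_items_in_memory=5000,
                   force_to_file=False, force_no_file=False):
    """Return (write_file, n_in_memory) for a result set of n_items.

    Hypothetical helper: the precedence of force_no_file over
    force_to_file is an assumption, not documented behaviour.
    """
    # Never hold more than the configured limit in the response.
    n_in_memory = min(n_items, max_items_in_memory)
    if force_no_file:
        return False, n_in_memory
    # Write a file when asked explicitly or when the limit is exceeded.
    write_file = force_to_file or n_items > max_items_in_memory
    return write_file, n_in_memory


print(choose_storage(120))                      # small result set
print(choose_storage(12000))                    # exceeds the limit
print(choose_storage(120, force_to_file=True))  # file requested explicitly
```

If both flags are set, this sketch writes no file; the real server may resolve that conflict differently.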

Tool Endpoints and Parameters

Get study

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| studyId | str | Yes | GWAS Catalog study identifier | "GCST000001" |
| remove_links | bool | No | Remove '_links' fields (default: True) | |

Get association

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| associationId | str | Yes | GWAS Catalog association identifier | "123456" |
| remove_links | bool | No | Remove '_links' fields (default: True) | |

Get variant

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| variantId | str | Yes | Variant identifier (e.g., rsID) | "rs123" |
| remove_links | bool | No | Remove '_links' fields (default: True) | |

Get trait

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| efoId | str | Yes | EFO trait identifier | "EFO_0000305" |
| remove_links | bool | No | Remove '_links' fields (default: True) | |

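The single-item lookups above all accept remove_links. As a rough illustration of what stripping '_links' could look like, here is a small sketch. The function name `strip_links` is hypothetical, and removing the key recursively at every nesting level is an assumption about the server's behaviour.

```python
def strip_links(obj):
    """Recursively drop every '_links' key from nested dicts and lists.

    Returns a new structure; the input is left unmodified.
    """
    if isinstance(obj, dict):
        return {k: strip_links(v) for k, v in obj.items() if k != "_links"}
    if isinstance(obj, list):
        return [strip_links(v) for v in obj]
    return obj


# Made-up payload for illustration only; not real API output.
sample = {
    "accessionId": "GCST000001",
    "_links": {"self": {"href": "https://example.org/placeholder"}},
    "nested": [{"value": 1, "_links": {}}],
}
print(strip_links(sample))
```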
Search variants in region

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| chromosome | str | Yes | Chromosome (e.g., "1") | "1" |
| start | int | Yes | Start position (GRCh38/hg38) | 1000000 |
| end | int | Yes | End position (GRCh38/hg38) | 2000000 |
| efo_id | str | No | EFO trait identifier | "EFO_0008531" |
| ...common | | | See common parameters above | |

Get variants from EFO IDs

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| efo_ids | list | Yes | List of EFO trait identifiers | ["EFO_0000305", "EFO_0000310"] |
| ...common | | | See common parameters above | |

Trait variant ranking

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| efo_id | str | Yes | EFO trait identifier | "EFO_0008531" |
| top_n | int | No | Number of top records to return (default: 10) | 10 |
| ...common | | | See common parameters above | |

Get study associations

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| studyId | str | Yes | GWAS Catalog study identifier | "GCST000001" |
| ...common | | | See common parameters above | |

Get trait studies

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| efoId | str | Yes | EFO trait identifier | "EFO_0000305" |
| ...common | | | See common parameters above | |

Get trait associations

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| efoId | str | Yes | EFO trait identifier | "EFO_0000305" |
| ...common | | | See common parameters above | |

Get associations from variant (uses GWAS Catalog REST API)

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| variantId | str | Yes | Variant identifier | "rs112735431" |
| ...common | | | See common parameters above | |

Note: This method retrieves all associations for a variant and sets an is_gwas_significant flag on each one, indicating whether its p-value meets the genome-wide significance threshold (p ≤ 5e-8). The MCP server returns only the associations for which is_gwas_significant is True.
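The flag-then-filter behaviour described in this note can be sketched as follows. This is an illustration under assumptions, not the server's code: the helper name `flag_and_filter` and the record field name `pvalue` are hypothetical, and the sample records are made up.

```python
GWAS_SIGNIFICANCE_THRESHOLD = 5e-8  # threshold quoted in the note above


def flag_and_filter(associations, p_key="pvalue"):
    """Add is_gwas_significant to each record, then keep significant ones.

    The field name 'pvalue' is an assumption about the record layout.
    """
    flagged = [
        {**a, "is_gwas_significant": a[p_key] <= GWAS_SIGNIFICANCE_THRESHOLD}
        for a in associations
    ]
    return [a for a in flagged if a["is_gwas_significant"]]


# Made-up records for illustration only.
records = [
    {"id": "a1", "pvalue": 3e-9},
    {"id": "a2", "pvalue": 5e-8},  # exactly at the threshold, so it is kept
    {"id": "a3", "pvalue": 1e-4},
]
kept = flag_and_filter(records)
print(len(records), len(kept))
```

In terms of the output format, `len(records)` would correspond to total_items and `len(kept)` to total_items_aft_process.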

Get region-trait associations (uses GWAS Summary Statistics API)

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| chromosome | str | Yes | Chromosome (e.g., "1") | "1" |
| start | int | Yes | Start position (base-pair) | 1000000 |
| end | int | Yes | End position (base-pair) | 2000000 |
| efo_id | str | Yes | EFO trait identifier | "EFO_0008531" |
| ...common | | | See common parameters above | |

Note: Endpoints marked as "uses GWAS Summary Statistics API" access https://www.ebi.ac.uk/gwas/summary-statistics/api instead of the main REST API.

Output Format

All API responses follow a consistent structure:

{
  "request_url": "https://www.ebi.ac.uk/gwas/rest/api/...",
  "items": [...],  // List of results, limited by max_items_in_memory
  "total_items_aft_process": 123,  // Total number of results after processing
  "is_complete": true,  // Whether all results are included in items
  "metadata": {
    "subset_size": 100,  // Number of items in the current response (after using max_items_in_memory parameter)
    "max_items_in_memory": 5000,  // Current memory threshold
    "total_items": 150,  // Total number of items before processing
    "significant_items": 80  // Number of genome-wide significant items (if applicable)
  }
}

Large Result Sets

When results exceed max_items_in_memory:

  1. A subset of results is returned in the items field
  2. is_complete is set to false
  3. The complete dataset is automatically saved to a file
  4. The response metadata includes an output_file field with the file path

Example large result response:

{
  "request_url": "...",
  "items": [...],  // First max_items_in_memory results
  "total_items_aft_process": 10000,
  "is_complete": false,
  "metadata": {
    "subset_size": 5000,
    "max_items_in_memory": 5000,
    "total_items": 12000,  // Original number of items
    "significant_items": 8000,  // Number of genome-wide significant items
    "output_file": "/tmp/large_result_abc123.json"
  }
}
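For readers who want to see how such a response could be assembled, here is a minimal server-side sketch. It is an assumption-laden illustration rather than the project's implementation: the function name `package_results` and the file-naming scheme are hypothetical, and the `total_items` and `significant_items` metadata fields are left out for brevity.

```python
import json
import os
import tempfile


def package_results(request_url, items, max_items_in_memory=5000,
                    output_dir="/tmp"):
    """Build a response dict; spill the full list to a JSON file if too large."""
    subset = items[:max_items_in_memory]
    is_complete = len(items) <= max_items_in_memory
    response = {
        "request_url": request_url,
        "items": subset,
        "total_items_aft_process": len(items),
        "is_complete": is_complete,
        "metadata": {
            "subset_size": len(subset),
            "max_items_in_memory": max_items_in_memory,
        },
    }
    if not is_complete:
        # File naming is an assumption; the server's scheme may differ.
        fd, path = tempfile.mkstemp(prefix="large_result_", suffix=".json",
                                    dir=output_dir)
        with os.fdopen(fd, "w") as fh:
            json.dump(items, fh)
        response["metadata"]["output_file"] = path
    return response


# Twelve dummy items with a limit of five forces the file path.
demo = package_results("https://example.org/query", list(range(12)),
                       max_items_in_memory=5,
                       output_dir=tempfile.gettempdir())
print(demo["is_complete"], demo["metadata"]["subset_size"])
os.remove(demo["metadata"]["output_file"])  # clean up the demo file
```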

IMPORTANT:

  • Always check the is_complete and output_file fields. If is_complete is false, only a subset of results is in items and the full result is saved to the file specified by output_file.
  • For endpoints that process p-values (e.g., associations), total_items represents the original count, while total_items_aft_process represents the count after filtering.
  • Study-related endpoints do not include p-value related metadata (significant_items).
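On the client side, the check recommended above might look like the sketch below. The helper name `load_all_items` is hypothetical, and it assumes the output file holds a JSON list of items, which this README does not confirm. It looks for output_file both at the top level and under metadata, since the example response places it in metadata.

```python
import json
import os
import tempfile


def load_all_items(response):
    """Return the full result list, reading output_file when items is partial."""
    if response.get("is_complete", True):
        return response["items"]
    path = (response.get("output_file")
            or response.get("metadata", {}).get("output_file"))
    if path is None:
        # No file was written (for example with force_no_file),
        # so only the in-memory subset is available.
        return response["items"]
    with open(path) as fh:
        return json.load(fh)


# Demo with a made-up partial response backed by a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump([1, 2, 3, 4], tmp)
partial = {"items": [1, 2], "is_complete": False,
           "metadata": {"output_file": tmp.name}}
print(load_all_items(partial))
os.remove(tmp.name)
```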

Special Output Notes

  • get_trait_associations may return a list of association IDs or, if the response format is unexpected, the raw association data structure.
  • Some endpoints (notably those using the summary-statistics API) may return a single object in items if only one result is found.
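Because items may be either a list or a single object, client code can normalise it before iterating. The helper below is a hypothetical convenience, not part of the server.

```python
def as_item_list(items):
    """Return items as a list, wrapping a lone object and mapping None to []."""
    if items is None:
        return []
    if isinstance(items, list):
        return items
    return [items]


# Made-up values for illustration only.
print(as_item_list({"variant": "rs123"}))
print(as_item_list([{"variant": "rs123"}, {"variant": "rs456"}]))
```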

Credits

This tool relies on the GWAS Catalog REST API and GWAS Summary Statistics API.

Please cite and credit the GWAS Catalog and each study when using this tool in your work.

License

This MCP server itself is licensed under the Apache License 2.0; see the license file in the repository for details.

This project uses the GWAS Catalog REST API and data provided by EMBL-EBI. Please ensure you cite the GWAS Catalog and the original studies when using this tool or its outputs. See the GWAS Catalog Terms of Use for details.

Acknowledgements