KunihiroS_kv-extractor-mcp-server

The Flexible Key-Value Extracting MCP Server extracts key-value pairs from unstructured text using language models and pydantic-ai. It features autonomous key discovery, robust handling of complex inputs, and type-safe output in multiple formats. The server supports multilingual input and ensures consistent data extraction.

Flexible Key-Value Extracting MCP Server

Version: 0.3.1

This MCP server extracts key-value pairs from arbitrary, noisy, or unstructured text using an LLM (GPT-4.1-mini) and pydantic-ai. It ensures type safety and supports multiple output formats (JSON, YAML, TOML). For challenging real-world text, it offers the following:

  • Automatic Key Discovery: Autonomously identifies and extracts relevant key-value pairs from unstructured text without pre-defined keys.
  • Superior Robustness for Complex Inputs: Designed to handle arbitrary, noisy, or unstructured text with a multi-step pipeline.
  • Advanced Multi-Lingual Preprocessing: Utilizes spaCy for Named Entity Recognition in Japanese, English, and Chinese to enhance extraction accuracy.
  • Iterative Refinement and Typing: Employs LLM-based type annotation and evaluation, ensuring accurate data types.
  • Guaranteed Type Safety and Schema Adherence: Pydantic ensures type-safe and validated output.
  • Consistent and Predictable Output: Always returns a well-formed response.
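
The typing step in the list above can be pictured with a small sketch. This is not the server's code: the server relies on LLM-based type annotation and Pydantic validation, while the stand-in below uses only the Python standard library. The names `coerce` and `type_pairs`, and the type labels "int", "float", "bool", and "str", are hypothetical.

```python
def coerce(value: str, type_name: str):
    """Convert a raw extracted string to the annotated type.

    Falls back to the original string when conversion fails, so the
    caller always gets a usable value.
    """
    text = value.strip()
    try:
        if type_name == "int":
            return int(text)
        if type_name == "float":
            return float(text)
        if type_name == "bool":
            lowered = text.lower()
            if lowered in ("true", "yes"):
                return True
            if lowered in ("false", "no"):
                return False
            raise ValueError(text)
    except ValueError:
        pass  # conversion failed: keep the raw string
    return value


def type_pairs(raw: dict, annotations: dict) -> dict:
    """Apply per-key type annotations; keys without one stay strings."""
    return {key: coerce(val, annotations.get(key, "str")) for key, val in raw.items()}


raw = {"age": "30", "price": "12.5", "member": "yes", "name": "Taro", "qty": "many"}
annotations = {"age": "int", "price": "float", "member": "bool", "qty": "int"}
typed = type_pairs(raw, annotations)
```

Here `qty` keeps its string value because "many" cannot be read as an integer. A real pipeline might instead send such a mismatch back for re-evaluation.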

Release Notes

v0.3.1

  • Update: Improve type evaluation prompt for robust correction.

v0.2.0

  • Fix: Language codes for zh-cn / zh-tw.

v0.1.0

  • Initial release

Tools

  • /extract_json: Extracts key-value pairs in JSON format.
  • /extract_yaml: Extracts key-value pairs in YAML format.
  • /extract_toml: Extracts key-value pairs in TOML format.
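
To show how one extraction result might look in each of the three formats, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not the server's serializer: `render` and its helpers are hypothetical names, and the code handles only flat mappings of strings, booleans, and finite numbers.

```python
import json
import re

# Keys made of these characters can be written unquoted in TOML.
_BARE_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def _scalar(value) -> str:
    # bool is checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # JSON string quoting is used for both YAML and TOML in this sketch.
    return json.dumps(str(value), ensure_ascii=False)


def _key(key: str) -> str:
    return key if _BARE_KEY.match(key) else json.dumps(key, ensure_ascii=False)


def render(pairs: dict, fmt: str) -> str:
    """Render a flat mapping as 'json', 'yaml', or 'toml' text."""
    if fmt == "json":
        return json.dumps(pairs, ensure_ascii=False, indent=2)
    separator = {"yaml": ": ", "toml": " = "}[fmt]
    return "\n".join(f"{_key(k)}{separator}{_scalar(v)}" for k, v in pairs.items())


pairs = {"name": "Taro", "age": 30, "active": True}
yaml_text = render(pairs, "yaml")
toml_text = render(pairs, "toml")
json_text = render(pairs, "json")
```

For flat data the YAML and TOML renderings differ only in the separator. Nested values, dates, and special floats would need a real YAML or TOML library.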

Note:

  • Supported languages: Japanese, English, and Chinese.
  • Extraction relies on pydantic-ai and LLMs. Perfect extraction is not guaranteed.

Features

  • Flexible extraction: Accepts arbitrary, noisy, or unstructured text.
  • JP / EN / ZH-CN / ZH-TW full support: Uses spaCy NER for supported languages.
  • Type-safe output: Validated with Pydantic.
  • Multiple formats: JSON, YAML, or TOML output.
  • Robust error handling.
  • High accuracy: Combines GPT-4.1-mini extraction with Pydantic validation.
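
One way to picture robust error handling with a consistently shaped response is a wrapper that never raises. The sketch below is an assumption for illustration only: the response keys `success`, `result`, and `error`, the function `safe_call`, and the toy extractor are all hypothetical and are not taken from the server.

```python
def toy_extractor(text: str) -> dict:
    """Toy stand-in for the real pipeline: parses 'key: value' lines."""
    if not text.strip():
        raise ValueError("empty input")
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            pairs[key.strip()] = value.strip()
    return pairs


def safe_call(extractor, text: str) -> dict:
    """Run extractor(text) and always return a dict with the same three keys."""
    try:
        return {"success": True, "result": extractor(text), "error": None}
    except Exception as exc:  # deliberately broad: callers never see a raised error
        return {"success": False, "result": None, "error": f"{type(exc).__name__}: {exc}"}


ok = safe_call(toy_extractor, "name: Taro\nage: 30")
failed = safe_call(toy_extractor, "   ")
```

Because both outcomes share one shape, a client can branch on `success` without wrapping every call in its own exception handling.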

Tested Scenarios

Tested with a range of inputs, including simple key-value pairs and unstructured text.