KunihiroS_kv-extractor-mcp-server
The Flexible Key-Value Extracting MCP Server extracts key-value pairs from unstructured text using language models and pydantic-ai. It features autonomous key discovery, robust handling of complex inputs, and type-safe output in multiple formats. The server supports multilingual input and ensures consistent data extraction.
Flexible Key-Value Extracting MCP Server
Version: 0.3.1
This MCP server extracts key-value pairs from arbitrary, noisy, or unstructured text using an LLM (GPT-4.1-mini) and pydantic-ai. It ensures type safety and supports multiple output formats (JSON, YAML, TOML). It offers the following advantages when extracting key-value pairs from challenging real-world text:
- Automatic Key Discovery: Autonomously identifies and extracts relevant key-value pairs from unstructured text without pre-defined keys.
- Superior Robustness for Complex Inputs: Designed to handle arbitrary, noisy, or unstructured text with a multi-step pipeline.
- Advanced Multi-Lingual Preprocessing: Utilizes spaCy for Named Entity Recognition in Japanese, English, and Chinese to enhance extraction accuracy.
- Iterative Refinement and Typing: Employs LLM-based type annotation and evaluation, ensuring accurate data types.
- Guaranteed Type Safety and Schema Adherence: Pydantic ensures type-safe and validated output.
- Consistent and Predictable Output: Always returns a well-formed response.
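The typing and validation steps listed above can be pictured with a small sketch. It uses only the Python standard library as a stand-in for the server's LLM-based type annotation and Pydantic validation, so it is not the server's actual code; the `TYPE_CASTERS` table, the `coerce_pairs` helper, the type labels, and the sample data are all hypothetical.

```python
from typing import Any, Callable, Dict


def _to_bool(raw: str) -> bool:
    """Parse a small set of textual booleans; reject anything else."""
    lowered = raw.strip().lower()
    if lowered in ("true", "yes"):
        return True
    if lowered in ("false", "no"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


# Hypothetical type labels, such as an annotation step might emit per key.
TYPE_CASTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "bool": _to_bool,
    "str": str,
}


def coerce_pairs(raw_pairs: Dict[str, str], annotations: Dict[str, str]) -> Dict[str, Any]:
    """Cast each raw string value to its annotated type.

    Values that fail to cast, and keys without a known annotation, fall back
    to the original string, so the result has an entry for every input key.
    """
    typed: Dict[str, Any] = {}
    for key, raw in raw_pairs.items():
        caster = TYPE_CASTERS.get(annotations.get(key, "str"), str)
        try:
            typed[key] = caster(raw)
        except ValueError:
            typed[key] = raw  # keep the raw text rather than drop the key
    return typed


raw = {"name": "Tanaka", "age": "34", "height_m": "1.72", "member": "yes", "visits": "many"}
labels = {"name": "str", "age": "int", "height_m": "float", "member": "bool", "visits": "int"}
typed = coerce_pairs(raw, labels)
```

Here `visits` is annotated as an integer but holds the text "many", so the sketch keeps it as a string instead of failing. A real pipeline would more likely send such a mismatch back through the evaluation step for correction.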
Release Notes
v0.3.1
- Update: Improve type evaluation prompt for robust correction.
v0.2.0
- Fix: Lang code for zh-cn / zh-tw.
v0.1.0
- Initial release
Tools
- /extract_json: Extracts key-value pairs in JSON format.
- /extract_yaml: Extracts key-value pairs in YAML format.
- /extract_toml: Extracts key-value pairs in TOML format.
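To show how the three tools differ only in output format, here is a minimal sketch that renders one flat set of extracted pairs as JSON, YAML, and TOML. It is an illustration under stated assumptions, not the server's serializer: the `to_json`, `to_yaml`, and `to_toml` helpers are hypothetical, and the YAML and TOML versions handle only flat dictionaries of ordinary strings, finite numbers, and booleans.

```python
import json
from typing import Dict, Union

Scalar = Union[str, int, float, bool]


def to_json(pairs: Dict[str, Scalar]) -> str:
    """Render the pairs as an indented JSON object."""
    return json.dumps(pairs, ensure_ascii=False, indent=2)


def to_yaml(pairs: Dict[str, Scalar]) -> str:
    """Render the pairs as a flat YAML mapping.

    JSON scalars are valid YAML 1.2 scalars, so json.dumps does the quoting.
    """
    return "\n".join(
        f"{json.dumps(key, ensure_ascii=False)}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in pairs.items()
    )


def to_toml(pairs: Dict[str, Scalar]) -> str:
    """Render the pairs as flat TOML with quoted keys.

    For ordinary text, the escapes json.dumps emits are also accepted in TOML
    basic strings, and JSON booleans match TOML's lowercase true/false.
    """
    return "\n".join(
        f"{json.dumps(key, ensure_ascii=False)} = {json.dumps(value, ensure_ascii=False)}"
        for key, value in pairs.items()
    )


pairs = {"name": "田中", "age": 34, "member": True}
print(to_json(pairs))
print(to_yaml(pairs))
print(to_toml(pairs))
```

In practice a dedicated YAML or TOML library would be the safer choice, since nested values, dates, and unusual characters need format-specific handling that this sketch leaves out.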
Note:
- Supported languages: Japanese, English, and Chinese.
- Extraction relies on pydantic-ai and LLMs. Perfect extraction is not guaranteed.
Features
- Flexible extraction: Accepts arbitrary, noisy, or unstructured text as input.
- Japanese, English, and Chinese (zh-cn / zh-tw) support: Uses spaCy NER preprocessing for these languages.
- Type-safe output: Validated with Pydantic.
- Multiple formats: JSON, YAML, or TOML output.
- Robust error handling.
- Accuracy-oriented pipeline: Combines GPT-4.1-mini extraction with Pydantic validation, although perfect extraction is not guaranteed.
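The language support above implies some mapping from a detected language code to a spaCy pipeline. The sketch below shows one way such a lookup could work, including the zh-cn / zh-tw variants. The model names are real spaCy packages, but which models the server actually loads is an assumption, and `SPACY_MODELS` and `pick_spacy_model` are hypothetical names.

```python
from typing import Dict, Optional

# Assumed mapping from base language code to a spaCy pipeline package.
SPACY_MODELS: Dict[str, str] = {
    "ja": "ja_core_news_sm",
    "en": "en_core_web_sm",
    "zh": "zh_core_web_sm",
}


def pick_spacy_model(lang_code: str) -> Optional[str]:
    """Return the pipeline name for a language code, or None if unsupported.

    Region suffixes are ignored, so "zh-cn", "zh-tw", and "zh_TW" all resolve
    to the same Chinese pipeline.
    """
    base = lang_code.strip().lower().replace("_", "-").split("-")[0]
    return SPACY_MODELS.get(base)


for code in ("ja", "en", "zh-cn", "zh-TW", "fr"):
    print(code, "->", pick_spacy_model(code))
```

With spaCy and the corresponding model package installed, the returned name can be passed to `spacy.load`. A `None` result would signal that NER preprocessing should be skipped for that input.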
Tested Scenarios
Tested with a range of inputs, including simple key-value pairs and unstructured text.