README - Claude Voice MCP by gregmulvihill

Claude Voice MCP

⚠️ PRE-ALPHA WARNING ⚠️
This project is in the pre-alpha stage. Its contents were drafted conceptually and have not been tested. Proceed with caution: significant changes may occur before the first stable release.

An MCP server that enables voice conversations with Claude Desktop, initially focused on Text-to-Speech (TTS).

Project Overview

This project implements a Model Context Protocol (MCP) server that extends Claude Desktop with voice conversation capabilities. The initial focus is Text-to-Speech: converting Claude's text responses into spoken audio.

Features

MCP server implementation compatible with Claude Desktop
Text-to-Speech conversion for Claude's responses
WebSocket-based real-time communication
Multiple language and voice support
Simple test client for verification

CogentEcho.ai Ecosystem Integration

This repository is part of the CogentEcho.ai ecosystem:

Strategic Layer: Orchestrate-AI - Strategic orchestration and business logic
Tactical Layer: Automated-Dev-Agents - Tactical task execution and agent management
Foundation Layer: Multi-Tiered Memory Architecture - Memory services for persistence
Tool Manager: MCP Manager - Manages Claude MCP servers, including this one

Development Roadmap

Phase 1 (Current): Text-to-Speech Implementation
- Basic MCP server setup
- Text-to-Speech integration
- Configuration options for voice selection
Phase 2 (Future): Speech-to-Text Implementation
- Audio capture and processing
- Speech recognition integration
- Full duplex conversation support

Getting Started

Prerequisites

Node.js 18.x or higher
Claude Desktop application
A web browser for running the test client

Installation

# Clone the repository
git clone https://github.com/gregmulvihill/claude-voice-mcp.git

# Navigate to project directory
cd claude-voice-mcp

# Install dependencies
npm install

# Copy environment example and modify as needed
cp .env.example .env

# Start the server
npm start

Testing the MCP Server

The repository includes a simple test client for verifying that the MCP server works:

Start the MCP server using npm start
Open the test-client/index.html file in a web browser
Connect to the MCP server using the default WebSocket URL (ws://localhost:3000/api/v1/ws)
Enter text and click "Generate Speech" to test the TTS functionality
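
The default WebSocket URL above shares its host, port, and path prefix with the REST base URL (http://localhost:3000/api/v1) given under Integrating with Claude Desktop. If the server runs on a different host or port, the WebSocket URL can be derived from the REST base. The following is a minimal sketch: `toWebSocketUrl` is a hypothetical helper, not part of this repository, and it assumes the `/ws` suffix from the default URL applies to every deployment.

```typescript
// Hypothetical helper (not part of this repository): derive the WebSocket
// endpoint from the REST base URL, assuming the "/ws" suffix shown above.
function toWebSocketUrl(httpBase: string): string {
  // http:// becomes ws://, https:// becomes wss://
  const wsBase = httpBase.replace(/^http/i, "ws");
  // Drop trailing slashes so the result never contains "//ws".
  return wsBase.replace(/\/+$/, "") + "/ws";
}

console.log(toWebSocketUrl("http://localhost:3000/api/v1"));
// ws://localhost:3000/api/v1/ws
```

Paste the resulting URL into the test client's connection field in place of the default.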

Development

Branch Protection

The main branch is protected and requires pull requests with at least one approval before merging. This ensures code quality and proper review of all changes.

Development Workflow

Fork the repository
Create a feature branch
Make your changes
Submit a pull request for review
Address any feedback
Your changes will be merged after approval

Integrating with Claude Desktop

Claude Desktop can connect to MCP servers that extend its functionality. To integrate this voice MCP server with Claude Desktop:

Method 1: Using the Claude Desktop UI

Open Claude Desktop
Go to Settings > Extensions
Click "Add MCP Server"
Enter the server URL: http://localhost:3000/api/v1
Click "Connect" and follow the authentication prompts if required

Method 2: Using the Command Line

If your Claude Desktop application supports command-line installation of MCP servers:

# Run the MCP server
npm start

# In a separate terminal, use the Claude Desktop CLI to add the MCP server
claude-desktop extensions add --url=http://localhost:3000/api/v1 --name="Claude Voice"

Method 3: Using npx (For Development)

For development and testing, you can build the server and install it directly into Claude Desktop:

cd claude-voice-mcp
npm run build

# Install the server into Claude Desktop
npx @anthropic/claude-desktop-mcp install --path=./dist

Verification

After installation, verify the integration:

In Claude Desktop, go to Settings > Extensions
Confirm "Claude Voice" is listed and shows "Connected" status
Start a conversation with Claude
Click the voice icon that appears in the interface to activate voice output

Technical Architecture

The MCP server acts as an intermediary between Claude Desktop and voice processing services:

Claude Desktop sends text responses to the MCP server
The MCP server processes the text through a TTS engine
Audio is streamed back to Claude Desktop for playback

The primary components are:

MCP Protocol Implementation: Handles API endpoints and WebSocket communication
TTS Service: Converts text to speech using Google's TTS API
Session Management: Maintains connection state and client information
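
The flow described above (receive text, run it through a TTS engine, stream audio back) can be sketched roughly as follows. Every name here (`TtsEngine`, `FakeTtsEngine`, `splitIntoChunks`, `handleResponseText`) is hypothetical and not taken from this repository. The fake engine stands in for Google's TTS API so the sketch runs offline, and the synchronous calls are a simplification: a real engine call would be asynchronous.

```typescript
// Sketch only: all interface, class, and function names are hypothetical.
interface TtsEngine {
  // Turns text into raw audio bytes.
  synthesize(text: string): Uint8Array;
}

// Stand-in engine for local experiments: emits one byte per input character.
class FakeTtsEngine implements TtsEngine {
  synthesize(text: string): Uint8Array {
    const bytes = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
      bytes[i] = text.charCodeAt(i) & 0xff;
    }
    return bytes;
  }
}

// Split audio into fixed-size chunks, as a streaming transport might send them.
function splitIntoChunks(audio: Uint8Array, chunkSize: number): Uint8Array[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    chunks.push(audio.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Receive text, synthesize it, and return the chunks to stream back.
function handleResponseText(text: string, engine: TtsEngine, chunkSize = 4): Uint8Array[] {
  const audio = engine.synthesize(text);
  return splitIntoChunks(audio, chunkSize);
}

const demoChunks = handleResponseText("Hello, Claude", new FakeTtsEngine());
console.log(demoChunks.length); // 4 (13 bytes split as 4 + 4 + 4 + 1)
```

Keeping the engine behind an interface means the TTS provider can be swapped or faked in tests without touching the transport code.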

API Documentation

REST Endpoints

GET /api/v1/info: Returns information about the MCP server
GET /api/v1/health: Health check endpoint
POST /api/v1/register: Registers a client with the MCP server
GET /api/v1/tts/config: Returns TTS configuration options
POST /api/v1/tts: Processes a TTS request
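
As a rough illustration of how a client might address these endpoints, the sketch below builds request descriptions from the list above. The methods and paths are copied from that list; the helper `buildRequest`, the `HttpRequestSpec` shape, and the `text` field in the payload are assumptions, since this README does not document the request schema.

```typescript
// Methods and paths are copied from the endpoint list; everything else is assumed.
const ENDPOINTS = {
  info: { method: "GET", path: "/info" },
  health: { method: "GET", path: "/health" },
  register: { method: "POST", path: "/register" },
  ttsConfig: { method: "GET", path: "/tts/config" },
  tts: { method: "POST", path: "/tts" },
} as const;

type EndpointName = keyof typeof ENDPOINTS;

interface HttpRequestSpec {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Build a description of the request without sending it.
function buildRequest(
  name: EndpointName,
  payload?: unknown,
  base = "http://localhost:3000/api/v1",
): HttpRequestSpec {
  const { method, path } = ENDPOINTS[name];
  const spec: HttpRequestSpec = { url: base + path, method, headers: {} };
  if (method === "POST") {
    spec.headers["Content-Type"] = "application/json";
    spec.body = JSON.stringify(payload ?? {});
  }
  return spec;
}

// Hypothetical payload: the field name "text" is a guess at the schema.
const req = buildRequest("tts", { text: "Hello from Claude" });
console.log(req.method, req.url); // POST http://localhost:3000/api/v1/tts
```

On Node.js 18 or later, a spec built this way can be sent with the built-in `fetch(req.url, req)`.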

WebSocket Messages

Client to Server:
- tts_request: Request to convert text to speech
- tts_cancel: Cancel an in-progress TTS request
- ping: Keepalive message
Server to Client:
- tts_response: Response containing audio data
- tts_status: Status updates for TTS processing
- error: Error messages
- pong: Response to ping messages
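
The message types above map naturally onto a tagged union. The sketch below assumes messages are JSON objects with a `type` field carrying the names listed. That framing, the `id` and `text` payload fields, and the `encode`/`decode` helpers are all hypothetical, because the wire format is not specified in this README.

```typescript
// Message type names come from the list above; payload fields are hypothetical.
type ClientMessage =
  | { type: "tts_request"; id: string; text: string }
  | { type: "tts_cancel"; id: string }
  | { type: "ping" };

const SERVER_TYPES = ["tts_response", "tts_status", "error", "pong"] as const;
type ServerMessageType = (typeof SERVER_TYPES)[number];

interface ServerMessage {
  type: ServerMessageType;
  [field: string]: unknown; // payload shape is not documented here
}

// Serialize a client message for sending over the socket.
function encode(message: ClientMessage): string {
  return JSON.stringify(message);
}

// Parse an incoming frame, rejecting anything that is not a known server type.
function decode(raw: string): ServerMessage {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const type = (parsed as { type?: unknown }).type;
  const known = SERVER_TYPES as readonly string[];
  if (typeof type !== "string" || known.indexOf(type) === -1) {
    throw new Error("Unknown server message type: " + String(type));
  }
  return parsed as ServerMessage;
}

console.log(encode({ type: "ping" }));       // {"type":"ping"}
console.log(decode('{"type":"pong"}').type); // pong
```

Rejecting unknown types at the boundary keeps client-only messages such as `tts_request` from being mistaken for server replies.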

Troubleshooting

If you encounter issues with the MCP server:

Connection Issues:
- Verify the server is running (npm start)
- Check that the port (default 3000) is not blocked by a firewall
- Ensure Claude Desktop has permission to connect to local servers
TTS Issues:
- Check server logs for specific error messages
- Verify internet connectivity (required for Google TTS API)
- Try with shorter text samples to isolate problems
Integration Issues:
- Restart both the MCP server and Claude Desktop
- Check Claude Desktop logs for connection errors
- Verify the server URL is correctly configured

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.