Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli listSkill Instructions
name: tts-livekit-plugin description: Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.
TTS LiveKit Plugin Skill
This skill provides a complete solution for building self-hosted Text-to-Speech (TTS) systems integrated with LiveKit voice agents.
What This Skill Does
-
Creates a Self-Hosted TTS API Server
- FastAPI-based REST API
- Uses MeloTTS model from Hugging Face
- Supports streaming audio responses
- Multi-language and multi-voice support
- Production-ready with proper error handling
-
Builds a LiveKit TTS Plugin
- Fully compatible with LiveKit agents framework
- Implements standard TTS interface
- Streaming audio support
- Proper error handling and retries
- Drop-in replacement for commercial TTS providers
-
Provides Complete Testing
- Comprehensive test suite for API
- Integration tests for plugin
- No mocked functions - all real implementations
- Performance and concurrency tests
-
Includes Full Documentation
- API documentation with examples
- Plugin usage guide
- Deployment guide for production
- Multiple usage examples
Components
API Server (api/)
- server.py: FastAPI server with MeloTTS integration
- requirements.txt: Python dependencies
- Endpoints:
GET /: Health checkGET /voices: List available voicesPOST /synthesize: Full audio synthesisPOST /synthesize/stream: Streaming synthesis
LiveKit Plugin (plugin/)
- melotts_plugin.py: LiveKit TTS plugin implementation
- Extends
livekit.agents.tts.TTSbase class - Implements
ChunkedStreamfor audio streaming - Uses aiohttp for HTTP requests
- Proper exception handling (APIConnectionError, APITimeoutError, APIStatusError)
Tests (tests/)
-
test_api.py: API server tests
- Health checks
- Voice listing
- Synthesis (streaming and non-streaming)
- Multiple languages
- Error handling
- Concurrency
-
test_plugin.py: Plugin integration tests
- Plugin initialization
- Synthesis with real API
- Multiple languages
- Error handling
- Concurrency
- Timeout handling
Examples (examples/)
- test_api_client.py: Standalone API testing script
- simple_agent.py: Basic LiveKit agent example
- voice_assistant.py: Complete voice assistant implementation
Documentation (docs/)
- API.md: Complete API reference
- PLUGIN.md: Plugin usage guide
- DEPLOYMENT.md: Production deployment guide
Quick Start
1. Start the TTS API Server
cd api
pip install -r requirements.txt
python -m unidic download
python server.py
Server runs on http://localhost:8000
2. Test the API
cd examples
python test_api_client.py
3. Use in LiveKit Agent
from melotts_plugin import TTS
tts = TTS(
api_base_url="http://localhost:8000",
language="EN",
speaker="EN-US",
speed=1.0
)
stream = tts.synthesize("Hello from LiveKit!")
Features
- ✅ Self-hosted (no external API dependencies)
- ✅ High-quality natural speech (MeloTTS)
- ✅ 6 languages: English, Spanish, French, Chinese, Japanese, Korean
- ✅ Multiple voices per language
- ✅ Streaming audio for low latency
- ✅ CPU-friendly (optimized for real-time inference)
- ✅ GPU support (automatic if available)
- ✅ LiveKit agents framework compatible
- ✅ Production-ready error handling
- ✅ Comprehensive test coverage
- ✅ Full documentation
Architecture
┌─────────────────┐ HTTP POST ┌──────────────────┐
│ LiveKit Agent │ ──────────────────► │ TTS API │
│ │ │ Server │
│ ┌───────────┐ │ │ │
│ │ MeloTTS │ │ Audio Stream │ ┌────────────┐ │
│ │ Plugin │ │ ◄────────────────── │ │ MeloTTS │ │
│ └───────────┘ │ (WAV chunks) │ │ Model │ │
└─────────────────┘ │ └────────────┘ │
└──────────────────┘
Why MeloTTS?
- High Quality: Natural-sounding speech
- Fast: Optimized for real-time inference
- CPU-Friendly: Works well even without GPU
- Multi-lingual: 6 languages supported
- Low Latency: ~150-200ms TTFB
- Open Source: Free to use and modify
Performance
- Latency: 150-200ms time-to-first-byte
- CPU Usage: Optimized for real-time on CPUs
- GPU Support: Automatic acceleration if available
- Streaming: Chunked delivery for low latency
- Concurrent Requests: Handles multiple simultaneous requests
Supported Languages
| Language | Code | Speakers |
|---|---|---|
| English | EN | EN-US, EN-BR, EN-AU, EN-IN |
| Spanish | ES | ES |
| French | FR | FR |
| Chinese | ZH | ZH |
| Japanese | JP | JP |
| Korean | KR | KR |
Testing
All tests use real implementations - no mocks:
# Start API server
cd api && python server.py
# Run API tests
cd tests && pytest test_api.py -v
# Run plugin tests
cd tests && pytest test_plugin.py -v
Deployment
Multiple deployment options:
- Standalone: Run directly with Python/Uvicorn
- Docker: Containerized deployment
- Kubernetes: Scalable cloud deployment
- Cloud: AWS, GCP, Azure support
See docs/DEPLOYMENT.md for detailed guides.
Integration with LiveKit
The plugin is a drop-in replacement for other TTS providers:
# Instead of:
# from livekit.plugins import openai
# tts = openai.TTS()
# Use:
from melotts_plugin import TTS
tts = TTS(api_base_url="http://localhost:8000")
# Same interface, self-hosted!
Use Cases
- Voice assistants
- Interactive voice response (IVR) systems
- Accessibility tools
- Educational applications
- Multilingual customer service bots
- Real-time voice agents
- Live streaming with voice synthesis
Requirements
API Server:
- Python 3.9+
- 2GB+ RAM
- FastAPI, MeloTTS, Uvicorn
- Optional: GPU for faster inference
LiveKit Plugin:
- Python 3.9+
- livekit-agents >= 0.8.0
- aiohttp >= 3.9.0
Security
For production:
- Add API authentication
- Enable HTTPS/TLS
- Implement rate limiting
- Configure CORS
- Set up monitoring
See docs/DEPLOYMENT.md#security for details.
When to Use This Skill
Use this skill when you need to:
- Build a self-hosted TTS solution
- Create LiveKit voice agents with custom TTS
- Avoid commercial TTS API costs
- Have full control over voice synthesis
- Support multiple languages
- Deploy TTS in private/air-gapped environments
- Build voice assistants
- Integrate TTS into existing applications
Troubleshooting
Server won't start:
- Run
python -m unidic download - Check port 8000 is available
- Verify dependencies installed
Plugin connection errors:
- Ensure API server is running
- Check
api_base_urlconfiguration - Verify network connectivity
Audio quality issues:
- Try different voices/speakers
- Adjust speed parameter
- Check sample rate configuration
See documentation for more troubleshooting tips.
Resources
License
Apache 2.0 License
Support
- Check the documentation in
docs/ - Review examples in
examples/ - Run the test suite to verify setup
- Check logs for error messages
This skill provides everything needed for production-ready, self-hosted TTS with LiveKit integration. All code is fully functional with no mocks or placeholders.
More by Okeysir198
View allComprehensive guide for building functional tools for LiveKit voice agents using the @function_tool decorator. Use when creating tools for LiveKit agents to enable capabilities like API calls, database queries, multi-agent coordination, or any external integrations. Covers tool design, RunContext handling, interruption patterns, parameter documentation, testing, and production best practices.
Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.
Guide for creating effective prompts and instructions for LiveKit voice agents. Use when building conversational AI agents with the LiveKit Agents framework, including (1) Creating new voice agent prompts from scratch, (2) Improving existing agent instructions, (3) Optimizing prompts for text-to-speech output, (4) Integrating tool/function calling capabilities, (5) Building multi-agent systems with handoffs, (6) Ensuring voice-friendly formatting and brevity for natural conversations, (7) Iteratively improving prompts based on testing and feedback, (8) Building industry-specific agents (debt collection, healthcare, banking, customer service, front desk).
Build self-hosted TTS APIs using HuggingFace models (Parler-TTS, F5-TTS, XTTS-v2) and create LiveKit voice agent plugins with streaming support. Use when creating production-ready text-to-speech systems that need: (1) Self-hosted TTS with full control, (2) LiveKit voice agent integration, (3) Streaming audio for low-latency conversations, (4) Custom voice characteristics, (5) Cost-effective alternatives to cloud TTS providers like ElevenLabs or Google TTS.
