Agent Skills
Okeysir198

tts-livekit-plugin

@Okeysir198/tts-livekit-plugin
3 forks
Updated 4/12/2026

Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.

Installation

npx agent-skills-cli install @Okeysir198/tts-livekit-plugin

Details

Path: tts-livekit-plugin/SKILL.md
Branch: main
Scoped Name: @Okeysir198/tts-livekit-plugin

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: tts-livekit-plugin
description: Build and deploy self-hosted Text-to-Speech API using MeloTTS from Hugging Face and create a LiveKit plugin for voice agents. Use this skill when building TTS systems, LiveKit voice agents, or self-hosted speech synthesis solutions.

TTS LiveKit Plugin Skill

This skill provides a complete solution for building self-hosted Text-to-Speech (TTS) systems integrated with LiveKit voice agents.

What This Skill Does

  1. Creates a Self-Hosted TTS API Server

    • FastAPI-based REST API
    • Uses MeloTTS model from Hugging Face
    • Supports streaming audio responses
    • Multi-language and multi-voice support
    • Production-ready with proper error handling
  2. Builds a LiveKit TTS Plugin

    • Fully compatible with LiveKit agents framework
    • Implements standard TTS interface
    • Streaming audio support
    • Proper error handling and retries
    • Drop-in replacement for commercial TTS providers
  3. Provides Complete Testing

    • Comprehensive test suite for API
    • Integration tests for plugin
    • No mocked functions - all real implementations
    • Performance and concurrency tests
  4. Includes Full Documentation

    • API documentation with examples
    • Plugin usage guide
    • Deployment guide for production
    • Multiple usage examples

Components

API Server (api/)

  • server.py: FastAPI server with MeloTTS integration
  • requirements.txt: Python dependencies
  • Endpoints:
    • GET /: Health check
    • GET /voices: List available voices
    • POST /synthesize: Full audio synthesis
    • POST /synthesize/stream: Streaming synthesis
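The request schema for these endpoints is not given in this section. As a minimal sketch, assuming POST /synthesize accepts a JSON body with `text`, `language`, `speaker`, and `speed` fields (mirroring the plugin's constructor arguments) and returns audio bytes, a standard-library client could look like this. The field names and the helper names `build_synthesize_request` and `synthesize_to_file` are assumptions; docs/API.md is the authority on the real schema.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # default server address used in this guide


def build_synthesize_request(text, language="EN", speaker="EN-US", speed=1.0,
                             base_url=API_BASE_URL):
    """Build a POST /synthesize request.

    The JSON field names are an assumption; check docs/API.md for the real schema.
    """
    body = json.dumps({
        "text": text,
        "language": language,
        "speaker": speaker,
        "speed": speed,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path="out.wav"):
    """Send the request and save the response body (requires a running server)."""
    with urllib.request.urlopen(build_synthesize_request(text), timeout=30) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.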

LiveKit Plugin (plugin/)

  • melotts_plugin.py: LiveKit TTS plugin implementation
  • Extends livekit.agents.tts.TTS base class
  • Implements ChunkedStream for audio streaming
  • Uses aiohttp for HTTP requests
  • Proper exception handling (APIConnectionError, APITimeoutError, APIStatusError)
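How the plugin retries failed requests is not shown in this section. The sketch below illustrates one common approach, retrying transient failures with exponential backoff. `TransientError`, `call_with_retries`, and the delay values are hypothetical names chosen for illustration; they are not part of the plugin or of livekit-agents.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a connection error or timeout."""


def call_with_retries(func, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.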

Tests (tests/)

  • test_api.py: API server tests

    • Health checks
    • Voice listing
    • Synthesis (streaming and non-streaming)
    • Multiple languages
    • Error handling
    • Concurrency
  • test_plugin.py: Plugin integration tests

    • Plugin initialization
    • Synthesis with real API
    • Multiple languages
    • Error handling
    • Concurrency
    • Timeout handling

Examples (examples/)

  • test_api_client.py: Standalone API testing script
  • simple_agent.py: Basic LiveKit agent example
  • voice_assistant.py: Complete voice assistant implementation

Documentation (docs/)

  • API.md: Complete API reference
  • PLUGIN.md: Plugin usage guide
  • DEPLOYMENT.md: Production deployment guide

Quick Start

1. Start the TTS API Server

cd api
pip install -r requirements.txt
python -m unidic download
python server.py

Server runs on http://localhost:8000

2. Test the API

cd examples  # from the skill's root directory, in a second terminal
python test_api_client.py

3. Use in LiveKit Agent

from melotts_plugin import TTS

tts = TTS(
    api_base_url="http://localhost:8000",
    language="EN",
    speaker="EN-US",
    speed=1.0
)

stream = tts.synthesize("Hello from LiveKit!")

Features

  • ✅ Self-hosted (no external API dependencies)
  • ✅ High-quality natural speech (MeloTTS)
  • ✅ 6 languages: English, Spanish, French, Chinese, Japanese, Korean
  • ✅ Multiple voices per language
  • ✅ Streaming audio for low latency
  • ✅ CPU-friendly (optimized for real-time inference)
  • ✅ GPU support (automatic if available)
  • ✅ LiveKit agents framework compatible
  • ✅ Production-ready error handling
  • ✅ Comprehensive test coverage
  • ✅ Full documentation

Architecture

┌─────────────────┐      HTTP POST       ┌──────────────────┐
│  LiveKit Agent  │ ──────────────────►  │   TTS API        │
│                 │                       │   Server         │
│  ┌───────────┐  │                       │                  │
│  │ MeloTTS   │  │   Audio Stream       │  ┌────────────┐  │
│  │ Plugin    │  │ ◄──────────────────  │  │  MeloTTS   │  │
│  └───────────┘  │    (WAV chunks)      │  │  Model     │  │
└─────────────────┘                       │  └────────────┘  │
                                          └──────────────────┘
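The diagram labels the response as WAV chunks. As a rough sketch, assuming the client has collected a complete PCM WAV payload, the header can be read and the samples split into fixed-duration frames for playback using only the standard library. The function names and the 20 ms frame size are illustrative choices, not taken from the plugin.

```python
import io
import wave


def wav_to_pcm_frames(wav_bytes, frame_ms=20):
    """Parse a WAV payload and split its PCM data into frame_ms-long frames.

    Returns (sample_rate, num_channels, list_of_frames). The final frame may
    be shorter than the others.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        pcm = wav.readframes(wav.getnframes())

    bytes_per_frame = sample_rate * frame_ms // 1000 * channels * sample_width
    frames = [pcm[i:i + bytes_per_frame] for i in range(0, len(pcm), bytes_per_frame)]
    return sample_rate, channels, frames


def make_silent_wav(duration_ms=50, sample_rate=16000):
    """Build a short mono 16-bit WAV of silence, used here only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * duration_ms // 1000))
    return buf.getvalue()
```

For example, `wav_to_pcm_frames(make_silent_wav())` splits 50 ms of 16 kHz mono audio into two full 20 ms frames and one shorter trailing frame.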

Why MeloTTS?

  • High Quality: Natural-sounding speech
  • Fast: Optimized for real-time inference
  • CPU-Friendly: Works well even without GPU
  • Multi-lingual: 6 languages supported
  • Low Latency: roughly 150-200 ms time to first byte (TTFB)
  • Open Source: Free to use and modify

Performance

  • Latency: 150-200ms time-to-first-byte
  • CPU Usage: Optimized for real-time on CPUs
  • GPU Support: Automatic acceleration if available
  • Streaming: Chunked delivery for low latency
  • Concurrent Requests: Handles multiple simultaneous requests
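The latency figure above depends on hardware and text length, so it is worth measuring on your own setup. A minimal sketch of a helper that times the arrival of the first chunk from any iterable of audio chunks follows; `time_to_first_chunk` is a hypothetical name, and the helper only counts from the moment iteration begins, so for a true time-to-first-byte the iterable should issue the HTTP request lazily on first read.

```python
import time


def time_to_first_chunk(chunks, clock=time.perf_counter):
    """Return (seconds_until_first_chunk, first_chunk) for any iterable of chunks.

    Returns (None, None) if the iterable yields nothing.
    """
    start = clock()
    for chunk in chunks:
        # Stop at the first chunk; later chunks are not consumed.
        return clock() - start, chunk
    return None, None
```

The `clock` parameter defaults to a monotonic high-resolution timer and can be swapped for a fake clock in tests.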

Supported Languages

Language | Code | Speakers
English  | EN   | EN-US, EN-BR, EN-AU, EN-IN
Spanish  | ES   | ES
French   | FR   | FR
Chinese  | ZH   | ZH
Japanese | JP   | JP
Korean   | KR   | KR
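The language and speaker codes in the table above can be checked on the client side before a request is sent. This is a small sketch; `SUPPORTED_VOICES` and `validate_voice` are hypothetical helpers, and the server's GET /voices endpoint remains the authoritative list.

```python
# Language codes and speaker IDs copied from the table above.
SUPPORTED_VOICES = {
    "EN": ["EN-US", "EN-BR", "EN-AU", "EN-IN"],
    "ES": ["ES"],
    "FR": ["FR"],
    "ZH": ["ZH"],
    "JP": ["JP"],
    "KR": ["KR"],
}


def validate_voice(language, speaker=None):
    """Return a (language, speaker) pair, defaulting to the first listed speaker.

    Raises ValueError for a language or speaker that is not in the table.
    """
    code = language.upper()
    if code not in SUPPORTED_VOICES:
        raise ValueError(f"Unsupported language: {language!r}")
    speakers = SUPPORTED_VOICES[code]
    if speaker is None:
        return code, speakers[0]
    if speaker not in speakers:
        raise ValueError(f"Unsupported speaker {speaker!r} for language {code}")
    return code, speaker
```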

Testing

All tests use real implementations - no mocks:

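# Start API server (keep it running in a separate terminal)
cd api && python server.py

# Run API tests (from the skill's root directory)
cd tests && pytest test_api.py -v

# Run plugin tests (from the skill's root directory)
cd tests && pytest test_plugin.py -v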

Deployment

Multiple deployment options:

  1. Standalone: Run directly with Python/Uvicorn
  2. Docker: Containerized deployment
  3. Kubernetes: Scalable cloud deployment
  4. Cloud: AWS, GCP, Azure support

See docs/DEPLOYMENT.md for detailed guides.

Integration with LiveKit

The plugin is a drop-in replacement for other TTS providers:

# Instead of:
# from livekit.plugins import openai
# tts = openai.TTS()

# Use:
from melotts_plugin import TTS
tts = TTS(api_base_url="http://localhost:8000")

# Same interface, self-hosted!

Use Cases

  • Voice assistants
  • Interactive voice response (IVR) systems
  • Accessibility tools
  • Educational applications
  • Multilingual customer service bots
  • Real-time voice agents
  • Live streaming with voice synthesis

Requirements

API Server:

  • Python 3.9+
  • 2GB+ RAM
  • FastAPI, MeloTTS, Uvicorn
  • Optional: GPU for faster inference

LiveKit Plugin:

  • Python 3.9+
  • livekit-agents >= 0.8.0
  • aiohttp >= 3.9.0

Security

For production:

  • Add API authentication
  • Enable HTTPS/TLS
  • Implement rate limiting
  • Configure CORS
  • Set up monitoring

See docs/DEPLOYMENT.md#security for details.
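Of the items above, rate limiting is often implemented in application code. The following is a minimal sketch of a token bucket, assuming one bucket per client or API key; `TokenBucket` and its parameters are hypothetical and not part of the skill's server, and a real deployment might instead rate-limit at a reverse proxy.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return True when the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request handler would call `allow()` first and reject the request, for example with HTTP 429, when it returns False.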

When to Use This Skill

Use this skill when you need to:

  1. Build a self-hosted TTS solution
  2. Create LiveKit voice agents with custom TTS
  3. Avoid commercial TTS API costs
  4. Have full control over voice synthesis
  5. Support multiple languages
  6. Deploy TTS in private/air-gapped environments
  7. Build voice assistants
  8. Integrate TTS into existing applications

Troubleshooting

Server won't start:

  • Run python -m unidic download
  • Check that port 8000 is available
  • Verify that all dependencies are installed
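The port check can be scripted. Below is a small sketch that tries to bind a TCP socket to the port; `port_is_free` is a hypothetical helper, and it assumes the server listens on the loopback address (pass a different `host` if your server binds elsewhere).

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port), False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True
```

Calling `port_is_free(8000)` before starting the server tells you whether another process already occupies the default port.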

Plugin connection errors:

  • Ensure API server is running
  • Check api_base_url configuration
  • Verify network connectivity

Audio quality issues:

  • Try different voices/speakers
  • Adjust speed parameter
  • Check sample rate configuration

See documentation for more troubleshooting tips.

License

Apache 2.0 License

Support

  1. Check the documentation in docs/
  2. Review examples in examples/
  3. Run the test suite to verify setup
  4. Check logs for error messages

This skill provides everything needed for production-ready, self-hosted TTS with LiveKit integration. All code is fully functional with no mocks or placeholders.
