What is Coqui TTS?
Coqui TTS is an open-source text-to-speech library that provides multiple pretrained TTS models and supports voice cloning from short audio samples.
Installation
pip install TTS
# List available models
tts --list_models
# Download and test a model
tts --text "Hello, this is a test of text to speech." \
    --model_name tts_models/en/ljspeech/tacotron2-DDC \
    --out_path output.wav
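The same command can be driven from a script. The following is a minimal sketch, assuming the `tts` executable is on `PATH`; it uses only the `--text`, `--model_name`, and `--out_path` flags shown above, and the helper names `build_tts_command` and `run_tts` are hypothetical, not part of the library.

```python
import shutil
import subprocess


def build_tts_command(text, model_name, out_path):
    """Return the argument list for one `tts` CLI call (hypothetical helper)."""
    # Passing a list instead of a shell string avoids quoting problems in `text`.
    return ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]


def run_tts(text, model_name, out_path):
    """Run the CLI once and return the output path; raises if the command fails."""
    if shutil.which("tts") is None:
        raise RuntimeError("`tts` was not found on PATH; is the TTS package installed?")
    subprocess.run(build_tts_command(text, model_name, out_path), check=True)
    return out_path
```

For example, `run_tts("Hello.", "tts_models/en/ljspeech/tacotron2-DDC", "output.wav")` would issue the same call as the command above.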
Voice Cloning
# Clone a voice from a short audio sample (5-30 seconds)
tts --text "This is my cloned voice speaking." \
    --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --speaker_wav speaker_sample.wav \
    --language_idx en \
    --out_path cloned_output.wav
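Before cloning, it can help to confirm that the reference clip falls inside the 5-30 second range mentioned above. The sketch below uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file; the function names and the idea of enforcing the range as a hard check are illustrative assumptions, not something the library requires.

```python
import wave

MIN_SECONDS = 5.0   # lower bound quoted in the comment above
MAX_SECONDS = 30.0  # upper bound quoted in the comment above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_usable_reference(path, min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """True if the clip length lies within [min_seconds, max_seconds]."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Calling `is_usable_reference("speaker_sample.wav")` before running the command gives an early warning when a clip is too short or too long.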
TTS Server API
# Start TTS server (run `tts-server --help` to confirm the flags your installed version accepts)
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --host 0.0.0.0 --port 5002
# API usage
curl "http://localhost:5002/api/tts?text=Hello+world" -o output.wav
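# With voice cloning via API (assumes the server's /api/tts endpoint accepts a speaker_wav upload; verify this for your version)
curl -X POST "http://localhost:5002/api/tts" \
    -F "text=Hello from cloned voice" \
    -F "speaker_wav=@speaker_sample.wav" \
    -o cloned.wav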
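The GET request above can also be issued from Python with the standard library. This is a minimal sketch, assuming the server is running on `localhost:5002` as started above and returns audio bytes from `/api/tts`; `build_tts_url` and `fetch_speech` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the /api/tts URL with the text safely URL-encoded."""
    # urlencode turns spaces into '+' and escapes characters such as '&' and '='.
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def fetch_speech(text, out_path, host="localhost", port=5002, timeout=60.0):
    """Request synthesized audio and write the response body to out_path."""
    with urlopen(build_tts_url(text, host, port), timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
    return len(audio)  # number of bytes written
```

Encoding the query with `urlencode` avoids hand-escaping text that contains spaces or punctuation, which the plain `curl` URL would otherwise require.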
Python API
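import torch
from TTS.api import TTS
# Use the GPU when one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
# Generate speech in the voice of the reference clip
tts.tts_to_file(
    text="Welcome to our service.",
    speaker_wav="speaker.wav",
    language="en",
    file_path="output.wav",
)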
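For longer material, one possible pattern is to synthesize one file per paragraph. The sketch below assumes a `tts` object created as above and calls only `tts_to_file` with the same keyword arguments; `plan_outputs`, `synthesize_all`, and the `part_NNN.wav` naming scheme are hypothetical choices made for illustration.

```python
from pathlib import Path


def plan_outputs(paragraphs, out_dir, prefix="part"):
    """Pair each non-empty paragraph with a numbered output path."""
    out = Path(out_dir)
    plan = []
    for text in paragraphs:
        cleaned = text.strip()
        if not cleaned:
            continue  # skip blank paragraphs
        index = len(plan) + 1
        plan.append((cleaned, str(out / f"{prefix}_{index:03d}.wav")))
    return plan


def synthesize_all(tts, plan, speaker_wav, language="en"):
    """Call tts.tts_to_file once per planned item and return the paths written."""
    written = []
    for text, file_path in plan:
        # Create the output directory on first use.
        Path(file_path).parent.mkdir(parents=True, exist_ok=True)
        tts.tts_to_file(
            text=text,
            speaker_wav=speaker_wav,
            language=language,
            file_path=file_path,
        )
        written.append(file_path)
    return written
```

A call such as `synthesize_all(tts, plan_outputs(chapter_text.split("\n\n"), "audio"), "speaker.wav")` would then produce `audio/part_001.wav`, `audio/part_002.wav`, and so on, where `chapter_text` is a placeholder for your own input string.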
Use Cases
- Custom voice assistants and IVR systems
- Audiobook generation
- Video narration automation
- Accessibility features for applications
- Multilingual content generation