How to Set Up Whisper for Speech-to-Text
OpenAI Whisper is an open-source speech recognition model that transcribes audio to text with high accuracy. Run it on your Breeze for private, unlimited transcription.
Requirements
- A Breeze with at least 4 GB of RAM (the medium and large models need more; see Model Sizes below)
- Python 3.9 or newer
- ffmpeg for audio processing
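Before installing, you can sanity-check these prerequisites from Python. This is a small illustrative helper (check_requirements is not part of Whisper):

```python
import shutil
import sys

def check_requirements() -> list:
    """Return a list of missing prerequisites (hypothetical helper)."""
    missing = []
    if sys.version_info < (3, 9):          # Whisper needs Python 3.9+
        missing.append("Python 3.9+")
    if shutil.which("ffmpeg") is None:     # ffmpeg must be on PATH for audio decoding
        missing.append("ffmpeg")
    return missing
```

An empty list means you are ready to install.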
Installation
sudo apt update && sudo apt install ffmpeg python3-venv python3-pip -y
python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate
pip install openai-whisper
To verify the installation, run whisper --help. Note that the first transcription downloads the selected model automatically.
Transcribe an Audio File
whisper audio.mp3 --model base --language en
This transcribes audio.mp3 with the base model, forcing English; omit --language to let Whisper auto-detect the language.
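The same transcription can be done from Python with Whisper's library API. A minimal sketch, assuming openai-whisper is installed (transcribe_file is an illustrative wrapper, not part of Whisper):

```python
def transcribe_file(path: str, model_name: str = "base", language: str = "en") -> str:
    """Transcribe one audio file with Whisper's Python API."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)              # downloads the model on first use
    result = model.transcribe(path, language=language)  # decodes audio via ffmpeg
    return result["text"]
```

model.transcribe returns a dict: the "text" key holds the full transcript, and "segments" holds timestamped chunks.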
Model Sizes
- tiny -- 39M parameters, ~1 GB RAM, fastest
- base -- 74M parameters, ~1 GB RAM, good accuracy
- small -- 244M parameters, ~2 GB RAM, balanced
- medium -- 769M parameters, ~5 GB RAM, high accuracy
- large -- 1.5B parameters, ~10 GB RAM, best quality
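As a rough guide, the list above can be turned into a small helper that picks the largest model fitting a RAM budget. The names and RAM figures come straight from the list; the function itself is hypothetical:

```python
# Approximate peak RAM per model in GB, from the list above
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(ram_gb: float) -> str:
    """Pick the biggest Whisper model whose approximate RAM need fits the budget."""
    order = ["tiny", "base", "small", "medium", "large"]
    fitting = [m for m in order if MODEL_RAM_GB[m] <= ram_gb]
    return fitting[-1] if fitting else "tiny"  # fall back to tiny below 1 GB
```

On a 4 GB Breeze this suggests small; an 8 GB machine can run medium.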
Output Formats
Beyond plain text, Whisper can emit subtitle and structured formats: srt and vtt for subtitles, and json with per-segment timestamps:
whisper audio.mp3 --model small --output_format srt
whisper audio.mp3 --model small --output_format vtt
whisper audio.mp3 --model small --output_format json
Batch Processing
Process multiple files at once:
for f in *.mp3; do whisper "$f" --model base; done
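For larger batches, a small Python wrapper around the CLI can add an output directory and skip files that already have transcripts. A sketch under stated assumptions: batch_transcribe and the transcripts/ directory name are illustrative, and the whisper CLI's --model and --output_dir flags are used as shown earlier:

```python
import subprocess
from pathlib import Path

def batch_transcribe(directory: str, model: str = "base", out_dir: str = "transcripts"):
    """Run the whisper CLI over every .mp3 in a directory, skipping finished files."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for audio in sorted(Path(directory).glob("*.mp3")):
        if (out / f"{audio.stem}.txt").exists():
            continue  # transcript already exists; skip this file
        subprocess.run(
            ["whisper", str(audio), "--model", model, "--output_dir", str(out)],
            check=True,  # stop the batch on the first failure
        )
```

The skip check makes the batch resumable: interrupt it at any point and rerun to pick up where it left off.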
For a web API, consider wrapping Whisper with FastAPI to accept uploads and return transcriptions.
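A minimal version of that wrapper might look like the following. This is a sketch, not a production service: create_app, the /transcribe route, and the base model choice are all assumptions, and fastapi, uvicorn, and python-multipart must be installed separately:

```python
def create_app():
    """Build a small FastAPI app that accepts an audio upload and returns its transcript."""
    import tempfile

    import whisper
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    model = whisper.load_model("base")  # load once at startup, not per request

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        # Whisper reads from a file path (via ffmpeg), so persist the upload first
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(await file.read())
            path = tmp.name
        result = model.transcribe(path)
        return {"text": result["text"]}

    return app
```

Serve it with uvicorn's factory mode (uvicorn app:create_app --factory, assuming the code lives in app.py), then POST an audio file to /transcribe.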