How to Set Up Whisper for Speech-to-Text
OpenAI Whisper is an open-source speech recognition model that transcribes audio to text with high accuracy. Run it on your Breeze for private, unlimited transcription.
Requirements
- A Breeze with at least 4 GB of RAM (the medium and large models need more; see Model Sizes below)
- Python 3.9 or newer
- ffmpeg for audio processing
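Before installing, you can sanity-check these prerequisites from Python. This is a small illustrative helper (check_requirements is not part of Whisper):

```python
import shutil
import sys

def check_requirements() -> list:
    """Return a list of missing prerequisites (hypothetical helper)."""
    missing = []
    if sys.version_info < (3, 9):          # Whisper needs Python 3.9+
        missing.append("Python 3.9+")
    if shutil.which("ffmpeg") is None:     # ffmpeg must be on PATH for audio decoding
        missing.append("ffmpeg")
    return missing
```

An empty list means you are ready to install.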
Installation
sudo apt update && sudo apt install ffmpeg python3-venv python3-pip -y
python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate
pip install openai-whisper
To verify the installation, run whisper --help. Note that the first transcription downloads the selected model automatically.
Transcribe an Audio File
whisper audio.mp3 --model base --language en
This transcribes audio.mp3 with the base model, forcing English; omit --language to let Whisper auto-detect the language.
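The same transcription can be done from Python with Whisper's library API. A minimal sketch, assuming openai-whisper is installed (transcribe_file is an illustrative wrapper, not part of Whisper):

```python
def transcribe_file(path: str, model_name: str = "base", language: str = "en") -> str:
    """Transcribe one audio file with Whisper's Python API."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)              # downloads the model on first use
    result = model.transcribe(path, language=language)  # decodes audio via ffmpeg
    return result["text"]
```

model.transcribe returns a dict: the "text" key holds the full transcript, and "segments" holds timestamped chunks.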
Model Sizes
- tiny -- 39M parameters, ~1 GB RAM, fastest
- base -- 74M parameters, ~1 GB RAM, good accuracy
- small -- 244M parameters, ~2 GB RAM, balanced
- medium -- 769M parameters, ~5 GB RAM, high accuracy
- large -- 1.5B parameters, ~10 GB RAM, best quality
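As a rough guide, the list above can be turned into a small helper that picks the largest model fitting a RAM budget. The names and RAM figures come straight from the list; the function itself is hypothetical:

```python
# Approximate peak RAM per model in GB, from the list above
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(ram_gb: float) -> str:
    """Pick the biggest Whisper model whose approximate RAM need fits the budget."""
    order = ["tiny", "base", "small", "medium", "large"]
    fitting = [m for m in order if MODEL_RAM_GB[m] <= ram_gb]
    return fitting[-1] if fitting else "tiny"  # fall back to tiny below 1 GB
```

On a 4 GB Breeze this suggests small; an 8 GB machine can run medium.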
Output Formats
Beyond plain text, Whisper can emit subtitle and structured formats: srt and vtt for subtitles, and json with per-segment timestamps:
whisper audio.mp3 --model small --output_format srt
whisper audio.mp3 --model small --output_format vtt
whisper audio.mp3 --model small --output_format json
Batch Processing
Process multiple files at once:
for f in *.mp3; do whisper "$f" --model base; done
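For larger batches, a small Python wrapper around the CLI can add an output directory and skip files that already have transcripts. A sketch under stated assumptions: batch_transcribe and the transcripts/ directory name are illustrative, and the whisper CLI's --model and --output_dir flags are used as shown earlier:

```python
import subprocess
from pathlib import Path

def batch_transcribe(directory: str, model: str = "base", out_dir: str = "transcripts"):
    """Run the whisper CLI over every .mp3 in a directory, skipping finished files."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for audio in sorted(Path(directory).glob("*.mp3")):
        if (out / f"{audio.stem}.txt").exists():
            continue  # transcript already exists; skip this file
        subprocess.run(
            ["whisper", str(audio), "--model", model, "--output_dir", str(out)],
            check=True,  # stop the batch on the first failure
        )
```

The skip check makes the batch resumable: interrupt it at any point and rerun to pick up where it left off.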
For a web API, consider wrapping Whisper with FastAPI to accept uploads and return transcriptions.
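A minimal version of that wrapper might look like the following. This is a sketch, not a production service: create_app, the /transcribe route, and the base model choice are all assumptions, and fastapi, uvicorn, and python-multipart must be installed separately:

```python
def create_app():
    """Build a small FastAPI app that accepts an audio upload and returns its transcript."""
    import tempfile

    import whisper
    from fastapi import FastAPI, UploadFile

    app = FastAPI()
    model = whisper.load_model("base")  # load once at startup, not per request

    @app.post("/transcribe")
    async def transcribe(file: UploadFile):
        # Whisper reads from a file path (via ffmpeg), so persist the upload first
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(await file.read())
            path = tmp.name
        result = model.transcribe(path)
        return {"text": result["text"]}

    return app
```

Serve it with uvicorn's factory mode (uvicorn app:create_app --factory, assuming the code lives in app.py), then POST an audio file to /transcribe.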