
How to Set Up Whisper for Speech-to-Text

By Admin · Mar 1, 2026 · Updated Apr 23, 2026 · 1 min read

OpenAI Whisper is an open-source speech recognition model that transcribes audio to text with high accuracy. Run it on your Breeze for private, unlimited transcription.

Requirements

  • A Breeze with at least 4 GB RAM (8 GB for larger models)
  • Python 3.9 or newer
  • ffmpeg for audio processing

Installation

sudo apt update && sudo apt install ffmpeg python3-venv python3-pip -y
python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate
pip install openai-whisper

Transcribe an Audio File

whisper audio.mp3 --model base --language en

Model Sizes

  • tiny -- 39M parameters, ~1 GB RAM, fastest
  • base -- 74M parameters, ~1 GB RAM, good accuracy
  • small -- 244M parameters, ~2 GB RAM, balanced
  • medium -- 769M parameters, ~5 GB RAM, high accuracy
  • large -- 1.5B parameters, ~10 GB RAM, best quality
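As a rough guide, a small helper can pick the largest model that fits in the memory you have. The RAM figures below mirror the list above; the `pick_model` function itself is an illustrative sketch, not part of Whisper:

```python
# Approximate RAM needs from the model list above (illustrative helper).
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]

def pick_model(available_ram_gb: float) -> str:
    """Return the largest Whisper model whose approximate RAM need fits."""
    choice = "tiny"  # fall back to the smallest model
    for name, ram in MODEL_RAM_GB:
        if ram <= available_ram_gb:
            choice = name
    return choice
```

On a 4 GB Breeze this picks `small`; with 8 GB it picks `medium`, leaving headroom for the OS.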

Output Formats

Whisper supports multiple output formats:

whisper audio.mp3 --model small --output_format srt
whisper audio.mp3 --model small --output_format vtt
whisper audio.mp3 --model small --output_format json
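The JSON output is handy for post-processing: it carries the full transcript under a `text` key plus timestamped `segments`. A sketch of pulling those out, where the structure shown matches openai-whisper's JSON output but the sample data itself is invented:

```python
import json

# Invented sample mimicking the shape of Whisper's JSON output file
sample = json.loads("""
{
  "text": " Hello world.",
  "language": "en",
  "segments": [
    {"id": 0, "start": 0.0, "end": 1.2, "text": " Hello world."}
  ]
}
""")

# Full transcript, plus (start, end, text) per segment
transcript = sample["text"].strip()
timings = [(s["start"], s["end"], s["text"].strip()) for s in sample["segments"]]
```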

Batch Processing

Process multiple files at once:

for f in *.mp3; do whisper "$f" --model base; done
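The shell loop above can also be driven from Python with `subprocess`, which makes it easy to skip files that already have a transcript. The `build_cmd` helper and the `.txt`-skip logic here are illustrative assumptions, not Whisper features:

```python
import subprocess
from pathlib import Path

def build_cmd(audio: Path, model: str = "base") -> list[str]:
    """Assemble the whisper CLI invocation for one file."""
    return ["whisper", str(audio), "--model", model]

def transcribe_all(folder: str, model: str = "base") -> None:
    for audio in sorted(Path(folder).glob("*.mp3")):
        # Skip files that already have a .txt transcript alongside them
        if audio.with_suffix(".txt").exists():
            continue
        subprocess.run(build_cmd(audio, model), check=True)
```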

For a web API, consider wrapping Whisper with FastAPI to accept uploads and return transcriptions.
