How to Run Whisper Speech-to-Text on Your Breeze

By Admin · Mar 2, 2026 · Updated Apr 23, 2026

Whisper is an open-source automatic speech recognition (ASR) model that can transcribe audio in dozens of languages with remarkable accuracy. Running Whisper on your own Breeze keeps your audio data private and eliminates per-minute transcription costs.

Prerequisites

  • A Breeze instance with at least 4 GB of RAM (8 GB recommended for larger models)
  • Python 3.9 or later
  • FFmpeg installed for audio processing
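You can sanity-check these prerequisites before installing anything. The snippet below is a small sketch (the `check_prereqs` helper is just for illustration) that verifies the Python version and looks for FFmpeg on the PATH:

```python
import shutil
import sys

def check_prereqs(min_python=(3, 9)):
    """Return a list of problems; an empty list means the host is ready."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, "
                        f"found {sys.version.split()[0]}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

for problem in check_prereqs():
    print("Missing prerequisite:", problem)
```

It does not check RAM; use free -h for that before choosing a model size.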

Installing FFmpeg and Dependencies

FFmpeg is required for handling various audio formats:

sudo apt update
sudo apt install -y ffmpeg python3 python3-pip python3-venv

Installing Whisper

Create a virtual environment and install the Whisper package:

python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate
pip install openai-whisper

For GPU acceleration on NVIDIA hardware, install a CUDA-enabled build of PyTorch (see pytorch.org for the command matching your CUDA version) before installing Whisper. Whisper uses the GPU automatically when PyTorch can see one; there is no separate GPU package.
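Before transcribing, you can confirm whether the GPU will actually be used. The `gpu_status` helper below is illustrative:

```python
def gpu_status():
    """Report whether Whisper will be able to run on a CUDA GPU."""
    try:
        import torch  # installed as a dependency of openai-whisper
    except ImportError:
        return "pytorch-missing"
    return "cuda" if torch.cuda.is_available() else "cpu"

print("Whisper will run on:", gpu_status())
```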

Transcribing Audio Files

Use the command-line interface to transcribe an audio file:

whisper audio.mp3 --model medium --language en --output_format txt

Available model sizes and their approximate RAM requirements:

  • tiny — ~1 GB RAM, fastest but least accurate
  • base — ~1 GB RAM, good for clear speech
  • small — ~2 GB RAM, solid accuracy
  • medium — ~5 GB RAM, recommended balance
  • large — ~10 GB RAM, best accuracy
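One way to apply this table is to pick the largest model that fits in your instance's free RAM with some headroom. The figures below mirror the list above; `pick_model` is a hypothetical helper, not part of Whisper:

```python
# (model name, approximate RAM needed in GB), largest first
MODELS = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]

def pick_model(free_ram_gb, headroom_gb=1.0):
    """Return the largest Whisper model that fits, keeping RAM in reserve."""
    for name, needed_gb in MODELS:
        if needed_gb + headroom_gb <= free_ram_gb:
            return name
    return "tiny"  # smallest model as a last resort

print(pick_model(8))  # an 8 GB Breeze comfortably fits "medium"
```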

Using Whisper in Python

For programmatic use, import Whisper directly:

import whisper

model = whisper.load_model("medium")
result = model.transcribe("meeting_recording.mp3")
print(result["text"])

The result dictionary also contains a "segments" list with start and end timestamps for each phrase, useful for generating subtitles.
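Those segments can be turned into an SRT subtitle file with a few lines. This `to_srt` helper is a sketch assuming segments shaped like Whisper's output (dicts with "start", "end", and "text" keys):

```python
def to_srt(segments):
    """Render Whisper-style segments as SRT subtitle text."""
    def ts(seconds):
        # SRT timestamps look like HH:MM:SS,mmm
        total_ms = int(round(seconds * 1000))
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n"
                      f"{seg['text'].strip()}\n")
    return "\n".join(blocks)

# e.g. open("subtitles.srt", "w").write(to_srt(result["segments"]))
```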

Building a Transcription API

Wrap Whisper in a FastAPI application for on-demand transcription:

from fastapi import FastAPI, UploadFile
import whisper, tempfile, os

app = FastAPI()
model = whisper.load_model("medium")  # load once at startup, not per request

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Whisper needs a path on disk, so buffer the upload in a temp file
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    try:
        # note: transcription is CPU-bound and blocks the event loop;
        # fine for a single-user instance
        result = model.transcribe(tmp_path)
    finally:
        os.unlink(tmp_path)  # clean up even if transcription fails
    return {"text": result["text"], "segments": result["segments"]}

Batch Processing Multiple Files

Process an entire directory of audio files:

#!/bin/bash
for file in /data/audio/*.mp3; do
    echo "Transcribing: $file"
    whisper "$file" --model medium --output_dir /data/transcripts/ --output_format srt
done
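If you would rather drive the batch from Python (say, to add logging or retries), the same loop can be expressed with subprocess. The `whisper_cmd` helper below is illustrative; it builds the exact CLI call used above:

```python
import subprocess
from pathlib import Path

def whisper_cmd(audio_path, out_dir, model="medium", fmt="srt"):
    """Build the whisper CLI invocation for one audio file."""
    return ["whisper", str(audio_path), "--model", model,
            "--output_dir", str(out_dir), "--output_format", fmt]

for audio in sorted(Path("/data/audio").glob("*.mp3")):
    print("Transcribing:", audio)
    subprocess.run(whisper_cmd(audio, "/data/transcripts"), check=True)
```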

Using Faster-Whisper for Better Performance

The faster-whisper library uses CTranslate2 for significantly faster inference with lower memory usage:

pip install faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("medium", compute_type="int8")  # int8 quantization cuts RAM use
segments, info = model.transcribe("audio.mp3")
for segment in segments:  # segments is a lazy generator; transcription runs as you iterate
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

The faster-whisper project reports inference up to about four times faster than the reference implementation at the same accuracy, with int8 quantization reducing memory use further, making it a good fit for Breeze instances without a GPU.
