
Whisper.cpp Speech-to-Text Server

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

What is Whisper.cpp?

Whisper.cpp is a C/C++ port of OpenAI Whisper that runs efficiently on CPUs without requiring a GPU. It provides accurate speech-to-text transcription in multiple languages, running entirely on your server for complete privacy.

Installation

git clone https://github.com/ggerganov/whisper.cpp.git /opt/whisper.cpp
cd /opt/whisper.cpp
make -j$(nproc)

# Download a model
bash ./models/download-ggml-model.sh base.en
# Models: tiny, base, small, medium, large-v3
# Add .en suffix for English-only (faster)

Basic Transcription

# Convert audio to 16 kHz mono 16-bit PCM WAV (the format whisper.cpp expects)
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# Transcribe
./main -m models/ggml-base.en.bin -f output.wav

# Write transcript files: plain text (-otxt) plus SRT and VTT subtitles with timestamps
./main -m models/ggml-base.en.bin -f output.wav -otxt -osrt -ovtt
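To transcribe many files, the conversion and transcription steps above can be scripted. The sketch below is a minimal Python wrapper around the same two commands; the `./main` and `models/ggml-base.en.bin` paths are assumptions from this article and should be adjusted to your install.

```python
import subprocess
from pathlib import Path

MODEL = "models/ggml-base.en.bin"  # assumption: the model downloaded above

def ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg command that converts audio to the 16 kHz mono
    16-bit PCM WAV format whisper.cpp expects."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(dst)]

def whisper_cmd(wav: Path) -> list[str]:
    """Build the whisper.cpp invocation that writes a plain-text transcript."""
    return ["./main", "-m", MODEL, "-f", str(wav), "-otxt"]

def transcribe_dir(audio_dir: str = "audio") -> None:
    """Convert and transcribe every MP3 in audio_dir (hypothetical layout)."""
    for src in sorted(Path(audio_dir).glob("*.mp3")):
        wav = src.with_suffix(".wav")
        subprocess.run(ffmpeg_cmd(src, wav), check=True)
        subprocess.run(whisper_cmd(wav), check=True)  # writes <wav>.txt

if __name__ == "__main__":
    transcribe_dir()
```

Run it from the whisper.cpp directory after building; each input produces a `.txt` transcript next to its converted WAV.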

HTTP Server

# Build the server
make server

# Run HTTP API server
./server -m models/ggml-base.en.bin --host 0.0.0.0 --port 8080

# API usage
curl http://localhost:8080/inference \
    -F "file=@audio.wav" \
    -F "response_format=json"
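The same request can be made from application code. The sketch below builds the multipart upload with only the Python standard library and posts it to the server started above; the endpoint URL and the `response_format`/`file` field names mirror the curl example, and the `text` key in the JSON response is an assumption to verify against your server version.

```python
import json
import urllib.request
import uuid

SERVER = "http://localhost:8080/inference"  # the server started above

def build_multipart(fields: dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode plain form fields plus one file as multipart/form-data.
    Returns (body, content_type_header_value)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode())
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
        + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path: str) -> dict:
    """POST a WAV file to the whisper.cpp server and return the parsed JSON."""
    with open(path, "rb") as f:
        body, ctype = build_multipart({"response_format": "json"},
                                      "file", path, f.read())
    req = urllib.request.Request(SERVER, data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(transcribe("output.wav").get("text", ""))
```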

Performance Tuning

# Use more threads for faster transcription
./main -m models/ggml-base.en.bin -f audio.wav -t 8

# For GPU acceleration (if CUDA available)
make clean
WHISPER_CUDA=1 make -j

# Model selection guide:
# tiny: fastest, least accurate (~1GB RAM)
# base: good balance (~1.5GB RAM)
# small: better accuracy (~2.5GB RAM)
# medium: high accuracy (~5GB RAM)
# large-v3: best accuracy (~10GB RAM)
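The selection guide above can be encoded as a small helper that picks the largest model fitting a RAM budget, using the rough figures listed there. This is a sketch; the model names match `download-ggml-model.sh`, and note that large-v3 has no `.en` variant.

```python
# (name, rough RAM need in GB per the guide above), smallest to largest
MODELS = [
    ("tiny", 1.0),
    ("base", 1.5),
    ("small", 2.5),
    ("medium", 5.0),
    ("large-v3", 10.0),
]

def pick_model(ram_gb: float, english_only: bool = False) -> str:
    """Return the most accurate model whose rough RAM need fits the budget;
    falls back to tiny if nothing fits."""
    chosen = MODELS[0][0]
    for name, need in MODELS:
        if need <= ram_gb:
            chosen = name
    if english_only and chosen != "large-v3":  # no English-only large-v3
        chosen += ".en"
    return chosen
```

For example, a 4 GB budget selects `small`, and a 2 GB English-only budget selects `base.en`.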

Use Cases

  • Meeting transcription and note-taking
  • Podcast and video subtitling
  • Voice-to-text input for applications
  • Call center transcription
  • Accessibility services
