How to Deploy a Machine Learning Model with FastAPI
FastAPI is a modern, high-performance Python web framework well suited to serving machine learning models as REST APIs. Its async support and automatic OpenAPI documentation make it a popular choice for ML engineers moving models into production on a Breeze instance.
Prerequisites
- A Breeze instance with at least 2 GB of RAM
- Python 3.9 or later installed
- A trained ML model saved as a pickle, joblib, or ONNX file
Installing FastAPI and Uvicorn
Create a project directory and virtual environment, then install the dependencies:
mkdir ~/ml-api && cd ~/ml-api
python3 -m venv venv
source venv/bin/activate
pip install fastapi uvicorn scikit-learn joblib pydantic
Training and Saving a Sample Model
If you do not already have a trained model, here is a quick example using scikit-learn:
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
joblib.dump(model, 'model.joblib')
Building the FastAPI Application
Create main.py with the following content:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
app = FastAPI(title="ML Prediction API")
model = joblib.load("model.joblib")
class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: list[float]

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(features)[0])
    probability = model.predict_proba(features)[0].tolist()
    return PredictionResponse(prediction=prediction, probability=probability)

@app.get("/health")
async def health():
    return {"status": "healthy"}
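Pydantic validates the types of the incoming fields, but it does not check that the feature vector has the length the model expects. A small guard can reject malformed input before it reaches the model (a sketch; EXPECTED_FEATURES and validate_features are illustrative names, and 4 matches the iris example above):

```python
EXPECTED_FEATURES = 4  # the iris model above was trained on 4 features

def validate_features(features: list[float]) -> None:
    # Fail fast with a clear message instead of a cryptic numpy/sklearn error
    if len(features) != EXPECTED_FEATURES:
        raise ValueError(
            f"expected {EXPECTED_FEATURES} features, got {len(features)}"
        )
```

Inside the endpoint you could call this helper and translate the ValueError into an HTTPException with status 422, so clients get a meaningful error instead of a 500.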
Running the API Server
Start the server with Uvicorn:
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Visit http://your-breeze-ip:8000/docs to see the auto-generated Swagger documentation and test your prediction endpoint interactively.
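You can also exercise the endpoint from a script using only Python's standard library. This sketch assumes the server above is reachable on localhost:8000; the sample feature values are arbitrary iris-like measurements:

```python
import json
import urllib.request

# Four features, matching what the iris model expects (illustrative values)
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

req = urllib.request.Request(
    "http://localhost:8000/predict",           # adjust host/port to your server
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        result = json.load(resp)
        print(result)
except OSError as exc:
    # Server not running or unreachable
    print(f"Request failed: {exc}")
```

The response body should deserialize into the PredictionResponse shape, with a prediction integer and a list of class probabilities.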
Creating a Systemd Service
For production deployment, create a systemd service at /etc/systemd/system/ml-api.service:
[Unit]
Description=ML Prediction API
After=network.target
[Service]
User=deploy
WorkingDirectory=/home/deploy/ml-api
ExecStart=/home/deploy/ml-api/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
Environment=PYTHONUNBUFFERED=1
[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit file, then enable and start the service with sudo systemctl daemon-reload followed by sudo systemctl enable --now ml-api.
Adding Authentication
Protect your endpoint with API key authentication using FastAPI’s dependency injection:
from fastapi import Depends, HTTPException, Header
async def verify_api_key(x_api_key: str = Header(...)):
    # Replace the hardcoded placeholder with a key loaded from an
    # environment variable or secrets store in production
    if x_api_key != "your-secret-key":
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/predict", dependencies=[Depends(verify_api_key)])
async def predict(request: PredictionRequest):
    ...
This ensures only authorized clients can call the ML inference endpoint on your Breeze instance.
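A client then supplies the key in the X-API-Key header. This sketch again uses only the standard library; "your-secret-key" is the placeholder from the snippet above and must match whatever key the server checks:

```python
import json
import urllib.request

payload = {"features": [5.1, 3.5, 1.4, 0.2]}

req = urllib.request.Request(
    "http://localhost:8000/predict",           # adjust to your Breeze IP
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "your-secret-key",        # must match the server-side key
    },
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.load(resp))
except OSError as exc:
    # Connection refused, timeout, or an HTTP 401 from a wrong key
    print(f"Request failed: {exc}")
```

A request without the header, or with the wrong key, receives a 401 response from the verify_api_key dependency.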