How to Deploy a Machine Learning Model with FastAPI
FastAPI is a modern, high-performance Python web framework well suited to serving machine learning models as REST APIs. Its async support and automatic OpenAPI documentation make it a popular choice for ML engineers moving models into production on a Breeze instance.
Prerequisites
- A Breeze instance with at least 2 GB of RAM
- Python 3.9 or later installed
- A trained ML model saved as a pickle, joblib, or ONNX file
Installing FastAPI and Uvicorn
Create a project directory and virtual environment, then install the dependencies:
mkdir ~/ml-api && cd ~/ml-api
python3 -m venv venv
source venv/bin/activate
pip install fastapi uvicorn scikit-learn joblib pydantic
Training and Saving a Sample Model
If you do not already have a trained model, here is a quick example using scikit-learn:
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
joblib.dump(model, 'model.joblib')
Building the FastAPI Application
Create main.py with the following content:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
app = FastAPI(title="ML Prediction API")
model = joblib.load("model.joblib")
class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: list[float]

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(features)[0])
    probability = model.predict_proba(features)[0].tolist()
    return PredictionResponse(prediction=prediction, probability=probability)

@app.get("/health")
async def health():
    return {"status": "healthy"}
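Pydantic validates the types of the incoming fields, but it does not check that the feature vector has the length the model expects. A small guard can reject malformed input before it reaches the model (a sketch; EXPECTED_FEATURES and validate_features are illustrative names, and 4 matches the iris example above):

```python
EXPECTED_FEATURES = 4  # the iris model above was trained on 4 features

def validate_features(features: list[float]) -> None:
    # Fail fast with a clear message instead of a cryptic numpy/sklearn error
    if len(features) != EXPECTED_FEATURES:
        raise ValueError(
            f"expected {EXPECTED_FEATURES} features, got {len(features)}"
        )
```

Inside the endpoint you could call this helper and translate the ValueError into an HTTPException with status 422, so clients get a meaningful error instead of a 500.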
Running the API Server
Start the server with Uvicorn:
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Visit http://your-breeze-ip:8000/docs to see the auto-generated Swagger documentation and test your prediction endpoint interactively.
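You can also exercise the endpoint from a script using only Python's standard library. This sketch assumes the server above is reachable on localhost:8000; the sample feature values are arbitrary iris-like measurements:

```python
import json
import urllib.request

# Four features, matching what the iris model expects (illustrative values)
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

req = urllib.request.Request(
    "http://localhost:8000/predict",           # adjust host/port to your server
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        result = json.load(resp)
        print(result)
except OSError as exc:
    # Server not running or unreachable
    print(f"Request failed: {exc}")
```

The response body should deserialize into the PredictionResponse shape, with a prediction integer and a list of class probabilities.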
Creating a Systemd Service
For production deployment, create a systemd service at /etc/systemd/system/ml-api.service:
[Unit]
Description=ML Prediction API
After=network.target
[Service]
User=deploy
WorkingDirectory=/home/deploy/ml-api
ExecStart=/home/deploy/ml-api/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
Environment=PYTHONUNBUFFERED=1
[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit file, then enable and start the service with sudo systemctl daemon-reload followed by sudo systemctl enable --now ml-api.
Adding Authentication
Protect your endpoint with API key authentication using FastAPI’s dependency injection:
from fastapi import Depends, HTTPException, Header
async def verify_api_key(x_api_key: str = Header(...)):
    # Replace the hardcoded placeholder with a key loaded from an
    # environment variable or secrets store in production
    if x_api_key != "your-secret-key":
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/predict", dependencies=[Depends(verify_api_key)])
async def predict(request: PredictionRequest):
    ...
This ensures only authorized clients can call the ML inference endpoint on your Breeze instance.
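A client then supplies the key in the X-API-Key header. This sketch again uses only the standard library; "your-secret-key" is the placeholder from the snippet above and must match whatever key the server checks:

```python
import json
import urllib.request

payload = {"features": [5.1, 3.5, 1.4, 0.2]}

req = urllib.request.Request(
    "http://localhost:8000/predict",           # adjust to your Breeze IP
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "your-secret-key",        # must match the server-side key
    },
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.load(resp))
except OSError as exc:
    # Connection refused, timeout, or an HTTP 401 from a wrong key
    print(f"Request failed: {exc}")
```

A request without the header, or with the wrong key, receives a 401 response from the verify_api_key dependency.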