How to Deploy LocalAI for API-Compatible LLM Hosting
LocalAI is an OpenAI API-compatible server that runs LLMs, image generation, and audio models locally on your Breeze, making it a drop-in replacement for cloud AI APIs.
Requirements
- A Breeze with at least 8 GB RAM
- Docker installed
Quick Start with Docker
docker run -d -p 8080:8080 \
-v localai-models:/build/models \
--name localai \
--restart always \
localai/localai:latest-cpu
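If you prefer Docker Compose for long-lived services, the same container can be declared as a service. This sketch mirrors the flags from the docker run command above and assumes nothing beyond them:

```yaml
services:
  localai:
    image: localai/localai:latest-cpu
    container_name: localai
    restart: always
    ports:
      - "8080:8080"
    volumes:
      - localai-models:/build/models

volumes:
  localai-models:
```

Bring it up with docker compose up -d; the named volume keeps downloaded models across container upgrades.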
Install a Model
Use the built-in model gallery:
curl http://localhost:8080/models/apply -H "Content-Type: application/json" \
-d '{"id": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"}'
Use the API
LocalAI is compatible with OpenAI client libraries:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello"}]}'
Integration
Point any application that supports OpenAI APIs to http://your-breeze-ip:8080 as the base URL. No API key is required by default; add authentication (for example, via a reverse proxy) before exposing the server beyond your local network.
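Recent official OpenAI client libraries read their endpoint and key from environment variables, so many existing tools can be redirected without code changes. A sketch, where your-breeze-ip is a placeholder for the Breeze's LAN address and the key value is arbitrary:

```shell
# Redirect OpenAI SDK clients to the local server.
# "your-breeze-ip" is a placeholder; note the /v1 path suffix.
export OPENAI_BASE_URL="http://your-breeze-ip:8080/v1"
# LocalAI ignores the key by default, but most clients insist one is set.
export OPENAI_API_KEY="sk-local"
```

Tools that do not honor these variables usually accept an equivalent base-URL setting in their own configuration.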
Resource Monitoring
Check container resource usage with docker stats localai. If memory is constrained, switch to a more aggressive quantization level (for example, a Q4 variant instead of Q8) or a smaller model.
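To decide whether a model will fit before downloading it, a common rule of thumb is parameters times bytes per weight: Q4 quantization stores roughly half a byte per weight. The ~20% overhead factor below, covering KV cache and runtime buffers, is an assumption for illustration, not a measured figure:

```shell
# Rough memory estimate for a 7B-parameter model at Q4 quantization.
params_billions=7
bytes_per_weight=0.5   # ~4 bits per weight
# 1.2 = assumed ~20% overhead for KV cache and runtime buffers
python3 -c "print(round($params_billions * $bytes_per_weight * 1.2, 1), 'GB approx')"
```

By this estimate a Q4 7B model lands around 4 GB, which is why 8 GB is a workable floor while Q8 variants of the same model may not fit.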