How to Build an AI Chatbot with Open-Source Models
Building your own AI chatbot with open-source models gives you full control over the model, data privacy, and customization. Running the entire stack on your Breeze means no API costs, no rate limits, and complete ownership of your conversational AI.
Prerequisites
- A Breeze instance with at least 8 GB of RAM (16 GB recommended)
- Python 3.10 or later
- Basic knowledge of Python web development
Choosing an Open-Source LLM
Several excellent open-source models are suitable for chatbot applications:
- Llama 3 8B — strong general-purpose model, good balance of quality and speed
- Mistral 7B Instruct — excellent instruction following at a compact size
- Phi-3 — Microsoft’s compact model, efficient on CPU
- Gemma 2 — lightweight model from Google with good conversational ability
Setting Up Ollama as the LLM Backend
Ollama is one of the simplest ways to run LLMs locally:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama pull mistral
Verify the installation:
ollama run llama3 "Hello, how are you?"
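If you want your application code to fail fast when Ollama is not running, you can probe its HTTP API programmatically. Ollama's `/api/tags` endpoint lists installed models; the helper below is a stdlib-only sketch (the function name is my own, not part of any Ollama client library):

```python
import json
import urllib.request

def ollama_available(host: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server responds at `host`."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=2) as resp:
            # A healthy server answers with {"models": [...]}
            return "models" in json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response
        return False
```

Call `ollama_available()` at startup and surface a clear error instead of letting the first chat request time out.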
Building the Chat Backend
Create a FastAPI backend that manages conversation history and streams responses:
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import httpx
import json

app = FastAPI()

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: list[ChatMessage]
    model: str = "llama3"

SYSTEM_PROMPT = "You are a helpful, friendly assistant. Be concise and accurate."

@app.post("/chat")
async def chat(request: ChatRequest):
    # Prepend the system prompt to the client's conversation history
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend([m.model_dump() for m in request.messages])

    async def generate():
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "http://localhost:11434/api/chat",
                json={"model": request.model, "messages": messages, "stream": True},
                timeout=120.0,
            ) as response:
                # Ollama streams one JSON object per line
                async for line in response.aiter_lines():
                    if not line:
                        continue
                    data = json.loads(line)
                    if "message" in data:
                        # Re-emit each chunk as a server-sent event
                        yield f"data: {json.dumps(data['message'])}\n\n"
                    if data.get("done"):
                        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
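The message-assembly step above is worth isolating into a pure function so it can be unit-tested without a running model. A sketch (the function name `build_ollama_payload` is my own, not part of the code above):

```python
SYSTEM_PROMPT = "You are a helpful, friendly assistant. Be concise and accurate."

def build_ollama_payload(history: list[dict], model: str = "llama3") -> dict:
    """Build the JSON body sent to Ollama's /api/chat endpoint."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)
    return {"model": model, "messages": messages, "stream": True}
```

Keeping this logic separate also makes it easy to swap the system prompt per user or per deployment later.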
Building the Chat Frontend
Create a simple HTML/JavaScript chat interface:
<!DOCTYPE html>
<html>
<head>
  <title>AI Chatbot</title>
  <style>
    #chat { max-width: 800px; margin: 0 auto; padding: 20px; }
    .message { margin: 10px 0; padding: 12px; border-radius: 8px; }
    .user { background: #e3f2fd; text-align: right; }
    .assistant { background: #f5f5f5; }
    #input { width: 80%; padding: 10px; }
    #send { padding: 10px 20px; }
  </style>
</head>
<body>
  <div id="chat">
    <div id="messages"></div>
    <input id="input" placeholder="Type your message...">
    <button id="send">Send</button>
  </div>
  <script>
    const messages = [];
    const decoder = new TextDecoder();

    async function sendMessage() {
      const input = document.getElementById('input');
      const text = input.value.trim();
      if (!text) return;
      messages.push({role: 'user', content: text});
      appendMessage('user', text);
      input.value = '';
      const res = await fetch('/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({messages: messages})
      });
      const reader = res.body.getReader();
      let assistantText = '';
      const msgEl = appendMessage('assistant', '');
      while (true) {
        const {done, value} = await reader.read();
        if (done) break;
        // Simplification: assumes each chunk holds whole "data: ..." lines;
        // a production client should buffer partial lines across chunks.
        const chunk = decoder.decode(value, {stream: true});
        for (const line of chunk.split('\n')) {
          if (line.startsWith('data: ') && !line.includes('[DONE]')) {
            const data = JSON.parse(line.slice(6));
            assistantText += data.content || '';
            msgEl.textContent = assistantText;
          }
        }
      }
      messages.push({role: 'assistant', content: assistantText});
    }

    function appendMessage(role, text) {
      const el = document.createElement('div');
      el.className = 'message ' + role;
      el.textContent = text;
      document.getElementById('messages').appendChild(el);
      return el;
    }

    // Send on button click
    document.getElementById('send').addEventListener('click', sendMessage);
  </script>
</body>
</html>
Adding Conversation Memory
For persistent conversations, store chat histories in a database:
import sqlite3
import json

def save_conversation(conv_id, messages):
    with sqlite3.connect("chatbot.db") as conn:
        # id must be a PRIMARY KEY for INSERT OR REPLACE to deduplicate rows
        conn.execute(
            "CREATE TABLE IF NOT EXISTS conversations "
            "(id TEXT PRIMARY KEY, messages TEXT, "
            "updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO conversations (id, messages) VALUES (?, ?)",
            (conv_id, json.dumps(messages)),
        )

def load_conversation(conv_id):
    with sqlite3.connect("chatbot.db") as conn:
        row = conn.execute(
            "SELECT messages FROM conversations WHERE id = ?", (conv_id,)
        ).fetchone()
    return json.loads(row[0]) if row else []
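Stored histories grow without bound, but the model's context window does not. Before sending a long conversation back to the model, trim it; a minimal sketch that keeps only the most recent messages (the default window of 20 is an arbitrary assumption, tune it to your model's context size):

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the last `max_messages` turns of a conversation."""
    if len(messages) <= max_messages:
        return messages
    # Drop the oldest messages; the system prompt is added separately
    return messages[-max_messages:]
```

A token-based budget (counting tokens rather than messages) is more precise, but message-count trimming is a reasonable starting point.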
Adding Guardrails
Implement basic content filtering and safety measures. A keyword list is only a coarse first line of defense, but it illustrates the pattern:
BLOCKED_TOPICS = ["harmful", "illegal", "dangerous"]

def check_input(text: str) -> bool:
    # Returns False when the input should be blocked
    text_lower = text.lower()
    return not any(topic in text_lower for topic in BLOCKED_TOPICS)

@app.post("/chat")
async def chat(request: ChatRequest):
    last_message = request.messages[-1].content
    if not check_input(last_message):
        return {"error": "This topic is not supported."}
    # ... proceed with normal chat flow
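Plain substring matching can flag innocent words that merely contain a blocked term. A slightly more careful variant matches whole words only (still a naive sketch; production systems typically add a dedicated moderation model on top):

```python
import re

BLOCKED_TOPICS = ["harmful", "illegal", "dangerous"]

# One compiled pattern that matches any blocked term as a whole word
BLOCKED_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in BLOCKED_TOPICS) + r")\b",
    re.IGNORECASE,
)

def check_input(text: str) -> bool:
    """Return True when the input is allowed."""
    return BLOCKED_RE.search(text) is None
```

`re.escape` keeps the pattern safe if a blocked term ever contains regex metacharacters, and `\b` anchors prevent matches inside longer words.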
Deploying to Production
Create a systemd service for the chat backend:
[Unit]
Description=AI Chatbot Backend
After=network.target ollama.service

[Service]
User=deploy
WorkingDirectory=/home/deploy/chatbot
ExecStart=/home/deploy/chatbot/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target
Place Nginx in front with SSL termination, and your Breeze is serving a fully private AI chatbot that you control entirely.
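One Nginx detail matters specifically for this app: response buffering will hold back the token stream, so disable it where the backend is proxied. A minimal server block (the domain and certificate paths are placeholders to adapt):

```nginx
server {
    listen 443 ssl;
    server_name chat.example.com;

    ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Required for server-sent events: flush tokens as they arrive
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```

Without `proxy_buffering off;`, Nginx may deliver the assistant's reply in one burst at the end instead of streaming it token by token.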