How to Build an AI Chatbot with Open-Source Models

By Admin · Mar 2, 2026 · Updated Apr 25, 2026 · 5 min read

Building your own AI chatbot with open-source models gives you full control over the model, data privacy, and customization. Running the entire stack on your Breeze means no API costs, no rate limits, and complete ownership of your conversational AI.

Prerequisites

  • A Breeze instance with at least 8 GB of RAM (16 GB recommended)
  • Python 3.10 or later
  • Basic knowledge of Python web development

Choosing an Open-Source LLM

Several excellent open-source models are suitable for chatbot applications:

  • Llama 3 8B — strong general-purpose model, good balance of quality and speed
  • Mistral 7B Instruct — excellent instruction following at a compact size
  • Phi-3 — Microsoft’s compact model, efficient on CPU
  • Gemma 2 — lightweight model from Google with good conversational ability

Setting Up Ollama as the LLM Backend

Ollama provides the simplest way to run LLMs locally:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama pull mistral

Verify the installation:

ollama run llama3 "Hello, how are you?"

Building the Chat Backend

Create a FastAPI backend that manages conversation history and streams responses:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import httpx, json

app = FastAPI()

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: list[ChatMessage]
    model: str = "llama3"

SYSTEM_PROMPT = "You are a helpful, friendly assistant. Be concise and accurate."

@app.post("/chat")
async def chat(request: ChatRequest):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend([m.model_dump() for m in request.messages])  # Pydantic v2; use m.dict() on v1

    async def generate():
        async with httpx.AsyncClient() as client:
            async with client.stream("POST", "http://localhost:11434/api/chat",
                json={"model": request.model, "messages": messages, "stream": True},
                timeout=120.0
            ) as response:
                async for line in response.aiter_lines():
                    if line:
                        data = json.loads(line)
                        if "message" in data:
                            yield f"data: {json.dumps(data['message'])}\n\n"
                        if data.get("done"):
                            yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
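Each event the endpoint emits is a `data: ` line carrying a JSON message chunk, terminated by `data: [DONE]`. As a sketch of how a client (or a test) would decode that framing, here is a small parser; `parse_sse_events` is illustrative and not part of the backend above:

```python
import json

def parse_sse_events(raw: str) -> list[str]:
    """Extract assistant-content fragments from a raw SSE stream.

    Assumes the framing used by the /chat endpoint above: each event is a
    'data: <json>' line, and the stream ends with 'data: [DONE]'.
    """
    fragments = []
    for line in raw.split("\n"):
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        message = json.loads(payload)
        fragments.append(message.get("content", ""))
    return fragments

raw = (
    'data: {"role": "assistant", "content": "Hel"}\n\n'
    'data: {"role": "assistant", "content": "lo!"}\n\n'
    "data: [DONE]\n\n"
)
print("".join(parse_sse_events(raw)))  # prints: Hello!
```

A real client should also handle events split across network chunks, which this sketch (and the frontend below) glosses over for brevity.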

Building the Chat Frontend

Create a simple HTML/JavaScript chat interface:

<!DOCTYPE html>
<html>
<head>
    <title>AI Chatbot</title>
    <style>
        #chat { max-width: 800px; margin: 0 auto; padding: 20px; }
        .message { margin: 10px 0; padding: 12px; border-radius: 8px; }
        .user { background: #e3f2fd; text-align: right; }
        .assistant { background: #f5f5f5; }
        #input { width: 80%; padding: 10px; }
        #send { padding: 10px 20px; }
    </style>
</head>
<body>
<div id="chat">
    <div id="messages"></div>
    <input id="input" placeholder="Type your message...">
    <button id="send">Send</button>
</div>
<script>
const messages = [];
async function sendMessage() {
    const input = document.getElementById('input');
    const text = input.value.trim();
    if (!text) return;
    messages.push({role: 'user', content: text});
    appendMessage('user', text);
    input.value = '';
    const res = await fetch('/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({messages: messages})
    });
    const reader = res.body.getReader();
    let assistantText = '';
    const msgEl = appendMessage('assistant', '');
    while (true) {
        const {done, value} = await reader.read();
        if (done) break;
        const chunk = new TextDecoder().decode(value);
        for (const line of chunk.split('\n')) {
            if (line.startsWith('data: ') && !line.includes('[DONE]')) {
                const data = JSON.parse(line.slice(6));
                assistantText += data.content || '';
                msgEl.textContent = assistantText;
            }
        }
    }
    messages.push({role: 'assistant', content: assistantText});
}
function appendMessage(role, text) {
    const el = document.createElement('div');
    el.className = 'message ' + role;
    el.textContent = text;
    document.getElementById('messages').appendChild(el);
    return el;
}
document.getElementById('send').addEventListener('click', sendMessage);
document.getElementById('input').addEventListener('keydown', (e) => {
    if (e.key === 'Enter') sendMessage();
});
</script>
</body>
</html>

Adding Conversation Memory

For persistent conversations, store chat histories in a database:

import sqlite3, uuid, json  # use str(uuid.uuid4()) to generate conversation ids

def save_conversation(conv_id, messages):
    with sqlite3.connect("chatbot.db") as conn:
        # PRIMARY KEY on id is required for INSERT OR REPLACE to actually replace
        conn.execute("CREATE TABLE IF NOT EXISTS conversations (id TEXT PRIMARY KEY, messages TEXT, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)")
        conn.execute("INSERT OR REPLACE INTO conversations (id, messages) VALUES (?, ?)",
                     (conv_id, json.dumps(messages)))

def load_conversation(conv_id):
    with sqlite3.connect("chatbot.db") as conn:
        row = conn.execute("SELECT messages FROM conversations WHERE id = ?", (conv_id,)).fetchone()
    return json.loads(row[0]) if row else []
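Stored histories grow without bound, while the model's context window does not. Before sending a long history back to the model, trim it. The sketch below assumes the first stored message is the system prompt and uses a character budget as a rough proxy for tokens; the budget value and the `trim_history` helper are illustrative, not Ollama constants:

```python
def trim_history(messages: list[dict], char_budget: int = 12000) -> list[dict]:
    """Keep the most recent messages that fit within a character budget.

    The first message (assumed to be the system prompt) is always kept;
    older turns are dropped from the front until the rest fits.
    """
    system, rest = messages[:1], messages[1:]
    total = sum(len(m["content"]) for m in rest)
    while rest and total > char_budget:
        dropped = rest.pop(0)
        total -= len(dropped["content"])
    return system + rest
```

Call this on the loaded history before forwarding it to Ollama; a token-aware trimmer (or a rolling summary) is more precise but needs a tokenizer for the specific model.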

Adding Guardrails

Implement basic keyword filtering as a first line of defense. Substring matching is crude (it misses paraphrases and can flag innocent text), but it illustrates the pattern:

BLOCKED_TOPICS = ["harmful", "illegal", "dangerous"]

def check_input(text: str) -> bool:
    text_lower = text.lower()
    return not any(topic in text_lower for topic in BLOCKED_TOPICS)

from fastapi import HTTPException

@app.post("/chat")
async def chat(request: ChatRequest):
    # Shown as a standalone snippet; in practice, add this check at the
    # top of the existing /chat handler rather than registering a second route.
    last_message = request.messages[-1].content
    if not check_input(last_message):
        raise HTTPException(status_code=400, detail="This topic is not supported.")
    # ... proceed with normal chat flow
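The substring check above flags any message containing a blocked term, even inside a longer, innocent word. A word-boundary regex variant is slightly more precise, though still trivially bypassed; the word list here is illustrative only:

```python
import re

# Compile one word-boundary pattern per blocked term (illustrative list)
BLOCKED_PATTERNS = [
    re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for word in ("harmful", "illegal", "dangerous")
]

def check_input_strict(text: str) -> bool:
    """Return True when the text contains none of the blocked words."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)

print(check_input_strict("Tell me a joke"))          # True
print(check_input_strict("Something ILLEGAL here"))  # False
```

For anything beyond a demo, consider a dedicated moderation model or service rather than keyword lists.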

Deploying to Production

Create a systemd service for the chat backend:

[Unit]
Description=AI Chatbot Backend
After=network.target ollama.service

[Service]
User=deploy
WorkingDirectory=/home/deploy/chatbot
ExecStart=/home/deploy/chatbot/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target

Place Nginx in front with SSL termination, and your Breeze is serving a fully private AI chatbot that you control entirely.
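Note that Nginx buffers proxied responses by default, which would stall the SSE stream. A minimal location block for the chat backend (server names and paths are placeholders) might look like:

```nginx
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_buffering off;       # flush SSE chunks to the client immediately
    proxy_read_timeout 300s;   # allow long generations to finish
}
```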