Running a private AI coding assistant on your VPS gives you the benefits of tools like GitHub Copilot without sending your proprietary code to third-party servers. This guide covers setting up Continue.dev, Tabby, or similar open-source coding assistants backed by locally hosted LLMs on your Kazepute Breeze VPS.
Why Self-Host a Coding Assistant?
Third-party AI coding tools process your code on external servers, raising concerns about intellectual property, compliance (SOC 2, HIPAA), and costs. A self-hosted assistant keeps everything on your infrastructure while providing real-time code completion, documentation generation, and refactoring suggestions.
Hardware Requirements
For a responsive coding assistant, you need adequate resources:
- Minimum: 4 vCPU, 16GB RAM — runs 7B parameter models (CodeLlama-7B, DeepSeek Coder 6.7B)
- Recommended: 8 vCPU, 32GB RAM — runs 13B–34B models with better quality
- Optimal: GPU-equipped instance — fastest inference for larger models
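Before choosing a model, it helps to confirm what your instance actually has. As a rough rule of thumb, a quantized 7B model needs several gigabytes of free RAM on top of normal system usage, so check both CPU count and memory. A minimal check:

```shell
# Number of vCPUs available to the system
nproc

# Total and available memory in gigabytes
free -g | awk '/^Mem:/ {print $2 " GB total, " $7 " GB available"}'
```

If `nproc` reports fewer than 4 cores or `free -g` shows less than 16 GB total, stick to the smaller 7B-class models from the list above.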
Option 1: Tabby — Self-Hosted AI Coding Assistant
Tabby is purpose-built for code completion and works with VS Code, JetBrains IDEs, and Vim/Neovim.
Install Tabby with Docker
# Create directory for Tabby data
mkdir -p /opt/tabby/data
# Run Tabby with CPU inference
docker run -d \
--name tabby \
-p 8080:8080 \
-v /opt/tabby/data:/data \
--restart unless-stopped \
tabbyml/tabby:latest \
serve --model TabbyML/DeepSeek-Coder-6.7B \
--device cpu
# For GPU-equipped servers (CUDA)
docker run -d \
--name tabby \
--gpus all \
-p 8080:8080 \
-v /opt/tabby/data:/data \
--restart unless-stopped \
tabbyml/tabby:latest \
serve --model TabbyML/CodeLlama-34B \
--device cuda
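After starting either container, verify that Tabby came up cleanly. On the first run the model weights are downloaded into `/opt/tabby/data`, which can take several minutes, so watch the logs before testing the endpoint. A quick sanity check (the `/v1/health` path is Tabby's health endpoint at the time of writing; consult your version's API docs if it differs):

```shell
# Follow startup logs; wait for the model download and load to finish
docker logs -f tabby

# Once the server is listening, the health endpoint should return JSON
curl -s http://localhost:8080/v1/health
```

Note that the GPU variant requires the NVIDIA Container Toolkit to be installed on the host for `--gpus all` to work.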
Configure Tabby for Your Repository
Tabby can index your codebase for context-aware completions:
# Create Tabby config
cat > /opt/tabby/data/config.toml