Running a private AI coding assistant on your VPS gives you the benefits of tools like GitHub Copilot without sending your proprietary code to third-party servers. This guide covers setting up Continue.dev, Tabby, or similar open-source coding assistants backed by locally hosted LLMs on your Kazepute Breeze VPS.
Why Self-Host a Coding Assistant?
Third-party AI coding tools process your code on external servers, raising concerns about intellectual property, compliance (SOC 2, HIPAA), and costs. A self-hosted assistant keeps everything on your infrastructure while providing real-time code completion, documentation generation, and refactoring suggestions.
Hardware Requirements
For a responsive coding assistant, you need adequate resources:
- Minimum: 4 vCPU, 16GB RAM — runs 7B parameter models (CodeLlama-7B, DeepSeek Coder 6.7B)
- Recommended: 8 vCPU, 32GB RAM — runs 13B–34B models with better quality
- Optimal: GPU-equipped instance — fastest inference for larger models
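Before choosing a model, it helps to confirm what your instance actually has. As a rough rule of thumb, a quantized 7B model needs several gigabytes of free RAM on top of normal system usage, so check both CPU count and memory. A minimal check:

```shell
# Number of vCPUs available to the system
nproc

# Total and available memory in gigabytes
free -g | awk '/^Mem:/ {print $2 " GB total, " $7 " GB available"}'
```

If `nproc` reports fewer than 4 cores or `free -g` shows less than 16 GB total, stick to the smaller 7B-class models from the list above.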
Option 1: Tabby — Self-Hosted AI Coding Assistant
Tabby is purpose-built for code completion and works with VS Code, JetBrains IDEs, and Vim/Neovim.
Install Tabby with Docker
# Create directory for Tabby data
mkdir -p /opt/tabby/data
# Run Tabby with CPU inference
docker run -d \
--name tabby \
-p 8080:8080 \
-v /opt/tabby/data:/data \
--restart unless-stopped \
tabbyml/tabby:latest \
serve --model TabbyML/DeepSeek-Coder-6.7B \
--device cpu
# For GPU-equipped servers (CUDA)
docker run -d \
--name tabby \
--gpus all \
-p 8080:8080 \
-v /opt/tabby/data:/data \
--restart unless-stopped \
tabbyml/tabby:latest \
serve --model TabbyML/CodeLlama-34B \
--device cuda
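After starting either container, verify that Tabby came up cleanly. On the first run the model weights are downloaded into `/opt/tabby/data`, which can take several minutes, so watch the logs before testing the endpoint. A quick sanity check (the `/v1/health` path is Tabby's health endpoint at the time of writing; consult your version's API docs if it differs):

```shell
# Follow startup logs; wait for the model download and load to finish
docker logs -f tabby

# Once the server is listening, the health endpoint should return JSON
curl -s http://localhost:8080/v1/health
```

Note that the GPU variant requires the NVIDIA Container Toolkit to be installed on the host for `--gpus all` to work.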
Configure Tabby for Your Repository
Tabby can index your codebase for context-aware completions:
# Create Tabby config
cat > /opt/tabby/data/config.toml