OpenAI-Compatible API Gateway with LiteLLM

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

Why a Unified API Gateway?

Different LLM providers have different APIs, rate limits, and pricing. A unified gateway lets your applications use one consistent API format while the gateway handles routing, fallbacks, load balancing, and cost tracking across providers.

Quick Setup

# Quote the extra so shells such as zsh do not expand the brackets
pip install 'litellm[proxy]'

# Start the proxy
litellm --model ollama/llama3 --port 4000

# Or with config file
litellm --config config.yaml --port 4000

Configuration

# config.yaml
model_list:
  # model_name is the alias clients request; litellm_params.model is the provider model behind it
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

general_settings:
  master_key: sk-your-master-key

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 2
  fallbacks:
    - gpt-4: [claude, local]
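
To make the interaction between num_retries and fallbacks concrete, here is a minimal conceptual sketch in plain Python. It is not LiteLLM's router code: the function call_with_fallbacks and the send callback are hypothetical names, and the assumption that every model in the chain gets the same retry count is ours. The real router may behave differently in details such as cooldowns and error classification.

```python
# Conceptual sketch only; names and retry semantics are assumptions.
FALLBACKS = {"gpt-4": ["claude", "local"]}  # mirrors `fallbacks` in config.yaml
NUM_RETRIES = 2                             # mirrors `num_retries` in config.yaml


def call_with_fallbacks(model, send, fallbacks=FALLBACKS, num_retries=NUM_RETRIES):
    """Try `model` (1 attempt + retries), then each fallback in order.

    `send` is a hypothetical callable that takes a model alias and either
    returns a response or raises an exception.
    Returns a (model_alias_that_answered, response) tuple.
    """
    last_error = None
    for candidate in [model, *fallbacks.get(model, [])]:
        for _attempt in range(1 + num_retries):
            try:
                return candidate, send(candidate)
            except Exception as exc:  # broad on purpose for the sketch
                last_error = exc
    raise RuntimeError(f"all models failed for {model!r}") from last_error
```

With this config, a request for gpt-4 that keeps failing is attempted three times before the chain moves on to claude, and only then to local.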

Usage

# Use it exactly like the OpenAI API
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-your-master-key"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
# If the gpt-4 deployment (OpenAI) fails, the proxy falls back to claude, then to local (Ollama)

Virtual Keys and Budgets

# Create team-specific API keys with budgets
# (virtual keys require the proxy to be connected to a database)
curl -X POST http://localhost:4000/key/generate \
    -H "Authorization: Bearer sk-your-master-key" \
    -H "Content-Type: application/json" \
    -d '{
        "models": ["gpt-4", "claude"],
        "max_budget": 100.00,
        "budget_duration": "30d",
        "metadata": {"team": "engineering"}
    }'
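
The same key request can be issued from Python. The sketch below uses only the standard library. The helper names build_key_request and generate_key are hypothetical, the "30d" duration format and the "key" field in the response are assumptions about the proxy's API, and the URL and master key are the placeholders used throughout this article.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"   # proxy address used in this article
MASTER_KEY = "sk-your-master-key"     # placeholder master key from config.yaml


def build_key_request(team, models, max_budget, budget_duration="30d"):
    """Build (but do not send) a POST /key/generate request."""
    payload = {
        "models": models,
        "max_budget": max_budget,
        "budget_duration": budget_duration,  # assumed format: e.g. "30d"
        "metadata": {"team": team},
    }
    return urllib.request.Request(
        f"{PROXY_URL}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_key(team, models, max_budget):
    """Send the request; assumes the response JSON carries the new key under "key"."""
    request = build_key_request(team, models, max_budget)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["key"]
```

Calling generate_key("engineering", ["gpt-4", "claude"], 100.00) against a running proxy should return a key that the team can pass as api_key in the client from the Usage section.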

Features

  • 100+ LLM provider support
  • Automatic fallbacks and retries
  • Spend tracking and budget limits
  • Virtual API keys per team/project
  • Request logging and analytics
  • Rate limiting per key
  • Streaming support
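
Streaming goes through the same OpenAI client interface. As a minimal sketch, the hypothetical helper stream_chat below takes an OpenAI-style client (such as the one created in the Usage section) and yields text fragments as they arrive. It assumes the proxy relays chunks in the OpenAI streaming format.

```python
def stream_chat(client, model, prompt):
    """Yield text fragments from a streamed chat completion.

    `client` is any object exposing the OpenAI-style
    client.chat.completions.create(...) method.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    for chunk in stream:
        # Some chunks have no choices or an empty delta (e.g. role-only or final chunk).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

To print the reply as it arrives, iterate over stream_chat(client, "gpt-4", "Hello!") and print each fragment with end="" and flush=True.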