What is LiteLLM?
LiteLLM is a Python SDK and proxy server (LLM gateway) that exposes a unified, OpenAI-compatible API for 100+ LLM providers. The proxy handles routing, load balancing, fallbacks, rate limiting, and cost tracking across providers such as OpenAI, Anthropic, Cohere, Azure OpenAI, AWS Bedrock, and local models served through Ollama.
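Because the proxy speaks the OpenAI wire format, any OpenAI-compatible client can use it simply by changing the base URL. A minimal sketch with the official openai Python package, assuming the proxy described in the sections below is running on localhost:4000 and that sk-your-litellm-key is a valid key for it:

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000/v1",   # LiteLLM proxy address (assumed local)
    api_key="sk-your-litellm-key",         # key issued by / configured on the proxy
)

# "gpt-4o" is the model_name defined in the proxy config, not the raw provider model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)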
Installation
pip install 'litellm[proxy]'   # quoted so shells like zsh don't expand the brackets
# Or run with Docker
docker run -d --name litellm \
  -p 4000:4000 \
  -v /opt/litellm/config.yaml:/app/config.yaml \
  --restart unless-stopped \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
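The pip install also brings in the litellm Python SDK, which can call providers directly without running the proxy at all. A quick sanity check, assuming OPENAI_API_KEY is set in your environment:

import litellm

# Calls OpenAI directly through the SDK (no proxy involved);
# the provider is inferred from the "openai/" prefix.
response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)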
Configuration
# /opt/litellm/config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-your-openai-key
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: sk-ant-your-key
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

litellm_settings:
  drop_params: true
  set_verbose: false

router_settings:
  routing_strategy: least-busy
  num_retries: 3
  fallbacks:
    - gpt-4o: [claude-sonnet, local-llama]
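If you prefer to stay in Python, the model_list and router_settings above map roughly onto the SDK's Router class. A sketch under that assumption, reusing the same placeholder keys and model names as the config:

from litellm import Router

# Mirrors model_list and router_settings from config.yaml (placeholder keys).
router = Router(
    model_list=[
        {"model_name": "gpt-4o",
         "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-your-openai-key"}},
        {"model_name": "claude-sonnet",
         "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514", "api_key": "sk-ant-your-key"}},
        {"model_name": "local-llama",
         "litellm_params": {"model": "ollama/llama3", "api_base": "http://localhost:11434"}},
    ],
    routing_strategy="least-busy",
    num_retries=3,
    fallbacks=[{"gpt-4o": ["claude-sonnet", "local-llama"]}],
)

# Requests target "gpt-4o" first; on failure the router retries, then falls back.
response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)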
Usage
# Use the proxy exactly like the OpenAI API
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# If the OpenAI deployment fails, the router automatically falls back to claude-sonnet, then local-llama
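Streaming works the same way as against the upstream OpenAI API. A sketch in Python, reusing the same proxy address and key as the curl example above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-your-litellm-key")

# stream=True yields chunks in the OpenAI streaming format,
# regardless of which underlying provider actually served the request.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()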
Features
- Unified API for 100+ LLM providers
- Automatic fallbacks and retries
- Load balancing across model deployments
- Rate limiting and spend tracking
- Virtual API keys for team management (see the key-generation sketch after this list)
- Streaming support
- Cost tracking and budgets per key
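Virtual keys and per-key budgets are managed through the proxy's /key/generate endpoint. A rough sketch, assuming a master_key has been configured under general_settings in config.yaml (not shown above); sk-your-master-key is a placeholder for that value:

import requests

# POST /key/generate mints a virtual key scoped to specific models and a budget.
resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-your-master-key"},  # proxy master key (assumed)
    json={
        "models": ["gpt-4o", "claude-sonnet"],  # model_names from config.yaml
        "max_budget": 10.0,                     # spend cap (USD) for this key
        "duration": "30d",                      # key expiry
    },
)
print(resp.json()["key"])  # hand this virtual key out to a team member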