
Scaling MySQL with Vitess: A Practical Guide

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

Vitess is a database clustering system for horizontal scaling of MySQL, originally built at YouTube to handle massive traffic. It provides sharding, connection pooling, and query routing while maintaining MySQL compatibility. This guide covers deploying Vitess for production MySQL scaling.

Why Vitess?

Vitess solves several MySQL scaling challenges:

  • Horizontal sharding — automatically split data across multiple MySQL instances
  • Connection pooling — multiplexes thousands of application connections into a small number of MySQL connections
  • Query protection — prevents poorly-written queries from overwhelming the database
  • Online schema changes — apply DDL without locking tables
  • Topology management — handles failover and replication automatically
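The connection-pooling point above is worth a closer look: thousands of application connections are multiplexed onto a small, fixed set of MySQL connections. A toy Python sketch of that idea (not Vitess code; all names here are illustrative):

```python
import queue

class BackendPool:
    """Toy pool: many client requests share a few backend connections,
    the way VTTablet multiplexes app traffic onto few MySQL connections."""
    def __init__(self, size):
        self.conns = queue.Queue()
        for i in range(size):
            self.conns.put(f"mysql-conn-{i}")  # stand-in for a real connection

    def execute(self, sql):
        conn = self.conns.get()       # block until a backend connection frees up
        try:
            return f"{conn} ran: {sql}"
        finally:
            self.conns.put(conn)      # always return the connection to the pool

pool = BackendPool(size=2)            # 2 backend connections...
results = [pool.execute(f"SELECT {n}") for n in range(1000)]  # ...serve 1000 requests
```

The payoff is that MySQL never sees more than `size` concurrent connections, no matter how many clients pile up on the application side.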

Vitess Architecture

Key Vitess components:

  • VTGate — the query router; applications connect here instead of MySQL directly
  • VTTablet — runs alongside each MySQL instance, managing it and serving queries
  • Topology Service — stores cluster metadata (uses etcd, ZooKeeper, or Consul)
  • VTCtld — cluster management daemon and web interface
  • VTOrc — automated failover orchestrator

Installation with Docker Compose

# Clone Vitess
git clone https://github.com/vitessio/vitess.git
cd vitess/examples/compose

# Start a local cluster with 2 shards
docker compose up -d

# This creates:
# - 1 VTGate (port 15991 for MySQL protocol)
# - 2 shards, each with 1 primary + 1 replica
# - etcd for topology
# - VTCtld with web UI (port 15000)

Production Deployment with Kubernetes

Vitess is designed for Kubernetes. Use the Vitess Operator for production deployments:

# Install the Vitess operator
kubectl apply -f https://github.com/planetscale/vitess-operator/releases/latest/download/operator.yaml

# Create a VitessCluster resource
cat <<EOF | kubectl apply -f -
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: production
spec:
  images:
    vtgate: vitess/lite:v19
    vttablet: vitess/lite:v19
    vtbackup: vitess/lite:v19
    vtctld: vitess/lite:v19
    vtorc: vitess/lite:v19
  cells:
    - name: zone1
      gateway:
        replicas: 2
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
  keyspaces:
    - name: commerce
      turndownPolicy: Immediate
      partitionings:
        - equal:
            parts: 2
            shardTemplate:
              databaseInitScriptSecret:
                name: commerce-schema
                key: init_db.sql
              tabletPools:
                - cell: zone1
                  type: replica
                  replicas: 3
                  mysqld:
                    resources:
                      requests:
                        cpu: "4"
                        memory: "8Gi"
                  dataVolumeClaimTemplate:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 100Gi
EOF

Creating a Keyspace and Schema

# A keyspace is the Vitess equivalent of a database
# Connect to VTGate
mysql -h vtgate-host -P 15991 -u user

# Create schema through vtctldclient
vtctldclient ApplySchema --sql="
CREATE TABLE customers (
    id BIGINT NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    name VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE orders (
    id BIGINT NOT NULL AUTO_INCREMENT,
    customer_id BIGINT NOT NULL,
    total DECIMAL(10,2),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
" commerce

Sharding Strategy with VSchema

Vitess uses a VSchema to define how data is sharded:

# VSchema for the commerce keyspace
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    },
    "customer_lookup": {
      "type": "consistent_lookup",
      "params": {
        "table": "customer_lookup",
        "from": "email",
        "to": "customer_id"
      }
    }
  },
  "tables": {
    "customers": {
      "column_vindexes": [
        {
          "column": "id",
          "name": "hash"
        }
      ]
    },
    "orders": {
      "column_vindexes": [
        {
          "column": "customer_id",
          "name": "hash"
        }
      ]
    }
  }
}

# Apply VSchema
vtctldclient ApplyVSchema --vschema-file=commerce_vschema.json commerce
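The routing implied by the VSchema above works by mapping each row's vindex column to a keyspace ID, then finding the shard whose keyrange owns that ID. A Python sketch of the idea, with one loud caveat: the real `hash` vindex uses a 3DES-based function, not MD5 — the substitute here only illustrates the routing step:

```python
import hashlib

SHARDS = ["-80", "80-"]  # the 2-shard layout used in this guide

def keyspace_id(value):
    # Illustrative stand-in: Vitess's `hash` vindex is 3DES-based, not MD5.
    # Only the "value -> 8-byte keyspace ID" shape matters here.
    return hashlib.md5(str(value).encode()).digest()[:8]

def shard_for(kid, shards=SHARDS):
    # A shard named "lo-hi" owns keyspace IDs in [lo, hi), where the hex
    # bounds are left-justified prefixes; an empty bound means the start
    # or end of the full keyrange.
    for name in shards:
        lo_s, hi_s = name.split("-")
        lo = bytes.fromhex(lo_s.ljust(16, "0")) if lo_s else b"\x00" * 8
        hi = bytes.fromhex(hi_s.ljust(16, "0")) if hi_s else None
        if kid >= lo and (hi is None or kid < hi):
            return name
    raise ValueError("no shard owns this keyspace id")

# Every customer id lands on exactly one shard:
placements = {cid: shard_for(keyspace_id(cid)) for cid in range(10)}
```

This is also why the `orders` table shards on `customer_id` rather than its own `id`: a customer's orders then hash to the same shard as the customer row, keeping joins local to one shard.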

Performing a Reshard

# Split 2 shards into 4
vtctldclient Reshard --workflow commerce2x4 --target-keyspace commerce create --source-shards='-80,80-' --target-shards='-40,40-80,80-c0,c0-'

# Monitor progress
vtctldclient Reshard --workflow commerce2x4 --target-keyspace commerce show

# Switch reads then writes
vtctldclient Reshard --workflow commerce2x4 --target-keyspace commerce switchtraffic --tablet-types=rdonly,replica
vtctldclient Reshard --workflow commerce2x4 --target-keyspace commerce switchtraffic --tablet-types=primary

# Complete the reshard
vtctldclient Reshard --workflow commerce2x4 --target-keyspace commerce complete
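The shard names in the reshard above are keyrange bounds, and the operation is only safe if the target shards tile exactly the same keyrange as the sources — contiguous, with no gaps and no overlap. A small Python check of that invariant (simplified to single-byte hex bounds, which is all this example uses):

```python
def bounds(shard):
    # Parse "lo-hi" into integer bounds over the 0x00..0x100 first-byte space.
    lo, hi = shard.split("-")
    return (int(lo, 16) if lo else 0, int(hi, 16) if hi else 0x100)

def covers_exactly(sources, targets):
    # Targets must span the same keyrange as sources, with adjacent
    # ranges meeting exactly (no gaps, no overlap).
    src = sorted(bounds(s) for s in sources)
    tgt = sorted(bounds(t) for t in targets)
    return (src[0][0] == tgt[0][0] and src[-1][1] == tgt[-1][1]
            and all(a[1] == b[0] for a, b in zip(src, src[1:]))
            and all(a[1] == b[0] for a, b in zip(tgt, tgt[1:])))

ok = covers_exactly(["-80", "80-"], ["-40", "40-80", "80-c0", "c0-"])  # True
```

Real shard names can use longer hex prefixes, but the tiling rule is the same; Vitess validates it for you when the workflow is created.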

Online Schema Changes

# Vitess Online DDL applies changes asynchronously without locking;
# strategies include the native "vitess" (VReplication-based), gh-ost, and pt-osc
vtctldclient ApplySchema --sql="ALTER TABLE orders ADD COLUMN status VARCHAR(50) DEFAULT 'pending'" --ddl-strategy="vitess" commerce

# Check migration status
vtctldclient OnlineDDL show commerce all

Connecting Applications

# Applications connect to VTGate — it looks like a normal MySQL server
# PHP
$pdo = new PDO('mysql:host=vtgate-host;port=15991;dbname=commerce', 'app_user', 'password');

# Node.js (mysql package)
const mysql = require('mysql');
const connection = mysql.createConnection({
  host: 'vtgate-host',
  port: 15991,
  database: 'commerce',
  user: 'app_user',
  password: 'password'
});

Monitoring Vitess

# VTGate exposes Prometheus metrics on /metrics (and JSON counters on /debug/vars)
# Key metrics to monitor:
# - vtgate_queries_processed_total — query throughput
# - vtgate_error_counts — error rates
# - vttablet_query_counts — per-shard query distribution
# - vttablet_replication_lag_seconds — replica lag

# VTCtld web UI provides cluster overview
# Access at http://vtctld-host:15000

Production Best Practices

  • Start with 2 shards and plan for growth — Vitess makes adding shards straightforward
  • Choose shard keys that distribute data evenly and align with your query patterns
  • Deploy multiple VTGate instances behind a load balancer for high availability
  • Use VTOrc for automated primary failover within each shard
  • Test resharding in staging before production — it is a complex operation even when automated
  • Monitor tablet health and replication lag across all shards
  • Use Online DDL for all schema changes to avoid table locks
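The shard-key advice above is the one that bites hardest in practice. A quick Python sketch of why a hash-style key beats range-sharding a monotonically increasing id (illustrative hash, not the real Vitess vindex function):

```python
import hashlib
from collections import Counter

N_SHARDS = 4

def hashed_shard(key):
    # Hash-style shard key: placement is spread out regardless of insert order.
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:2], "big") % N_SHARDS

def range_shard(key, per_shard=250):
    # Range-sharding an auto-increment id: every new insert lands on
    # the last shard, creating a write hotspot.
    return min(key // per_shard, N_SHARDS - 1)

recent = range(900, 1000)  # the 100 most recent auto-increment ids
hash_spread = Counter(hashed_shard(i) for i in recent)
range_spread = Counter(range_shard(i) for i in recent)
# range_spread puts all 100 hot rows on shard 3; hash_spread spreads them out
```

The same reasoning applies to query patterns: a key that spreads writes evenly but forces every read to fan out across all shards is not a win either, so pick a key that does both.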
