Citus transforms PostgreSQL into a distributed database by transparently sharding tables across multiple nodes. It extends PostgreSQL rather than forking it, so you keep PostgreSQL's features, extensions, and tooling while gaining horizontal scalability. This guide covers deploying Citus for production workloads, including multi-tenant SaaS applications and real-time analytics.
Why Citus?
- Transparent sharding — distribute tables across nodes with standard SQL
- Real PostgreSQL — not a fork; all PostgreSQL features, extensions, and tools work
- Multi-tenant optimization — co-locate tenant data for efficient joins
- Real-time analytics — parallel query execution across shards
- Columnar storage — built-in columnar engine for analytics tables
Installation
# Ubuntu/Debian
curl https://install.citusdata.com/community/deb.sh | sudo bash
sudo apt install postgresql-16-citus-12.1
# Rocky Linux
curl https://install.citusdata.com/community/rpm.sh | sudo bash
sudo dnf install citus121_16
# Enable the extension
sudo -u postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'citus';"
sudo systemctl restart postgresql   # service is typically postgresql-16 on RHEL-based installs
Cluster Setup
Coordinator Node
-- On the coordinator (manages metadata and routes queries)
CREATE EXTENSION citus;
-- Add worker nodes
SELECT citus_set_coordinator_host('coordinator-ip', 5432);
SELECT * FROM citus_add_node('worker1-ip', 5432);
SELECT * FROM citus_add_node('worker2-ip', 5432);
SELECT * FROM citus_add_node('worker3-ip', 5432);
-- Verify cluster
SELECT * FROM citus_get_active_worker_nodes();
Worker Nodes
-- On each worker node
CREATE EXTENSION citus;
-- Each worker needs Citus installed and preloaded (as in Installation above);
-- shard metadata is synced automatically when the node is added from the coordinator
Distributing Tables
Multi-Tenant Pattern (Recommended for SaaS)
-- Create tables with tenant_id as the distribution column
CREATE TABLE tenants (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
plan TEXT DEFAULT 'free',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE users (
id BIGSERIAL,
tenant_id BIGINT REFERENCES tenants(id),
email TEXT NOT NULL,
name TEXT,
country_code CHAR(2),
PRIMARY KEY (tenant_id, id)
);
CREATE TABLE orders (
id BIGSERIAL,
tenant_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
total DECIMAL(10,2),
status TEXT DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (tenant_id, id),
FOREIGN KEY (tenant_id, user_id) REFERENCES users(tenant_id, id)
);
-- Distribute by the tenant key: tenants by id, users and orders by tenant_id
-- (matching distribution values land on the same shard, co-locating related data)
SELECT create_distributed_table('tenants', 'id');
SELECT create_distributed_table('users', 'tenant_id');
SELECT create_distributed_table('orders', 'tenant_id');
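Co-location is worth verifying after distribution: tables sharing a colocation_id have matching shard placements, which is what makes the tenant-scoped joins shown later single-shard operations. One way to check, via the citus_tables view:
-- Tables with the same colocation_id are co-located
SELECT table_name, citus_table_type, distribution_column, colocation_id
FROM citus_tables;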
Reference Tables (Small Lookup Tables)
-- Reference tables are copied to all nodes (for joins)
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY,
name TEXT NOT NULL
);
SELECT create_reference_table('countries');
Columnar Storage for Analytics
-- Create a columnar table for analytics workloads
CREATE TABLE events (
tenant_id BIGINT,
event_type TEXT,
payload JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
) USING columnar;
SELECT create_distributed_table('events', 'tenant_id');
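Columnar compression is tunable per table via alter_columnar_table_set. A minimal sketch; zstd and the level shown here are illustrative choices, and depending on your Citus version this may need to run before the table is distributed:
-- Optional: tune compression (defaults vary by Citus version)
SELECT alter_columnar_table_set('events', compression => 'zstd', compression_level => 6);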
Querying Distributed Tables
-- Standard SQL works transparently
-- Single-tenant query (routed to one shard):
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON u.tenant_id = o.tenant_id AND u.id = o.user_id
WHERE o.tenant_id = 42
ORDER BY o.created_at DESC
LIMIT 20;
-- Cross-tenant aggregation (parallel across all shards):
SELECT tenant_id, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tenant_id
ORDER BY revenue DESC
LIMIT 10;
-- Join with reference table (works on any shard):
SELECT u.name, c.name AS country
FROM users u
JOIN countries c ON c.code = u.country_code
WHERE u.tenant_id = 42;
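EXPLAIN shows how Citus executes each shape: a filter on the distribution column produces a router plan with a single task, while cross-tenant queries fan out one task per shard. For example:
-- Router plan: expect "Custom Scan (Citus Adaptive)" with Task Count: 1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM orders WHERE tenant_id = 42;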
Rebalancing and Adding Nodes
-- Add a new worker node
SELECT * FROM citus_add_node('worker4-ip', 5432);
-- Rebalance shards across all nodes
SELECT citus_rebalance_start();
-- Monitor rebalance progress
SELECT * FROM citus_rebalance_status();
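Decommissioning works in reverse. A sketch, assuming the node names used above: wait for any running rebalance, drain the node's shards to the remaining workers, then remove it:
-- Block until the background rebalance finishes (handy in scripts)
SELECT citus_rebalance_wait();
-- Move all shards off a worker, then drop it from the cluster
SELECT citus_drain_node('worker1-ip', 5432);
SELECT citus_remove_node('worker1-ip', 5432);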
Tenant Isolation
-- Split a large tenant into its own shard for performance isolation;
-- CASCADE also isolates the tenant in co-located tables
SELECT isolate_tenant_to_new_shard('orders', 42, 'CASCADE');
-- Optionally move the isolated shard to a dedicated worker
SELECT citus_move_shard_placement(
  get_shard_id_for_distribution_column('orders', 42),
  'worker2-ip', 5432,          -- node currently holding the shard
  'dedicated-worker-ip', 5432
);
Monitoring
-- Check shard distribution
SELECT nodename, count(*) AS shard_count
FROM citus_shards
GROUP BY nodename;
-- Active distributed queries
SELECT * FROM citus_stat_activity WHERE state = 'active';
-- Table sizes across the cluster (citus_total_relation_size errors on
-- local tables, so list distributed and reference tables via citus_tables)
SELECT table_name, citus_table_type, table_size
FROM citus_tables;
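Shard-level sizes help spot skew; a quick check using the shard_size column of citus_shards:
-- Largest shards first; heavy skew suggests isolating a tenant or rebalancing
SELECT table_name, shardid, nodename, pg_size_pretty(shard_size) AS size
FROM citus_shards
ORDER BY shard_size DESC
LIMIT 10;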
Best Practices
- Choose a distribution column that appears in most queries — typically tenant_id for SaaS or a natural partition key
- Include the distribution column in all primary keys and foreign keys
- Use reference tables for small dimension/lookup tables to enable efficient joins
- Use columnar storage for append-only analytics tables (10x compression)
- Start with 32-64 shards per table (Citus uses hash-based distribution); see the snippet after this list
- Co-locate related tables by using the same distribution column for efficient joins
- Monitor shard sizes and rebalance when distribution becomes skewed
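The shard count is fixed when a table is distributed and comes from the citus.shard_count setting (default 32), so set it before calling create_distributed_table. A sketch, using a hypothetical page_views table:
-- Applies to tables distributed after this point in the session
SET citus.shard_count = 64;
SELECT create_distributed_table('page_views', 'tenant_id');  -- page_views is hypothetical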