Write-heavy PostgreSQL workloads — such as IoT data ingestion, logging pipelines, and high-volume transactional systems — require careful tuning of storage, WAL configuration, checkpoints, and connection management. This guide covers the key parameters and techniques to maximize PostgreSQL write throughput while maintaining durability guarantees.
Understanding PostgreSQL Write Path
Every write in PostgreSQL follows this path:
- The backend modifies the target pages in shared buffers (memory) and appends a WAL record describing the change to the WAL buffers
- At commit, the WAL buffers are flushed to the WAL segment files on disk (fsync)
- The dirty pages stay in shared buffers until the background writer and checkpointer flush them to the data files
Bottlenecks can occur at any stage. The most common are WAL fsync latency, checkpoint I/O spikes, and lock contention.
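A quick way to see which stage is hurting is the cumulative WAL statistics view, pg_stat_wal (PostgreSQL 14+). A minimal diagnostic sketch; the *_time columns stay at zero unless track_wal_io_timing is enabled, and the exact column set shifts between major versions:
-- How often WAL buffers filled up and how much time goes into WAL I/O
SELECT wal_records,
       wal_bytes,
       wal_buffers_full,   -- consistently > 0 suggests wal_buffers is too small
       wal_write_time,     -- ms spent writing WAL (needs track_wal_io_timing = on)
       wal_sync_time       -- ms spent in WAL fsync (needs track_wal_io_timing = on)
FROM pg_stat_wal;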
Storage Configuration
Storage is the foundation of write performance. Optimal configuration:
# Use separate disks for WAL and data (most impactful single change)
# Move WAL to a dedicated fast SSD/NVMe — the pg_wal location is not a postgresql.conf
# setting: use initdb --waldir for a new cluster or a symlink for an existing one
# (see the sketch after the fstab entries below)
# pg_wal directory on fast NVMe, data directory on separate NVMe
# Filesystem recommendations:
# - ext4 or XFS (XFS preferred for large files)
# - Mount with noatime,nodiratime; on SSDs prefer periodic fstrim over the continuous discard option
# - Only disable write barriers (barrier=0 / nobarrier) with a battery-backed write cache,
#   and note that recent kernels have removed these options for XFS
# Example /etc/fstab entries
/dev/nvme0n1p1 /var/lib/postgresql/data xfs defaults,noatime,nodiratime 0 2
/dev/nvme1n1p1 /var/lib/postgresql/wal xfs defaults,noatime,nodiratime 0 2
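The symlink approach mentioned above looks like this in practice. A sketch only: paths and the service name follow the fstab example and are illustrative, and the server must be stopped during the move.
# Relocate pg_wal for an existing cluster
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/data/pg_wal /var/lib/postgresql/wal/pg_wal
sudo ln -s /var/lib/postgresql/wal/pg_wal /var/lib/postgresql/data/pg_wal
sudo chown -h postgres:postgres /var/lib/postgresql/data/pg_wal
sudo systemctl start postgresql
# For a brand-new cluster, initdb can place WAL elsewhere directly:
# initdb -D /var/lib/postgresql/data -X /var/lib/postgresql/wal/pg_wal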
WAL Tuning
# postgresql.conf — WAL settings for write performance
# Increase WAL buffers (the default of -1 auto-sizes to at most 16MB)
wal_buffers = 64MB
# WAL compression reduces I/O at the cost of CPU
wal_compression = lz4 # PostgreSQL 15+, or 'on' for older versions
# WAL level — use 'replica' unless you need logical replication
wal_level = replica
# Commit behavior — CRITICAL for write throughput
# synchronous_commit = on # Safest: wait for WAL fsync (default)
# synchronous_commit = off # Fastest: a crash can lose up to ~3x wal_writer_delay (~600ms by default) of recent commits, but never corrupts data
# synchronous_commit = local # Wait only for the local WAL flush, not for synchronous standbys
# For maximum throughput with acceptable risk:
synchronous_commit = off # Often a several-fold throughput gain for small, frequent transactions
# Increase WAL size to reduce WAL file recycling
max_wal_size = 4GB # Default 1GB, increase for write-heavy loads
min_wal_size = 1GB
# WAL segment recycling (PostgreSQL 12+, enabled by default)
# wal_recycle = on # Reuse old WAL segment files instead of creating new ones
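synchronous_commit can also be relaxed selectively rather than cluster-wide, so only low-value writes take the durability risk. A sketch — sensor_readings and ingest_writer are illustrative names, not part of the configuration above:
-- Per transaction: only this transaction's commit skips the fsync wait
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO sensor_readings (device_id, reading) VALUES (42, 21.5);
COMMIT;
-- Per role: every session of the ingestion user runs with it off
ALTER ROLE ingest_writer SET synchronous_commit = off;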
Checkpoint Tuning
Checkpoints flush all dirty buffers to disk, causing I/O spikes. Spreading this out reduces latency variance:
# Spread checkpoint I/O over longer periods
checkpoint_timeout = 15min # Default 5min, increase to 15-30min
checkpoint_completion_target = 0.9 # Spread checkpoint over 90% of the interval
# Increase WAL size to allow longer checkpoint intervals
max_wal_size = 8GB # Must hold the WAL written between checkpoints; raises the 4GB suggested above, at the cost of longer crash recovery
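To verify the effect, check that checkpoints are driven by the timer rather than by WAL volume. A sketch against pg_stat_bgwriter, which holds these counters up to PostgreSQL 16; PostgreSQL 17 moves checkpoint statistics to pg_stat_checkpointer:
-- checkpoints_req rising steadily means max_wal_size is still too small
SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,  -- ms spent writing dirty buffers during checkpoints
       checkpoint_sync_time    -- ms spent syncing files during checkpoints
FROM pg_stat_bgwriter;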
Shared Buffers and Memory
# Shared buffers — main data cache
shared_buffers = 8GB # ~25% of RAM on a dedicated database server (these examples assume 32GB)
# Effective cache size — a planner hint about total cache, including the OS page cache
effective_cache_size = 24GB # ~75% of total RAM
# Work memory for sorts and hash joins
work_mem = 64MB # Per-operation, be careful with high connection counts
# Maintenance work memory for VACUUM, CREATE INDEX
maintenance_work_mem = 2GB
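These (and most parameters in this guide) can also be applied with ALTER SYSTEM instead of editing postgresql.conf by hand. A sketch; shared_buffers only takes effect after a restart, the others reload:
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET shared_buffers = '8GB';  -- requires a server restart
SELECT pg_reload_conf();                  -- applies the reloadable settings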
Background Writer Tuning
# Background writer flushes dirty buffers proactively
bgwriter_delay = 50ms # Check every 50ms (default 200ms)
bgwriter_lru_maxpages = 400 # Max pages per round (default 100)
bgwriter_lru_multiplier = 4.0 # Multiplier for recent buffer needs (default 2.0)
bgwriter_flush_after = 512kB # Ask the OS to flush after this much is written (512kB is the Linux default)
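Whether the background writer keeps up can be read from the same statistics view. The columns shown are those of PostgreSQL 16 and earlier; 17 moves backend-write accounting into pg_stat_io:
-- maxwritten_clean rising means bgwriter_lru_maxpages is still the limit;
-- buffers_backend large relative to buffers_clean means backends are doing
-- their own writes because the bgwriter cannot keep up
SELECT buffers_clean, maxwritten_clean, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter;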
Bulk Loading Optimization
For initial data loads or large batch inserts:
-- Drop indexes and disable triggers (which also suspends FK enforcement) during the load; restore them afterwards
ALTER TABLE large_table DISABLE TRIGGER ALL;
DROP INDEX idx_large_table_col1; -- Recreate after load
-- Use COPY instead of INSERT (10-100x faster)
COPY large_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
-- Or from application
COPY large_table (col1, col2, col3) FROM STDIN WITH (FORMAT csv);
-- After loading
ALTER TABLE large_table ENABLE TRIGGER ALL;
CREATE INDEX CONCURRENTLY idx_large_table_col1 ON large_table (col1);
ANALYZE large_table;
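When COPY is not practical (ORMs, streaming writes), batching many rows per INSERT and many statements per transaction recovers much of the gap. A generic sketch using the columns from the COPY example above; the values and types are illustrative:
-- One multi-row INSERT and one commit instead of three single-row transactions
BEGIN;
INSERT INTO large_table (col1, col2, col3) VALUES
    ('a', 1, now()),
    ('b', 2, now()),
    ('c', 3, now());
COMMIT;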
Unlogged Tables for Temporary Data
-- Unlogged tables skip WAL — much faster writes, but they are truncated after a crash and are not replicated to standbys
CREATE UNLOGGED TABLE staging_data (
id SERIAL,
raw_json JSONB,
processed BOOLEAN DEFAULT false
);
-- Good for: staging tables, caches, session data, temp analytics
-- Bad for: anything you cannot afford to lose on crash
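A common pattern is to ingest into the unlogged staging table and pay the WAL cost only once afterwards, either by copying rows into the durable table or by flipping the staging table to logged (a full rewrite through WAL). A sketch; events_durable is a hypothetical destination table:
-- Option 1: move processed rows into the durable table in one pass
INSERT INTO events_durable SELECT * FROM staging_data WHERE processed;
-- Option 2: make the staging table itself crash-safe (rewrites the whole table)
ALTER TABLE staging_data SET LOGGED;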
Connection and Concurrency
# Use a connection pooler (PgBouncer) for high connection counts
# PgBouncer in transaction mode allows hundreds of app connections
# with only 20-50 actual PostgreSQL connections
# PostgreSQL connection settings
max_connections = 200 # Keep low, use PgBouncer
max_parallel_workers_per_gather = 4
max_parallel_maintenance_workers = 4
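A minimal pgbouncer.ini sketch for the transaction-pooling setup described above; hostnames, credentials, and pool sizes are placeholders to adapt:
[databases]
testdb = host=127.0.0.1 port=5432 dbname=testdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; one server connection per transaction, not per client
max_client_conn = 1000       ; application-side connections
default_pool_size = 25       ; actual PostgreSQL connections per database/user pair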
VACUUM and Autovacuum Tuning
# Aggressive autovacuum for write-heavy tables
autovacuum_max_workers = 6 # Default 3
autovacuum_naptime = 30s # Check more frequently
autovacuum_vacuum_threshold = 50 # Minimum changed tuples (this is already the default; the scale factor below does the real work)
autovacuum_vacuum_scale_factor = 0.05 # 5% of table changed (default 20%)
autovacuum_vacuum_cost_delay = 1ms # Less throttling (the default has been 2ms since PG 12, 20ms before that)
autovacuum_vacuum_cost_limit = 1000 # More work per round (default 200)
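The settings above apply to every table; for a handful of hot, write-heavy tables it is often better to override per table with storage parameters instead. A sketch using the large_table example from earlier:
-- Per-table autovacuum overrides (no restart needed)
ALTER TABLE large_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows change
    autovacuum_vacuum_cost_delay = 0        -- no throttling for this table
);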
Partitioning for Write Performance
-- Partition large tables by time range
CREATE TABLE events (
id BIGINT GENERATED ALWAYS AS IDENTITY,
event_type VARCHAR(50),
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Benefits: smaller indexes per partition, faster VACUUM,
-- easy data archival by dropping old partitions
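Archival then becomes a metadata operation instead of a massive DELETE. A sketch; DETACH ... CONCURRENTLY requires PostgreSQL 14+ and cannot run inside a transaction block:
-- Archive or drop last month's data without touching the live partitions
ALTER TABLE events DETACH PARTITION events_2025_01 CONCURRENTLY;
DROP TABLE events_2025_01;  -- or dump/move it to cold storage first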
Benchmarking Write Performance
# Use pgbench for write testing
pgbench -i -s 100 testdb # Initialize with scale factor 100 (~10 million rows in pgbench_accounts)
# Run write-heavy benchmark
pgbench -c 32 -j 8 -T 300 -P 10 testdb
# -c: connections, -j: threads, -T: duration, -P: progress interval
# Custom write-heavy script — create a simple target table for it first
psql testdb -c "CREATE TABLE IF NOT EXISTS test_table (data text, created_at timestamptz DEFAULT now());"
cat > insert_test.sql <<EOF
INSERT INTO test_table (data, created_at)
VALUES (md5(random()::text), now());
EOF
pgbench -c 32 -j 8 -T 60 -f insert_test.sql testdb
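To judge each run against the counters discussed earlier, reset the shared statistics immediately before the benchmark so pg_stat_bgwriter and pg_stat_wal reflect only that run:
psql testdb -c "SELECT pg_stat_reset_shared('bgwriter');"
psql testdb -c "SELECT pg_stat_reset_shared('wal');"   # the 'wal' target exists in PG 14+
# then run the pgbench command above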
Summary Checklist
- Separate WAL and data onto different NVMe drives for the biggest single improvement
- Set synchronous_commit = off if you can tolerate losing the last ~600ms of transactions
- Increase checkpoint_timeout and max_wal_size to reduce checkpoint frequency
- Use COPY for bulk inserts, not individual INSERT statements
- Use PgBouncer for connection pooling to reduce per-connection overhead
- Tune autovacuum aggressively for write-heavy tables
- Partition time-series data for better write and maintenance performance
- Monitor with pg_stat_bgwriter and pg_stat_wal to verify tuning effectiveness