Implement Connection Draining for Zero-Downtime Deployments

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

Connection draining (graceful shutdown) ensures that in-flight requests complete before a server is taken offline for deployments or maintenance. Without it, deployments cause errors for users with active connections. This guide covers implementing connection draining with Nginx and application servers.

How Connection Draining Works

1. Stop sending new requests to the server being drained.
2. Wait for existing (in-flight) requests to complete.
3. After all connections close (or the drain timeout expires), shut down the server.
4. Deploy the update.
5. Bring the server back and resume traffic.
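
The bookkeeping behind the first three steps can be sketched in a few lines of Node.js. This is a minimal illustration under stated assumptions, not a production component: the `DrainTracker` class and its method names are hypothetical, and it models only the logic (refuse new work, count in-flight requests, finish when the count reaches zero or a timeout expires).

```javascript
// Hypothetical helper illustrating the drain sequence; all names are assumptions.
class DrainTracker {
    constructor() {
        this.inFlight = 0;      // requests currently being processed
        this.draining = false;  // true once draining has started
        this.waiters = [];      // callbacks to run when inFlight reaches 0
    }

    // Step 1: refuse new requests once draining has started.
    begin() {
        if (this.draining) return false;
        this.inFlight += 1;
        return true;
    }

    // Call when a request finishes.
    end() {
        this.inFlight -= 1;
        if (this.inFlight === 0) {
            this.waiters.splice(0).forEach((notify) => notify());
        }
    }

    // Steps 2-3: resolve 'drained' when idle, or 'timeout' after timeoutMs.
    drain(timeoutMs) {
        this.draining = true;
        return new Promise((resolve) => {
            if (this.inFlight === 0) return resolve('drained');
            const timer = setTimeout(() => resolve('timeout'), timeoutMs);
            this.waiters.push(() => {
                clearTimeout(timer);
                resolve('drained');
            });
        });
    }
}

// Usage: one request is in flight when draining starts, then it completes.
const tracker = new DrainTracker();
tracker.begin();
const outcome = tracker.drain(30000);
tracker.end();
outcome.then((state) => console.log(state)); // prints "drained"
```

Whether the result is 'drained' or 'timeout' is what decides if the shutdown in step 3 was clean or forced.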

Nginx Upstream Health Checks and Draining

# Define upstream with multiple backends
upstream backend {
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;
    # Optional: backup server
    server 127.0.0.1:8082 backup;
}

server {
    listen 443 ssl http2;   # ssl_certificate and ssl_certificate_key omitted for brevity
    server_name example.com;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_timeout 10s;
        proxy_next_upstream_tries 2;

        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
    }
}

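# Mark a server as "down" for draining:
# upstream backend {
#     server 127.0.0.1:8080 down;   # No new connections
#     server 127.0.0.1:8081;        # Receives all traffic
# }
# Then validate and reload: nginx -t && nginx -s reload
# On reload, old worker processes finish their in-flight requests before exiting.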
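
Marking a backend as down is a plain text edit of the upstream block, so it can be scripted. The sketch below is one possible way to do it in Node.js; `setServerDown` is a hypothetical helper, not part of Nginx or any library, and it assumes each backend sits on its own `server ...;` line. It keeps existing parameters such as `max_fails` and skips commented-out lines.

```javascript
// Toggle the `down` flag on one `server` line of an upstream block.
// `conf` is the file contents; `address` is e.g. '127.0.0.1:8080'.
function setServerDown(conf, address, down) {
    // Escape regex metacharacters in the address (the dots, mainly).
    const escaped = address.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Capture indentation + "server <address>", then any parameters up to ";".
    const line = new RegExp(`^([ \\t]*server\\s+${escaped})(\\s[^;]*)?;`, 'm');
    return conf.replace(line, (match, head, params = '') => {
        // Drop any existing `down` flag, then re-add it if requested.
        const kept = params.split(/\s+/).filter((p) => p && p !== 'down');
        if (down) kept.push('down');
        return [head, ...kept].join(' ') + ';';
    });
}

const conf = [
    'upstream backend {',
    '    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;',
    '    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;',
    '}',
].join('\n');

console.log(setServerDown(conf, '127.0.0.1:8080', true));
```

After writing the result back to the configuration file, a reload is still required for the change to take effect.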

Application-Level Graceful Shutdown

// Node.js graceful shutdown (assumes an Express app)
const express = require('express');
const app = express();
const server = app.listen(8080);

let isShuttingDown = false;

process.on('SIGTERM', () => {
    console.log('SIGTERM received, starting graceful shutdown...');
    isShuttingDown = true;

    // Stop accepting new connections
    server.close(() => {
        console.log('All connections closed, exiting.');
        process.exit(0);
    });

    // Force exit after 30 seconds
    setTimeout(() => {
        console.error('Forced shutdown after timeout');
        process.exit(1);
    }, 30000);
});

// Health check endpoint
app.get('/health', (req, res) => {
    if (isShuttingDown) {
        res.status(503).json({ status: 'shutting down' });
    } else {
        res.status(200).json({ status: 'ok' });
    }
});
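
The close-or-timeout logic in the SIGTERM handler can also be wrapped in a reusable, promise-based helper. The following is a sketch rather than a canonical implementation: `gracefulClose` is a hypothetical name, it uses only Node's built-in `http` module, and it calls `server.closeIdleConnections()` only when that method is present (Node.js 18.2 and later), because idle keep-alive sockets can otherwise delay the `close` callback.

```javascript
const http = require('http');

// Hypothetical helper: resolves 'closed' once every connection has ended,
// or 'timeout' if draining takes longer than timeoutMs.
function gracefulClose(server, timeoutMs) {
    return new Promise((resolve) => {
        const timer = setTimeout(() => resolve('timeout'), timeoutMs);

        // Stop accepting new connections; the callback fires when all have ended.
        server.close(() => {
            clearTimeout(timer);
            resolve('closed');
        });

        // Idle keep-alive sockets would otherwise hold the server open.
        if (typeof server.closeIdleConnections === 'function') {
            server.closeIdleConnections();
        }
    });
}

// Possible use inside a SIGTERM handler (`server` is your own http.Server):
// process.on('SIGTERM', async () => {
//     const outcome = await gracefulClose(server, 30000);
//     process.exit(outcome === 'closed' ? 0 : 1);
// });
```

Returning the outcome instead of calling `process.exit` directly keeps the helper testable and leaves the exit code decision to the caller.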

# Python (Gunicorn) — graceful shutdown is built-in
gunicorn app:app \
    --workers 4 \
    --pid /run/gunicorn.pid \
    --graceful-timeout 30 \
    --timeout 60

# Send SIGTERM for graceful shutdown
# Workers finish current requests, then exit
kill -TERM $(cat /run/gunicorn.pid)

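// Go graceful shutdown
// Imports needed: context, log, net/http, os, os/signal, syscall, time
func main() {
    srv := &http.Server{Addr: ":8080"}
    done := make(chan struct{})

    go func() {
        sigChan := make(chan os.Signal, 1)
        signal.Notify(sigChan, syscall.SIGTERM)
        <-sigChan // block until SIGTERM arrives

        // Stop accepting new connections and wait up to 30s for in-flight requests
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("forced shutdown: %v", err)
        }
        close(done)
    }()

    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
        log.Fatal(err)
    }
    <-done // wait for the drain to finish before exiting
}

Rolling Deployment Script

#!/bin/bash
# Assumed values: ports match the upstream block above; adjust for your setup
APP_PORT_1=8080
APP_PORT_2=8081
DRAIN_TIMEOUT=30

# Steps 1-4 (assumed to mirror Steps 5-7): drain server 1, deploy, wait for health
sudo sed -i "s/server 127.0.0.1:$APP_PORT_1;/server 127.0.0.1:$APP_PORT_1 down;/" /etc/nginx/conf.d/upstream.conf
sudo nginx -s reload
sleep $DRAIN_TIMEOUT

sudo systemctl stop app-1
# ... deploy code ...
sudo systemctl start app-1

until curl -sf http://127.0.0.1:$APP_PORT_1/health > /dev/null; do
    sleep 1
done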

# Step 5: Bring server 1 back, drain server 2
echo "Switching traffic..."
sudo sed -i "s/server 127.0.0.1:$APP_PORT_1 down;/server 127.0.0.1:$APP_PORT_1;/" /etc/nginx/conf.d/upstream.conf
sudo sed -i "s/server 127.0.0.1:$APP_PORT_2;/server 127.0.0.1:$APP_PORT_2 down;/" /etc/nginx/conf.d/upstream.conf
sudo nginx -s reload
sleep $DRAIN_TIMEOUT

# Step 6: Deploy to server 2
echo "Deploying to server 2..."
sudo systemctl stop app-2
# ... deploy code ...
sudo systemctl start app-2

until curl -sf http://127.0.0.1:$APP_PORT_2/health > /dev/null; do
    sleep 1
done

# Step 7: Bring server 2 back
sudo sed -i "s/server 127.0.0.1:$APP_PORT_2 down;/server 127.0.0.1:$APP_PORT_2;/" /etc/nginx/conf.d/upstream.conf
sudo nginx -s reload

echo "=== Deployment complete ==="

systemd Graceful Stop

# Configure systemd to send SIGTERM and wait
[Service]
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
# systemd sends SIGTERM, waits 30s, then SIGKILL

Best Practices

- Implement SIGTERM handlers in your application for graceful shutdown.
- Use health check endpoints that return 503 during shutdown.
- Set appropriate timeouts: drain time should exceed your longest request duration.
- Use proxy_next_upstream so Nginx retries failed requests on healthy backends.
- Automate the process: manual draining is error-prone, so script it.
- Test your deployment process under load to verify zero errors during rollout.
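
To check for errors during a rollout, one option is to run a simple probe against the site while the deployment script executes and count any failed responses. The sketch below is an assumption-laden illustration: `probe` is a hypothetical function, it relies on the global `fetch` available in Node.js 18 and later, and it sends requests one at a time, so it is a smoke check rather than a real load generator.

```javascript
// Hypothetical probe: sends `count` sequential requests and tallies the outcomes.
// `fetchImpl` defaults to the global fetch available in Node.js 18 and later.
async function probe(url, count, intervalMs = 100, fetchImpl = fetch) {
    const tally = { ok: 0, failed: 0 };
    for (let i = 0; i < count; i += 1) {
        try {
            const res = await fetchImpl(url);
            if (res.status >= 200 && res.status < 300) tally.ok += 1;
            else tally.failed += 1; // e.g. 502/503 from a draining backend
        } catch (err) {
            tally.failed += 1; // connection refused or reset
        }
        // Pause between requests so the probe spans the whole deployment.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    return tally;
}
```

Started just before the deployment with something like `probe('https://example.com/health', 600).then(console.log)`, a result with `failed: 0` suggests that draining worked. A dedicated load-testing tool would give a more realistic picture under concurrency.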