Connection draining (graceful shutdown) ensures that in-flight requests complete before a server is taken offline for deployments or maintenance. Without it, deployments cause errors for users with active connections. This guide covers implementing connection draining with Nginx and application servers.
How Connection Draining Works
- Stop sending NEW requests to the server being drained
- Wait for existing/in-flight requests to complete
- After all connections close (or the drain timeout expires), shut down the server
- Deploy the update
- Bring the server back and resume traffic
Nginx Upstream Health Checks and Draining
# Define upstream with multiple backends
upstream backend {
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;
    # Optional: backup server
    server 127.0.0.1:8082 backup;
}
server {
    listen 443 ssl http2;
    server_name example.com;
    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_timeout 10s;
        proxy_next_upstream_tries 2;
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
    }
}
# Mark a server as "down" for draining:
# upstream backend {
#     server 127.0.0.1:8080 down; # No new connections
#     server 127.0.0.1:8081;      # Receives all traffic
# }
# Then reload: nginx -s reload
# (old worker processes finish their in-flight requests before exiting)
Application-Level Graceful Shutdown
# Node.js graceful shutdown
const express = require('express'); // assumes an Express app
const app = express();
const server = app.listen(8080);
let isShuttingDown = false;
process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown...');
  isShuttingDown = true;
  // Stop accepting new connections; the callback runs once existing ones have closed
  server.close(() => {
    console.log('All connections closed, exiting.');
    process.exit(0);
  });
  // Force exit after 30 seconds; unref() keeps the timer from holding the process open
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 30000).unref();
});
// Health check endpoint: returns 503 once shutdown has started
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    res.status(503).json({ status: 'shutting down' });
  } else {
    res.status(200).json({ status: 'ok' });
  }
});
# Python (Gunicorn): graceful shutdown is built-in
gunicorn app:app \
    --workers 4 \
    --graceful-timeout 30 \
    --timeout 60 \
    --pid /run/gunicorn.pid
# Send SIGTERM for graceful shutdown
# Workers finish current requests, then exit
kill -TERM $(cat /run/gunicorn.pid)
# Go — graceful shutdown
func main() {
srv := &http.Server{Addr: ":8080"}
go func() {
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGTERM)
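The Go fragment above breaks off after registering the signal handler. A complete version could look like the following sketch. It is not the original code: it serves in a background goroutine and waits for the signal in `main`, which keeps the process alive until `Shutdown` returns. The 30-second limit and the `/health` route mirror the Node.js example; `healthStatus`, `newMux` and the placeholder `/` handler are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// shuttingDown flips to true once SIGTERM arrives (atomic.Bool needs Go 1.19+).
var shuttingDown atomic.Bool

// healthStatus maps the shutdown flag to the status code /health should return.
func healthStatus(down bool) int {
	if down {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// newMux builds the routes; the "/" handler stands in for a real application.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(healthStatus(shuttingDown.Load()))
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello\n"))
	})
	return mux
}

func main() {
	srv := &http.Server{Addr: ":8080", Handler: newMux()}

	// Serve in the background so main can block on the signal.
	go func() {
		err := srv.ListenAndServe()
		if err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen: %v", err)
		}
	}()

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGTERM)
	<-sigChan
	log.Println("SIGTERM received, starting graceful shutdown...")
	shuttingDown.Store(true)

	// Shutdown stops accepting connections and waits for in-flight requests,
	// giving up when the context expires.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	err := srv.Shutdown(ctx)
	cancel()
	if err != nil {
		log.Printf("forced shutdown after timeout: %v", err)
		os.Exit(1)
	}
	log.Println("All connections closed, exiting.")
}
```

`ListenAndServe` returns `http.ErrServerClosed` as soon as `Shutdown` is called, which is why that error is treated as a normal exit rather than a failure.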
until curl -sf http://127.0.0.1:$APP_PORT_1/health > /dev/null; do
    sleep 1
done
# Step 5: Bring server 1 back, drain server 2
echo "Switching traffic..."
sudo sed -i "s/server 127.0.0.1:$APP_PORT_1 down;/server 127.0.0.1:$APP_PORT_1;/" /etc/nginx/conf.d/upstream.conf
sudo sed -i "s/server 127.0.0.1:$APP_PORT_2;/server 127.0.0.1:$APP_PORT_2 down;/" /etc/nginx/conf.d/upstream.conf
sudo nginx -s reload
sleep $DRAIN_TIMEOUT
# Step 6: Deploy to server 2
echo "Deploying to server 2..."
sudo systemctl stop app-2
# ... deploy code ...
sudo systemctl start app-2
until curl -sf http://127.0.0.1:$APP_PORT_2/health > /dev/null; do
    sleep 1
done
# Step 7: Bring server 2 back
sudo sed -i "s/server 127.0.0.1:$APP_PORT_2 down;/server 127.0.0.1:$APP_PORT_2;/" /etc/nginx/conf.d/upstream.conf
sudo nginx -s reload
echo "=== Deployment complete ==="
systemd Graceful Stop
# Configure systemd to send SIGTERM and wait
[Service]
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
# systemd sends SIGTERM, waits 30s, then SIGKILL
Best Practices
- Implement SIGTERM handlers in your application for graceful shutdown
- Use health check endpoints that return 503 during shutdown
- Set appropriate timeouts: drain time should exceed your longest request duration
- Use proxy_next_upstream so Nginx retries failed requests on healthy backends
- Automate the process: manual draining is error-prone, so script it
- Test your deployment process under load to verify zero errors during rollout