Diagnose Intermittent Server Unreachability

By Admin · Mar 15, 2026 · Updated Jun 17, 2026

Intermittent connectivity issues are among the hardest to troubleshoot because the problem is usually gone by the time you start investigating. The server works most of the time but occasionally becomes unreachable for seconds or minutes. This guide covers techniques for catching these failures as they happen and diagnosing them afterwards.

Potential Causes

Network packet loss along the path
CPU/memory spikes causing the network stack to drop packets
Full conntrack table, which causes new connections to be dropped
NIC driver issues or hardware problems
DDoS or traffic spikes
Upstream network issues (ISP, datacenter)

Continuous Monitoring Setup

# MTR — continuous path analysis (run from external server)
mtr --report-wide --report-cycles 1000 your-server-ip

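# Ping with timestamps (Linux iputils ping)
ping -D -O -i 1 your-server-ip | tee /tmp/ping-monitor.log
# -D prefixes each line with a Unix timestamp, -O reports every missed reply, -i 1 pings every second

# Parse ping log: recent round-trip times, then lost or unreachable probes
# $(NF-1) is the "time=..." field; $NF would only print the unit "ms"
awk '/time=/ {print $1, $(NF-1)}' /tmp/ping-monitor.log | tail -20
grep "no answer yet\|unreachable" /tmp/ping-monitor.log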

# Automated monitoring script
#!/bin/bash
# /opt/scripts/connectivity-monitor.sh
# Run as a user that can write to /var/log (for example root)
TARGET="your-server-ip"

while true; do
    # Recompute the log name on every pass so the file rolls over at midnight
    LOG="/var/log/connectivity-$(date +%Y%m%d).log"
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    # -c 1: send a single probe, -W 2: wait at most 2 seconds for the reply
    if ! result=$(ping -c 1 -W 2 "$TARGET" 2>&1); then
        echo "$timestamp FAIL: $result" >> "$LOG"
        # Additional diagnostics during failure
        echo "$timestamp --- traceroute ---" >> "$LOG"
        traceroute -n -w 2 -q 1 "$TARGET" >> "$LOG" 2>&1
    fi
    sleep 5
done
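
The timestamped ping log above can also be scanned for gaps in the icmp_seq sequence, which shows exactly when replies went missing. The following is a minimal sketch, assuming the log was written by Linux `ping -D`; the function name `ping_gaps` is hypothetical and not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: report gaps in icmp_seq found in a `ping -D` log.
# Usage: ping_gaps /tmp/ping-monitor.log
ping_gaps() {
    awk '
        /icmp_seq=/ && /time=/ {
            ts = $1                       # "[epoch.micros]" prefix added by ping -D
            cur = 0
            for (i = 1; i <= NF; i++)
                if ($i ~ /^icmp_seq=/) { split($i, kv, "="); cur = kv[2] + 0 }
            # A jump of more than 1 means the replies in between never arrived
            if (seen && cur > last + 1)
                printf "%s lost %d packet(s) before icmp_seq=%d\n", ts, cur - last - 1, cur
            last = cur; seen = 1
        }
    ' "$1"
}
```

Each output line carries the timestamp of the first reply after the gap, so the loss window ends at that time. The epoch value can be converted with `date -d @<seconds>` for comparison with other logs.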

Server-Side Investigation

# Check for dropped packets at the NIC level (replace eth0 with your interface name)
ip -s -s link show eth0
# Look for: RX errors, dropped, overrun, frame (the second -s adds the detailed error counters)

# Check network interface errors over time
watch -n 1 "grep eth0 /proc/net/dev"

# Check conntrack table (connection tracking)
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# If count approaches max, connections get dropped!

# Fix: Increase conntrack max
echo 262144 | sudo tee /proc/sys/net/netfilter/nf_conntrack_max
echo "net.netfilter.nf_conntrack_max = 262144" | sudo tee -a /etc/sysctl.d/99-conntrack.conf

# Check for SYN flood protection triggering
dmesg | grep "SYN flooding"
cat /proc/sys/net/ipv4/tcp_max_syn_backlog

# Check network buffer overflows
netstat -s | grep -i "overflow\|drop\|pruned"
ss -s  # Socket statistics summary

# Check for software interrupts overload
cat /proc/interrupts | grep eth
cat /proc/softirqs | head -5
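
The conntrack check above compares two numbers by eye. As a minimal sketch, the comparison can be wrapped in a small function that prints a warning once usage crosses a threshold. The function name `conntrack_usage` and the default threshold of 80% are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: conntrack_usage COUNT MAX [THRESHOLD_PERCENT]
# Prints the fill level and returns 1 when it is at or above the threshold.
conntrack_usage() {
    local count=$1 max=$2 threshold=${3:-80}
    local pct=$(( count * 100 / max ))   # integer percentage, rounded down
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARN conntrack ${pct}% full (${count}/${max})"
        return 1
    fi
    echo "OK conntrack ${pct}% full (${count}/${max})"
}

# Live usage on a host where the nf_conntrack module is loaded:
# conntrack_usage "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                 "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

Run from cron or from the monitoring loop, this logs the fill level close to the time of an outage. When the table does overflow, the kernel log typically contains "nf_conntrack: table full, dropping packet".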

CPU and Memory Correlation

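# Check if unreachability correlates with resource spikes
# Install sysstat (provides sar); history only exists from the moment collection is enabled
sudo apt install sysstat
# On Debian/Ubuntu, set ENABLED="true" in /etc/default/sysstat if collection is off, then:
sudo systemctl enable --now sysstat

# View historical CPU usage (look for %idle near 0)
# Debian/Ubuntu path shown; RHEL-family systems keep these files under /var/log/sa/
sar -u -f /var/log/sysstat/sa$(date +%d) | grep -v "Average"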

# View historical memory usage
sar -r -f /var/log/sysstat/sa$(date +%d)

# Check OOM killer activity
journalctl | grep -i "oom\|out of memory"
dmesg | grep -i "oom\|killed process"

# Check for IO wait causing unresponsiveness
iostat -x 1 10  # Monitor disk IO: 10 samples, one per second
# High %iowait (or %util near 100) points to a disk bottleneck, which can make the server unresponsive
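
To line the sar history up with the failure timestamps in the connectivity log, it helps to reduce the CPU report to the samples where the machine was nearly saturated. The following is a minimal sketch, assuming the usual `sar -u` layout in which %idle is the last column. The function name `busy_samples` and the default cutoff of 10% idle are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: busy_samples [MIN_IDLE_PERCENT]
# Reads `sar -u` output on stdin and prints the "all" CPU samples whose
# %idle (last column) is below the given cutoff.
busy_samples() {
    local min_idle=${1:-10}
    awk -v min="$min_idle" '
        $1 != "Average:" &&
        /[[:space:]]all[[:space:]]/ &&
        $NF ~ /^[0-9.]+$/ &&
        $NF + 0 < min + 0 { print }
    '
}

# Example (LC_ALL=C keeps the decimal point as "."):
# LC_ALL=C sar -u -f /var/log/sysstat/sa$(date +%d) | busy_samples 10
```

The timestamps on the printed lines can then be compared with the FAIL entries written by the monitoring script.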

Network Stack Tuning

# Increase SYN backlog
echo "net.ipv4.tcp_max_syn_backlog = 65535" | sudo tee -a /etc/sysctl.d/99-network.conf

# Increase somaxconn (listen backlog)
echo "net.core.somaxconn = 65535" | sudo tee -a /etc/sysctl.d/99-network.conf

# Increase network buffer sizes
echo "net.core.rmem_max = 16777216" | sudo tee -a /etc/sysctl.d/99-network.conf
echo "net.core.wmem_max = 16777216" | sudo tee -a /etc/sysctl.d/99-network.conf
echo "net.core.netdev_max_backlog = 5000" | sudo tee -a /etc/sysctl.d/99-network.conf

# Ensure TCP timestamps and window scaling are enabled (both are on by default on modern kernels)
echo "net.ipv4.tcp_timestamps = 1" | sudo tee -a /etc/sysctl.d/99-network.conf
echo "net.ipv4.tcp_window_scaling = 1" | sudo tee -a /etc/sysctl.d/99-network.conf

sudo sysctl -p /etc/sysctl.d/99-network.conf
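
Because `tee -a` appends, running the commands above a second time leaves duplicate lines in the file. One possible alternative is to write the whole file in a single step so that repeated runs give the same result. The sketch below reuses the values from this section; the function name `write_network_sysctl` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write_network_sysctl [PATH]
# Overwrites the target file with the tuning values from this section,
# so running it repeatedly never produces duplicate entries.
write_network_sysctl() {
    local conf=${1:-/etc/sysctl.d/99-network.conf}
    cat > "$conf" <<'EOF'
net.ipv4.tcp_max_syn_backlog = 65535
net.core.somaxconn = 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
EOF
}
```

Writing to /etc/sysctl.d requires root. After the file is written, `sudo sysctl -p /etc/sysctl.d/99-network.conf` applies it as before.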

Best Practices

Set up continuous monitoring from an external location to catch intermittent failures
Check the conntrack table: a full table drops new connections, and the kernel log is the only place it shows up
Correlate with resource usage: CPU, memory, and disk I/O spikes often cause network unreachability
Check NIC error counters: rising dropped/error counts indicate hardware or driver issues
Use MTR for path analysis: it shows both latency and packet loss per hop
Collect data during outages: use automated scripts that run diagnostics as soon as a failure is detected