A kernel panic is Linux's equivalent of a Blue Screen of Death — the kernel encounters a fatal error it cannot recover from and halts the system. This guide covers understanding kernel panic messages, identifying root causes, and recovering your server.
What Causes Kernel Panics?
- Hardware failures: Bad RAM, failing disk, CPU overheating
- Kernel bugs: Particularly after kernel updates
- Corrupted filesystem: Root filesystem damage
- Out of memory: OOM killer can't free enough memory
- Bad kernel modules: Incompatible or buggy driver modules
- Misconfigured boot parameters: Wrong root device, missing initramfs
Reading Kernel Panic Messages
# View previous kernel panic logs (if system booted)
journalctl -b -1 -p emerg # Previous boot emergency messages
journalctl -b -1 -k # Previous boot kernel messages
# Note: "-b -1" only works with a persistent journal (Storage=persistent in /etc/systemd/journald.conf)
dmesg | grep -i "panic\|oops\|bug\|error"
# Common panic messages:
# "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block"
# → Root filesystem not found. Check GRUB config, fstab, or disk failure.
# "Kernel panic - not syncing: Out of memory and no killable processes"
# → System completely exhausted memory and swap
# "BUG: unable to handle kernel NULL pointer dereference"
# → Kernel bug, likely in a driver or module
# "Kernel panic - not syncing: Fatal exception in interrupt"
# → Hardware issue or driver bug during interrupt handling
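When triaging many machines, the message patterns above lend themselves to a quick script. A minimal sketch — `classify_panic` is a hypothetical helper, and the patterns simply mirror the messages listed in this section:

```shell
# Hypothetical helper: map a panic line to a rough cause category
classify_panic() {
  case "$1" in
    *"Unable to mount root fs"*)      echo "root-fs"      ;;
    *"Out of memory"*)                echo "oom"          ;;
    *"NULL pointer dereference"*)     echo "kernel-bug"   ;;
    *"Fatal exception in interrupt"*) echo "hw-or-driver" ;;
    *)                                echo "unknown"      ;;
  esac
}
classify_panic "Kernel panic - not syncing: Out of memory and no killable processes" # → oom
```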
Recovery Steps
1. Boot to Previous Kernel
# If panic happens after kernel update, boot older kernel via GRUB
# At GRUB menu, select "Advanced options" then choose the previous kernel
# Or via VPS console/rescue mode:
# Edit GRUB to boot old kernel
sudo vim /etc/default/grub
# Set: GRUB_DEFAULT="1>2" (submenu 1, entry 2 — usually the second-newest kernel,
# since entries are zero-indexed and recovery entries count too)
sudo update-grub
# List available kernels
dpkg --list | grep linux-image
# or
rpm -qa | grep kernel
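Once a known-good kernel is confirmed working, it can be pinned so routine updates don't remove it. A sketch — the version strings are placeholders; substitute the ones shown by the listing commands above:

```shell
# Pin the known-good kernel (version strings below are placeholders)
sudo apt-mark hold linux-image-5.15.0-91-generic   # Debian/Ubuntu
# RHEL/Rocky: use the dnf versionlock plugin
sudo dnf install python3-dnf-plugin-versionlock
sudo dnf versionlock add kernel-5.14.0-362.el9
```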
2. Rescue Mode Filesystem Repair
# Boot into rescue mode (VPS provider console)
# Check the root filesystem BEFORE mounting it — fsck must never run on a mounted filesystem
fsck -y /dev/vda1 # device name varies (vda/sda/nvme0n1p1); check with lsblk
# Mount the root filesystem and chroot into it
mount /dev/vda1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
# Rebuild initramfs
update-initramfs -u -k all # Debian/Ubuntu
dracut --force --regenerate-all # RHEL/Rocky (rebuild for all installed kernels)
# Reinstall kernel if corrupted
# Note: inside a rescue chroot, $(uname -r) reports the RESCUE kernel —
# use the version shown by dpkg --list / rpm -qa instead
apt install --reinstall linux-image-<version> # Debian/Ubuntu
dnf reinstall kernel # RHEL/Rocky
# Update GRUB
update-grub # Debian/Ubuntu
grub2-mkconfig -o /boot/grub2/grub.cfg # RHEL/Rocky
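While still in the chroot, it is worth confirming that fstab's root entry matches the device GRUB actually boots — a mismatch here is a classic cause of the "Unable to mount root fs" panic. A sketch using awk on a sample fstab (the UUID below is made up for illustration):

```shell
# Sample fstab for illustration — on a real system, read /etc/fstab instead
cat > /tmp/fstab.sample <<'EOF'
UUID=1234-abcd / ext4 defaults 0 1
/swapfile none swap sw 0 0
EOF
# Print the device and filesystem type of the root entry
awk '$2 == "/" {print $1, $3}' /tmp/fstab.sample # → UUID=1234-abcd ext4
```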
3. Check for Hardware Issues
# Check memory for errors — memtest86+ is not a shell command;
# install the memtest86+ package, then select it from the GRUB boot menu
# Check disk health
sudo smartctl -a /dev/sda
sudo smartctl -t short /dev/sda # Run short self-test
# Check for I/O errors in logs
dmesg | grep -i "error\|i/o\|sector\|medium"
journalctl -b -1 | grep -i "ata\|scsi\|error"
# Check temperature (if applicable)
sensors # requires lm-sensors package
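Two SMART attributes worth watching for panic-adjacent disk trouble are Reallocated_Sector_Ct and Current_Pending_Sector; non-zero raw values suggest a failing disk. A sketch that flags them — the sample lines below mimic smartctl's attribute table and are not from a real disk:

```shell
# Sample smartctl-style attribute lines (illustrative data, not a real disk)
cat > /tmp/smart.sample <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
EOF
# Flag the two attributes when their raw value (last column) is non-zero
awk '($2=="Reallocated_Sector_Ct" || $2=="Current_Pending_Sector") && $NF > 0 \
     {print $2, "raw =", $NF}' /tmp/smart.sample
```

On a live system, pipe `smartctl -a /dev/sda` into the same awk filter.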
4. Fix OOM-Related Panics
# Check if OOM killer was involved
journalctl -b -1 | grep -i "oom\|out of memory"
dmesg | grep -i "oom"
# Increase swap
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
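How much swap is enough? A common rule of thumb (a convention, not a requirement) is roughly 2x RAM on small machines, equal to RAM up to about 8 GiB, and a fixed floor beyond that. Sketched as a hypothetical helper:

```shell
# Hypothetical swap-sizing helper; thresholds are a rule of thumb, not a hard rule
suggest_swap_gib() {   # arg: RAM in GiB
  ram=$1
  if [ "$ram" -le 2 ]; then echo $((ram * 2))
  elif [ "$ram" -le 8 ]; then echo "$ram"
  else echo 4
  fi
}
suggest_swap_gib 4 # → 4
```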
# Configure OOM behavior
# Prevent kernel panic on OOM (let OOM killer work)
echo 0 | sudo tee /proc/sys/vm/panic_on_oom
echo "vm.panic_on_oom = 0" | sudo tee -a /etc/sysctl.d/99-oom.conf
# Protect critical services from OOM killer
echo -1000 | sudo tee /proc/"$(pgrep -o sshd)"/oom_score_adj # oldest sshd PID; pidof can return several and break the path
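Writing oom_score_adj directly only lasts until sshd restarts. A systemd drop-in makes the protection persistent — OOMScoreAdjust= is a standard systemd directive; note the unit is ssh.service on Debian/Ubuntu but sshd.service on RHEL/Rocky:

```shell
# Persist the sshd OOM protection across service restarts and reboots
sudo mkdir -p /etc/systemd/system/ssh.service.d
sudo tee /etc/systemd/system/ssh.service.d/oom.conf <<'EOF'
[Service]
OOMScoreAdjust=-1000
EOF
sudo systemctl daemon-reload
```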
5. Debug Kernel Module Issues
# List loaded modules
lsmod
# Check for recently added/changed modules
journalctl -b -1 | grep -i "module\|insmod\|modprobe"
# Blacklist problematic module
echo "blacklist problematic_module" | sudo tee /etc/modprobe.d/blacklist-custom.conf
sudo update-initramfs -u # Debian/Ubuntu (use dracut --force on RHEL/Rocky)
# Boot with minimal modules: at the GRUB menu, edit the kernel command line and append
#   modprobe.blacklist=module1,module2
# Make it permanent via GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the GRUB config
Prevention
# Enable automatic reboot after panic
echo "kernel.panic = 10" | sudo tee -a /etc/sysctl.d/99-panic.conf
sudo sysctl -p /etc/sysctl.d/99-panic.conf
# System will reboot 10 seconds after a panic
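kernel.panic only fires on a full panic; most distros leave kernel.panic_on_oops at 0, so a non-fatal oops can leave the machine limping in a half-broken state instead of rebooting. Escalating oopses to panics extends the auto-reboot policy to them — a judgment call that trades raw uptime for predictable recovery:

```shell
# Optional: escalate kernel oopses to panics so the auto-reboot above applies
echo "kernel.panic_on_oops = 1" | sudo tee -a /etc/sysctl.d/99-panic.conf
sudo sysctl -p /etc/sysctl.d/99-panic.conf
```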
# Enable kdump for panic analysis
sudo apt install kdump-tools # Debian/Ubuntu
sudo dnf install kexec-tools # RHEL/Rocky
# Captures memory dump on panic for post-mortem analysis
# Set up monitoring
# Alert on high memory, disk errors, and temperature
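For memory alerting, a tiny cron-able check is often enough. A sketch — `mem_used_pct` is a hypothetical helper, and the 90% threshold is an arbitrary example:

```shell
# Hypothetical helper: memory usage as a percentage, from total/available in kB
mem_used_pct() {   # args: total_kb available_kb
  echo $(( ($1 - $2) * 100 / $1 ))
}
# Live usage (Linux): feed it MemTotal/MemAvailable from /proc/meminfo
# total=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
# avail=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo)
# [ "$(mem_used_pct "$total" "$avail")" -ge 90 ] && echo "memory pressure high"
mem_used_pct 4000000 400000 # → 90 (example figures)
```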
Best Practices
- Keep a working kernel: Don't remove the previous kernel after updates until the new one is verified
- Enable panic reboot: kernel.panic = 10 so the server recovers automatically
- Monitor hardware health: SMART disk checks and memory monitoring prevent hardware-related panics
- Maintain adequate swap: Prevents OOM-related kernel panics
- Test kernel updates in staging before applying to production servers
- Know your rescue boot process before you need it in an emergency