How to Debug Kubernetes Pod Failures
Pod failures are one of the most common issues in Kubernetes. Whether your pods are stuck in CrashLoopBackOff, ImagePullBackOff, or Pending state, a systematic debugging approach saves hours of frustration. This guide covers the essential techniques for diagnosing pod issues on your Breeze cluster.
Step 1: Check Pod Status
Start by listing pods and their statuses:
kubectl get pods -o wide
# Pods whose phase is not Running (this also lists Succeeded pods)
kubectl get pods --field-selector=status.phase!=Running
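The phase filter above does not catch pods in CrashLoopBackOff, because their phase stays Running while the container is being restarted. A minimal sketch of a stricter filter follows. The function name `unhealthy_pods` is hypothetical, and the sketch assumes the default column layout of `kubectl get pods --no-headers` for a single namespace (NAME, READY, STATUS, ...).

```shell
# Sketch: print pods whose STATUS is neither Running nor Completed, or that
# are Running with fewer ready containers than expected (for example 1/2).
# Assumes columns NAME READY STATUS ...; not meant for --all-namespaces,
# which adds a NAMESPACE column in front.
unhealthy_pods() {
  kubectl get pods --no-headers "$@" | awk '
    {
      split($2, ready, "/")
      if (($3 != "Running" && $3 != "Completed") ||
          ($3 == "Running" && ready[1] != ready[2]))
        print $1, $2, $3
    }'
}

# Usage (extra arguments are passed through to kubectl):
#   unhealthy_pods -n my-namespace
```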
Step 2: Describe the Pod
The describe command shows events, conditions, and configuration details:
kubectl describe pod failing-pod-xyz
Look for key sections:
- Events: scheduling failures, image pull errors, and probe failures
- Conditions: PodScheduled, Initialized, ContainersReady, Ready
- State: for each container, Waiting/Running/Terminated with a reason code
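On a pod with a long spec, the Events section is at the very end of the describe output. The sketch below prints only that section. The function name `describe_events` is hypothetical, and it assumes the section starts with a line beginning with `Events:` and runs to the end of the output.

```shell
# Sketch: show only the Events section of `kubectl describe pod`.
# Assumes "Events:" starts at column 0 and is the last section printed.
describe_events() {
  kubectl describe pod "$@" | awk '/^Events:/ { show = 1 } show'
}

# Usage:
#   describe_events failing-pod-xyz
```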
Step 3: Check Container Logs
# Current container logs
kubectl logs failing-pod-xyz
# Previous container logs (if restarted)
kubectl logs failing-pod-xyz --previous
# Logs from a specific container in a multi-container pod
kubectl logs failing-pod-xyz -c sidecar-container
# Follow logs in real time
kubectl logs -f failing-pod-xyz
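For a multi-container pod it can help to collect the current and previous logs of every container in one pass. The following is a sketch under stated assumptions: the function name `all_container_logs` and the 50-line tail are arbitrary choices, and init containers are not included.

```shell
# Sketch: print recent current and previous logs for each container in a pod.
# Container names are read from the pod spec; init containers are skipped.
all_container_logs() {
  pod="$1"
  for c in $(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $c (current) ==="
    kubectl logs "$pod" -c "$c" --tail=50
    echo "=== $c (previous) ==="
    # --previous fails when the container has never restarted
    kubectl logs "$pod" -c "$c" --previous --tail=50 2>/dev/null \
      || echo "(no previous instance)"
  done
}

# Usage:
#   all_container_logs failing-pod-xyz
```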
Common Failure Scenarios
CrashLoopBackOff
The container starts, crashes, and is restarted by Kubernetes with an exponentially growing delay between attempts (capped at five minutes):
# Check exit code and reason
kubectl get pod failing-pod-xyz -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
Common causes: application error, missing environment variable, incorrect command, out-of-memory kill.
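The exit code in the terminated state often narrows down which of these causes applies. The sketch below maps a few conventional exit codes to likely causes. The function name `explain_exit_code` is hypothetical and the mapping is a rule of thumb, not something Kubernetes reports: codes above 128 usually mean the process was killed by signal number (code minus 128).

```shell
# Sketch: translate a container exit code into a likely cause.
# These are common conventions, not guarantees; always confirm with the logs.
explain_exit_code() {
  case "$1" in
    0)   echo "process exited cleanly; check the command and restartPolicy" ;;
    1)   echo "general application error; check the logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found; check the image and command" ;;
    137) echo "killed by SIGKILL; often an out-of-memory kill" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)   echo "application-specific exit code" ;;
  esac
}

# Usage:
#   code=$(kubectl get pod failing-pod-xyz \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```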
ImagePullBackOff
Kubernetes cannot pull the container image:
# Verify the image exists and tag is correct
# Check imagePullSecrets if using a private registry
kubectl get pod failing-pod-xyz -o jsonpath='{.spec.containers[0].image}'
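A frequent cause is a missing or mistyped tag. The sketch below extracts the tag from an image reference so it can be compared with what the registry actually holds. The function name `image_tag` is hypothetical. It assumes the usual convention that an omitted tag means `latest`, and it reports `digest` for references pinned with `@sha256:...`.

```shell
# Sketch: print the tag of an image reference.
# Only the last path component is inspected, so a registry port such as
# registry.example.com:5000 is not mistaken for a tag.
image_tag() {
  case "$1" in
    *@*) echo "digest"; return ;;   # pinned by digest, tag is not used
  esac
  last="${1##*/}"
  case "$last" in
    *:*) echo "${last##*:}" ;;
    *)   echo "latest" ;;           # implicit default when no tag is given
  esac
}

# Usage:
#   image_tag "$(kubectl get pod failing-pod-xyz -o jsonpath='{.spec.containers[0].image}')"
# Pull secrets attached to the pod can be listed with:
#   kubectl get pod failing-pod-xyz -o jsonpath='{.spec.imagePullSecrets[*].name}'
```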
Pending State
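The pod cannot be scheduled. Use describe and read the FailedScheduling events to see why:
kubectl describe pod failing-pod-xyz
# Common reasons:
# - Insufficient CPU/memory (scale up your Breeze nodes)
# - Node selector or affinity matches no node, or a node taint has no matching toleration
# - PVC not bound (storage class issue)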
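The scheduler's message usually names one or more of these reasons. The sketch below maps a message to the categories listed above. The function name `classify_scheduling_message` is hypothetical, and the matched phrases are assumptions about typical scheduler wording, which varies between Kubernetes versions.

```shell
# Sketch: map a scheduler message to the common Pending causes.
# The patterns are assumptions about typical wording; adjust for your version.
classify_scheduling_message() {
  msg="$1"
  case "$msg" in *Insufficient*)           echo "insufficient CPU or memory" ;; esac
  case "$msg" in *affinity*|*selector*)    echo "node selector or affinity mismatch" ;; esac
  case "$msg" in *taint*)                  echo "untolerated taint" ;; esac
  case "$msg" in *PersistentVolumeClaim*)  echo "unbound PVC" ;; esac
}

# Usage:
#   classify_scheduling_message "$(kubectl get pod failing-pod-xyz \
#     -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}')"
```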
Step 4: Interactive Debugging
Exec into a running container to inspect its environment:
kubectl exec -it running-pod-xyz -- /bin/sh
# If the container keeps crashing, use an ephemeral debug container
kubectl debug failing-pod-xyz -it --image=busybox --target=app-container
Step 5: Check Resource Usage
# Requires the Metrics Server to be installed in the cluster
kubectl top pod failing-pod-xyz
kubectl top nodes
If a container is being OOMKilled, raise its memory limit or reduce the application's memory footprint.
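To confirm an out-of-memory kill rather than infer it from usage numbers, the last termination reason of each container can be checked. This is a minimal sketch; the function name `was_oom_killed` is hypothetical.

```shell
# Sketch: succeed (exit status 0) if any container in the pod was last
# terminated with reason OOMKilled.
was_oom_killed() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' \
    | grep -q OOMKilled
}

# Usage:
#   if was_oom_killed failing-pod-xyz; then
#     kubectl get pod failing-pod-xyz \
#       -o jsonpath='{.spec.containers[*].resources.limits.memory}'
#   fi
```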
Step 6: Examine Events Cluster-Wide
kubectl get events --all-namespaces --sort-by='.lastTimestamp' --field-selector type=Warning
Debugging Checklist
- Verify the container image exists and is accessible
- Check environment variables and ConfigMaps/Secrets are correctly mounted
- Confirm resource requests do not exceed Breeze node capacity
- Validate liveness and readiness probe configurations
- Inspect PVC binding if persistent storage is involved
- Review RBAC if the pod needs to access the Kubernetes API