Why Test Backup Integrity?
A backup that cannot be restored is not a backup. Studies show that up to 30% of restores fail due to corruption, configuration drift, or incomplete data. Automated restore testing ensures your backups are recoverable before you need them in an emergency.
The Verification Pipeline
- Restore — Restore the latest backup to a test environment
- Validate — Check data integrity, row counts, and checksums
- Report — Log results and alert on failures
- Cleanup — Tear down the test environment
MySQL Backup Verification
#!/bin/bash
# /usr/local/bin/verify-mysql-backup.sh
set -euo pipefail
BACKUP_DIR="/var/backups/mysql"
TEST_DB="backup_verify_test"
LOG="/var/log/backup-verify.log"
log() { echo "[$(date)] $1" >> "$LOG"; }
LATEST=$(ls -t "$BACKUP_DIR"/*.sql.gz 2>/dev/null | head -1)
if [ -z "$LATEST" ]; then
log "ERROR: No backup files found"
exit 1
fi
BACKUP_AGE=$(( ($(date +%s) - $(stat -c %Y "$LATEST")) / 3600 ))
log "Testing: $LATEST (${BACKUP_AGE}h old)"
mysql -e "DROP DATABASE IF EXISTS $TEST_DB; CREATE DATABASE $TEST_DB;"
RESTORE_START=$(date +%s)
if zcat "$LATEST" | mysql "$TEST_DB" 2>>"$LOG"; then
RESTORE_TIME=$(( $(date +%s) - RESTORE_START ))
log "Restore OK in ${RESTORE_TIME}s"
else
log "ERROR: Restore failed"
mysql -e "DROP DATABASE IF EXISTS $TEST_DB;"
exit 1
fi
ERRORS=0
TABLE_COUNT=$(mysql -N -e "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='$TEST_DB';")
[ "$TABLE_COUNT" -lt 5 ] && ERRORS=$((ERRORS + 1))
for TABLE in users orders products; do
COUNT=$(mysql -N -e "SELECT COUNT(*) FROM \`$TEST_DB\`.\`$TABLE\`;" 2>/dev/null || echo "0")
if [ "$COUNT" -eq 0 ]; then
log "ERROR: Table $TABLE is empty"
ERRORS=$((ERRORS + 1))
else
log "OK: $TABLE has $COUNT rows"
fi
done
mysql -e "DROP DATABASE IF EXISTS $TEST_DB;"
if [ "$ERRORS" -eq 0 ]; then
log "SUCCESS: $TABLE_COUNT tables verified in ${RESTORE_TIME}s"
curl -fsS "https://hc-ping.com/YOUR-UUID" >/dev/null 2>&1
else
log "FAILED: $ERRORS errors"
exit 1
fi
File Backup Verification
#!/bin/bash
RESTIC_REPO="rclone:b2-backup:my-server-backup"
export RESTIC_PASSWORD_FILE="/root/.restic-password"
VERIFY_DIR="/tmp/backup-verify-$$"
# Check repository integrity
restic -r "$RESTIC_REPO" check --read-data-subset=5%
# Test restore of critical files
mkdir -p "$VERIFY_DIR"
restic -r "$RESTIC_REPO" restore latest --target "$VERIFY_DIR" \
--include "/etc/nginx"
for f in /etc/nginx/nginx.conf; do
if [ -f "$VERIFY_DIR$f" ] && [ -s "$VERIFY_DIR$f" ]; then
echo "OK: $f restored"
else
echo "ERROR: $f missing"
fi
done
rm -rf "$VERIFY_DIR"
Scheduling
# Daily at 6 AM (after 2 AM backup)
0 6 * * * /usr/local/bin/verify-mysql-backup.sh
30 6 * * * /usr/local/bin/verify-file-backup.sh
Prometheus Metrics
cat > /var/lib/prometheus/node-exporter/backup_verify.prom << EOF
# HELP backup_verify_success Last verification result
# TYPE backup_verify_success gauge
backup_verify_success{type="mysql"} 1
# HELP backup_verify_duration_seconds Time to verify
# TYPE backup_verify_duration_seconds gauge
backup_verify_duration_seconds{type="mysql"} ${RESTORE_TIME}
EOF
Best Practices
- Run verification daily for critical databases, weekly for less critical data
- Always test on a separate database, never overwrite production
- Verify both integrity and restore performance
- Alert on failures AND missed verification runs
- Keep verification logs for audit trails
- Do full manual restores periodically to test complete procedures
- Track restore time trends to spot growing backup sizes