Log-based alerting triggers notifications when specific patterns, error rates, or anomalies appear in your logs. Unlike metric-based alerting that tracks numerical values, log alerting detects events in unstructured or semi-structured text — security incidents, application errors, and infrastructure failures that might not be captured by metrics. This guide covers setting up log-based alerts with Grafana Loki.
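Under the hood, a Loki alert is just a LogQL metric query that turns matching log lines into a number the ruler can compare against a threshold. A minimal example, assuming your application logs are scraped under a `job="myapp"` label:

# Errors per second in application logs over the last 5 minutes
sum(rate({job="myapp"} |= "ERROR" [5m]))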
Loki Ruler Configuration
# Enable the ruler in Loki config
# loki-config.yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /tmp/rules
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true
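With `enable_api: true`, the ruler also serves a rules API, which is a quick way to confirm that rule files were actually loaded. A sketch of that check, assuming a single-binary Loki listening on port 3100 with multi-tenancy disabled:

# List rule groups the ruler has loaded
curl -s http://localhost:3100/loki/api/v1/rules

# Prometheus-compatible view, including rule health and last evaluation
curl -s http://localhost:3100/prometheus/api/v1/rules
# (if multi-tenancy is enabled, add: -H "X-Scope-OrgID: <tenant>")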
Alert Rules
# /loki/rules/fake/alerts.yml
groups:
  - name: application-errors
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="myapp"} |= "ERROR" [5m])) > 5
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate in application logs"
          description: "More than 5 errors/second for 5 minutes"
      - alert: DatabaseConnectionErrors
        expr: |
          count_over_time({job="myapp"} |= "Connection refused" |= "database" [5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection errors detected"
      - alert: OutOfMemory
        expr: |
          count_over_time({job="myapp"} |~ "OutOfMemory|OOM|Cannot allocate memory" [10m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Out of memory error detected"
      - alert: AuthenticationFailures
        expr: |
          sum(rate({job="auth"} |= "authentication failed" [5m])) > 10
        for: 2m
        labels:
          severity: warning
          team: security
        annotations:
          summary: "High rate of authentication failures"
      - alert: SSLCertificateExpiry
        expr: |
          count_over_time({job="nginx"} |= "SSL certificate" |= "expire" [1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiry warning in Nginx logs"
Grafana Alert Rules (Alternative)
# Instead of the Loki ruler, create alerts directly in Grafana:
# 1. Alerting → Alert Rules → Create alert rule
# 2. Choose Loki as data source
# 3. Enter LogQL query
# 4. Set threshold conditions
# 5. Configure notification channels
# Example Grafana alert:
# Query: count_over_time({job="myapp"} |= "FATAL" [5m])
# Condition: WHEN last() OF query IS ABOVE 0
# Evaluate every: 1m
# For: 0m (alert immediately on FATAL errors)
Security Alerting
groups:
  - name: security-alerts
    rules:
      - alert: SSHBruteForce
        expr: |
          sum(rate({job="syslog"} |= "Failed password" [5m])) > 5
        for: 2m
        labels:
          severity: warning
          team: security
      - alert: SuspiciousCommand
        expr: |
          count_over_time({job="syslog"} |~ `wget.*\|.*sh|curl.*\|.*bash|base64.*decode` [10m]) > 0
        labels:
          severity: critical
          team: security
        annotations:
          summary: "Suspicious command execution detected"
      - alert: RootLogin
        expr: |
          count_over_time({job="syslog"} |= "session opened" |= "root" [5m]) > 0
        labels:
          severity: warning
          team: security
      - alert: FirewallDropSpike
        expr: |
          sum(rate({job="iptables"} |= "DROP" [5m])) > 100
        for: 5m
        labels:
          severity: warning
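These rules assume host and firewall logs reach Loki under `job="syslog"` and `job="iptables"` labels. A minimal Promtail scrape config along those lines (paths and job names are illustrative and depend on your distribution):

scrape_configs:
  - job_name: auth-logs
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          __path__: /var/log/auth.log   # "Failed password" / "session opened" entries on Debian/Ubuntu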
Infrastructure Alerting
groups:
  - name: infrastructure-alerts
    rules:
      - alert: DiskSpaceWarning
        expr: |
          count_over_time({job="syslog"} |= "No space left on device" [5m]) > 0
        labels:
          severity: critical
      - alert: ServiceCrashLoop
        expr: |
          count_over_time({job="systemd"} |= "service entered failed state" [15m]) > 3
        for: 1m
        labels:
          severity: critical
      - alert: NginxErrors
        expr: |
          sum(rate({job="nginx"} | json | status >= 500 [5m])) > 10
        for: 5m
        labels:
          severity: warning
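The NginxErrors rule assumes the access log is written as JSON so that `| json` can extract a `status` field to filter on. One way to produce such logs, with illustrative field names, is a JSON `log_format` in the `http` block of nginx.conf:

# nginx.conf (http block): emit access logs as JSON for LogQL's json parser
log_format json_combined escape=json
  '{"time":"$time_iso8601","status":"$status","request":"$request","request_time":"$request_time"}';
access_log /var/log/nginx/access.log json_combined;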
Alertmanager Integration
# alertmanager.yml
route:
  receiver: default
  routes:
    - match:
        team: security
      receiver: security-team
    - match:
        severity: critical
      receiver: pagerduty

receivers:
  - name: default
    slack_configs:
      - api_url: "https://hooks.slack.com/services/xxx"
  - name: security-team
    slack_configs:
      - api_url: "https://hooks.slack.com/services/security"
  - name: pagerduty
    pagerduty_configs:
      - routing_key: "your-key"
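Note that Alertmanager stops at the first matching route by default, so with the routing above a critical security alert would reach only security-team and never PagerDuty. If it should do both, set `continue: true` on the security route so evaluation falls through to the next match:

routes:
  - match:
      team: security
    receiver: security-team
    continue: true   # keep evaluating later routes as well
  - match:
      severity: critical
    receiver: pagerduty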
Best Practices
- Use log-based alerts for events that metrics cannot capture (specific error messages, security events)
- Set appropriate `for` durations, since log spikes can be transient
- Use `rate()` over `count_over_time()` for rate-based thresholds
- Keep alert queries simple; complex LogQL queries can be expensive to evaluate every minute
- Combine log alerts with metric alerts for comprehensive monitoring coverage
- Route security alerts to a dedicated channel for immediate attention
- Test alert rules by generating matching log entries in a staging environment, as sketched below
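For example, on a staging host whose syslog is shipped to Loki as `job="syslog"`, a single `logger` line is enough to trip the DiskSpaceWarning rule above, and logcli confirms the entry arrived:

# Staging only: write a syslog entry that matches the DiskSpaceWarning pattern
logger -t alert-test "No space left on device"

# Verify Loki ingested it before waiting for the alert to fire
logcli query --since=10m '{job="syslog"} |= "No space left on device"'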