Log-Based Alerting with Grafana Loki

By Admin · Mar 15, 2026 · Updated Apr 24, 2026

Log-based alerting triggers notifications when specific patterns, error rates, or anomalies appear in your logs. Unlike metric-based alerting, which tracks numerical time series, log-based alerting detects events in unstructured or semi-structured text: security incidents, application errors, and infrastructure failures that metrics might not capture. This guide covers setting up log-based alerts with Grafana Loki.

Loki Ruler Configuration

# Enable the ruler in Loki config
# loki-config.yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /tmp/rules
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true
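
Because enable_api is set, the ruler also exposes endpoints you can query to confirm that rule files were loaded. A quick check, assuming Loki is reachable at localhost on its default HTTP port 3100 (adjust the address for your deployment):

# List the rule groups the ruler has loaded (returned as YAML)
curl -s http://localhost:3100/loki/api/v1/rules

# Inspect rule and alert state in Prometheus-compatible JSON
curl -s http://localhost:3100/prometheus/api/v1/rules
curl -s http://localhost:3100/prometheus/api/v1/alerts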

Alert Rules

# /loki/rules/fake/alerts.yml
groups:
  - name: application-errors
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="myapp"} |= "ERROR" [5m])) > 5
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate in application logs"
          description: "More than 5 errors/second for 5 minutes"

      - alert: DatabaseConnectionErrors
        expr: |
          count_over_time({job="myapp"} |= "Connection refused" |= "database" [5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection errors detected"

      - alert: OutOfMemory
        expr: |
          count_over_time({job="myapp"} |~ "OutOfMemory|OOM|Cannot allocate memory" [10m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Out of memory error detected"

      - alert: AuthenticationFailures
        expr: |
          sum(rate({job="auth"} |= "authentication failed" [5m])) > 10
        for: 2m
        labels:
          severity: warning
          team: security
        annotations:
          summary: "High rate of authentication failures"

      - alert: SSLCertificateExpiry
        expr: |
          count_over_time({job="nginx"} |= "SSL certificate" |= "expire" [1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiry warning in Nginx logs"

Grafana Alert Rules (Alternative)

# Instead of Loki ruler, create alerts directly in Grafana:
# 1. Alerting → Alert Rules → Create alert rule
# 2. Choose Loki as data source
# 3. Enter LogQL query
# 4. Set threshold conditions
# 5. Configure notification channels

# Example Grafana alert:
# Query: count_over_time({job="myapp"} |= "FATAL" [5m])
# Condition: WHEN last() OF query IS ABOVE 0
# Evaluate every: 1m
# For: 0m (alert immediately on FATAL errors)

Security Alerting

groups:
  - name: security-alerts
    rules:
      - alert: SSHBruteForce
        expr: |
          sum(rate({job="syslog"} |= "Failed password" [5m])) > 5
        for: 2m
        labels:
          severity: warning
          team: security

      - alert: SuspiciousCommand
        expr: |
          count_over_time({job="syslog"} |~ "wget.*\|.*sh|curl.*\|.*bash|base64.*decode" [10m]) > 0
        labels:
          severity: critical
          team: security
        annotations:
          summary: "Suspicious command execution detected"

      - alert: RootLogin
        expr: |
          count_over_time({job="syslog"} |= "session opened" |= "root" [5m]) > 0
        labels:
          severity: warning
          team: security

      - alert: FirewallDropSpike
        expr: |
          sum(rate({job="iptables"} |= "DROP" [5m])) > 100
        for: 5m
        labels:
          severity: warning
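
To exercise these rules in a staging environment without a real incident, you can write matching lines into syslog with the standard logger utility. A rough sketch, assuming the host's syslog is shipped to Loki under {job="syslog"}; the messages and the 203.0.113.10 address are made-up test values:

# A single matching line is enough to fire RootLogin (count_over_time > 0, no 'for' delay)
logger -p auth.info -t sshd "pam_unix(sshd:session): session opened for user root by (uid=0)"

# Rate-based rules like SSHBruteForce need sustained volume: ~10 lines/second for five minutes
for i in $(seq 1 3000); do
  logger -p auth.warning -t sshd "Failed password for invalid user admin from 203.0.113.10 port 52512 ssh2"
  sleep 0.1
done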

Infrastructure Alerting

groups:
  - name: infrastructure-alerts
    rules:
      - alert: DiskSpaceWarning
        expr: |
          count_over_time({job="syslog"} |= "No space left on device" [5m]) > 0
        labels:
          severity: critical

      - alert: ServiceCrashLoop
        expr: |
          count_over_time({job="systemd"} |= "service entered failed state" [15m]) > 3
        for: 1m
        labels:
          severity: critical

      - alert: NginxErrors
        expr: |
          sum(rate({job="nginx"} | json | status >= 500 [5m])) > 10
        for: 5m
        labels:
          severity: warning

Alertmanager Integration

# alertmanager.yml
route:
  receiver: default
  routes:
    - match:
        team: security
      receiver: security-team
    - match:
        severity: critical
      receiver: pagerduty

receivers:
  - name: default
    slack_configs:
      - api_url: "https://hooks.slack.com/services/xxx"
  - name: security-team
    slack_configs:
      - api_url: "https://hooks.slack.com/services/security"
  - name: pagerduty
    pagerduty_configs:
      - routing_key: "your-key"
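
Before wiring this into the ruler, you can validate the file and dry-run the routing tree with amtool, which ships with Alertmanager:

# Validate the configuration file
amtool check-config alertmanager.yml

# Show which receiver an alert with the given labels would be routed to
amtool config routes test --config.file=alertmanager.yml team=security severity=critical

Note that routing is first-match-wins: an alert carrying both team: security and severity: critical lands in security-team and never reaches pagerduty unless you add continue: true to the security route.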

Best Practices

  • Use log-based alerts for events that metrics cannot capture (specific error messages, security events)
  • Set appropriate for durations — log spikes can be transient
  • Use rate() over count_over_time() for rate-based thresholds
  • Keep alert queries simple — complex LogQL queries can be expensive to evaluate every minute
  • Combine log alerts with metric alerts for comprehensive monitoring coverage
  • Route security alerts to a dedicated channel for immediate attention
  • Test alert rules by generating matching log entries in a staging environment
