Grafana Loki is a log aggregation system designed to be cost-effective and easy to operate. Unlike Elasticsearch, Loki indexes only metadata (labels), not the full text of log messages. This keeps the index small and storage cheap, while LogQL still lets you filter, parse, and aggregate log content at query time. This guide covers installation, LogQL query patterns, Promtail configuration, and log-based alerting.
Installation
# docker-compose.yml
services:
  loki:
    image: grafana/loki:2.9.4
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:2.9.4
    volumes:
      - /var/log:/var/log:ro
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
volumes:
  loki_data:
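When both containers are up, you can check that Loki accepts data by pushing a line yourself. The sketch below makes three assumptions: Loki is reachable at http://localhost:3100 (the port published above), it uses only the Python standard library, and the helper names are illustrative rather than part of any Loki client. It targets the JSON form of the push endpoint. In that format, each stream is a label set plus a list of [nanosecond timestamp, line] pairs.

```python
import json
import time
import urllib.request

# Assumption: Loki's port 3100 is published on localhost, as in the Compose file above.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_payload(labels, lines, ts_ns=None):
    """Build the JSON body for Loki's push endpoint.

    One stream is identified by its label set; each value is a
    [timestamp in nanoseconds as a string, log line] pair.
    """
    start = time.time_ns() if ts_ns is None else ts_ns
    return {
        "streams": [
            {
                "stream": dict(labels),
                # Offset each line by 1 ns so entries keep their order.
                "values": [[str(start + i), line] for i, line in enumerate(lines)],
            }
        ]
    }


def push(labels, lines):
    """POST the lines to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(labels, lines)).encode("utf-8")
    request = urllib.request.Request(
        LOKI_PUSH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

For example, push({"job": "myapp"}, ["hello loki"]) should return 204 if Loki accepted the line. The query {job="myapp"} in Grafana Explore should then show it.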
LogQL Query Basics
# Stream selectors (filter by labels)
{job="nginx"}
{job="myapp", level="error"}
{job=~"api|web"} # Regex match; must match the whole value ("api" or "web")
# Line filters (search the text of the selected streams)
{job="myapp"} |= "timeout" # Line contains "timeout"
{job="myapp"} != "healthcheck" # Line does NOT contain "healthcheck"
{job="myapp"} |~ "error|fatal" # Regex line filter (matches anywhere in the line)
{job="myapp"} !~ "debug|trace" # Negative regex filter
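To make the matching rules concrete, here is a rough Python model of how a label matcher and a line filter decide whether something passes. It is an illustration, not Loki's implementation. Loki uses Go's RE2 engine, while this model uses Python's re, which behaves the same for simple patterns like these. The function names are hypothetical.

```python
import re


def label_matches(labels, name, op, value):
    """Evaluate one stream-selector matcher against a stream's labels.

    Regex matchers (=~, !~) must match the whole value, so "api|web"
    matches "api" and "web" but not "api-gateway".
    """
    actual = labels.get(name, "")  # a missing label compares as an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown label operator {op!r}")


def line_matches(line, op, value):
    """Evaluate one line filter; regex line filters match anywhere in the line."""
    if op == "|=":
        return value in line
    if op == "!=":
        return value not in line
    if op == "|~":
        return re.search(value, line) is not None
    if op == "!~":
        return re.search(value, line) is None
    raise ValueError(f"unknown line filter {op!r}")
```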
Parsing and Filtering
# JSON log parsing
{job="myapp"} | json | level="error"
{job="myapp"} | json | status >= 500
{job="myapp"} | json | duration > 1000 | line_format "{{.method}} {{.path}} took {{.duration}}ms"
# Logfmt parsing
{job="myapp"} | logfmt | level="error" | duration > 5s
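The logfmt query above does three things. It splits each line into key=value pairs, keeps lines whose level is error, and compares duration as a time value rather than as text. A simplified Python model follows. Its parser handles only bare and double-quoted values, and its unit table assumes Go-style duration strings such as 6.2s or 1m30s. The helper names are hypothetical.

```python
import re

# key=value pairs; values may be bare or double-quoted (with \" escapes).
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')
# Go-style duration strings such as "250ms", "6.2s" or "1m30s".
_DURATION = re.compile(r"(?:\d+(?:\.\d+)?(?:ns|us|µs|ms|s|m|h))+")
_DUR_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_logfmt(line):
    """Split a logfmt line into a dict of string fields."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def parse_duration(text):
    """Return a duration in seconds, or None if text is not a duration."""
    if not _DURATION.fullmatch(text):
        return None
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _DUR_PART.findall(text))


def slow_errors(lines, threshold="5s"):
    """Model of: | logfmt | level="error" | duration > 5s"""
    limit = parse_duration(threshold)
    matches = []
    for line in lines:
        fields = parse_logfmt(line)
        took = parse_duration(fields.get("duration", ""))
        if fields.get("level") == "error" and took is not None and took > limit:
            matches.append(line)
    return matches
```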
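# Regex extraction (named groups become labels; backticks avoid double-escaping)
{job="nginx"} | regexp `(?P<ip>\d+\.\d+\.\d+\.\d+).*?" (?P<status>\d{3}) ` | status="500"
# Pattern matching (<_> captures a field without naming it)
{job="nginx"} | pattern `<ip> - - [<timestamp>] "<method> <path> <_>" <status> <_>`
  | status >= 400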
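A regex is easy to get subtly wrong, so test it against a real access-log line before using it in a query. The snippet below runs the regexp stage's pattern through Python's re on a sample line. Python's re and RE2 use the same syntax for named groups and lazy quantifiers. The sample assumes nginx's combined log format, so adapt it to your own logs. The lazy .*? matters here: a greedy .* backtracks to the last three-digit run on the line, which can sit inside the user agent.

```python
import re

# Pattern from the regexp stage above. Python's re and Go's RE2 agree on the
# syntax used here (named groups, lazy .*?), though they differ in general.
NGINX_RE = re.compile(r'(?P<ip>\d+\.\d+\.\d+\.\d+).*?" (?P<status>\d{3}) ')
# The greedy variant, for comparison: .* backtracks to the LAST 3-digit run.
GREEDY_RE = re.compile(r"(?P<ip>\d+\.\d+\.\d+\.\d+).*(?P<status>\d{3})")

# Hypothetical access-log line in nginx's "combined" format.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api/users HTTP/1.1" '
    '500 612 "-" "Mozilla/5.0 AppleWebKit/537.36"'
)


def extract(line, pattern=NGINX_RE):
    """Return the fields the regexp stage would extract, or {} if no match."""
    m = pattern.search(line)
    return m.groupdict() if m else {}
```

On SAMPLE, the lazy pattern yields status 500, while the greedy one picks up 537 from the user agent.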
Metric Queries
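# Count lines containing "error" in each 1-minute window (one series per stream)
count_over_time({job="myapp"} |= "error" [1m])
# Error lines per second, summed per service (assumes streams carry a "service" label)
sum(rate({job="myapp"} |= "error" [5m])) by (service)
# Request rate (log lines per second) from access logs
sum(rate({job="nginx"} [5m]))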
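For log queries, count_over_time and rate are closely related. rate is the count of matching lines in the range divided by the range length in seconds. The model below spells out that arithmetic and the sum ... by (service) step. It assumes you already have each stream's lines for the last five minutes, and the function names are hypothetical.

```python
from collections import defaultdict


def count_matching(lines, needle="error"):
    """count_over_time(<selector> |= "error" [range]) for one stream's lines in the range."""
    return sum(1 for line in lines if needle in line)


def sum_rate_by(streams, range_s=300, by="service", needle="error"):
    """Model of sum(rate(<selector> |= "error" [5m])) by (service).

    streams: list of (labels, lines-in-the-last-range) pairs, one per stream.
    rate() is the matching-line count divided by the range in seconds;
    sum by (...) adds the per-stream rates that share the grouping label.
    """
    totals = defaultdict(float)
    for labels, lines in streams:
        totals[labels.get(by, "")] += count_matching(lines, needle) / range_s
    return dict(totals)
```

Suppose two api streams contain 600 and 300 matching lines over 5 minutes. The api series then comes out at 2 + 1 = 3 lines per second.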
# P99 latency from parsed logs (drop lines the json parser rejected before unwrapping)
quantile_over_time(0.99, {job="myapp"} | json | __error__="" | unwrap duration [5m]) by (endpoint)
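quantile_over_time computes a quantile of the unwrapped values within each range. The sketch below assumes the Prometheus-style estimator: sort the values, then interpolate linearly between the two nearest ranks. Treat it as an approximation, and check your Loki version's behavior if exact figures matter. The endpoint and duration field names come from the query above. The function names are hypothetical.

```python
import math
from collections import defaultdict


def quantile(q, values):
    """q-quantile with linear interpolation between the two nearest ranks."""
    if not values:
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def p99_by_endpoint(entries):
    """Model of quantile_over_time(0.99, ... | unwrap duration [5m]) by (endpoint).

    entries: parsed JSON objects from one 5-minute range.
    """
    groups = defaultdict(list)
    for e in entries:
        try:
            duration = float(e["duration"])
        except (KeyError, TypeError, ValueError):
            continue  # Loki would flag such lines with an __error__ label instead
        groups[e.get("endpoint", "")].append(duration)
    return {endpoint: quantile(0.99, vals) for endpoint, vals in groups.items()}
```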
# Top 10 error messages
topk(10, sum(count_over_time({job="myapp"} | json | level="error" [1h])) by (message))
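The topk query amounts to tallying error messages and keeping the ten most frequent. Here is the same idea in Python, assuming the entries are already-parsed JSON objects with level and message keys. The function name is hypothetical.

```python
from collections import Counter


def top_error_messages(entries, k=10):
    """Model of topk(10, sum(count_over_time(... | level="error" [1h])) by (message)).

    entries: parsed JSON objects from the last hour. Returns up to k
    (message, count) pairs, most frequent first. Ties may be ordered
    differently than in Loki.
    """
    counts = Counter(e.get("message", "") for e in entries if e.get("level") == "error")
    return counts.most_common(k)
```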
Alerting with Loki
# Loki ruler rule file (Prometheus-compatible format)
# /etc/loki/rules/alerts.yml
# The file must be where the ruler's storage config expects it; with
# auth_enabled: false, local rule storage typically uses a tenant subdirectory named "fake".
groups:
  - name: application-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="myapp"} |= "error" [5m])) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 10 errors per second for 5 minutes"
      # by (job) keeps one series per job instead of one per combination of extracted JSON fields
      - alert: SlowRequests
        expr: |
          quantile_over_time(0.99, {job="myapp"} | json | __error__="" | unwrap duration [5m]) by (job) > 5000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency exceeds 5 seconds"
      - alert: OutOfMemoryErrors
        expr: |
          count_over_time({job="myapp"} |= "OutOfMemoryError" [15m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "OOM error detected in application logs"
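The for clause is what separates a brief spike from a sustained problem. The simulation below follows the Prometheus-style alert lifecycle that Loki's ruler is modeled on. A rule becomes "pending" when its expression first returns a result and "firing" only after that has held for the for duration. A rule without for, like OutOfMemoryErrors, fires on the first hit. The model is simplified (fixed evaluation interval, no keep_firing_for), and the function is hypothetical.

```python
def alert_states(results, for_s=300, interval_s=60):
    """Simulate one alert rule across evaluations.

    results: one boolean per evaluation, True when the expression returned
    a sample (the threshold was crossed). Any False resets the rule to
    "inactive"; for_s=0 means the rule fires on the first True.
    """
    states = []
    active_since = None
    for i, breached in enumerate(results):
        now = i * interval_s
        if not breached:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # condition just started holding
        states.append("firing" if now - active_since >= for_s else "pending")
    return states
```

With for: 5m and a one-minute evaluation interval, a condition that keeps holding is "pending" for five evaluations and "firing" from the sixth onward.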
Grafana Dashboard Panels
# Log volume over time
sum(count_over_time({job="myapp"}[$__interval])) by (level)
# Visualization: time series, stacked bars
# Error log table
{job="myapp"} | json | level="error"
# Visualization: logs panel
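# Request latency heatmap (unwrap is only valid inside a range aggregation)
avg_over_time({job="myapp"} | json | __error__="" | unwrap duration [$__interval]) by (path)
# Visualization: heatmap, with the panel calculating buckets from the series values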
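Behind each metric panel, Grafana sends a range query to Loki's HTTP API, with $__interval already replaced by a concrete duration. The helper below builds such a request URL for /loki/api/v1/query_range, so you can reproduce a panel's query outside Grafana. The base URL is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Loki is published on localhost:3100, as in the Compose file.
LOKI_URL = "http://localhost:3100"


def query_range_url(query, start_ns, end_ns, step_s):
    """Build a Loki range-query URL.

    start_ns/end_ns are Unix timestamps in nanoseconds; step_s is the
    spacing between returned points in seconds. Pass the query with any
    Grafana variables (such as $__interval) already substituted.
    """
    params = {"query": query, "start": start_ns, "end": end_ns, "step": f"{step_s}s"}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"
```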
Promtail Configuration
# promtail-config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          __path__: /var/log/syslog
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
  - job_name: myapp
    static_configs:
      - targets: [localhost]
        labels:
          job: myapp
          __path__: /var/log/myapp/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
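The myapp job's two pipeline stages work as a pair. The json stage copies the JSON level field into Promtail's extracted map, and the labels stage promotes that value to a stream label, leaving the line itself unchanged. Here is a minimal Python model of that flow. It is not Promtail's code, and the function name is hypothetical.

```python
import json


def promtail_pipeline(static_labels, line):
    """Model of the myapp job's pipeline: json stage, then labels stage.

    Returns (stream labels, log line). The line itself is sent unchanged;
    only the label set grows. Lines that are not JSON objects keep just
    the static labels.
    """
    labels = dict(static_labels)
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return labels, line
    # json stage: expressions {level: level} copies the "level" field into the extracted map.
    extracted = {"level": parsed.get("level")} if isinstance(parsed, dict) else {}
    # labels stage: "level:" with no value promotes the extracted key of the same name.
    if extracted.get("level") is not None:
        labels["level"] = str(extracted["level"])
    return labels, line
```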
Best Practices
- Design labels carefully: high-cardinality values such as user_id or request_id should not be labels; extract them at query time with a parser instead
- Use structured logging (JSON) in applications so logs can be queried with | json
- Keep the label count low (5-10 labels at most per stream): every unique combination of label values creates a separate stream, and many small streams hurt Loki's performance
- Build alert expressions with metric functions such as rate() and count_over_time(); the ruler evaluates metric queries, not raw log queries
- Set retention policies to manage storage costs
- Use Promtail pipeline stages to extract labels and structure data at ingestion time
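The label-cardinality advice comes down to counting, because every distinct combination of label values is a separate stream. This sketch uses a hypothetical helper and made-up sample data to count the streams a set of labels would produce. It shows why request_id belongs in the log line rather than in a label.

```python
def stream_count(entries, label_names):
    """Number of distinct streams if these fields were used as labels.

    Every unique combination of label values becomes its own stream.
    """
    return len({tuple(str(e.get(name, "")) for name in label_names) for e in entries})


# Hypothetical sample: 1,000 requests from one app, two log levels.
ENTRIES = [
    {"job": "myapp", "level": "error" if i % 10 == 0 else "info", "request_id": f"req-{i}"}
    for i in range(1000)
]
```

In this sample, job and level give 2 streams, while adding request_id gives 1,000 streams, one per request.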