PagerDuty integration with Prometheus Alertmanager creates a robust on-call alerting pipeline — Prometheus detects issues, Alertmanager routes and deduplicates alerts, and PagerDuty handles escalation, on-call scheduling, and notification delivery. This guide covers configuring the complete alerting pipeline.
Alertmanager Configuration
# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  pagerduty_url: "https://events.pagerduty.com/v2/enqueue"

route:
  receiver: default
  group_by: [alertname, instance]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Critical alerts → PagerDuty (immediate)
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 1h
    # Warning alerts → PagerDuty (low urgency)
    - match:
        severity: warning
      receiver: pagerduty-warning
      repeat_interval: 4h
    # Info alerts → Slack only
    - match:
        severity: info
      receiver: slack-info

receivers:
  - name: default
    slack_configs:
      - api_url: "https://hooks.slack.com/services/xxx"
        channel: "#alerts"

  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: "your-pagerduty-integration-key"
        severity: critical
        description: '{{ template "pagerduty.default.description" . }}'
        details:
          firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
          resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    slack_configs:
      - api_url: "https://hooks.slack.com/services/xxx"
        channel: "#alerts-critical"

  - name: pagerduty-warning
    pagerduty_configs:
      - routing_key: "your-pagerduty-integration-key"
        severity: warning

  - name: slack-info
    slack_configs:
      - api_url: "https://hooks.slack.com/services/xxx"
        channel: "#alerts-info"

inhibit_rules:
  # If critical is firing, suppress warning for same alertname
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: [alertname, instance]
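Before reloading, it is worth validating the file. A minimal check-and-reload sequence, assuming amtool is installed alongside Alertmanager on the default port:

# Validate the configuration file
amtool check-config /etc/alertmanager/alertmanager.yml

# Reload Alertmanager without a restart (sending SIGHUP to the process also works)
curl -X POST http://localhost:9093/-/reload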
PagerDuty Setup
- In PagerDuty: Services → New Service → "Production Monitoring"
- Integration: select "Prometheus" or "Events API v2"
- Copy the Integration Key (routing_key); the key can be verified directly against the Events API, as shown after this list
- Configure escalation policy and on-call schedule
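Before wiring the key into Alertmanager, you can exercise it directly against the PagerDuty Events API v2. The payload below is a minimal trigger event; the routing_key placeholder and source name are yours to substitute:

curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "your-pagerduty-integration-key",
    "event_action": "trigger",
    "payload": {
      "summary": "Manual test of the PagerDuty integration key",
      "source": "test-server",
      "severity": "critical"
    }
  }'

A 202 response and a new incident on the service confirm the key and escalation policy are working.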
Alert Rules
# /etc/prometheus/rules/alerts.yml
groups:
  - name: infrastructure
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for 2 minutes"

      - alert: HighCPU
        expr: instance:node_cpu_utilization:ratio > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanizePercentage }}"

      - alert: DiskAlmostFull
        expr: instance:node_filesystem_utilization:ratio > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk almost full on {{ $labels.instance }}"
          description: "Disk usage is {{ $value | humanizePercentage }}"
      - alert: HighMemory
        expr: instance:node_memory_utilization:ratio > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanizePercentage }}"
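The HighCPU, DiskAlmostFull, and HighMemory expressions rely on recording rules that are not shown in this guide. A sketch of what they could look like, assuming standard node_exporter metrics (the rule names come from the alert expressions; the rate window and filesystem filters are illustrative):

# /etc/prometheus/rules/recording.yml (assumed path)
groups:
  - name: node_recording
    rules:
      - record: instance:node_cpu_utilization:ratio
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      - record: instance:node_memory_utilization:ratio
        expr: 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
      - record: instance:node_filesystem_utilization:ratio
        expr: >
          1 - min by (instance) (
            node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          )

Prometheus also has to load the rule files and know where Alertmanager lives; the relevant prometheus.yml stanzas (paths and target assumed) look roughly like this:

# /etc/prometheus/prometheus.yml (excerpt)
rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]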
Custom PagerDuty Templates
# /etc/alertmanager/templates/pagerduty.tmpl
{{ define "pagerduty.default.description" }}
{{ range .Alerts.Firing }}
[FIRING] {{ .Labels.alertname }}: {{ .Annotations.summary }}
{{ end }}
{{ range .Alerts.Resolved }}
[RESOLVED] {{ .Labels.alertname }}: {{ .Annotations.summary }}
{{ end }}
{{ end }}
{{ define "pagerduty.default.instances" }}
{{ range . }}
- {{ .Labels.instance }}: {{ .Annotations.description }}
{{ end }}
{{ end }}
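Alertmanager only loads this file if the main configuration references it; a templates entry along these lines (glob assumed to match the path above) takes care of that:

# add to /etc/alertmanager/alertmanager.yml
templates:
  - "/etc/alertmanager/templates/*.tmpl"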
Testing the Pipeline
# Send a test alert to Alertmanager (api/v2 is the current alerts endpoint;
# the older api/v1 has been removed in recent Alertmanager releases)
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "critical",
      "instance": "test-server:9100"
    },
    "annotations": {
      "summary": "Test alert for PagerDuty integration",
      "description": "This is a test alert to verify the alerting pipeline"
    }
  }]'

# Verify in PagerDuty that an incident was created.
# Then resolve by re-sending the same labels with endsAt set to now (or any past time):
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {"alertname": "TestAlert", "severity": "critical", "instance": "test-server:9100"},
    "endsAt": "2025-01-15T00:00:00Z"
  }]'
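If amtool is installed, the same test can be driven from the command line instead of raw curl; something along these lines, assuming a local Alertmanager on the default port:

# Fire a test alert with labels and a summary annotation
amtool alert add alertname=TestAlert severity=critical instance=test-server:9100 \
  --annotation=summary="Test alert for PagerDuty integration" \
  --alertmanager.url=http://localhost:9093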
Best Practices
- Use severity labels consistently — critical for immediate action, warning for investigation
- Set appropriate `for` durations so you do not page on transient issues
- Use inhibition rules to suppress related lower-severity alerts
- Send critical alerts to PagerDuty AND Slack for visibility
- Set repeat_interval based on severity — critical every 1h, warning every 4h
- Test the alerting pipeline monthly to ensure it works end-to-end
- Include runbook links in alert annotations for faster incident response (see the example below)
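For instance, a runbook_url annotation (URL hypothetical) can be attached to any rule; the custom templates above could then reference {{ .Annotations.runbook_url }} to surface it in the PagerDuty incident:

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for 2 minutes"
          runbook_url: "https://runbooks.example.com/InstanceDown"  # hypothetical runbook location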