How to Set Up Infrastructure Monitoring with Prometheus and Alertmanager

By Admin · Mar 2, 2026 · Updated Apr 23, 2026

Prometheus is an open-source monitoring and alerting toolkit designed for reliability. Paired with Alertmanager, it gives your Breeze infrastructure a complete monitoring pipeline: Prometheus scrapes metrics and evaluates alert rules, and Alertmanager groups, routes, and delivers the notifications when things go wrong.

Installing Prometheus

Download and install Prometheus on a dedicated monitoring Breeze instance:

# Create a system user
sudo useradd --no-create-home --shell /bin/false prometheus

# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

# Download and extract
PROM_VERSION="2.51.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
cd "prometheus-${PROM_VERSION}.linux-amd64"

# Install binaries
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
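
The steps above install the binaries but do not start Prometheus. One way to run it as a service is a systemd unit like the sketch below. The unit name prometheus.service and the choice to write it to the current directory first are assumptions, not part of the original steps; the flags point at the directories created above.

```shell
# Render the unit file locally so it can be reviewed before installing.
# The unit name and the flag selection are assumptions for this sketch.
cat > prometheus.service <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Install and start it (requires root) once /etc/prometheus/prometheus.yml exists:
#   sudo install -m 644 prometheus.service /etc/systemd/system/prometheus.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now prometheus
```

Writing the file locally first keeps the privileged step to a single copy and makes the unit easy to review or keep in version control.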

Configuring Prometheus

Create the main configuration file:

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'breeze-web-01:9100'
          - 'breeze-web-02:9100'
          - 'breeze-db-01:9100'
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):\d+'
        target_label: instance

  - job_name: 'nginx'
    static_configs:
      - targets: ['breeze-web-01:9113']

Installing Node Exporter on Targets

Install Node Exporter on each Breeze instance you want to monitor:

NODE_VERSION="1.7.0"
wget "https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${NODE_VERSION}.linux-amd64.tar.gz"
sudo cp "node_exporter-${NODE_VERSION}.linux-amd64/node_exporter" /usr/local/bin/

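# Create systemd service (tee runs as root; with "sudo cat ... > file" the redirect would run as the calling user and fail)
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'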
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo useradd --no-create-home --shell /bin/false node_exporter
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
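
To confirm an exporter is answering before Prometheus scrapes it, you can request its /metrics endpoint. The sketch below assumes curl is installed; the function name exporter_is_up is hypothetical. It looks for node_exporter_build_info, a metric Node Exporter exposes about itself.

```shell
# Hypothetical helper: succeed if a Node Exporter answers on host:port.
exporter_is_up() {
  local host="${1:-localhost}" port="${2:-9100}"
  # -f: fail on HTTP errors, -sS: quiet but still report errors,
  # --max-time: give up after 5 seconds instead of hanging
  curl -fsS --max-time 5 "http://${host}:${port}/metrics" \
    | grep '^node_exporter_build_info' >/dev/null
}
```

For example, exporter_is_up breeze-web-01 && echo "breeze-web-01 is exporting metrics" checks one of the targets from the scrape configuration. If it fails from the monitoring instance but works on the target itself, a firewall rule for port 9100 is a likely cause.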

Defining Alert Rules

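Create alert rules for high CPU usage, low disk space, high memory usage, and unreachable instances. Each alert fires only after its expression has stayed true for the length of its for clause: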

# /etc/prometheus/alert_rules.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has been above 80% for 5 minutes (current: {{ $value | humanize }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Root filesystem has less than 15% free space"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
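
As a worked example of the DiskSpaceLow expression, the same ratio can be computed by hand. This is only a local sanity check of the arithmetic; the function name free_percent is hypothetical and the sample numbers are made up.

```shell
# Hypothetical helper: available space as a percentage of filesystem size,
# the same ratio the DiskSpaceLow expression computes.
# Usage: free_percent <available> <size>   (any unit, as long as both match)
free_percent() {
  awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.1f\n", avail / size * 100 }'
}

# Made-up example: 12 GiB available on a 100 GiB root filesystem.
# The result is below 15, so the alert condition would be true.
free_percent 12 100
```

On a target host, live values can be fed in with free_percent $(df -P / | awk 'NR == 2 { print $4, $2 }'). The figure may differ slightly from what the exporter reports at any given moment.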

Setting Up Alertmanager

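Install Alertmanager on the monitoring instance (the Prometheus configuration above expects it at localhost:9093), then configure how notifications are routed: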

# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'smtp-password'

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-team'
  routes:
    - matchers:
        - severity="critical"
      receiver: 'email-critical'
      repeat_interval: 1h

receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
  - name: 'email-critical'
    email_configs:
      - to: 'oncall@example.com'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
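
The configuration above needs an Alertmanager installation to load it. The sketch below follows the same pattern as the Prometheus install. The version number 0.27.0, the alertmanager user, the /var/lib/alertmanager data directory, and both function names are assumptions for illustration; check the releases page for a current version.

```shell
AM_VERSION="0.27.0"   # assumed version; substitute a current release

# Build the release tarball URL (same layout as the Prometheus download).
alertmanager_url() {
  echo "https://github.com/prometheus/alertmanager/releases/download/v${AM_VERSION}/alertmanager-${AM_VERSION}.linux-amd64.tar.gz"
}

# Hypothetical helper: install the binaries and validate the config.
# Defined here but not called; run it on the monitoring instance.
install_alertmanager() {
  local dir="alertmanager-${AM_VERSION}.linux-amd64"
  sudo useradd --no-create-home --shell /bin/false alertmanager
  sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
  sudo chown alertmanager:alertmanager /var/lib/alertmanager
  wget "$(alertmanager_url)" || return 1
  tar xzf "${dir}.tar.gz" || return 1
  sudo cp "${dir}/alertmanager" "${dir}/amtool" /usr/local/bin/
  # amtool ships in the same tarball and validates the configuration file
  amtool check-config /etc/alertmanager/alertmanager.yml
}
```

Call install_alertmanager once alertmanager.yml is in place. Alertmanager then needs to be started with --config.file=/etc/alertmanager/alertmanager.yml and --storage.path=/var/lib/alertmanager, for example from a systemd unit modelled on the Node Exporter one.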

Best Practices

  • Start with Node Exporter: get CPU, memory, disk, and network metrics first
  • Use recording rules: pre-compute expensive queries so dashboards load quickly
  • Set meaningful alert thresholds: avoid alert fatigue by tuning both the thresholds and the for durations
  • Add labels: use labels to identify environments, teams, and services for routing
  • Retain data wisely: set --storage.tsdb.retention.time (for example, 30d) to a value your disk capacity can hold
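
As an illustration of the recording-rule advice, the CPU expression from the HighCPUUsage alert could be pre-computed under its own series name. The file name recording_rules.yml, the group name, and the series name instance:node_cpu_usage_percent:rate5m are assumptions; the file would also need its own entry under rule_files in prometheus.yml.

```shell
# Write a recording rules file locally for review.
# File, group, and series names are assumptions for this sketch.
cat > recording_rules.yml <<'EOF'
groups:
  - name: infrastructure-recording
    rules:
      - record: instance:node_cpu_usage_percent:rate5m
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
EOF

# Validate before copying it to /etc/prometheus/ (skipped if promtool is absent).
if command -v promtool >/dev/null 2>&1; then
  promtool check rules recording_rules.yml
fi
```

The alert expression could then be shortened to instance:node_cpu_usage_percent:rate5m > 80, and dashboards can query the same series instead of re-evaluating the rate on every refresh.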

Prometheus and Alertmanager give your Breeze infrastructure proactive monitoring, so you catch problems before your users do.
