How to Set Up Infrastructure Monitoring with Prometheus and Alertmanager
Prometheus is an open-source monitoring and alerting toolkit designed for reliability. Paired with Alertmanager, it provides a complete monitoring solution for your Breeze infrastructure, collecting metrics, evaluating alert rules, and sending notifications when things go wrong.
Installing Prometheus
Download and install Prometheus on a dedicated Breeze instance reserved for monitoring:
# Create a system user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
# Download and extract
PROM_VERSION="2.51.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
cd "prometheus-${PROM_VERSION}.linux-amd64"
# Install binaries
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
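The binaries are in place, but nothing starts Prometheus yet. A minimal systemd unit along these lines (a sketch; paths match the directories created above, and the flag values are illustrative) gets it running:

```shell
# Create a systemd unit for Prometheus
sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
```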
Configuring Prometheus
Create the main configuration file:
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'breeze-web-01:9100'
          - 'breeze-web-02:9100'
          - 'breeze-db-01:9100'
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):\d+'
        target_label: instance

  - job_name: 'nginx'
    static_configs:
      - targets: ['breeze-web-01:9113']
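Before starting (or reloading) Prometheus, it is worth validating the file with promtool, which was installed alongside the prometheus binary above:

```shell
# Syntax-check the main configuration; exits non-zero on error
promtool check config /etc/prometheus/prometheus.yml
```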
Installing Node Exporter on Targets
Install Node Exporter on each Breeze instance you want to monitor:
NODE_VERSION="1.7.0"
wget "https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${NODE_VERSION}.linux-amd64.tar.gz"
sudo cp "node_exporter-${NODE_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
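You can confirm each exporter is serving metrics before wiring it into Prometheus, for example:

```shell
# Service should report "active"
systemctl is-active node_exporter

# Node Exporter listens on :9100 by default; a quick scrape by hand
curl -s http://localhost:9100/metrics | head -n 5
```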
Defining Alert Rules
Create alert rules to detect problems:
# /etc/prometheus/alert_rules.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes (current: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Root filesystem has less than 15% free space"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
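As with the main configuration, promtool can validate rule files, and Prometheus re-reads its rules on a reload signal:

```shell
# Validate rule file syntax and expressions
promtool check rules /etc/prometheus/alert_rules.yml

# Prometheus reloads its configuration and rule files on SIGHUP
sudo pkill -HUP prometheus
```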
Setting Up Alertmanager
Install and configure Alertmanager to route notifications:
# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'smtp-password'

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-team'
  routes:
    - match:
        severity: critical
      receiver: 'email-critical'
      repeat_interval: 1h

receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'

  - name: 'email-critical'
    email_configs:
      - to: 'oncall@example.com'
    # Slack incoming webhooks need slack_configs (not webhook_configs),
    # so Alertmanager sends the payload format Slack expects
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#oncall'
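The Alertmanager installation mirrors the Prometheus steps above. This sketch assumes the same layout conventions (the version is pinned for illustration); amtool, shipped in the same tarball, can validate the configuration before you start the service:

```shell
AM_VERSION="0.27.0"
wget "https://github.com/prometheus/alertmanager/releases/download/v${AM_VERSION}/alertmanager-${AM_VERSION}.linux-amd64.tar.gz"
tar xzf "alertmanager-${AM_VERSION}.linux-amd64.tar.gz"
sudo cp "alertmanager-${AM_VERSION}.linux-amd64/alertmanager" \
        "alertmanager-${AM_VERSION}.linux-amd64/amtool" /usr/local/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager

# Validate the configuration; exits non-zero on error
amtool check-config /etc/alertmanager/alertmanager.yml
```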
Best Practices
- Start with Node Exporter — get CPU, memory, disk, and network metrics first
- Use recording rules — pre-compute expensive queries for dashboard performance
- Set meaningful alert thresholds — avoid alert fatigue with well-tuned thresholds and 'for' durations
- Add labels — use labels to identify environments, teams, and services for routing
- Retain data wisely — set --storage.tsdb.retention.time=30d based on your disk capacity
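To illustrate the recording-rules point, an entry like this (the rule name is a hypothetical example following Prometheus naming conventions) pre-computes the CPU expression used by the HighCPUUsage alert, so dashboards can query the cheap pre-aggregated series instead:

```yaml
groups:
  - name: recording
    rules:
      # Per-instance CPU usage, refreshed each evaluation_interval
      - record: instance:cpu_usage:percent
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```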
Prometheus and Alertmanager give your Breeze infrastructure proactive monitoring, so you catch problems before your users do.