How to Monitor Kubernetes with Prometheus
Prometheus is the de facto monitoring solution for Kubernetes. It scrapes metrics from your cluster components, applications, and infrastructure, storing them in a time-series database for querying and alerting. Combined with Grafana for visualization, it gives you complete observability into your Breeze Kubernetes workloads.
Installing the Prometheus Stack with Helm
The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, Alertmanager, and pre-configured dashboards:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.adminPassword='YourGrafanaPass' \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
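The same settings can live in a values file instead of --set flags, which is easier to version-control. A minimal sketch (the file name values.yaml is arbitrary; adjust retention and storage size to your capacity):
# values.yaml
grafana:
  adminPassword: "YourGrafanaPass"
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace -f values.yaml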
Verifying the Installation
kubectl -n monitoring get pods
kubectl -n monitoring get svc
You should see pods for Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter.
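If a pod stays in Pending or CrashLoopBackOff, the operator logs usually explain why. Assuming the release name monitoring, the deployment below follows the chart's naming convention (verify with kubectl -n monitoring get deploy):
kubectl -n monitoring get prometheus
kubectl -n monitoring logs deploy/monitoring-kube-prometheus-operator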
Accessing Grafana
Port-forward to access the Grafana dashboard from your local machine:
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
Open http://localhost:3000 and log in as admin with the password you configured. The stack includes pre-built dashboards for cluster health, node resources, pod metrics, and more.
Querying with PromQL
Access the Prometheus UI to run PromQL queries:
kubectl -n monitoring port-forward svc/monitoring-kube-prometheus-prometheus 9090:9090
Example queries:
# CPU usage per pod
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)
# Memory usage percentage per node
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
# HTTP request rate
sum(rate(http_requests_total{job="web-app"}[5m])) by (status_code)
# Pod restart count
kube_pod_container_status_restarts_total{namespace="production"}
Adding Custom Metrics
Instrument your application to expose a /metrics endpoint, then create a ServiceMonitor so the Prometheus Operator knows to scrape it. The release: monitoring label must match your Helm release name, or Prometheus will ignore the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  namespace: monitoring
  labels:
    release: monitoring
spec:
  namespaceSelector:
    matchNames:
    - production
  selector:
    matchLabels:
      app: web-app
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics
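On the application side, anything that serves the Prometheus text format on /metrics will work. Below is a minimal sketch in Go using the official client_golang library; the counter mirrors the http_requests_total metric queried earlier. The ServiceMonitor above assumes the app's Service in the production namespace carries the label app: web-app and names its port metrics; the handler and port 8080 here are illustrative:
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequests counts requests by status code, matching the
// http_requests_total metric used in the PromQL examples above.
var httpRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests handled, labeled by status code.",
	},
	[]string{"status_code"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	httpRequests.WithLabelValues("200").Inc()
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	// Expose Prometheus metrics on /metrics for the ServiceMonitor to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}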
Setting Up Alerts
Define alerting rules with PrometheusRule resources, again labeled release: monitoring so the operator picks them up:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
  - name: app.rules
    rules:
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "More than 10% of requests are failing"
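PrometheusRule resources can also hold recording rules, which precompute expensive expressions on a schedule so dashboards query the cheap stored result instead. A sketch that materializes the per-pod CPU query from earlier (the rule name follows the common level:metric:operation naming convention and is otherwise arbitrary):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: recording-rules
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
  - name: recording.rules
    rules:
    - record: namespace_pod:container_cpu_usage_seconds_total:sum_rate5m
      expr: sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)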
Best Practices
- Set appropriate retention periods based on your Breeze storage capacity
- Use recording rules for expensive queries that run frequently
- Configure Alertmanager to route alerts to Slack, email, or PagerDuty (see the sketch after this list)
- Monitor Prometheus itself — watch for scrape failures and high cardinality metrics
- Use labels consistently across all services for effective filtering and aggregation
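For the Alertmanager routing mentioned above, the chart accepts a full Alertmanager configuration under alertmanager.config in the Helm values. A sketch that sends critical alerts to Slack; the webhook URL and channel are placeholders, and the file name is arbitrary:
# alertmanager-values.yaml — apply with:
# helm upgrade monitoring prometheus-community/kube-prometheus-stack \
# --namespace monitoring -f alertmanager-values.yaml
alertmanager:
  config:
    route:
      receiver: slack-notifications
      routes:
      - matchers:
        - severity = critical
        receiver: slack-notifications
    receivers:
    - name: slack-notifications
      slack_configs:
      - api_url: https://hooks.slack.com/services/XXX  # placeholder webhook URL
        channel: '#alerts'
        send_resolved: true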