Grafana dashboards transform raw metrics into actionable visual insights. This guide covers creating custom dashboards from scratch, including panel types, query patterns, variables, and layout techniques for building monitoring dashboards for VPS infrastructure, applications, and business metrics.
Dashboard Structure
# A good dashboard includes:
# 1. Overview row — key metrics at a glance (stat panels, gauges)
# 2. Resource utilization — CPU, memory, disk, network (time series)
# 3. Application metrics — request rate, error rate, latency (time series)
# 4. Logs — recent errors or events (log panel)
# 5. Alerts — current alert state (alert list panel)
Essential Panel Types
Stat Panel (Single Value)
# CPU Usage percentage
Query: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Thresholds: 0=green, 70=yellow, 90=red
Unit: percent (0-100)
Time Series (Graphs)
# Memory usage over time
Query: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
Legend: {{instance}} - Used Memory
Unit: bytes (IEC)
# Network traffic
Query A: rate(node_network_receive_bytes_total{device="eth0"}[5m]) * 8
Query B: rate(node_network_transmit_bytes_total{device="eth0"}[5m]) * 8
Legend: Inbound / Outbound
Unit: bits/sec
Gauge Panel
# Disk usage percentage
Query: (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100
Min: 0, Max: 100
Thresholds: 0=green, 75=yellow, 90=red
Table Panel
# Top 10 containers by CPU
Query: topk(10, sum(rate(container_cpu_usage_seconds_total[5m])) by (name))
Format: Table
Transform: Organize fields, rename columns
Template Variables
# Create dynamic dashboards with variables
# Instance selector
Name: instance
Type: Query
Query: label_values(node_cpu_seconds_total, instance)
# Use in queries:
100 - (avg(rate(node_cpu_seconds_total{instance="$instance", mode="idle"}[5m])) * 100)
# Job selector
Name: job
Type: Query
Query: label_values(up, job)
# Interval variable (for rate calculations)
Name: interval
Type: Interval
Values: 1m, 5m, 15m, 1h
Row Organization
# Organize panels into collapsible rows:
Row 1: "Overview" (collapsed: false)
- Stat: Uptime
- Stat: CPU Usage
- Stat: Memory Usage
- Stat: Disk Usage
- Stat: Active Connections
Row 2: "CPU & Memory" (collapsed: true)
- Time Series: CPU by mode (user, system, iowait)
- Time Series: Memory breakdown (used, cached, buffered)
- Time Series: Load average (1m, 5m, 15m)
Row 3: "Disk & Network" (collapsed: true)
- Time Series: Disk I/O (reads/writes per second)
- Time Series: Disk space over time
- Time Series: Network traffic in/out
Row 4: "Application" (collapsed: true)
- Time Series: HTTP request rate
- Time Series: Error rate (4xx, 5xx)
- Heatmap: Response time distribution
Annotations
# Mark deployments, incidents, and changes on all graphs
# Dashboard Settings → Annotations → Add
# Deployment markers from Grafana annotations API:
curl -X POST http://grafana:3000/api/annotations \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"text": "Deployed v2.1.0", "tags": ["deploy"]}'
Dashboard as Code (JSON/Terraform)
# Export dashboard as JSON
# Dashboard → Share → Export → Save to file
# Import via API
curl -X POST http://grafana:3000/api/dashboards/db \
-H "Authorization: Bearer api-key" \
-H "Content-Type: application/json" \
-d @dashboard.json
# Provisioning from files (auto-load on startup)
# /etc/grafana/provisioning/dashboards/default.yaml
apiVersion: 1
providers:
- name: default
folder: ""
type: file
options:
path: /var/lib/grafana/dashboards
foldersFromFilesStructure: true
Best Practices
- Start with overview metrics at the top — users should see system health at a glance
- Use consistent color schemes: green=healthy, yellow=warning, red=critical
- Set meaningful units on all panels (bytes, percent, requests/sec)
- Use template variables for multi-server dashboards
- Keep dashboards focused — one per service or system, not everything on one page
- Version control dashboard JSON for change tracking and rollback
- Add annotations for deployments to correlate changes with metric shifts