How to Set Up Horizontal Pod Autoscaling
Horizontal Pod Autoscaling (HPA) automatically scales the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. This ensures your applications running on Breeze instances handle traffic spikes without manual intervention while scaling down during quiet periods to save resources.
Prerequisites
- Metrics Server installed in your cluster
- Resource requests defined on your containers (required for CPU/memory-based scaling)
- kubectl connected to your Breeze Kubernetes cluster
Installing the Metrics Server
If the Metrics Server is not already deployed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify it is running:
kubectl top nodes
kubectl top pods
Deploying an Application with Resource Requests
HPA requires resource requests to calculate utilization percentages:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: myapp:latest
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 8080
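To make the link between requests and utilization concrete, here is a minimal sketch of the arithmetic for a single pod. The function name utilization_percent and the sample usage figures are hypothetical, chosen for illustration; the real controller averages across all pods of the target, so treat this as an approximation.

```shell
#!/bin/sh
# utilization_percent USAGE_MILLICORES REQUEST_MILLICORES
# Hypothetical helper: prints CPU usage as a whole-number percentage
# of the container's CPU request (integer division, remainder dropped).
utilization_percent() {
    usage_m=$1
    request_m=$2
    if [ "$request_m" -le 0 ]; then
        echo "request must be a positive number of millicores" >&2
        return 1
    fi
    echo $(( usage_m * 100 / request_m ))
}

# A pod using 150m CPU against the 200m request from the manifest above.
utilization_percent 150 200   # prints 75

# The same usage against a 500m request reports far lower utilization.
utilization_percent 150 500   # prints 30
```

With a 50% target, the first pod is over the threshold and the second is not, even though both consume the same amount of CPU.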
Creating an HPA
Scale between 2 and 10 replicas, targeting 50% average CPU utilization:
kubectl autoscale deployment web-app \
--min=2 --max=10 --cpu-percent=50
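The replica count the HPA aims for follows the formula documented for Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum. The sketch below reproduces only that arithmetic. The function name desired_replicas is hypothetical, and the sketch ignores the controller's tolerance band, stabilization windows, and handling of unready pods.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_PCT TARGET_PCT MIN MAX
# Hypothetical helper: ceil(current * metric / target), clamped to [MIN, MAX].
desired_replicas() {
    current=$1
    metric=$2
    target=$3
    min=$4
    max=$5
    # Integer ceiling division: (a + b - 1) / b
    want=$(( (current * metric + target - 1) / target ))
    if [ "$want" -lt "$min" ]; then want=$min; fi
    if [ "$want" -gt "$max" ]; then want=$max; fi
    echo "$want"
}

# 2 replicas at 100% CPU with a 50% target, bounds 2..10
desired_replicas 2 100 50 2 10   # prints 4

# A heavy spike is capped by --max
desired_replicas 2 400 50 2 10   # prints 10

# Low load never drops below --min
desired_replicas 4 10 50 2 10    # prints 2
```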
HPA YAML Manifest with Multiple Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 120
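When several metrics are listed, the HPA computes a desired replica count for each metric and uses the largest. The sketch below illustrates that selection with the 50% CPU and 70% memory targets from the manifest. The function names and the sample utilization values are hypothetical, and the behavior policies, stabilization windows, and minReplicas/maxReplicas bounds are deliberately left out.

```shell
#!/bin/sh
# ceil_div A B: integer ceiling of A / B (hypothetical helper)
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# multi_metric_replicas REPLICAS CPU_PCT MEM_PCT
# Hypothetical helper: per-metric proposals against the manifest's
# targets (CPU 50%, memory 70%); the larger proposal wins.
multi_metric_replicas() {
    replicas=$1
    cpu_pct=$2
    mem_pct=$3
    by_cpu=$(ceil_div $(( replicas * cpu_pct )) 50)
    by_mem=$(ceil_div $(( replicas * mem_pct )) 70)
    if [ "$by_cpu" -ge "$by_mem" ]; then
        echo "$by_cpu"
    else
        echo "$by_mem"
    fi
}

# CPU is the bottleneck: CPU proposes 7, memory proposes 4
multi_metric_replicas 4 80 60   # prints 7

# Memory is the bottleneck: CPU proposes 2, memory proposes 6
multi_metric_replicas 4 20 90   # prints 6
```

Because the largest proposal wins, a quiet CPU cannot pull replicas down while memory is still above its target.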
Monitoring the HPA
kubectl get hpa web-app-hpa -w
This displays current and target utilization, plus the current replica count.
Load Testing
Generate load to trigger scaling on your Breeze cluster. The command assumes a Service named web-app-service exposes the deployment on port 8080:
kubectl run load-test --rm -it --image=busybox --restart=Never -- sh -c \
"while true; do wget -qO- http://web-app-service:8080; done"
Watch the HPA increase replicas as CPU rises above the target threshold, and scale back down after the load stops.
Tuning Tips
- Set the stabilization window to prevent rapid scaling oscillations
- Use behavior policies to control how aggressively the HPA scales up or down
- Combine CPU and memory metrics for a more balanced scaling strategy
- Set realistic resource requests: overestimating them lowers reported utilization and delays scaling, while underestimating them inflates utilization and triggers scaling prematurely