How to Set Up Horizontal Pod Autoscaling

By Admin · Mar 2, 2026 · Updated Apr 24, 2026

Horizontal Pod Autoscaling (HPA) automatically adjusts the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. This lets applications running on Breeze instances absorb traffic spikes without manual intervention and scale back down during quiet periods to save resources.

Prerequisites

  • Metrics Server installed in your cluster
  • Resource requests defined on your containers (required for CPU/memory-based scaling)
  • kubectl connected to your Breeze Kubernetes cluster
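
The first and third prerequisites can be checked from a shell before continuing. The following is a minimal sketch rather than an official tool: check is a hypothetical helper name introduced here, and the last line assumes the Metrics Server registers the v1beta1.metrics.k8s.io API service, as the upstream manifests do.

```shell
# Report whether a command succeeds, without aborting the script.
# `check` is a hypothetical helper name used only in this sketch.
check() {
  if "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "MISSING: $*"
  fi
}

# kubectl is on PATH
check command -v kubectl
# the cluster answers within 5 seconds
check kubectl --request-timeout=5s cluster-info
# the Metrics API is registered with the API server
check kubectl --request-timeout=5s get apiservice v1beta1.metrics.k8s.io
```

Any MISSING line points at the prerequisite to fix first. Resource requests are not covered by this sketch; they are set in the Deployment manifest later in this article.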

Installing the Metrics Server

If the Metrics Server is not already deployed:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

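Verify that it is serving metrics. These commands return an error until the Metrics Server has collected its first samples: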

kubectl top nodes
kubectl top pods

Deploying an Application with Resource Requests

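The HPA calculates utilization as a percentage of each container's resource request, so every container in the target pods needs requests for the resources you scale on: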

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: myapp:latest
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        ports:
        - containerPort: 8080
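
Utilization is measured against the request, not the limit, and is averaged across the pods. As a rough illustration using the 200m CPU request from the manifest above (the usage figures are invented sample values, and cpu_utilization_pct is a hypothetical helper), the percentage that gets compared with the target works out as follows:

```shell
# CPU utilization as a percentage of the request.
# Arguments: usage and request, both in millicores. Integer division truncates.
cpu_utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

cpu_utilization_pct 100 200   # prints 50: 100m of a 200m request
cpu_utilization_pct 300 200   # prints 150: usage may exceed the request, up to the 500m limit
```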

Creating an HPA

Scale between 2 and 10 replicas, targeting 50% average CPU utilization. The command creates an HPA named web-app, after the Deployment:

kubectl autoscale deployment web-app \
  --min=2 --max=10 --cpu-percent=50
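
The Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the result kept between the minimum and maximum. The sketch below reproduces that arithmetic for the 2 to 10 range and 50% target used here. desired_replicas is a hypothetical helper name, and the sketch deliberately ignores the controller's tolerance band, pod readiness, and missing-metric handling.

```shell
# desired = ceil(current * observed / target), clamped to [min, max].
# Arguments: current replicas, observed utilization %, target %, min, max.
desired_replicas() {
  local current=$1 observed=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: (a + b - 1) / b
  local desired=$(( (current * observed + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 2 90 50 2 10    # prints 4  (ceil(3.6))
desired_replicas 4 20 50 2 10    # prints 2  (ceil(1.6))
desired_replicas 8 100 50 2 10   # prints 10 (16, clamped to the maximum)
```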

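HPA YAML Manifest with Multiple Metrics

For more control, define the HPA declaratively. Use this manifest instead of the kubectl autoscale command above, not in addition to it, because two HPAs targeting the same Deployment will conflict: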

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 120
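
When several metrics are listed, the HPA computes a desired replica count for each one and acts on the largest. A minimal sketch of that selection, using the 50% CPU and 70% memory targets from the manifest above, might look like this. The observed values are invented sample numbers, the helper names are hypothetical, and the tolerance band and behavior policies are left out.

```shell
# Ceiling of current * observed / target, using integer math.
ceil_scale() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Pick the larger of the CPU-driven and memory-driven replica counts.
# Arguments: current replicas, observed CPU %, observed memory %.
desired_from_metrics() {
  local by_cpu by_mem
  by_cpu=$(ceil_scale "$1" "$2" 50)   # CPU target: 50%
  by_mem=$(ceil_scale "$1" "$3" 70)   # memory target: 70%
  if [ "$by_cpu" -gt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

desired_from_metrics 4 40 90   # prints 6: memory (ceil(5.14)) outweighs CPU (ceil(3.2) = 4)
```

Because the largest value wins, a memory-heavy workload can hold the replica count up even while CPU is idle.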

Monitoring the HPA

kubectl get hpa web-app-hpa -w

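This displays current and target utilization, plus the current replica count. The -w flag keeps the command running and prints a new line whenever the values change. For scaling events and conditions, run kubectl describe hpa web-app-hpa.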

Load Testing
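
The load generator in this section sends requests to web-app-service on port 8080, but no Service with that name is defined earlier in this article. A minimal sketch, assuming a ClusterIP Service in the same namespace is acceptable, writes a manifest that selects the app: web-app pods from the Deployment above. The file name web-app-service.yaml is an arbitrary choice.

```shell
# Write a Service manifest that forwards port 8080 to the web-app pods.
cat > web-app-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
spec:
  selector:
    app: web-app
  ports:
  - port: 8080
    targetPort: 8080
EOF
```

Apply it with kubectl apply -f web-app-service.yaml before starting the load test.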

Generate load to trigger scaling on your Breeze cluster. The command expects a Service named web-app-service that exposes the Deployment on port 8080:

kubectl run load-test --rm -it --image=busybox --restart=Never -- sh -c \
  "while true; do wget -qO- http://web-app-service:8080; done"

Watch the HPA add replicas as average CPU utilization rises above the target. After the load stops, it scales back down gradually once the 300-second scale-down stabilization window has passed.

Tuning Tips

  • Set the stabilization window to prevent replica counts from oscillating when metrics fluctuate
  • Use behavior policies to control how aggressively the HPA scales up or down
  • Combine CPU and memory metrics when either can become the bottleneck; the HPA acts on whichever metric calls for the most replicas
  • Set realistic resource requests: because utilization is measured against the request, overestimating delays scaling and underestimating triggers it prematurely
