Service Level Agreements (SLAs) define the minimum level of service a hosting provider commits to delivering. Understanding what is actually covered — and what is not — helps you set realistic expectations and plan your own redundancy strategy.
What Is an SLA?
An SLA is a contractual commitment from a hosting provider that specifies measurable service guarantees, typically including network uptime, hardware availability, and the remedies (usually service credits) when those guarantees are not met.
Understanding Uptime Percentages
# What uptime percentages actually mean in annual downtime (365-day year):
# 99.0% = 3 days, 15 hours, 36 minutes
# 99.5% = 1 day, 19 hours, 48 minutes
# 99.9% = 8 hours, 45 minutes, 36 seconds
# 99.95% = 4 hours, 22 minutes, 48 seconds
# 99.99% = 52 minutes, 33.6 seconds
# 99.999% = 5 minutes, 15.36 seconds (five nines)
# Monthly equivalent (30-day month):
# 99.9% = ~43 minutes downtime per month
# 99.95% = ~22 minutes downtime per month
# 99.99% = ~4.3 minutes downtime per month
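The figures above come from multiplying the length of the period by the allowed downtime fraction. Here is a minimal sketch of that arithmetic, assuming a 365-day year and a 30-day month. The function name `allowed_downtime` is illustrative and does not come from any provider's tooling.

```python
from datetime import timedelta

YEAR = timedelta(days=365)   # assumption: non-leap year
MONTH = timedelta(days=30)   # assumption: 30-day billing month


def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the downtime budget implied by an uptime percentage over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100 - uptime_percent) / 100
    # Round to hundredths of a second to hide floating-point noise.
    seconds = round(period.total_seconds() * downtime_fraction, 2)
    return timedelta(seconds=seconds)


for pct in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999):
    yearly = allowed_downtime(pct, YEAR)
    monthly = allowed_downtime(pct, MONTH)
    print(f"{pct}% uptime: {yearly} per year, {monthly} per month")
```

If your provider measures uptime over a calendar month or a rolling window, pass that period instead of `MONTH`.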
What SLAs Typically Cover
Network Uptime
Network uptime is the most common SLA metric. It guarantees that the network infrastructure (switches, routers, and uplinks) will be available. It usually carries the highest percentage (99.99% or more) because providers build redundant network paths.
Hardware/Server Uptime
This guarantees that the physical server hardware stays functional, typically at 99.9% to 99.99%. It covers hardware failures such as disk failures, memory errors, and power supply failures.
What SLAs Do Not Cover
- Software issues — Your application crashing is not an SLA event
- Configuration errors — Misconfigured firewalls or services
- DDoS attacks — Often excluded or handled separately
- Scheduled maintenance — Usually excluded with advance notice
- Force majeure — Natural disasters, wars, government actions
- Third-party services — DNS, CDN, or upstream provider outages
- OS/software vulnerabilities — If your unpatched server gets hacked
Understanding Service Credits
When an SLA is breached, the typical remedy is service credits — a percentage discount on your next bill.
# Typical SLA credit structure:
# 99.9% - 99.5% uptime: 10% credit
# 99.5% - 99.0% uptime: 25% credit
# 99.0% - 95.0% uptime: 50% credit
# Below 95.0% uptime: 100% credit
# Important: Credits are usually capped at 100% of one month's fee
# and must be claimed within a specific timeframe (often 30 days)
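To see what a breach is worth in money, you can turn measured downtime into an uptime percentage and look up the matching tier. The sketch below mirrors the example tiers above. It assumes a 99.9% SLA target and a 30-day month, and it assumes a value exactly on a boundary falls into the tier with the smaller credit. The names `CREDIT_TIERS`, `measured_uptime`, and `service_credit` are hypothetical. Your provider's SLA document defines the real thresholds.

```python
# Hypothetical tiers: (minimum uptime %, credit % of the monthly fee).
# Ordered from best to worst uptime; the first matching tier wins.
CREDIT_TIERS = [
    (99.9, 0),    # SLA met, no credit
    (99.5, 10),
    (99.0, 25),
    (95.0, 50),
    (0.0, 100),
]


def measured_uptime(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Convert total downtime in a month into an uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


def service_credit(uptime_percent: float, monthly_fee: float) -> float:
    """Return the credit amount for a month, capped at 100% of the fee."""
    for min_uptime, credit_percent in CREDIT_TIERS:
        if uptime_percent >= min_uptime:
            return round(monthly_fee * credit_percent / 100, 2)
    raise ValueError("uptime_percent must not be negative")


# Example: three hours of downtime on a plan with a monthly fee of 50.
uptime = measured_uptime(downtime_minutes=180)
print(f"uptime {uptime:.3f}% -> credit {service_credit(uptime, 50.0)}")
```

Even a multi-hour outage often yields a credit that is small compared with the cost of the outage itself, so credits do not replace your own redundancy.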
Building Your Own High Availability
No single server — regardless of SLA — can guarantee 100% uptime. If your application requires near-zero downtime, you need to build redundancy yourself.
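A rough way to see why redundancy helps: if servers fail independently and any one of them can carry the full load, the service is down only when all of them are down at once. The sketch below applies that formula. Independence is an optimistic assumption, since shared power, network, or software bugs cause correlated failures. The load balancer's own availability is ignored, and the function name is illustrative.

```python
def combined_availability(per_server_percent: float, servers: int) -> float:
    """Availability (%) of N redundant servers, assuming independent failures."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    failure_probability = 1 - per_server_percent / 100
    # The service is unavailable only if every server fails at the same time.
    return 100 * (1 - failure_probability ** servers)


for n in (1, 2, 3):
    print(f"{n} server(s) at 99.9% each: {combined_availability(99.9, n):.7f}%")
```

Treat the result as an upper bound rather than a prediction.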
Application-Level Redundancy
# Strategy 1: Multiple servers with a load balancer
# - 2+ application servers in different locations
# - Database replication (primary to replica)
# - Shared session storage (Redis)
# - Health checks and automatic failover
# Strategy 2: DNS-based failover
# - Multiple A records pointing to different servers
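# - Health check service removes failed servers from DNS (keep record TTLs low so clients pick up the change quickly)
# - Managed services such as Cloudflare Load Balancing can run the health checks and DNS updates for you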
# Strategy 3: Active-passive standby
# - Primary server handles all traffic
# - Standby server receives database replication
# - Manual or scripted failover when primary fails
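For the scripted failover in Strategy 3, the core decision is when to give up on the primary. A common approach is to fail over only after several consecutive failed health checks, so that one dropped probe does not trigger a switch. The sketch below shows only that decision logic. The class name, the threshold of three, and the choice to leave failback as a manual step are assumptions. The action that actually redirects traffic (a DNS update, a floating IP move) is left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health check results and decides which server should be active."""

    failure_threshold: int = 3      # consecutive failures before failing over
    active: str = "primary"
    consecutive_failures: int = 0

    def record_check(self, primary_healthy: bool) -> str:
        """Record one health check result and return the server that should serve traffic."""
        if self.active == "standby":
            # Failback is deliberately manual in this sketch.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
        return self.active


monitor = FailoverMonitor(failure_threshold=3)
for healthy in (True, False, False, False):
    print(monitor.record_check(healthy))
```

Staying on the standby until an operator intervenes avoids flapping between servers while the primary is still unstable.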
Monitoring Your Own Uptime
# Set up independent external monitoring:
# Free options:
# - UptimeRobot (50 monitors free, 5-minute checks)
# - HetrixTools (15 monitors free, 1-minute checks)
# - Freshping (50 monitors free)
# What to monitor:
# 1. HTTP/HTTPS endpoint (is your app responding?)
# 2. TCP port checks (is the port open?)
# 3. SSL certificate expiry
# 4. DNS resolution
# 5. Response time thresholds
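If you want to see what these checks involve before choosing a service, each one can be approximated with the Python standard library. The following is a minimal sketch, not a replacement for an external monitor: it must run from a machine outside your hosting provider to be meaningful. The function names, timeouts, and the 2-second response threshold are assumptions.

```python
import http.client
import socket
import ssl
import time
import urllib.request


def check_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dns(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False


def check_http(url: str, max_seconds: float = 2.0, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status within max_seconds."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # urllib raises HTTPError/URLError (both OSError subclasses) on failure.
        return False
    return ok and (time.monotonic() - start) <= max_seconds


def days_until_cert_expiry(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Return the number of days until the server's TLS certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400
```

For example, `check_http("https://example.com/")` covers items 1 and 5, `check_tcp("example.com", 443)` covers item 2, and a warning when `days_until_cert_expiry("example.com")` drops below 14 covers item 3. The hostname is a placeholder for your own.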
Questions to Ask Your Provider
- What is the network uptime SLA?
- What is the hardware/server uptime SLA?
- What constitutes downtime for SLA purposes?
- How do I claim service credits?
- What is the maximum credit available?
- Are scheduled maintenance windows excluded?
- How much advance notice is given for maintenance?
- What DDoS protection is included?
- What is the mean time to repair (MTTR) for hardware failures?
The Bottom Line
An SLA is a commitment, not a guarantee of zero downtime. For business-critical applications, combine a reliable provider with your own redundancy strategy, monitoring, and incident response plan.