CBT Nuggets

Operations & Resilience · Intermediate

High Availability

High availability is the design discipline of keeping services running even when individual components fail. It combines redundancy, automated failover, monitoring, and tested recovery procedures.

For IT leaders

HA designs that have never been failed over under real conditions usually do not work when they are needed; a tested failover is the only credible evidence of resilience.

Why IT teams care

Where this shows up at the team level

  • Cyber insurance, customer SLAs, and audit reviews increasingly require evidence of tested failover.
  • Cloud HA primitives (multi-AZ, multi-region, anycast) differ from on-prem HA, so teams need fluency in both.
  • HA design directly affects on-call burden; in a weak design, every component failure becomes a page.

In production

Where teams encounter it

  • Active/active or active/passive firewall, load balancer, and database pairs
  • Multi-AZ and multi-region cloud designs
  • RAID, clustered file systems, and storage replication

How it works

How High Availability actually works

  1. Redundancy removes single points of failure: redundant power, paths, NICs, hosts, sites.
  2. Automated failover detects failures (heartbeats, health checks) and shifts traffic to the surviving component.
  3. Load balancers, anycast routing, and DNS health checks distribute and re-route traffic.
  4. Regular failover testing (game days, chaos engineering) is required to keep HA real.
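
The detection-and-failover step above can be sketched in a few lines. This is a minimal illustration, not a production design: the `Node` and `FailoverController` names, the `check` probe callables, and the `threshold` of three consecutive failed probes are all assumptions made for the example. It models an active/passive pair (or a longer preference list) in which traffic moves only after the active node has missed several health checks in a row.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    check: Callable[[], bool]  # hypothetical health probe; True means healthy
    failures: int = 0          # consecutive failed probes so far


@dataclass
class FailoverController:
    nodes: List[Node]               # ordered by preference, primary first
    threshold: int = 3              # assumed: failed probes before a node counts as down
    active: Optional[Node] = None

    def __post_init__(self) -> None:
        # Start on the most preferred node.
        self.active = self.nodes[0]

    def _is_up(self, node: Node) -> bool:
        return node.failures < self.threshold

    def tick(self) -> Optional[str]:
        """Probe every node once, then fail over if the active node is down.

        Returns the name of the node that should receive traffic,
        or None if no node is currently considered healthy.
        """
        for node in self.nodes:
            node.failures = 0 if node.check() else node.failures + 1

        if self.active is not None and self._is_up(self.active):
            return self.active.name  # no change; also means no automatic failback

        # Active node is down (or there was none): pick the first healthy node.
        self.active = next((n for n in self.nodes if self._is_up(n)), None)
        return self.active.name if self.active else None


if __name__ == "__main__":
    # Simulated health state standing in for real heartbeats.
    health = {"fw-a": True, "fw-b": True}
    ctl = FailoverController([
        Node("fw-a", lambda: health["fw-a"]),
        Node("fw-b", lambda: health["fw-b"]),
    ])
    health["fw-a"] = False          # inject a failure, as a game day would
    for _ in range(3):
        print(ctl.tick())
```

Two design choices in the sketch are worth noting. Requiring several consecutive failures before failing over avoids flapping on a single dropped probe, at the cost of slower detection. The controller also stays on the standby after the primary recovers; whether to fail back automatically is a policy decision that varies between products and teams. Flipping the simulated health flag, as the demo does, is a small-scale version of the failover test described in step 4.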

In practice

Common team use cases

  • Active/passive firewall pairs at the perimeter
  • Multi-AZ databases and stateless services across cloud regions
  • Storage replication for critical data

