IT Infrastructure Architecture - Infrastructure Building Blocks and Concepts 4th Edition - Copy_3.pdf

5 AVAILABILITY CONCEPTS

5.1 Introduction

Everyone expects their infrastructure to be available all the time. In this age of global, always-on, always-connected systems, disturbances in availability are noticed immediately. A 100% guaranteed availability of an infrastructure, however, is impossible. No matter how much effort is spent on creating highly available infrastructures, there is always a chance of downtime. It's just a fact of life. As Werner Vogels, CTO of Amazon.com, keeps reminding us: “Everything fails, all the time”.

This chapter discusses the concepts and technologies used to create highly available systems. It covers calculating availability, managing human factors, the reliability of infrastructure components, designing for resilience, and, if everything else fails, business continuity management and disaster recovery.

Figure 11: Availability in the infrastructure model

5.2 Calculating availability

In general, availability cannot be calculated or guaranteed in advance. It can only be reported after the fact, when a system has been running for a few years. This makes designing for high availability a complicated task. Fortunately, much knowledge and experience has been gained over the years on how to design highly available systems, using design patterns such as failover, redundancy, structured programming, avoiding single points of failure (SPOFs), and implementing sound system management. But first, let's discuss how availability is expressed in numbers.

5.2.1 Availability percentages and intervals

The availability of a system is usually expressed as a percentage of uptime in a given time period (usually one year or one month). Table 1 shows the maximum downtime for a particular percentage of availability.
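The downtime figures in Table 1 follow from simple arithmetic: multiply the length of the period by the fraction of time the system is allowed to be down. The sketch below is a minimal illustration under stated assumptions, not a tool from this book. The function names are hypothetical, and it assumes a 365-day year and a 30-day month, which reproduces the rounding used in Table 1. The second function runs the calculation in reverse, turning an observed amount of downtime into an availability percentage.

```python
# Minimal sketch: convert an availability percentage into the maximum
# allowed downtime for a period, and convert downtime back into availability.
# Assumption: a year is 365 days and a month is 30 days (matches Table 1).
# The function names are hypothetical, chosen for illustration only.

MINUTES_PER_PERIOD = {
    "year": 365 * 24 * 60,   # 525,600 minutes
    "month": 30 * 24 * 60,   # 43,200 minutes
    "week": 7 * 24 * 60,     # 10,080 minutes
}


def max_downtime_minutes(availability_pct: float, period: str) -> float:
    """Maximum downtime (in minutes) allowed by an availability percentage."""
    return MINUTES_PER_PERIOD[period] * (100.0 - availability_pct) / 100.0


def availability_pct(downtime_minutes: float, period: str) -> float:
    """Availability percentage achieved with a given amount of downtime."""
    total = MINUTES_PER_PERIOD[period]
    return 100.0 * (total - downtime_minutes) / total


if __name__ == "__main__":
    # Print the downtime budget per year, month, and week for each level.
    for pct in (99.8, 99.9, 99.99, 99.999):
        year = max_downtime_minutes(pct, "year")
        month = max_downtime_minutes(pct, "month")
        week = max_downtime_minutes(pct, "week")
        print(f"{pct}%: {year:.1f} min/year, {month:.1f} min/month, "
              f"{week:.2f} min/week")
    # Reverse direction: 24 minutes of outage in one year.
    print(f"{availability_pct(24, 'year'):.3f}%")
```

For example, 99.9% leaves 525.6 minutes of downtime per year and 43.2 minutes per month, and an outage of 24 minutes per year corresponds to an availability of roughly 99.995%. Changing the assumed month length (for instance to 365/12 days) shifts the per-month figures slightly, which is why published tables sometimes differ in the last digit.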
Availability %            Downtime per year   Downtime per month   Downtime per week
99.8%                     17.5 hours          86.4 minutes         20.2 minutes
99.9% ("three nines")     8.8 hours           43.2 minutes         10.1 minutes
99.99% ("four nines")     52.6 minutes        4.3 minutes          1.0 minutes
99.999% ("five nines")    5.3 minutes         25.9 seconds         6.0 seconds

Table 1: Availability levels

Typical requirements used in service level agreements today are 99.8% or 99.9% availability per month for a full IT system. To meet this requirement, the availability of the underlying infrastructure must be much higher, typically in the range of 99.99% or higher, because the infrastructure is only one of the parts whose downtime adds up to the downtime of the full system.

99.999% uptime is also known as carrier grade availability; this level of availability originates from telecommunication system components (not full systems!) that need an extremely high availability. Higher availability levels for a complete system are very uncommon, as they are almost impossible to reach.

For comparison, the electricity supply in my home country, the Netherlands, is very reliable. In recent years, the average outage per household has been 24 minutes per year, which is equivalent to an availability of 99.995%. The average availability of electricity in the USA is 99.96%.

While 99.9% uptime allows about 525 minutes of downtime per year, this downtime should not occur in one single event, nor should it consist of 525 one-minute outages spread over the year. It is therefore good practice to also agree on the maximum frequency of unavailability. An example is shown in Table 2.

Unavailability (minutes)    Number of events (per year)
0–5
5–10
