IT Infrastructure Architecture - Infrastructure Building Blocks and Concepts 4th Edition.pdf
Table of Contents Introduction Preface PART I - INTRODUCTION TO IT INFRASTRUCTURE 1 The definition of IT infrastructure 1.1 Introduction 1.2 What is IT infrastructure? 1.3 What is IT architecture? 1.3.1 Solution architects 1.3.2 Domain architects 1.3.3 Enterprise architects 2 The infrastructure model 2.1 IT building blocks 2.2 Processes / Information building block 2.3 Applications building block 2.4 Application Platform building block 2.5 Infrastructure building blocks 2.6 Non-Functional attributes 3 Cloud computing and infrastructures 3.1 Cloud definition 3.2 Cloud characteristics 3.3 Cloud deployment models 3.4 Cloud service models 3.5 Infrastructure as a Service (IaaS) 3.6 Edge computing PART II - NON-FUNCTIONAL ATTRIBUTES 4 Introduction to Non-functional attributes 4.1 Introduction 4.2 Non-functional Requirements 5 Availability concepts 5.1 Introduction 5.2 Calculating availability 5.2.1 Availability percentages and intervals 5.2.2 MTBF and MTTR 5.2.2.1 Mean Time Between Failures (MTBF) 5.2.2.2 Mean Time To Repair (MTTR) 5.2.3 Some calculation examples 5.3 Sources of unavailability 5.3.1 Human errors 5.3.2 Software bugs 5.3.3 Planned maintenance 5.3.4 Physical defects 5.3.5 Environmental issues 5.3.6 Complexity of the infrastructure 5.4 Availability patterns 5.4.1 Redundancy 5.4.2 Failover 5.4.3 Fallback 5.4.3.1 Hot site 5.4.3.2 Cold site 5.4.3.3 Warm site 5.4.4 Availability in the cloud 5.4.4.1 Regions and Availability zones 5.4.4.2 Update and fault domains 5.4.5 Business Continuity 5.4.5.1 Business Continuity Management 5.4.5.2 Disaster Recovery Planning 5.4.5.3 RTO and RPO 6 Performance Concepts 6.1 Introduction 6.2 Perceived performance 6.3 Performance during infrastructure design 6.3.1 Benchmarking 6.3.2 Using vendor experience 6.3.3 Prototyping 6.3.4 User profiling 6.3.5 Scalable cloud environments 6.4 Performance of a running system 6.4.1 Managing bottlenecks 6.4.2 Performance testing 6.5 Performance patterns 6.5.1 Increasing performance on upper layers
6.5.2 Caching 6.5.2.1 Disk caching 6.5.3 Web proxies 6.5.4 Operational data store 6.5.5 Front-end servers 6.5.6 In-memory databases 6.5.7 Edge servers 6.5.8 Scalability 6.5.9 Load balancing 6.5.10 High performance computing 6.5.11 Design for use 6.5.12 Capacity management 7 Security Concepts 7.1 Introduction 7.1.1 Core infrastructure security 7.1.2 Crime against IT infrastructures 7.1.3 Malicious code 7.2 Security exploits 7.2.1 Social engineering 7.2.2 Phishing 7.2.3 Baiting 7.3 Security attacks 7.3.1 Denial of service attack 7.3.2 Ransomware 7.3.3 Integrity attacks 7.4 Cloud security 7.5 Security Patterns 7.5.1 Prevention 7.5.1.1 Security policies 7.5.1.2 Zero Trust 7.5.1.3 Segregation of duties and least privilege 7.5.1.4 Privileged Access Management (PAM) 7.5.1.5 Layered security 7.5.1.6 Identity and Access Management 7.5.1.7 Cryptography 7.5.1.7.1 Symmetric key encryption 7.5.1.7.2 Asymmetric key encryption 7.5.1.7.3 Hash functions and digital signatures 7.5.1.7.4 Cryptographic attacks 7.5.2 Detection 7.5.2.1 Malware detection 7.5.2.2 IDS/IPS 7.5.2.3 SIEM and SOC 7.5.3 Response 7.5.3.1 Computer Emergency Response Team (CERT) PART III - ARCHITECTURE BUILDING BLOCKS 8 Datacenters 8.1 Introduction 8.2 Datacenter building blocks 8.2.1 Datacenter categories 8.2.2 Cloud datacenters 8.2.3 Location of the datacenter 8.2.4 Physical structure 8.2.4.1 Floors 8.2.4.2 Walls, windows, and doors 8.2.4.3 Water and gas pipes 8.2.4.4 Layout of the datacenter 8.2.5 Power supply 8.2.5.1 Power density 8.2.5.2 Uninterruptable Power Supply (UPS) 8.2.5.3 Power generators 8.2.5.4 Battery powered UPS systems 8.2.5.5 Flywheel UPS systems 8.2.5.6 UPS maintenance 8.2.5.7 Power distribution 8.2.6 Cooling 8.2.6.1 Operating temperatures 8.2.6.2 Airflow 8.2.6.3 Liquid cooling 8.2.6.4 Humidity and dust 8.2.7 Fire prevention, detection, and suppression 8.2.7.1 Fire prevention 8.2.7.2 Passive fire protection 8.2.7.3 Fire detection systems 8.2.7.4 Fire suppression systems 8.2.8 Equipment racks 
8.2.8.1 Rack design tips 8.2.8.2 KVM switches 8.2.9 Datacenter cabling and patching 8.2.9.1 Demarcation point 8.2.10 Datacenter energy efficiency 8.3 Datacenter availability 8.3.1 Availability tiers 8.3.2 Redundant datacenters 8.3.3 Floor management 8.4 Datacenter performance 8.5 Datacenter security 9 Networking 9.1 Introduction 9.2 Network topologies 9.3 Networking building blocks 9.3.1 OSI Reference Model 9.3.2 Physical layer 9.3.2.1 Cables 9.3.2.1.1 Twisted pair cables 9.3.2.1.2 Coax cable 9.3.2.1.3 Fiber optic cable 9.3.2.1.4 Vertical and horizontal cabling and patch panels 9.3.2.2 Leased lines 9.3.2.2.1 T and E carrier lines 9.3.2.2.2 SONET and SDH 9.3.2.2.3 Dark fiber 9.3.2.3 Internet access 9.3.2.3.1 Cable internet access 9.3.2.3.2 DSL 9.3.2.4 Network Interface Controllers (NICs) 9.3.3 Data link layer 9.3.3.1 PAN, LAN, MAN and WAN 9.3.3.2 Ethernet 9.3.3.3 WLAN (Wi-Fi) 9.3.3.4 Switching 9.3.3.5 WAN 9.3.3.6 Public wireless networks 9.3.3.6.1 1G and 2G: GSM, CDMA, GPRS and EDGE 9.3.3.6.2 3G: UMTS 9.3.3.6.3 4G and 5G: LTE 9.3.4 Network layer 9.3.4.1 The IP protocol 9.3.4.2 IPv4 9.3.4.2.1 Subnetting 9.3.4.2.2 Private IP ranges 9.3.4.3 IPv6 9.3.4.4 ICMP 9.3.4.5 Routing 9.3.4.5.1 Routing protocols 9.3.4.5.2 Distance vector protocols 9.3.4.5.3 Link state protocols 9.3.4.5.4 Path vector routing 9.3.4.6 MPLS 9.3.5 Transport layer 9.3.5.1 TCP and UDP 9.3.5.2 Network Address Translation (NAT) 9.3.6 Session layer 9.3.6.1 Virtual Private Network (VPN) 9.3.7 Presentation layer 9.3.7.1 SSL and TLS 9.3.8 Application layer 9.3.8.1 DHCP 9.3.8.2 DNS 9.3.8.2.1 DNSSEC 9.3.8.3 IPAM systems 9.3.8.4 Network Time Protocol (NTP) 9.3.8.5 POP 9.3.8.6 SMTP 9.3.8.7 FTP 9.3.8.8 HTTP and HTTPS 9.4 Network virtualization 9.4.1 Virtual LAN (VLAN) 9.4.2 VXLAN 9.4.3 Virtual routing and forwarding (VRF) 9.4.4 Virtual NICs 9.4.5 Virtual switch 9.4.6 Software Defined Networking 9.4.7 Network Function Virtualization 9.5 Network availability 9.5.1 Layered network topology 9.5.2 Spine and Leaf 
topology 9.5.3 Network teaming 9.5.4 Spanning Tree Protocol 9.5.5 Multihoming 9.6 Network performance 9.6.1 Throughput and bandwidth 9.6.2 Latency 9.6.3 Quality of Service (QoS) 9.6.4 WAN link compression 9.7 Network security 9.7.1 Network encryption 9.7.2 Firewalls 9.7.3 Network segmentation 9.7.3.1 Microsegmentation 9.7.4 DMZ 9.7.5 RADIUS 10 Storage 10.1 Introduction 10.2 Storage building blocks 10.2.1 Disks 10.2.1.1 Command sets 10.2.1.2 Mechanical hard disks 10.2.1.3 Solid State Drives (SSDs) 10.2.1.4 Disk capacity - Kryder's law 10.2.2 Tapes 10.2.2.1 Tape library 10.2.2.2 Virtual tape library 10.2.3 Controllers 10.2.3.1 RAID (Redundant Array of Independent Disks) 10.2.3.1.1 RAID 0 - Striping 10.2.3.1.2 RAID 1 - Mirroring 10.2.3.1.3 RAID 10 - Striping and mirroring 10.2.3.1.4 RAID 5 - Striping with distributed parity 10.2.3.1.5 RAID 6 - Striping with distributed double parity 10.2.3.2 Data compression 10.2.3.3 Data deduplication 10.2.3.4 Clones and snapshots 10.2.3.5 Thin provisioning 10.2.4 Direct Attached Storage (DAS) 10.2.5 Storage Area Network (SAN) 10.2.5.1 SAN connectivity protocols 10.2.5.1.1 Fibre Channel 10.2.5.1.2 FCoE 10.2.5.1.3 iSCSI 10.2.6 Network Attached Storage (NAS) 10.2.7 Object Storage 10.2.8 Software Defined Storage 10.3 Storage availability 10.3.1 Redundancy and data replication 10.3.2 Backup and recovery 10.3.2.1 Consistent backups 10.3.2.2 Backup schemes 10.3.2.3 Backup data retention time 10.3.3 Archiving 10.4 Storage performance 10.4.1 Disk performance 10.4.1.1 IOPS 10.4.1.2 RAID penalty 10.4.2 Interface throughput 10.4.3 Caching 10.4.4 Storage tiering 10.4.5 Load optimization 10.5 Storage security 10.5.1 Protecting data at rest 10.5.1.1 Disk encryption 10.5.1.2 Tape encryption 10.5.2 SAN zoning and LUN masking 11 Compute 11.1 Introduction 11.2 Compute building blocks 11.2.1 Computer housing 11.2.2 Processors 11.2.2.1 Intel x86 processors 11.2.2.2 AMD x86 processors 11.2.2.3 Itanium and x86-64 processors 11.2.2.4 ARM processors 
11.2.2.5 CPUs created by computer manufacturers 11.2.2.6 GPUs 11.2.3 Memory 11.2.3.1 RAM 11.2.3.2 BIOS 11.2.4 Interfaces 11.2.4.1 USB 11.2.4.2 Thunderbolt 11.2.4.3 PCI and PCIe 11.2.5 Virtual machines 11.2.5.1 Virtual Machine Management 11.2.5.2 Disadvantages of computer virtualization 11.2.5.3 Virtualization technologies 11.2.5.3.1 Emulation 11.2.5.3.2 Logical Partitions (LPARs) 11.2.5.3.3 Hypervisors 11.2.5.4 Virtual memory management 11.2.5.4.1 Memory overcommit 11.2.5.4.2 Memory sharing 11.2.6 Container technology 11.2.6.1 Container orchestration 11.2.7 Serverless computing 11.2.8 Mainframes 11.2.8.1 History 11.2.8.2 Mainframe architecture 11.2.8.2.1 Processing Units 11.2.8.2.2 Main Storage 11.2.8.2.3 Channels, ESCON and FICON 11.2.8.2.4 Control units 11.2.8.3 Mainframe virtualization 11.2.9 Midrange systems 11.2.9.1 History 11.2.9.2 Midrange architecture 11.2.9.2.1 UMA 11.2.9.2.2 NUMA 11.2.9.3 Midrange virtualization 11.2.10 x86 servers 11.2.10.1 History 11.2.10.2 x86 architecture 11.2.10.3 x86 virtualization 11.2.11 Supercomputers 11.2.12 Quantum computers 11.3 Compute availability 11.3.1 Hot swappable components 11.3.2 Parity and ECC memory 11.3.3 Virtualization availability 11.3.3.1 Admission 11.4 Compute performance 11.4.1 Moore's law 11.4.2 Increasing CPU and memory performance 11.4.2.1 Increasing clock speed 11.4.2.2 CPU Caching 11.4.2.3 Pipelines 11.4.2.4 Prefetching and branch prediction 11.4.2.5 Superscalar CPUs 11.4.2.6 Multi-core CPUs 11.4.2.7 Hyperthreading 11.4.3 Virtualization performance 11.5 Compute security 11.5.1 Physical security 11.5.2 Data in use 11.5.3 Virtualization security 11.5.3.1 DMZ 11.5.3.2 Systems management console 12 Operating systems 12.1 Introduction 12.2 Popular operating systems 12.2.1 z/OS 12.2.2 IBM i (OS/400) 12.2.3 UNIX 12.2.4 Linux 12.2.4.1 Linux support 12.2.5 BSD 12.2.5.1 FreeBSD 12.2.5.2 NetBSD 12.2.5.3 OpenBSD 12.2.6 Windows 12.2.6.1 Support 12.2.7 MacOS 12.2.8 Operating systems for mobile devices 12.2.8.1 iOS 
12.2.8.2 Android 12.2.9 Special purpose operating systems 12.3 Operating System building blocks 12.3.1 Process scheduling 12.3.2 File systems 12.3.3 APIs and system calls 12.3.4 Device drivers 12.3.5 Memory management 12.3.5.1 DMA 12.3.5.2 Paging and swapping 12.3.6 Shells, CLIs and GUIs 12.3.7 Operating system configuration 12.4 Operating system availability 12.4.1 Failover clustering 12.4.1.1 Voting and quorum disks 12.4.1.2 Cluster-aware applications 12.5 Operating system performance 12.5.1 Increasing memory 12.6 Operating system security 12.6.1 Patching 12.6.2 Hardening 12.6.3 Malware scanning 12.6.4 Host-based firewalls 12.6.5 Limiting user accounts 12.6.6 Hashed passwords 12.6.7 Decreasing kernel size 13 End User Devices 13.1 Introduction 13.2 End user device building blocks 13.2.1 Desktop PCs and laptops 13.2.2 Mobile devices 13.2.3 Bring Your Own Device (BYOD) 13.2.4 Printers 13.2.4.1 Laser printers 13.2.4.2 Inkjet printers 13.2.4.3 Multi-Functional Printers (MFPs) 13.2.4.4 Specialized printers 13.2.4.4.1 Dot Matrix printers 13.2.4.4.2 Line printers 13.2.4.4.3 Thermal printers 13.3 Desktop virtualization 13.3.1 Application virtualization 13.3.2 Server Based Computing 13.3.3 Virtual Desktop Infrastructure (VDI) 13.3.4 Thin clients 13.3.4.1 PXE boot 13.4 End user device availability 13.4.1 Reliability of devices 13.4.2 Software stack 13.4.3 Printers and other equipment 13.5 End user device performance 13.5.1 RAM 13.5.2 Hard disk 13.5.3 Network connectivity 13.6 End user device security 13.6.1 Physical security 13.6.2 Malware protection 13.6.3 Disk encryption 13.6.4 Mobile device management 13.6.5 Network Access Control (NAC) 13.6.6 End user authorizations and awareness PART IV - INFRASTRUCTURE MANAGEMENT 14 Infrastructure Deployment options 14.1 Introduction 14.2 Hosting options 14.3 (Hyper) Converged Infrastructure 14.4 Private cloud 14.5 Public cloud 14.6 Hybrid cloud 15 Automation 15.1 Introduction 15.2 Infrastructure as code 15.2.1 Declarative vs 
imperative languages 15.2.2 Versioning 15.2.3 Commonly used IaC languages 15.3 Configuration management tools 15.4 Pipelines 16 Documenting the infrastructure 16.1 Introduction 16.2 CMDB 16.3 Diagrams 16.4 IaC tools 16.5 Documenting procedures 17 Assembling and testing 17.1 Assembling the infrastructure 17.2 Testing the infrastructure 17.2.1 Test scope 17.2.2 Test stages 17.3 Go live scenarios 18 Maintaining the infrastructure 18.1 Introduction 18.2 Systems management processes 18.2.1 TOGAF 18.2.2 ITIL 18.2.3 DevOps for infrastructure 18.2.4 Site Reliability Engineering 18.2.5 FinOps 18.3 Monitoring 18.4 Management using SNMP 18.5 Logging 18.6 Capacity management 19 Deploying applications 19.1 DTAP environments 19.2 Blue-Green deployment 19.3 Continuous Delivery 20 Decommissioning infrastructures 20.1 Preparation 20.2 Execution 20.3 Cleanup PART V - APPENDICES Infrastructure checklist Abbreviations IS 2020.3 Curriculum reference matrix Further reading End notes

Sjaak Laan
IT Infrastructure Architecture
Infrastructure Building Blocks and Concepts
4th Edition

Title: IT Infrastructure Architecture – Infrastructure Building Blocks and Concepts 4th Edition
Author: Sjaak Laan
Publisher: Lulu Press Inc.
ISBN: 978-1-4477-8093-9
Edition: 4th edition, 2023
Copyright: © Sjaak Laan, 2023

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the author. The views expressed in this document are those of the author and not necessarily of his employer or his clients.

TRADEMARKS
All trademarks used in this book are the property of their respective owners. · AIX is a trademark of IBM Corp., registered in the U.S. and other countries. · ArchiMate is a registered trademark of The Open Group. · AWS (Amazon Web Services) is a trademark of Amazon.com, Inc.
or its affiliates in the United States and/or other countries. · AMD Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. · Apache®, Apache Tomcat, and Apache Mesos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. · Apple, Mac, iOS, and Mac OS are trademarks of Apple Inc., registered in the U.S. and other countries. · Cisco is a registered trademark of Cisco in the U.S. and other countries. · Citrix, XenServer, XenMotion XenServer Marathon everRun, MetaFrame Presentation Server, XenApp, and XenDesktop are trademarks of Citrix Systems, Inc. and/or one or more of its subsidiaries, and may be registered in the United States Patent and Trademark Office and in other countries. · DEC™, DECnet™, VMS™, and VAX™ are trademarks of Digital Equipment Corporation. · Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein. · Gartner Hype Cycle is a registered trademark of Gartner, Inc. and/or its affiliates and is used herein with permission. All rights reserved. · Google, Android, Google App Engine, and Kubernetes are registered trademarks of Google Inc. · HP and HPE are registered trademarks of Hewlett-Packard Company in the U.S. and other countries. · IBM, AIX, IBM MQ, DB2, and ibm.com® are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries. · Intel, Intel Core, Xeon, and Thunderbolt are trademarks of Intel Corp. in the U.S. and other countries. · IOS is a trademark or registered trademark of Cisco in the U.S. and other countries. · Java and all Java-based trademarks are trademarks of Oracle, Inc.
in the United States, other countries, or both. · Linux is a registered trademark of Linus Torvalds. · Microsoft®, Hyper-V, Windows, Windows NT®, Microsoft Azure Cloud Service, Windows.Net, Microsoft Internet Information Services, BizTalk, Microsoft SQL Server, and the Windows logo are trademarks of Microsoft Corporation in the United States and other countries. · Oracle, Sun Microsystems, and Java are registered trademarks of Oracle Corporation and/or its affiliates. · The Pivotal CloudFoundry trademark is the property of Pivotal Software, Inc. and its subsidiaries and affiliates (collectively “Pivotal”). · PowerPC™ and the PowerPC logo™ are trademarks of International Business Machines Corporation. · Red Hat Enterprise Linux and Red Hat JBoss are trademarks of Red Hat, Inc. in the United States and other countries. · SPEC® is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). See http://www.spec.org for more information. · TOGAF is a registered trademark of The Open Group in the United States and other countries. · UNIX is a registered trademark of The Open Group in the United States and other countries. · VMware, VMware tools, VMware Workstation, VMware Fault Tolerance, Sphere, GSX, ESX, ESXi, vCenter, and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. Other company, product, or service names may be trademarks or service marks of others. All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. While every precaution was made in the preparation of this book, the author can assume no responsibility for errors or omissions. If you feel the author has not given you proper credit or feel your rights were violated, please notify the author so corrective actions can be taken. Pictures used in this book are created by the author of this book or are freely distributable pictures, retrieved from the internet. 
Most of the pictures used are in the public domain. When a copyrighted picture is used, a link to its source and its copyright notice are provided. If you feel a picture used in this book is not freely distributable, or any other copyright is violated, please inform the author, so it can be corrected in the next version of the book.

INTRODUCTION
In the summer of 2011, Sjaak showed me the first ever printed version of the book he had been working on for quite some time. It was also the first time we discussed in detail the reasons for writing it, and the target audience that he was aiming for. After some hours of discussing the contents of the book, Sjaak asked me if I was willing to write the introduction for it. Needless to say, I was flattered and proud he suggested this. At first, I was puzzled about his decision to have the introduction written by someone who is not directly involved in core IT infrastructures, but much more into software development. For 17 years, my company has been developing back and front office applications for large multinationals, and is currently involved in SaaS solutions in the field of Social Media Intelligence, Online Health Applications, Event Management, and Storage Management. As I am much more into the "soft-side" of automation, it was as if a car designer had asked a road designer to write an introduction on engine mechanics. But from a different point of view, it seemed very obvious. Where would software application developers be if the infrastructures their applications run on were not working flawlessly? How many times have we been in meetings with customers trying to figure out why software applications were not performing as they were supposed to? How many times did it occur that after implementation of a new software system we were confronted with unforeseen costs because the underlying computer systems had difficulties running the developed software?
How many times did we accuse the infrastructure guys of not understanding the requirements and vice versa? I strongly believe that most of these problems originate from software engineers' lack of knowledge about the problems and challenges infrastructure specialists face when setting up a system for running the software we develop. We as software engineers are primarily concerned with functionality required by the customer. Customers assume that when they talk about an application handling 100,000 visitors a day, or running large reports on millions of records, software engineers fully understand their needs. And indeed, software engineers understand everything regarding select statements, thread handling, and database calls, but they will also assume that the hardware and operating systems they build upon are capable of supporting this. I think there is a great need for software engineers to understand more about IT infrastructures, in order to allow them to communicate with the infrastructure architects on a professional level. What really appealed to me about this book was that it was written from experience, rather than just presenting theoretical knowledge. Far too often we see that decisions about IT infrastructures lack realism. For the sake of security and availability, systems are often made far too complex, and thus far too expensive, resulting in a system that is even less secure and less reliable than intended. Sjaak has hit the nail right on the head with his chapters on availability and security. What this book really shows is that the biggest risk of failures and security breaches is not in the infrastructure itself, “but it sits between the chair and the keyboard”. A good example, from my own experience, was the case where we needed to implement a contingency plan, including an emergency response team.
In order to minimize downtime in case of a failure, we found it was much more effective to find out which member of the systems management team lived closest to the datacenter than to focus on "putting the best man on the job". We were better off training the guy living next door than having the chief infrastructure manager drive 1.5 hours to the datacenter. The entire book is an excellent piece of work to be read by each software developer; it is an outstanding educational tool for system engineers and great reference material for IT consultants, regardless of their specific area of expertise. If you take IT seriously, you have to read this book!

Herman Vissia PhD, M.Sc.
Herman Vissia is the CEO and owner of Byelex Multimedia Products B.V. Together with his colleagues from Minsk he has written more than 10 scientific articles on software related technologies, more specifically on Artificial intelligence and sentiment analysis on the World Wide Web. In 2012 he earned his PhD from the State University in Minsk, Belarus.

PREFACE
What this book is about
This book is about information technology (IT) infrastructure architecture. Infrastructure refers to all the hardware and system software components required to run IT applications. And infrastructure architecture describes the overall design and evolution of that infrastructure. This book explains how infrastructure components work at the architectural level. This means that components are described in building blocks that are tied to specific infrastructure technologies. Decisions made at this level are architecturally relevant, which means that once decisions are made at the building block level, it is relatively difficult to change them later. For example, the decision to use a particular cabling infrastructure in a data center cannot be easily changed once the data center is in operation.
This book does not provide the level of detail required by engineers, but rather describes the most important architectural building blocks and concepts. IT infrastructures are complex by nature and provide non-functional attributes such as performance, availability, and security to applications. This book describes each infrastructure building block and its specific performance, availability, and security concepts. Until now, there has been no single publication that describes the entire field of IT infrastructure. Books and papers exist on each part of IT infrastructure, such as networking, installing and managing operating systems, storage, and virtualization, but no publication has yet described IT infrastructure as a whole. This book aims to fill that gap. Intended audience This book is intended for infrastructure architects and designers, software architects, systems managers, and IT managers. It can also be used in education, for example in a computer science class. This book is very suitable for beginners, as almost every term is explained, while for experts and professionals this book is more of a review and overview. Infrastructure architects and designers can use this book to learn more about infrastructure design that is not their core competency. For example, network designers will probably not learn anything new about networking, but they will probably learn a lot about all the other parts of the infrastructure, such as data centers, storage, and servers. The same is true for other designers. Software architects build software that runs on infrastructure. Software architects who understand the challenges an infrastructure architect faces can optimize their software for certain infrastructure characteristics. Understanding infrastructure helps software architects build more reliable, faster, manageable, and secure applications. 
Systems managers learn to identify key architectural choices and principles in an infrastructure, as well as ways to update and change a running infrastructure without compromising the architecture as a whole. IT managers gain a complete view of IT infrastructures and IT architecture. This will help them work with system administrators and infrastructure architects to better understand their concerns. Students of computer science will find a wealth of information about IT infrastructures that will provide a solid foundation for their computer science studies. This book is used by a number of universities around the world as part of their IT architecture curricula. It is particularly suitable for courses based on the Association for Computing Machinery (ACM) IS 2020.3 curriculum. A reference matrix of the curriculum topics and the relevant sections in this book is provided in the appendix IS 2020.3 Curriculum reference matrix. Some basic IT knowledge is needed to read this book, but the reader is introduced to each topic in small steps. Acknowledgements I would like to thank my wife, Angelina, for the patience she showed when I was working again on this book for a whole evening or weekend, without giving her the attention she deserves, and my three children Laura, Maarten, and Andreas, who I love. Jan van Til inspired me to think more thoroughly about the definition of infrastructure. His (Dutch) work on information management can be found at www.emovere.nl. I want to thank Robert Elsinga, Olav Meijer, Esther Barthel, Raymond Groenewoud, Emile Zweep, Cathy Ellis, Jacob Mulder, Robbert Springer, Marc Eilander, and Jan van der Zanden for their criticism, useful suggestions, and hard work when reviewing this book. For the transformation of this book to the online training on MyEducator I would like to thank VP Sales Scott Pectol and Content Team Lead Aspen Moore. 
I especially want to thank Lodewijk Bogaards, who reviewed the book’s first edition and provided literally hundreds of useful tips on the described topics. He also made many corrections to my English grammar. The photo on the cover is based on the Shutterstock photo https://www.shutterstock.com/image-photo/server-rack-cluster-data-center-shallow-71676715, taken by a photographer named Lightpoet.

Courseware
This book is used in a number of universities around the world as a resource for their IT infrastructure courses. For more information about using this book in a university course, please contact the author at [email protected]. Courseware can be downloaded from www.sjaaklaan.com/book. It contains all the figures used in the book in both Visio and high-resolution PNG format, the list of abbreviations, a PowerPoint slide deck for each chapter (over 700 slides in total), a set of test questions for each chapter (over 200 questions in total), and the infrastructure checklist from the appendix. This book is also available as online training from MyEducator. Please check https://app.myeducator.com/reader/web/1957a/.

Note to the fourth edition
In the fourth edition of this book, a number of corrections have been made, some terminology has been clarified, and several typographical and syntax errors have been corrected. In addition, the following changes have been made: · The content has been updated to reflect the new Association for Computing Machinery (ACM) IS 2020.3 Curriculum - Competency Area - IT Infrastructure. · A new chapter on cloud computing has been added, and cloud-related content has been added throughout the rest of the book. · A new chapter on documenting infrastructures has been added. · New technologies such as serverless computing, edge computing and quantum computing have been added. · The security chapter has been rewritten and restructured to better reflect infrastructure-related security concerns.
· The Infrastructure as Code chapter has been rewritten to reflect current working practices and a chapter on automation has been added as this has become more important over the years. · The chapter on Purchasing Infrastructure and Services has been removed as it was too general and not specific to infrastructure. The chapter was mandatory for the IS 2010.4 syllabus, but has been removed from the IS 2020.3 syllabus. · The networking chapter has been expanded to include POP, SMTP, FTP, HTTP, and HTTPS protocols. This is a requirement from the IS 2020.3 syllabus. · An appendix has been added that describes a high-level checklist that can be used to ask the right questions when learning about an existing infrastructure in the field. · More than 100 edits were made throughout the book to clarify and update content, and to remove outdated content. · Finally, as technology has advanced in recent years, the book has been updated to include the most current information. About the Author Sjaak Laan (1964) leads CGI's Cloud and Infrastructure practice in the Netherlands. After studying electronics in the 1980s, he started his career in the IT industry at a PC repair company, where he repaired thousands of IBM PS/2 system boards at chip level. He later became an IT infrastructure specialist in networking, storage and computing. He now has more than 30 years of IT experience. Mr. Laan joined CGI in 2000 and is now a Director Consulting Expert in the government, financial, and energy markets. He is an expert in cloud, infrastructure and security and has extensive knowledge of systems management processes and integrations. As an architect, he is certified by The Open Group as a Master IT Architect and is TOGAF certified. In the area of cloud, he is an AWS Certified Solution Architect and Certified Azure Solutions Architect Expert. His information security knowledge is supported by his CISSP and CRISC certifications. 
Sjaak Laan has been writing about cloud and infrastructure on www.sjaaklaan.nl since 2006, has a number of publications to his name, and regularly gives training sessions and presentations. Mr. Laan usually works for clients as a lead architect or consultant on complex projects.

PART I - INTRODUCTION TO IT INFRASTRUCTURE

Infrastructure is much more important than architecture.
Rem Koolhaas, one of the world's most famous architects

1 THE DEFINITION OF IT INFRASTRUCTURE

1.1 Introduction
In the early decades of IT development, most infrastructures were relatively simple. As applications grew in functionality and complexity, hardware basically just got faster. In recent years, IT infrastructures have become more complex due to the rapid development and deployment of new types of applications, such as big data, artificial intelligence (AI), machine learning, the Internet of Things (IoT), and cloud computing. These applications require new and more sophisticated infrastructure services that are secure, highly scalable, and available 24/7.

1.2 What is IT infrastructure?
IT infrastructure has been around for a long time. But surprisingly, there does not seem to be a universally accepted definition of IT infrastructure. I have found that many people are confused by the term IT infrastructure, and a clear definition would help them understand what IT infrastructure is and is not. In the literature, many definitions of IT infrastructure can be found. Some of them are:
· IT infrastructure is defined broadly as a set of information technology (IT) components that are the foundation of an IT service; typically physical components (computer and networking hardware and facilities), but also various software and network components. Wikipedia
· All of the hardware, software, networks, facilities, etc., that are required to develop, test, deliver, monitor, control, or support IT services.
The term IT Infrastructure includes all of the Information Technology but not the associated people, processes and documentation. ITILv3. · IT infrastructure refers to the composite hardware, software, network resources and services required for the existence, operation and management of an enterprise IT environment. IT infrastructure allows an organization to deliver IT solutions and services to its employees, partners and/or customers and is usually internal to an organization and deployed within owned facilities. Techopedia · IT infrastructure is the system of hardware, software, facilities and service components that support the delivery of business systems and IT-enabled processes. Gartner · IT infrastructure refers to the combined components needed for the operation and management of enterprise IT services and IT environments. IBM · IT infrastructure are the components required to operate and manage enterprise IT environments. IT infrastructure can be deployed within a cloud computing system, or within an organization's own facilities. These components include hardware, software, networking components, an operating system (OS), and data storage, all of which are used to deliver IT services and solutions. Red Hat Based on these definitions, the term infrastructure may seem a bit arbitrary. Let's try to clear things up. The word infrastructure comes from the words infra (Latin for "underneath") and structure. It encompasses all the components that are "underneath" the structure, where the structure may be a city, a house, or an information system. In the physical world, infrastructure often refers to public utilities such as water pipes, power lines, gas pipes, sewers, and telephone lines – components that literally lie beneath the structure of a city. Figure 1: Views on IT infrastructure For most people, infrastructure is invisible and taken for granted. 
When a business analyst describes business processes, the information used in the process is very important. How that information is managed by IT systems is "below the surface" to the business analyst. They think of IT systems as infrastructure. For users of IT systems, applications are important because they use them every day, but how they are implemented or where they are physically located is invisible (below the surface) to them and is therefore considered infrastructure. For systems managers, the building that houses their servers and the utility company that provides the power are considered infrastructure. So what infrastructure is depends on who you ask and their point of view. The scope of infrastructure as used in this book is described in more detail in chapter 2. 1.3 What is IT architecture? Most of today's infrastructure landscapes are the result of a history of application implementation projects that brought in their own specialized hardware and infrastructure components. Mergers and acquisitions have made matters worse, leaving many organizations with multiple sets of the same infrastructure services that are difficult to interconnect, let alone integrate and consolidate. Organizations benefit from infrastructure architecture when they want to be more flexible and agile because a solid, scalable, and modular infrastructure provides a solid foundation for agile adaptations. The market demands a level of agility that can no longer be supported by infrastructures that are inconsistent and difficult to scale. We need infrastructures built with standardized, modular components. And to make infrastructures consistent and aligned with business needs, architecture is critical. Architecture is the philosophy that underlies a system and defines its purpose, intent, and structure. 
Different areas of architecture can be defined, including business architecture, enterprise architecture, data architecture, application architecture, and infrastructure architecture. Each of these areas has certain unique characteristics, but at their most basic level, they all aim to map IT solutions to business value. Architecture is needed to govern an infrastructure as it is designed, as it is used, and as it is changed. We can broadly categorize architects into three groups: enterprise architects, domain architects, and solution architects, each with their own role. 1.3.1 Solution architects Solution architects create IT solutions, usually as a member of a project team. A solution architect is finished when the project is complete. Solution architects are the technical conscience and authority of a project, are responsible for architectural decisions in the project, and work closely with the project manager. Where the project manager manages the process of a project, the solution architect manages the technical solution of the project, based on business and technical requirements. 1.3.2 Domain architects Domain architects are experts on a particular business or technology topic. Because solution architects cannot always be fully knowledgeable about all technological details or specific business domain issues, domain architects often assist solution architects on projects. Domain architects also support enterprise architects because they are aware of the latest developments in their field and can inform enterprise architects about new technologies and roadmaps. Examples of domain architects are cloud architects, network architects, and VMware architects. Domain architects most often work for infrastructure or software vendors, where they help customers implement the vendor's technologies. 1.3.3 Enterprise architects Enterprise architects continuously align an organization's entire IT landscape with the business activities of the organization. 
Using a structured approach, enterprise architects enable transformations of the IT landscape (including the IT infrastructure). Therefore, an enterprise architect is never finished (unlike the solution architect in a project, who is finished when the project is finished). Enterprise architects typically work closely with the CIO and business units to align the needs of the business with the current and future IT landscape. Enterprise architects build bridges and act as advisors to the business and IT. 2 THE INFRASTRUCTURE MODEL 2.1 IT building blocks The definition of infrastructure as used in this book is based on the building blocks in the model as shown in Figure 2. In this model, processes consume information, and that information is stored and managed by applications. Applications require application platforms and infrastructure to run. All of this is managed by different categories of systems management. Figure 2: The infrastructure model A model is always a simplified version of reality, useful to explain a certain point; not covering all details. Therefore, the infrastructure model is not perfect. As George E. P. Box once said: “Essentially, all models are wrong, but some are useful.” The following sections provide a high-level description of the building blocks in the infrastructure model. 2.2 Processes / Information building block Figure 3: Processes / Information building block Organizations implement business processes to fulfil their mission and vision. These processes are organization specific – they are the main differentiators between organizations. As an example, some business processes in an insurance company could be claim registration, claim payment, and create invoice. Business processes create and use information. In our example, information could be the claim’s date or the number of dollars on an invoice. Information is typically entered, stored and processed using applications. 
Functional management is the category of systems management that ensures the system is configured to perform the required business functions. 2.3 Applications building block Figure 4: Applications building block The Applications building block includes several types of applications based on the following characteristics: · Usage: Applications can be single-user or multi-user. A single-user application typically runs on end-user devices such as PCs and laptops. Examples include web browsers, word processors, and email clients. Examples of multi-user applications include mail servers, portals, collaboration tools, and instant messaging servers. · Source: Applications can be purchased as commercial off-the-shelf (COTS) products or developed as custom software. · Architecture: Applications can be designed as standalone applications or as multi-tier applications. A multi-tier application consists of a number of layers, such as a JavaScript application in a browser that communicates with an on-premises web server, which communicates with an application server, which communicates with a database. · Timeliness: Interactive applications respond to user actions, such as mouse clicks. They typically respond in the range of 100 to 300 ms. Real-time systems, such as Supervisory Control And Data Acquisition (SCADA) systems, are used in manufacturing, logistics, or other environments where timeliness is critical. These systems must respond in less than 10 ms. At the other end of the spectrum are batch-based systems that process data for hours at a time. Each of these types of applications requires a different type of underlying infrastructure. Applications management is responsible for the configuration and technical operations of the applications. 2.4 Application Platform building block Figure 5: Application Platform building block Most applications need some additional services, known as application platforms, that enable them to work. 
We can identify the following services as part of the application platform building block:
· Application servers provide services to applications. Examples are Java or .NET application servers and frameworks like IBM WebSphere, Apache Tomcat, and Red Hat JBoss.
· Container platforms, like Kubernetes, Azure Container Instances, and Amazon Elastic Container Service, run Docker containers.
· Connectivity entails Enterprise Service Buses (ESBs) like Microsoft BizTalk, the TIBCO Service Bus, IBM MQ, and SAP NetWeaver PI.
· Databases, also known as database management systems (DBMSs), provide a way to store and retrieve structured data. Examples are Oracle RDBMS, IBM DB2, Microsoft SQL Server, PostgreSQL, MySQL, Apache CouchDB, and MongoDB.
Application platforms are typically managed by systems managers specialized in the specific technology.

2.5 Infrastructure building blocks

Figure 6: Infrastructure building block

This book uses the selection of building blocks as depicted in Figure 6 to describe the infrastructure building blocks and concepts – the scope of this book. The following infrastructure building blocks are in scope:
· End User Devices are the devices used by end users to work with applications, like PCs, laptops, thin clients, mobile devices, and printers.
· Operating Systems are collections of programs that manage a computer’s internal workings: its memory, processors, devices, and file system.
· Compute comprises the physical and virtual computers in the datacenter, also known as servers.
· Storage comprises systems that store data. They include hard disks, tapes, Direct Attached Storage (DAS), Network Attached Storage (NAS), and Storage Area Networks (SANs).
· Networking is used to connect all infrastructure components.
This building block includes routers, switches, firewalls, wide area networks (WANs), local area networks (LANs), internet access, and virtual private networks (VPNs), as well as (at the network application level) networking services like DNS, DHCP, and time services, which are necessary for the infrastructure to work properly.
· Datacenters are locations that host most IT infrastructure hardware. They include facilities like uninterruptible power supplies (UPSs), Heating, Ventilation, and Air Conditioning (HVAC), computer racks, and physical security measures.
Please note that these building blocks are not by definition hierarchically related. For instance, servers need both networking and storage, and both are equally important. Infrastructure management includes processes like ITIL and DevOps, and tools for monitoring, backup, and logging.

2.6 Non-Functional attributes

Figure 7: Non-Functional attributes

An IT system does not only provide functionality to users; functionality is supported by non-functional attributes. Non-functional attributes result from the configuration of all IT system components, both at the infrastructure level and above. Although many other non-functional attributes are defined, as described in chapter 4, availability, performance, and security are almost always the essential ones in IT infrastructure architectures (Figure 7).

3 CLOUD COMPUTING AND INFRASTRUCTURES

In recent years, we have seen the widespread adoption of cloud computing. It can be seen as one of the most important paradigm shifts in computing. Many organizations now have a cloud-first strategy and are taking steps to move applications from their own on-premises datacenters to the cloud managed by cloud providers. The term cloud is not new. In 1997, Ramnath Chellappa of the University of Texas already stated: Computing has evolved from a mainframe-based structure to a network-based architecture.
While many terms have appeared to describe these new forms, the advent of electronic commerce has led to the emergence of 'cloud computing'.

While there are many public cloud service providers today, the three largest are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Together, these three have 66% of the market share and have a large number of datacenters around the world. Figure 8 shows when each of these cloud providers started.

Figure 8: Cloud time line

The three major cloud providers offer similar services, but sometimes under different names. For instance, a virtual machine in Azure is just called a virtual machine, but in GCP it is called a Compute Engine instance and in AWS it is called an EC2 instance.

While cloud computing can be seen as the new infrastructure, many organizations will be using on-premises infrastructure for many years to come. Migrating a complex application landscape to a cloud provider is no simple task and can take years. An organization may also not be allowed to move all its applications to the cloud. In many cases, there will be a hybrid situation, with part of the infrastructure on-premises and another part in one or more clouds.

Please be aware that the cloud is just a number of datacenters that are still filled with hardware – compute, networking and storage. Therefore, it is good to understand infrastructure building blocks and principles even when moving to the cloud.

3.1 Cloud definition

The most accepted definition of cloud computing is that of the National Institute of Standards and Technology (NIST): Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
It is important to realize that cloud computing is not about technology; it is an outsourcing business model. It enables organizations to cut costs while at the same time focusing on their primary business – they should focus on running their business instead of running a mail server. Clouds are composed of five essential characteristics, four deployment models, and three service models.

3.2 Cloud characteristics

Essential cloud characteristics are:
· On demand self-service – As a result of optimal automation and orchestration, minimal systems management effort is needed to deploy systems or applications in a cloud environment. In most cases, end users can configure, deploy, start and stop systems or applications on demand.
· Rapid elasticity – A cloud is able to quickly scale up and scale down resources. When temporarily more processing power or storage is needed, for instance as a result of a high-exposure business marketing campaign, a cloud can scale up very quickly on demand. When demand decreases, cloud resources can rapidly scale down, leading to elasticity of resources.
· Resource pooling – Instead of providing each application with a fixed amount of processing power and storage, cloud computing provides applications with resources from a shared pool. This is typically implemented using virtualization technologies.
· Measured service – In a cloud environment the actual resource usage is measured and billed. There are no capital expenses, only operational expenses. This is in contrast with the investments needed to build a traditional infrastructure.
· Broad network access – Capabilities are available over the network and accessed through standard mechanisms.
Be aware that when using public cloud based solutions, the internet connection becomes a Single Point of Failure. Internet availability and internet performance become critical, and redundant connectivity is therefore key.

3.3 Cloud deployment models

A cloud can be implemented in one of four deployment models.
· A public cloud deployment is delivered by a cloud service provider, is accessible through the internet, and available to the general public. Because of their large customer base, public clouds largely benefit from economies of scale.
· A private cloud is operated solely for a single organization, whether managed internally or by a third party, and hosted either on premises or externally. It extensively uses virtualization and standardization to bring down systems management cost and staff.
· A community cloud is much like a private cloud, but shared with a community of organizations that have shared concerns (like compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination, and it may exist on or off premises.
· In a hybrid cloud deployment, a service or application is provided by a combination of a public cloud and a community cloud and/or a private cloud. This enables running generic services (like email servers) in the public cloud while hosting specialized services (like a business specific application) in the private or community cloud.

3.4 Cloud service models

Clouds can be delivered in one of three service models:
· Software-as-a-Service (SaaS) delivers full applications that can be used by business users, and need little or no configuration. Examples are Microsoft Office 365, LinkedIn, Facebook, Twitter, and Salesforce.com.
· Platform-as-a-Service (PaaS) delivers a scalable, highly available, open programming platform that can be used by developers to build bespoke applications that run on the PaaS platform. Examples are Microsoft Azure Cloud Services and Google App Engine.
· Infrastructure-as-a-Service (IaaS) delivers (virtual) machines, networking, and storage. The user needs to install and maintain the operating systems and the layers above that.
Examples are Amazon EC2 and S3, and Microsoft Azure IaaS.
The following figure shows the responsibility of the cloud provider for each service model.

Figure 9: Cloud provider responsibilities

In the context of this book, IaaS is the most relevant service model. When we combine both deployment and service models, we get the following picture.

Figure 10: Cloud models

Because of the scope of this book, the next section describes Infrastructure as a Service in more detail.

3.5 Infrastructure as a Service (IaaS)

Infrastructure as a Service provides virtual machines, virtualized storage, virtualized networking and the systems management tools to manage them. IaaS can be configured using a graphical user interface (GUI), a command line interface (CLI), or application programming interfaces (APIs).

IaaS is typically based on cheap commodity white label hardware. The philosophy is to keep the cost down by allowing the hardware to fail every now and then. Failed components are either replaced or simply removed from the pool of available resources.

IaaS provides simple, highly standardized building blocks to applications. It does not provide high availability, guaranteed performance or extensive security controls. Consequently, applications running on IaaS should be robust to allow for failing hardware and should be horizontally scalable to increase performance.

In order to use IaaS, users must create and start a new server, and then install an operating system and their applications. Since the cloud provider only provides basic services, like billing and monitoring, the user is responsible for patching and maintaining the operating systems and application software.

Not all operating systems and applications can be used in an IaaS cloud; some software licenses prohibit the use of a fully scalable, virtual environment like IaaS, where it is impossible to know in advance on which machines software will run.

3.6 Edge computing

The goal of edge computing is to bring computing power and data storage closer to where it is needed, rather than relying on a cloud or on-premises datacenter. In edge computing, compute and storage take place on devices at the edge of the network, such as routers, gateways, switches, and sensors.

Edge computing can be a viable option where low latency, high bandwidth, and real-time processing are critical. For example, in the case of autonomous vehicles, real-time decision making is critical for safety. In this scenario, edge computing can enable the vehicle to process data and make decisions locally, rather than sending all sensor data to a centralized datacenter.

Edge computing is also gaining popularity in Internet of Things (IoT) applications, where a large number of devices generate data that must be processed in real time. By using edge computing, organizations can reduce the amount of data that needs to be sent to the cloud, which can reduce costs and improve performance.

PART II - NON-FUNCTIONAL ATTRIBUTES

It's hardware that makes a machine fast. It's software that makes a fast machine slow.
Craig Bruce

4 INTRODUCTION TO NON-FUNCTIONAL ATTRIBUTES

4.1 Introduction

IT infrastructures provide services to applications. Some of these infrastructure services can be well defined, like providing disk space, or routing network messages. Non-functional attributes, on the other hand, describe the qualitative behavior of a system, rather than specific functionalities. Some examples of non-functional attributes are scalability, reliability, stability, testability, and recoverability. But in my experience, the three most important non-functional attributes for IT infrastructures are security, performance, and availability. Therefore, for each topic described in this book, these three non-functional attributes are explicitly addressed.

Non-functional attributes are very important for the successful implementation and use of an IT infrastructure, but in projects, they rarely get the same attention as the functional services. Not everybody is aware of the value of pursuing non-functional attributes. The name suggests they have no function. But of course, these attributes do have a function in the business process, and usually a fairly large one. For instance, when the infrastructure of a corporate website is not performing well, the visitors of the website will leave, which has a direct financial impact on the business. When credit card transactions are not stored in a secure way in the infrastructure, and as a result leak to hackers, the organization that stored the credit card data will have a lot of explaining to do to their customers. So, non-functional attributes are very functional indeed, but they are not directly related to the primary functionalities of a system.

Instead of the term non-functional requirement, it would be much better to use the term quality attributes or implicit requirements. Although these terms much better reflect the nature and importance of, for example, performance, security, and availability, the term non-functional (as expressed in non-functional requirements or NFRs) is more commonly used and widely known. Therefore, in this book I keep on using the term non-functional attribute, although I do realize that the term could be misleading.

While architects and certainly infrastructure specialists are usually very aware of the importance of non-functional attributes of their infrastructure, many other stakeholders may not have the same feelings about it. Users normally think of functionalities, while non-functional attributes are considered a hygiene factor and taken for granted (“Of course, the system must perform well”). Most of the time, users of systems do not state non-functional attributes explicitly, but they do have expectations about them.
An example is the functionality of a car. A car has to bring you from A to B, but many quality attributes are taken for granted. For instance, the car has to be safe to drive in (leading to the implementation of anti-lock brakes, air bags, and safety belts) and reliable (the car should not break down every day), and the car must adhere to certain industry standards (the gas pedal must be the right-most pedal). All of these extras cost money and might complicate the design, construction, and maintenance of the car. While all clients have these non-functional requirements, they are almost never expressed as such when people are ordering a new car.

4.2 Non-functional Requirements

It is the IT architect or requirements engineer’s job to uncover implicit non-functional requirements. This can be very hard, since what is obvious or taken for granted by customers or end users of a system is not always obvious to the designers and builders of that system. Not to mention the non-functional requirements of other stakeholders, such as the existence of service windows or monitoring capabilities, which are important requirements for systems managers.

It is important to remember that the acceptance of a system is largely dependent on the implemented non-functional requirements. A website can be very beautiful and functional, but if loading the site (performance, a non-functional requirement) takes 30 seconds, most customers are gone!

A large part of the budget for building an infrastructure is usually spent in fulfilling non-functional requirements that are not always clearly defined ("The system must, of course, work seamlessly with existing systems" or "The website should always be available"). Most stakeholders have no idea how difficult it can be to achieve a particular non-functional requirement.
It sometimes helps to quantify these requirements; to make them explicit: “How bad would it be if the website was not available for 5 minutes per day?” or “What if it will take $500,000 to implement this requirement? Is it still important then?”

Many of the non-functional attributes of an application are delivered by the infrastructure. An application using an IT infrastructure built with several single points of failure will probably not reach very high availability figures, no matter how well the application is built. And when the IT infrastructure is not designed to be scalable, the applications built upon it cannot introduce scalability as an afterthought.

The other way around is also true. When an IT infrastructure is set up to be highly available, a badly designed application can make the end result highly unreliable. Similarly, security flaws at the process level can undo all security measures taken in the infrastructure. This makes it very important to consider all design decisions when it comes to non-functional attributes.

It is not unusual to have conflicting non-functional requirements in a system. A classic example is security versus user friendliness. Users expect highly secured systems, but really don’t want to be bothered by password changes, smart card authentication, and other annoying security measures. The same goes for performance and cost. Getting a high-performance system usually means getting more and faster hardware, and using strict implementation rules. This leads to higher cost, which is usually not in line with some requirement about the cost of the infrastructure.

It is the infrastructure architect’s responsibility to balance these conflicting non-functional requirements. The architect must present the stakeholders with these conflicting requirements and their consequences, so they can make well informed decisions.
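To make a question such as “not available for 5 minutes per day” concrete, the stated downtime can be converted into an availability percentage. The following is a minimal sketch; the function name and the choice of a 24-hour day as the reference period are illustrative assumptions, not part of the book.

```python
# Illustrative sketch: express a tolerated downtime as an availability percentage.
# Assumption (not from the text): the reference period is one 24-hour day.

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def availability_from_downtime(downtime_minutes: float, period_minutes: float) -> float:
    """Return availability (in percent) for a given downtime within a period."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie between 0 and the period length")
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    pct = availability_from_downtime(5, MINUTES_PER_DAY)
    # Five minutes of downtime every day corresponds to roughly 99.65% availability.
    print(f"5 minutes per day -> {pct:.2f}% availability")
```

Putting a number on the requirement in this way gives stakeholders something they can weigh against the cost of achieving it.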

In the following chapters the three most important infrastructural non-functional attributes are discussed in more detail: availability, performance, and security. Each of these topics is too complex to fully explain in this book. Many good books and articles are written about them, some of which are recommended in the appendix. In this book, I only describe those aspects of availability, performance and security that are strongly related to IT infrastructures.

5 AVAILABILITY CONCEPTS

5.1 Introduction

Everyone expects their infrastructure to be available all the time. In this age of global, always-on, always connected systems, disturbances in availability are noticed immediately. A 100% guaranteed availability of an infrastructure, however, is impossible. No matter how much effort is spent on creating highly available infrastructures, there is always a chance of downtime. It's just a fact of life. As Werner Vogels, CTO of Amazon.com, keeps reminding us: “Everything fails, all the time”.

This chapter discusses the concepts and technologies used to create highly available systems. It includes calculating availability, managing human factors, the reliability of infrastructure components, how to design for resilience, and – if everything else fails – business continuity management and disaster recovery.

Figure 11: Availability in the infrastructure model

5.2 Calculating availability

In general, availability cannot be calculated or guaranteed in advance. It can only be reported after the fact, when a system has been running for a few years. This makes designing for high availability a complicated task. Fortunately, much knowledge and experience has been gained over the years on how to design highly available systems, using design patterns such as failover, redundancy, structured programming, avoiding single points of failure (SPOFs), and implementing sound system management. But first, let's discuss how availability is expressed in numbers.
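Because availability is usually expressed as a percentage of uptime over a period, the maximum downtime that a given percentage allows follows from simple arithmetic. The following is a minimal sketch, assuming a 365-day year and a 30-day month; the constant and function names are illustrative and not taken from the book.

```python
# Illustrative sketch: convert an availability percentage into maximum downtime.
# Assumptions (not from the text): a 365-day year and a 30-day month.

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes
MINUTES_PER_WEEK = 7 * 24 * 60     # 10,080 minutes


def max_downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Return the maximum downtime (in minutes) allowed within the period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100.0 - availability_pct) / 100.0 * period_minutes


if __name__ == "__main__":
    # Print yearly, monthly, and weekly downtime budgets for common levels.
    for pct in (99.8, 99.9, 99.99, 99.999):
        per_year = max_downtime_minutes(pct, MINUTES_PER_YEAR)
        per_month = max_downtime_minutes(pct, MINUTES_PER_MONTH)
        per_week = max_downtime_minutes(pct, MINUTES_PER_WEEK)
        print(f"{pct}%: {per_year:.1f} min/year, "
              f"{per_month:.1f} min/month, {per_week:.1f} min/week")
```

The same function can be used in reverse thinking during service level discussions: a proposed percentage is immediately translated into minutes that stakeholders can relate to.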

5.2.1 Availability percentages and intervals

The availability of a system is usually expressed as a percentage of uptime in a given time period (usually one year or one month). The following table shows the maximum downtime for a particular percentage of availability.

Availability %             Downtime per year   Downtime per month   Downtime per week
99.8%                      17.5 hours          86.4 minutes         20.2 minutes
99.9% ("three nines")      8.8 hours           43.2 minutes         10.1 minutes
99.99% ("four nines")      52.6 minutes        4.3 minutes          1.0 minutes
99.999% ("five nines")     5.3 minutes         25.9 seconds         6.1 seconds
Table 1: Availability levels

Typical requirements used in service level agreements today are 99.8% or 99.9% availability per month for a full IT system. To meet this requirement, the availability of the underlying infrastructure must be much higher, typically in the range of 99.99% or higher.

99.999% uptime is also known as carrier grade availability; this level of availability originates from telecommunication system components (not full systems!) that need an extremely high availability. Higher availability levels for a complete system are very uncommon, as they are almost impossible to reach.

To compare, the electricity supply in my home country, the Netherlands, is very reliable. In recent years, the average outage per household has been 24 minutes per year. This is equivalent to an availability of 99.995%. The average availability of electricity in the USA is 99.96%.

While 99.9% uptime means about 525 minutes of downtime per year, this downtime should not occur in one event, nor should one-minute downtimes occur 525 times a year. It is therefore good practice to agree on the maximum frequency of unavailability. An example is shown in Table 2.

Unavailability (minutes)   Number of events (per year)
0–5