Cloud Computing Course Book PDF
IU Internationale Hochschule
2024
Christian Müller-Kett, Max Guhl, Ayman Kahlil, Robert Henri Horrion, Dominik Friedel
Summary
This course book covers the fundamentals of cloud computing, including cloud service models, risks, benefits, enabling technology (virtualization, containerization, storage, networks), serverless computing, and established cloud platforms (AWS, Azure, GCP). It also explores data science in the cloud, focusing on provider-independent tools.
CLOUD COMPUTING
DLBDSCC01

MASTHEAD
Publisher: IU Internationale Hochschule GmbH, IU International University of Applied Sciences, Juri-Gagarin-Ring 152, D-99084 Erfurt
Mailing address: Albert-Proeller-Straße 15-19, D-86675 Buchdorf, [email protected], www.iu.de
DLBDSCC01, Version No.: 001-2024-0812
Christian Müller-Kett
Cover image: Adobe Stock.
© 2024 IU Internationale Hochschule GmbH

This course book is protected by copyright. All rights reserved. This course book may not be reproduced and/or electronically edited, duplicated, or distributed in any form without written permission from IU Internationale Hochschule GmbH (hereinafter referred to as IU). The authors/publishers have identified the authors and sources of all graphics to the best of their abilities. However, if any erroneous information has been provided, please notify us accordingly.

TABLE OF CONTENTS

Introduction
- Signposts Throughout the Course Book
- Basic Reading
- Further Reading
- Learning Objectives

Unit 1: Introduction to Cloud Computing (Authors: Max Guhl, Christian Müller-Kett)
- 1.1 Fundamentals of Cloud Computing
- 1.2 Cloud Service Models
- 1.3 Risks and Benefits

Unit 2: Enabling Technology (Authors: Christian Müller-Kett, Ayman Kahlil)
- 2.1 Virtualization and Containerization
- 2.2 Storage Technologies
- 2.3 Networks and RESTful Services

Unit 3: Serverless Computing (Author: Robert Henri Horrion)
- 3.1 Introduction to Serverless Computing
- 3.2 Benefits
- 3.3 Limitations

Unit 4: Established Cloud Platforms (Authors: Dominik Friedel, Christian Müller-Kett)
- 4.1 General Overview
- 4.2 Amazon Web Services
- 4.3 Microsoft Azure
- 4.4 Google Cloud Platform
- 4.5 Platform Comparison

Unit 5: Data Science in the Cloud (Author: Christian Müller-Kett)
- 5.1 Provider-Independent Services and Tools
- 5.2 Google Data Science and Machine Learning Services
- 5.3 Amazon Web Services Data Science and Machine Learning Services
- 5.4 Microsoft Azure Data Science and Machine Learning Services

Backmatter
- List of References
- List of Tables and Figures

INTRODUCTION

SIGNPOSTS THROUGHOUT THE COURSE BOOK
This course book contains the core content for this course. Additional learning materials can be found on the learning platform, but this course book should form the basis for your learning. The content of this course book is divided into units, which are divided further into sections. Each section contains only one new key concept, allowing you to quickly and efficiently add new learning material to your existing knowledge.
At the end of each section of the digital course book, you will find self-check questions. These questions are designed to help you check whether you have understood the concepts in each section.

For all modules with a final exam, you must complete the knowledge tests on the learning platform. You pass the knowledge test for a unit when you answer at least 80% of the questions correctly. When you have passed the knowledge tests for all the units, the course is considered finished, and you will be able to register for the final assessment. Please ensure that you complete the evaluation prior to registering for the assessment. Good luck!

BASIC READING
Lisdorf, A. L. (2021). Cloud computing basics: A non-technical introduction. Apress. https://doi.org/10.1007/978-1-4842-6921-3
Chellammal, A., & Pethuru, R. C. (2019). Essentials of cloud computing: A holistic perspective. Springer.

FURTHER READING

Unit 1
Jayeola, O., Sidek, S., Rahman, A. A., Mahomed, A. S. B., & Hu, J. (2022). Cloud computing adoption in small and medium enterprises (SMEs): A systematic literature review and directions for future research. International Journal of Business and Society, 23(1), 226–243. https://doi.org/10.33736/ijbs.4610.2022
Jahangard, L. R., & Shirmarz, A. (2022). Taxonomy of green cloud computing techniques with environment quality improvement considering: A survey. International Journal of Energy and Environmental Engineering. https://doi.org/10.1007/s40095-022-00497-2

Unit 2
Leszko, R. (2022). Continuous delivery with Docker and Jenkins: Create secure applications by building complete CI/CD pipelines (3rd ed.). Packt Publishing. Chapter 2.
Zadka, M. (2019). DevOps in Python: Infrastructure as Python. Apress. Chapters 12–13.

Unit 3
Schleier-Smith, J., Sreekanti, V., Khandelwal, A., Carreira, J., Yadwadkar, N. J., Popa, R. A., Gonzalez, J. E., Stoica, I., & Patterson, D. A. (2021). What serverless computing is and should become. Communications of the ACM, 64(5), 76–84. https://doi.org/10.1145/3406011
Baldini, I., Castro, P., Chang, K., Cheng, P., Fink, S., Ishakian, V., Mitchell, N., Muthusamy, V., Rabbah, R., Slominski, A., & Suter, P. (2017). Serverless computing: Current trends and open problems. In Research advances in cloud computing (pp. 1–20). Springer.

Unit 4
Bisong, E. (2019). Building machine learning and deep learning models on Google Cloud Platform. Apress. Chapter 2.
Amazon Web Services. (n.d.). Amazon Web Services documentation. https://docs.aws.amazon.com/
Google Cloud. (n.d.). Google Cloud documentation. https://cloud.google.com/docs
Microsoft Azure. (n.d.). Azure documentation. https://docs.microsoft.com/en-us/azure/?product=popular

Unit 5
Jainani, P. (2021, December 14). Azure Machine Learning service: Part 1 — An introduction. https://towardsdatascience.com/azure-machine-learning-service-part-1-an-introduction-739620d1127b
Nair, V. (2021, December 27). AWS SageMaker. Towards Data Science. https://towardsdatascience.com/aws-sagemaker-db5451e02a79

LEARNING OBJECTIVES
We live in the data age. Collecting, storing, and analyzing data are so closely integrated with our everyday lives that it is hard to imagine what our societies looked like without these technologies only a few years ago. Whether we plan our next vacation, check our bank account, order something online, communicate with each other, or take a picture of our family, we constantly generate data and use services that store and analyze massive datasets. This cannot be done on a single device, and the data revolution would not have been possible without the distribution of storage and computation over multiple machines.
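As a toy illustration of what "distribution of computation over multiple machines" means in practice, the same aggregation can be partitioned and fanned out to several worker processes and the partial results merged afterwards. This is only a single-machine sketch using the Python standard library (the function names are invented for illustration), but it is the same map/reduce pattern that cloud clusters apply across many physical nodes:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "node" aggregates only its own partition of the data.
    return sum(x * x for x in chunk)

def distributed_sum_of_squares(data, workers=4):
    # Partition the data, fan the chunks out to the workers,
    # then merge (reduce) the partial results.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(100_000))
    print(distributed_sum_of_squares(data))
```

On one laptop this buys little, but the same partition/merge idea is what lets a cluster scale the computation by adding machines instead of buying a bigger one.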
This concept is taken to the extreme in the cloud, where multiple data centers managed by large companies, such as Amazon, Google, and Microsoft, provide storage and computational resources distributed over almost unlimited virtual machines. Going into the cloud means we can leverage the advantages of partitioning data and computation over multiple nodes, such as increased reliability and scalability of our systems.

In this course, we will obtain the knowledge and skills to store and analyze our data in the cloud. We start with an introduction to cloud computing. Considering the fundamentals of cloud computing, we will understand how the cloud works in principle. We will see different cloud service models and discuss the benefits and risks of moving to the cloud.

We will then dive deeper into more technical aspects and the enabling technology typically used in the cloud context. Here, we will learn how to use virtualization and containerization as layers of abstraction, which are particularly useful in cloud systems. We will also learn about storage technologies, networks, and RESTful services.

As one of the easiest and quickest ways of deploying a simple custom application in the cloud, we will learn how to use serverless computing. After an introduction to serverless computing and some practical exercises, we will also discuss the benefits and limitations of this technology.

Finally, we will investigate popular cloud providers and the services offered on the respective platforms. There are many cloud providers, but three of them are considered "hyperscalers" due to their popularity and market share: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. We will obtain a general overview of the services offered by each cloud provider and conduct a systematic platform comparison. As all providers offer specialized services for data science in the cloud, we will investigate these data science and machine learning services in more detail and learn how to use provider-independent services and tools.

Upon completing this course, we will have a broad overview of how the cloud works in principle, which technologies are typically used, and how the big providers offer their services. These are highly demanded skills on the job market and will also change the way you think whenever you upload a picture to the cloud.

UNIT 1: INTRODUCTION TO CLOUD COMPUTING

STUDY GOALS
On completion of this unit, you will be able to...
- describe the main concepts of cloud computing.
- explain the cloud delivery and service models.
- decide on a cloud service model for your purpose.
- discuss what to consider when choosing a cloud provider.

1. INTRODUCTION TO CLOUD COMPUTING

Case Study
A couple of friends start the music streaming service MyStream. The service offers 5 million songs, which require 1 TB of storage. Making this storage available from local infrastructure does not seem to be a problem, so they decide to market the service using their local infrastructure in Berlin. Their direct environment and customers from the Berlin area are enthusiastic. Within a few weeks, requests grow from 1,000 calls/day to 10,000 calls/day. New customers from Denmark and France are also interested in the service but complain about long loading times. Therefore, MyStream decides to invest its newly raised capital in new hardware for new locations in Paris and Copenhagen. However, it quickly becomes clear that the ordered hardware will only be available in a few months and will not be sufficient for the increasing demand. After several months, customer demand for music from MyStream has decreased. In the meantime, the team was mainly busy building the infrastructure and service in the new regions from scratch.
An infrastructure configuration mistake caused a temporary failure of the servers in Berlin, and the company began to stumble. Together with the investors, the team decides to withdraw the service from the market and plan a new market approach. A scenario like this, or something similar, happens more often than you might think. As exaggerated as the example of MyStream may sound, cloud technologies can avoid many of the mistakes in the example, and knowing about cloud concepts is a critical factor for the success of digital business models. To avoid finding yourself in the same situation as MyStream, in this unit you will learn:
- What is the basic terminology in the cloud context?
- Which cloud and infrastructure solutions are available? What special skills are needed?
- Which service models are available in the cloud?
- Which advantages are expected from moving to the cloud, and which disadvantages must be considered?

1.1 Fundamentals of Cloud Computing
In practice, we encounter many terms that are taken for granted in specialist circles and sometimes misused by non-technically oriented people. In the following, we discuss some of the most widely used terms, such as local computing, on-premises, private cloud, public cloud, multi-cloud, hybrid cloud, and poly-cloud. These will help us match solutions and their variations correctly and communicate appropriately.

Definition of the Fundamental Terms
We start our journey to the cloud by clarifying what we mean by "cloud" at different levels, delimiting local computing from cloud computing. As can be seen in the following figure, we distinguish between local computing, on-premises, private cloud, public cloud, multi-cloud, hybrid cloud, and poly-cloud.

Figure 1: Overview of Cloud Delivery Models (Source: Max Guhl, 2023)

Local Computing
Local computing refers to using computer hardware and software locally in a specific geographic location. This can include everything from personal computers and laptops to servers.
These machines can be operated centrally at a workplace or in a small, dedicated, cooled room. A connection to the internet is not mandatory; however, if it is required, it must be managed locally. Uncomplicated access speaks for local computing, but its performance and possibilities for collaboration are limited.

On-Premises
We talk about on-premises when servers are operated professionally and locally, or in a data center, mostly at geographic locations close to the organization's center of operations. Depending on whether the data center is self-operated or rented, we must take care of supplying the IT components (electricity, cooling, cabling, etc.) ourselves. There is complete control over the IT components when selecting and providing them. Since resources such as storage (e.g., EMC, NetApp), compute (e.g., HPE, IBM, Fujitsu), and network (e.g., Cisco, Brocade, Juniper Networks) are not operated at the locations of the users and administrators, a remote connection is required. A connection via the internet is not mandatory; however, if needed, it must be managed at the on-premises location, which acts as a gateway.

The main advantage of an on-premises option is that companies have complete control over their hardware and data and how they use them. There is also no dependency on an internet connection to access data and applications (for example, via intranet access). The main disadvantage of on-premises solutions is that they are more expensive and slower to set up than cloud-based solutions. Setting up infrastructure on-premises requires hardware to be bought or rented, which is more costly than deploying pay-as-you-go cloud services, at least in the starting phase. This might or might not change according to long-term usage patterns; still, setting up on-premises solutions is typically more expensive to begin with. We will return to this aspect when we discuss the risks and benefits of cloud computing in more detail.
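The up-front-cost vs. pay-as-you-go trade-off can be made concrete with a back-of-the-envelope break-even calculation. The sketch below is illustrative only; all prices are invented placeholders, not real vendor rates:

```python
import math

def breakeven_months(hardware_cost, onprem_monthly, cloud_monthly):
    """Return the month from which cumulative on-premises cost drops
    below cumulative cloud cost, or None if the cloud stays cheaper.

    Cumulative on-prem cost: hardware_cost + m * onprem_monthly
    Cumulative cloud cost:   m * cloud_monthly
    """
    if cloud_monthly <= onprem_monthly:
        return None  # cloud is never more expensive per month
    return math.ceil(hardware_cost / (cloud_monthly - onprem_monthly))

# Placeholder example: 20,000 up front and 300/month to run on-premises,
# versus an 800/month pay-as-you-go cloud bill.
print(breakeven_months(20_000, 300, 800))  # → 40 months
```

The point of the sketch is the shape of the comparison, not the numbers: whether on-premises ever pays off depends entirely on how long and how steadily the capacity is actually used.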
Also, security aspects remain entirely the responsibility of the organization. This can be seen as both an advantage and a disadvantage, an aspect we will also discuss in more detail later in this unit.

Private Cloud
Private clouds are enterprise data center resources that are provisioned for exclusive use by a single organization via a service provider such as IONOS, HPE, IBM, T-Systems, or AWS. The provider is responsible for securing the desired resources in one of its own or customer-specified locations. In practice, there are two standard provision models offered by providers:
- All physical resources (server, storage, network, etc.) are dedicated to a single organization. This is often called a "private cloud."
- Resources are provided via a shared platform, in which individual areas, such as storage or network components, are used by several customers, as can be found at AWS and T-Systems. This is often called a "public cloud."

Users and administrators must access the services hosted in the private cloud remotely. This could be an internet connection, but it is not mandatory; a private or virtual private network can also be used. However, if an internet connection is required, it must be managed at the private cloud location, which acts as a gateway.

Private clouds can also be more flexible and customizable to an organization's needs. They are mainly used to expand resources or increase the availability of applications for internal use (or decrease it dynamically by temporarily adapting to required computing or storage capacities). However, private clouds can also be more expensive to set up and maintain than public clouds. They may also need more IT resources to manage and operate effectively.

Public Cloud
The public cloud delivers computing and storage services over the internet. These services are offered on a pay-as-you-go basis and are usually scalable and elastic.
Public cloud services are provided by vendors such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). These three cloud providers are considered hyperscalers. The IT resources are mainly provided using an "as-a-service" model; a more detailed description follows later in this unit.

Hyperscalers: Since AWS leads the cloud computing market with 33%, followed by Azure with 21% and GCP with 10%, these three providers are also called "hyperscalers" (Richter, 2022).

Since public cloud providers are responsible for delivering the services, they also have complete control over the physical IT components and facilities used. However, providers must ensure contractually guaranteed quality features with the selected components. As the services are delivered over the internet, an internet connection is almost always a prerequisite for the accessibility of IT resources. However, public cloud providers also offer connection options equivalent to the dedicated connections of private clouds, e.g., virtual private network connections. In addition, internet access via the public cloud can be restricted or prohibited altogether. Therefore, in practice, the distinction between private and public may not be entirely clear. Sometimes, this distinction is made based on whether the physical hardware is dedicated to one organization only (private cloud) or shared among organizations via virtualization techniques (public cloud), but this distinction does not hold precisely for all use cases either.

There are many advantages to using public cloud services. They are usually cheaper than traditional on-premises services (some reasons for this are discussed at the end of this unit), more flexible, and more scalable. Public cloud services are ideal for businesses that have fluctuating or unpredictable workloads. However, there are also some disadvantages to using public cloud services.
They can be less secure than local computing, on-premises, or private cloud services, and they can also be less reliable (while the opposite might also be true, depending on the use case). Public cloud services can also be complex to manage and to migrate to other platforms.

The following are some of the most popular public cloud providers:
- AWS: Created in 2006, it is the longest-running cloud services platform, operated by Amazon. What started as an internal IT infrastructure solution has become the world's most extensive and most broadly used cloud platform, offering over 200 managed services from data centers around the globe (Amazon Web Services, n.d.-a).
- Azure: In 2008, Microsoft launched its public cloud, Azure. Initially, it was intended to provide only Platform-as-a-Service (PaaS, explained in the next section) offerings to clients. After client demand rose, Infrastructure-as-a-Service (IaaS, also discussed later) offerings were added (Foley, 2018). Today, it hosts all of Microsoft's cloud-based products, such as Office 365, Teams, Bing, and Xbox Live (Foley, 2021).
- GCP: In 2008, Google also entered the public cloud market with the Google Cloud Platform. Google released a service named "App Engine" to allow developers to run their applications while App Engine takes care of the scaling (Paul, 2023). Today, GCP is often used for machine learning (ML), deep learning (DL), and artificial intelligence (AI). For this purpose, Google offers its own AI processors (Tensor Processing Units, TPUs) and dedicated services, such as Vertex AI. Also, many open-source contributions have been made by the GCP developer teams to the community (Shanthi, 2021).
- Alibaba Cloud: Alibaba Cloud is a subsidiary of the Alibaba Group. Founded in 2009, it is now the market leader in China, with a market share of 37% (CIW Team, 2022). In recent years, it has also become one of the most important partners for non-Chinese companies in Asia and Europe.
In addition, there are many providers specialized in specific industries or competencies, such as StackIT by Schwarz IT for the retail sector, the Open Telekom Cloud by T-Systems International GmbH as a sovereign European cloud, or OVHcloud by OVH SAS as a public-cloud-oriented hosting provider.

Hybrid Cloud
A hybrid cloud is a computing environment combining multiple technologies, such as on-premises, private, or public cloud. It is comparable to a hybrid vehicle that uses two different power technologies: an electric motor and a combustion engine. The critical aspect is orchestrating and integrating the individual solutions; a seamless transition should be guaranteed. Depending on the selected cloud model, the connection to the IT infrastructure and services via the internet or intranet must be handled in the same way as with a public cloud concept. Some of the best-known hybrid cloud products are listed below:
- Azure Stack: Azure Stack is a subset of the Azure cloud platform designed to run entirely or partly independently on a local network. Azure Stack includes the same features as the Azure cloud platform, including creating and managing virtual machines, storage accounts, and networking resources.
- AWS Outposts: AWS Outposts brings native AWS services, infrastructure, and operating models to on-premises data centers. Still, the local infrastructure cannot be operated without being connected to the cloud.
- VMware Hybrid Cloud Solutions: VMware provides a virtualized infrastructure on-premises and in the public cloud, allowing virtual servers to be moved, outsourced, or secured between the two infrastructures.
- Self-defined: Self-written scripts can also be used for a hybrid scenario.
Infrastructure as Code (IaC) software (e.g., Terraform, Ansible, Morpheus) is recommended for this, as it allows reproducible, systematic deployments and is compatible with the APIs of the public cloud providers and most on-premises infrastructure.

Infrastructure as Code (IaC): Instead of considering infrastructure provision as a one-time operation, infrastructure architectures can be defined in standardized configuration files and deployed in varying environments in an automated way.

Typical scenarios for applying a hybrid cloud solution could be:
- Edge computing: Processing data from an autonomous test vehicle. The vehicle collects up to 4 TB of data daily, stored locally. Before these data are transferred to the cloud, they are evaluated and aggregated on-site to reduce the amount of data for transfer.
- Public cloud as overflow basin: Imagine our local computing resources are sized for everyday use, and we expect out-of-the-ordinary situations, such as Black Friday sales. We must quickly and temporarily adjust to an unknown demand. Here, the loss of business and the investment in new hardware are usually opposed. With the temporary use of the cloud, the additional load can be handled without making a long-term investment.

Edge computing: Edge computing refers to data processing by the smallest devices, such as cameras, individual servers, or mobile devices, at the location where the data are generated. These devices act as a kind of pre-qualifier for the data. The local devices can also be controlled remotely.

Multi-Cloud
Multi-cloud means that two or more cloud providers of the same technology, private or public, are used; in everyday usage, the term usually refers to two or more public clouds. It can be compared to a car fleet with vehicles from different manufacturers. The main intention is often to be cloud-agnostic, meaning an organization can switch between providers because they offer comparable services that can be used interchangeably.
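Conceptually, IaC tools such as Terraform or Ansible work by comparing a declared desired state against the actual state of the infrastructure and deriving a plan of changes, which keeps deployments reproducible across providers. The following is a provider-neutral sketch of that idea in plain Python, not real IaC syntax (real tools use their own configuration languages, such as HCL or YAML, and the resource names here are invented):

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compute an IaC-style execution plan: which resources must be
    created, changed, or destroyed to reach the desired state."""
    return {
        "create":  sorted(set(desired) - set(actual)),
        "destroy": sorted(set(actual) - set(desired)),
        "change":  sorted(k for k in set(desired) & set(actual)
                          if desired[k] != actual[k]),
    }

# Invented example resources: the declared configuration vs. what a
# provider API reports as currently deployed.
desired = {"web-vm": {"size": "large"}, "db": {"size": "medium"}}
actual  = {"web-vm": {"size": "small"}, "old-cache": {"size": "small"}}
print(plan(desired, actual))
# → {'create': ['db'], 'destroy': ['old-cache'], 'change': ['web-vm']}
```

Because the plan is derived from declarative state rather than hand-run commands, applying it repeatedly is idempotent, which is exactly what makes IaC attractive for hybrid and multi-cloud deployments.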
The connection must be handled in the same way as with a public cloud concept. This way, the risk of vendor lock-in can be reduced, and the performance of individual applications can be maximized. In addition, availability can be increased: if one provider faces global problems, it is possible to switch to the second provider quickly. A possible scenario could be to operate an Azure SQL Database on Microsoft Azure and a Google Cloud SQL database in parallel.

Besides avoiding vendor lock-in, another motivation for a multi-cloud approach can be the gravity of the data. For example, an organization might use different analysis tools for its particular use cases offered by two different providers. To bring these analyses as close to the data as possible, we might use well-integrated storage solutions from both providers in parallel.

Data gravity: Data gravity refers to the importance or value of data. Certain circumstances may require bringing applications as close to the data as possible.

However, the increased number of public cloud providers has the disadvantage of growing complexity of the entire IT landscape. This is accompanied by an increased need to protect data and applications against unauthorized access. Furthermore, it is necessary to know each platform that is used.

Poly-Cloud
Poly-cloud computing is a term used to describe the practice of using only individual components of different cloud platforms and providers for one application. "For instance, a business that uses a poly-cloud strategy will have all of its data on one cloud service and all of its virtual machines on another cloud service" (Isberto, 2021). It can be compared to a car built with parts from different brands, where the best part of each brand is used. This approach can give organizations greater flexibility and choice when selecting the right cloud solution.
It can also help reduce costs by allowing organizations to take advantage of the best deals and discounts from multiple providers. As one example, an organization might use Amazon Web Services for its computing and storage needs, while Microsoft Azure might be used for its database needs. This approach can help ensure that an organization uses the best solution for each need, rather than being forced into a one-size-fits-all solution.

However, one disadvantage is the need for a high-speed, reliable, and stable data connection between the cloud providers' locations. For example, a system using resources from different continents or countries can become a significant challenge. Even where tests within a country have already shown promising results, using different providers at the same geographic location has often been the only safe option. Also, the user's connection to the cloud must be handled similarly to a public cloud concept, but it needs to be extended by connectivity across all used cloud locations and providers. Another disadvantage is the increased cost of the network connection and the data traffic, which may be subject to a fee. Lastly, operations (e.g., maintenance and troubleshooting) are more complex due to the many providers involved and the need for a consistent security concept across all platforms.

A typical scenario could be an outsourced database. Some providers specialize in the provision of individual services; Oracle does so with the Oracle Cloud Infrastructure (OCI). For cost reasons, operating an Oracle database on OCI can make sense, while all other services that rely on the data are operated on AWS or Azure. This can be done via the internet or a virtual private network (VPN).

Advantages and Disadvantages
The table below gives an overview of the advantages and disadvantages of all cloud models.
One of the significant advantages of all cloud solutions is that computing and storage capacity can be distributed across multiple machines. This is called horizontal scaling, as opposed to increasing the resources of one machine, which is called vertical scaling. Com- plex tasks can be parallelized using a cluster of machines, allowing for the analysis of vast amounts of data. Also, storing redundant copies of data on several machines in a cluster increases the availability and reliability of these systems. Table 1: Overview of advantages and disadvantages of computing delivery models Pro Con Local Computing cheap, fast, and easily available accessibly only by local users more secure from external attacks minimal resources in amount and without an internet connection technology 19 Pro Con On-Premises complete control and a higher high upfront investment and amount of infrastructure maintenance efforts implementation of own security minimal resources in amount and standards technology long shipping times and planning horizon Private Cloud no risk of unexpected mainte- sometimes long shipping times nance costs and planning horizon higher availability limited by the provider resources secure expansion of IT landscapes in amount and technology Public Cloud no upfront costs and a usage- require always an internet connec- based billing model tion limitless scaling and access to the costs and safety traps in case of latest technology improper use reduced effort by As-a-Service vendor locki-n possible offerings Hybrid Cloud reduced upfront costs and a by chosen solution vendor-lock in usage-based billing model possible reduced outbound data traffic increased setup effort complete control of the storage location of data when stored on- premises Multi Cloud access to a wide range of scalable in addition to those of the public technologies cloud: increased Complexity in managing the IT landscape high effort to comply with security and compliance standards (although all 
providers offer sup- port and built-in security mecha- nisms) Poly-Cloud in addition to those of the public in addition to those of the public cloud: cloud: single components of an applica- high complexity to manage the IT tion get the infrastructure it needs landscape highest cost efficiency require very fast and low-latency access to a wide range of scalable connectivity technologies high effort to comply with security and compliance standards Source: Max Guhl, 2023 Cloud-native Even if cloud native is not a cloud delivery model, this term describes the successful use of cloud computing. Cloud-native architectures are designed to take advantage of the bene- fits of cloud computing. They are typically built using microservices, small, self-contained services that can be easily deployed and scaled. Cloud-native architectures are designed to be highly scalable, resilient, and, in the best case, even cloud-agnostic. This means they can be easily deployed across multiple cloud environments and automatically adapt to 20 changes in workloads or resources. Cloud-native architectures can help accelerate devel- opment processes and improve application performance. They can also make it easier to adopt new technologies and respond to changes in the market. In practical use, this would mean making use of: Technologies ◦ Container - to remain flexible when choosing the cloud platform ◦ Serverless – to trigger computing resources without managing infrastructure ◦ Automation – automate recurring tasks or deployments to reduce manual effort Services ◦ Ready-to-use databases or AI frameworks by a Platform-as-a-Service (PaaS) ◦ Advanced backup and high availability features to share workload automatically Redundancy ◦ Using multiple geographic locations and sharing the workload between various data centers, availability zones (AZ) and regions. Availability Zones (AZ) Availability Zones (AZ) define a set of two or New fundamental roles more data centers within one region. 
The data centers within an AZ are connected with high-speed, low-latency connections.

Region: A region defines a geographic location. It can be linked to a country or continent and contains one or more AZs. Confining a system to a region helps fulfill the data protection regulations of individual countries.

As we have seen, the cloud's variety of delivery and service models is many times more extensive than that of on-premises or local computing environments. Traditional IT administrators, even with their wide range of skills in the context of local and on-premises solutions, may find that these skills do not carry over directly to cloud solutions. As a result, a narrower distinction of tasks assigned to several roles is common in cloud systems.

In the following, we focus on the roles that occur frequently in cloud contexts: Cloud Solution Architects, Cloud Architects, Cloud Engineers, Cloud Developers, and Cloud Consultants. It must also be pointed out that, in practice, these roles overlap regarding tasks and naming. Also, the perspective on what can be expected of certain roles is somewhat subjective, culturally shaped, and use case-specific. Still, some characteristics are widespread and might help us develop a more elaborate perspective on the collaborative structure of typical cloud projects.

Cloud Solution Architect / Enterprise Architect

A cloud solution architect is a professional responsible for designing an organization's cloud computing concept. A solution architect typically has a wide range of responsibilities, including assessing business needs and designing cloud architectures on a high level. They have a broad knowledge of individual cloud services' concepts and general functions and can combine them to fulfill the required business needs. Also known as an Enterprise Architect, the result of their work is to describe the What and the Why. They do not go into details but rather describe the properties and needs to be achieved with the architecture.
Their work result can be a text or a diagram illustrating the data flow and dependencies between cloud components and services.

For example, a Cloud Solution Architect might be assigned the task of designing the architecture for a system with the following requirements:

The system needs to be shared across two Availability Zones (AZs)
The primary data must be mirrored in both AZs
The preliminary test dataset will be held in only one of the AZs
An internet connection is required in both AZs
Server and storage for the front end can be placed in one AZ
Server and storage for the back end can be placed in one AZ, different from the front end
...

The Cloud Solution Architect considers these requirements and creates a solution that fulfills as many of these constraints as possible.

Cloud Architect

The responsibilities of a Cloud Architect are similar to those of a Cloud Solution Architect, such as assessing business needs, designing cloud architectures, implementing cloud solutions, and managing cloud resources. As a matter of fact, in many projects and organizations, these are not considered two separate roles. However, while a Cloud Solution Architect is more focused on the system's design, Cloud Architects are more engaged in the implementation. They usually have hands-on experience with one or two cloud providers, container technologies (e.g., Kubernetes, Docker), and command script languages (e.g., PowerShell, Bash). Their architectures are very detailed and describe an exact implementation of the high-level design specified by a Solution Architect. Regardless of the size of the project or the company, a partial or complete overlap with the Solution Architect role is possible.
For example, a Cloud Architect might be assigned the task of implementing an architecture for a system with the following requirements:

The system will be placed in AZ1 and AZ2 in the region "North Europe"
The used database is a Cosmos DB
◦ It needs to be placed in VPC 2
◦ Its internal IP is 127.0.0.1
◦ High availability and disaster recovery are activated
◦ ...
Frontend
◦ In AZ1, VPC 9
◦ Server size: 4 cores / 8 GB RAM / 2x 100 GB standard storage
◦ External IP 127.0.0.2
◦ ...

HINT
This is only an illustrative example, and the reader is not required to fully understand the mentioned specifications.

Cloud Engineer / System Administrator

Cloud Engineers are responsible for building, operating, and maintaining an organization's cloud computing environment. They typically have a deep knowledge of cloud services and excellent hands-on experience with infrastructure on an operating system level (e.g., Windows, Linux). Their responsibilities include maintaining services and servers, executing changes and applying updates, and automating services for reliable operational processes and tasks. In many cases, Cloud Engineers are also the operational backbone of a system and operate service desks or support.

Cloud Developer

Cloud Developers do not necessarily have anything to do with the cloud at first sight. In the strict sense, a developer uses the provided services to create applications. The focus and responsibility are not on the underlying infrastructure but on developing and programming applications. In the cloud context, the developer is to be understood as someone who develops applications or services that use the APIs of the services operated in the cloud. To do this, it takes an in-depth knowledge of the cloud providers' command sets for the provided services.
Cloud Consultant / Cloud Strategist

A cloud consultant is an individual who provides expert advice and guidance to organizations on how best to use cloud computing technologies to achieve their goals. A Cloud Consultant is typically well-versed in various cloud platforms and services and profoundly understands how these technologies can support business objectives. Even if there are clear overlaps with the Solution Architect, the Cloud Consultant is more oriented towards the business point of view. They usually define concepts for new processes, build roadmaps to determine how new services can be used for the client's benefit, help define a cloud strategy, or support clients in decision-making.

1.2 Cloud Service Models

With the use of public clouds, the way required resources are provided is also changing towards a service delivery model, better known as "…-as-a-Service" (...aaS). As-a-service is a cloud service model that cloud providers use to make a specific application or service available to customers on a subscription basis. This type of service is typically delivered over the internet, making it accessible from anywhere in the world. The as-a-service model has become increasingly popular in recent years, offering several advantages over traditional software models. Perhaps the most considerable advantage is that it allows customers to pay only for consumed services instead of making a significant upfront investment in software licenses.

Another advantage of as-a-service is that it is much easier to scale up or down (or in and out) as needed, since no software needs to be installed or maintained. This makes it ideal for organizations with fluctuating or unpredictable needs. Finally, as-a-service providers typically offer a high level of service and support, which can be important for businesses that do not have in-house IT expertise.
Scale up / down: Scaling up/down describes adding/removing computing resources to/from a (virtual) machine.

Scale in / out: Scaling in/out describes adding/removing computing resources to/from a cluster of machines by adding/removing machines to/from the cluster.

Here are a few things to remember when considering an as-a-service solution. First, it is essential to define the needed level of individual responsibility. Based on needs and skills, a higher- or lower-level service can be chosen. Second, we must ensure that the services are configured according to the organization's own and externally required security guidelines. We should read the terms of the service carefully to understand what we are getting and the provider's obligations.

For a better understanding and overview of the as-a-services and their responsibilities and differences, the "Pizza-as-a-Service" analogy is very common.

Figure 2: Pizza-as-a-Service
Source: Max Guhl, 2023, based on Kerrison, 2017

On-premises refers to a situation where we do everything ourselves, like making a pizza from scratch. In the cloud computing context, this would be owning our own data center (or server) (Kerrison, 2017).

Bare-metal as a Service (BMaaS) – It's like preparing a pizza at a campsite. Someone else provides only the electricity for us to use with our own equipment. Cloud services using this model can usually be identified as bare-metal or dedicated hosts.

Infrastructure as a Service (IaaS) – Comparable with sharing a kitchen: gas, electricity, and an oven are provided, but we have to make the pizza ourselves. Some cloud service examples using this provisioning model are EC2 / Compute Engine / VMs, or Virtual Private Cloud (VPC) (Kerrison, 2017).

Containers as a Service (CaaS) – We bring our own pizza to a pizza bakery, and they bake it for us. This includes cloud services such as Azure Kubernetes Service (Kerrison, 2017).

Platform as a Service (PaaS) – We order a pizza that the restaurant prepares in its facilities.
Typical cloud services using this model are Google App Engine and AWS Elastic Beanstalk (Kerrison, 2017).

Function as a Service (FaaS / Serverless) – We and our friends go to a pizza place, then order and eat pizza made by the restaurant. In addition, the restaurant takes care that our drinks are constantly refilled. Popular services using this provisioning model include AWS Lambda and Azure Functions (Kerrison, 2017).

Software as a Service (SaaS) – We are invited to someone's house for a celebration. Pizza and drinks are provided, and our friends are also invited to meet with us. Conversations with other guests are still our job. Typical services using this model are Salesforce, Office 365, Gmail, and solutions provided via cloud providers' marketplaces (Kerrison, 2017).

In the following, we discuss four of the most popular provisioning models in more detail: IaaS, PaaS, SaaS, and Serverless. The provided examples should help us better understand the respective service model. However, the examples are not always 100% identical across providers.

Infrastructure-as-a-Service (IaaS)

Infrastructure as a Service (IaaS) is the basic service of cloud computing. It provides virtualized computing resources that are accessible through the internet. Alongside Platform as a Service (PaaS) and Software as a Service (SaaS), it is one of the three primary services in cloud computing. IaaS providers allow customers to rent access to computing resources as needed. This gives organizations the flexibility to scale with their business demands without making a significant upfront investment in hardware. IaaS providers typically offer essential services, including storage, networking, and computing power. This allows businesses to run their applications and store their data in the cloud without managing any on-premises infrastructure, e.g., electricity, hardware maintenance, and facility management.
IaaS is a popular option for businesses that want to migrate to the cloud, as it can be used to replace on-premises systems.

The disadvantage of these services is the high administration cost of the IT components. Extensive expertise and time are required to build and run a safe and reliable environment. In addition, the cloud's price advantages (e.g., pay-as-you-go) and functional advantages (e.g., autoscaling, cloud-native) are only used minimally. In other words, the cloud is used, but not to its full potential.

Example Services

AWS EC2 / Azure Virtual Machine / GCP Compute Engine
◦ These cloud computing services provide users with on-demand access to virtual machines, which they can use to run applications and services. Various instance types are provided, which differ in CPU, memory, storage, GPU, and networking capacity. The services also allow users to select the operating system and software they wish to run on their instances.

AWS EC2: EC2 is an acronym for Elastic Compute Cloud. It should not be confused with AWS ECS, the Elastic Container Service.

AWS & GCP Virtual Private Cloud / Azure Virtual Network
◦ These are examples of a connection between IaaS and PaaS, discussed in the following paragraph. A VPC is a private network that we can launch within the cloud. It is logically isolated from other virtual networks in the cloud. We can launch our instances into a VPC and control the network traffic between them. With a VPC, we can define a network topology that resembles a traditional data center. We can also extend our VPC by connecting it to an on-premises network using a VPN or Direct Connect connection.

Platform-as-a-Service (PaaS)

PaaS, or Platform-as-a-Service, is a cloud service model that provides users with features and functions as a platform to build, run, and manage applications.
PaaS provides a ready-to-use environment in the cloud, allowing developers to concentrate on developing their applications rather than managing the infrastructure. PaaS providers offer a variety of services, such as storage (e.g., object storage), databases (e.g., SQL or non-SQL), messaging (for machine-to-machine or machine-to-human communication), and application development frameworks (e.g., mobile app development).

The disadvantage of these services is the higher risk of vendor lock-in and the associated hurdle of not being able to switch providers without great effort due to proprietary solutions. In addition, the terms of delivery by the cloud provider and the configuration must be considered to avoid compliance, security, or cost risks.

Vendor lock-in: Vendor lock-in occurs when providers offer unique solutions, functions, or licenses intended to bind customers to them. This makes it much more challenging to switch to another provider on purpose.

Example Services

GCP App Engine / AWS Elastic Beanstalk / Azure Web Apps (App Service)
◦ These PaaS solutions are designed to enable developers to implement web applications quickly. They offer an additional abstraction layer on the infrastructure, providing pre-installed and managed bundles of operating systems and development platforms, such as Python or Ruby. The main advantages are the ease of use and quick startup, enabling developers to focus on implementing the web app instead of managing the underlying infrastructure and setting up operating systems and software requirements.

AWS DynamoDB / Azure Cosmos DB
◦ In addition to services for app development, some database services are sometimes classified as PaaS. These solutions are built to reduce performance losses with large data amounts and offer high availability through replication into other regions. They are fully managed cloud databases and ready to use after an initial, straightforward configuration. Often, these solutions constitute multi-model databases, offering SQL and NoSQL APIs.

AWS IoT Core / Azure IoT Hub / GCP Cloud IoT Core
◦ Sometimes, specialized platforms are also considered PaaS, such as the IoT solutions offered by cloud providers. These IoT services are fully managed services for securely connecting and managing IoT devices at scale. They provide a managed platform for registering, connecting, monitoring, managing, and controlling IoT devices such as sensors (e.g., light, pressure, temperature, humidity, movement), cameras, or lights.

Note that these examples are supposed to broaden the understanding of what the term PaaS means, but a classification like this is somewhat subjective and use case-specific. There may also be a discrepancy between the definitions used by customers and providers, which may not be surprising given the resulting responsibilities. In addition, a classification like this might not be as relevant as selecting the appropriate service for a particular use case. Be that as it may, we should realize the difference in responsibility and offered convenience levels between the different cloud service models.

Software-as-a-Service (SaaS)

Software-as-a-Service (SaaS) describes cloud-based software provided over the internet, allowing users to access and use the software remotely. It is a newer software delivery model in which users can use the software on all their local devices without needing to install it locally. The main benefit of SaaS is that it is much more convenient and flexible than traditional software. The only responsibility of the user is to configure the software to meet particular requirements. Apart from the fact that it can be used from any internet-enabled device anywhere, we can usually choose from various subscription plans. The software can be purchased for a certain period or number of users; this allows a cost advantage for needs-based use and no upfront costs.
Despite the many benefits of Software-as-a-Service, there are also some challenges. One challenge is that users depend entirely on the internet connection to use the software. If there is no internet connection, the software will work slowly or not at all. Another challenge is that users cannot always customize the software to their specific needs. While most SaaS solutions offer a variety of subscription plans, users may be unable to find one that meets their particular needs. Despite the challenges, SaaS is a convenient and affordable option for many users. Also, as updates are not the end user's responsibility, security gaps can be closed quickly, which increases security. It is a flexible way to use software without having to install or maintain it, and it is usually much cheaper than traditional software. Some argue that these models are inherently cheaper. However, this needs to be investigated, since fixed terms and high individual license costs can become a cost trap.

Example Services

Microsoft Office 365 / Google Workspace
◦ These cloud-based productivity suites include office tools, email, calendaring, and drive storage. Businesses can improve communication and collaboration while accessing tools to increase productivity. In addition, some of the services can be used as a registration option for SaaS offers from other providers.

Salesforce / SAP S/4HANA Cloud
◦ Salesforce Customer Relationship Management (CRM) and SAP ERP are cloud-based platforms that allow businesses to manage their customer data, sales, marketing efforts, and resource planning. Both provide AI-supported features such as chatbots or analytics to provide further insights into the collected data. These features place unique demands on the hardware (e.g., fast CPU and GPU), so using them as a service can be advantageous.

Customer relationship management (CRM): CRMs are systems for managing customer data and interactions. They can track customer behavior, sales, and marketing campaigns.
AWS & Azure Marketplace
◦ These marketplaces are not SaaS offers themselves. However, like app stores, they allow third-party providers to make their SaaS offerings available to cloud users. The marketplace or service providers handle the payment and offer various subscription plans (e.g., pay as you go, per instance, per core). The range of services goes from customized operating system images to VPN solutions and unique AI solutions.

Enterprise Resource Planning (ERP): ERPs are software systems that help businesses manage their core operations, such as accounting, human resources, and customer relationship management.

Serverless

Serverless is a way to build and run applications without managing or provisioning servers. With serverless, we can concentrate on writing code that runs in the cloud without provisioning or managing servers. Our code is packaged into "functions" and then deployed to a cloud provider, where it runs in response to events. To fulfill the resources requested by our functions, the cloud provider's serverless service takes care of providing the right resources. Serverless is a relatively new concept often used in combination with other cloud services. Examples include AWS Lambda, GCP Cloud Functions, and Azure Functions.

A significant advantage of this service is the payment model. Since no infrastructure has to run permanently and functions only have to be paid for their execution time, there can be a significant cost advantage for the right application areas. Unfortunately, like everything else, this service also has some disadvantages. In serverless architectures, we have limited control over the underlying infrastructure, and changes to the system, e.g., the operating system, are not possible. Also, it is recommended to host serverlessly only small pieces of code and self-contained functions that do not require computationally heavy operations.
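The event-driven model described above can be sketched as a minimal function handler. The sketch follows AWS Lambda's Python handler convention (`handler(event, context)`); the event shape, the card IDs, and the in-memory member table are hypothetical stand-ins for a real managed database:

```python
# Minimal sketch of an event-driven serverless function, following AWS
# Lambda's Python handler convention. The in-memory "database" below is a
# hypothetical stand-in; a deployed function would query a managed
# database service instead.

MEMBERS = {
    "card-001": {"name": "Alex", "authorized": True},
    "card-002": {"name": "Kim", "authorized": False},
}

def handler(event, context=None):
    """Triggered when a member card is read, e.g., at a store entrance."""
    card_id = event.get("card_id")
    member = MEMBERS.get(card_id)
    if member is None or not member["authorized"]:
        return {"statusCode": 403, "message": "Access denied"}
    return {"statusCode": 200, "message": f"Welcome, {member['name']}!"}

# Local invocation with a fake event; no server is provisioned or managed.
print(handler({"card_id": "card-001"})["message"])  # → Welcome, Alex!
```

The function is self-contained and stateless: the platform decides where and when it runs, which is precisely why only small, self-contained pieces of logic are a good fit.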
A possible application for serverless could be a personalized welcome message for a client entering a store. When the member card is read upon entering, a sequence is initiated. This could trigger a database read, which checks the access authorization and displays a personalized welcome message with the client's name and picture on the front end. This scenario could be implemented within minutes using serverless services by uploading a simple piece of code to the service, e.g., in Python.

Example Services

AWS Lambda / Azure Functions / GCP Cloud Functions
◦ These examples of serverless platforms enable users to run code without provisioning or managing servers. They can store and run code for virtually any application or backend service. All that is required is a basic understanding of event-driven programming.

1.3 Risks and Benefits

So far, we have learned much about cloud computing and service models. In the following section, we will examine the main advantages and disadvantages of using the cloud. Supported by examples, this section should help us state our cloud requirements correctly and ask the right questions when choosing the right solution.

Advantages

Cost reduction

Statements such as "cloud computing is cheaper" are prevalent in practice. These statements are valid if we know how to use the cloud appropriately. Fluctuations in capacity utilization are always to be expected when using applications daily. Depending on the application, the usage of the resources can vary between day and night, weekdays and weekends, public holidays, and year-end closing. A constant system load of 100% is realistic only in rare cases (e.g., with high-performance computing). This variability cannot be accounted for with self-purchased and operated hardware: for the dimensioning or calculation of self-procured hardware, the maximum expected workload is used - in contrast to the cloud.
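The dimensioning argument can be made concrete with a small comparison of hardware sized for peak load versus usage-based billing. All prices and the demand curve below are invented for illustration:

```python
# Hypothetical comparison: hardware sized for peak load vs. pay-per-use.
# All unit prices and the demand curve are invented for illustration only.

HOURS_PER_MONTH = 730

def on_prem_monthly_cost(peak_units, cost_per_unit_month=100.0):
    """On-premises: we pay for peak capacity around the clock."""
    return peak_units * cost_per_unit_month

def cloud_monthly_cost(hourly_demand_units, price_per_unit_hour=0.20):
    """Pay-as-you-go: we pay only for the capacity used in each hour."""
    return sum(hourly_demand_units) * price_per_unit_hour

# A workload needing 10 units during 8 business hours and 2 units otherwise.
demand = [10 if h % 24 in range(9, 17) else 2 for h in range(HOURS_PER_MONTH)]

print(on_prem_monthly_cost(peak_units=10))   # sized for the peak
print(round(cloud_monthly_cost(demand), 2))  # billed per consumed unit-hour
```

Under these invented numbers, peak-sized hardware costs 1,000 per month while usage-based billing comes to roughly 678; with a flatter, near-constant load, the comparison can easily tip the other way.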
Custom-owned high-end hardware is not fully utilized most of the time, and the initial investment is high. In addition, there are costs for operating the hardware, such as electricity, cooling, maintenance, and specialist staff to ensure the desired availability. When using the cloud, these expenses are not eliminated but included in a unit price (e.g., per server, per GB of storage), which the provider can distribute across all platform customers. This way, the unit price can be considerably reduced compared to own operations. Furthermore, starting with a minimal configuration, it is possible to carry out first tests and expand the resources as required. The concept is taken further in the public cloud. There, it is possible to adapt the computing resources precisely to the business requirements, and we don't have to pay for unnecessary infrastructure. This is rounded off by the consumption-based payment philosophy in the public cloud, also known as pay-as-you-go (PAYG).

In this model, the cloud providers divide services into small units, which are then priced by data volume or time. Hourly prices are often given for computing and PaaS resources. Storage is billed per GB for stored or transferred data. In both cases, we only have to pay at the end of the month for what we have used.

Another infrastructure-based payment model is reservations/savings plans/commitments, where the terms vary between providers. With these models, we purchase obligations for a specific resource and, in return, receive a discount of 40-60% (depending on the region and resource) on the regular price. In addition, we can obtain discounts for not executing a task instantaneously but when resources run idle.

A further cost advantage arises through the economies of scale in hardware procurement. Cloud providers' data centers are standardized worldwide, from network components to racks and servers.
This uniform configuration allows providers to increase their purchase volumes. Instead of hundreds or thousands, they can order tens of thousands of servers in the same configuration and receive higher discounts than smaller companies or individuals could. This price advantage is also included in the calculation of unit prices.

Scalability

With local computing, on-premises, and partly private cloud, scaling is limited or requires long-term extensions. Scaling these setups is therefore strategic and can react little or not at all to ad-hoc events.

Figure 3: Business demand vs. available on-premises infrastructure
Source: Max Guhl, 2023

We can scale computing resources in the public cloud to align with business requirements. This ensures that costs are only generated to provide business value, and no capital is wasted on having idle resources available. Another significant advantage is that we can scale beyond our planning. As seen in the graph, in months 4 and 9, the demand is above the available resources, corresponding to lost potential or a technical failure. Neither of the two is an issue when using the public cloud, since "unlimited" scaling is possible here.

Unlimited scaling: The public cloud also has scaling limits due to technical limitations or quality assurance. However, these are far above everyday needs. Therefore, almost unlimited scaling can be assumed.

Flexibility

Since there are no purchase obligations per se with the public cloud, resources can be "returned" at any time. As quickly as resources can be made available, they can also be deleted again. Adjustments can also be made at any time without a planning horizon. If a server is too big, its resources can be reduced, or we lower the number of servers. If a database is required in another region, we can move it or create a new one quickly. If storage is scarce, it is automatically expanded.
These scenarios are often termed "elasticity" and can only be implemented with great effort, or not at all, with local or on-premises resources.

Easy access

A credit card is all it takes to create an account on a public cloud and make computing resources and the newest technology available. The fact that the cloud is accessible from anywhere worldwide makes it possible for almost any interested developer or organization to use these resources. Many technologies are available once registered in the cloud, which would otherwise only be available with high investments. Be it simplifications through the ...aaS offers, GPU-supported computation (e.g., Nvidia A100), ARM-based CPUs (e.g., Graviton at AWS, Azure Ampere Altra), or, more recently, quantum computing resources and platforms (e.g., AWS Braket, Azure Quantum, GCP Cirq).

Reliability

From the user's point of view, IT services should be available 24/7. For this, a great deal of effort is required on the IT side on various levels, such as data center structures with power and cooling, network, hardware, and software. With an availability of 99.9%, the annual downtime is only 8 hours and 46 minutes. With an increase in availability to 99.999%, the yearly downtime is reduced to 5 minutes and 15 seconds (Moozakis, 2021). These values can only be realized with high investments in redundant and separate cooling circuits, power supplies, and network connections, as provided by public clouds.

A further advantage results from the case of non-compliance with availability by the cloud provider. A refund must be paid if a cloud provider cannot deliver the contractually agreed availability, often called a Service Level Agreement (SLA). The reimbursement amount varies between the providers and depends on the shortfall in availability. For a 99.99% underrun, AWS, Azure, and GCP reimburse 10% of the monthly consumption of the disrupted service.
For a 99% underrun, Azure and GCP reimburse 25%, and AWS 30%, of the monthly consumption of the disrupted service. From a shortfall of 95%, all providers reimburse 100% of the monthly consumption of the respective service.

Physical Safety

Compared to local computing, simple physical access is nearly impossible in the cloud. Cloud offerings can typically surpass the protections against unauthorized access provided by on-premises and local computing. Cloud providers' data centers are divided into protection zones. The closer a person gets to the operated infrastructure and data, the more critical the controls are. For example, entering a data center can only happen with prior registration and a corresponding ID document. Accompanied by a camera system, access to the building occurs exclusively with biometric data registration (e.g., retina scan, palm vein scan) and a so-called isolation system, which prevents unnoticed access in groups. Access to the actual server rooms is only granted to a small group of qualified employees. Due to the size of the cloud providers' data centers, they can spread the high costs over many customers and services. Running our own services with comparable security levels is mostly unprofitable.

At the infrastructure level, the providers offer ways to create high availability and disaster recovery across multiple locations. AWS, for example, operates over 20 regions worldwide (e.g., Northern Virginia, Ireland, Singapore) with over 80 Availability Zones (AZs), with at least 2 AZs per region. Each AZ comprises many data centers and, thus, physically independent buildings.

Figure 4: Cloud provider data center organization hierarchy
Source: Max Guhl, 2023

Further logical separations of environments can be implemented on a virtual level. Among other things, Virtual Private Clouds (VPC) and automated Disaster Recovery (DR) functions can be used.
A VPC can extend over several physical data centers or even AZs, and using DR functionalities often allows for automatic recovery from failures based on redundant copies in other data centers or AZs. For example, Cosmos DB can be configured to be distributed over several regions. However, we should consider regional legal regulations regarding data processing and storage.

Certificates

All cloud providers hold relevant certificates that help us evaluate the extent to which standards are fulfilled internally. These standardized certifications make choosing the right cloud provider for our project easier. For example, emergency plans, training procedures, cloud architectures, the type of data storage, and data encryption are evaluated to obtain these certifications. If a provider fulfills the criteria of a certificate, customers do not need to carry out an additional test at their own expense. In addition, a basis of trust is created, accelerating the use of the cloud. Later, we will continue with this topic and discuss risks regarding cloud provider certifications.

Risks

As with all things, cloud computing also has its drawbacks. In the following, we will discuss some of the risks associated with the cloud.

Expenses

We saw cost savings as a significant benefit. Nevertheless, using the cloud can also generate unnecessary costs: on average, about 32% of cloud spending is wasted (Adler, 2022). Services are not always cheaper per se compared to non-cloud solutions and can lead to unexpected costs if not used correctly. For example, if server systems are moved 1-to-1 from the data center to the cloud without adapted configurations, this is often expensive. This is partly because the servers are usually scaled too large and are not fully utilized in the cloud. However, the resources that are not required must also be paid for. Another reason for higher costs is easy access to high-performance resources.
The cost difference between commodity and high-end hardware can be significant and can have fatal business consequences should these resources be used unintentionally.

Another factor is costs that are not immediately apparent. For example, AWS charges for data sent between regions and AZs, as shown in the figure. An expensive option could be, for example, data transfer between two EC2 instances in two different regions. While this is not a problem with moderate use, it can become a cost risk if there is heavy data traffic between the two locations. However, this traffic can be free for some provided services. Such background data traffic is required, for example, to ensure redundancies between data centers if automated disaster recovery is selected. These simple examples demonstrate the necessity of ensuring that the architectural design is as cost-efficient as possible.

Figure 5: VPC peering across Regions
Source: Max Guhl, 2023, based on Pal et al., 2021

Another cost risk is the reservation/commitment payment model described above. These models work like pre-paid phone cards: a fixed amount is paid in advance for using a specific server type for a particular time, e.g., one month. With improper configuration, we might obtain a server running for 730 hours straight or 730 servers running for one hour, regardless of whether we use the provided computational power.

We must consider these risks, but cloud providers also give us tools to mitigate these issues. For example, cost calculators can help us estimate the expenses for an envisaged system before we implement it (e.g., Amazon Web Services, n.d.-1b). Other examples include monitoring capabilities, automated alerts, and service shutdowns once a predefined budget has been spent.

Security

What applies to costs also applies to security: resources are not insecure in the cloud per se, but we have to know what we are doing and to whom to entrust our data.
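Before going deeper into security, the cost risks above, inter-region traffic and oversized reservations, come down to simple arithmetic. A back-of-the-envelope sketch, where all prices are hypothetical placeholders rather than quoted provider rates:

```python
def transfer_cost(gb_per_month: float, price_per_gb: float = 0.02) -> float:
    """Monthly cost of traffic between two regions at a flat per-GB rate.

    The default of $0.02/GB is a made-up placeholder; actual inter-region
    rates depend on the provider, source, and destination.
    """
    return gb_per_month * price_per_gb

def reservation_waste(prepaid_hours: float, hourly_rate: float,
                      used_hours: float) -> float:
    """Money spent on prepaid-but-idle compute, like an unused
    pre-paid phone card."""
    return max(prepaid_hours - used_hours, 0.0) * hourly_rate

# Moderate vs. heavy replication traffic between two EC2 instances:
print(transfer_cost(50))         # 1.0    -> negligible
print(transfer_cost(100_000))    # 2000.0 -> a real cost risk

# A 730-hour reservation of which only 200 hours were actually used:
print(reservation_waste(730, 0.10, 200))
```

Such quick estimates are exactly what the providers' cost calculators automate at a much finer granularity.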
As the cloud expands our IT landscape, more entry points to our systems must be protected. Whether an additional firewall service is installed to protect the connection to the internet, a content delivery network (CDN) is used, a role-based authorization system with two-factor authentication is introduced, or insecure ports are blocked for remote maintenance: the cloud providers help with suitable default settings and best practices, but (depending on the delivery model) we, as the customer, are fully responsible for the correct execution.

The selected cloud provider should also be trustworthy. Since our organization's data are processed on the provider's hardware, the provider should match our compliance and regulatory requirements.

Content delivery network: A Content Delivery Network (CDN) is a company-run global network with regional hubs that store a copy of websites. This means that access can always be gained from nearby hubs. CDNs are also used to ward off DDoS and other attacks. Typical providers are Cloudflare, Rackspace, Google Cloud CDN, and StackPath.

Although this aspect should always be considered when operating in the cloud, it is also worth noting that cloud providers have highly specialized teams in place, putting a considerable workforce into securing services. The knowledge and resources spent on this purpose usually far exceed what is affordable by smaller organizations installing local or on-premises solutions.

Shared responsibility

The public cloud is a shared responsibility platform, meaning that both the cloud provider and the customer are responsible for certain aspects of the system. For example, a provider might take care of security and infrastructure maintenance while we are responsible for the security of our data, applications, and access management. This shared responsibility model helps the cloud provider ensure a secure and reliable platform and clearly defines what we must take care of.
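The split of duties can be sketched as a simple lookup table. The mapping below is an illustrative simplification, not an official provider matrix, and the layer names are our own:

```python
# Illustrative (non-official) sketch of the shared responsibility model.
RESPONSIBILITY = {
    "IaaS": {"physical infrastructure": "provider", "os patching": "customer",
             "applications": "customer", "data & access": "customer"},
    "PaaS": {"physical infrastructure": "provider", "os patching": "provider",
             "applications": "customer", "data & access": "customer"},
    "SaaS": {"physical infrastructure": "provider", "os patching": "provider",
             "applications": "provider", "data & access": "customer"},
}

def customer_duties(model: str) -> list[str]:
    """Layers the customer must secure under a given delivery model."""
    return [layer for layer, who in RESPONSIBILITY[model].items()
            if who == "customer"]

print(customer_duties("IaaS"))  # ['os patching', 'applications', 'data & access']
print(customer_duties("SaaS"))  # ['data & access']
```

Note that data and access management stay with the customer in every model, which is exactly the point the providers stress in their shared responsibility documentation.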
Azure, for example, provides customers with an overview and expressly points out that data, endpoints, accounts, and access management are always the customer's responsibility without exception (e.g., Lanfear et al., 2023).

What looks comfortable at first glance becomes complex when it comes to compliance with official and industry regulations. As customers who might also provide services ourselves, we must be able to provide information at all times in case of failures or data breaches to avert or contain the danger. We do not know how the providers' services function internally, what their downtimes are, or which technical solutions are used, and this can lead to serious business issues. Therefore, it is recommended that the business and delivery conditions, the operating model, and the software used be considered for particular use cases.

Certifications and transparency

Cloud providers invest considerable time and money in obtaining external certification to underpin the shared responsibility model. However, the large number of certificates can be a risk. We must rely on our expertise to evaluate which certificates meet our safety and quality requirements. The expert should be able to ensure that the provider has the correct certificates. Some providers also list confirmation letters from authorities or states to verify their certificates. However, sometimes these only confirm that the providers are subject to specific legislation, such as EU law for European data centers.

If we only look at the official certificates, we quickly realize that not every provider has every certificate that may be required. This can lead to additional costs for a technical solution or recertification of our services. If the selected provider has the required certificates, it is also essential to ensure they are valid in the regions where we plan to work. Furthermore, the services used must also be examined more closely; here, parts or entire services may not be subject to certification.
Due to the many certificates, specialist expertise must be obtained for the legal or audit departments. Certifications can have regional, industry-specific, or international validity. The following certificates are examples of the variants and do not represent a complete overview of the existing certificates.

Regional certification: “The Information System Security Management and Assessment Program (ISMAP) is a cloud services assessment program administered by the Japanese government. The program was officially announced on 26 May 2020, and it was designed to ensure appropriate security in government cloud services procurement by evaluating and registering cloud services that meet the Japanese government security requirements. Cloud service providers intending to participate in public sector procurement programs can apply for ISMAP certification administered by an independent third-party auditing firm approved by ISMAP. Japanese government agencies can procure cloud services from cloud service providers registered with ISMAP instead of conducting their assessments.” (Johnson et al., 2021)

International certification: The ISO/IEC 27001 “specifies the requirements for establishing, implementing, maintaining, and continually improving an information security management system within the organization's context. It also includes requirements for assessing and treating information security risks tailored to the organization's needs. The [...] requirements are generic and intended to apply to all organizations, regardless of type, size, or nature.” (ISO/IEC 27001) It is published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

Certified cloud providers can be checked via a webpage operated by the German Federal Ministry of Economics and Climate Protection, where we can verify that a provider appears on the official certification list.
Industry Certification: “The Swiss Financial Market Supervisory Authority (FINMA) is Switzerland’s independent financial markets regulator. Its mandate is to supervise banks, insurance companies, financial institutions, collective investment schemes, and their asset managers and fund management companies. It also regulates insurance intermediaries. It is charged with protecting creditors, investors, and policyholders. FINMA is responsible for ensuring that Switzerland’s financial markets function effectively.” (Swiss Financial Market Supervisory Authority, n.d.)

Regarding content, these certifications also deal with the responsibility for operating computing resources and the data processed and stored on them. For example, BaFin, the German Federal Financial Supervisory Authority, requires a financial service provider to be able to tell where customer data is currently located and processed at all times. In addition, it is necessary that the financial service provider has complete control over virtual and physical computing resources at any time and prevents access by third parties. These scenarios can only be implemented in close collaboration with the cloud providers.

Data sovereignty

Data sovereignty means that the collected data are subject to the law of the country in which they were collected. What poses little risk when collecting and processing data locally can lead to significant legal challenges in the cloud. Since the interpretations and specifications differ in each country, it can be challenging to do justice to all relevant legislation, which is very much the case when using the public cloud globally.

One of the laws for processing personal data in Europe is the General Data Protection Regulation (GDPR). The GDPR is among the world's strictest data privacy and security laws.
It was drafted and passed by the European Union (EU), and it imposes obligations on companies and organizations anywhere, so long as they process or collect data related to people (e.g., name, gender, phone number, address, political attitude, credit card, health data, …) in the EU. The regulation was put into effect on May 25, 2018 (Wolford, 2023). A violation of the requirements can be fined “up to 20 million euros, or in the case of an undertaking, up to 4 % of their total global turnover of the preceding fiscal year, whichever is higher” (Intersoft Consulting, 2021). Especially in the public cloud environment, the GDPR has gained importance in Europe since services are mainly offered by non-European providers.

As already described, the GDPR is the presiding standard in Europe. A counterpart to this is the frequently discussed Cloud Act: “The United States enacted the Clarifying Lawful Overseas Use of Data (Cloud) Act in March 2018.” (The United States Department of Justice, 2023) The bill allows the United States government to request data from U.S. companies based in other countries. It also allows companies to share data with the government if they believe it is necessary for national security. The GDPR and the Cloud Act are two very different pieces of legislation: the GDPR is intended to protect a single person and their data, while the Cloud Act requests data from companies about individuals.

The Cloud Act explicitly allows warrants to be issued requiring the handover of data without a Mutual Legal Assistance Treaty (MLAT), but the European Data Protection Board has stated that this is unacceptable under EU law. From a GDPR compliance perspective, warrants issued under the Cloud Act are only reasonable if they are based on the EU-US MLAT. Under the GDPR, disclosure of personal data is not permitted without informing the data subject, unlike under the Cloud Act, where this is possible without the data subjects' consent in particular cases (Foitzick, 2020; Sepp, 2020).

EU-US MLAT: The EU-US MLAT is a framework agreement between law enforcement agencies regarding data exchange. It is intended to help ensure that personal data is protected and that criminal or terrorist offenses can be prosecuted.

As the GDPR and the Cloud Act show, there are two sometimes contradictory approaches to data protection. It is up to the cloud users to configure architectures that adhere to one or the other philosophy. It may also be advisable to seek legal advice in these cases. However, there are also some functions and measures that we can check. To meet regional regulatory requirements, many cloud providers have established different ways of providing their customers with compliant cloud computing services:

Hosting services in particular regions: As we learned in the “Physical Safety” section, countries and legal areas are delimited by regions. The cloud provider ensures technically and contractually that storage and processing only take place in the regions the users select for this purpose. If not configured by the user, the cloud providers will not move the data to another region. Even in the event of a failure or to ensure high availability, only resources within one region are used.

Regional access control: Many cloud services can be used globally, but access might be geographically restricted or adapted. For example, the Active Directory service by Azure can manage access authorization to applications across regions. Furthermore, this service can set policies regulating where, when, and which data are stored or moved, or whether use can be restricted if necessary.

Further examples demonstrate cloud providers' attempts to offer regionally adapted services: Microsoft Azure Germany was a cloud platform physically isolated from the rest of Microsoft Azure and compliant with services critical to German data privacy regulations.
Azure Germany was managed by T-Systems International, an independent German company and a subsidiary of Deutsche Telekom, which provided the essential cloud services (Leibiger, 2021). These services spanned Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). The core components covered infrastructure, network, storage, data management, identity management, and other services (Microsoft Azure, 2020). Microsoft closed Azure Germany in October 2021 due to low demand and the new possibilities of using encryption services with custom key management and advanced security policy features. Today, we can use an appropriate GDPR-compliant Azure region instead. This became possible in 2018, when a portal for a Data Subject Request was introduced in the global Microsoft Azure platform and security and policy features could be extended or established. In addition, Azure helps assess our environment with specific tools and provides guidance to run a GDPR-compliant environment in the cloud (Microsoft Azure, 2018).

Google offers the Google Sovereign Cloud with its partners, T-Systems International GmbH in Germany and Thales in France. The challenges of complying with the GDPR are met here with further technical solutions such as Key Access Justifications (KAJ) and Cloud External Key Manager (EKM). “Key Access Justifications works by adding a field to your Cloud EKM requests that allows you to view the reason for each request.” (Google Cloud, 2024-1a) The customer selects the EKM to be used; this can be a regional provider or our own hardware. This solution gives us complete control over when and why data is decrypted.

Like Azure, AWS has adjusted contracts and existing services for GDPR compliance: GuardDuty can be used for threat detection and security monitoring. Macie helps discover and secure personal data. Inspector automates security assessments to ensure that applications conform to security guidance.
Config Rules are designed to help clients handle the GDPR requirements. Furthermore, the contract terms include a clause stating that cooperation with local criminal authorities is based on local law. This puts data for EU applications under the GDPR instead of the US Cloud Act.

In addition to the different approaches to data protection between Europe and the USA, the particular situation regarding cloud services in China must also be considered. Since the Chinese market is becoming increasingly important for many companies, the challenges associated with using cloud services there should also be known.

Under the Cybersecurity Law of the People's Republic of China (CSL), passed by China's National People's Congress Standing Committee and in effect since June 1, 2017 (Wagner, 2017), networks and cyberspace activities in the People's Republic of China (PRC) are governed to safeguard network security, prevent illegal activities, and maintain confidentiality of network data. Critical information infrastructure (CII) operators in the PRC, in particular, are subject to special requirements regarding the procurement of products and services and cross-border data transfer, including the data sovereignty requirement. “Personal information” and “important data” generated or collected by such an operator must be stored within the PRC, and cross-border data transfer is subject to security assessment and government approval. The CSL provides some guidance on how these cybersecurity assessments should be conducted.

Another unique feature when using Chinese cloud regions is the operator of these regions. This has to be a local company, as non-Chinese companies cannot operate these data centers. For example, it is not Microsoft but a local company called 21Vianet that is responsible for operating Azure data centers in China. Integrating other companies into operating services means that the versions in China are often behind those of other areas.
This applies to all non-Chinese cloud providers with a region in China (Microsoft Azure, 2023-1a, 1b, 1c).

SUMMARY

Cloud computing delivers computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the internet (“the cloud”) to offer faster innovation, flexible resources, and economies of scale. It allows users to access technology resources on demand, like electricity from a power grid. Cloud computing enables companies to consume resources as a utility—just like electricity—instead of building and maintaining computing infrastructures in-house.

Several types of services are commonly provided via the cloud, including, for example:

1. Infrastructure as a Service (IaaS): virtualized computing resources, networking.
2. Platform as a Service (PaaS): ready-to-use databases and application servers.
3. Function as a Service (FaaS): users can execute their code on demand without managing the underlying infrastructure.
4. Software as a Service (SaaS): services such as email, customer relationship management, and human resources management.

These services allow organizations of all sizes to access the same powerful technologies that large companies use without the need to invest in expensive hardware, software, and personnel.

Cloud computing offers many benefits, including cost savings, scalability, flexibility, and agility. It allows organizations to pay for only the computing resources they use rather than investing in expensive hardware, software, and personnel. Cloud resources can be easily scaled up or down to meet changing business needs, and new services and tools can be quickly and easily deployed. However, there are risks to consider, such as security, compliance, dependence, limited customization, internet connectivity, and cost. Organizations must