CompTIA® Cloud Essentials+ Study Guide, Second Edition
By Quentin Docter and Cory Fuchs
Copyright © by John Wiley & Sons, Inc.

Chapter 2: Cloud Networking and Storage

The following CompTIA Cloud Essentials+ Exam CLO-002 objectives are covered in this chapter:

✓ 1.2 Identify cloud networking concepts.
    Connectivity types
        Direct connect
        VPN
    Common access types
        RDP
        SSH
        HTTPS
    Software-defined networking (SDN)
    Load balancing
    DNS
    Firewall

✓ 1.3 Identify cloud storage technologies.
    Storage features
        Compression
        Deduplication
        Capacity on demand
    Storage characteristics
        Performance
        Hot vs. cold
    Storage types
        Object storage
        File storage
        Block storage
    Software-defined storage
    Content delivery network

Before the Internet was popular, cloud technology languished on the periphery of computing. This is despite the core cloud technology, virtualization, having been in existence for nearly three decades. While cloud concepts were interesting, there was no practical way to implement them and realize their benefits. The Internet provided the needed environment for cloud usage; as the Internet grew, so too did the number of users who could benefit from cloud services. Clearly, the Internet enabled the cloud.

While this relationship is a positive, there’s a flip side as well. Not only is the cloud Internet-enabled, it’s Internet-dependent. Without Internet access, cloud-based resources are likely to be unavailable. Even private clouds are often accessible via the Internet to allow users in remote locations access to network resources. From a cloud end-user standpoint, probably nothing is more important than having an Internet connection to access needed resources. Because networking and network connections are so important, we’ll spend a lot of time covering those topics in this chapter.

Once connected to the cloud, a majority of users store files for easy sharing, access, and collaboration.
Some storage solutions appear quite simple, such as uploading music to iTunes or keeping a few notes in Google Drive. Other cloud storage solutions are incredibly complex, with infrastructure that used to be hosted only in massive on-site data warehouses. Online apps and compute resources are great, but storage was the catalyst for the early growth of cloud services.

In this chapter, we will finish up our discussion of CompTIA Cloud Essentials+ Domain 1.0 Cloud Concepts by covering two key elements: networking and storage. Within the “Understanding Cloud Networking Concepts” section, we will cover connectivity and access types, as well as services that enable additional functionality. Our coverage of storage in the “Understanding Cloud Storage Technologies” section includes features, characteristics, and types, along with services such as software-defined storage and content delivery networks.

Understanding Cloud Networking Concepts

As you know by now, without networking, the cloud doesn’t exist. Networking allows users to access virtualized servers somewhere on the Internet, enabling their company to save money by not needing to purchase hardware or software that will become obsolete in a few years.

One question that might come to mind is, “How do cloud users access virtualized cloud resources?” In most use cases, it’s pretty simple, but of course there are more complex access methods as well. Before we look at specific methods of cloud access, we’re going to do a quick primer on the underlying technology that enables modern networking in the first place. Then we’ll provide details on five methods used to connect to the cloud. After that, we will finish this section by covering four networking services that give the cloud more functionality.

Networking: A Quick Primer

You probably use computer networks every day.
Maybe you know intricate details of how information gets from point A to point B, but maybe you’ve never given it a second thought; as long as your email works and you can find the latest cat memes, you’re good. Regardless, a basic understanding of how networks work is a good foundation for learning cloud networking concepts. The networking material shared here will most likely not be on the Cloud Essentials+ exam, but it’s really good to know.

First, let’s define a network. Simply enough, a network is two or more computers or devices that can communicate with each other. Those devices need to have software that enables them to talk to each other, called a network client, which is built into every operating system (OS). (Custom software packages are available for specific network communications as well.) From the actual networking side, though, three things are required:

    Network adapter (or network card)
    Transmission method
    Protocol

Practically every computing device today has a built-in network adapter. (It’s probably actually every device, but once we claim that, someone will find an exception to the rule.) You’ll also hear this hardware component referred to as a network interface card (NIC). The NIC’s job is to enable the device to communicate on the network. Each NIC has the ability to communicate using one type of networking. For example, some use cellular, others use Bluetooth or Wi-Fi, and some use wired Ethernet. The differences between them aren’t important for the moment, but what is important is that for two devices to talk to each other, they need NICs that use the same technology. Said differently, a cellular-only device can’t communicate on a Bluetooth network.

The transmission method is how data travels from device to device. In the old days of computing, this meant copper or fiber-optic cables. While wired connections are still used today, fast wireless networking methods (cellular, Wi-Fi, and Bluetooth) are ubiquitous.
The final component is a networking protocol. This is the language that computers speak to facilitate communication. Protocols are analogous to human languages in many ways, in that they provide the rules and standards for common understanding. For two devices to talk to each other, they need to speak the same language. To give a human example, let’s say that you are fluent in English and Japanese. If someone approaches you speaking Japanese, you can respond in kind. You speak the same protocol. But if someone approaches you and tries to speak to you in French, the conversation is pretty much over before it begins. It’s the same with computers.

Throughout the history of computers, several dozen networking protocols have been developed. Most are obsolete today. The only one you really need to know is Transmission Control Protocol/Internet Protocol (TCP/IP). It’s the protocol of the Internet.

Although TCP/IP is often called a protocol, it’s actually a modular suite of protocols working together to enable communication. The two namesake protocols are Transmission Control Protocol (TCP) and Internet Protocol (IP), but there are dozens of others, as shown in Figure 2.1. For communication to happen, each device uses one component from each of the four levels. The good news is that all of this happens automatically, so you don’t need to think about it!

Figure 2.1 TCP/IP protocol suite (DoD model)
    Process/Application: Telnet, RDP, SSH, FTP, DHCP, SMTP, HTTP, HTTPS
    Host-to-Host: TCP, UDP
    Internet: ICMP, ARP, RARP, IP
    Network Access: Ethernet, Fast Ethernet, Gigabit Ethernet, 802.11 (Wi-Fi)

TCP/IP structure is based on a U.S. Department of Defense (DoD) networking model; that’s why in Figure 2.1 we call the model the DoD model. Protocols at each layer within the model are responsible for different tasks. For example, one of the things that IP is responsible for is logical addressing, which is why you might be familiar with the term IP address.
Any device on a TCP/IP network with an IP address, including computers, printers, and router interfaces, is called a host. This is a term you’ll hear again later in this chapter in the “Domain Name System” section.

You might notice that the majority of protocols are at the Process/Application layer. These are the ones that provide the most customized functionality. You might also notice some exam objective acronyms listed there too, such as RDP, SSH, and HTTPS. There’s more to come on each of these in the next section.

Connecting to the Cloud

Depending on how many users need cloud services, connecting to the cloud can take many forms. If it’s just one user, the connection method is usually straightforward and requires little training. In most cases, if a user can open an Internet browser, then that user can connect to cloud resources. If an entire organization needs permanent access to cloud resources, then connectivity gets a bit more complicated. In this section, we will explore five different access or connectivity types. We’ll start by covering single-client methods and then scale up to full, persistent organizational access.

Using Hypertext Transfer Protocol Secure

Connecting a single client to cloud resources is most often done through a web browser and Hypertext Transfer Protocol Secure (HTTPS). We say single client, but it can also be dozens or hundreds of clients, each of them individually connecting to cloud resources from their own devices. A great thing about this access type is that every client device, including desktops, laptops, tablets, and smartphones, has a built-in web browser. Even better, nearly all users know how to use them! Figure 2.2 shows a connection to Google Drive using HTTPS. You can tell that it’s HTTPS by looking in the address bar: the website address starts with https://.
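The address-bar check described above can also be done in code. The sketch below is only an illustration: the function name `uses_https` and the sample URLs are assumptions made for this example, not part of any browser or cloud provider API. It relies on Python's standard `urllib.parse.urlparse` to read the scheme at the front of a web address.

```python
from urllib.parse import urlparse


def uses_https(url: str) -> bool:
    """Return True when the address starts with the https scheme."""
    # urlparse splits the address into parts; .scheme is the text before "://".
    return urlparse(url).scheme.lower() == "https"


# Hypothetical sample addresses, used only to exercise the check.
print(uses_https("https://drive.google.com"))
print(uses_https("http://example.com"))
```

A check like this only reports which protocol an address asks for; it says nothing about whether the server's certificate is valid.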
Figure 2.2 Connecting to Google Drive with HTTPS

HTTPS is a secure protocol in the Process/Application layer in the TCP/IP suite. It’s specifically designed to fetch web pages and encrypt data that is transmitted between a web server and its client. Because HTTPS secures transmissions, you can type usernames, passwords, and other personal information into websites that use it with little fear of that data being intercepted in transit.

The alternative web protocol to HTTPS is Hypertext Transfer Protocol (HTTP). It does not encrypt transmissions; anyone scanning network traffic can read HTTP data just as you are reading these words. If the website address in the browser starts with http://, do not put in any information that you don’t want to be transmitted in plain text without encryption!

HTTPS can be used with any commercial web browser, which is just a specialized piece of software designed to request web pages from web servers. There are several on the market, with the most popular being Google Chrome, Microsoft Edge, Internet Explorer, Safari, and Firefox.

It’s important to point out that HTTPS encrypts data transmissions, but not data on the client or server. Said differently, it encrypts data in transit but not data at rest. Data on clients and servers needs to be encrypted using different methods. We’ll look at encryption methods for data at rest in Chapter 7, “Cloud Security.”

There are two key takeaways from this section. One, HTTPS is used in conjunction with a web browser and can be used to access cloud resources. Two, HTTPS secures network transmissions between the client and the server.

Using Remote Desktop Protocol

One of the great features of clouds is the ability to create virtual instances of computers. For instance, using the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) interface, a cloud administrator (or anyone with appropriate permissions) can create a virtual Windows server.
That server will act like any other Windows server and can host and run apps, provide storage, and do anything else users need a server to do. The process to create a virtual computer takes only a few minutes and doesn’t require the client company to purchase any hardware! And of course, once users are done with the virtual computer, they can shut it off, and no physical resources are wasted.

Throughout the rest of this chapter and elsewhere in the book, we will use AWS to show examples of cloud concepts. The Cloud Essentials+ exam is provider-agnostic, and we are not advocating for the use of one provider over another. AWS does allow for the creation of free accounts, though, which makes it convenient for us when we want to show examples. Other CSPs will generally have options similar to those that we show in this book.

Virtual machines have a number of uses, including the following:

    Providing extra server capacity, almost instantaneously
    Allowing developers to test software on multiple OS platforms
    Creating a test environment for network administrators to install and configure changes before rolling them out to a production environment

Once a virtual computer is created, someone will probably need to log in to that computer. For example, an administrator might need to start services for clients, or a developer might need to log in to install and test their app. Remote Desktop Protocol (RDP) is used to log in to an online Windows instance. Once a user is logged in, they can use it just as they would use a computer sitting in front of them.

To use RDP, you must install an RDP client onto the device that will be logging in to the virtual instance. Windows includes an RDP client called Remote Desktop Connection, which is shown in Figure 2.3. Linux users can use the rdesktop command, and macOS users can download the Microsoft Remote Desktop app from the Mac App Store.
Android and iOS users will find RDP clients in their respective app stores as well. Some are free, and others will charge for their services.

Figure 2.3 Windows Remote Desktop Connection

Upon opening the RDP client, a user will be required to supply several pieces of information to connect. That includes the ID or address of the instance (or computer name as shown in Figure 2.3), a username and password, and the location of the private security key. Like HTTPS, RDP uses encryption to secure communications between the client and the virtual instance.

Using Security Key Pairs

RDP requires the use of a public/private key pair to authenticate users seeking to log in to remote instances. This is far more secure than requiring just a username and password. The premise is that a private, encrypted security key is created for a specific user. In practice, it’s a security code encrypted in a file stored on the user’s device. (For RDP, the private key will have a .pem file extension.) When the user proves possession of that private key to another device holding the corresponding public key, the public key is used to verify that proof and validate the holder’s identity. For security purposes, the user clearly needs to not let anyone else have access to their private key! The public key can be sent to anyone who needs to verify the identity of the private key holder.

The public/private key pairs are generated by a security server known as a certificate authority (CA). Several commercial CA services are available on the Internet. If you are using a CSP, odds are they have their own CA that will generate keys as needed. For example, in AWS, security keys are created in the EC2 dashboard.

Figure 2.4 shows an example of a remote instance of Windows Server 2019.
Looking at Figure 2.4, you might think, “Big deal, it looks just like a Windows Server desktop.” That is a big deal, and the fact that it looks and acts just like Windows Server 2019 is the point. This screenshot was taken from a desktop computer running Windows 10, and the virtual machine is somewhere else in the world. But anyone sitting at the Windows 10 computer could manage the server, just as if they were sitting in front of it.

Figure 2.4 Using RDP to manage Windows Server 2019

It may be too small to read in the printed Figure 2.4, but the host name, instance ID, public IP address, private IP address, and other information are in the upper-right corner of the desktop.

As mentioned previously, you can also manage remote instances from mobile devices. Figure 2.5 shows a screenshot of managing the same server using the free Microsoft RD client for iOS. Again, the picture might not seem impressive, but remember it’s taken from an iPhone. Managing a server from an iPhone isn’t the most convenient thing in the world, but know that with RDP, it’s an option.

Figure 2.5 Managing a Windows server from an iPhone

Only one RDP client connection can be made to a virtual instance at one time.

Using Secure Shell

Secure Shell (SSH) is to Linux instances what RDP is to Windows Server instances. It allows clients to remotely connect to a virtual Linux machine, securely, and act as if the user were sitting at the virtual computer.

In its most basic form, SSH is run as a text command from a command prompt. For example, if you open a Windows command prompt to run the SSH command, specify the name of the security key file, and provide a username and the address of a virtual Ubuntu Linux server hosted on AWS, the command will look like this:

    ssh -i "QDt1.pem" [email protected]

Ubuntu is one of the many versions of Linux.
Linux is an open source OS, meaning that people and companies can create their own versions as they want.

In the sample SSH command, the -i option tells the computer that the next input will be the security file; in this case it’s QDt1.pem. Next is the username, Ubuntu, followed by the instance’s Internet address. It’s highly unlikely that you will need to know SSH syntax for the Cloud Essentials+ exam, but this gives you a taste of how it could be used.

If your computer doesn’t have an SSH client and you’d rather use something besides a command prompt, there are plenty of options in the marketplace. Searching Google or an app store will reveal several for every common OS. For example, a free open source SSH client is called PuTTY. It can be used in Windows and many other OSs. Figure 2.6 shows the PuTTY for Windows connection screen.

Figure 2.6 Windows PuTTY SSH client

To log in, provide the host name of the server and the location of the security key file (in the Connection > SSH menu on the left) and click Open. The PuTTY client will open, and then you can log in. Once logged in, you will have a text-based command prompt from which you can manage your Linux server. It’s the last line shown in Figure 2.7.

Figure 2.7 Logged into the Ubuntu Linux server

Windows historically has not come with an SSH client, so users had to download something such as PuTTY. Another option is OpenSSH. With the release of Windows 10 version 1809 and Windows Server 2019, OpenSSH is included as an optional feature. To install it, go to Start > Settings > Apps > Apps And Features > Manage Optional Features.

Using a Virtual Private Network

The Internet is a public network. Data sent from point A bounces through several routers and eventually ends up at point B, but during that time, it can be read by the right people with the right equipment. Protocols such as HTTPS encrypt the data, making it considerably harder to read.
Still, with enough time and dedication, that data can be decrypted and read.

Realistically, the odds of someone hacking your personal data sent through HTTPS are incredibly small, so don’t waste too much mental energy worrying about it. Professional hackers rarely go after individual people anyway; winning small prizes is too inefficient. They’d rather go after bigger organizations.

Another way to secure Internet traffic is to use a virtual private network (VPN). A VPN is a secure (private) network connection that occurs through a public network. VPNs can be used to connect individual users to a corporate network or server or to connect company networks or clouds together across the Internet or other public networks. For an individual user, this might look like sending HTTPS data through a VPN; two layers of encryption are more secure than one! VPNs can secure more than just HTTPS traffic, though. For example, a company with a geo-redundant cloud might use a VPN to secure traffic between the physical cloud sites. Figure 2.8 illustrates how a VPN works.

Figure 2.8 A VPN (diagram: a laptop at home or at a hotel uses a broadband connection to reach a VPN server or router on the corporate network through a VPN tunnel across the Internet)

A VPN is a point-to-point secure tunnel through the Internet. The tunneled private network connection provides security over an otherwise unsecure connection. In Figure 2.8, one point is an end user, but it could be another corporate VPN server as well. All traffic going from one network to the other would have to flow through the VPN servers. In some cases, that can be a considerable amount of network traffic, so the VPN device needs to be configured appropriately.

A VPN connection is a great option for users who work from home or travel for work. When using a VPN, the remote end appears to be connected to the network as if it were connected locally.
From the server side, a VPN requires dedicated hardware or a software package running on a server or router. Clients use specialized VPN client software to connect, most often over a broadband Internet link. Windows 10 comes with its own VPN client software accessible through Start > Settings > Network & Internet > VPN, as do some other OSs. Many third-party options are also available. Some may be free for individual users, but most corporately used VPN clients cost money. Figure 2.9 shows an example of the Pulse Secure VPN client on an iPhone.

Figure 2.9 Pulse Secure VPN client

Don’t get VPNs confused with virtual local area networks (VLANs). VPN connections are made over the Internet. A VLAN is configured on a network switch and simply puts several computers together on their own local network segment.

Using Direct Connect

The final connectivity method is called direct connect. It’s used to provide a physical connection between your company’s on-site network and the CSP’s network. If a large number of users need persistent access to cloud services or there is a large amount of data transferred between the cloud and the on-site network, direct connect is likely going to be the fastest and most cost-effective option.

Most CSPs offer connections that support 100 Gbps data transfers. Lower-bandwidth plans can be as cheap as about $50 per month. Faster plans with unlimited data transfers can cost more than $50,000 per month.

With direct connect, you are literally connecting a router from your network directly to the CSP’s router. Direct connections often come with uptime guarantees such as three nines or four nines; you’ll recall we introduced these high availability standards in Chapter 1, “Cloud Principles and Design.” Examples of direct connect services are Azure ExpressRoute, AWS Direct Connect, and Google Cloud Interconnect.

Cloud Networking Services

The basic cloud services are compute, storage, networking, and database.
Beyond that, cloud providers can provide extra networking services, for a charge, of course. In this section, we will look at four of those options: software-defined networking, load balancing, DNS, and firewall. Software-defined networking is the only one that’s cloud-specific. The other three are networking services that can be provided outside of cloud environments as well.

Software-Defined Networking

To help illustrate what software-defined networking is, let’s first look at a relatively simple network layout, such as the one shown in Figure 2.10.

Figure 2.10 A sample network (diagram: two routers, one of them connected to the Internet, and four switches)

The network in Figure 2.10 has two routers, including one that connects the corporate network to the Internet. Four switches manage internal network traffic, and client devices connect to the switches. New network clients can attach to existing switches, and if the switches run out of ports, more can be added. Of course, in today’s environment, we should draw in wireless access points and their clients as well. The wireless access points will connect to a switch or router with a network cable. Adding switches, routers, or other network control devices requires purchasing and installing the device and some configuration, but it’s nothing that a good net admin can’t handle.

Large enterprise networks are significantly more complex, and include more routers and perhaps load balancers, firewalls, and other network appliances. Adding to the network becomes more complicated. In particular, adding more routers requires a lot of reconfiguration so the routers know how to talk to each other.

Routers play a critical role in inter-network communications. The router’s job is to take incoming data packets, read the destination address, and send the packet on to the next network that gets the data closer to delivery.
There are two critical pieces to the router’s job. One is the physical connections and internal circuitry that make routing happen. The other is a logical component: each router has its own database, called a routing table, which it uses to determine where to send the packets. In a traditional networking environment, each router is responsible for maintaining its own table. While almost all routers have the ability to talk to their neighbor routers for route updates, the whole setup is still pretty complicated for administrators to manage. The complexity can really become a problem when troubleshooting data delivery problems.

Enter software-defined networking (SDN). The goal of SDN is to make networks more agile and flexible by separating the forwarding of network packets (the infrastructure layer) from the logical decision-making process (the control layer). The control layer consists of one or more devices that make the decisions on where to send packets; they’re the brains of the operation. The physical devices then just forward packets based on what the control layer tells them. Figure 2.11 illustrates the logical SDN structure.

Figure 2.11 Software-defined networking (layers, top to bottom)
    Application Layer
    Application Programming Interfaces (APIs)
    Control Layer (SDN Controller)
    SDN APIs (OpenFlow, Cisco Open Network Environment, etc.)
    Infrastructure Layer

In addition to agility and flexibility, a third advantage to using SDN is centralized network monitoring. Instead of running monitoring apps for each individual piece of network hardware, the SDN software can monitor all devices in one app.

The SDN controller acts as an abstraction layer. Applications that need to use the network actually interface with the SDN controller, thinking that they are working directly with the networking hardware. In the end, data still gets from point A to point B, so the distinction between how that happens isn’t important.
Because the abstraction layer exists, though, the underlying network hardware and configuration can change, and it won’t affect how the applications work. It’s the job of the SDN controller to understand how to talk to the infrastructure.

That’s Abstract

Database systems make use of abstraction layers all the time. They act as translators between apps and the database itself, reducing the cost and complexity of reconfiguring data systems. They can also act as a security mechanism of sorts, blocking data from those who don’t need to see it.

For example, say that your company has four different front-end applications that access a common database. There is a website where customers place orders, an app for the accounts receivable department to bill customers, an app for the warehouse to pull orders and check inventory, and a dashboard so management can see sales performance. Because of a change in data management policies, the database structure needs to be changed. If each of the apps interfaced directly with the database, then all four apps would need to be recoded to work properly. This could require a significant investment of time and money and may jeopardize business performance. Instead, if an abstraction layer exists, it’s the only thing that needs to be recoded before you can use the new database structure. As far as the apps are concerned, nothing has changed.

In addition, say that the management dashboard has sales and profit information. The customers certainly shouldn’t see that from the website, but the customers do need to be able to see if something is in stock. The abstraction layer can help protect the sensitive information, acting as a security layer to ensure that sales and profit data doesn’t get passed to the web app, while passing through inventory data.

So while an abstraction layer might seem like it increases complexity (and it can), know that there are good reasons to use one.
It can help increase system agility, provide a layer of security, and ultimately keep costs down.

To make things even more fun, SDN can be used to create virtual networks without any hardware at all. In Chapter 1, we introduced the concept of a virtual machine: a computer that exists without a specific one-to-one hardware relationship. SDN can accomplish the same idea with networking.

Imagine having five logical servers running in a cloud, all using the same hardware. If they want to talk to each other, they will send data like they know how to, that is, to their network cards for delivery on the network, likely through a switch or router. But if they are using the same hardware, then they all have the same network adapter. That makes things weird, right? Well, not really, because SDN manages the communications between the servers. Each server will be assigned a logical NIC and communicate to the others via their logical NICs. SDN manages it all, and there are no communication issues.

For many years, SDN was commonly associated with the OpenFlow protocol, because OpenFlow was the dominant technology in the marketplace. In more recent years, OpenFlow has run into issues, with some experts calling it inefficient at best. Other competitors have entered the market, including Cisco’s Open Network Environment (ONE) and VMware’s NSX.

Load Balancing

Imagine you want to do some online shopping. You open your browser, type amazon.com into the address bar, and the site appears. You’ve made a connection to the Amazon server, right? But is there only one Amazon server? Considering the millions of transactions Amazon completes each day, that seems highly unlikely. In fact, it’s not the case. Amazon has dozens if not hundreds of web servers, each of them capable of fulfilling the same tasks to make your shopping experience as easy as possible. Each server helps balance out the work for the website, which is called load balancing.
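To make the idea of spreading requests across identical servers concrete, here is a minimal sketch. It assumes a simple round-robin policy, which is only one of several ways a load balancer might choose a server; the text does not say which policy any particular site uses. The class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly in the same order.
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical pool of three identical web servers.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```

From the outside, every request goes to the same address; only the balancer knows which real server answered.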
Load balancing technology predates its usage in the cloud. Hardware devices, conveniently named load balancers, would essentially act like the web server to the outside world. Then when a user visited the website, the load balancer would send the request to one of many real web servers to fulfill the request.

Cloud implementations have made load balancing easier to configure, since the servers can be virtual instead of physical. While hardware load balancers still exist, many CSPs’ load balancer as a service (LBaaS) offerings are cheaper and more convenient to set up and manage.

Common Load Balancing Configurations

We already shared one example of load balancing with an online retailer. In that example, all servers are identical (or very close to identical) and perform the same tasks. Two other common load-balancing configurations are cross-region and content-based.

In a cross-region setup, all servers likely provide access to the same types of content, much like in our Amazon example. The big feature with this setup is that there are servers local to each region; proximity to the users will help speed up network performance. For example, say that a company has a geo-redundant cloud and users in North America, Asia, and Europe. When a request comes in, the load balancer senses the incoming IP address and routes the request to a server in that region. This is illustrated in Figure 2.12. If all servers in that region are too busy with other requests, then it might be sent to another region for processing.

Figure 2.12 Cross-region load balancing (diagram: incoming requests arrive at a load balancer)

Another common way to load balance is to split up banks of servers to handle specific types of requests. For example, one group of servers could handle web requests, while a second set hosts streaming video and a third set manages downloads. This type of load balancing is called content-based load balancing and is shown in Figure 2.13.
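The two configurations just described can be sketched as two small routing decisions. Everything named here is an assumption for illustration: the region labels, pool names, and URL path prefixes are hypothetical, and the step of mapping a client's IP address to a region is left out, so the function simply receives a region label.

```python
# Hypothetical regional server pools, in fallback order.
REGION_POOLS = {
    "north-america": "na-servers",
    "asia": "asia-servers",
    "europe": "eu-servers",
}

# Hypothetical mapping from request path prefix to a specialized pool.
CONTENT_POOLS = {
    "/video": "video-streaming-pool",
    "/downloads": "download-pool",
}


def pick_region(client_region, busy_regions=frozenset()):
    """Cross-region: prefer the client's own region, else any region with capacity."""
    if client_region in REGION_POOLS and client_region not in busy_regions:
        return client_region
    for region in REGION_POOLS:
        if region not in busy_regions:
            return region
    raise RuntimeError("no region has spare capacity")


def pick_content_pool(path):
    """Content-based: choose a pool by request type, defaulting to web servers."""
    for prefix, pool in CONTENT_POOLS.items():
        if path.startswith(prefix):
            return pool
    return "web-pool"


print(pick_region("asia"))                         # asia
print(pick_region("asia", busy_regions={"asia"}))  # north-america
print(pick_content_pool("/video/cats.mp4"))        # video-streaming-pool
print(pick_content_pool("/index.html"))            # web-pool
```

A real service would combine decisions like these with live health and load data rather than a fixed set of busy regions.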
Figure 2.13 Content-based load balancing

Load Balancing Benefits

Cloud-based load balancing has performance benefits for high-traffic networks and heavily used applications. Scalability and reliability are important benefits as well. Let’s give a few examples of each.

Performance
The Amazon example we used earlier is probably the best example of this, but not all companies provide massive online shopping services. Smaller companies can benefit from performance enhancements as well. Servers that are specialized to handle a specific content type are often more efficient than multipurpose ones. And the global load balancing example can be applied to distributed sites within a country as well.

Scalability
We know that the cloud is scalable, and so is load balancing within the cloud. For example, let’s say that your company sells products online and has two servers dedicated to that task. For the vast majority of the year, the two servers can handle the traffic without a problem. On the busiest shopping day of the year, Cyber Monday, those servers are overwhelmed. With cloud-based load balancing, traffic spikes can be handled by quickly provisioning additional virtual servers to handle the traffic. When the capacity is no longer required, the servers are turned off. Cloud-based load balancing can also be scaled to function across a multicloud environment.

Reliability
Imagine a company that uses a business-critical application for remote salespeople. What happens if the server hosting that application crashes? It wouldn’t be good. With cloud-based load balancing, different servers can host the application, even in different regions. Perhaps a hurricane wipes out the data center in Florida. The load balancer can direct users to other data centers in different regions, and the business can continue to generate revenue.
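To make the scalability and reliability points concrete, here is a small Python sketch of a round-robin balancer that skips failed servers and accepts new ones on the fly. The `LoadBalancer` class and its method names are our own invention for illustration, and a real service would detect failures with health checks rather than a manual `mark_down` call.

```python
from itertools import count


class LoadBalancer:
    """Round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()  # position in the rotation

    def add_server(self, name):
        # Scale out: provision another server for a traffic spike.
        if name not in self.servers:
            self.servers.append(name)
        self.healthy.add(name)

    def mark_down(self, name):
        # Reliability: stop sending traffic to a failed server.
        self.healthy.discard(name)

    def next_server(self):
        # Try each server at most once, starting from the rotation point.
        for _ in range(len(self.servers)):
            candidate = self.servers[next(self._counter) % len(self.servers)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

In the hurricane scenario, calling `mark_down("florida")` is all it takes for later requests to land on the remaining data centers.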
Domain Name System

For one host on a TCP/IP network to communicate with another, it must know the remote host’s IP address. Think about the implications of this when using the Internet. You open your browser, and in the address bar you type the uniform resource locator (URL) of your favorite website, something like www.google.com, and press Enter. The first question your computer asks is, “Who is that?” The website name means nothing to it; your device requires an IP address to connect to the website. The Domain Name System (DNS) server provides the answer, “That is 72.14.205.104.” Now that your computer knows the address of the website you want, it’s able to traverse the Internet to connect to it.

DNS has one function on the network, and that is to resolve hostnames (or URLs) to IP addresses. This sounds simple enough, but it has profound implications for our everyday lives. Each DNS server has a database where it stores hostname-to-IP-address pairs, called a zone file. If the DNS server does not know the address of the host you are seeking, it has the ability to query other DNS servers to help answer the request.

We all probably use Google several times a day, but in all honesty how many of us know its IP address? It’s certainly not something that we are likely to have memorized. And how could you possibly memorize the IP addresses of all of the websites you visit? Because of DNS, it’s easy to find resources. Whether you want to find Coca-Cola, Toyota, Amazon, or thousands of other companies, it’s usually pretty easy to figure out how. Type in the name with a .com on the end of it and you’re usually right. The only reason why this is successful is that DNS is there to perform resolution of that name to the corresponding IP address.

DNS works the same way on an intranet (a local network not attached to the Internet) as it does on the Internet.
The only difference is that instead of helping you find www.google.com, it may help you find Jenny’s print server or Joe’s file server. From a client-side perspective, the host just needs to be configured with the address of one or two legitimate DNS servers and it should be good to go.

Any company hosting a website is required to maintain its own DNS. In fact, the company is required to have two DNS servers for fault tolerance.

Protocols and Ports

We introduced several TCP/IP protocols in the last few sections. To help keep network traffic organized across an IP-based network, each Process/Application layer protocol uses what’s called a port number. The port number is combined with the IP address to form a socket. Using cable television as an analogy, think of the IP address as the home address where the cable television is delivered, and the port number as the channel that’s being watched.

Port numbers aren’t specifically called out in the CompTIA Cloud Essentials+ exam objectives. However, we’d hate for you to be unprepared if one of them pops up on the test. Table 2.1 lists the protocols we’ve discussed and their associated port numbers. For a full list (and there are a lot!), you can visit https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml.

Table 2.1 Protocols and port numbers

Protocol    Port number
DNS         53
HTTP        80
HTTPS       443
RDP         3389
SSH         22

Firewalls

A firewall is a hardware or software solution that serves as a network’s security guard. Firewalls are probably the most important devices on networks connected to the Internet. They can protect network resources from hackers lurking in the dark corners of the Internet, and they can simultaneously prevent computers on your network from accessing undesirable content on the Internet. At a basic level, firewalls filter network traffic based on rules defined by the network administrator.
Anti-malware software examines individual files for threats. Firewalls protect you from streams of network traffic that could be harmful to your computer.

Firewalls can be stand-alone “black boxes,” software installed on a server or router, or some combination of hardware and software. In addition to the categorizations of hardware and software, there are two types of firewalls: network-based and host-based. A network-based firewall is designed to protect a whole network of computers and almost always is a hardware solution with software on it. Host-based firewalls protect only one computer and are almost always software solutions.

How Firewalls Work

Most network-based firewalls have at least two network connections: one to the Internet, or public side, and one to the internal network, or private side. Some firewalls have a third network port for a second semi-internal network. This port is used to connect servers that can be considered both public and private, such as web and email servers. This intermediary network is known as a demilitarized zone (DMZ). A DMZ can also be configured as a space between two firewalls. Figure 2.14 shows examples of DMZs.

A firewall is configured to allow only packets (network data) that pass specific security restrictions to get through. By default, most firewalls are configured as default deny, which means that all traffic is blocked unless specifically authorized by the administrator. The basic method of configuring firewalls is to use an access control list (ACL). The ACL is the set of rules that determines which traffic gets through the firewall and which traffic is blocked. There will be different ACLs for inbound and outbound network traffic. ACLs are typically configured to allow or block traffic by IP address, protocol (such as HTTPS or RDP), domain name, or some combination of characteristics. Packets that satisfy the ACL’s allow rules are passed through the firewall to their destination.
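A default-deny ACL can be sketched as an ordered list of rules where the first match wins and anything unmatched is blocked. The Python below is only a teaching sketch: the `Rule` class, the `evaluate` function, and the sample addresses (taken from the documentation-only IP ranges) are our assumptions, and real firewalls match on many more fields than source address and port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "deny"
    source: str           # source network in CIDR notation
    port: Optional[int]   # destination port; None matches any port


def evaluate(acl, src_ip, dst_port):
    """Return the action of the first matching rule; default deny otherwise."""
    for rule in acl:
        in_network = ip_address(src_ip) in ip_network(rule.source)
        if in_network and rule.port in (None, dst_port):
            return rule.action
    return "deny"  # default deny: nothing explicitly allowed this packet


# Hypothetical inbound ACL: HTTPS from anywhere, SSH only from one admin network.
INBOUND = [
    Rule("allow", "0.0.0.0/0", 443),
    Rule("allow", "203.0.113.0/24", 22),
]
```

With this inbound list, an RDP connection attempt on port 3389 is dropped without needing its own rule, because no rule allows it.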
Figure 2.14 Two ways to configure a DMZ

Obtaining a Firewall

Windows comes with its own software firewall called Windows Defender Firewall. There are also numerous firewalls on the market (Barracuda and Zscaler are popular ones), and of course AWS and Azure provide firewall services as well. Some third-party security software, such as Norton Internet Security, comes with its own software firewall. In cases where you have those types of programs installed, they will turn off Windows Defender Firewall automatically.

Regardless of which firewall is used, the key principle is to use one. Combined with anti-malware software, firewalls are critical security features for any network.

As a reminder, for the Cloud Essentials+ Exam, you need to know about direct connect, VPN, RDP, SSH, and HTTPS cloud access. You also should be able to identify and understand what software-defined networking (SDN), load balancing, DNS, and firewalls do.

Understanding Cloud Storage Technologies

The cloud offers companies and users a large number of services, and it seems like more service types are being added every day. Fundamentally, though, what the cloud really provides is compute, network, and storage services without the need to invest in the underlying hardware. Of these three, storage is the most tangible and understandable. It’s also the most widely used and the one that catapulted the cloud to popularity.

It’s easy to see why cloud storage is so popular. Among its benefits are convenient file access, easy sharing and collaboration, and automated redundancy. Users can access files from anywhere in the world, nearly instantaneously.
Those same users can collaborate on a document or project, which wouldn’t be nearly as easy or even possible if files had to be stored on someone’s local hard drive. And the cloud has systems to back up all those files without users needing to intervene. To top it all off, cloud storage is cheap!

In this section, we will look at cloud storage technologies. We’ll start with a high-level overview of how cloud storage works and how providers set it up. Then we’ll get into storage characteristics, types, and features. Finally, we’ll end by looking at a specific type of cloud storage: a content delivery network.

How Cloud Storage Works

Most computing devices come with some sort of long-term persistent storage media. By persistent, we mean that once the power is turned off, the device retains the information stored on it. Usually this device is called a hard drive. A hard drive is either a set of spinning metal-coated platters or electronic circuitry that stores long stretches of binary data (literally 0s and 1s) for later retrieval. It’s up to the device’s OS and apps to make sense of those 0s and 1s to know where relevant information starts and stops. If the device runs low on free storage space, performance can suffer. If the device runs out of storage space, no new data can be stored.

Storage in the cloud looks and acts a lot like storage on a local device. There’s a physical hard drive (more like hundreds or thousands of them) sitting somewhere storing data, controlled by one or more servers. Data is stored persistently and can be retrieved by a user who has the appropriate access. Of course, just like accessing any other cloud service, the user needs a network connection.

Even though there are a lot of similarities between how traditional and cloud storage work, there’s one major difference. On a traditional computing device, the number of storage devices, and consequently the maximum storage space, is limited by the number of drive connections the computer has.
For example, if a computer’s motherboard has only four hard drive connectors, then that computer can manage a maximum of four hard drives.

Cloud storage runs on a technology called software-defined storage (SDS). In SDS, the physical storage of data is separated from the logical control over drive configuration, independent from the underlying hardware. If this sounds a lot like SDN, that’s because it is. It’s the same idea, just applied to storage instead of networking. Figure 2.15 shows an example.

Figure 2.15 Software-defined storage

In SDS, one logical unit of storage can be composed of a very large number of physical hard drives, enabling virtually unlimited storage. At the same time, one physical hard drive can be separated into a large number of logical storage units. Those two extremes and every combination in between are possible. The user doesn’t know what the underlying hardware is, and quite frankly it doesn’t matter as long as the files are safe, secure, and easily accessible.

Finally, a good SDS solution will have the following features:

Scalability
Both the amount of storage offered to clients and the underlying hardware should be scalable without any performance issues or downtime. Scalability for clients should be automated.

Transparency
Administrators and clients should know how much storage is available and what the cost of storage is.

A Standard Interface
Management and maintenance of SDS should be easy for administrators and clients.

Storage Type Support
The SDS should support applications written for object, file, or block storage. (We will review these concepts in more detail in the “Storage Types” section later in this chapter.)
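The separation between logical volumes and physical drives can be illustrated with a toy storage pool in Python. Everything here is a simplified assumption of ours (the `StoragePool` class, its method names, and the drive names); real SDS controllers also handle redundancy, performance tiers, and data placement, which this sketch ignores.

```python
class StoragePool:
    """Carves logical volumes out of whatever physical drives are in the pool."""

    def __init__(self, drives_gb):
        self.free = dict(drives_gb)  # drive name -> free capacity in GB
        self.volumes = {}            # volume name -> {drive name: GB used}

    def add_drive(self, name, size_gb):
        # Scalability: capacity grows without touching existing volumes.
        self.free[name] = size_gb

    def create_volume(self, name, size_gb):
        if size_gb > sum(self.free.values()):
            raise ValueError("not enough free capacity in the pool")
        remaining = size_gb
        extents = {}
        # Take space from each drive in turn; one volume may span several drives.
        for drive, free_gb in self.free.items():
            if remaining == 0:
                break
            take = min(free_gb, remaining)
            if take:
                extents[drive] = take
                remaining -= take
        for drive, used in extents.items():
            self.free[drive] -= used
        self.volumes[name] = extents
        return extents
```

The caller only ever asks for a volume of a given size; which drives end up holding the data is the pool’s business, which is the point of SDS.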
Cloud-Based Storage Providers

There is no shortage of cloud-based storage providers on the market today. They come in a variety of sizes, from the big Cloud Service Providers (CSPs) that deliver a full range of services, such as AWS, Azure, and Google Cloud, to niche players that offer storage solutions only. Each one offers slightly different features, and most of them will offer limited storage for free and premium services for more data-heavy users. Table 2.2 shows you a comparison of some personal plans from the more well-known storage providers. Please note that this table is for illustrative purposes only, since the data storage limits and costs can change. Most of these providers offer business plans with very high or unlimited storage for an additional fee.

Table 2.2 Cloud-based storage providers and features

Service             Free   Premium                Cost per Year
Dropbox             2GB    1TB                    $99
Apple iCloud        5GB    50GB, 200GB, or 2TB    $12, $36, or $120
Box                 10GB   100GB                  $120
Microsoft OneDrive  5GB    50GB or 1TB            $24 or $70
Google Drive        15GB   100GB, 1TB, or 10TB    $24, $120, or $1200

Which one should you choose? If you want extra features such as web-based applications, then Google or Microsoft is probably the best choice. If you just need data storage, then Box or Dropbox might be a better option. Nearly all client OSs will work with any of the cloud-based storage providers, with the exception of Linux, which natively works only with Dropbox.

Most cloud storage providers offer synchronization to the desktop, which makes it so that you have a folder on your computer, just as if it were on your hard drive. And importantly, that folder will always have the most current edition of the files stored in the cloud.

Cloud Storage Terminology

When working with a CSP to set up cloud storage, you will be presented with a plethora of features.
Having a solid understanding of the terms that will be thrown in your direction will make it easier to determine which solutions are best for your business. In this section, we’ll look at storage characteristics, types, and features.

Storage Characteristics

The first thing your company needs to decide is what the cloud storage will be used for. Some want cloud storage to be instantly accessible, like a replacement for local hard drive space. Others need long-term archival storage, and of course, there could be any combination in between. Once you understand the purpose for the storage, you can focus on performance and price of available solutions.

Taking a high-level view of cloud storage, there are two major categories of performance: hot and cold. Hot storage refers to data that is readily available at all times. Cold storage refers to data that isn’t needed very often, such as archived data. It might take from several minutes to several days to make cold data available to access. Table 2.3 shows an overview of hot versus cold storage performance characteristics.

Table 2.3 Hot versus cold storage performance

Trait             Hot                          Cold
Access frequency  Frequent                     Infrequent
Access speed      Fast                         Slow
Media type        Fast hard drives such as     Slower hard drives such as
                  solid-state drives (SSDs)    conventional hard disk drives
                                               (HDDs), tape drives, offline
Cost per GB       Higher                       Lower

Cloud storage would be pretty easy to navigate if there were only two options to consider. In the real world, CSPs offer a wide range of storage solutions with varying performance and cost. For example, Microsoft Azure offers premium, hot, cool, and archive tiers. Google has two hot options (multiregional storage and regional storage) along with nearline and coldline options. AWS has two primary designations, which are its Simple Storage Service (S3) for hot data and S3 Glacier for archives. Within S3 and S3 Glacier, there are multiple options to choose from.
You might have noticed in the previous paragraph that there are a few types of service not covered in Table 2.3. It’s common for providers to have a level of service between hot and cold, sometimes called warm or cool. Maybe they will even have both. Google calls its in-between service nearline, to indicate it’s not quite as fast as online but not as slow as a traditional archive.

Understanding Storage Containers

When a client uploads data to the cloud, that data is placed into a storage container. CSPs call their containers by different names. For example, AWS and Google Cloud use the term bucket, whereas Azure uses blob. This leads to the use of some unintentionally funny phrases: making archived Azure data readable again is called rehydrating the blob.

CSPs allow clients to have multiple containers to meet their storage needs. For example, many companies will have several hot containers for different offices or departments, and more cold containers to archive data. The end result is a flexible infrastructure that can be customized to meet any client’s needs.

Regardless of what the CSP chooses to call its storage offering, focus on the performance characteristics and cost. As you might expect, the hotter the service and the more features it has, the more you will pay per gigabyte.
Here are some key performance characteristics and parameters to pay attention to:

Cost per gigabyte
Storage capacity limits
The maximum number of containers allowed
Data encryption
Storage compression and/or deduplication
Whether intelligent analysis of storage usage and/or automated optimization is provided
Dynamic container resizing (e.g., capacity on demand)
Data read/write speed
Number of data reads/writes allowed, or cost per thousand reads/writes
Data latency for reads/writes
Data retrieval time for archives
Archived data retrieval cost

Most companies tend to overestimate the amount of data they need to store, as well as capacity needed for other cloud services. It may be cheaper to pay for less capacity and then pay for peak overage charges than it is to pay for more capacity all the time. Do the due diligence to price out options, knowing that if you really need more capacity later, you can almost always add it dynamically.

Storage Types

At the hardware level, data is stored as long strings of 1s and 0s on its storage media. This includes everything from critical OS files to databases to pictures of your last vacation to that inappropriate video clip of the last company holiday party. It’s up to the device’s OS or software to interpret the data and know where a relevant piece of information starts and stops.

There are multiple ways in which data can be stored and retrieved by a computer. The underlying 1s and 0s don’t change, but how they are logically organized into groups and accessed can make a big performance difference. The three storage types you need to know are file storage, block storage, and object storage.

File Storage

For anyone who has used a Windows-based or Mac computer, file storage is instantly recognizable. It’s based on the concept of a filing cabinet. Inside the filing cabinet are folders, and files are stored within the folders.
Each file has a unique name when you include the folders and subfolders it’s stored in. For example, c:\files\doc1.txt is different than c:\papers\doc1.txt. Figure 2.16 shows an example of file storage, as seen through Windows Explorer.

Figure 2.16 File storage

The hierarchical folder structure and naming scheme of file storage makes it relatively easy for humans to navigate. Larger data sets and multiple embedded folders can make it trickier (who here hasn’t spent 10 minutes trying to figure out which folder they put that file in?), but it’s still pretty straightforward.

OSs have built-in file systems that manage files. Windows uses the New Technology File System (NTFS), Macs use Apple File System (APFS), and Linux uses the fourth extended file system (ext4), among others. Although they all function in slightly different ways, each file system maintains a table that tracks all files and includes a pointer to where each file resides. (Folders are just specialized files that can contain pointers to other folders or files.) Because of the pointer system and the way that files are written to hard drives, it’s entirely possible (and in fact likely) that c:\files\doc1.txt and c:\files\doc2.txt are in completely different locations on the hard drive.

On top of it all, most file systems are specific to an OS. A Windows-based computer, for example, can’t natively read an APFS- or ext4-formatted hard drive.

File systems will experience performance problems when they are responsible for storing large numbers of files or very large files, such as databases. Their structure makes them highly inefficient at that scale. They work great in an office environment where users need to store some data on their PC, share files with co-workers, and make small backups. Large-scale data environments require a different solution.

Block Storage

With file storage, each file is treated as its own singular entity, regardless of how small or large it is.
With block storage, files are split into chunks of data of equal size, each chunk is assigned a unique identifier, and the chunks are then stored on the hard drive. Because each piece of data has a unique address, a file structure is not needed. Figure 2.17 illustrates what this looks like.

Figure 2.17 Block storage

Block storage allows files to be broken into more manageable chunks rather than being stored as one entity. This allows the OS to modify one portion of a file without needing to open the entire file. In addition, since data reads and writes are always of the same block size, data transfers are more efficient and therefore faster. Latency with block storage is lower than with other types of storage.

One of the first common use cases for block storage was for databases, and it remains the best choice for large, structured databases today. Block storage is also used for storage area networks (SANs) that are found in large data centers, and many email server applications natively support block storage because of its efficiency over file storage. Finally, virtualization software uses block storage as well. Essentially, the VMM creates block storage containers for guest OSs, which function as the storage system for the guest OSs.

While block storage is great for large, structured data sets that need to be accessed and updated frequently, it’s not perfect. One thing it doesn’t handle well is metadata, which is needed to make sense of unstructured data. For unstructured data, the best choice is object storage.

When data is referred to as structured, that means it conforms to defined parameters. The best examples to think of are spreadsheets and relational databases, which have columns and rows to provide the structure.
On the flip side, unstructured data is information that is not organized in a predefined way. Unstructured information is often text-heavy, such as a novel stored as a long string of words with no punctuation or chapter breaks. Data such as dates, numbers, and facts can be unstructured too, if they’re not stored in a relational way. Images and videos are other great examples of unstructured data.

Object Storage

Not all data fits into easily defined or standard parameters. Think about pictures or videos stored on a hard drive, for example. How would a user easily search a folder containing 1,000 pictures for people wearing green shirts? With file or even block storage, this is next to impossible without opening every file and examining it. Contrast that to searching 1,000 text files for the word green, which is a much simpler task.

Pictures and videos are two examples of unstructured data: they don’t have a predefined data model, and they can’t be organized in a predefined structure. (To be fair, they could be organized into a predefined structure, but it would require tedious manual labor and be inflexible. For example, say you create a folder and put any picture with someone wearing a green shirt into it. Now how do you easily search that folder for people with blue shirts as well? You don’t.)

A term that’s thrown around a lot today is big data. It seems that every company wants to find an edge by analyzing big data, but most people in positions of power don’t even really know what that means. Big data doesn’t necessarily refer to the size of the data set, even though it can. Really, big data refers to a collection of unstructured data that, if properly analyzed, can yield new insights.

By using object storage, a company can make the organization and retrieval of unstructured data easier. In object storage, every unit of storage is called an object.
There’s no predefined size or structure for each object: it could be a small text file, a two-hour-long movie, a song, or a copy of War and Peace. An object can be literally anything that anyone wants to store. If you’ve posted a picture in social media, you’ve used object storage. Each object has four characteristics, which are shown in Figure 2.18.

The Data
This is the data for the object itself, literally the bits that make up the file or image or whatever it is that’s being stored.

Metadata Describing the Object
Metadata is optional information about the object. It can describe the object, what it’s used for, or anything else that will be relevant for people trying to find the object. In the green shirt example we used earlier, a metadata tag could say green shirt, for example. Metadata could also describe the type of shirt, words or logo on the shirt, who is wearing the shirt, where the picture was taken, the wearer’s favorite food, or anything else. Metadata is completely customizable and can be entered in by the creator or others who access the file.

Object Attributes
These are classifications of metadata, such as color, person, or other relevant characteristics. They’re optional but can make it easier to compare different unstructured data sets.

A Unique Identifying ID
Each object needs a unique ID within the system in which it resides. As with block storage, there is no hierarchical folder structure.

Figure 2.18 Object storage

In direct contrast to block storage, which makes every storage unit the same size, object storage sets no restrictions on object size. Similarly, there is no limit to how much metadata can be created for any object. The result is a completely flexible and customizable data storage solution.

One downside compared to block storage is that in object storage, the entire object must be opened to modify any part of it.
This can result in slower performance for large objects that are frequently modified. Because of this, object storage is inferior to block storage for things like databases.

As for use cases, we already mentioned the storage of unstructured data such as pictures and videos. The other big use case for object storage is for backup archives. By nature, backup archives are often massive but rarely if ever need to be modified. That’s a great combination to take advantage of the features of object storage.

Storage Features

Cloud storage solutions offer two features to help save space: compression and deduplication. They also allow you to obtain additional services at any time, which is called capacity on demand. We’ll look at each one in more detail next.

Compression

The purpose of compression is to make files smaller so they take up less storage space. Compression works by looking for repeated information within a file and then replacing that information with a shorter string. Therefore, fewer bits are required to store the same amount of information.

For example, let’s imagine you want to apply compression to the rules manual for the National Football League (NFL). You can probably think of a lot of words that will be repeated, such as football, player, touchdown, pass, run, kick, tackle, and so on. A compression utility can replace the word football with a 1, player with a 2, touchdown with a 3, and so forth. Every time the word football is mentioned, a 1 is used instead, saving seven characters of space. Every time player is used, you save five characters.

Interestingly (maybe only to us), the word football appears only 25 times in the 2018 NFL official playing rules, so only 175 characters are saved. But player is mentioned a whopping 812 times in 89 pages. Compressing that to a single character saves 4,060 characters. Repeating this process for all files in a storage unit can save a lot of capacity.
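The word-substitution idea can be sketched in a few lines of Python. This is a toy illustration rather than how production compressors work: the function names `compress` and `decompress` and the `#1`, `#2` token format are our own assumptions, and the sketch assumes that the text contains no `#` characters and that the words are plain letters. Because each token here is two characters, it saves one character less per replacement than the single-character placeholders in the example.

```python
import re


def compress(text, words):
    """Replace each listed word with a short token; return (text, dictionary)."""
    dictionary = {f"#{i}": word for i, word in enumerate(words, start=1)}
    for token, word in dictionary.items():
        # \b keeps the match to whole words only.
        text = re.sub(rf"\b{re.escape(word)}\b", token, text)
    return text, dictionary


def decompress(text, dictionary):
    """Restore the original text using the dictionary of replacements."""
    # Longest tokens first, so "#10" is never read as "#1" followed by "0".
    for token in sorted(dictionary, key=len, reverse=True):
        text = text.replace(token, dictionary[token])
    return text
```

Compressing a sentence with `["football", "player"]` as the word list and then decompressing it returns the original sentence unchanged, which is the property any lossless scheme must have.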
A dictionary file is maintained to keep track of all the replacements. Of course, real-life compression is a bit more complicated than this, but this example illustrates the principle. The process is to remove redundant words or phrases and replace them with a shorter placeholder to save space. Before the file can be opened or modified by a user, it must be decompressed back to its original form, which adds some processing overhead.

Almost all OSs allow for folder and file compression. In most versions of Windows, right-click the folder or file, go to Properties, and click the Advanced button. You will see a screen similar to the one shown in Figure 2.19.

Figure 2.19 Advanced attributes folder compression

Deduplication

On the surface, compression and deduplication might seem like the same thing, but they’re not. Data deduplication works at the file level or block level to eliminate duplicate data. As an example of this, imagine a backup of an email server that stores email for 2,000 corporate users. The users collaborate a lot, so there are a lot of emails that go back and forth with Word or Excel attachments in them. It’s easy to conceive that there might be dozens or even hundreds of backed-up emails that contain the same attached Excel file. With deduplication, only one copy of that Excel file will be stored. For the other instances, a small pointer file is created, pointing to the original. In this way, deduplication saves storage space.

When used in large data sets, compression and deduplication can save companies thousands of dollars per year in storage costs.

To save money, compress or deduplicate files before uploading them to the CSP. If you upload 100GB of data and the CSP is able to compress or dedupe it down to 10GB, you will still be charged for 100GB of storage. Most of the time, you won’t know that the CSP has compressed or deduped the data.
But if you do it yourself and upload that 10GB, you will be charged for only 10GB.

Capacity on Demand

We’ve already hit on the concept of capacity on demand in both Chapter 1 and earlier in this chapter. The idea is straightforward: if you need extra storage capacity, it is instantaneously available. You just pay extra for the extra capacity that you use.

Capacity on demand can be great, but it poses some risks as well. Just because you can have the capacity doesn’t mean you should use the capacity. Many times, excessive capacity can be obtained unintentionally. For example, take a company that has a 100GB cloud-hosted customer service database. Clearly, it will pay for the 100GB of storage. In addition, an admin set up the database to be replicated, just in case of failure. So now it’s paying for 200GB. A perfectly well-meaning network administrator doesn’t know that the database is replicated and decides to set up a backup for the same database. Now the company is paying for 300GB or even 400GB, depending on what the admin did. The CSP isn’t going to ping you to be sure your company really wants to buy 400GB of storage; they will be happy to just send you the bill.

This is a good time to remind you to conduct frequent capacity planning and usage analysis and adjust your purchase accordingly. Also remember that organizations tend to overestimate the capacity they will need. If you can, pay for only what you absolutely need and use!

Content Delivery Networks

A content delivery network (CDN) is a specialized type of load balancing used with web servers. Its primary use is to speed up access to web resources for users in geographically distributed locations. CDNs allow users in remote locations to access web data on servers that are closer to them than the original web server is. An example is shown in Figure 2.20.
Figure 2.20 A content delivery network [diagram: an origin web server and edge servers located in several PoPs]

In Figure 2.20, you can see that the company’s web server is based in the eastern United States. In CDN terms, this server is the origin because it’s the primary content server. If users across the globe needed to access this one server, there could be some potential issues. Latency (delay) can be a problem, because users in Asia or Africa are quite some distance away. The number of users can also be a problem, because more users equates to slower response times.

The CDN creates a point of presence (PoP) in each remote location. Within each PoP will be one or more edge servers. The job of the edge server is to cache the content of the origin and serve it to users who are located nearby. If everything is configured properly, the edge server will have the same information that the origin does. The cached content is typically a website but can include data such as plain text, images, videos, audio, PDFs, or scripting files for programmers.

From the end user’s standpoint, the location of the server is transparent. All they have to do is open their browser and type in the website’s address. They won’t know where the server that responds is located, nor does it really matter as long as they get their content and get it quickly.

It’s estimated that more than 75 percent of large multinational corporations use CDNs to distribute web content. Some key benefits of using a CDN are as follows:

  Increased website performance via faster load times
  Increased reliability, thanks to greater availability and redundancy
  Decreased bandwidth costs
  Greater scalability for web resources
  Increased security

The first four benefits probably make a lot of intuitive sense, but increased security might not be so apparent.
Take the instance of a distributed denial-of-service (DDoS) attack, where the attacker attempts to overwhelm a web server and make it unable to respond to legitimate requests. For example, assume this attack comes from Europe. If the CDN is configured properly, the attack may temporarily disable the European edge server, but it won’t affect the other locations. Users making requests of the website can be directed to other edge servers to fulfill the request.

As a reminder, the objectives you need to know from this section for the exam are the following:

Storage features
  Compression
  Deduplication
  Capacity on demand
Storage characteristics
  Performance
  Hot vs. cold
Storage types
  Object storage
  File storage
  Block storage
Software-defined storage
Content delivery network

If you don’t feel comfortable with all of these concepts, go back through them before answering the end-of-chapter review questions!

Summary

This chapter covered two major cloud concepts: networking and storage. First, we started with a quick networking primer to make sure that the other cloud networking concepts make sense within a framework. Once you learned more than you might have wanted to about TCP/IP, we looked at the ways to connect to a cloud. Common approaches for single clients include HTTPS, RDP, SSH, and a VPN. For multiple clients or an entire site, VPNs and direct connect are the two connectivity types to remember.

We finished our discussion of cloud networking by reviewing four services that you can get from cloud service providers. The first is software-defined networking. It’s a service where the physical networking hardware is abstracted and logically controlled through a single interface. It’s great for network flexibility and scalability. Load balancing spreads out the work that servers need to do to efficiently increase network performance. DNS resolves host names to IP addresses.
Without it, the Internet might kind of suck, or at least it would be very different from what we’re used to today. The final service was a firewall, which is a security device for computers or a network.

To kick off our discussion of cloud storage technologies, we first looked at how cloud storage works. One of the primary technologies behind cloud storage is software-defined storage. After that, we talked about some popular cloud storage providers and gave examples of different plans. The main part of our cloud storage discussion covered characteristics, types, and features. The characteristics are hot versus cold storage and the performance profile of each. The storage types we discussed are file, block, and object storage. Features you need to know are compression, deduplication, and capacity on demand. We ended the chapter by talking about how a content delivery network can help load balance web services.

Exam Essentials

Know how to connect to a cloud with HTTPS. Hypertext Transfer Protocol Secure is used by a web browser such as Google Chrome, Microsoft Edge, or Apple Safari to connect to a URL for the cloud resource.

Know what RDP is used for. Remote Desktop Protocol is used to connect to a remote Windows-based server.

Know what SSH is used for. Secure Shell is used to connect to a remote Linux-based server.

Understand how VPNs are used. A virtual private network is a secure tunnel through an unsecured network, such as the Internet. VPNs can be used to connect an individual user to a cloud server, and they can also be used to connect one corporate site to another.

Know when a direct connect should be used. If a company needs to make extensive use of cloud services and have very little delay, a direct connect between that company’s network and a CSP’s network might be the best option.

Know which connectivity methods are secure. HTTPS, RDP, SSH, and VPNs all have built-in encryption.
A direct connection should be secure as well, provided it’s configured properly.

Understand what SDN is. In software-defined networking, the physical routing of packets is separated from the logical control of where the packets should be routed. SDN provides flexibility and scalability for corporate networks in the cloud.

Know what a load balancer does. Load balancing uses a set of servers configured to perform the same task. It can speed up network performance by sending tasks to the server that has the most capacity.

Understand what DNS does. Domain Name System resolves host names to IP addresses.

Understand what a firewall does. A firewall is a security device that protects computers or networks from malicious network traffic.

Know what file compression is. File compression makes files smaller to take up less room when they are stored. It works by looking for repeated words or phrases within a file and replacing them with a shortened version.

Understand the difference between file compression and deduplication. File compression works within a file. Deduplication works between files. For example, if there are 10 copies of a Word document, a deduplicated file system will store only one copy of it and replace the others with pointers to the original file.

Know what capacity on demand is. With capacity on demand, clients can get more storage space from their CSP nearly instantly.

Understand hot versus cold storage and the performance implications of each. Hot storage is online and always accessible. Cold storage may be offline or on tapes. Hot storage is more expensive per gigabyte, but it performs much faster.

Understand when to use file, block, or object storage. File storage is used on personal computers and many servers and works best for smaller storage solutions. Block storage is best suited for databases and large-scale, frequently accessed storage solutions such as those found in a SAN.
Object storage is great for unstructured (“big”) data and for large computer or network data backups. Know what SDS is. With software-defined storage, the physical storage medium is abstracted from the user and controlled by a central server. This allows storage volumes to span physical devices or many storage volumes to be held within one physical device. Understand what a CDN is. A content delivery network is a load balancer for web-based content. Edge servers in remote locations service users who are in closer proximity to them than they are to the original web server.
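To make the edge server idea more concrete, here is a minimal Python sketch of an edge server that caches content from the origin and serves nearby users. It is a simplification under stated assumptions: the class and function names (`Origin`, `EdgeServer`, `route`) and the region labels are hypothetical, and routing by a simple region lookup is a stand-in for the more sophisticated methods a real CDN would use.

```python
class Origin:
    """The primary content server (a hypothetical stand-in for a real web server)."""

    def __init__(self, content):
        self.content = dict(content)
        self.requests_served = 0  # how many times the origin itself had to answer

    def fetch(self, path):
        self.requests_served += 1
        return self.content[path]


class EdgeServer:
    """Caches origin content inside one point of presence (PoP)."""

    def __init__(self, region, origin):
        self.region = region
        self.origin = origin
        self.cache = {}

    def get(self, path):
        # On a cache miss, pull the content from the origin once and keep it.
        if path not in self.cache:
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


def route(user_region, edges, default_region):
    """Pick the edge server in the user's region, or fall back to a default one.

    Assumption: real CDNs choose a server in more sophisticated ways.
    """
    return edges.get(user_region, edges[default_region])


origin = Origin({"/index.html": "<h1>Hello</h1>"})
edges = {region: EdgeServer(region, origin) for region in ("europe", "asia")}

first = route("asia", edges, "europe").get("/index.html")   # miss: goes to origin
second = route("asia", edges, "europe").get("/index.html")  # hit: served from cache
```

After these two requests, the origin has been contacted only once; the second request is answered from the edge server's cache, which is where the performance and bandwidth benefits come from.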