Practical Cloud Security (2023, 2nd Edition)


Second Edition

Practical Cloud Security
A Guide for Secure Design and Deployment
Chris Dotson

With rapidly changing architecture and API-driven automation, cloud platforms come with unique security challenges and opportunities. In this updated second edition, you’ll examine security best practices for multivendor cloud environments, whether your company plans to move legacy on-premises projects to the cloud or build a new infrastructure from the ground up.

Developers, IT architects, and security professionals will learn cloud-specific techniques for securing popular cloud platforms such as Amazon Web Services, Microsoft Azure, and IBM Cloud. IBM Distinguished Engineer Chris Dotson shows you how to establish data asset management, identity and access management (IAM), vulnerability management, network security, and incident response in your cloud environment.

• Learn the latest threats and challenges in the cloud security space
• Manage cloud providers that store or process data or deliver administrative control
• Learn how standard principles and concepts, such as least privilege and defense in depth, apply in the cloud
• Understand the critical role played by IAM in the cloud
• Use best tactics for detecting, responding to, and recovering from the most common security incidents
• Manage various types of vulnerabilities, especially those common in multicloud or hybrid cloud architectures
• Examine privileged access management in cloud environments

“In Practical Cloud Security, Chris Dotson expertly navigates the complex world of shared responsibilities in cloud systems, particularly as they pertain to sensitive sectors like healthcare. Using a straightforward yet thorough approach, this second edition offers essential strategies for protecting data and applications in the cloud. It is a must-read for students and professionals aiming to strengthen their skills in cloud security.”
Amir Bahmani, PhD, Stanford lecturer and Director of Stanford Deep Data Research Center

Chris Dotson is an IBM Distinguished Engineer and an executive security architect in the IBM CIO organization. He has 11 professional certifications, including the Open Group Distinguished IT Architect certification, and over 25 years of experience in the IT industry.

CLOUD COMPUTING
US $49.99 CAN $62.99
ISBN: 978-1-098-14817-1

Practical Cloud Security
by Chris Dotson

Copyright © 2024 Chris Dotson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Acquisitions Editor: Megan Laddusaw
Indexer: WordCo Indexing Services, Inc.
Development Editor: Rita Fernando
Interior Designer: David Futato
Production Editor: Clare Laylock
Cover Designer: Karen Montgomery
Copyeditor: Liz Wheeler
Illustrator: Kate Dullea
Proofreader: Rachel Head

March 2019: First Edition
October 2023: Second Edition

Revision History for the Second Edition
2023-10-06: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098148171 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Practical Cloud Security, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-14817-1
[LSI]

Table of Contents

Preface
1. Principles and Concepts
  Least Privilege
  Defense in Depth
  Zero Trust
  Threat Actors, Diagrams, and Trust Boundaries
  Cloud Service Delivery Models
  The Cloud Shared Responsibility Model
  Risk Management
  Conclusion
  Exercises
2. Data Asset Management and Protection
  Data Identification and Classification
  Example Data Classification Levels
  Relevant Industry or Regulatory Requirements
  Data Asset Management in the Cloud
  Tagging Cloud Resources
  Protecting Data in the Cloud
  Tokenization
  Encryption
  Conclusion
  Exercises
3. Cloud Asset Management and Protection
  Differences from Traditional IT
  Types of Cloud Assets
  Compute Assets
  Storage Assets
  Network Assets
  Asset Management Pipeline
  Procurement Leaks
  Processing Leaks
  Tooling Leaks
  Findings Leaks
  Tagging Cloud Assets
  Conclusion
  Exercises
4. Identity and Access Management
  Differences from Traditional IT
  Life Cycle for Identity and Access
  Request
  Approve
  Create, Delete, Grant, or Revoke
  Authentication
  Cloud IAM Identities
  Business-to-Consumer and Business-to-Employee
  Multi-Factor Authentication
  Passwords, Passphrases, and API Keys
  Shared IDs
  Federated Identity
  Single Sign-On
  Instance Metadata and Identity Documents
  Secrets Management
  Authorization
  Centralized Authorization
  Roles
  Revalidate
  Putting It All Together in the Sample Application
  Conclusion
  Exercises
5. Vulnerability Management
  Differences from Traditional IT
  Vulnerable Areas
  Data Access
  Application
  Middleware
  Operating System
  Network
  Virtualized Infrastructure
  Physical Infrastructure
  Finding and Fixing Vulnerabilities
  Network Vulnerability Scanners
  Agentless Scanners and Configuration Management Systems
  Agent-Based Scanners and Configuration Management Systems
  Cloud Workload Protection Platforms
  Container Scanners
  Dynamic Application Scanners (DAST)
  Static Application Scanners (SAST)
  Software Composition Analysis Tools (SCA)
  Interactive Application Scanners (IAST)
  Runtime Application Self-Protection Scanners (RASP)
  Manual Code Reviews
  Penetration Tests
  User Reports
  Example Tools for Vulnerability and Configuration Management
  Risk Management Processes
  Vulnerability Management Metrics
  Tool Coverage
  Mean Time to Remediate
  Systems/Applications with Open Vulnerabilities
  Percentage of False Positives
  Percentage of False Negatives
  Vulnerability Recurrence Rate
  Change Management
  Putting It All Together in the Sample Application
  Conclusion
  Exercises
6. Network Security
  Differences from Traditional IT
  Concepts and Definitions
  Zero Trust Networking
  Allowlists and Denylists
  DMZs
  Proxies
  Software-Defined Networking
  Network Functions Virtualization
  Overlay Networks and Encapsulation
  Virtual Private Clouds
  Network Address Translation
  IPv6
  Network Defense in Action in the Sample Application
  Encryption in Motion
  Firewalls and Network Segmentation
  Allowing Administrative Access
  Network Defense Tools
  Egress Filtering
  Data Loss Prevention
  Conclusion
  Exercises
7. Detecting, Responding to, and Recovering from Security Incidents
  Differences from Traditional IT
  What to Watch
  Privileged User Access
  Logs from Defensive Tooling
  Cloud Service Logs and Metrics
  Operating System Logs and Metrics
  Middleware Logs
  Secrets Server
  Your Application
  How to Watch
  Aggregation and Retention
  Parsing Logs
  Searching and Correlation
  Alerting and Automated Response
  Security Information and Event Managers
  Threat Hunting
  Preparing for an Incident
  Team
  Plans
  Tools
  Responding to an Incident
  Cyber Kill Chains and MITRE ATT&CK
  The OODA Loop
  Cloud Forensics
  Blocking Unauthorized Access
  Stopping Data Exfiltration and Command and Control
  Recovery
  Redeploying IT Systems
  Notifications
  Lessons Learned
  Example Metrics
  Example Tools for Detection, Response, and Recovery
  Detection and Response in a Sample Application
  Monitoring the Protective Systems
  Monitoring the Application
  Monitoring the Administrators
  Understanding the Auditing Infrastructure
  Conclusion
  Exercises
Appendix. Exercise Solutions
Index

Preface

As the title states, this book is a practical guide to securing your cloud environments. In almost all organizations, security has to fight for time and funding, and it often takes a back seat to implementing features and functions. Focusing on the “best bang for the buck,” security-wise, is important. This book is intended to help you get the most important security controls for your most important assets in place quickly and correctly, whether you’re a security professional who is somewhat new to the cloud, or an architect or developer with security responsibilities. From that solid base, you can continue to build and mature your controls.
While many of the security controls and principles are similar in cloud and on-premises environments, there are some important practical differences. For that reason, a few of the recommendations for practical cloud security may be surprising to those with an on-premises security background. While there are certainly legitimate differences of opinion among security professionals in almost any area of information security, the recommendations in this book stem from years of experience in securing cloud environments, and they are informed by some of the latest developments in cloud computing offerings.

This is primarily a book about security, not compliance. That said, if you need to meet specific compliance requirements, such as PCI DSS, HIPAA, or FedRAMP, you will find some limited guidance on designing your security controls so that you will be able to do so.

Who Should Read This Book

This book is designed as an intermediate-level resource and is intended primarily for two types of practitioners:

• Those who have some experience with securing on-premises environments, but little or no experience with cloud environments
• Those who have experience building cloud environments, but little or no experience with securing those cloud environments

The goal of this book is to provide a conceptual-level understanding of the “art of the possible” in cloud security. You won’t find a cookbook-style guide on exactly how to implement various controls in specific cloud environments, for a few reasons. One is that such guides tend to become out of date very quickly, because cloud providers are constantly improving their implementations. Another is that the cloud providers generally do a better job of providing explicit how-to guides than I can, because the implementations are specific to the way they’ve designed their services. A detailed how-to guide by one cloud provider will be more useful than a generic how-to that tries to cover multiple cloud providers.
What I try to provide is the understanding of when you need to find such a guide and use it.

Navigating This Book

The first three chapters deal with understanding your responsibilities in the cloud and how they differ from those in on-premises environments, as well as understanding what assets you have, what the most likely threats to those assets are, and some protections for them.

Chapters 4 through 6 provide practical guidance, in priority order, on the most important security controls that you should consider first:

• Identity and access management
• Vulnerability management
• Network controls

The final chapter deals with how to detect when something’s wrong and deal with it. It’s a good idea to read this chapter before something actually goes wrong!

What’s New in the Second Edition

This new edition has been updated based on developments in the cloud computing and security industries in the years since the release of the first edition. Some examples are:

• More information on zero trust principles as they apply to protecting cloud environments
• Advancements in encryption techniques, such as quantum-resistant encryption algorithms
• Advancements in authentication techniques, such as passwordless technologies and passkeys
• The use of privileged access management tools to protect cloud environments
• Verification of workload identities in addition to human identities
• The importance of protecting software supply chains, including build and deployment environments in the cloud, with transparency through a Software Bill of Materials (SBOM)
• Updates based on changes to offerings by major cloud providers since the previous publication
• Updated examples of the different types of defensive tools and technologies available today

In addition, you can now check your newfound understanding of cloud security concepts as you read. I have added some questions and exercises to the end of each chapter, and the answers are in the Appendix.
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
  Shows commands or other text that should be typed literally by the user.
Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

O’Reilly Online Learning Platform

For almost 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-829-7019 (international or local)
707-829-0104 (fax)
[email protected]
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/PracticalCloudSecurity2e.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Follow us on Twitter: https://twitter.com/oreillymedia.

Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

This book would not have happened without the encouragement and support of my wonderful wife, Tabitha Dotson, who told me that I couldn’t pass up this opportunity and juggled schedules and obligations for over a year to make it happen. I’d also like to thank my children, Samantha (for her extensive knowledge of Greek mythology) and Molly (for constantly challenging assumptions and thinking outside the box).

It takes many people besides the author to bring a book to publication, and I didn’t fully appreciate this before writing one. I’d like to thank my first edition editors, Andy Oram and Courtney Allen; my second edition editors, Rita Fernando and Megan Laddusaw; my first edition reviewers, Hans Donker, Darren Day, and Edgar Ter Danielyan; my second edition reviewers, Lee Atchison, Karan Dwivedi, and Akhil Behl; and the rest of the wonderful team at O’Reilly who have guided and supported me through this.

Finally, I’d like to thank all of my friends, family, colleagues, and mentors over the years who have answered questions, bounced around ideas, listened to bad puns, laughed at my mistakes, and actually taught me most of the content in this book.

CHAPTER 1
Principles and Concepts

Yes, this is a practical guide, but we do need to cover a few cloud-relevant security principles and concepts at a high level before we dive into the practical bits. If you’re a seasoned security professional, but new to the cloud, you may want to skim down to “The Cloud Shared Responsibility Model” on page 8.

The reason for covering these principles and concepts first is that they are used implicitly throughout the rest of the book when I discuss designing and implementing security controls to stop attackers. Conceptual gaps and misunderstandings in security can cause lots of issues.
For example:

• If you’re not familiar with least privilege, you may understand authorization for cloud services well, but still grant too much access to people or automation in your cloud account or on a cloud database with sensitive information.
• If you’re not familiar with defense in depth, then having multiple layers of authentication, network access control, or encryption may not seem useful.
• If you don’t know a little about threat modeling (the likely motivations of attackers, and the trust boundaries of the system that you’re designing), you may be spending time and effort protecting the wrong things.
• If you don’t understand the cloud service delivery models and the shared responsibility model, you may spend time worrying about risks that are your cloud provider’s responsibility and miss risks that are your responsibility to address.
• If you don’t know a little about risk management, you may spend too much time and effort on low risks rather than managing your higher risks.

I’ll cover this foundational information quickly so that we can get to cloud security controls.

Least Privilege

The principle of least privilege simply states that people or automated tools should be able to access only what they need to do their jobs, and no more. It’s easy to forget the automation part of this; for example, a component accessing a database should not use credentials that allow write access to the database if write access isn’t needed.

A practical application of least privilege often means that your access policies are deny by default. That is, users are granted no (or very few) privileges by default, and they need to go through the request and approval process for any privileges they require.

For cloud environments, some of your administrators will need to have access to the cloud console, a web page that allows you to create, modify, and destroy cloud assets such as virtual machines.
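To make “deny by default” concrete, here is a minimal Python sketch of an in-memory authorization check. It is only an illustration of the principle, not how any cloud provider’s IAM service works: the `Policy` and `Grant` classes, and names such as `reporting-service`, `db:read`, and `orders-db`, are hypothetical. Only explicitly granted (principal, action, resource) combinations are allowed, so an automated component that needs only to read a database never receives write access by accident.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One explicitly approved permission (hypothetical model)."""
    principal: str  # a person or an automated component
    action: str     # illustrative action name, e.g. "db:read"
    resource: str   # illustrative resource name, e.g. "orders-db"


@dataclass
class Policy:
    """Deny-by-default policy: anything not explicitly granted is refused."""
    grants: set[Grant] = field(default_factory=set)

    def grant(self, principal: str, action: str, resource: str) -> None:
        # In practice, a grant would be added only after a request/approval step.
        self.grants.add(Grant(principal, action, resource))

    def revoke(self, principal: str, action: str, resource: str) -> None:
        # discard() does nothing if the grant was never present.
        self.grants.discard(Grant(principal, action, resource))

    def is_allowed(self, principal: str, action: str, resource: str) -> bool:
        # No matching grant means no access; there is no "allow all" fallback.
        return Grant(principal, action, resource) in self.grants


if __name__ == "__main__":
    policy = Policy()
    # The reporting component only needs to read the database, so that is all it gets.
    policy.grant("reporting-service", "db:read", "orders-db")
    print(policy.is_allowed("reporting-service", "db:read", "orders-db"))   # True
    print(policy.is_allowed("reporting-service", "db:write", "orders-db"))  # False
```

In a real cloud environment, you would express the same idea in your provider’s IAM policy language rather than in application code; the point of the sketch is simply that the absence of a grant means “no.”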
With many providers, anyone with access to your cloud console will have godlike privileges by default for everything managed by that cloud provider. This might include the ability to read, modify, or destroy data from any part of the cloud environment, regardless of what controls are in place on the operating systems of the provisioned systems. For this reason, you need to tightly control access to and privileges on the cloud console, much as you tightly control physical data center access in on-premises environments, and record what these users are doing.

Defense in Depth

Many of the controls in this book, if implemented perfectly, would negate the need for other controls. Defense in depth is an acknowledgment that almost any security control can fail, either because an attacker is sufficiently determined and skilled or because of a problem with the way that security control is implemented. With defense in depth, you create multiple layers of overlapping security controls so that if one fails, the one behind it can still catch the attackers.

You can certainly go to silly extremes with defense in depth, which is why it’s important to understand the threats you’re likely to face. However, as a general rule, you should be able to point to any single security control you have and say, “What if this fails?” If the answer is unacceptable, you probably have insufficient defense in depth. You may also have insufficient defense in depth if a single failure can make several of your security controls ineffective, such as an inventory issue that causes multiple tools to miss a problem.

Zero Trust

Many products and services today claim to be zero trust, or to support zero trust principles. The name is confusing, because zero trust does not mean a complete lack of trust in anything, and the confusion is worse because it’s used for so many different marketing purposes.
There are many different definitions and different ideas about what is meant by zero trust. We are probably stuck with the term at this point, but “zero trust” should really be called something else, such as “zero implicit trust” or “zero assumed trust without a good reason.”[1] The core principle is that trust in a user or another system should be earned, rather than given simply because the user is able to reach you on the network, has a company-owned device, or meets some other criterion that’s not well controlled.

The implementation of zero trust will differ widely depending on whether you’re talking about trusting devices, network connections, or something else. One commonly used implementation of zero trust is requiring encryption and authentication for all connections, even ones that originate and terminate in supposedly trusted networks. This was always a good idea, but it’s even more important in cloud environments where the perimeter is less strictly designed and internet connectivity is easy.

Another common implementation of zero trust principles is limiting users’ network access to only the applications that they need, challenging the implicit trust that all users should be able to connect to all applications, even if they cannot log in. If you think this sounds a lot like least privilege and defense in depth, you’re right. There is considerable overlap between zero trust principles and some of the other principles in this chapter.

A third example of zero trust is the use of multi-factor authentication of users, with reauthentication required either periodically or when higher-risk transactions are requested. In this case, we’re challenging the implicit trust that whoever has the password for an account, or controls a particular session for an application, is the intended user.
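The third example, reauthentication either periodically or when higher-risk transactions are requested, can be sketched as a small policy check. This is a hypothetical Python illustration: the thresholds, the action names, and the `needs_reauthentication` function are assumptions chosen for the example, not values taken from any standard or provider.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical policy values for illustration; choose your own based on risk.
MAX_SESSION_AGE = timedelta(hours=8)               # periodic reauthentication
MAX_AUTH_AGE_FOR_HIGH_RISK = timedelta(minutes=5)  # step-up window for risky actions
HIGH_RISK_ACTIONS = {"change_payout_account", "export_all_records", "delete_account"}


def needs_reauthentication(last_mfa_at: datetime, action: str,
                           now: datetime | None = None) -> bool:
    """Return True if the user must complete MFA again before `action` proceeds.

    Holding a valid session is not treated as proof that the intended user is
    still in control: trust expires with time, and it expires sooner for
    higher-risk operations.
    """
    now = now or datetime.now(timezone.utc)
    age = now - last_mfa_at
    if age > MAX_SESSION_AGE:
        return True  # the session is too old for any action
    if action in HIGH_RISK_ACTIONS and age > MAX_AUTH_AGE_FOR_HIGH_RISK:
        return True  # risky action: require a fresh authentication
    return False
```

Under these assumed thresholds, a user who completed MFA an hour ago can keep viewing orders, but an attempt to export all records ten minutes after login would trigger a fresh MFA prompt.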
When following zero trust principles, you should only trust an interaction if you have strong evidence that the trust is warranted, such as by proof of strong authentication, or authorization, or correct configuration. That evidence should either be from something you directly control (such as your own authentication system or device management system), or from some third party that you have explicitly evaluated as competent to make trust decisions for you. Like other principles in this chapter, zero trust can be disruptive to the user experience if taken to extremes.

[1] If you’re expecting tips on how to pick catchy marketing names, you’re probably reading the wrong book!

Threat Actors, Diagrams, and Trust Boundaries

There are different ways to think about your risks, but I typically favor an asset-oriented approach. This means that you concentrate first on what you need to protect, which is why I dig into data assets first, in Chapter 2.

It’s also a good idea to keep in mind who is most likely to cause you problems. In cybersecurity parlance, these are your potential “threat actors.” For example, you may not need to guard against a well-funded state actor, but you might be in a business where a cyber-criminal can make money by stealing your data, or where a “hacktivist” might want to deface your website for political or social reasons. Keep these people in mind when designing all of your defenses.
While there is plenty of information and discussion available on the subject of threat actors, motivations, and methods,[2] in this book we’ll consider four main types of threat actors that you may need to worry about:

• Organized crime or independent criminals, interested primarily in making money
• Hacktivists, interested primarily in discrediting you by releasing stolen data, committing acts of vandalism, or disrupting your business
• Inside attackers, usually interested in discrediting you or making money
• State actors, who may be interested in stealing secrets or disrupting your business to advance a foreign government’s political mission or cause

To borrow a technique from the world of user experience design, you may want to imagine a member of each applicable group, give them a name, jot down a little about that “persona” on a card, and keep the cards visible when designing your defenses.

The second thing you have to do is figure out what needs to talk to what in your application, and the easiest way to do that is to draw a picture and figure out where your weak spots are likely to be. There are entire books on how to do this,[3] but you don’t need to be an expert to draw something useful enough to help you make decisions. However, if you are in a high-risk environment, you should probably create formal diagrams with a suitable tool rather than draw stick figures.

[2] The Verizon Data Breach Investigations Report is an excellent free resource for understanding different types of successful attacks, organized by industry and methods, and the executive summary is very readable.

[3] I recommend Threat Modeling: Designing for Security, by Adam Shostack (Wiley, 2014).

Although there are many different application architectures, for the sample application used for illustration here, I will show a simple three-tier design. Here is what I recommend for a very simple application component diagram:

1. Draw a stick figure and label it “user.” Draw another stick figure and label it “administrator” (Figure 1-1). You may find later that you have multiple types of users and administrators, or other roles, but this is a good start.

Figure 1-1. User and administrator roles

2. Draw a box for the first component the user talks to (for example, the web servers), draw a line from the user to that first component, and label the line with how the user talks to that component (Figure 1-2). Note that at this point, the component may be a serverless function, a container, a virtual machine, or something else. Because anyone can talk to this component, it will probably be the first thing attacked. We really don’t want the other components trusting this one more than necessary.

Figure 1-2. First component

3. Draw other boxes behind the first for all of the other components that the first system has to talk to, and draw lines going to those (Figure 1-3). Whenever you get to a system that actually stores data, draw a little symbol (I use a cylinder) next to it and jot down what data is there. Keep going until you can’t think of any more boxes to draw for your application.

Figure 1-3. Additional components

4. Now draw how the administrator (and any other roles you’ve defined) accesses the application. Note that the administrator may have several different ways of talking to this application; for example, via the cloud provider’s portal or APIs, or through the operating system, or in a manner similar to how a user accesses it (Figure 1-4).

Figure 1-4. Administrator access

5. Draw some trust boundaries as dotted lines around the boxes (Figure 1-5). A trust boundary means that anything inside that boundary can be at least somewhat confident of the motives of anything else inside that boundary, but requires verification before trusting anything outside of the boundary.
The idea is that if an attacker gets into one part of the trust boundary, it’s reasonable to assume they’ll eventually have complete control over everything in it, so getting through each trust boundary should take some effort. Note that I drew multiple web servers inside the same trust boundary; that means it’s okay for these web servers to trust each other, and if someone has access to one, they effectively have access to all. Or, to put it another way, if someone compromises one of these web servers, no further damage will be done by having them all compromised. In this context, zero trust principles lead us to reduce these trust boundaries to the smallest reasonable size: for example, a single component, which here might be an individual server or a cluster of servers with the same data and purpose.

Figure 1-5. Component trust boundaries

6. To some extent, we trust our entire system more than the rest of the world, so draw a dotted line around all of the boxes, including the admin, but not the user (Figure 1-6). Note that if you have multiple admins, like a web server admin and a database admin, they might be in different trust boundaries. The fact that there are trust boundaries inside of trust boundaries shows the different levels of trust. For example, the servers here may be willing to accept network connections from servers in other trust boundaries inside the application, but still verify their identities. On the other hand, they may not be willing to accept connections from systems outside of the whole application trust boundary.

Figure 1-6. Whole application trust boundary

We’ll use this diagram of an example application throughout the book when discussing the shared responsibility model, asset inventory, controls, and monitoring. Right now, there are no cloud-specific controls shown in the diagram, but that will change as we progress through the chapters. Look at any place a line crosses a trust boundary.
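One way to make that review repeatable is to record the diagram as data and let a script list the crossings. The minimal Python sketch below assumes a hypothetical three-tier layout: the component names, boundary names, and protocols are illustrative, not taken from the book’s figures. It flags every flow whose source and destination sit in different trust boundaries, and it models only one level of boundaries, not the nested whole-application boundary from step 6.

```python
from __future__ import annotations

from typing import NamedTuple


class Flow(NamedTuple):
    source: str
    destination: str
    protocol: str


# Hypothetical three-tier sample: each component is mapped to one trust boundary.
# Both web servers share a boundary, so traffic between them is not a crossing.
TRUST_BOUNDARY = {
    "user": "internet",
    "administrator": "admin",
    "web-1": "web-tier",
    "web-2": "web-tier",
    "app": "app-tier",
    "db": "data-tier",
}

FLOWS = [
    Flow("user", "web-1", "HTTPS"),
    Flow("web-1", "web-2", "session replication"),
    Flow("web-1", "app", "HTTPS"),
    Flow("app", "db", "SQL over TLS"),
    Flow("administrator", "app", "SSH"),
]


def boundary_crossings(flows: list[Flow], boundary: dict[str, str]) -> list[Flow]:
    """Return the flows whose endpoints are in different trust boundaries."""
    return [f for f in flows if boundary[f.source] != boundary[f.destination]]


# Print each crossing as a candidate for securing first.
for flow in boundary_crossings(FLOWS, TRUST_BOUNDARY):
    print(f"{flow.source} -> {flow.destination} ({flow.protocol})")
```

Keeping the diagram as data like this makes it easy to re-run the check whenever a component or connection is added.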
These are the places we need to focus on securing first!

Cloud Service Delivery Models

There is an unwritten law that no book on cloud computing is complete without an overview of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Rather than give the standard overview, I’d like to quickly say that IaaS services typically allow you to create virtual computers, storage, and networks; PaaS services are typically higher-level services, such as databases, that enable you to build applications; and SaaS services are applications used by end users. You can find many expanded definitions and subdivisions of these categories, but these are the core definitions.

These service models are useful only for a general understanding of concepts; in particular, the line between IaaS and PaaS is becoming increasingly blurred. Is a content delivery network service that caches information for you around the internet to keep it close to users a PaaS or an IaaS? It doesn’t really matter. What’s important is that you understand what is (and isn’t!) provided by the service, not whether it fits neatly into any particular category.

The Cloud Shared Responsibility Model

The most basic security question you must answer is, “What aspects of security am I responsible for?” This is often answered implicitly in an on-premises environment. The development organization is responsible for code errors, and the operations organization (IT) is responsible for everything else. Many organizations now run a DevOps model where those responsibilities are shared, and team boundaries between development and operations are blurred or nonexistent. Regardless of how it’s organized, almost all security responsibility is inside the company.
Perhaps one of the most jarring changes when moving from an on-premises environment to a cloud environment is a more complicated shared responsibility model for security. In an on-premises environment, you may have had some sort of internal document of understanding or contract with IT or some other department that ran servers for you. However, in many cases business users of IT were used to handing the requirements or code to an internal provider and having everything else done for them, particularly in the realm of security.

Even if you’ve been operating in a cloud environment for a while, you may not have stopped to think about where the cloud provider’s responsibility ends and where yours begins. This line of demarcation is different depending on the types of cloud services you’re purchasing. Almost all cloud providers address this in some way in their documentation and training materials, but the best way to explain it is to use the analogy of eating pizza.

With Pizza as a Service,4 let’s say you’re hungry for pizza. There are a lot of choices! You could just make a pizza at home, although you’d need to have quite a few ingredients and it would take a while. You could run to the grocery store and grab a take and bake; that only requires you to have an oven and a place to eat it. You could call your favorite pizza delivery place. Or, you could just go sit down at a restaurant and order a pizza. If we draw a diagram of the various components and who’s responsible for them, we get something like Figure 1-7.

Figure 1-7. Pizza as a Service

The traditional on-premises world is like making a pizza at home. You have to buy a lot of different components and put them together yourself, but you get complete flexibility. Anchovies and cinnamon on wheat crust? If you can stomach it, you can make it.

When you use Infrastructure as a Service, though, the base layer is already done for you.
You can bake it to taste and add a salad and drinks, and you’re responsible for those things. When you move up to Platform as a Service, even more decisions are already made for you, including how your pizza is baked. (As mentioned in the previous section, sometimes it can be difficult to categorize a service as IaaS or PaaS, and they’re growing together in many cases. The exact classification isn’t important; what’s important is that you understand what the service provides and what your responsibilities are.)

4 Original concept from Albert Barron’s 2014 LinkedIn article, “Pizza as a Service”.

When you get to Software as a Service (compared to dining out in Figure 1-7), it seems like everything is done for you. It’s not, though. You still have a responsibility to eat safely, and the restaurant is not responsible if you choke on your food. In the SaaS world, this largely comes down to managing access control properly.

If we draw the diagram but focus on technology instead of pizza, it looks more like Figure 1-8.

Figure 1-8. Cloud shared responsibility model

The reality of cloud computing is unfortunately a little more complicated than eating pizza, so there are some gray areas. At the bottom of the diagram, things are concrete (often literally). The cloud provider has complete responsibility for physical infrastructure security—which often involves controls beyond what many companies can reasonably do on-premises, such as biometric access with anti-tailgating measures, security guards, slab-to-slab barriers, and similar controls to keep unauthorized personnel out of the physical facilities.

Likewise, if the provider offers virtualized environments, the virtualized infrastructure security controls keeping your virtual environment separate from other virtual environments are the provider’s responsibility.
When the Spectre and Meltdown vulnerabilities came to light in early 2018, one of the potential effects was that users in one virtual machine could read the memory of another virtual machine on the same physical computer. For IaaS customers, fixing that part of the vulnerability was the responsibility of the cloud provider—Amazon, Microsoft, Google, and IBM all had to make updates to their hypervisors, for example—but fixing the vulnerabilities within the operating system was the customer’s responsibility.

Network security is shown as a shared responsibility in the IaaS section of Figure 1-8. Why? It’s hard to show on a diagram, but there are several layers of networking, and the responsibility for each lies with a different party. The cloud provider has its own network that is its responsibility, but there is usually a virtual network on top (for example, some cloud providers offer a virtual private cloud), and it’s the customer’s responsibility to carve this into reasonable security zones and put in the proper rules for access between them. Many implementations also use overlay networks, firewalls, and transport encryption that are the customer’s responsibility. This will be discussed in depth in Chapter 6.

Operating system security is usually straightforward: it’s your responsibility if you’re using IaaS, and it’s the provider’s responsibility if you’re purchasing platform or software services. In general, if you’re purchasing those services, you have no access to the underlying operating system. (As a general rule of thumb, if you have the ability to break it, you usually have the responsibility for securing it!)

Middleware, in this context, is a generic name for software such as databases, application servers, or queuing systems. They’re in the middle between the operating system and the application—not used directly by end users, but used to develop solutions for end users.
If you’re using a PaaS, middleware security is often a shared responsibility; the provider might keep the software up to date (or make updates easily available to you), but you retain the responsibility for security-relevant settings such as encryption.

The application layer is what the end user actually uses. If you’re using SaaS, vulnerabilities at this layer (such as cross-site scripting or SQL injection) are the provider’s responsibility, but if you’re reading this book you’re probably not just using someone else’s SaaS. Even if all of the other layers have bulletproof security, a vulnerability at the application security layer can easily expose all of your information.

Finally, data access security is almost always your responsibility as a customer. If you incorrectly tell your cloud provider to allow access to specific data, such as granting incorrect object storage permissions, middleware permissions, or SaaS permissions, there’s really not much the provider can do other than try to detect the problem and warn you.

The root cause of many security incidents is an assumption that the cloud provider is handling something, when it turns out nobody was handling it. Many real-world examples of security incidents stemming from poor understanding of the shared responsibility model come from open Amazon Simple Storage Service (Amazon S3) buckets. Sure, S3 storage is secure and encrypted, but none of that helps if you don’t set your access controls properly. This misunderstanding has caused the loss of:

- Data on 198 million US voters
- Auto-tracking company records
- Wireless customer records
- Over 3 million demographic survey records
- Over 50,000 Indian citizens’ credit reports
- Over 100,000 students’ grades and personal info
- Thousands of hours of audio and video recordings that contain private conversations

There are many more examples.
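Because open buckets come from assuming someone else is handling access, it helps to check the relevant settings explicitly. The following is a minimal sketch, not a complete audit: it assumes each bucket’s settings are available as a dictionary shaped like the PublicAccessBlockConfiguration that boto3’s get_public_access_block returns (keys BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets), and it reports which of those four settings are missing or disabled. The bucket names and configurations are hypothetical, and no AWS calls are made.

```python
from __future__ import annotations

# The four S3 Block Public Access settings. When all four are True, S3
# blocks public ACLs and public bucket policies for the bucket.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_protections(config: dict | None) -> list[str]:
    """Return the Block Public Access settings that are absent or disabled.

    Assumption: `config` looks like the PublicAccessBlockConfiguration dict
    from boto3's get_public_access_block. None stands for a bucket with no
    Block Public Access configuration at all.
    """
    if config is None:
        return list(REQUIRED_SETTINGS)
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]


def audit(buckets: dict[str, dict | None]) -> dict[str, list[str]]:
    """Map each bucket to its missing settings; fully blocked buckets are omitted."""
    findings = {}
    for bucket, config in buckets.items():
        gaps = missing_protections(config)
        if gaps:
            findings[bucket] = gaps
    return findings


if __name__ == "__main__":
    # Hypothetical buckets and settings for illustration only.
    all_on = {name: True for name in REQUIRED_SETTINGS}
    sample = {
        "example-app-logs": all_on,
        "example-survey-exports": {**all_on, "BlockPublicPolicy": False,
                                   "RestrictPublicBuckets": False},
        "example-legacy-bucket": None,
    }
    for bucket, gaps in audit(sample).items():
        print(f"{bucket}: missing {', '.join(gaps)}")
```

Block Public Access is only one layer; bucket policies, ACLs, and account-level settings also matter, so an empty result from a check like this means “no obvious gap,” not proof that a bucket is private.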
Although there has been considerable progress, the shared responsibility model is often still misunderstood. Many IT decision makers still believe that public cloud providers are responsible for securing not just the cloud services they offer, but also customer applications and data in the cloud. If you read your agreement with your cloud provider, you’ll find this just isn’t true!

Risk Management

Risk management is a deep subject, with entire books written about it. If you’re really interested in a deep dive, I recommend reading The Failure of Risk Management: Why It’s Broken and How to Fix It, by Douglas W. Hubbard (Wiley, 2020), and NIST Special Publication 800-30 Rev 1. In a nutshell, humans are really bad at assessing risk and figuring out what to do about it. This section is intended to give you just the barest essentials for managing the risk of security incidents and data breaches.

At the risk of stating the obvious, a risk is something bad that could happen. In most risk management systems, the level of risk is based on a combination of how probable it is that the bad thing will happen (likelihood), and how bad the result will be if it does happen (impact). For example, something that’s very likely to happen (such as someone guessing your password of “1234”) and will be very bad if it does happen (such as you losing all of your customers’ files and paying large fines) would be a high risk. Something that’s very unlikely to happen (such as an asteroid wiping out two different regional data centers at once) but that would be very bad if it does happen (going out of business) might only be a low risk, depending on the system you use for deciding the level of risk.5

5 Risks can also interact, or aggregate. There may be two risks that each have relatively low likelihood and limited impacts, but they may be likely to occur together, and the impacts can combine to be more severe.
For example, the impact of either power line in a redundant pair going out may be negligible, but the impact of both going out may be really bad. This is often difficult to anticipate; the Atlanta airport power outage in 2017 is a good example.

In this book, I’ll talk about unknown risks (where we don’t have enough information to know what the likelihoods and impacts are) and known risks (where we at least know what we’re up against). Once you have an idea of the known risks, you can do one of four things with them:

1. Avoid the risk. In information security, this typically means you turn off the system—no more risk, but also none of the benefits you had from running the system in the first place.
2. Mitigate the risk. It’s still there, but you do additional things to lower either the likelihood that the bad thing will happen or the impact if it does happen. For example, you may choose to store less sensitive data so that if there is a breach, the impact won’t be as bad.
3. Transfer the risk. You pay someone else to manage things so that the risk is their problem. This is done a lot with the cloud, where you transfer many of the risks of managing the lower levels of the system to the cloud provider.
4. Accept the risk. After looking at the overall risk level and the benefits of continuing the activity, you may decide to write down that the risk exists, get all of your stakeholders to agree that it’s a risk, and then move on.

Any of these actions may be reasonable. However, what’s not acceptable is to either have no idea what your risks are, or to have an idea of what the risks are and accept them without weighing the consequences or getting buy-in from your stakeholders. At a minimum, you should have a list somewhere in a spreadsheet or document that details the risks you know about, the actions taken, and any approvals needed.
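The risk list described above can start very small. This is a minimal sketch under stated assumptions: it uses a 1-to-5 scale for likelihood and impact, scores each risk as their product, and maps scores to low/medium/high bands. The scale, the thresholds, and the field names are illustrative choices, not a standard; your own risk system will define its own. It also encodes the rule above that an accepted risk needs stakeholder approval on record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Treatment(Enum):
    """The four things you can do with a known risk."""
    AVOID = "avoid"
    MITIGATE = "mitigate"
    TRANSFER = "transfer"
    ACCEPT = "accept"


@dataclass
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    treatment: Treatment | None = None
    approvals: list[str] = field(default_factory=list)  # stakeholder sign-offs

    def __post_init__(self) -> None:
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # One common convention: score = likelihood x impact.
        return self.likelihood * self.impact

    @property
    def level(self) -> str:
        # Illustrative thresholds; pick bands that match your own system.
        if self.score >= 15:
            return "high"
        if self.score >= 6:
            return "medium"
        return "low"


def register_problems(register: list[Risk]) -> list[str]:
    """Flag risks with no decision, or accepted without stakeholder approval."""
    issues = []
    for risk in register:
        if risk.treatment is None:
            issues.append(f"{risk.description}: no treatment decided")
        elif risk.treatment is Treatment.ACCEPT and not risk.approvals:
            issues.append(f"{risk.description}: accepted without approval")
    return issues


if __name__ == "__main__":
    register = [
        Risk("Admin password is '1234'", likelihood=5, impact=5,
             treatment=Treatment.MITIGATE),
        Risk("Asteroid hits two regional data centers", likelihood=1, impact=5,
             treatment=Treatment.ACCEPT),
    ]
    # Review the biggest risks first.
    for risk in sorted(register, key=lambda r: r.score, reverse=True):
        print(f"{risk.level:>6} {risk.score:>2}  {risk.description}")
    print(register_problems(register))
```

With these assumed bands, the guessable password scores 25 (high) and the asteroid scores 5 (low), matching the examples above; the accepted asteroid risk is flagged until someone records an approval.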
Conclusion

Even though there are often no perfect answers in the real world, understanding some foundational concepts will help you make better choices in securing your cloud environments.

Least privilege is basically just recognizing that giving privileged access to anything or anyone is a risk, and you don’t want to take more risks than necessary. It’s an art, of course, because there are sometimes trade-offs between risk and productivity, but the general principle is good—only give the minimum amount of privilege necessary. This is often overlooked for automation, but is arguably even more important there because many real-world attacks hinge upon fooling a system or automation into taking unexpected actions.

Defense in depth is recognizing that we’re not perfect, and the systems we design will not be perfect. It’s also a nod to the basic laws of probability—if you have two independent things that both have to fail for a bad thing to happen, it’s a lot less likely to happen. If you have to flip a coin and get tails twice in a row, your chances of that are only 25%, compared to the 50% chance of getting tails on one coin flip. We aspire to have security controls that are much more effective than a coin toss, but the principle is the same. If you have two overlapping, independent controls that are 95% effective, then the combination of the two will be 99.75% effective! (Each control lets 5% of attacks through, and only 5% of that 5%, or 0.25%, gets past both.) There are diminishing returns with this approach, however, so five or six layers in the same area is probably not a good use of resources.

Threat modeling is the process of understanding who is likely to attack your system and why, and understanding the components of your system and how they work together. With those two pieces of information, you can look at your system through the eyes of potential attackers, and try to spot areas where the attackers may be able to do something undesirable.
Then, for each of those areas, you can put obstacles (or, more formally, “controls” and “mitigations”) in place to thwart the attackers. In general, the most effective places to put mitigations are on trust boundaries, which are the places where one part of your system needs to trust another part.

Understanding cloud delivery models can help you focus on the parts of the overall system that you’re responsible for, so that you don’t waste time trying to do your cloud provider’s job, and so that you don’t assume that your cloud provider is taking care of something that’s really your responsibility. While there are standardized terms for different cloud delivery models, such as IaaS, PaaS, and SaaS, some services don’t fit neatly into those buckets. They’re conceptually useful, though, and the most important thing is to understand where your provider’s responsibility ends and yours begins in the cloud shared responsibility model. In an on-premises world, the security of the entire system will often be the responsibility of a single organization within a company, whereas in cloud deployments, it’s almost always split among at least two different companies!

Finally, while humans are pretty good at assessing risk in “is this predator going to eat me?” types of situations, we’re not naturally very good at it in more abstract situations. Risk management is a discipline that makes us better at assessing risk and figuring out what to do about it. The easiest form of risk management is estimating the likelihood that something bad will happen and how bad the impact will be if it does happen, and then making decisions based on the combination of likelihood and impact. Risk management can lower our overall risk by letting us focus on the biggest risks first.

Now that we have these concepts and principles in our tool kit, let’s put them to use in protecting the data and other assets in our cloud environments.

Exercises

1. Which of these are good examples of the principle of least privilege in action? Select all that apply.
   a. Having different levels of access within an application, with users only able to access the functions that they require for their work
   b. Requiring both a password and a second factor in order to log in
   c. Giving an inventory tool read-only access rather than read/write access
   d. Use of a tool such as sudo to allow a user to only execute certain commands
2. Which of these are good examples of the principle of defense in depth? Select all that apply.
   a. Encrypting valuable data, and also keeping people from reading the encrypted data unless they need to see it
   b. Having very strict firewall controls
   c. Ensuring that your trust boundaries are well defined
   d. Having multi-factor authentication
3. What are some common motivations for threat actors? Select all that apply.
   a. Stealing money
   b. Stealing secrets
   c. Disrupting your business
   d. Embarrassing you
4. Which of these items is always the cloud provider’s responsibility?
   a. Physical infrastructure security
   b. Network security
   c. Operating system security
   d. Data access security
5. What are the most important factors in assessing how severe a risk is? Select the two that apply.
   a. The chances, or likelihood, that an event will happen
   b. How bad the impact will be if an event happens
   c. Whether or not you can transfer the risk to someone else
   d. Whether the actions causing the risk are legal or illegal

CHAPTER 2
Data Asset Management and Protection

Now that Chapter 1 has given you some idea of where your cloud provider’s responsibility ends and yours begins, your first step to securing your cloud environment is to figure out where your data is—or is going to be—and how you’re going to protect it.

There is often a lot of confusion about the term “asset management.” What exactly are our assets, and what do we need to do to manage them?
The obvious (and unhelpful) answer is that assets are anything valuable that you have. Let’s start to home in on the details.

In this book, I’ve broken up asset management into two parts: data asset management and cloud asset management. Data assets are the important information you have, such as customer names and addresses, credit card information, bank account information, or credentials to access such data. Cloud assets are the things you have that store and process your data—compute resources such as servers or containers, storage such as object stores or block storage, and platform instances such as databases or queues. Managing these assets is covered in the next chapter.

While you can start with either data assets or cloud assets, and may need to go back and forth a bit to get a full picture, I find it easier to start with data assets. The theory of managing data assets in the cloud is no different than on-premises, but in practice there are some cloud technologies that can help.

Data Identification and Classification

If you’ve created at least a “back-of-the-napkin” diagram and threat model as described in the previous chapter, you’ll have some idea of what your important data is, as well as the threat actors you have to worry about and what they might be after. Let’s look at different ways threat actors might attack your data.

One of the more popular information security models is the CIA triad: confidentiality, integrity, and availability. A threat actor trying to breach your data confidentiality wants to steal it, usually to sell it for money or embarrass you. A threat actor trying to breach your data integrity wants to change your data, such as by altering a bank balance. (Note that this can be effective even if the attacker cannot read the bank balances; I’d be happy to have my bank balance be a copy of Bill Gates’s, even if I don’t know what that value is.)
A threat actor trying to breach your data availability wants to take you offline for fun or profit, or use ransomware to encrypt your files.1

Most of us have limited resources and must prioritize our efforts.2 A data classification system can assist with this, but resist the urge to make it more complicated than absolutely necessary.

1 Ransomware is both an availability and an integrity breach, because it uses unauthorized modifications of your data in order to make it unavailable.
2 If you have unlimited resources, please contact me!

Example Data Classification Levels

Every organization is different, but the following rules provide a good, simple starting point for assessing the value of your data, and therefore the risk of having it breached:

Low or public
    While the information in this category may or may not be intended for public release, if it were released publicly the impact to the organization would be very low or negligible. Here are some examples:
    - Your servers’ public IP addresses
    - Application log data without any personal data, secrets, or value to attackers
    - Software installation materials without any secrets or other items of value to attackers

Moderate or private
    This information should not be disclosed outside of the organization without the proper nondisclosure agreements. In many cases (especially in larger organizations) this type of data should be disclosed only on a need-to-know basis within the organization. In most organizations, the majority of information will fall into this category. Here are some examples:
    - Detailed information on how your information systems are designed, which may be useful to an attacker
    - Information on your personnel, which could provide information to attackers for phishing or pretexting attacks
    - Routine financial information, such as purchase orders or travel reimbursements, which might be used, for example, to infer that an acquisition is likely

High or confidential
    This information is vital to the organization, and disclosure could cause significant harm. Access to this data should be very tightly controlled, with multiple safeguards. In some organizations, this type of data is called the “crown jewels.” Here are some examples:
    - Information about future strategy, or financial information that would provide a significant advantage to competitors
    - Trade secrets, such as the recipe for your popular soft drink or fried chicken
    - Secrets that provide the “keys to the kingdom,” such as full access credentials to your cloud infrastructure
    - Sensitive information placed into your hands for safekeeping, such as your customers’ financial data
    - Any other information where a breach might be newsworthy

Note that laws and industry rules may effectively dictate how you classify some information. For example, the European Union’s General Data Protection Regulation (GDPR) has many different requirements for handling personal data, so with this system you might choose to classify all personal data as “moderate” risk and protect it accordingly. Payment Card Industry Data Security Standard (PCI DSS) requirements would probably dictate that you classify cardholder data as “high” risk if you have it in your environment.

Also, note that there are cloud services that can help with data classification and protection. As examples, Amazon Macie can help you find sensitive data in Amazon S3 buckets, Google Cloud Sensitive Data Protection can help you classify or mask certain types of sensitive data, and Microsoft Purview can classify data on Azure cloud services.
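Writing the classification levels down in a machine-readable form can help automated tools apply them consistently. The sketch below encodes the three example levels as an ordered enum and classifies a data set at the highest level required by any kind of data it contains. The category names are hypothetical, and the mapping (personal data as moderate under a GDPR-driven policy choice, cardholder data and cloud credentials as high) simply follows the examples above; you would replace it with your own written standard.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Example classification levels; a higher value means more sensitive."""
    LOW = 1       # low or public
    MODERATE = 2  # moderate or private
    HIGH = 3      # high or confidential


# Hypothetical data categories mapped to the minimum level each requires.
MINIMUM_LEVEL = {
    "public_ip_address": DataClass.LOW,
    "log_data_no_personal_info": DataClass.LOW,
    "system_design_details": DataClass.MODERATE,
    "personnel_information": DataClass.MODERATE,
    "personal_data": DataClass.MODERATE,      # e.g., a GDPR-driven policy choice
    "routine_financial_records": DataClass.MODERATE,
    "future_strategy": DataClass.HIGH,
    "trade_secret": DataClass.HIGH,
    "cloud_admin_credentials": DataClass.HIGH,
    "cardholder_data": DataClass.HIGH,        # e.g., driven by PCI DSS
}


def classify(categories) -> DataClass:
    """Classify a data set at the highest level any of its categories requires.

    Unknown categories raise KeyError rather than silently defaulting to LOW,
    so gaps in the written standard get noticed. An empty data set is LOW.
    """
    return max((MINIMUM_LEVEL[c] for c in categories), default=DataClass.LOW)


if __name__ == "__main__":
    customer_table = ["personal_data", "cardholder_data"]
    level = classify(customer_table)
    print(level.name)                          # HIGH
    print(f"dataclass:{level.name.lower()}")   # a tag value you might apply
```

Taking the maximum reflects the idea that a storage location must be protected at the level of the most sensitive data in it; raising on unknown categories is a deliberate choice so that new kinds of data force an update to the standard.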
Whatever data classification system you use, write down a definition of each classification level and some examples of each, and make sure that everyone generating, collecting, or protecting data understands the classification system.

Relevant Industry or Regulatory Requirements

As mentioned in the preface, this is a book on security, not compliance. As a gross overgeneralization, compliance is about proving your security to a third party—and that’s much easier to accomplish if you have actually secured your systems and data. The information in this book will help you with being secure, but there will be additional compliance work and documentation to complete after you’ve secured your systems.

That said, some compliance requirements may inform your security design. So, even at this early stage, it’s important to make note of a few industry or regulatory requirements:

EU GDPR
    This regulation may apply to the personal data of any European Union or European Economic Area citizen, regardless of where in the world the data is. The GDPR requires you to catalog, protect, and audit access to “any information relating to an identifiable person who can be directly or indirectly identified in particular by reference to an identifier.” The techniques in this chapter may help you meet some GDPR requirements, but you must make sure that you include relevant personal data as part of the data you’re protecting.

US FISMA or FedRAMP
    The Federal Information Security Management Act is applied per agency, whereas Federal Risk and Authorization Management Program certification may be used with multiple agencies, but both require you to classify your data and systems in accordance with FIPS 199 and other US government standards. If you’re in an area where you may need one of these certifications, you should use the FIPS 199 classification levels.
US ITAR
    If you are subject to International Traffic in Arms Regulations, in addition to your own controls, you will need to choose cloud services that support ITAR. Such services are available from some cloud providers and are managed only by US personnel.

Global PCI DSS
    If you’re handling credit card information, the Payment Card Industry Data Security Standard dictates that there are specific controls that you have to put in place, and there are certain types of data you’re not allowed to store.

US HIPAA
    If you’re in the US and dealing with any protected health information (PHI), the Health Insurance Portability and Accountability Act mandates that you include that information in your list and protect it, which often involves encryption.

There are many other regulatory and industry requirements around the world, such as MTCS (Singapore), G-Cloud (UK), and IRAP (Australia). If you think you may be subject to any of these, review the types of data they are designed to protect so that you can ensure that you catalog and protect that data accordingly.

Data Asset Management in the Cloud

Most of the preceding information is good general practice and not specific to cloud environments. However, cloud providers are in a unique situation to help you identify and classify your data. For starters, they will be able to tell you everywhere you are storing data, because they want to charge you for the storage!

In addition, use of cloud services brings some level of standardization by design. In many cases, your persistent data in the cloud will be in one of the cloud services that store data, such as object storage, file storage, block storage, a cloud database, or a cloud message queue, rather than being spread across thousands of different disks attached to many different physical servers.
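Because persistent data tends to land in a small number of storage service types, a first pass at a data inventory can be a simple filter over a resource list exported from your provider, for example from billing data or an asset inventory service. This is a minimal sketch under loud assumptions: the record format, the resource type names, and the sample resources are all hypothetical, since each provider uses its own resource type identifiers and export formats.

```python
from collections import defaultdict

# Hypothetical resource type names for services that hold persistent data;
# real providers each use their own identifiers.
STORAGE_TYPES = {
    "object_storage",
    "file_storage",
    "block_storage",
    "database",
    "message_queue",
}

# Hypothetical records, e.g., flattened from a billing or inventory export.
SAMPLE_RESOURCES = [
    {"id": "web-server-1", "type": "virtual_server"},
    {"id": "customer-uploads", "type": "object_storage"},
    {"id": "customers-db", "type": "database"},
    {"id": "web-server-1-boot", "type": "block_storage"},
    {"id": "static-assets", "type": "object_storage"},
]


def storage_inventory(resources):
    """Group the IDs of data-storing resources by service type.

    Compute and networking resources are skipped here; data kept locally on
    servers (such as logs) still needs a separate review.
    """
    grouped = defaultdict(list)
    for resource in resources:
        if resource["type"] in STORAGE_TYPES:
            grouped[resource["type"]].append(resource["id"])
    return {kind: sorted(ids) for kind, ids in grouped.items()}


if __name__ == "__main__":
    for kind, ids in storage_inventory(SAMPLE_RESOURCES).items():
        print(f"{kind}: {', '.join(ids)}")
```

The output is only a list of places to look; deciding what kind of data actually lives in each location still requires inspecting it or using a classification service.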
Your cloud provider gives you the tools to inventory these storage locations, as well as to access them (in a carefully controlled manner) to determine what types of data are stored there. There are also cloud services that will look at all of your storage locations and automatically attempt to classify where your important data is. You can then use this information to tag your cloud assets that store data.

When you’re identifying your important data, don’t forget about passwords, API keys, and other secrets that can be used to read or modify that data! We’ll talk about the best way to secure secrets in Chapter 4, but first you need to know exactly where they are.

If we look at our sample application that we diagrammed in Chapter 1, there’s obviously customer data in the database. However, where else do you have important assets? Here are some things to consider:

- The web servers have log data that may be used to identify your customers.
- Your web server has a private key for a Transport Layer Security (TLS) certificate; with that and a little Domain Name System (DNS) or Border Gateway Protocol (BGP) hijacking, anyone could pretend to be your site and steal your customers’ passwords (and some types of second factors) as they try to log in.
- Do you keep a list of password hashes to verify your customers? Hopefully you’re using some sort of federated ID system, as described in Chapter 4, but if not, the password hashes are a nice target for attackers.3

3 Remember LinkedIn’s 6.5 million password hashes that were stolen, cracked offline, and then used to compromise other accounts where users reused their LinkedIn password? This has happened many times, and sites like Have I Been Pwned can tell you about all of the breaches that may contain your email or password data.

- Your application server needs a password or API key to access the database.
  With this password, an attacker could read or modify everything in the database that the application can.

Even in this really simple application, there are a lot of nonobvious things you need to protect. Figure 2-1 repeats Figure 1-6 from the previous chapter, adding the data assets in the boxes.

Figure 2-1. Sample application diagram with data assets

Tagging Cloud Resources

Most cloud providers, as well as container management systems such as Kubernetes, have the concept of tags. A tag is usually a combination of a name (or “key”) and a value. These tags can be used for lots of purposes, from categorizing resources in an inventory, to making access decisions, to choosing what to alert on. For example, you might have a key of PII-data and a value of yes for anything that contains personally identifiable information, or you might use a key of datatype and a value of PII.

The problem is clear: if everyone in your organization uses different tags, they won’t be very useful! Have a policy to use tags. To support this policy, create a tag standard with a list of tags and explanations for when they must be used, use these same tags across multiple cloud providers, and require them to be applied by any automated tools that create cloud resources.

In smaller organizations, a simple tag standard will probably suffice. In larger organizations, this tag standard should probably be treated as a versioned, backward-compatible standard with an assigned owner and periodic reviews. Some tags will likely be organization-wide and some specific to subsections of the organization. Even if one of your cloud providers doesn’t explicitly support the use of tags, there are often other description fields that may be used to hold tags in easy-to-parse formats such as JSON.

Tags rarely cause any harm, so use them liberally; if you don’t need them, they’re easily ignored.
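A tag standard is easier to enforce when it is machine-readable. The sketch below represents a small, versioned standard as data and checks one resource’s tags against it, reporting missing required keys and values outside the allowed set. The keys, allowed values, and the version string are hypothetical, loosely following the data classification levels described earlier in this chapter. Keys the standard does not mention are ignored, in keeping with the advice that extra tags are rarely harmful.

```python
# Hypothetical, versioned tag standard expressed as data.
TAG_STANDARD = {
    "version": "1.0",
    "tags": {
        # key: (required?, allowed values, or None for free-form text)
        "dataclass": (True, {"low", "moderate", "high"}),
        "regulatory": (False, {"gdpr", "pci-dss", "hipaa"}),
        "owner": (True, None),
    },
}


def validate_tags(tags, standard=TAG_STANDARD):
    """Return a list of problems with one resource's tags; empty means compliant.

    Keys that the standard does not mention are ignored, since extra tags
    are rarely harmful.
    """
    problems = []
    for key, (required, allowed) in standard["tags"].items():
        if key not in tags:
            if required:
                problems.append(f"missing required tag '{key}'")
            continue
        if allowed is not None and tags[key] not in allowed:
            problems.append(f"tag '{key}' has unexpected value '{tags[key]}'")
    return problems


if __name__ == "__main__":
    # Hypothetical resources and their current tags.
    resources = {
        "customers-db": {"dataclass": "high", "regulatory": "gdpr",
                         "owner": "payments-team"},
        "static-assets": {"dataclass": "public", "owner": "web-team"},
        "scratch-bucket": {},
    }
    for name, tags in resources.items():
        for problem in validate_tags(tags):
            print(f"{name} (standard v{TAG_STANDARD['version']}): {problem}")
```

A check like this could run in the automated tools that create resources, or periodically against an inventory export, so that untagged or mistagged resources are caught early.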
Tags are free to use, so there’s rarely any technical concern with creating a lot of them, but you should be careful not to make the tag standard so complicated that it confuses the humans who have to write rules for applying and consuming tag data. In addition, cloud providers do impose some limits on how many tags a particular resource can have (usually between 15 and 64 tags per resource).

Some cloud providers even offer automation to check whether tags are properly applied to resources, so that you can catch untagged or mistagged resources early and correct them. For example, if you have a rule that every asset must be tagged with the maximum data classification allowed on that asset, then you can run automated scans to find any resources where the tag is missing or where the value isn’t one of the classification levels you have decided upon.

Table 2-1 shows the different names given to tagging by different cloud providers. Kubernetes, which may run on-premises or on any IaaS provider, uses the term “labels.”

Table 2-1. Tagging features

Infrastructure            Feature name
Amazon Web Services       Tags
Microsoft Azure           Tags
Google Cloud Platform     Labels and network tags
IBM Cloud                 Tags

We will talk more about tagging resources in Chapter 3, but for now, jot down some data-related tags that may apply to your different cloud resources, such as dataclass:low, dataclass:moderate, dataclass:high, or regulatory:gdpr.

Protecting Data in the Cloud

Several of the data protection techniques discussed in this section may also be applied on-premises, but many cloud providers give you easy, standardized, and less expensive ways to protect your data.

Tokenization

Why store the data when you can store something that functions similarly to the data, but is useless to an attacker? Tokenization, which is most often used with credit card numbers, replaces a piece of sensitive data with a token (usually randomly generated).
Tokenization has the benefit that the token generally has the same characteristics (such as being 16 digits long) as the original data, so underlying systems that are built to take that data don’t need to be modified. Only one place (a “token service”) knows the actual sensitive data. Tokenization can be used on its own or in conjunction with encryption, discussed next. Examples include cloud services that work with your browser to tokenize sensitive data before sending it, and cloud services that sit between the browser and the application to tokenize sensitive data before it reaches the application.

Protecting Data in the Cloud | 23

Encryption

Encryption is the silver bullet of the data protection world; we want to “encrypt all the things.” Unfortunately, it’s a little more complicated than that. There are three types of data you might need to encrypt:

Data in motion (being transmitted across a network)
Data in use (currently being processed in a computer’s CPU or held in RAM), whose encryption is called confidential computing
Data at rest (on persistent storage, such as a disk)

Encryption of data in motion is discussed in detail in Chapter 6. In this section, we’ll discuss the other two uses of encryption.

More bits are not always necessary or useful once you get to a certain point; encryption is often broken due to a flaw in the implementation rather than brute force. In addition, there’s often a performance trade-off with using a cipher algorithm with more bits, particularly if you use something without hardware acceleration. If you don’t want to make a deep study of it, it’s usually safe to adopt the same cipher requirements as large private and governmental organizations that have studied the subject extensively.

Confidential computing

Encryption of data in use is now available from several cloud providers, and is typically marketed to organizations with very sensitive data under the name confidential computing.
Because it changes the way the processor accesses memory, it requires support in the hardware platform, and then the feature must be exposed by the cloud provider. The most common cloud implementation is to encrypt process or virtual machine memory so that even a privileged user (or an attacker or malware impersonating a privileged user) cannot read it, and the processor can read it only when executing code for a specific process or virtual machine.[4] If you are in a very high-security environment and your threat model includes protecting data in memory from a privileged user, or you want additional isolation between you and another tenant in the cloud, you should seek out a platform that supports memory encryption. This often goes by hardware-specific brand names such as Intel SGX/TDX, AMD SEV, and IBM Z Pervasive Encryption.

Encryption of data at rest

Encryption of data at rest can be the most complicated to implement correctly. The problem is not in encrypting the data; there are many libraries to do this. The problem is that once you’ve encrypted the data, you now have an encryption key that can be used to access it. Where do many people put this key? In the clear, right next to the data! Imagine locking a door and then hanging the key on a hook next to it helpfully labeled “key.” To have real security (instead of just ticking a checkbox indicating that you’ve encrypted data), you must have proper key management. Fortunately, there are cloud services to help.

Encrypted data can’t be effectively compressed or deduplicated. If you want to make use of compression or deduplication, do that before encrypting it.

In traditional on-premises environments with high security requirements, you would purchase a hardware security module (HSM) to hold your encryption keys, usually in the form of an expansion card or a module accessed over the network.
An HSM has significant logical and physical protections against unauthorized access. With most systems, anyone with physical access can try to tamper with it, but an HSM has sensors to wipe out the data as soon as someone tries to take it apart, scan it with X-rays, fiddle with its power source, or look threateningly in its general direction. HSMs are expensive, and so are not feasible for most on-premises deployments. However, in cloud environments, advanced technologies such as HSMs and encryption key management systems are now within reach of projects with modest budgets.

Some cloud providers have an option to rent a dedicated HSM for your environment. While this may be required for the highest-security environments, a dedicated HSM is still expensive in a cloud environment, and is often harder to spin up automatically.

Another good option is a key management service (KMS), which is run by the cloud provider and usually uses an HSM on the backend to keep keys safe. A KMS is usually a multitenant service, which is a slightly larger attack surface, and you do have to trust both the HSM and the KMS (instead of just the HSM), which adds a little additional risk. However, compared to performing your own key management (often incorrectly), a KMS provides excellent security at a very low cost. You can have the benefits of proper key management in projects with more modest security budgets.

[4] Note that in-memory encryption protects data only from attacks from outside the process; if you manage to trick the process itself into doing something it shouldn’t, it can read the memory and divulge the data.

Table 2-2 lists the key management options offered by the major cloud providers, as of this writing.

Table 2-2. Key management options

Provider                 Dedicated HSM option    Key management service
Amazon Web Services      AWS CloudHSM            AWS Key Management Service (KMS)
Microsoft Azure          Azure Dedicated HSM     Azure Key Vault
Google Cloud Platform    Cloud HSM               Cloud KMS
IBM Cloud                Cloud HSM               Key Protect

So, how do you actually use a KMS correctly? This is where things get a little complicated.

Key management. The simplest approach to key management is to generate a key, encrypt the data with that key, stuff the key into the KMS, and then write the encrypted data to disk along with a note indicating which key was used to encrypt it. There are two main problems with this approach:

1. It puts a lot of load on the KMS. There are good reasons for wanting a different key for every file, so a KMS with a lot of customers would have to store billions or trillions of keys with near-instantaneous retrieval.
2. If you want to securely erase the data, you have to trust the KMS to irrevocably erase the key when you’re done with it, and not leave any backup copies lying around. Alternatively, you have to overwrite all of the encrypted data,[5] which can take a while.

You may not want to wait hours or days for your data to be overwritten. It’s better if you have the option to quickly and securely erase data objects in two ways: either by deleting a key at the KMS, which may effectively erase a lot of different objects at once; or by deleting a key where the data is actually stored, to delete a single data object.

[5] Despite the findings of a well-known USENIX paper from 1996 by Peter Gutmann exploring the ability to recover data on a hard disk that’s been overwritten, it’s not practical today. Recovering overwritten data from solid state drives (SSDs) is slightly more practical due to the way writes happen, but most SSDs have a “secure erase” feature to sanitize the entire drive; see Michael Wei et al.’s 2011 USENIX paper for more details.
For these reasons, you typically have two levels of keys: a key encryption key (KEK) and a data encryption key (DEK). As the names suggest, the key encryption key is used to encrypt (or “wrap”) data encryption keys, and the wrapped keys are stored right next to the data. The key encryption key usually stays in the KMS and never comes out, for safety. The wrapped data encryption keys are sent to the KMS (and its backing HSM) for unwrapping when needed, and then the unwrapped keys are used to encrypt or decrypt the data. You never write down the unwrapped keys. When you’re done with the current encryption or decryption operation, you forget about them.[6]

The use of keys is easier to understand with a real-world analogy. Imagine you are selling your house (which contains all of your data), and you provide a key to your realtor to unlock your door. This house key is like a data encryption key; it can be used to directly access your house (data). The realtor will place this key into a key box on your door and protect it with a code provided by the realtor service. This code is like the key encryption key, and the realtor service that hands out codes is like the key management service. In this mildly strained analogy, you actually take the key box to the KMS, and it gives you a copy of the key inside with the agreement that you won’t make a copy of it (write it to disk) and you’ll melt (forget) that copy when finished with it. You never actually see the code that opens the box. The end result is that when you walk up to the house (data), you know the data key’s right there, but it can’t be opened without another key or password.

Of course, in the real world, a hammer and a little time would get the key out of the box, or would allow you to break a window and not need the key. The cryptographic equivalent of the hammer is guessing the key or password used to protect the data key.
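A quick calculation shows why guessing a properly generated key is hopeless. The Python sketch below rests on two hypothetical assumptions, both stated in its comments: the attacker can test 10^12 keys per second, and the key is truly random rather than derived from a password. On average, exhaustive search finds the key after trying half of the key space.

```python
# Back-of-the-envelope estimate of brute-force time for a random key.
# ASSUMPTIONS (hypothetical, chosen to favor the attacker): 10**12 guesses
# per second, and the key is truly random (not derived from a password).

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def expected_years_to_brute_force(key_bits: int, guesses_per_second: float) -> float:
    """Expected time in years to find a random key by exhaustive search.

    On average the key is found after searching half the key space,
    i.e. 2**(key_bits - 1) guesses.
    """
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12  # hypothetical attacker speed, guesses per second
    for bits in (56, 128, 256):
        years = expected_years_to_brute_force(bits, rate)
        print(f"{bits:>3}-bit key: {years:.3g} years "
              f"({years / AGE_OF_UNIVERSE_YEARS:.3g} x the age of the universe)")
```

Under these assumptions a 56-bit key (the size used by DES) falls in about 10 hours, while a 128-bit key takes on the order of 10^18 years, which is also why piling on more bits is not always useful past a certain point. The estimate does not hold for password-derived keys: a dictionary attack searches only likely passwords, so the effective key space can be tiny.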
This guessing is usually done by trying all of the possibilities (brute force) or, for keys based on passwords, trying many common passwords (a dictionary attack). If the encryption algorithm and the implementation of that algorithm are correct, the expected time for the “hammer” to get into the box is considerably longer than the expected lifetime of the universe.

Server-side and client-side encryption. The great news is that you usually don’t have to do most of this key management yourself! For most cloud providers, if you’re using their storage and their KMS, and you turn on KMS encryption for your storage instances, the storage service will automatically create data encryption keys, wrap them using a key encryption key that you can manage in the KMS, and store the wrapped keys along with the data. You can still manage the keys in the KMS, but you don’t have to ask the KMS to wrap or unwrap them, and you don’t have to perform the encryption or decryption operations yourself. Some providers call this server-side encryption.

[6] This is an extremely simplified explanation. For a really deep discussion of all things cryptographic, see Bruce Schneier’s book Applied Cryptography, 2nd ed. (Wiley, 1996).

Because the multitenant storage service does have the ability to decrypt your data, an error in that storage service could potentially allow an unauthorized user to ask the storage service to decrypt your data. For this reason, having the storage service perform the encryption/decryption is not quite as secure as doing the decryption in your own instance, provided you implement it correctly using well-known libraries and processes. Doing the encryption and decryption in your own application is often called client-side encryption. However, unless you have a very low risk tolerance (and a budget to match that low risk tolerance), I recommend that you use well-tested cloud services and allow them to handle the encryption/decryption for you.
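To make the KEK/DEK mechanics concrete, here is a minimal client-side sketch in Python. It does not reproduce how any particular cloud KMS works: `ToyKMS` is a hypothetical in-memory stand-in that never releases its KEK and only wraps or unwraps DEKs, and the code assumes the widely used third-party `cryptography` package (`pip install cryptography`) for AES-GCM and AES key wrap (RFC 3394). The sketch also shows both erasure paths described earlier: deleting a key at the KMS, or deleting a single object’s wrapped key where the data is stored.

```python
# Minimal envelope-encryption sketch. ASSUMPTIONS: the third-party
# `cryptography` package is installed, and ToyKMS is a hypothetical
# in-memory stand-in for a cloud KMS (a real KMS keeps KEKs in an HSM
# and authorizes every request).
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap


class ToyKMS:
    """Holds key encryption keys (KEKs); callers never see them."""

    def __init__(self) -> None:
        self._keks: dict[str, bytes] = {}

    def create_kek(self, kek_id: str) -> None:
        self._keks[kek_id] = AESGCM.generate_key(bit_length=256)

    def wrap(self, kek_id: str, dek: bytes) -> bytes:
        return aes_key_wrap(self._keks[kek_id], dek)

    def unwrap(self, kek_id: str, wrapped_dek: bytes) -> bytes:
        return aes_key_unwrap(self._keks[kek_id], wrapped_dek)

    def delete_kek(self, kek_id: str) -> None:
        # Cryptographic erasure: every DEK wrapped with this KEK becomes
        # useless, wherever the wrapped copies are stored.
        del self._keks[kek_id]


def encrypt_object(kms: ToyKMS, kek_id: str, plaintext: bytes) -> dict:
    dek = AESGCM.generate_key(bit_length=256)  # fresh DEK for each object
    nonce = os.urandom(12)                     # 96-bit nonce for AES-GCM
    ciphertext = AESGCM(dek).encrypt(nonce, plaintext, None)
    # Only the wrapped DEK is stored next to the data. Python cannot
    # reliably wipe memory, so "forgetting" the plaintext DEK here just
    # means dropping the reference when the function returns.
    return {
        "kek_id": kek_id,
        "wrapped_dek": kms.wrap(kek_id, dek),
        "nonce": nonce,
        "ciphertext": ciphertext,
    }


def decrypt_object(kms: ToyKMS, stored: dict) -> bytes:
    dek = kms.unwrap(stored["kek_id"], stored["wrapped_dek"])
    return AESGCM(dek).decrypt(stored["nonce"], stored["ciphertext"], None)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_kek("app-kek")  # hypothetical KEK name
    record = encrypt_object(kms, "app-kek", b"customer records")
    print(decrypt_object(kms, record))  # b'customer records'

    # Erase one object: throw away its wrapped DEK.
    del record["wrapped_dek"]
    # Erase everything under the KEK: delete the KEK in the KMS.
    kms.delete_kek("app-kek")
```

In keeping with the recommendation above, treat this as a way to see the moving parts rather than a template; a provider’s server-side encryption handles the same steps for you.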
Note that when using client-side encryption, the server does not have the ability to read the encrypted data because it doesn’t have the keys. This means no server-side searches, calculations, indexing, malware scans, or other high-value tasks can be performed. Homomorphic encryption may make it feasible for operations such as addition to be performed correctly on encrypted data without decrypting the data, but as of this writing it’s too slow to be practical.

Unless you have devoted most of your distinguished career to cryptography, do not attempt to create or implement your own crypto systems. Even when performing the encryption/decryption in your own application, use only well-tested and supported library implementations of secure algorithms. If your organization doesn’t have a list of approved cryptographic algorithms, a good source for recommended algorithms is NIST SP 800-131A.

Cryptographic erasure. It’s actually difficult to reliably destroy large amounts of data.[7] It takes a long time to overwrite the data completely, and even then there may be other copies sitting around. We can solve this through cryptographic erasure. With this approach, rather than storing clear-text data on the disk, we store only an encrypted version. Then, when we want to make data unrecoverable, we can wipe or revoke access to the key encryption key in the KMS, which will make all of the data encryption keys “wrapped” with that key encryption key useless, wherever they are in the world. We can also wipe a specific piece of data by wiping out just its wrapped data encryption key, so a multiterabyte file can be effectively made unrecoverable by overwriting a 256-bit key.

[7] Although paradoxically, it’s often easy to do by accident!

How encryption foils different types of attac
