Mastering Cloud Computing: Foundations and Applications Programming
2013
Rajkumar Buyya, Christian Vecchiola, S. Thamarai Selvi
Summary
This textbook, Mastering Cloud Computing: Foundations and Applications Programming (2013), details the fundamentals of cloud computing, highlighting its principles, virtualization technologies, and reference model. It provides a framework for understanding cloud computing architectures and delves into application development and infrastructure aspects within a cloud environment.
Full Transcript
Mastering Cloud Computing
Foundations and Applications Programming

Rajkumar Buyya
The University of Melbourne and Manjrasoft Pty Ltd, Australia

Christian Vecchiola
The University of Melbourne and IBM Research, Australia

S. Thamarai Selvi
Madras Institute of Technology, Anna University, Chennai, India

AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO

Morgan Kaufmann is an imprint of Elsevier

Acquiring Editor: Todd Green
Editorial Project Manager: Lindsay Lawrence
Project Manager: Punithavathy Govindaradjane
Designer: Matthew Limbert

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2013 Elsevier Inc. All rights reserved

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-411454-8

Printed and bound in the United States of America
13 14 15 16 17 10 9 8 7 6 5 4 3 2 1

For information on all MK publications visit our website at www.mkp.com

Contents

Acknowledgments
Preface

PART 1 FOUNDATIONS

CHAPTER 1 Introduction
1.1 Cloud computing at a glance
    1.1.1 The vision of cloud computing
    1.1.2 Defining a cloud
    1.1.3 A closer look
    1.1.4 The cloud computing reference model
    1.1.5 Characteristics and benefits
    1.1.6 Challenges ahead
1.2 Historical developments
    1.2.1 Distributed systems
    1.2.2 Virtualization
    1.2.3 Web 2.0
    1.2.4 Service-oriented computing
    1.2.5 Utility-oriented computing
1.3 Building cloud computing environments
    1.3.1 Application development
    1.3.2 Infrastructure and system development
    1.3.3 Computing platforms and technologies
Summary
Review questions

CHAPTER 2 Principles of Parallel and Distributed Computing
2.1 Eras of computing
2.2 Parallel vs. distributed computing
2.3 Elements of parallel computing
    2.3.1 What is parallel processing?
    2.3.2 Hardware architectures for parallel processing
    2.3.3 Approaches to parallel programming
    2.3.4 Levels of parallelism
    2.3.5 Laws of caution
2.4 Elements of distributed computing
    2.4.1 General concepts and definitions
    2.4.2 Components of a distributed system
    2.4.3 Architectural styles for distributed computing
    2.4.4 Models for interprocess communication
2.5 Technologies for distributed computing
    2.5.1 Remote procedure call
    2.5.2 Distributed object frameworks
    2.5.3 Service-oriented computing
Summary
Review questions

CHAPTER 3 Virtualization
3.1 Introduction
3.2 Characteristics of virtualized environments
    3.2.1 Increased security
    3.2.2 Managed execution
    3.2.3 Portability
3.3 Taxonomy of virtualization techniques
    3.3.1 Execution virtualization
    3.3.2 Other types of virtualization
3.4 Virtualization and cloud computing
3.5 Pros and cons of virtualization
    3.5.1 Advantages of virtualization
    3.5.2 The other side of the coin: disadvantages
3.6 Technology examples
    3.6.1 Xen: paravirtualization
    3.6.2 VMware: full virtualization
    3.6.3 Microsoft Hyper-V
Summary
Review questions

CHAPTER 4 Cloud Computing Architecture
4.1 Introduction
4.2 The cloud reference model
    4.2.1 Architecture
    4.2.2 Infrastructure- and hardware-as-a-service
    4.2.3 Platform as a service
    4.2.4 Software as a service
4.3 Types of clouds
    4.3.1 Public clouds
    4.3.2 Private clouds
    4.3.3 Hybrid clouds
    4.3.4 Community clouds
4.4 Economics of the cloud
4.5 Open challenges
    4.5.1 Cloud definition
    4.5.2 Cloud interoperability and standards
    4.5.3 Scalability and fault tolerance
    4.5.4 Security, trust, and privacy
    4.5.5 Organizational aspects
Summary
Review questions

PART 2 CLOUD APPLICATION PROGRAMMING AND THE ANEKA PLATFORM

CHAPTER 5 Aneka
5.1 Framework overview
5.2 Anatomy of the Aneka container
    5.2.1 From the ground up: the platform abstraction layer
    5.2.2 Fabric services
    5.2.3 Foundation services
    5.2.4 Application services
5.3 Building Aneka clouds
    5.3.1 Infrastructure organization
    5.3.2 Logical organization
    5.3.3 Private cloud deployment mode
    5.3.4 Public cloud deployment mode
    5.3.5 Hybrid cloud deployment mode
5.4 Cloud programming and management
    5.4.1 Aneka SDK
    5.4.2 Management tools
Summary
Review questions

CHAPTER 6 Concurrent Computing
6.1 Introducing parallelism for single-machine computation
6.2 Programming applications with threads
    6.2.1 What is a thread?
    6.2.2 Thread APIs
    6.2.3 Techniques for parallel computation with threads
6.3 Multithreading with Aneka
    6.3.1 Introducing the thread programming model
    6.3.2 Aneka thread vs. common threads
6.4 Programming applications with Aneka threads
    6.4.1 Aneka threads application model
    6.4.2 Domain decomposition: matrix multiplication
    6.4.3 Functional decomposition: Sine, Cosine, and Tangent
Summary
Review questions

CHAPTER 7 High-Throughput Computing
7.1 Task computing
    7.1.1 Characterizing a task
    7.1.2 Computing categories
    7.1.3 Frameworks for task computing
7.2 Task-based application models
    7.2.1 Embarrassingly parallel applications
    7.2.2 Parameter sweep applications
    7.2.3 MPI applications
    7.2.4 Workflow applications with task dependencies
7.3 Aneka task-based programming
    7.3.1 Task programming model
    7.3.2 Developing applications with the task model
    7.3.3 Developing a parameter sweep application
    7.3.4 Managing workflows
Summary
Review questions

CHAPTER 8 Data-Intensive Computing
8.1 What is data-intensive computing?
    8.1.1 Characterizing data-intensive computations
    8.1.2 Challenges ahead
    8.1.3 Historical perspective
8.2 Technologies for data-intensive computing
    8.2.1 Storage systems
    8.2.2 Programming platforms
8.3 Aneka MapReduce programming
    8.3.1 Introducing the MapReduce programming model
    8.3.2 Example application
Summary
Review questions

PART 3 INDUSTRIAL PLATFORMS AND NEW DEVELOPMENTS

CHAPTER 9 Cloud Platforms in Industry
9.1 Amazon web services
    9.1.1 Compute services
    9.1.2 Storage services
    9.1.3 Communication services
    9.1.4 Additional services
9.2 Google AppEngine
    9.2.1 Architecture and core concepts
    9.2.2 Application life cycle
    9.2.3 Cost model
    9.2.4 Observations
9.3 Microsoft Azure
    9.3.1 Azure core concepts
    9.3.2 SQL Azure
    9.3.3 Windows Azure platform appliance
    9.3.4 Observations
Summary
Review questions

CHAPTER 10 Cloud Applications
10.1 Scientific applications
    10.1.1 Healthcare: ECG analysis in the cloud
    10.1.2 Biology: protein structure prediction
    10.1.3 Biology: gene expression data analysis for cancer diagnosis
    10.1.4 Geoscience: satellite image processing
10.2 Business and consumer applications
    10.2.1 CRM and ERP
    10.2.2 Productivity
    10.2.3 Social networking
    10.2.4 Media applications
    10.2.5 Multiplayer online gaming
Summary
Review questions

CHAPTER 11 Advanced Topics in Cloud Computing
11.1 Energy efficiency in clouds
    11.1.1 Energy-efficient and green cloud computing architecture
11.2 Market-based management of clouds
    11.2.1 Market-oriented cloud computing
    11.2.2 A reference model for MOCC
    11.2.3 Technologies and initiatives supporting MOCC
    11.2.4 Observations
11.3 Federated clouds/InterCloud
    11.3.1 Characterization and definition
    11.3.2 Cloud federation stack
    11.3.3 Aspects of interest
    11.3.4 Technologies for cloud federations
    11.3.5 Observations
11.4 Third-party cloud services
    11.4.1 MetaCDN
    11.4.2 SpotCloud
Summary
Review questions

References
Index

Acknowledgments

First and foremost, we are grateful to all researchers and industrial developers worldwide for their contributions to various concepts and technologies discussed in this book. Our special thanks to all the members and consultants of Manjrasoft, the Cloud Computing and Distributed Systems (CLOUDS) Lab of the University of Melbourne, and Melbourne Ventures, who contributed to the development of the Aneka Cloud Application Platform, the preparation of associated application demonstrators and documents, and/or the commercialization of the Aneka technology. They include Chu Xingchen, Srikumar Venugopal, Krishna Nadiminti, Christian Vecchiola, Dileban Karunamoorthy, Chao Jin, Rodrigo Calheiros, Michael Mattess, Jessie Wei, Enayat Masoumi, Ivan Mellado, Richard Day, Wolfgang Gentzsch, Laurence Liew, David Sinclair, Suraj Pandey, Abhi Shekar, Dexter Duncan, Murali Sathya, Karthik Sukumar, Ravi Kumar Challa, and Sita Venkatraman.

We thank the Australian Research Council (ARC) and the Department of Innovation, Industry, Science, and Research (DIISR) for supporting our research and commercialization endeavors.

We thank all of our colleagues at the University of Melbourne, especially Professors Rao Kotagiri, Iven Mareels, and Glyn Davis, for their mentorship and positive support for our research and our efforts to impart the knowledge we have gained.

We thank all colleagues and users of the Aneka technology for their direct and indirect contributions to application case studies reported in the book.
Our special thanks to Raghavendra Kune from ADRIN/ISRO for his enthusiastic efforts in creating a satellite image-processing application using Aneka and publishing articles in this area. We thank Srinivasa Iyengar from MSRIT for creating data-mining applications using Aneka and demonstrating the power of Aneka to academics from the early days of cloud computing.

We thank the members of the CLOUDS Lab for proofreading one or more chapters. They include Rodrigo Calheiros, Nikolay Grozev, Amir Vahid, Anton Beloglazov, Adel Toosi, Deepak Poola, Mohammed AlRokayan, Atefeh Khosravi, Sareh Piraghaj, and Yaser Mansouri.

We thank our family members, including Smrithi Buyya, Soumya Buyya, and Radha Buyya, for their love and understanding during the preparation of the book.

We sincerely thank external reviewers commissioned by the publisher for their critical comments and suggestions on enhancing the presentation and organization of many chapters at a finer level. This has greatly helped us improve the quality of the book.

Finally, we would like to thank the staff at Elsevier Inc for their enthusiastic support and guidance during the preparation of the book. In particular, we thank Todd Green for inspiring us to take up this project and for setting the process of publication in motion. The Elsevier staff were wonderful to work with!

Professor Rajkumar Buyya
The University of Melbourne and Manjrasoft Pty Ltd, Australia

Dr. Christian Vecchiola
The University of Melbourne and IBM Research, Australia

Professor S. Thamarai Selvi
Madras Institute of Technology, Anna University, Chennai, India

Preface

The growing popularity of the Internet and the Web, along with the availability of powerful handheld computing, mobile, and sensing devices, is changing the way we interact, manage our lives, conduct business, and access or deliver services.
The lowering costs of computation and communication are driving the focus from personal to datacenter-centric computing. Although parallel and distributed computing has been around for several years, its new forms, multicore and cloud computing, have brought about a sweeping change in the industry. These trends are pushing the industry focus from developing applications for PCs to developing them for cloud datacenters that enable millions of users to use software simultaneously.

Computing is being transformed to a model consisting of commoditized services delivered in a manner similar to utilities such as water, electricity, gas, and telephony. As a result, information technology (IT) services are billed and delivered as “computing utilities” over shared delivery networks. In such a model, users access services based on their requirements, regardless of where those services are hosted. Several computing paradigms have promised to deliver this utility computing vision. Cloud computing is the most recent emerging paradigm promising to turn the vision of “computing utilities” into a reality.

Cloud computing has become one of the buzzwords in the IT industry. Several IT vendors are promising to offer storage, computation, and application hosting services and to provide coverage on several continents, backing the performance and uptime of their services with service-level agreements (SLAs). They offer subscription-based access to infrastructure, platforms, and applications that are popularly termed Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). These emerging services have reduced the cost of computation and application hosting by several orders of magnitude, but there is significant complexity involved in the development and delivery of applications and their services in a seamless, scalable, and reliable manner.
There are several cloud technologies and platforms on the market—to mention a few: Google AppEngine, Microsoft Azure, and Manjrasoft Aneka. Google AppEngine provides an extensible runtime environment for Web-based applications that leverage the huge Google IT infrastructure. Microsoft Azure provides a wide array of Windows-based services for developing and deploying Windows applications on the cloud. Manjrasoft Aneka provides a flexible model for creating cloud applications and deploying them on a wide variety of infrastructures, including public clouds such as Amazon EC2.

With this sweeping shift from developing applications on PCs to datacenters, there is a huge demand for manpower with new skill sets in cloud computing. Universities play an important role in this regard by training the next generation of IT professionals and equipping them with the necessary tools and knowledge to tackle these challenges. These institutions need to be able to set up a cloud computing environment for teaching and learning with minimal investment. One of the attractive cloud application platforms that meet this need is Manjrasoft’s Aneka, which (1) enables the construction of a private/enterprise cloud by harnessing the existing network of computers (LAN-connected PCs), (2) provides a software development kit (SDK) that supports application programming interfaces (APIs) for multiple programming models such as Thread, Task, and MapReduce, and (3) supports, in a seamless manner, the deployment and execution of applications on diverse infrastructures such as multicore servers, private clouds, and public clouds.

Currently, expert developers are required to create cloud applications and services. Cloud researchers, practitioners, and vendors alike are working to ensure that potential users are educated about the benefits of cloud computing and the best way to harness its full potential.
However, because it’s a new and popular paradigm, the very definition of cloud computing depends on which computing expert is asked. So, although the realization of true utility computing appears closer than ever, its acceptance is currently restricted to cloud experts due to the perceived complexities of interacting with cloud computing providers. This book aims to change the game by simplifying and imparting cloud computing foundations, technologies, and programming skills to readers so that even average programmers and software engineers are able to develop cloud applications easily.

The book at a glance

This book introduces the fundamental principles of cloud computing and its related paradigms. It discusses the concepts of virtualization technologies along with the architectural models of cloud computing. It presents prominent cloud computing technologies that are available in the marketplace, including the Aneka Cloud Application Platform. The book contains chapters dedicated to discussion of concurrent, high-throughput, and data-intensive computing paradigms and their use in programming cloud applications. Various application case studies from domains such as science, engineering, gaming, and social networking are introduced, along with their architecture and how they leverage various cloud technologies. These case studies allow the reader to understand the mechanisms needed to harness cloud computing in their own respective endeavors. Finally, the book details many open research problems and opportunities that have arisen from the rapid uptake of cloud computing. We hope that this motivates the reader to address these in their own future research and development. The book also comes with an associated Website (hosted at www.buyya.com/MasteringClouds) that contains pointers to advanced online resources.
The book contains 11 chapters, which are organized into three major parts:

Part 1: Foundations
Chapter 1: Introduction
Chapter 2: Principles of Parallel and Distributed Computing
Chapter 3: Virtualization
Chapter 4: Cloud Computing Architecture

Part 2: Cloud Application Programming and the Aneka Platform
Chapter 5: Aneka: Cloud Application Platform
Chapter 6: Concurrent Computing: Thread Programming
Chapter 7: High-Throughput Computing: Task Programming
Chapter 8: Data-Intensive Computing: MapReduce Programming

Part 3: Industrial Platforms and New Developments
Chapter 9: Cloud Platforms in Industry
Chapter 10: Cloud Applications
Chapter 11: Advanced Topics in Cloud Computing

The book serves as a perfect guide to the world of cloud computing. Starting with the fundamentals, the book drives students and professionals through the practical use of these concepts via hands-on sessions on how to develop cloud applications, using Aneka as a reference platform. Part 3 goes beyond the reference platform and introduces other industrial technologies and solutions (Amazon Web Services, Google AppEngine, and Microsoft Azure) and real applications, identifies emerging trends, and offers future directions for cloud computing.

Benefits and readership

Given the rapid emergence of cloud computing as a mainstream computing paradigm, it is essential to have both a solid understanding of the core concepts characterizing the phenomenon and a practical grasp of how to design and implement cloud computing applications and systems. This set of skills is already fundamental today for software architects, engineers, and developers because many applications are being moved to the cloud. It will become even more important in the future, when this technology matures further.
This book provides an ideal blend of background information, theory, and practical cloud computing development techniques, expressed in a language that is accessible to a wide range of readers: from graduate-level students to practitioners, developers, and engineers who want to, or need to, design and implement cloud computing solutions. Moreover, more advanced topics presented at the end of the manuscript make the book an interesting read for researchers in the field of cloud computing who want an overview of the next challenges in cloud computing that will arise in coming years.

This book is a timely contribution to the cloud computing field, which is gaining considerable commercial interest and momentum. The book is targeted at graduate students and IT professionals such as system architects, practitioners, software engineers, and application programmers. As cloud computing is recognized as one of the top five emerging technologies that will have a major impact on the quality of science and society over the next 20 years, the knowledge conveyed through this book will help position our readers at the forefront of the field.

Directions for adoption: theory, labs, and projects

Given the importance of the cloud computing paradigm and its rapid uptake in industry, universities and educational institutions need to upgrade their curriculum by introducing one or more subjects in the area of cloud computing and related topics, such as parallel computing and distributed systems. We recommend that they offer at least one subject on cloud computing as part of their undergraduate and postgraduate degree programs, such as B.E./B.Tech./BSc in computer science and related areas and Masters, including the Master of Computer Applications (MCA). We believe that this book will serve as an excellent textbook for such subjects. If the students have already had exposure to the concepts of parallel and distributed computing, Chapter 2 can be skipped.
For those aiming to make their curriculum rich with cloud computing, we recommend offering two courses: “Introduction to Cloud Computing” and “Advanced Cloud Computing,” in two different semesters. This book has sufficient content to cater to both of them. The first subject can be based on Chapters 1–6 and the second one based on Chapters 7–11.

In addition to theory, we strongly recommend the introduction of a laboratory subject that offers hands-on experience. The lab exercises and assignments can focus on creating high-performance cloud applications on a range of topics, including parallel execution of mathematical functions, sorting of large data in parallel, image processing, and data mining. Using cloud software systems such as Aneka, institutions can easily set up a private/enterprise cloud computing facility by utilizing existing LAN-connected PCs running Windows. Students can use this facility to learn about various cloud application programming models and interfaces discussed in Chapter 6 (Thread Programming), Chapter 7 (Task Programming), and Chapter 8 (MapReduce Programming). Students need to learn various programming examples discussed in these chapters and execute them on an Aneka-based cloud facility. We encourage students to take up some of the programming exercises noted in the “Review Questions” sections of these chapters as lab assignments and develop their own solutions.

Students can also carry out their final-year projects focused on developing cloud applications to solve real-world problems. For example, students can work with academics, researchers, and experts from other science and engineering disciplines, such as life and medical sciences or civil and mechanical engineering, and develop suitable applications that can harness the power of cloud computing. For inspiration, please read various application case studies presented in Chapter 10.
Supplemental materials

Supplemental materials for instructors or students can be downloaded from Elsevier: http://store.elsevier.com/product.jsp?isbn=9780124114548

Part 1: Foundations

Chapter 1: Introduction

Computing is being transformed into a model consisting of services that are commoditized and delivered in a manner similar to utilities such as water, electricity, gas, and telephony. In such a model, users access services based on their requirements, regardless of where the services are hosted. Several computing paradigms, such as grid computing, have promised to deliver this utility computing vision. Cloud computing is the most recent emerging paradigm promising to turn the vision of “computing utilities” into a reality.

Cloud computing is a technological advancement that focuses on the way we design computing systems, develop applications, and leverage existing services for building software. It is based on the concept of dynamic provisioning, which is applied not only to services but also to compute capability, storage, networking, and information technology (IT) infrastructure in general. Resources are made available through the Internet and offered on a pay-per-use basis from cloud computing vendors. Today, anyone with a credit card can subscribe to cloud services and deploy and configure servers for an application in hours, growing and shrinking the infrastructure serving its application according to the demand, and paying only for the time these resources have been used.

This chapter provides a brief overview of the cloud computing phenomenon by presenting its vision, discussing its core features, and tracking the technological developments that have made it possible. The chapter also introduces some key cloud computing technologies as well as some insights into development of cloud computing environments.
1.1 Cloud computing at a glance

In 1969, Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET), which seeded the Internet, said:

As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country.

This vision of computing utilities based on a service-provisioning model anticipated the massive transformation of the entire computing industry in the 21st century, whereby computing services will be readily available on demand, just as other utility services such as water, electricity, telephone, and gas are available in today’s society. Similarly, users (consumers) need to pay providers only when they access the computing services. In addition, consumers no longer need to invest heavily or encounter difficulties in building and maintaining complex IT infrastructure.

In such a model, users access services based on their requirements without regard to where the services are hosted. This model has been referred to as utility computing or, recently (since 2007), as cloud computing. The latter term often denotes the infrastructure as a “cloud” from which businesses and users can access applications as services from anywhere in the world and on demand. Hence, cloud computing can be classified as a new paradigm for the dynamic provisioning of computing services supported by state-of-the-art data centers employing virtualization technologies for consolidation and effective utilization of resources.

Cloud computing allows renting infrastructure, runtime environments, and services on a pay-per-use basis. This principle finds several practical applications and therefore presents different images of cloud computing to different people.
Chief information and technology officers of large enterprises see opportunities for scaling their infrastructure on demand and sizing it according to their business needs. End users leveraging cloud computing services can access their documents and data anytime, anywhere, and from any device connected to the Internet. Many other points of view exist.¹ One of the most widespread views of cloud computing can be summarized as follows:

I don’t care where my servers are, who manages them, where my documents are stored, or where my applications are hosted. I just want them always available and access them from any device connected through Internet. And I am willing to pay for this service for as long as I need it.

The concept expressed above has strong similarities to the way we use other services, such as water and electricity. In other words, cloud computing turns IT services into utilities. Such a delivery model is made possible by the effective composition of several technologies, which have reached the appropriate maturity level. Web 2.0 technologies play a central role in making cloud computing an attractive opportunity for building computing systems. They have transformed the Internet into a rich application and service delivery platform, mature enough to serve complex needs. Service orientation allows cloud computing to deliver its capabilities with familiar abstractions, while virtualization confers on cloud computing the necessary degree of customization, control, and flexibility for building production and enterprise systems.

Besides being an extremely flexible environment for building new systems and applications, cloud computing also provides an opportunity for integrating additional capacity or new features into existing systems.
The use of dynamically provisioned IT resources constitutes a more attractive opportunity than buying additional infrastructure and software, the sizing of which can be difficult to estimate and the needs of which are limited in time. This is one of the most important advantages of cloud computing, which has made it a popular phenomenon. With the wide deployment of cloud computing systems, the foundation technologies and systems enabling them are becoming consolidated and standardized. This is a fundamental step in the realization of the long-term vision for cloud computing, which provides an open environment where computing, storage, and other services are traded as computing utilities.

¹ An interesting perspective on the way cloud computing evokes different things to different people can be found in a series of interviews made by Rob Boothby, vice president and platform evangelist of Joyent, at the Web 2.0 Expo in May 2007. Chief executive officers (CEOs), chief technology officers (CTOs), founders of IT companies, and IT analysts were interviewed, and all of them gave their personal perception of the phenomenon, which at that time was starting to spread. The video of the interview can be found on YouTube at the following link: www.youtube.com/watch?v=6PNuQHUiV3Q.

1.1.1 The vision of cloud computing

Cloud computing allows anyone with a credit card to provision virtual hardware, runtime environments, and services. These are used for as long as needed, with no up-front commitments required. The entire stack of a computing system is transformed into a collection of utilities, which can be provisioned and composed together to deploy systems in hours rather than days and with virtually no maintenance costs. This opportunity, initially met with skepticism, has now become a practice across several application domains and business sectors (see Figure 1.1).
The demand has fast-tracked technical development and enriched the set of services offered, which have also become more sophisticated and cheaper.

Despite its evolution, the use of cloud computing is often limited to a single service at a time or, more commonly, a set of related services offered by the same vendor. Previously, the lack of effective standardization efforts made it difficult to move hosted services from one vendor to another. The long-term vision of cloud computing is that IT services are traded as utilities in an open market, without technological and legal barriers. In this cloud marketplace, cloud service providers and consumers, trading cloud services as utilities, play a central role.

Many of the technological elements contributing to this vision already exist. Different stakeholders leverage clouds for a variety of services. The need for ubiquitous storage and compute power on demand is the most common reason to consider cloud computing. A scalable runtime for applications is an attractive option for application and system developers that do not have infrastructure or cannot afford any further expansion of existing infrastructure. The capability for Web-based access to documents and their processing using sophisticated applications is one of the appealing factors for end users.

In all these cases, the discovery of such services is mostly done by human intervention: a person (or a team of people) looks over the Internet to identify offerings that meet their needs. We imagine that in the near future it will be possible to find the solution that matches our needs by simply entering our request in a global digital market that trades cloud computing services. The existence of such a market will enable the automation of the discovery process and its integration into existing software systems, thus allowing users to transparently leverage cloud resources in their applications and systems.
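To make the idea of automated discovery more concrete, the following is a minimal sketch of how a request might be matched against offerings published in such a market. It is purely illustrative and does not reflect the interface of any existing marketplace: the `Offering` and `Request` types, their fields, the provider names, and the "cheapest acceptable offer" selection rule are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offering:
    """A service advertised in the (hypothetical) cloud market."""
    provider: str          # hypothetical provider name
    service_type: str      # e.g., "compute" or "storage"
    price_per_hour: float  # dollars per hour
    availability: float    # advertised uptime as a fraction, e.g., 0.999


@dataclass(frozen=True)
class Request:
    """What a consumer asks the market for."""
    service_type: str
    max_price_per_hour: float
    min_availability: float


def find_best_offering(request: Request, market: List[Offering]) -> Optional[Offering]:
    """Return the cheapest offering that satisfies the request, or None if there is none."""
    candidates = [
        o for o in market
        if o.service_type == request.service_type
        and o.price_per_hour <= request.max_price_per_hour
        and o.availability >= request.min_availability
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


if __name__ == "__main__":
    # Hypothetical market contents, for illustration only.
    market = [
        Offering("ProviderA", "compute", 0.50, 0.999),
        Offering("ProviderB", "compute", 0.25, 0.990),
        Offering("ProviderC", "storage", 0.125, 0.999),
    ]
    request = Request("compute", max_price_per_hour=1.0, min_availability=0.995)
    print(find_best_offering(request, market))
```

In this sketch, the step that a person performs today by browsing provider Websites is reduced to a filter over structured descriptions. That only becomes feasible once offerings are represented in a unified, machine-readable way.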
The existence of a global platform for trading cloud services will also help service providers become more visible and therefore potentially increase their revenue. A global cloud market also reduces the barriers between service consumers and providers: it is no longer necessary to belong to only one of these two categories. For example, a cloud provider might become a consumer of a competitor service in order to fulfill its own promises to customers.

These are all possibilities that are introduced with the establishment of a global cloud computing marketplace and by defining effective standards for the unified representation of cloud services as well as the interaction among different cloud technologies. A considerable shift toward cloud computing has already been registered, and its rapid adoption facilitates its consolidation. Moreover, by concentrating the core capabilities of cloud computing into large datacenters, it is possible to reduce or remove the need for any technical infrastructure on the service consumer side. This approach provides opportunities for optimizing datacenter facilities and fully utilizing their capabilities to serve multiple users. This consolidation model will reduce the waste of energy and carbon emissions, thus contributing to a greener IT on one end and increasing revenue on the other end.

FIGURE 1.1 Cloud computing vision. (Stakeholder statements shown in the figure: “I need to grow my infrastructure, but I do not know for how long…”; “I have a lot of infrastructure that I want to rent…”; “I have a surplus of infrastructure that I want to make use of”; “I cannot invest in infrastructure, I just started my business…”; “I have infrastructure and middleware and I can host applications”; “I want to focus on application logic and not maintenance and scalability issues”; “I have infrastructure and provide application services”; “I want to access and edit my documents and photos from everywhere…”)
1.1.2 Defining a cloud

Cloud computing has become a popular buzzword; it has been widely used to refer to different technologies, services, and concepts. It is often associated with virtualized infrastructure or hardware on demand, utility computing, IT outsourcing, platform and software as a service, and many other things that now are the focus of the IT industry. Figure 1.2 depicts the plethora of different notions included in current definitions of cloud computing.

FIGURE 1.2 Cloud computing technologies, concepts, and ideas. (Terms shown in the figure: no capital investments, cloudbursting, SaaS, PaaS, IaaS, quality of service, Internet, green computing, pay as you go, billing, utility computing, service-level agreement, elasticity, IT outsourcing, virtual datacenters, scalability, privacy and trust, provisioning on demand, security, virtualization.)

The term cloud has historically been used in the telecommunications industry as an abstraction of the network in system diagrams. It then became the symbol of the most popular computer network: the Internet. This meaning also applies to cloud computing, which refers to an Internet-centric way of computing. The Internet plays a fundamental role in cloud computing, since it represents either the medium or the platform through which many cloud computing services are delivered and made accessible. This aspect is also reflected in the definition given by Armbrust et al.:

Cloud computing refers to both the applications delivered as services over the Internet and the hardware and system software in the datacenters that provide those services.

This definition describes cloud computing as a phenomenon touching on the entire stack: from the underlying hardware to the high-level software services and applications.
It introduces the concept of everything as a service, commonly referred to as XaaS,² where the different components of a system (IT infrastructure, development platforms, databases, and so on) can be delivered, measured, and consequently priced as a service. This new approach significantly influences not only the way that we build software but also the way we deploy it, make it accessible, and design our IT infrastructure, and even the way companies allocate the costs for IT needs. The approach fostered by cloud computing is global: it covers both the needs of a single user hosting documents in the cloud and those of a CIO deciding to deploy part of or the entire corporate IT infrastructure in the public cloud. This notion of multiple parties using a shared cloud computing environment is highlighted in a definition proposed by the U.S. National Institute of Standards and Technology (NIST):

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Another important aspect of cloud computing is its utility-oriented approach. More than any other trend in distributed computing, cloud computing focuses on delivering services with a given pricing model, in most cases a “pay-per-use” strategy. It makes it possible to access online storage, rent virtual hardware, or use development platforms and pay only for their effective usage, with no or minimal up-front costs. All these operations can be performed and billed simply by entering the credit card details and accessing the exposed services through a Web browser. This helps us provide a different and more practical characterization of cloud computing.
According to Reese, we can define three criteria to discriminate whether a service is delivered in the cloud computing style:

• The service is accessible via a Web browser (nonproprietary) or a Web services application programming interface (API).
• Zero capital expenditure is necessary to get started.
• You pay only for what you use as you use it.

Even though many cloud computing services are freely available for single users, enterprise-class services are delivered according to a specific pricing scheme. In this case users subscribe to the service and establish with the service provider a service-level agreement (SLA) defining the quality-of-service parameters under which the service is delivered. The utility-oriented nature of cloud computing is clearly expressed by Buyya et al.:

A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers.

² XaaS is an acronym standing for X-as-a-Service, where the X letter can be replaced by one of a number of things: S for software, P for platform, I for infrastructure, H for hardware, D for database, and so on.

1.1.3 A closer look

Cloud computing is helping enterprises, governments, public and private institutions, and research organizations shape more effective and demand-driven computing systems. Access to, as well as integration of, cloud computing resources and systems is now as easy as performing a credit card transaction over the Internet. Practical examples of such systems exist across all market segments:

• Large enterprises can offload some of their activities to cloud-based systems. Recently, the New York Times has converted its digital library of past editions into a Web-friendly format.
This required a considerable amount of computing power for a short period of time. By renting Amazon EC2 and S3 Cloud resources, the Times performed this task in 36 hours and relinquished these resources, with no additional costs.
• Small enterprises and start-ups can afford to translate their ideas into business results more quickly, without excessive up-front costs. Animoto is a company that creates videos out of images, music, and video fragments submitted by users. The process involves a considerable amount of storage and backend processing required for producing the video, which is finally made available to the user. Animoto does not own a single server and bases its computing infrastructure entirely on Amazon Web Services, which are sized on demand according to the overall workload to be processed. Such workload can vary a lot and require instant scalability.³ Up-front investment is clearly not an effective solution for many companies, and cloud computing systems become an appropriate alternative.
• System developers can concentrate on the business logic rather than dealing with the complexity of infrastructure management and scalability. Little Fluffy Toys is a company in London that has developed a widget providing users with information about nearby bicycle rental services. The company has managed to back the widget’s computing needs on Google AppEngine and be on the market in only one week.
• End users can have their documents accessible from everywhere and any device. Apple iCloud is a service that allows users to have their documents stored in the Cloud and access them from any device users connect to it. This makes it possible to take a picture while traveling with a smartphone, go back home and edit the same picture on your laptop, and have it show as updated on your tablet computer. This process is completely transparent to the user, who does not have to set up cables and connect these devices with each other.

How is all of this made possible?
The same concept of IT services on demand (whether computing power, storage, or runtime environments for applications) on a pay-as-you-go basis accommodates these four different scenarios. Cloud computing not only offers the opportunity to easily access IT services on demand, it also introduces a new way of thinking about IT services and resources: as utilities. A bird’s-eye view of a cloud computing environment is shown in Figure 1.3.

³ It has been reported that Animoto, in one single week, scaled from 70 to 8,500 servers because of user demand.

FIGURE 1.3 A bird’s-eye view of cloud computing. (Elements shown in the figure: clients; a cloud manager; public clouds offering applications, development and runtime platform, compute, and storage; a private cloud; a hybrid cloud; government cloud services; other cloud services; all offered as subscription-oriented cloud services, X{compute, apps, data, ..} as a Service (..aaS).)

The three major models for deploying and accessing cloud computing environments are public clouds, private/enterprise clouds, and hybrid clouds (see Figure 1.4). Public clouds are the most common deployment models in which necessary IT infrastructure (e.g., virtualized datacenters) is established by a third-party service provider that makes it available to any consumer on a subscription basis. Such clouds are appealing to users because they allow users to quickly leverage compute, storage, and application services. In this environment, users’ data and applications are deployed on cloud datacenters on the vendor’s premises.

FIGURE 1.4 Major deployment models for cloud computing.
Public/Internet clouds: third-party, multitenant cloud infrastructure and services; available on a subscription basis to all.
Private/Enterprise clouds: a public cloud model within a company’s own datacenter/infrastructure for internal and/or partners’ use.
Hybrid/Inter clouds: mixed use of private and public clouds; leasing public cloud services when private cloud capacity is insufficient.

Large organizations that own massive computing infrastructures can still benefit from cloud computing by replicating the cloud IT service delivery model in-house. This idea has given birth to the concept of private clouds as opposed to public clouds. In 2010, for example, the U.S. federal government, one of the world’s largest IT consumers (spending around $76 billion on more than 10,000 systems), started a cloud computing initiative aimed at providing government agencies with a more efficient use of their computing facilities. The use of cloud-based in-house solutions is also driven by the need to keep confidential information within an organization’s premises. Institutions such as governments and banks that have high security, privacy, and regulatory concerns prefer to build and use their own private or enterprise clouds.

Whenever private cloud resources are unable to meet users’ quality-of-service requirements, hybrid computing systems, partially composed of public cloud resources and privately owned infrastructures, are created to serve the organization’s needs. These are often referred to as hybrid clouds, which are becoming a common way for many stakeholders to start exploring the possibilities offered by cloud computing.

1.1.4 The cloud computing reference model

A fundamental characteristic of cloud computing is the capability to deliver, on demand, a variety of IT services that are quite diverse from each other. This variety creates different perceptions of what cloud computing is among users. Despite this lack of uniformity, it is possible to classify cloud computing services offerings into three major categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). These categories are related to each other as described in Figure 1.5, which provides an organic view of cloud computing. We refer to this diagram as the Cloud Computing Reference Model, and we will use it throughout the
We refer to this diagram as the Cloud Computing Reference Model, and we will use it throughout the 12 CHAPTER 1 Introduction Web 2.0 Software as a Service Interfaces End-user applications Scientific applications Office automation, photo editing, CRM, and social networking Examples : Google Documents, Facebook, Flickr, Salesforce Platform as a Service Runtime environment for applications Development and data processing platforms Examples : Windows Azure, Hadoop, Google AppEngine, Aneka Infrastructure as a Service Virtualized servers Storage and networking Examples : Amazon EC2, S3, Rightscale, vCloud FIGURE 1.5 The Cloud Computing Reference Model. book to explain the technologies and introduce the relevant research on this phenomenon. The model organizes the wide range of cloud computing services into a layered view that walks the computing stack from bottom to top. At the base of the stack, Infrastructure-as-a-Service solutions deliver infrastructure on demand in the form of virtual hardware, storage, and networking. Virtual hardware is utilized to provide compute on demand in the form of virtual machine instances. These are created at users’ request on the provider’s infrastructure, and users are given tools and interfaces to configure the software stack installed in the virtual machine. The pricing model is usually defined in terms of dollars per hour, where the hourly cost is influenced by the characteristics of the virtual hardware. Virtual storage is delivered in the form of raw disk space or object store. The former complements a virtual hardware offering that requires persistent storage. The latter is a more high-level abstraction for storing enti- ties rather than files. Virtual networking identifies the collection of services that manage the net- working among virtual instances and their connectivity to the Internet or private networks. Platform-as-a-Service solutions are the next step in the stack. 
They deliver scalable and elastic runtime environments on demand and host the execution of applications. These services are backed by a core middleware platform that is responsible for creating the abstract environment where applications are deployed and executed. It is the responsibility of the service provider to provide scalability and to manage fault tolerance, while users are asked to focus on the logic of the application, developed by leveraging the provider’s APIs and libraries. This approach increases the level of abstraction at which cloud computing is leveraged but also constrains the user to a more controlled environment.

At the top of the stack, Software-as-a-Service solutions provide applications and services on demand. Most of the common functionalities of desktop applications (such as office automation, document management, photo editing, and customer relationship management (CRM) software) are replicated on the provider’s infrastructure and made more scalable and accessible through a browser on demand. These applications are shared across multiple users, whose interactions are isolated from one another. The SaaS layer is also the area of social networking Websites, which leverage cloud-based infrastructures to sustain the load generated by their popularity.

Each layer provides a different service to users. IaaS solutions are sought by users who want to leverage cloud computing for building dynamically scalable computing systems requiring a specific software stack. IaaS services are therefore used to develop scalable Websites or for background processing. PaaS solutions provide scalable programming platforms for developing applications and are more appropriate when new systems have to be developed. SaaS solutions target mostly end users who want to benefit from the elastic scalability of the cloud without doing any software development, installation, configuration, and maintenance.
This solution is appropriate when there are existing SaaS services that fit users’ needs (such as email, document management, CRM, etc.) and a minimum level of customization is needed.

1.1.5 Characteristics and benefits

Cloud computing has some interesting characteristics that bring benefits to both cloud service consumers (CSCs) and cloud service providers (CSPs). These characteristics are:
* No up-front commitments
* On-demand access
* Nice pricing
* Simplified application acceleration and scalability
* Efficient resource allocation
* Energy efficiency
* Seamless creation and use of third-party services

The most evident benefit from the use of cloud computing systems and technologies is the increased economic return due to the reduced maintenance and operational costs related to IT software and infrastructure. This is mainly because IT assets, namely software and infrastructure, are turned into utility costs, which are paid for as long as they are used, not paid for up front. Capital costs are costs associated with assets that need to be paid in advance to start a business activity. Before cloud computing, IT infrastructure and software generated capital costs, since they had to be paid for up front to obtain the computing infrastructure that enables the business activities of the organization. The revenue of the business is then utilized to compensate over time for these costs. Organizations always try to minimize capital costs, since they are often associated with depreciable values. This is the case for hardware: a server bought today for $1,000 will have a market value less than its original price when it is eventually replaced by new hardware. To make a profit, organizations have to compensate for this depreciation over time, which reduces the net gain obtained from revenue. Minimizing capital costs, then, is fundamental.
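The capital-versus-utility cost argument above can be made concrete with a little arithmetic. The sketch below is only an illustration: the $1,000 server price is taken from the text, while the straight-line depreciation model, the salvage value, the three-year lifetime, and the $0.25 hourly rate are hypothetical assumptions. It also ignores maintenance, power, and staffing costs.

```python
# Illustrative comparison of up-front (capital) and pay-per-use (utility) costs.
# Only the $1,000 server price comes from the text; every other number and the
# straight-line depreciation model are assumptions made for this example.


def straight_line_value(price, salvage_value, life_years, age_years):
    """Book value of an asset after age_years under straight-line depreciation."""
    age_years = min(age_years, life_years)  # value stops falling at end of life
    return price - (price - salvage_value) * age_years / life_years


def pay_per_use_cost(hourly_rate, hours_used):
    """Total utility cost: only the hours actually consumed are paid for."""
    return hourly_rate * hours_used


def break_even_hours(price, hourly_rate):
    """Hours of rented use at which renting has cost as much as buying."""
    return price / hourly_rate


if __name__ == "__main__":
    server_price = 1000.0  # purchase price used in the text
    hourly_rate = 0.25     # hypothetical rental price, dollars per hour

    print("Value after 1 year:", straight_line_value(server_price, 100.0, 3, 1))
    print("Cost of 1,000 rented hours:", pay_per_use_cost(hourly_rate, 1000))
    print("Break-even hours:", break_even_hours(server_price, hourly_rate))
```

Under these assumed numbers, an organization that needs the server for only a fraction of the break-even hours pays less by renting, and it carries no depreciating asset on its books.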
Cloud computing transforms IT infrastructure and software into utilities, thus significantly contributing to increasing a company’s net gain. Moreover, cloud computing also provides an opportunity for small organizations and start-ups: these do not need large investments to start their business, but they can comfortably grow with it. Finally, maintenance costs are significantly reduced: by renting the infrastructure and the application services, organizations are no longer responsible for their maintenance. This task is the responsibility of the cloud service provider, who, thanks to economies of scale, can bear the maintenance costs.

Increased agility in defining and structuring software systems is another significant benefit of cloud computing. Since organizations rent IT services, they can more dynamically and flexibly compose their software systems, without being constrained by capital costs for IT assets. There is a reduced need for capacity planning, since cloud computing allows organizations to react to unplanned surges in demand quite rapidly. For example, organizations can add more servers to process workload spikes and dismiss them when they are no longer needed. Ease of scalability is another advantage. By leveraging the potentially huge capacity of cloud computing, organizations can extend their IT capability more easily. Scalability can be leveraged across the entire computing stack. Infrastructure providers offer simple methods to provision customized hardware and integrate it into existing systems. Platform-as-a-Service providers offer runtime environments and programming models that are designed to scale applications. Software-as-a-Service offerings can be elastically sized on demand without requiring users to provision hardware or to program applications for scalability.
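The idea of adding servers during workload spikes and dismissing them afterward can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any provider's autoscaling service: the function names, the requests-per-server capacity, and the sample loads are all hypothetical.

```python
# Minimal sketch of elastic sizing: rent as many servers as the current load
# requires and release them when the load drops. All names and numbers are
# hypothetical and chosen only for illustration.


def servers_needed(load, capacity_per_server, min_servers=1):
    """Smallest number of servers able to handle `load` (integer units)."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    required = -(-load // capacity_per_server)  # ceiling division on integers
    return max(min_servers, required)


def scaling_actions(loads, capacity_per_server, min_servers=1):
    """For each observed load, return (load, target_servers, change)."""
    actions = []
    current = min_servers
    for load in loads:
        target = servers_needed(load, capacity_per_server, min_servers)
        actions.append((load, target, target - current))
        current = target
    return actions


if __name__ == "__main__":
    # Assumed workload in requests per second: quiet, spike, quiet again.
    for load, target, change in scaling_actions([80, 950, 120], 100):
        print(f"load={load:4d} servers={target:2d} change={change:+d}")
```

A positive change corresponds to provisioning extra servers for a spike and a negative change to dismissing them. A real deployment would also have to account for provisioning delay and billing granularity, which this sketch leaves out.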
End users can benefit from cloud computing by having their data and the capability of operating on it always available, from anywhere, at any time, and through multiple devices. Information and services stored in the cloud are exposed to users by Web-based interfaces that make them accessible from portable devices as well as desktops at home. Since the processing capabilities (that is, office automation features, photo editing, information management, and so on) also reside in the cloud, end users can perform the same tasks that previously were carried out through considerable software investments. The cost for such opportunities is generally very limited, since the cloud service provider shares its costs across all the tenants that it serves. Multitenancy allows for better utilization of the shared infrastructure, which is kept operational and fully active. The concentration of IT infrastructure and services into large datacenters also provides opportunity for considerable optimization in terms of resource allocation and energy efficiency, which eventually can lead to a lower impact on the environment.

Finally, service orientation and on-demand access create new opportunities for composing systems and applications with a flexibility not possible before cloud computing. New service offerings can be created by aggregating existing services and concentrating on added value. Since it is possible to provision on demand any component of the computing stack, it is easier to turn ideas into products with limited costs and by concentrating technical efforts on what matters: the added value.

1.1.6 Challenges ahead

As any new technology develops and becomes popular, new issues have to be faced. Cloud computing is not an exception. New, interesting problems and challenges are regularly being posed to the cloud community, including IT practitioners, managers, governments, and regulators.
Besides the practical aspects, which are related to configuration, networking, and sizing of cloud computing systems, a new set of challenges concerning the dynamic provisioning of cloud computing services and resources arises. For example, in the Infrastructure-as-a-Service domain, how many resources need to be provisioned, and for how long should they be used, in order to maximize the benefit? Technical challenges also arise for cloud service providers for the management of large computing infrastructures and the use of virtualization technologies on top of them. In addition, issues and challenges concerning the integration of real and virtual infrastructure need to be taken into account from different perspectives, such as security and legislation.

Security in terms of confidentiality, secrecy, and protection of data in a cloud environment is another important challenge. Organizations do not own the infrastructure they use to process data and store information. This condition poses challenges for confidential data, which organizations cannot afford to reveal. Therefore, assurance on the confidentiality of data and compliance with security standards, which give a minimum guarantee on the treatment of information on cloud computing systems, are sought. The problem is not as evident as it seems: even though cryptography can help secure the transit of data from the private premises to the cloud infrastructure, in order to be processed the information needs to be decrypted in memory. This is the weak point of the chain: since virtualization allows capturing almost transparently the memory pages of an instance, these data could easily be obtained by a malicious provider.

Legal issues may also arise. These are specifically tied to the ubiquitous nature of cloud computing, which spreads computing infrastructure across diverse geographical locations.
Different legislation about privacy in different countries may potentially create disputes as to the rights that third parties (including government agencies) have to your data. U.S. legislation is known to give extreme powers to government agencies to acquire confidential data when there is the suspicion of operations leading to a threat to national security. European countries are more restrictive and protect the right of privacy. An interesting scenario comes up when a U.S. organization uses cloud services that store its data in Europe. In this case, should this organization be suspected by the government, it would become difficult or even impossible for the U.S. government to take control of the data stored in a cloud datacenter located in Europe.

1.2 Historical developments

The idea of renting computing services by leveraging large distributed computing facilities has been around for a long time. It dates back to the days of the mainframes in the early 1950s. From there on, technology has evolved and been refined. This process has created a series of favorable conditions for the realization of cloud computing.

Figure 1.6 provides an overview of the evolution of the distributed computing technologies that have influenced cloud computing. In tracking the historical evolution, we briefly review five core technologies that played an important role in the realization of cloud computing. These technologies are distributed systems, virtualization, Web 2.0, service orientation, and utility computing.

1.2.1 Distributed systems

Clouds are essentially large distributed computing facilities that make available their services to third parties on demand. As a reference, we consider the characterization of a distributed system proposed by Tanenbaum et al.:

A distributed system is a collection of independent computers that appears to its users as a single coherent system.
This is a general definition that includes a variety of computer systems, but it highlights two very important elements characterizing a distributed system: the fact that it is composed of multiple independent components and that these components are perceived as a single entity by users. This is particularly true in the case of cloud computing, in which clouds hide the complex architecture they rely on and provide a single interface to users. The primary purpose of distributed systems is to share resources and utilize them better. This is true in the case of cloud computing, where this concept is taken to the extreme and resources (infrastructure, runtime environments, and services) are rented to users. In fact, one of the driving factors of cloud computing has been the availability of the large computing facilities of IT giants (Amazon, Google) that found that offering their computing capabilities as a service provided opportunities to better utilize their infrastructure.

Distributed systems often exhibit other properties such as heterogeneity, openness, scalability, transparency, concurrency, continuous availability, and independent failures. To some extent these also characterize clouds, especially in the context of scalability, concurrency, and continuous availability.

FIGURE 1.6 The evolution of distributed computing technologies, 1950s-2010s.
Eras along the time axis (1950 to 2010): Mainframes, Clusters, Grids, Clouds.
* 1951: UNIVAC I, first mainframe
* 1960: Cray’s first supercomputer
* 1966: Flynn’s taxonomy (SISD, SIMD, MISD, MIMD)
* 1969: ARPANET
* 1970: DARPA’s TCP/IP
* 1975: Xerox PARC invented Ethernet
* 1984: IEEE 802.3 Ethernet & LAN
* 1984: DEC’s VMScluster
* 1989: TCP/IP IETF RFC 1122
* 1990: Berners-Lee and Cailliau: WWW, HTTP, HTML
* 1997: IEEE 802.11 (Wi-Fi)
* 1999: Grid computing
* 2004: Web 2.0
* 2005: Amazon AWS (EC2, S3)
* 2007: Manjrasoft Aneka
* 2008: Google AppEngine
* 2010: Microsoft Azure
Three major milestones have led to cloud computing: mainframe computing, cluster computing, and grid computing.

Mainframes. These were the first examples of large computational facilities leveraging multiple processing units. Mainframes were powerful, highly reliable computers specialized for large data movement and massive input/output (I/O) operations. They were mostly used by large organizations for bulk data processing tasks such as online transactions, enterprise resource planning, and other operations involving the processing of significant amounts of data. Even though mainframes cannot be considered distributed systems, they offered large computational power by using multiple processors, which were presented as a single entity to users. One of the most attractive features of mainframes was their high reliability: they were “always on” and capable of tolerating failures transparently. No system shutdown was required to replace failed components, and the system could work without interruption. Batch processing was the main application of mainframes. Their popularity and deployments have since declined, but evolved versions of such systems are still in use for transaction processing (such as online banking, airline ticket booking, supermarkets and telcos, and government services).

Clusters. Cluster computing started as a low-cost alternative to the use of mainframes and supercomputers. The technology advancement that created faster and more powerful mainframes and supercomputers eventually generated an increased availability of cheap commodity machines as a side effect. These machines could then be connected by a high-bandwidth network and controlled by specific software tools that manage them as a single system. Starting in the 1980s, clusters became the standard technology for parallel and high-performance computing.
Built from commodity machines, they were cheaper than mainframes and made high-performance computing available to a large number of groups, including universities and small research labs. Cluster technology contributed considerably to the evolution of tools and frameworks for distributed computing, including Condor, Parallel Virtual Machine (PVM), and Message Passing Interface (MPI; see footnote 4). One of the attractive features of clusters was that the computational power of commodity machines could be leveraged to solve problems that were previously manageable only on expensive supercomputers. Moreover, clusters could be easily extended if more computational power was required.

Grids. Grid computing appeared in the early 1990s as an evolution of cluster computing. In an analogy to the power grid, grid computing proposed a new approach to access large computational power, huge storage facilities, and a variety of services. Users can “consume” resources in the same way as they use other utilities such as power, gas, and water. Grids initially developed as aggregations of geographically dispersed clusters by means of Internet connections. These clusters belonged to different organizations, and arrangements were made among them to share the computational power. Unlike a “large cluster,” a computing grid was a dynamic aggregation of heterogeneous computing nodes, and its scale was nationwide or even worldwide. Several developments made possible the diffusion of computing grids: (a) clusters became quite common resources; (b) they were often underutilized; (c) new problems were requiring computational power that went beyond the capability of single clusters; and (d) the improvements in networking and the diffusion of the Internet made possible long-distance, high-bandwidth connectivity. All these elements led to the development of grids, which now serve a multitude of users across the world.
Footnote 4: MPI is a specification for an API that allows many computers to communicate with one another. It defines a language-independent protocol that supports point-to-point and collective communication. MPI has been designed for high performance, scalability, and portability. At present, it is one of the dominant paradigms for developing parallel applications.

Cloud computing is often considered the successor of grid computing. In reality, it embodies aspects of all three of these major technologies. Computing clouds are deployed in large datacenters hosted by a single organization that provides services to others. Clouds are characterized by having virtually infinite capacity, being tolerant to failures, and being always on, as in the case of mainframes. In many cases, the computing nodes that form the infrastructure of computing clouds are commodity machines, as in the case of clusters. The services made available by a cloud vendor are consumed on a pay-per-use basis, and clouds fully implement the utility vision introduced by grid computing.

1.2.2 Virtualization

Virtualization is another core technology for cloud computing. It encompasses a collection of solutions allowing the abstraction of some of the fundamental elements for computing, such as hardware, runtime environments, storage, and networking. Virtualization has been around for more than 40 years, but its application has always been limited by technologies that did not allow an efficient use of virtualization solutions. Today these limitations have been substantially overcome, and virtualization has become a fundamental element of cloud computing. This is particularly true for solutions that provide IT infrastructure on demand. Virtualization confers the degree of customization and control that makes cloud computing appealing for users and, at the same time, sustainable for cloud service providers.
Virtualization is essentially a technology that allows creation of different computing environments. These environments are called virtual because they simulate the interface that is expected by a guest. The most common example of virtualization is hardware virtualization. This technology allows simulating the hardware interface expected by an operating system. Hardware virtualization allows the coexistence of different software stacks on top of the same hardware. These stacks are contained inside virtual machine instances, which operate in complete isolation from each other. High-performance servers can host several virtual machine instances, thus creating the opportunity to have a customized software stack on demand. This is the base technology that enables cloud computing solutions to deliver virtual servers on demand, such as Amazon EC2, RightScale, VMware vCloud, and others. Together with hardware virtualization, storage and network virtualization complete the range of technologies for the emulation of IT infrastructure.

Virtualization technologies are also used to replicate runtime environments for programs. In the case of process virtual machines (which include the foundation of technologies such as Java or .NET), applications are run by a specific program called a virtual machine instead of being executed directly by the operating system. This technique allows isolating the execution of applications and providing finer control over the resources they access. Process virtual machines offer a higher level of abstraction with respect to hardware virtualization, since the guest consists only of an application rather than a complete software stack. This approach is used in cloud computing to provide a platform for scaling applications on demand, such as Google AppEngine and Windows Azure.

Having isolated and customizable environments with minor impact on performance is what makes virtualization an attractive technology.
Cloud computing is realized through platforms that leverage the basic co