Cloud Computing: Service Level Agreement (SLA)
IIT Kharagpur
Summary
This document covers cloud computing service level agreements (SLAs): the definition, contents, and types of SLAs; an overview of QoS parameters; and factors to consider across the different cloud service models. It also includes companion lecture transcripts on cloud economics, cloud data storage (GFS, BigTable, Dynamo), MapReduce, and OpenStack.
Full Transcript
#### CLOUD COMPUTING: SERVICE LEVEL AGREEMENT (SLA)

#### What is a Service Level Agreement?

- SLA: the foundation of the consumer's trust in the provider
- An SLA contains Service Level Objectives (SLOs)
- SLA & SLOs: the basis for selecting a cloud provider

#### SLA Contents

- A complete, specific definition of each service
- A set of metrics to measure whether the provider is offering the services as guaranteed
- The remedies available to the consumer and the provider if the terms are not satisfied

#### Web Service SLA

- WS-Agreement:
  - XML-based language and protocol for negotiating, establishing, and managing service agreements at runtime
  - Specifies the nature of the agreement template
  - Facilitates discovery of compatible providers
  - Interaction: request-response
  - SLA violations: dynamically managed and verified
- WSLA:
  - Formal XML-schema-based language to express SLAs, plus a runtime interpreter
  - Measures and monitors QoS parameters and reports violations
  - Lacks formal definitions for the semantics of metrics

#### Difference between Cloud SLA and Web Service SLA

- QoS parameters:
  - Traditional web service: response time, SLA violation rate (reliability), availability, cost of service, etc.
  - Cloud computing: QoS related to security, privacy, trust, management, etc.
- Automation:
  - Traditional web service: SLA negotiation, provisioning, service delivery, and monitoring are not automated
  - Cloud computing: SLA automation is required for highly dynamic and scalable service consumption
- Resource allocation:
  - Traditional web service: *UDDI (Universal Description, Discovery and Integration)* for advertising and discovering web services
  - Cloud computing: resources are allocated and distributed globally, without any central directory

#### Types of SLA

- The present marketplace features two types of SLAs:
  - Off-the-shelf (non-negotiable, or direct) SLA
    - Not conducive for mission-critical data or applications
    - The provider creates the SLA template and defines all criteria, viz. contract period, billing, response time, availability, etc.
    - *Followed by present-day state-of-the-art clouds*
  - Negotiable SLA
    - Negotiation via an external agent
    - Negotiation via multiple external agents

#### Service Level Objectives (SLOs)

- Objectively measurable conditions for the service
- Encompass multiple QoS parameters, viz. availability, serviceability, billing, penalties, throughput, response time, or quality
- Examples (encoded programmatically in the sketch after this section):
  - "***Availability*** of service X is 99.9%"
  - "***Response time*** of database query Q is between 3 and 5 seconds"
  - "***Throughput*** of server S at peak load time is 0.875"

#### Service Level Management

- Monitoring and measuring the performance of services based on SLOs
- Provider perspective:
  - Make decisions based on business objectives and technical realities
- Consumer perspective
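The SLO examples above are objectively measurable conditions. The following minimal Python sketch (all class and variable names are hypothetical, not from the source) shows one way to encode such SLOs as data and evaluate measured values against them:

```python
from dataclasses import dataclass

@dataclass
class SLO:
    """One objectively measurable condition from an SLA (hypothetical model)."""
    parameter: str   # e.g. "availability", "response_time"
    lower: float     # inclusive lower bound
    upper: float     # inclusive upper bound

    def satisfied_by(self, measured: float) -> bool:
        return self.lower <= measured <= self.upper

# The three example SLOs from the slide above.
slos = [
    SLO("availability", 0.999, 1.0),          # availability of X is 99.9%
    SLO("response_time", 3.0, 5.0),           # response time of Q is 3-5 s
    SLO("throughput", 0.875, float("inf")),   # throughput of S at peak is 0.875
]

measured = {"availability": 0.9995, "response_time": 4.2, "throughput": 0.91}
for slo in slos:
    ok = slo.satisfied_by(measured[slo.parameter])
    print(f"{slo.parameter}: {'met' if ok else 'VIOLATED'}")
```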
#### Considerations for SLA

- **Business Level Objectives**: Consumers should know *why* they are using the cloud.
- **Responsibilities of the Provider and Consumer**: The balance of responsibilities between provider and consumer varies according to the type of service.
- **Business Continuity and Disaster Recovery**: Consumers should ensure that their cloud providers have adequate protection in case of a disaster.
- **Maintenance**: Maintenance of cloud infrastructure affects every kind of cloud offering (applicable to both software and hardware).
- **Location of Data**: If a cloud provider promises to enforce data-location regulations, the consumer must be able to audit the provider to prove that the regulations are being followed.
- **Seizure of Data**: If law enforcement targets the data and applications associated with a particular consumer, the multi-tenant nature of cloud computing makes it likely that other consumers will be affected. Consumers should therefore consider using a third party to keep backups of their data.
- **Failure of the Provider**: Consumers should consider the financial health of their provider and make contingency plans. The provider's policies for handling the data and applications of a consumer whose account is delinquent or under dispute should also be considered.
- **Jurisdiction**: Consumers should understand the laws that apply to any cloud providers they consider.

#### SLA Requirements

- **Security**: The cloud consumer must understand the controls and federation patterns necessary to meet its security requirements; the provider must understand what it has to deliver to enable those controls and patterns.
- **Data Encryption**: Details of encryption and access-control policies.
- **Privacy**: Isolation of customer data in a multi-tenant environment.
- **Data Retention and Deletion**: Some cloud providers have legal requirements to retain data even after the consumer has deleted it; they must be able to prove compliance with these policies.
- **Hardware Erasure and Destruction**: The provider is required to zero out memory when a consumer powers off a VM, and even to zero out the platters of a disk before it is disposed of or recycled.
- **Regulatory Compliance**: If regulations are enforced on data and applications, the provider should be able to prove compliance.
- **Transparency**: For critical data and applications, providers must proactively notify consumers when the terms of the SLA are breached.
- **Certification**: The provider should be responsible for proving certification of any kind of data or applications.
- **Monitoring**: To eliminate the conflict of interest between provider and consumer, a neutral third-party organization is the best choice to monitor performance.
- **Auditability**: As consumers are liable for any breaches that occur, it is vital that they be able to audit the provider's systems and procedures. An SLA should make clear how and when such audits take place. Because audits are disruptive and expensive, the provider will most likely place limits and charges on them.

#### Low-level Resource Metrics

- Multiple *KPIs* are composed, aggregated, or converted to form high-level *SLOs*
- Examples: downtime, uptime, inbytes, outbytes, packet size, etc.
- A possible mapping (sketched below):

  ***Availability (A) = 1 − (downtime / uptime)***
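A minimal sketch of the KPI-to-SLO mapping above: raw downtime/uptime counters (the values are hypothetical) are converted into the availability SLO, following the slide's definition A = 1 − downtime/uptime.

```python
def availability(downtime_hours: float, uptime_hours: float) -> float:
    """Map low-level KPIs (downtime, uptime) to the high-level availability
    SLO, using the slide's definition A = 1 - downtime/uptime."""
    if uptime_hours <= 0:
        raise ValueError("uptime must be positive")
    return 1.0 - downtime_hours / uptime_hours

# Example: 43.8 hours of downtime against 43,800 hours of uptime.
print(f"A = {availability(43.8, 43_800):.4f}")   # 0.9990 -> meets a 99.9% SLO
```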
#### Industry-defined KPIs

- Monitoring:
  - "Who should monitor the performance of the provider?"
  - "Does the consumer meet its responsibilities?"
  - A neutral third party eliminates conflicts of interest when, for example:
    - the provider reports outages at its sole discretion, or
    - the consumer is itself responsible for an outage
- Auditability:
  - Is the provider adhering to legal regulations or industry standards?
  - The SLA should make clear how and when audits are to be conducted

#### Metrics for Monitoring and Auditing

- **Throughput** -- how quickly the service responds
- **Availability** -- represented as the percentage of uptime for a service in a given observation period (see the monitoring sketch below)
- **Reliability** -- how often the service is available
- **Load balancing** -- when elasticity kicks in (new VMs are booted or terminated, for example)
- **Durability** -- how likely the data is to be lost
- **Elasticity** -- the ability of a given resource to grow infinitely, with limits (the maximum amount of storage, for example) clearly stated
- **Linearity** -- how the system performs as the load increases
- **Agility** -- how quickly the provider responds as the consumer's resource load scales up and down
- **Automation** -- what percentage of requests to the provider are handled without any human interaction
- **Customer service response times** -- how quickly the provider responds to a service request; this refers to the human interactions required when something goes wrong with the on-demand, self-service aspects of the cloud
- **Service-level violation rate** -- expressed as the mean rate of SLA violations
- **Transaction time** -- the time elapsed from when a service is invoked until the transaction completes, including all delays
- **Resolution time** -- the period between detection of a service problem and its resolution

#### SLA Requirements w.r.t. Cloud Delivery Models

*Source: "Cloud Computing Use Cases White Paper", Version 4.0*

#### Example Cloud SLAs

[Comparison table of example cloud SLAs (one column covered Storage-as-a-Service); the table content did not survive extraction.]

#### Service Measurement in Present-day SLAs

- Service measurement:
  - restricted to uptime percentage
  - measured by taking the mean of service availability observed over a specific period of time
  - ignores other parameters like stability, capacity, etc.
  - measurement of parameters is mostly established to the vendor's advantage
- Customers are given access to ticketing systems and are themselves responsible for reporting violations
- Providers do not provide any access to active data streams or audit trails, nor do they report any outages.
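A hypothetical monitoring sketch for two of the metrics defined above: availability over an observation period, derived from an outage log, and the service-level violation rate as the mean rate of missed SLOs. All numbers are made up for illustration.

```python
# Six months of (made-up) outage logs, in hours of downtime per month.
monthly_outage_hours = [0.5, 2.75, 0.0, 1.0, 9.0, 0.25]
hours_per_month = 30 * 24
slo = 0.999                          # the agreed 99.9% availability SLO

violations = 0
for month, down in enumerate(monthly_outage_hours, start=1):
    availability = (hours_per_month - down) / hours_per_month
    met = availability >= slo
    violations += not met
    print(f"month {month}: availability={availability:.5f} met={met}")

# "Service-level violation rate": mean rate of SLA violations.
print(f"violation rate = {violations / len(monthly_outage_hours):.2f}")
```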
#### Gap between QoS Hype and SLA Offerings

- There is a gap between the *QoS hype* and the *SLA offerings* found in reality:
  - QoS claims in the areas of *governance*, *reliability*, *availability*, *security*, and *privacy* often outrun what SLAs actually guarantee
  - there is no formal way to verify whether the SLA guarantees are being complied with or violated
- Proper SLAs are good for both the provider and the customer

#### Expected SLA Parameters

- Infrastructure-as-a-Service (IaaS):
  - CPU capacity, cache memory size, boot time of standard images, storage, scale-up (maximum number of VMs per user), scale-down (minimum number of VMs per user), on-demand availability, scale uptime, scale downtime, auto-scaling, maximum number of VMs configured on physical servers, availability, cost related to geographic locations, and response time
- Platform-as-a-Service (PaaS):
- Software-as-a-Service (SaaS):
  - Reliability, usability, scalability, availability, customizability, and response time
- Storage-as-a-Service:

#### Cloud Properties: Economic Viewpoint

- **C**ommon infrastructure -- pooled, standardized resources, with benefits generated by statistical multiplexing
- **L**ocation independence -- ubiquitous availability meeting performance requirements, with benefits deriving from latency reduction and user-experience enhancement
- **O**nline connectivity
- **U**tility pricing
- on-**D**emand resource**s**

#### Value of Common Infrastructure

- Economies of scale
  - buyer power through volume purchasing
- Statistics of scale
  - for infrastructure built to peak requirements: multiplexing demand yields higher utilization
  - for infrastructure built to less than peak: lower loss of revenue, or a smaller service-level-agreement violation payout

#### A Useful Measure of "Smoothness"

- The coefficient of variation ***CV = σ/µ*** -- not the variance *σ²*, nor the correlation coefficient
- "Smoother" curves have:
  - a large mean for a given standard deviation, or
  - a smaller standard deviation for a given mean
- Importance of *smoothness*: a facility with fixed assets servicing highly variable demand will achieve lower utilization than a similar one servicing relatively smooth demand
- **Multiplexing demand from multiple sources may reduce the coefficient of variation *CV***

#### Coefficient of Variation CV under Aggregation

- Let X₁, X₂, ..., Xₙ be independent random variables for demand, with identical standard deviation *σ* and mean *µ*
- For the aggregated demand X₁ + ... + Xₙ:
  - variance = sum of the variances = *n·σ²*
  - mean = *n·µ*
  - ***CV = √(n·σ²) / (n·µ) = (σ/µ) / √n***
- Adding *n* independent demands thus reduces the CV by a factor of *√n* (simulated in the sketch below)
- Aggregating 100 workloads brings the variability penalty down to 10% of its single-workload value
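A small simulation (illustrative only, using numpy with arbitrary parameters) of the statistics-of-scale argument above: aggregating n independent demands with equal mean and standard deviation reduces the coefficient of variation by √n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, samples = 100.0, 30.0, 50_000

for n in (1, 4, 25, 100):
    # n independent demand streams, aggregated sample-by-sample.
    demand = rng.normal(mu, sigma, size=(samples, n)).sum(axis=1)
    cv = demand.std() / demand.mean()
    print(f"n={n:4d}  CV={cv:.4f}  theory={sigma / (mu * np.sqrt(n)):.4f}")
# n=1 -> CV ~ 0.30 ; n=100 -> CV ~ 0.03 (10% of the single-stream value)
```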
#### But What about Workloads?

- Negatively correlated demands:
  - with demands *X* and *1 − X*, the sum is the constant random variable 1 -- the aggregate variability vanishes
  - achievable through appropriate selection of customer segments
- Perfectly correlated demands:
  - aggregated demand: *n·X*; variance of the sum: *n²·σ²(X)*
  - mean: *n·µ*; standard deviation: *n·σ(X)*
  - the coefficient of variation remains constant
  - simultaneous peaks

#### Common Infrastructure in the Real World

- Correlated demands: private, mid-size, and large-size providers can experience similar statistics of scale
- Independent demands: mid-size providers can achieve statistical economies similar to an infinitely large provider
- Available data on economies of scale for large providers is mixed:
  - everyone uses the same COTS computers and components
  - locating near cheap power supplies is open to all
  - automation tools that early entrants had to build themselves are now supplied by 3rd parties

#### Value of Location Independence

- We used to go to the computers; applications, services, and content now come to us!
  - through networks: wired, wireless, satellite, etc.
- But what about latency?
  - human response latency: 10s to 100s of milliseconds
  - latency is correlated with:
    - distance (first-order effect)
    - routing algorithms of routers and switches (second-order effects)
  - speed of light in fiber: only 124 miles per millisecond
  - if the Google word suggestion took 2 seconds, or a VoIP call had a latency of 200 ms or more, the services would be unusable
- Supporting a global user base requires a dispersed service architecture
  - coordination, consistency, availability, partition tolerance
  - **investment implications**

#### Value of Utility Pricing

- As mentioned before, economies of scale might not be very effective
- But cloud services don't need to be cheaper to be economical!
- Consider a car:
  - buy or lease for INR 10,000/- per day (paid every day, used or not)
  - rent for INR 45,000/- per day (paid only on the days used)
  - if you need a car for only 2 days of a trip, buying would be much more costly than renting

#### Utility Pricing in Detail

- Let demand over a period *T* have peak *P* and average *A*; let *B* be the unit cost of owned capacity and *U* the utility premium (a utility resource costs *U·B* per unit time)
- Total cost with utility pricing: ***C_U = U·B·A·T*** (pay only for what is used)
- Total cost of ownership: ***C_O = B·P·T***, because the owned baseline should be built to peak demand
- Utility pricing wins (*C_U < C_O*) exactly when the utility premium is less than the ratio of peak demand to average demand: ***U < P/A***

#### Utility Pricing in the Real World

- Hybrid models are common: you own a car for the daily commute, and rent a car when travelling or when you need a van to move
- The key factor is again the ratio of peak to average demand
- But other costs must also be considered:
  - network cost (both fixed costs and usage costs)
  - interoperability overhead
  - reliability and accessibility

#### Value of on-Demand Services

- Either pay for unused resources, or suffer the penalty of missing service delivery: ***Penalty ∝ ∫ |D(t) − R(t)| dt***, where *D(t)* is the demand and *R(t)* the provisioned resources
- *If demand is flat, the penalty = 0*
- *If demand is linear, periodic provisioning keeps the penalty bounded*

#### Penalty Costs for Exponential Demand

- If demand is exponential, *D(t) = e^t^*, any fixed provisioning interval *t~p~* based on the current demand falls exponentially behind:
  - *R(t) = e^(t−t~p~)^*
  - *D(t) − R(t) = e^t^ − e^(t−t~p~)^ = e^t^ (1 − e^(−t~p~)^) = k~1~·e^t^*
- Penalty cost ∝ *c·k~1~·e^t^* -- itself exponential (see the numeric sketch below)
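A small numeric sketch of the two cost arguments above, with all constants hypothetical: the utility-vs-ownership break-even U < P/A, and the exponentially growing penalty when demand D(t) = e^t is chased with a fixed provisioning interval t_p.

```python
import math

# --- Utility pricing: own at baseline cost vs rent at a premium U ---
B, T = 1.0, 365.0             # baseline unit cost per day; one-year period
P, A = 500.0, 100.0           # peak and average demand (units)
U = 4.5                       # utility premium (renting costs 4.5x per unit)

cost_owned = B * P * T        # owned capacity must be built to peak
cost_utility = U * B * A * T  # pay only for what is used
print(f"own={cost_owned:.0f} utility={cost_utility:.0f} "
      f"utility wins: {U < P / A}")      # 4.5 < 5.0 -> utility wins

# --- Penalty for exponential demand with fixed provisioning interval ---
t_p = 0.5                     # provisioning lag
k1 = 1 - math.exp(-t_p)       # D(t) - R(t) = k1 * e^t
for t in (1, 2, 4, 8):
    print(f"t={t}: unmet demand = {k1 * math.exp(t):.1f}")  # grows like e^t
```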
#### Coefficient of Variation CV (Definition)

- A statistical measure of the dispersion of data points in a data series around the mean
- The coefficient of variation is the *ratio of the standard deviation to the mean*; it is a useful statistic for comparing the degree of variation of one data series with another, even when their means are drastically different
- In the investing world, the coefficient of variation tells you how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment: the lower the ratio of standard deviation to mean return, the better your risk-return trade-off

#### Assignment 1

The cost to provision a unit cloud resource for unit time is 0.9 units. Calculate the penalty and draw an inference. (Penalty: either pay for unused resources, or suffer the penalty of missing service delivery.)

#### Introduction

- Relational databases: the default data storage and retrieval mechanism since the 80s
  - efficient in transaction processing
  - examples: System R, Ingres, etc.
  - replaced hierarchical and network databases
- Cloud-scale data systems:
  - Google File System (GFS): massively parallel and fault-tolerant distributed file system
  - BigTable: organizes data; similar to column-oriented databases (e.g. Vertica)
  - MapReduce: parallel programming paradigm for large-volume, massively parallel text processing and enterprise analytics
  - Key-value stores: Google App Engine's **Datastore**, Amazon's **SimpleDB**

#### Relational Databases

- RDBMS parser:
  - transforms queries into memory- and disk-level operations
  - optimizes execution time
- Record storage:
  - stores data records on pages of contiguous memory blocks
  - pages are fetched from disk into memory as requested, using pre-fetching and page replacement
- Database file-system layer:
  - independent of the OS file system
  - reason: files used by the DB may span multiple disks, to handle large storage
  - uses parallel I/O systems, viz. RAID disk arrays or multi-disk configurations

#### Data Storage Techniques

- Row-oriented storage:
  - optimal for write-oriented operations, viz. transaction-processing applications
  - relational records are stored on contiguous disk pages
  - accessed through indexes (primary index) on specified columns
  - example: B^+^-tree-like storage
- Column-oriented storage (contrasted with row storage in the sketch below):
  - efficient for data-warehouse workloads
  - aggregation of **measure** columns is performed based on values of **dimension** columns
  - a projection of a table is stored sorted by dimension values
  - requires multiple "join indexes"
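An illustrative sketch (hypothetical data, in-memory lists only) of the two layouts above: the same records stored row-wise, which favors transactional access to whole records, and column-wise, which favors aggregating a measure column by a dimension.

```python
records = [
    ("2024-01-01", "north", 120.0),   # dimensions: date, region; measure: sales
    ("2024-01-01", "south",  80.0),
    ("2024-01-02", "north", 150.0),
]

# Row-oriented: each record is contiguous -- whole rows in one touch.
row_store = list(records)
full_record = row_store[1]            # transactional access: one complete row

# Column-oriented: each column is contiguous -- scan only what a query needs.
col_store = {
    "date":   [r[0] for r in records],
    "region": [r[1] for r in records],
    "sales":  [r[2] for r in records],
}
# Warehouse-style aggregation: SUM(sales) GROUP BY region touches 2 columns.
totals: dict[str, float] = {}
for region, sales in zip(col_store["region"], col_store["sales"]):
    totals[region] = totals.get(region, 0.0) + sales
print(full_record, totals)   # ('2024-01-01','south',80.0) {'north':270.0,'south':80.0}
```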
#### Parallel Database Architectures

- Shared memory:
  - suitable for servers with multiple CPUs
  - the memory address space is shared and managed by a symmetric multi-processing (SMP) operating system
  - the SMP OS schedules processes in parallel, exploiting all the processors
- Shared nothing:
  - a cluster of independent servers, each with its own disk space, connected by a network
- Shared disk:
  - a hybrid architecture: independent server clusters share storage through high-speed network storage, viz. NAS (network-attached storage) or a SAN (storage area network)
  - clusters are connected to storage via standard Ethernet, or faster Fibre Channel or InfiniBand connections

#### Advantages of Parallel DB over Relational DB

- Efficient execution of SQL queries by exploiting multiple processors
- For the **shared-nothing** architecture:
  - tables are partitioned and distributed across multiple processing nodes
  - the SQL optimizer handles distributed joins
  - distributed **two-phase commit** locking provides transaction isolation between processors
- Fault tolerant:
  - system failures are handled by transferring control to a "stand-by" system [for transaction processing]
  - or by restoring computations [for data-warehousing applications]
- Examples of databases capable of handling parallel processing:
  - traditional transaction-processing databases: **Oracle, DB2, SQL Server**
  - data-warehousing databases: **Netezza, Vertica, Teradata**

#### Cloud File Systems

- The Google File System (GFS) is designed to manage relatively large files using a very large distributed cluster of commodity servers connected by a high-speed network
- It handles failures even during reading or writing of individual files; fault tolerance is a necessity
- It supports parallel reads, writes, and appends by multiple simultaneous client programs
- HDFS (Hadoop Distributed File System):
  - open-source implementation of the GFS architecture
  - available on the Amazon EC2 cloud platform

#### GFS Architecture

- Large files are broken up into **chunks** (GFS) or **blocks** (HDFS)
- These are stored on commodity (Linux) servers called **chunk servers** (GFS) or **data nodes** (HDFS)
- Each chunk is replicated **three** times, on different:
  - physical racks
  - network segments

#### Read Operation in GFS

- The client program sends the full path of the file to the **Master** (GFS) or **Name Node** (HDFS)
- The master replies with meta-data for ***one*** of the replicas of the chunk where this data is found
- The client reads the data from the designated chunk server (mimicked in the toy sketch below)

#### Write/Append Operation in GFS

- The client program sends the full path of the file to the **Master** (GFS) or **Name Node** (HDFS)
- The master replies with meta-data for **all** replicas of the chunk where this data is found
- The client sends the data to be appended to all chunk servers
- The chunk servers acknowledge receipt of the data
- The master designates one of the chunk servers as **primary**
- The primary chunk server appends its copy of the data into the chunk by choosing an offset
  - appending can even be done beyond the current **EOF**, to account for multiple simultaneous writers
- The primary sends the chosen offset to each replica
- If all replicas do not succeed in writing at the designated offset, the primary retries

#### Fault Tolerance in GFS

- The master detects chunk-server failures via **heartbeat** messages
- On failure, the chunk server's meta-data is updated to reflect the failure
- For failure of a primary chunk server, the master assigns a new primary
- Clients occasionally still try the failed chunk server; they then update their meta-data from the master and retry
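A toy sketch (hypothetical classes, in-memory only, no networking) of the GFS read path described above: the client asks the master for chunk metadata, then fetches the data directly from one replica's chunk server.

```python
import random

class Master:
    """Heavily simplified master: holds path -> replica metadata."""
    def __init__(self):
        self.meta = {}  # path -> list of chunk-server ids holding the chunk

    def lookup(self, path):
        # Reply with metadata for ONE replica of the chunk.
        return random.choice(self.meta[path])

class ChunkServer:
    def __init__(self):
        self.chunks = {}  # path -> bytes

    def read(self, path):
        return self.chunks[path]

# Wire up one file replicated on three chunk servers.
servers = {i: ChunkServer() for i in range(3)}
for s in servers.values():
    s.chunks["/logs/part-0001"] = b"chunk payload"
master = Master()
master.meta["/logs/part-0001"] = list(servers)

# Client: ask the master for a replica, then read from that chunk server.
replica = master.lookup("/logs/part-0001")
print(servers[replica].read("/logs/part-0001"))
```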
#### BigTable

- A sparse, persistent, multi-dimensional sorted map (**key-value pairs**)
- Data are accessed by:
  - row key
  - column key
  - timestamp
- Each column can store arbitrary **name-value** pairs of the form ***column-family : label***
  - the set of column families is fixed when the table is created; labels within a column family can be created dynamically and at any time

#### BigTable Storage

- Each table is split into different row ranges, called **tablets**
- Each tablet is managed by a **tablet server**, which stores each column family for a given row range in a separate distributed file, called an **SSTable**
- A single meta-data table is managed by a **meta-data server**, which locates the tablets of any user table in response to a read/write request
- The meta-data itself can be very large:
  - the meta-data table can be similarly split into multiple tablets
  - a **root tablet** points to the other meta-data tablets
- BigTable supports large parallel reads and inserts, even simultaneously on the same table
- Insertions are done in sorted fashion, and require more work than a simple append

#### Dynamo

- Developed by Amazon
- A simple **\<key, value\>** pair store, well-suited to web-based e-commerce applications
- Not dependent on any underlying distributed file system (e.g. GFS/HDFS) for:
  - failure handling
  - data replication
  - forwarding write requests to other replicas if the intended one is down
  - conflict resolution

#### Dynamo Architecture

- Objects: **\<key, value\>** pairs with arbitrary arrays of bytes
- MD5 generates a 128-bit hash value from the key
- The range of this hash function is mapped to a **set of virtual nodes** arranged in a ring, so each key maps to one virtual node (see the ring sketch below)
- An object is replicated at a **primary** virtual node as well as at (N − 1) additional virtual nodes, where N is the number of physical nodes
- Each physical node (server) manages a number of virtual nodes at distributed positions on the ring
- This load balancing protects against:
  - transient failures
  - network partitions
- A write request on an object:
  - is executed at one of its virtual nodes
  - which forwards the request to **all** nodes that hold replicas of the object
- **Quorum protocol**: maintains eventual consistency of the replicas when a large number of concurrent writes occur
- Versioning:
  - each write creates a new version of the object, with its local timestamp incremented
  - vector timestamps capture the history of updates; versions superseded by later versions (having larger vector timestamps) are discarded
  - if multiple write operations on the same object occur at the same time, all versions are kept, and any conflicts must be resolved
- **Quorum consistency**:
  - read operations access **R** replicas
  - write operations access **W** replicas
  - if **(R + W) > N**, the system is said to be **quorum consistent**
  - trade-off: for efficient writes (small W), a larger number of replicas must be read; for efficient reads (small R), a larger number of replicas must be written
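A compact, self-contained sketch (toy model, not Dynamo's actual code) of the two ideas above: MD5-based placement of a key on a ring of virtual nodes, and the R + W > N quorum-consistency check.

```python
import hashlib
from bisect import bisect_right

def md5_int(s: str) -> int:
    """128-bit MD5 hash as an integer, as in Dynamo's key placement."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Ring of virtual nodes: 4 physical nodes x 8 virtual positions each.
vnodes = sorted((md5_int(f"node{i}-v{j}"), f"node{i}")
                for i in range(4) for j in range(8))
positions = [p for p, _ in vnodes]

def preference_list(key: str, n_replicas: int):
    """Primary vnode = first vnode clockwise of hash(key); further replicas
    follow around the ring, skipping repeated physical nodes."""
    i = bisect_right(positions, md5_int(key)) % len(vnodes)
    owners, seen = [], set()
    while len(owners) < n_replicas:
        node = vnodes[i % len(vnodes)][1]
        if node not in seen:
            seen.add(node)
            owners.append(node)
        i += 1
    return owners

print(preference_list("cart:alice", 3))   # e.g. ['node2', 'node0', 'node3']

# Quorum consistency: R + W > N guarantees read and write sets overlap.
N, R, W = 3, 2, 2
print("quorum consistent:", R + W > N)    # True
```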
#### Datastore

- Google and Amazon offer simple transactional **\<key, value\>**-pair database stores:
  - Google App Engine's **Datastore**
  - Amazon's **SimpleDB**
- All entities (objects) in Datastore reside in one BigTable table, which does not exploit column-oriented storage:
  - the **entities table** stores data as one column family
- Multiple index tables are used to support efficient queries
- The underlying BigTable is:
  - horizontally partitioned (also called **sharded**) across disks
  - sorted lexicographically by key values
- Besides lexicographic sorting, Datastore enables:
  - efficient execution of **prefix** and **range** queries on key values
  - grouping of entities for transaction purposes: keys are lexicographic by group ancestry, so entities in the same group are stored close together on disk
- Index tables support a variety of queries, using the values of entity attributes as keys:
  - **single-property** indexes support efficient lookup of records selected by a **WHERE** clause
  - **'kind'** indexes support efficient lookup of queries of the form **SELECT ALL**
  - **composite** indexes serve more complex queries
  - the index with the highest selectivity is chosen at query time

#### Introduction

- MapReduce: a programming model developed at Google
- Objective:
  - implement large-scale search
  - text processing on massively scalable web data stored using BigTable and the GFS distributed file system
- Designed for processing and generating large volumes of data via massively parallel computations, utilizing tens of thousands of processors at a time
- Fault tolerant: ensures progress of the computation even if processors and networks fail
- Example:
  - Hadoop: open-source implementation of MapReduce (developed at Yahoo!)
  - available as pre-packaged AMIs on the Amazon EC2 cloud platform

#### Parallel Computing

- Different models of parallel computing arise from the nature and evolution of multiprocessor computer architecture:
  - shared-memory model: assumes that any processor can access any memory location, though with unequal latency
  - distributed-memory model: each processor can access only its own memory, and communicates with other processors using message passing
- Parallel computing was developed for compute-intensive scientific tasks and later found application in the database arena, in the form of the shared-memory, shared-disk, and shared-nothing parallel database architectures described earlier

#### Parallel Efficiency

- If a task takes time *T* on a uniprocessor system, it should take *T/p* when executed on *p* processors
- Inefficiencies are introduced in distributed computation by:
  - the need for synchronization among processors
  - the overheads of message communication between processors
  - imbalance in the distribution of work to processors
- The *parallel efficiency* of an algorithm is defined as ***ε = T / (p · T~p~)***, where *T~p~* is the actual time taken on *p* processors (see the sketch below)
- An algorithm is *scalable* if parallel efficiency remains constant as the size of the data is increased along with a corresponding increase in processors
- An algorithm exhibits *scale-up* if parallel efficiency increases with the size of the data for a fixed number of processors
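A tiny sketch of the definition above: compute ε = T / (p · T_p) from timings (the timings here are hypothetical) and observe how efficiency degrades as processors are added.

```python
def parallel_efficiency(t_uni: float, t_p: float, p: int) -> float:
    """Parallel efficiency: eps = T / (p * T_p)."""
    return t_uni / (p * t_p)

# Hypothetical measurements for a job that takes 100 s on one processor.
for p, t_p in [(2, 55.0), (4, 30.0), (8, 17.5)]:
    print(f"p={p}: eps = {parallel_efficiency(100.0, t_p, p):.2f}")
# eps falls with p (0.91, 0.83, 0.71): overheads grow, so not perfectly scalable
```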
#### Illustration

- **Problem**: Consider a very large collection of documents, say web pages crawled from the entire Internet. The problem is to determine the frequency (i.e., total number of occurrences) of each word in this collection. Thus, if there are *n* documents and *m* distinct words, we wish to determine *m* frequencies, one for each word.
- Two approaches:
  - let each processor compute the frequencies for *m/p* words (scanning the whole collection), or
  - let each processor compute the frequencies of all *m* words across *n/p* documents, followed by merging the partial results
- Assumptions:
  - parallel computing is implemented as a distributed-memory model with a shared disk, so each processor can access any document from disk in parallel with no contention
  - the time to read each word from a document = the time to send a word to another processor = *c*
  - the time to add to a running total of frequencies is negligible
  - each word occurs *f* times in a document (on average)
- Time for computing all *m* frequencies with a single processor: ***T = n · m · f · c***
- First approach:
  - each processor still reads (at most) the entire collection: *n · m · f* reads
  - parallel efficiency: ***ε₁ = (n·m·f·c) / (p · n·m·f·c) = 1/p***
  - efficiency falls with increasing *p*: *not scalable*
  - root cause: each processor reads many words that it need not read, resulting in wasted work
- Second approach:
  - number of reads performed by each processor = *(n/p) · m · f*, taking time *(n/p) · m · f · c*
  - time taken to write the partial frequencies of the *m* words in parallel to disk = *c · m*
  - time taken to communicate partial frequencies to *(p − 1)* processors and then locally add *p* sub-vectors to generate *1/p* of the final *m*-vector of frequencies = *p · (m/p) · c = m · c*
  - parallel efficiency: ***ε₂ = (n·m·f·c) / (n·m·f·c + 2·p·m·c) = 1 / (1 + 2p/(n·f))***
  - efficiency remains constant as both *n* and *p* increase proportionally: *scalable*
  - efficiency tends to 1 for fixed *p* and gradually increasing *n*: *scale-up*

#### MapReduce Model

- A parallel programming abstraction used by many different parallel applications that carry out large-scale computation involving thousands of processors
- Leverages a common underlying fault-tolerant implementation
- Two phases of MapReduce:
  - the map operation
  - the reduce operation
- A configurable number of *M* 'mapper' processors and *R* 'reducer' processors are assigned to work on the problem; the computation is coordinated by a single master process
- **Map phase:**
  - each mapper reads approximately *1/M* of the input from the global file system, using locations given by the master
  - the map operation transforms one set of key-value pairs into another set of key-value pairs
  - each mapper writes its computation results into one file per reducer
  - the files are sorted by key and stored in the local file system; the master keeps track of their locations
- **Reduce phase:**
  - the master informs the reducers where the partial computations are stored in the local files of the respective mappers
  - the reducers make remote-procedure-call requests to the mappers to fetch the files
  - each reducer groups the results of the map step by key and performs a function *f* on the list of values corresponding to each key, producing pairs *(key, f(v₁, v₂, ...))*

#### MapReduce: Example

- Word counting: the map function emits a *(word, 1)* pair for every word occurrence; the reduce function sums the list of counts for each word (see the sketch below)
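A self-contained Python sketch of the word-count example: `map_fn` emits (word, 1) pairs, a shuffle step groups the pairs by key, and `reduce_fn` sums each group, mirroring the map and reduce phases described above. This is a single-process simulation, not a distributed implementation.

```python
from collections import defaultdict

def map_fn(doc_id: str, text: str):
    """Map: transform one set of key-value pairs (doc -> text) into another
    (word -> 1), one pair per word occurrence."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: list):
    """Reduce: apply f (here: sum) to the list of values sharing a key."""
    return word, sum(counts)

docs = {"d1": "the cloud serves the user", "d2": "the user pays for the cloud"}

# Shuffle: group intermediate pairs by key (done by the framework in reality).
groups = defaultdict(list)
for doc_id, text in docs.items():
    for word, one in map_fn(doc_id, text):
        groups[word].append(one)

print(dict(reduce_fn(w, c) for w, c in groups.items()))
# {'the': 4, 'cloud': 2, 'serves': 1, 'user': 2, 'pays': 1, 'for': 1}
```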
#### MapReduce: Fault Tolerance

- Heartbeats: status updates about the tasks assigned to workers are exchanged regularly with the master
- If communication exists but no progress is made, the master duplicates those tasks and assigns them to processors that have already completed their work
- If a worker fails, re-execution of its tasks is required, because its partial computations were written to local files rather than to the global file system

#### MapReduce: Efficiency

- Consider a task on data of size *D* that takes *w·D* time on a uniprocessor (time to read the data from disk + performing the computation + time to write back to disk)
- The computational task is decomposed into map and reduce stages:
  - map stage: mapping time = *c~m~·D*, data produced as output = *σ·D*
  - reduce stage: reducing time = *c~r~·σ·D*, data produced as output = *σ·µ·D*
- Considering no overheads in decomposing a task into map and reduce stages: ***w·D = c~m~·D + c~r~·σ·D***, i.e. *w = c~m~ + σ·c~r~*
- Now use *P* processors that serve as both mappers and reducers in the respective phases
- Additional overhead: each mapper writes to its local disk, followed by each reducer remotely reading from the local disk of each mapper; for analysis purposes, the time *c* to read or write a word locally or remotely is taken to be the same
  - data produced by each mapper = *σ·D/P*, written locally at cost *c·σ·D/P*
  - each reducer reads *σ·D/P²* from each of the *P* mappers, i.e. *σ·D/P* in total, at cost *c·σ·D/P*
- Total time with *P* processors: ***T~P~ = (c~m~·D + c~r~·σ·D + 2·c·σ·D) / P = (w·D + 2·c·σ·D) / P***
- Parallel efficiency of MapReduce: ***ε~MR~ = w·D / (P·T~P~) = 1 / (1 + 2·c·σ/w)***

#### MapReduce: Applications

- Indexing a large collection of documents:
  - an important aspect of web search, as well as of handling structured data
  - the map task emits a *(word, document/record-id)* pair for each occurrence of each word
  - the reduce step groups the pairs by word and creates an index entry for each word
- Relational operations:
  - execute SQL statements (relational joins / group-by) on large data sets
  - advantages over a parallel database: larger scale, and fault tolerance

#### What is OpenStack?

- An open-source cloud platform spanning compute, networking, and storage, exposed through the capabilities below

#### OpenStack Capability

- Software-as-a-Service: browser or thin-client access
- Platform-as-a-Service: built on top of IaaS, e.g. Cloud Foundry
- Infrastructure-as-a-Service: provision compute, network, and storage
- VM lifecycle: provisioning, snapshotting
- Storage for VMs and arbitrary files
- Multi-tenancy: quotas for different projects and users; a user can be associated with multiple projects

#### OpenStack Major Components

- **Nova** (Compute): provisions and manages the lifecycle of compute instances (VMs)
- **Neutron** (Networking): enables *Network-Connectivity-as-a-Service* for other OpenStack services; provides an API for users to define networks and their attachments; has a pluggable architecture that supports many popular networking vendors and technologies
- **Swift** (Object Storage): stores and retrieves arbitrary unstructured data objects via a RESTful, HTTP-based API; highly fault tolerant, with data replication and a scale-out architecture; it writes objects and files to multiple drives, ensuring the data is replicated across a server cluster
- **Cinder** (Block Storage): provides persistent block storage to running instances; its pluggable driver architecture facilitates the creation and management of block storage devices
- **Keystone** (Identity): provides an authentication and authorization service for the other OpenStack services, and a catalog of endpoints for all OpenStack services
- **Glance** (Image): stores and retrieves virtual-machine disk images; OpenStack Compute makes use of this during instance provisioning
- **Ceilometer** (Telemetry): monitors and meters the OpenStack cloud for billing, benchmarking, scalability, and statistical purposes
- **Horizon** (Dashboard): provides a web-based self-service portal to interact with the underlying OpenStack services, such as launching an instance, assigning IP addresses, and configuring access controls
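These components are normally driven through OpenStack's APIs. The sketch below assumes the `openstacksdk` Python library and a pre-configured cloud entry named `mycloud` in clouds.yaml; the image, flavor, and network names are hypothetical. It shows Keystone-backed authentication followed by a Nova server launch.

```python
import openstack

# Connect: openstacksdk reads clouds.yaml; Keystone issues the auth token
# used by every subsequent API call. "mycloud" is a hypothetical entry.
conn = openstack.connect(cloud="mycloud")

# Glance image, Nova flavor, Neutron network (all names hypothetical).
image = conn.compute.find_image("ubuntu-22.04")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

# Nova provisions the VM; the scheduler picks a host, as in the flow below.
server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)   # poll until ACTIVE
print(server.status)
```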
#### OpenStack Architecture and Work Flow

(Figure-only slides: architecture of OpenStack, OpenStack work flow, and auth-token usage.)

#### Provisioning Flow

- Nova API makes an rpc.cast to the Scheduler: it publishes a short message to the scheduler queue with the VM request
- The Scheduler picks the message up from the MQ
- The Scheduler fetches information about the whole cluster from the database, filters and selects a compute node, and updates the DB with its ID
- The Scheduler publishes a message to the compute queue (based on the host ID) to trigger VM provisioning
- Nova Compute gets the message from the MQ
- Nova Compute makes an rpc.call to Nova Conductor for information on the VM from the DB
- Nova Compute makes a call to the Neutron API to provision the network for the instance
- Neutron configures the IP, gateway, DNS name, L2 connectivity, etc.
- It is assumed a volume has already been created; Nova Compute contacts Cinder to get the volume data (volumes can also be attached after the VM is built)

#### Nova Compute Driver and Scheduler Filtering

(Figure-only slides; the scheduler's filtering step is mimicked in the sketch at the end of this document. Further figure-only slides covered the Neutron, Glance, Cinder, and Keystone architectures.)

#### OpenStack Storage Concepts

- **Ephemeral storage:**
  - persists until the VM is terminated
  - accessible from within the VM as a local file system
  - used to run the operating system and/or as scratch space
  - managed by Nova
- **Block storage:**
  - persists until specifically deleted by the user
  - accessible from within the VM as a block device (e.g. /dev/vdc)
  - used to add additional persistent storage to a VM and/or to run the operating system
  - managed by Cinder
- **Object storage:**
  - persists until specifically deleted by the user
  - accessible from anywhere
  - used to store files, including VM images
  - managed by Swift

#### Summary

- The user logs into Horizon and initiates VM creation
- Keystone authorizes the request
- Nova initiates provisioning and saves state to the DB
- The Nova Scheduler finds an appropriate host
- Neutron configures networking
- Cinder provides the block device
- The image URI is looked up through Glance
- The image is retrieved via Swift
- The VM is rendered by the hypervisor
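To make the scheduler's filter-and-select step in the provisioning flow concrete, here is a toy sketch with hypothetical host data; it is not the actual Nova filter-scheduler code. It filters hosts on resource constraints and weighs the survivors.

```python
# Toy model of Nova's filter scheduler: filter hosts that cannot satisfy the
# request, then pick the "best" remaining host. Illustrative data only.
hosts = [
    {"name": "compute1", "free_ram_mb": 2048,  "free_vcpus": 1},
    {"name": "compute2", "free_ram_mb": 16384, "free_vcpus": 8},
    {"name": "compute3", "free_ram_mb": 8192,  "free_vcpus": 4},
]
request = {"ram_mb": 4096, "vcpus": 2}   # flavor of the requested VM

# Filtering: drop hosts lacking the resources (cf. Nova's RAM/CPU filters).
candidates = [h for h in hosts
              if h["free_ram_mb"] >= request["ram_mb"]
              and h["free_vcpus"] >= request["vcpus"]]

# Weighing: prefer the host with the most free RAM left after placement.
chosen = max(candidates, key=lambda h: h["free_ram_mb"] - request["ram_mb"])
print(f"scheduling VM on {chosen['name']}")   # -> compute2
```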