Cloud Computing - IaaS Management Techniques

Questions and Answers

What technique is used to save storage space by reducing duplicate data in VM image storage?

  • Compression
  • Encryption
  • Fragmentation
  • Deduplication (correct)

What is the primary function of the virtualization layer in a virtualized system?

  • To monitor the performance of applications
  • To allocate resources to physical machines
  • To orchestrate sharing of physical resources between VMs (correct)
  • To manage network traffic

Which of the following is NOT an objective of advanced IaaS management as described?

  • Demonstrate economies of scale in virtualized environments
  • Present techniques for saving storage space
  • Explain fully integrated management of software and hardware (correct)
  • Show how VM migration operates across hosts

    What does VM migration allow in terms of resource management?

    Decoupled management of OS and physical resources

    Which part of the course outline focuses on techniques to reduce storage consumption?

    Part A

    What is the primary benefit of deduplicating RAM in co-hosted VMs?

    Reduction of memory overhead

    What is the main purpose of deduplication at the main memory level?

    To reduce the amount of duplicate data stored in memory

    Which concept is highlighted in providing economies of scale within IaaS management?

    Resource multiplexing

    Which storage system primarily has a 'Only W' operation mode and does not handle mutable data?

    Archives

    What type of data is typically deduplicated in a backup storage system?

    Mostly mutable data

    What challenge does virtualization attempt to address regarding non-virtualized systems?

    Underutilization of physical resources

    Which of the following describes the purpose of memory-mapped files in the deduplication example?

    To allow multiple VMs to access the same data efficiently

    In which scenario would deduplication in a file system be advantageous?

    When the same data is written multiple times

    What aspect of modern processors aids in memory deduplication?

    Integrated virtual memory support

    What is a common issue raised when multiple VMs use the same OS distribution?

    Higher memory occupancy

    Which type of storage system is most associated with high throughput but primarily operates in read mode?

    Archives

    What is a key advantage of VM migration compared to process migration?

    Hypervisor manages the migration process.

    Which one of the following is NOT a reason for performing VM migration?

    Increased startup time

    What occurs after the guest OS is halted during VM migration?

    The last dirtied pages are copied to the destination.

    What must be reconstructed by the hypervisor on the destination machine?

    Guest physical to host physical page table

    What significant data is migrated along with a virtual machine?

    Kernel data structures and active network connections

    How often did Google reportedly perform VM migrations in 2018?

    More than 1,000,000 times a month

    What is the purpose of the small paravirtualization layer during VM migration?

    To handle the restart of the VM.

    What is the total downtime observed during the migration of the Apache Web server VM?

    165 milliseconds

    What is one of the main challenges faced during process migration?

    Adapting to shared resources complexity

    Which of these metrics indicates a significant performance drop during the first precopy of the web server VM?

    870 Mbit/sec

    Which statement about VM migration is incorrect?

    Kernel and resources are not migrated during VM migration.

    How much memory does the VM being migrated use?

    800 MB

    What is the primary goal of deduplication in cloud storage systems?

    To eliminate duplicate data and save storage space

    What does VM migration allow for in a cloud computing context?

    Dynamic allocation of resources across VMs

    What aspect makes VM migration simpler compared to process migration?

    The hypervisor has full control over virtual resources.

    What was the static page size being served by the Apache Web server VM during the test?

    512 KB

    What does inter-VM caching aim to achieve in cloud computing?

    Speed up I/O by caching disk pages

    What was the throughput after further iterations of migration of the web server VM?

    694 Mbit/sec

    Which technology allows for very fast access between VM memories?

    Remote Direct Memory Access (RDMA)

    In what way does the Puma approach benefit virtual machines?

    It enables sharing of free memory even when VMs are not on the same physical host

    What is a likely consequence of having many copies of the same data in cloud platforms?

    Wasted storage space

    How does the deduplication process work?

    It keeps a single copy of common data while replacing others with pointers

    What type of network interface card (NIC) is required for the RDMA technology?

    Specified RDMA-capable NIC

    What common benefit does deduplication provide to cloud applications?

    Lower costs associated with data storage

    What is the main purpose of structuring a partition into several groups?

    To enhance the locality between metadata and data.

    What role does the super-block play in a partition structure?

    It contains inode and block bitmaps.

    Why is it necessary to detect when a block is not referenced?

    To reclaim disk space that is no longer needed.

    How does deduplication of data blocks impact storage?

    It helps in conserving disk space by avoiding duplicate data.

    What does the inode and block bitmap allow in a partition structure?

    To know the status of blocks as free or used.

    What challenge arises when handling modifications to a block referenced by several files?

    Ensuring modifications do not cause data loss in all files referencing it.

    What is the significance of maximizing locality between metadata and data?

    It increases efficiency by reducing overheads caused by disk seeks.

    What is a key consideration in avoiding storing a block twice within a group?

    Establishing a method to know if a block already exists.

    Study Notes

    Cloud Computing - Lesson 8: IaaS Management

    • Course covers advanced IaaS management techniques
    • Describes economies of scale in virtualized environments
    • Explains VM migration for decoupled OS and physical resource management

    Announcements

    • Quizzes on lectures 5 and 6 are closed
    • Peer review available until Wednesday, November 20th (before class)
    • Another quiz on lectures 7 and 8 (current and next lecture) will be available next week
    • Final quiz on "Big Data" lectures
    • Donatien will give lecture 10 on Big Data processing/stream processing (his PhD topic)

    Outline

    • Part A: Saving Storage Space
      • VM memory sharing
      • Deduplication at memory level
      • Deduplication for VM image storage
    • Part B: Dynamic IaaS Management
      • VM migration across hosts

    Recap: Virtualization

    • The virtualization layer (hypervisor) orchestrates the sharing of physical resources between VMs
    • Each guest OS sees only the resources allocated to it

    Recap: Virtualization Modes

    • 1st Generation: Full virtualization by binary rewriting (software-based)
    • 2nd Generation: Paravirtualization (software-based collaborative virtualization) requiring modified guest OS
    • 3rd Generation: Hardware-assisted full virtualization (software+hardware-based)

    VM Memory Sharing

    • Hypervisor assigns fixed memory to each VM
    • Predictable and simple but estimating needs can be difficult
    • Ballooning: dynamic memory allocation, changing memory frames allocated to VMs
    • Requires paravirtualization with a specific driver on each VM OS kernel
    • Releasing memory requires access to process memory context (page tables) and semantics.

    Other Approaches

    • Puma: inter-VM memory sharing that lets VMs borrow/lend free memory (not necessarily on the same physical host)
    • RDMA: Remote Direct Memory Access, allowing direct (fast) access between VM memories, but requires specific RDMA-capable NICs

    Data Duplication

    • Cloud platforms handle large amounts of data from many companies, applications, and users.
    • Many copies of the same data frequently coexist across clouds and even within a single cloud.
    • Duplication wastes storage space.
    • Deduplication automatically removes duplicate data in a storage system.
    • General principle: detect common data, then keep a single copy and replace the others with pointers, across all applications and systems.
    • Effectiveness is measured by the deduplication ratio (size before ÷ size after).
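
As a small worked example of the deduplication ratio (size before ÷ size after), the sketch below uses made-up figures: ten 4 GiB VM images that reduce to 8 GiB of unique data. The function name and the numbers are illustrative assumptions, not values from the lecture.

```python
def deduplication_ratio(bytes_before: int, bytes_after: int) -> float:
    """Return the size before deduplication divided by the size after."""
    if bytes_after <= 0:
        raise ValueError("bytes_after must be positive")
    return bytes_before / bytes_after


GIB = 2**30
# Hypothetical figures: 10 images of 4 GiB each, 8 GiB left after deduplication.
ratio = deduplication_ratio(10 * 4 * GIB, 8 * GIB)
print(f"deduplication ratio: {ratio:.1f}:1")
```

A ratio of 5:1 means the deduplicated store needs one fifth of the original space.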

    Deduplication Targets

    • RAM: minimal overhead, reads/writes (R/W)
    • File System: low delays, high throughput (R/W)
    • VM storage: low delays, high throughput (R/W, but rare)
    • Backups: high throughput (R/Mostly W, but rare)
    • Archives: high throughput (Only W)

    Example 1: Deduplicating RAM

    • Several VMs on the same physical host run the same OS type (e.g., GNU/Linux, base image from the IaaS provider).
    • They therefore hold the same shared libraries and executable files, and in some cases the same data.
    • In Docker, common filesystem (FS) layers are reused, but the hypervisor has no knowledge of the FS used by VMs.
    • Memory-mapped files occupy a whole number of pages, so identical files (or identical data downloaded by several VMs) produce identical pages.
    • Pages holding the same data exist multiple times across VM memory spaces; deduplication stores each of them once.

    VMWare ESX

    • Third-generation "bare-metal" hypervisor
    • Paravirtualization and hardware assisted memory management
    • Each guest OS manages what it sees as its own physical memory (which is in fact virtualized)
    • Hypervisor maps guest "virtual physical memory" to actual physical memory (page table levels).

    Deduplication in VMWare ESX: Transparent Page Sharing (TPS)

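    • Offline deduplication: duplicates are detected lazily by a background scan rather than at write time.
    • The scan walks physical memory and checks the content of pages.
    • A hash of each page's content is used for fast searching in a hash table.
    • When hash values match, the pages are candidates for sharing; a full content comparison confirms they are identical before they are shared.
    • The second-level page table is updated when pages are shared.
    • Sharing is transparent to the guest OS because shared pages are protected by copy-on-write.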
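
The scan-and-share idea above can be sketched as follows. This is a toy model under stated assumptions, not VMware's implementation: pages are Python `bytes` objects, `zlib.crc32` stands in for the non-cryptographic page hash, and `share_pages` is a hypothetical helper name.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def share_pages(pages: list[bytes]) -> tuple[list[int], list[bytes]]:
    """Map each scanned page to a frame index, storing identical pages once.

    Returns (mapping, frames), where mapping[i] is the frame backing page i.
    """
    frames: list[bytes] = []             # unique page contents
    by_hash: dict[int, list[int]] = {}   # hash value -> candidate frame indices
    mapping: list[int] = []
    for page in pages:
        h = zlib.crc32(page)  # stand-in for a fast non-cryptographic hash
        for frame_idx in by_hash.get(h, []):
            # A hash match is only a hint: confirm with a full comparison.
            if frames[frame_idx] == page:
                mapping.append(frame_idx)
                break
        else:
            # No identical frame found: keep this page as a new frame.
            frames.append(page)
            by_hash.setdefault(h, []).append(len(frames) - 1)
            mapping.append(len(frames) - 1)
    return mapping, frames


zero_page = bytes(PAGE_SIZE)
lib_page = b"\x7fELF" + bytes(PAGE_SIZE - 4)
private_page = b"x" * PAGE_SIZE
mapping, frames = share_pages([zero_page, lib_page, zero_page, lib_page, private_page])
print(mapping, len(frames))
```

Five scanned pages end up backed by three frames; the repeated pages point at the frame that was stored first.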

    Hash Function

    • A function over the content of the block that produces a value in a bounded hash space (e.g., 128 bits).
    • Values should be uniformly distributed; a non-cryptographic hash (e.g., not MD5) is sufficient for checking whether two pages may be the same.

    Copy-on-Write

    • Before VM 1 modifies page C: VM 1 and VM 2 both map the same shared pages A, B, and C.
    • After VM 1 modifies page C: VM 1 maps A, B, and a private copy of C; VM 2 still maps the original A, B, and C.
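
The before/after picture can be reproduced with a minimal sketch. It assumes a toy `Host` class (a hypothetical name) in which each VM's page table is a plain dict from page name to frame id; it is meant only to illustrate the copy-on-write rule, not a hypervisor's data structures.

```python
class Host:
    """Toy host memory: frames hold content and count how many mappings use them."""

    def __init__(self) -> None:
        self.frames: dict[int, bytes] = {}  # frame id -> content
        self.refs: dict[int, int] = {}      # frame id -> number of mappings
        self._next = 0

    def new_frame(self, content: bytes) -> int:
        fid = self._next
        self._next += 1
        self.frames[fid] = content
        self.refs[fid] = 0
        return fid

    def map(self, table: dict[str, int], page: str, fid: int) -> None:
        table[page] = fid
        self.refs[fid] += 1

    def write(self, table: dict[str, int], page: str, content: bytes) -> None:
        fid = table[page]
        if self.refs[fid] > 1:
            # Shared frame: give the writer a private copy first.
            self.refs[fid] -= 1
            fid = self.new_frame(self.frames[fid])
            self.map(table, page, fid)
        self.frames[fid] = content


host = Host()
vm1: dict[str, int] = {}
vm2: dict[str, int] = {}
for name in ("A", "B", "C"):
    fid = host.new_frame(name.encode())
    host.map(vm1, name, fid)
    host.map(vm2, name, fid)

host.write(vm1, "C", b"C'")  # VM 1 modifies page C
print(vm1, vm2)
```

After the write, VM 1's page C points at a new private frame while VM 2 still sees the original content.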

    Effectiveness of TPS in ESX

    • Real-world page sharing in production deployments.
    • Datasets of total, shared, and reclaimed memory (MB) across VMs

    Performance Impact (Pshare == TPS)

    • Shows performance impact results from testing on different types of workloads using PS/TPS against a baseline.

    Example 2: Deduplicating a File System for VM Image Storage

    • VM images are stored in Glance (OpenStack) or OpenNebula's image service.
    • Number of VM Images is often high with multiple base VM images, different OS versions, and many images per user.
    • VM images are often several gigabytes in size.

    Potential for Deduplication in VM Images

    • Study data suggests significant potential for deduplication in VM images with similar OS types/distributions (same OS like Fedora 9 but different installations).

    Case Study: LiveDFS

    • Adapts a legacy file system (ext3) to support deduplication.
    • Uses inline deduplication to detect duplicates during write operations.
    • Adapted to commodity hardware without special hardware requirements.
    • Integrated into OpenStack.

    LiveDFS Design Goals

    • Ensure performance of VM operations
    • Avoid significant impact on VM startup performance.
    • Compatibility with existing tools through the POSIX file system interface, including deletion operations.
    • Target "low-cost" commodity hardware (memory is limited).

    LiveDFS Implementation

    • Extends an existing filesystem (ext3) as a kernel module.
    • Minimizes modifications to original filesystem interface.
    • Deduplication at fixed-size block level.

    Structure of an ext3 inode

    • Explains the components of an ext3 inode, its structure, function and data storage.

    ext3 File System Structure

    • Describes the organization of partitions by groups with metadata and block IDs for maximized locality and decreased disk seeks.

    Goal: Deduplicate Data Blocks

    • Describes the structure of partitions as split in groups to maximize locality and reduce disk seeks by storing metadata and data close to each other.

    How To...

    • Check whether a block already exists in the group and avoid storing it twice.
    • Manage modifications to a block referenced by several files after deduplication.
    • Detect when blocks are not referenced, freeing disk space.

    Detect that a Block Already Exists

    • Associates 16-byte MD5 hash fingerprints with each block.
    • Maintains a listing of fingerprints.
    • When a new block is written, checks if the fingerprint already exists in the store.
    • Checks content bit by bit if fingerprints match.
    • Stores references to existing blocks and manages fingerprint list updates.
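
A minimal sketch of this check, assuming an in-memory toy store rather than LiveDFS's on-disk layout. `hashlib.md5` yields the 16-byte fingerprint; `BlockStore` and its fields are hypothetical names used for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


def fingerprint(block: bytes) -> bytes:
    """Return the 16-byte MD5 fingerprint of a block."""
    return hashlib.md5(block).digest()


class BlockStore:
    """Toy block store that writes each distinct block only once."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []      # block number -> content
        self.index: dict[bytes, int] = {}  # fingerprint -> block number

    def write(self, block: bytes) -> tuple[int, bool]:
        """Store a block; return (block number, True if it was a duplicate)."""
        fp = fingerprint(block)
        existing = self.index.get(fp)
        # Matching fingerprints are confirmed by comparing the full content.
        if existing is not None and self.blocks[existing] == block:
            return existing, True
        self.blocks.append(block)
        # On a fingerprint collision the first block keeps the index entry;
        # the colliding block is still stored, just not deduplicated later.
        self.index.setdefault(fp, len(self.blocks) - 1)
        return len(self.blocks) - 1, False


store = BlockStore()
first = store.write(b"a" * BLOCK_SIZE)
second = store.write(b"b" * BLOCK_SIZE)
again = store.write(b"a" * BLOCK_SIZE)
print(first, second, again, len(store.blocks))
```

The third write returns the block number of the first one and stores nothing new.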

    Maintaining and Accessing the List of Fingerprints

    • Single 4TB disk with 4KB blocks and 16-byte fingerprints.
    • 2^30 blocks, requiring 2^34 bytes or 16GB for fingerprints.
    • Storing fingerprints on disk to avoid memory limitations.
    • Uses a two-step approach (FP filter/store) to handle write operations, allowing disk access for checking, without memory overload.
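
The sizing argument above can be checked with a few lines of arithmetic. The sketch interprets TB, KB, and GB as binary units (2^40, 2^10, 2^30 bytes), which is what the 2^30 and 2^34 figures imply; the function name is an illustrative assumption.

```python
def fingerprint_space(disk_bytes: int, block_bytes: int, fp_bytes: int) -> tuple[int, int]:
    """Return (number of blocks, bytes needed to hold one fingerprint per block)."""
    blocks = disk_bytes // block_bytes
    return blocks, blocks * fp_bytes


blocks, total = fingerprint_space(4 * 2**40, 4 * 2**10, 16)
print(f"blocks = 2^{blocks.bit_length() - 1}")
print(f"fingerprints = {total // 2**30} GiB")
```

With 8 GB of RAM on the test machine described later in the notes, 16 GiB of fingerprints would not fit in memory, which motivates keeping the full list on disk.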

    LiveDFS Block Writing Procedure

    • Presents the flow of deduplication for block writing operations (Memory Access, Verdict, Consequence).
    • Identifies "NEW BLOCK" or "EXISTING BLOCK" for deciding how to write to disk.

    Fingerprint Store (On Disk)

    • Allocates disk blocks to store fingerprints.
    • Indexed by block number.
    • Does not allow direct block lookups by fingerprints; the in-memory fingerprint filter performs the lookup.
    • Each entry includes a reference count.
    • The counter increases when blocks are referenced.
    • The block is removed if the counter drops to 0.
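
The reference-counting behaviour can be sketched as below. This is an in-memory stand-in for the on-disk structure: a dict keyed by block number plays the role of the block-indexed array, and `FingerprintStore`, `Entry`, and the method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    fingerprint: bytes  # 16-byte MD5 of the block content
    refcount: int       # number of file references to this block


class FingerprintStore:
    """Toy fingerprint store, indexed by block number."""

    def __init__(self) -> None:
        self.entries: dict[int, Entry] = {}

    def add_block(self, block_no: int, data: bytes) -> None:
        """Record a newly written block with a single reference."""
        self.entries[block_no] = Entry(hashlib.md5(data).digest(), 1)

    def reference(self, block_no: int) -> None:
        """Another file now points at this block."""
        self.entries[block_no].refcount += 1

    def release(self, block_no: int) -> bool:
        """Drop one reference; return True if the block can be freed."""
        entry = self.entries[block_no]
        entry.refcount -= 1
        if entry.refcount == 0:
            del self.entries[block_no]
            return True
        return False


fp_store = FingerprintStore()
fp_store.add_block(42, b"shared data")
fp_store.reference(42)                # a second file reuses block 42
freed_first = fp_store.release(42)    # one file deleted: block still in use
freed_second = fp_store.release(42)   # last reference gone: block can be freed
print(freed_first, freed_second)
```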

    Fingerprint Filter (In Memory)

    • Hash table.
    • Key = first n + k bits of fingerprint; value = bucket.
    • Holds multiple values per key if possible.
    • Efficient lookup and possible list handling for fingerprint values.
    • Stores updated entries from the block (when it is written) with new information.
    • On a write, the filter is consulted first; on a filter hit, the FP store is checked to confirm a match with an existing block; if there is no match, a normal write is performed.
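
One way to picture the filter is sketched below. It assumes that the first n bits of the fingerprint select a bucket and the next k bits are stored inside the bucket next to the block number; this split, the default widths, and the class name `FingerprintFilter` are assumptions made for illustration. Because only n + k bits are kept, a hit is a candidate that still has to be confirmed against the full fingerprint in the FP store.

```python
class FingerprintFilter:
    """Toy in-memory filter keyed on a prefix of each fingerprint."""

    def __init__(self, n_bits: int = 16, k_bits: int = 16) -> None:
        self.n = n_bits
        self.k = k_bits
        # index key (first n bits) -> list of (bucket key, block number)
        self.buckets: dict[int, list[tuple[int, int]]] = {}

    def _split(self, fp: bytes) -> tuple[int, int]:
        value = int.from_bytes(fp, "big")
        total_bits = len(fp) * 8
        index_key = value >> (total_bits - self.n)
        bucket_key = (value >> (total_bits - self.n - self.k)) & ((1 << self.k) - 1)
        return index_key, bucket_key

    def insert(self, fp: bytes, block_no: int) -> None:
        index_key, bucket_key = self._split(fp)
        self.buckets.setdefault(index_key, []).append((bucket_key, block_no))

    def candidates(self, fp: bytes) -> list[int]:
        """Return block numbers whose first n + k fingerprint bits match."""
        index_key, bucket_key = self._split(fp)
        return [blk for key, blk in self.buckets.get(index_key, []) if key == bucket_key]


# Hand-made 16-byte fingerprints so the prefixes are easy to read.
fp_stored = bytes([0x12, 0x34, 0x56, 0x78]) + bytes(12)
fp_other_bucket_key = bytes([0x12, 0x34, 0x56, 0x79]) + bytes(12)
fp_same_prefix = bytes([0x12, 0x34, 0x56, 0x78]) + bytes(11) + b"\x01"

fp_filter = FingerprintFilter()
fp_filter.insert(fp_stored, block_no=7)
print(fp_filter.candidates(fp_stored))            # exact fingerprint: candidate found
print(fp_filter.candidates(fp_other_bucket_key))  # differs within the first n + k bits
print(fp_filter.candidates(fp_same_prefix))       # false positive, needs the FP store
```

Larger n and k reduce false positives at the cost of more RAM per entry.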

    Generation and Size of FP Filter

    • Re-generating FP filter in RAM during file system mounting.
    • Authors report ~6 minutes per TB of data; this should be faster with SSD drives.
    • Size of the filter depends on the values of n and k, influencing false positives and RAM usage.
    • Size remains relatively small regardless of disk size (it remains the same), thus avoids memory overload, but must be multiplied by number of disks.

    Handling Modifications to Deduplicated Blocks

    • If a block is modified, the change must not be visible for other files referencing that same original block.
    • Uses copy-on-write principle if reference count ≥ 2.
    • Copies modified blocks to a new location, updates the inode, decrements the reference counter and then applies the write to the new block.
    • This can involve fragmentation and increased disk seeks.
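
The steps above can be sketched as a single function. The model is deliberately simple and assumed for illustration: an inode is a list of block numbers, blocks and reference counts live in dicts, and the allocator just takes the next unused number.

```python
def cow_write(inode: list[int], index: int, data: bytes,
              blocks: dict[int, bytes], refcount: dict[int, int]) -> int:
    """Write data to the file block at position index; return the block number used."""
    old = inode[index]
    if refcount[old] >= 2:
        new = max(blocks) + 1      # hypothetical allocator: next unused block number
        blocks[new] = blocks[old]  # copy the block to a new location
        refcount[new] = 1
        inode[index] = new         # update the inode to point at the copy
        refcount[old] -= 1         # the original loses one reference
        target = new
    else:
        target = old               # sole owner: write in place
    blocks[target] = data          # apply the write
    return target


blocks = {0: b"base"}
refcount = {0: 2}
file_a = [0]
file_b = [0]

used_by_a = cow_write(file_a, 0, b"edit", blocks, refcount)
used_by_b = cow_write(file_b, 0, b"other", blocks, refcount)
print(used_by_a, used_by_b, file_a, file_b, refcount)
```

The first write is redirected to a fresh block because the original is shared; the second happens in place because file_b is now the only owner. The redirected block is no longer adjacent to its neighbours, which is where the extra seeks come from.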

    Prefetching and Journaling

    • Stream-based block writing for VMs.
    • When one block's write consults the FP store, there is a high probability that the following blocks in the stream will need the neighbouring FP store entries.
    • Prefetching mechanism pre-reads more FP entries.
    • Journaling for a series of write updates to the disk to recover a stable state.
    • Updates for FP store are also integrated into the journal.

    Integration in OpenStack

    • LiveDFS implementation as a kernel-space module.
    • Compliant with POSIX and operates transparently at the VFS level.
    • Mounts VM image storage partition using LiveDFS.

    Performance Results

    • Testing on a single low-end server (Intel Core i5 760, 8GB RAM, 1TB HDD).
    • Comparison with an unmodified ext3 file system.
    • Testing with different variants (spatial locality, prefetching, journaling).

    Sequential Writes

    • Performance results from testing sequential write requests using multiple variants of LiveDFS compared to the Ext3FS control group.

    Sequential Reads

    • Performance results from testing sequential read requests using multiple variants of LiveDFS compared to the Ext3FS control group.

    Sequential Duplicate Writes

    • Writing a file with the same content as an existing one already on disk.
    • Cost is writing only entries to the FP store (small).
    • No journaling implies disk seeks are possible when blocks aren't in order (sequential).

    OpenStack Integration Evaluation

    • Evaluation of space saving goals.
    • Impact on VM startup/saving times.
    • Authors use a collection of VM images for evaluation of the OpenStack integration.

    Space Usage

    • VM images consume a large amount of storage.
    • Deduplication reduces the amount of data stored.
    • Shows comparison between LiveDFS and EXT3FS with zero blocks.

    Store/Startup Time

    • Time to write a VM image, and time to start a VM.
    • Highlights performance improvements from LiveDFS across various workloads/scenarios, including sequential writes and reads.

    Migrating a Complex Web Server (SPECweb99)

    • Benchmark involving intensive disk/network operations with many concurrent connections.

    Migrating an Interactive Application (Quake 3)

    • Testing with 6 concurrent players to simulate an interactive scenario.

    Quake 3 Server VM Migration: Impact on Clients

    • Evaluates the impact of the migration operations on the latency of client packets.

    Paravirtualized Optimizations

    • OS-level driver improvements for better performance results.
    • Free page cache pages optimization for VM use.
    • Return pages to Xen hypervisor and other VMs to decrease size of first-copy iteration.
    • Kernel monitoring to stop processes with excessive memory use.
    • Write Working Set monitoring for efficient process management.
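
To show why shrinking the first copy iteration and tracking the writable working set matter, here is a rough simulation of iterative pre-copy. It assumes the usual scheme in which every page is sent once while the VM runs, pages dirtied in the meantime are re-sent in later rounds, and the guest is halted only for the final small set. The function name, the threshold, and the dirtying pattern are hypothetical.

```python
def precopy_migrate(num_pages: int, dirtied_per_round: list[set[int]],
                    stop_threshold: int) -> tuple[list[int], int]:
    """Simulate pre-copy migration.

    dirtied_per_round[i] is the set of pages the guest writes while round i
    is being copied. Returns (pages sent in each live round, pages sent
    while the guest is halted).
    """
    to_send = set(range(num_pages))  # first iteration copies every page
    sent_live: list[int] = []
    for dirtied in dirtied_per_round:
        if len(to_send) <= stop_threshold:
            break                    # remaining set is small enough to stop
        sent_live.append(len(to_send))
        to_send = set(dirtied)       # pages written during this round
    # The guest is halted here and the last dirtied pages are copied.
    return sent_live, len(to_send)


live_rounds, halted_pages = precopy_migrate(
    num_pages=1000,
    dirtied_per_round=[set(range(100)), set(range(20)), set(range(5))],
    stop_threshold=10,
)
print(live_rounds, halted_pages)
```

In this run the guest is stopped for only 5 pages instead of 1000; freeing page-cache pages before migration would reduce the first entry of `live_rounds`.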

    Conclusion

    • Overall, both deduplication and VM migration mechanisms optimize IaaS utilization, improving storage/resource use and reducing costs.

    Description

    This quiz explores advanced techniques in IaaS management, focusing on economies of scale in virtualized environments. Students will learn about VM migration and resource management strategies that decouple operating systems from physical resources. Prepare to test your knowledge on the intricacies of virtualization and dynamic management strategies!
