Questions and Answers
What technique is used to save storage space by reducing duplicate data in VM image storage?
What is the primary function of the virtualization layer in a virtualized system?
Which of the following is NOT an objective of advanced IaaS management as described?
What does VM migration allow in terms of resource management?
Which part of the course outline focuses on techniques to reduce storage consumption?
What is the primary benefit of deduplicating RAM in co-hosted VMs?
What is the main purpose of deduplication at the main memory level?
Which concept is highlighted in providing economies of scale within IaaS management?
Which storage system primarily has a 'Only W' operation mode and does not handle mutable data?
What type of data is typically deduplicated in a backup storage system?
What challenge does virtualization attempt to address regarding non-virtualized systems?
Which of the following describes the purpose of memory-mapped files in the deduplication example?
In which scenario would deduplication in a file system be advantageous?
What aspect of modern processors aids in memory deduplication?
What is a common issue raised when multiple VMs use the same OS distribution?
Which type of storage system is most associated with high throughput but primarily operates in read mode?
What is a key advantage of VM migration compared to process migration?
Which one of the following is NOT a reason for performing VM migration?
What occurs after the guest OS is halted during VM migration?
What must be reconstructed by the hypervisor on the destination machine?
What significant data is migrated along with a virtual machine?
How often did Google reportedly perform VM migrations in 2018?
What is the purpose of the small paravirtualization layer during VM migration?
What is the total downtime observed during the migration of the Apache Web server VM?
What is one of the main challenges faced during process migration?
Which of these metrics indicates a significant performance drop during the first precopy of the web server VM?
Which statement about VM migration is incorrect?
How much memory does the VM being migrated use?
What is the primary goal of deduplication in cloud storage systems?
What does VM migration allow for in a cloud computing context?
What aspect makes VM migration simpler compared to process migration?
What was the static page size being served by the Apache Web server VM during the test?
What does inter-VM caching aim to achieve in cloud computing?
What was the throughput after further iterations of migration of the web server VM?
Which technology allows for very fast access between VM memories?
In what way does the Puma approach benefit virtual machines?
What is a likely consequence of having many copies of the same data in cloud platforms?
How does the deduplication process work?
What type of network interface card (NIC) is required for the RDMA technology?
What common benefit does deduplication provide to cloud applications?
What is the main purpose of structuring a partition into several groups?
What role does the super-block play in a partition structure?
Why is it necessary to detect when a block is not referenced?
How does deduplication of data blocks impact storage?
What does the inode and block bitmap allow in a partition structure?
What challenge arises when handling modifications to a block referenced by several files?
What is the significance of maximizing locality between metadata and data?
What is a key consideration in avoiding storing a block twice within a group?
Study Notes
Cloud Computing - Lesson 8: IaaS Management
- Course covers advanced IaaS management techniques
- Describes economies of scale in virtualized environments
- Explains VM migration, which decouples OS management from physical resource management
Announcements
- Quizzes on lectures 5 and 6 are closed
- Peer review available until Wednesday, November 20th (before class)
- Another quiz on lectures 7 and 8 (current and next lecture) will be available next week
- Final quiz on "Big Data" lectures
- Donatien will give lecture 10 on Big Data processing/stream processing (his PhD topic)
Outline
- Part A: Saving Storage Space
  - VM memory sharing
  - Deduplication at memory level
  - Deduplication for VM image storage
- Part B: Dynamic IaaS Management
  - VM migration across hosts
Recap: Virtualization
- Virtualization layer (hypervisor) manages physical resource sharing between VMs
- Each guest OS sees only the resources allocated to its VM
Recap: Virtualization Modes
- 1st Generation: Full virtualization by binary rewriting (software-based)
- 2nd Generation: Paravirtualization (software-based collaborative virtualization) requiring modified guest OS
- 3rd Generation: Hardware-assisted full virtualization (software+hardware-based)
VM Memory Sharing
- Hypervisor assigns fixed memory to each VM
- Predictable and simple but estimating needs can be difficult
- Ballooning: dynamic memory allocation that changes the number of memory frames allocated to each VM at run time
- Requires paravirtualization with a specific driver in each VM's OS kernel
- The driver is needed because releasing memory requires access to process memory contexts (page tables) and their semantics, which only the guest kernel has.
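The ballooning idea above can be illustrated with a toy accounting model. This is only a sketch under stated assumptions: the `Hypervisor` class, its method names, and the frame counts are hypothetical and do not correspond to any real hypervisor API. Inflating a VM's balloon hands frames back to the hypervisor; deflating it returns frames to the VM.

```python
class Hypervisor:
    """Toy model of ballooning (hypothetical names, not a real API)."""

    def __init__(self, total_frames: int):
        self.free = total_frames   # frames not usable by any VM
        self.max_frames = {}       # VM -> frames the guest believes it owns
        self.balloon = {}          # VM -> frames pinned by its balloon driver

    def usable(self, vm: str) -> int:
        """Frames the guest can actually use right now."""
        return self.max_frames[vm] - self.balloon[vm]

    def boot(self, vm: str, max_frames: int, balloon: int = 0) -> None:
        needed = max_frames - balloon
        if needed > self.free:
            raise MemoryError("not enough free frames")
        self.free -= needed
        self.max_frames[vm] = max_frames
        self.balloon[vm] = balloon

    def inflate(self, vm: str, frames: int) -> None:
        """The balloon driver in `vm` pins `frames` pages and releases them."""
        if frames > self.usable(vm):
            raise ValueError("cannot reclaim more than the VM can use")
        self.balloon[vm] += frames
        self.free += frames

    def deflate(self, vm: str, frames: int) -> None:
        """The balloon in `vm` shrinks, giving `frames` frames back to the VM."""
        if frames > self.balloon[vm]:
            raise ValueError("balloon is smaller than that")
        if frames > self.free:
            raise MemoryError("not enough free frames")
        self.balloon[vm] -= frames
        self.free -= frames


hv = Hypervisor(total_frames=1000)
hv.boot("vm1", max_frames=600)
hv.boot("vm2", max_frames=600, balloon=200)  # starts with 400 usable frames
hv.inflate("vm1", 200)   # vm1 is idle: reclaim 200 frames
hv.deflate("vm2", 200)   # vm2 is under pressure: hand the frames over
print(hv.usable("vm1"), hv.usable("vm2"))  # 400 600
```

The model only tracks counts; a real balloon driver must also decide which guest pages to give up, which is why it runs inside the guest kernel.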
Other Approaches
- Puma: inter-VM memory sharing that lets a VM lend its free memory to, or borrow free memory from, other VMs (not necessarily on the same physical host)
- RDMA: Remote Direct Memory Access, allowing direct (fast) access between VM memories, but requires specific NICs
Data Duplication
- Cloud platforms handle large amounts of data from many companies, applications, and users.
- Many copies of the same data often coexist across clouds and even within a single cloud.
- Duplication wastes storage space.
- Deduplication automatically removes duplicate data in a storage system.
- General principle: detect common data, then keep a single copy plus pointers to it, across all applications and systems.
- Effectiveness is measured by the deduplication ratio (size before ÷ size after).
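The general principle (single copy plus pointers) and the deduplication ratio can be sketched as follows. This is a minimal illustration, not any particular system's design: the `DedupStore` class is hypothetical, and SHA-256 is an arbitrary choice of digest used here as the pointer.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one physical copy per distinct chunk."""

    def __init__(self):
        self.chunks = {}   # digest -> bytes (the single stored copy)
        self.logical = 0   # bytes written by clients (before deduplication)

    def put(self, data: bytes) -> str:
        """Store data and return a pointer to it (here, its digest)."""
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)   # keep only the first copy
        self.logical += len(data)
        return digest

    def get(self, pointer: str) -> bytes:
        return self.chunks[pointer]

    def physical(self) -> int:
        """Bytes actually stored (after deduplication)."""
        return sum(len(chunk) for chunk in self.chunks.values())

    def ratio(self) -> float:
        """Deduplication ratio = size before / size after."""
        return self.logical / self.physical()


store = DedupStore()
pointers = [
    store.put(b"A" * 4096),
    store.put(b"A" * 4096),   # duplicate: only a pointer is added
    store.put(b"B" * 4096),
]
print(store.ratio())  # 1.5 (three chunks written, two stored)
```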
Deduplication Targets
- RAM: minimal overhead, reads/writes (R/W)
- File System: low delays, high throughput (R/W)
- VM storage: low delays, high throughput (mostly R; W is rare)
- Backups: high throughput (mostly W; R is rare)
- Archives: high throughput (Only W)
Example 1: Deduplicating RAM
- Several VMs on the same physical host often run the same OS type (e.g., GNU/Linux, base image from the IaaS provider).
- They load the same shared libraries and executable files, and in some cases the same data.
- In Docker, common filesystem (FS) layers are reused, but a hypervisor has no knowledge of the FS used by its VMs.
- Memory-mapped files occupy a whole number of pages, so identical files, or identical data downloaded by several VMs, yield identical pages.
- Pages holding the same data therefore exist multiple times across VM memory spaces; deduplication stores each such page once.
VMware ESX
- Third-generation "bare-metal" hypervisor
- Paravirtualization and hardware-assisted memory management
- Each guest OS manages its own "physical" memory, which is in fact virtual
- Hypervisor maps guest "virtual physical memory" to actual physical memory (page table levels).
Deduplication in VMware ESX: Transparent Page Sharing (TPS)
- Offline deduplication: duplicates are found lazily by background scans.
- The scan walks physical memory and checks the content of each page.
- A hash of each page's content speeds up the search for candidates through a hash table.
- Matching hash values only identify candidate pages; a full content comparison confirms that they are identical.
- The 2nd-level table is updated to point to the shared copy when needed.
- Transparent to guest OS by using copy-on-write on shared pages.
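One scan pass of this kind of page sharing can be sketched as below. It is a simplified model, not the ESX implementation: the `Host` class and its fields are hypothetical, `zlib.crc32` stands in for whatever non-cryptographic hash a real hypervisor uses, and copy-on-write handling of later writes is left out.

```python
import zlib

PAGE_SIZE = 4096


class Host:
    """Toy host memory: VM pages mapped onto machine frames."""

    def __init__(self):
        self.frames = {}    # machine frame number -> page content
        self.mapping = {}   # (vm, guest page number) -> machine frame number
        self.next_frame = 0

    def load(self, vm: str, gpn: int, content: bytes) -> None:
        """Back a guest page with a fresh machine frame."""
        self.frames[self.next_frame] = content
        self.mapping[(vm, gpn)] = self.next_frame
        self.next_frame += 1

    def scan(self) -> int:
        """One background pass; returns the number of frames reclaimed."""
        by_hash = {}   # hash value -> frames kept as shared copies
        reclaimed = 0
        for key in sorted(self.mapping):
            frame = self.mapping[key]
            content = self.frames[frame]
            candidates = by_hash.setdefault(zlib.crc32(content), [])
            if frame in candidates:
                continue   # this mapping already points at the shared copy
            # A hash match is only a hint: confirm with a full comparison.
            match = next(
                (c for c in candidates if self.frames[c] == content), None
            )
            if match is None:
                candidates.append(frame)
            else:
                self.mapping[key] = match   # remap to the shared frame
                if frame not in self.mapping.values():
                    del self.frames[frame]  # the duplicate frame is freed
                    reclaimed += 1
        return reclaimed


host = Host()
kernel = b"kernel code".ljust(PAGE_SIZE, b"\0")
host.load("vm1", 0, kernel)
host.load("vm1", 1, b"vm1 data".ljust(PAGE_SIZE, b"\0"))
host.load("vm2", 0, kernel)
host.load("vm2", 1, b"vm2 data".ljust(PAGE_SIZE, b"\0"))
reclaimed = host.scan()
print(reclaimed, len(host.frames))  # 1 3
```

After the scan, both VMs map guest page 0 to the same machine frame. A real system would also mark that frame read-only so that a later write triggers copy-on-write.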
Hash Function
- Function over the content of the block that produces a value in a bounded hash space (e.g., 128 bits).
- Values should be uniformly distributed. The function does not need to be cryptographic (e.g., not MD5), since it only indicates that two pages may be the same.
Copy-on-Write
- Before VM 1 modifies page C: VM 1 and VM 2 both map the same shared pages A, B, and C.
- After VM 1 modifies page C: VM 1 maps pages A, B, and its own private copy of C; VM 2 still maps the original pages A, B, and C.
Effectiveness of TPS in ESX
- Real-world page sharing in production deployments.
- Datasets of total, shared, and reclaimed memory (MB) across VMs
Performance Impact (Pshare == TPS)
- Shows performance impact results from testing on different types of workloads using PS/TPS against a baseline.
Example 2: Deduplicating a File System for VM Image Storage
- VM images are stored in Glance (OpenStack) or OpenNebula's image service.
- Number of VM Images is often high with multiple base VM images, different OS versions, and many images per user.
- VM images are often several gigabytes in size.
Potential for Deduplication in VM Images
- Study data suggests significant potential for deduplication in VM images with similar OS types/distributions (same OS like Fedora 9 but different installations).
Case Study: LiveDFS
- Adapts a legacy file system (ext3) to support deduplication.
- Uses inline deduplication to detect duplicates during write operations.
- Adapted to commodity hardware without special hardware requirements.
- Integrated into OpenStack.
LiveDFS Design Goals
- Ensure performance of VM operations
- Avoid significant impact on VM startup performance.
- Compatibility with existing tools (POSIX file system interface), including deletion operations.
- Target "low-cost" commodity hardware (memory is limited).
LiveDFS Implementation
- Extends an existing filesystem (ext3) as a kernel module.
- Minimizes modifications to original filesystem interface.
- Deduplication at fixed-size block level.
Structure of an ext3 inode
- Explains the components of an ext3 inode, its structure, function and data storage.
ext3 File System Structure
- A partition is organized into block groups; each group keeps its metadata (e.g., inode and block bitmaps) close to its data blocks, which maximizes locality and reduces disk seeks.
Goal: Deduplicate Data Blocks
- Describes the structure of partitions as split in groups to maximize locality and reduce disk seeks by storing metadata and data close to each other.
How To...
- Check whether a block already exists in the group, to avoid storing it twice.
- Handle modifications to a block that several files reference after deduplication.
- Detect when a block is no longer referenced, so that its disk space can be freed.
Detect that a Block Already Exists
- Associates a 16-byte MD5 fingerprint with each block.
- Maintains a list of fingerprints.
- When a new block is written, checks whether its fingerprint already exists in the store.
- If the fingerprints match, checks the content bit by bit.
- On a match, stores a reference to the existing block instead of a new copy, and keeps the fingerprint list up to date.
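The detection steps above can be sketched with an in-memory model. This is an illustration only, not LiveDFS code: the `BlockStore` class and its fields are hypothetical, and a plain dictionary stands in for the fingerprint list.

```python
import hashlib

BLOCK_SIZE = 4096


class BlockStore:
    """Toy block store that deduplicates on write using MD5 fingerprints."""

    def __init__(self):
        self.blocks = []           # block number -> content
        self.refcount = []         # block number -> number of references
        self.by_fingerprint = {}   # 16-byte MD5 digest -> block number

    def write(self, data: bytes) -> int:
        """Return the block number that now holds `data`."""
        fingerprint = hashlib.md5(data).digest()   # 16 bytes
        block_no = self.by_fingerprint.get(fingerprint)
        # Matching fingerprints are confirmed by comparing the content.
        if block_no is not None and self.blocks[block_no] == data:
            self.refcount[block_no] += 1   # reference the existing block
            return block_no
        self.blocks.append(data)           # genuinely new content
        self.refcount.append(1)
        block_no = len(self.blocks) - 1
        self.by_fingerprint.setdefault(fingerprint, block_no)
        return block_no


store = BlockStore()
a = store.write(b"x" * BLOCK_SIZE)
b = store.write(b"x" * BLOCK_SIZE)   # duplicate of the first block
c = store.write(b"y" * BLOCK_SIZE)
print(a, b, c, len(store.blocks))  # 0 0 1 2
```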
Maintaining and Accessing the List of Fingerprints
- Example: a single 4 TB disk with 4 KB blocks and 16-byte fingerprints.
- The disk holds 2^30 blocks, so the fingerprints need 2^34 bytes, or 16 GB.
- Fingerprints are therefore stored on disk, since they would not fit in limited memory.
- A two-step approach (in-memory FP filter, on-disk FP store) handles write operations: the disk is accessed only to confirm a possible match, without overloading memory.
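The sizing figures above can be checked with a few lines of arithmetic, assuming binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes) as the powers of two in the notes imply. The constant names are only for this example.

```python
DISK_BYTES = 4 * 2**40        # 4 TB disk
BLOCK_BYTES = 4 * 2**10       # 4 KB blocks
FINGERPRINT_BYTES = 16        # one MD5 fingerprint per block

num_blocks = DISK_BYTES // BLOCK_BYTES
fingerprint_bytes = num_blocks * FINGERPRINT_BYTES

print(num_blocks == 2**30)            # True
print(fingerprint_bytes == 2**34)     # True
print(fingerprint_bytes // 2**30)     # 16 (GB of fingerprints)
```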
LiveDFS Block Writing Procedure
- Presents the flow of deduplication for block writing operations (Memory Access, Verdict, Consequence).
- Identifies "NEW BLOCK" or "EXISTING BLOCK" for deciding how to write to disk.
Fingerprint Store (On Disk)
- Allocates disk blocks to store fingerprints.
- Indexed by block number.
- Does not allow direct block lookups by fingerprints; the in-memory fingerprint filter performs the lookup.
- Each entry includes a reference count.
- The counter increases when blocks are referenced.
- The block is removed if the counter drops to 0.
Fingerprint Filter (In Memory)
- Hash table.
- Key = first n + k bits of fingerprint; value = bucket.
- A bucket can hold several entries for the same key.
- Supports efficient lookup of candidate fingerprints.
- Updated with a new entry when a new block is written.
- On a write, check the FP filter first; if it reports a possible match, confirm against the FP store; otherwise perform a normal write.
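The interplay between the in-memory filter and the on-disk store can be sketched as follows. This is a simplified model under stated assumptions, not the LiveDFS data layout: `DedupVolume`, `prefix_of`, and the value of `PREFIX_BITS` (standing in for n + k) are hypothetical, and dictionaries play the role of the disk.

```python
import hashlib

PREFIX_BITS = 20   # stands in for n + k; the value is arbitrary


def prefix_of(fingerprint: bytes) -> int:
    """First PREFIX_BITS bits of the fingerprint, as an integer."""
    total_bits = len(fingerprint) * 8
    return int.from_bytes(fingerprint, "big") >> (total_bits - PREFIX_BITS)


class DedupVolume:
    def __init__(self):
        self.fp_filter = {}   # in memory: prefix -> bucket of block numbers
        self.fp_store = {}    # "on disk": block number -> [fingerprint, refcount]
        self.data = {}        # "on disk": block number -> content
        self.store_reads = 0  # how often the on-disk store was consulted
        self.next_block = 0

    def write(self, content: bytes):
        fingerprint = hashlib.md5(content).digest()
        bucket = self.fp_filter.setdefault(prefix_of(fingerprint), [])
        # Step 1: the filter yields candidates (false positives are possible).
        for block_no in bucket:
            # Step 2: confirm the candidate against the on-disk store.
            self.store_reads += 1
            entry = self.fp_store[block_no]
            if entry[0] == fingerprint and self.data[block_no] == content:
                entry[1] += 1   # one more reference to the same block
                return "EXISTING BLOCK", block_no
        # No candidate confirmed: normal write plus new filter/store entries.
        block_no = self.next_block
        self.next_block += 1
        self.data[block_no] = content
        self.fp_store[block_no] = [fingerprint, 1]
        bucket.append(block_no)
        return "NEW BLOCK", block_no


vol = DedupVolume()
first = vol.write(b"a" * 4096)
second = vol.write(b"a" * 4096)
third = vol.write(b"b" * 4096)
print(first, second, third)
```

A write whose prefix is absent from the filter never touches the store, which is what keeps most new-block writes free of extra disk reads.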
Generation and Size of FP Filter
- The FP filter is regenerated in RAM when the file system is mounted.
- Authors report ~6 minutes/TB of data; SSD drives should make this faster.
- The size of the filter depends on the values of n and k, which trade false positives against RAM usage.
- The filter stays relatively small regardless of disk size, which avoids memory overload, but its size must be multiplied by the number of disks.
Handling Modifications to Deduplicated Blocks
- If a block is modified, the change must not be visible for other files referencing that same original block.
- Uses the copy-on-write principle if the reference count is ≥ 2.
- Copies the block to a new location, updates the inode, decrements the reference counter of the original, and then applies the write to the new block.
- This can cause fragmentation and additional disk seeks.
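The copy-on-write steps above can be sketched with a toy volume in which an inode is just a list of block numbers. The `Volume` class and its methods are hypothetical names for this illustration, and whole-block overwrites keep the example short.

```python
class Volume:
    """Toy volume with reference-counted, possibly shared blocks."""

    def __init__(self):
        self.blocks = {}     # block number -> content
        self.refcount = {}   # block number -> number of references
        self.inodes = {}     # file name -> list of block numbers
        self.next_block = 0

    def _alloc(self, content: bytes) -> int:
        block_no = self.next_block
        self.next_block += 1
        self.blocks[block_no] = content
        self.refcount[block_no] = 1
        return block_no

    def create(self, name: str, contents: list) -> None:
        self.inodes[name] = [self._alloc(c) for c in contents]

    def clone(self, src: str, dst: str) -> None:
        """Deduplicated copy: dst references the same blocks as src."""
        self.inodes[dst] = list(self.inodes[src])
        for block_no in self.inodes[dst]:
            self.refcount[block_no] += 1

    def write(self, name: str, index: int, content: bytes) -> None:
        target = self.inodes[name][index]
        if self.refcount[target] >= 2:
            shared = target
            target = self._alloc(self.blocks[shared])  # copy to a new location
            self.inodes[name][index] = target          # update the inode
            self.refcount[shared] -= 1                 # one reference fewer
        self.blocks[target] = content                  # apply the write

    def read(self, name: str, index: int) -> bytes:
        return self.blocks[self.inodes[name][index]]


vol = Volume()
vol.create("img1", [b"base", b"conf"])
vol.clone("img1", "img2")
vol.write("img2", 1, b"conf-v2")
print(vol.read("img1", 1), vol.read("img2", 1))  # b'conf' b'conf-v2'
```

The new block lands wherever `_alloc` places it, which is the source of the fragmentation mentioned above: the file's blocks are no longer contiguous.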
Prefetching and Journaling
- VM images are written as streams of consecutive blocks.
- A write that accesses one FP store entry is therefore very likely to be followed by accesses to the next entries.
- A prefetching mechanism reads ahead more FP entries.
- Journaling groups a series of write updates to the disk so that a stable state can be recovered after a crash.
- Updates to the FP store are also integrated into the journal.
Integration in OpenStack
- LiveDFS implementation as a kernel-space module.
- Compliant with POSIX and operates transparently at the VFS level.
- Mounts VM image storage partition using LiveDFS.
Performance Results
- Testing on a single low-end server (Intel Core i5 760, 8GB RAM, 1TB HDD).
- Comparison with an unmodified ext3 file system.
- Testing with different variants (spatial locality, prefetching, journaling).
Sequential Writes
- Performance results from testing sequential write requests using multiple variants of LiveDFS compared to the Ext3FS control group.
Sequential Reads
- Performance results from testing sequential read requests using multiple variants of LiveDFS compared to the Ext3FS control group.
Sequential Duplicate Writes
- Writing a file with the same content as an existing one already on disk.
- Cost is writing only entries to the FP store (small).
- No journaling implies disk seeks are possible when blocks aren't in order (sequential).
OpenStack Integration Evaluation
- Evaluation of space saving goals.
- Impact on VM startup/saving times.
- Authors use a collection of VM images for evaluation of the OpenStack integration.
Space Usage
- VM images store data that consumes lots of memory and storage.
- Deduplication reduces the amount of data stored.
- Shows comparison between LiveDFS and EXT3FS with zero blocks.
Store/Startup Time
- Time to write a VM image, and time to start a VM.
- Highlights performance improvements from LiveDFS across various workloads/scenarios, including sequential writes and reads.
Migrating a Complex Web Server (SPECweb99)
- Benchmark involving intensive disk/network operations with many concurrent connections.
Migrating an Interactive Application (Quake 3)
- Testing with 6 concurrent players to simulate an interactive scenario.
Quake 3 Server VM Migration: Impact on Clients
- Evaluates the impact of the migration on the latency of client packets.
Paravirtualized Optimizations
- OS-level driver improvements for better performance results.
- Free page cache pages optimization for VM use.
- Return pages to Xen hypervisor and other VMs to decrease size of first-copy iteration.
- Kernel monitoring to stop processes with excessive memory use.
- Write Working Set monitoring for efficient process management.
Conclusion
- Overall, both deduplication and VM migration mechanisms help optimize IaaS utilization, improving storage/resource use and reducing costs.
Description
This quiz explores advanced techniques in IaaS management, focusing on economies of scale in virtualized environments. Students will learn about VM migration and resource management strategies for decoupled operating systems. Prepare to test your knowledge on the intricacies of virtualization and dynamic management strategies!