Questions and Answers
What technique is used to save storage space by reducing duplicate data in VM image storage?
- Compression
- Encryption
- Fragmentation
- Deduplication (correct)
What is the primary function of the virtualization layer in a virtualized system?
- To monitor the performance of applications
- To allocate resources to physical machines
- To orchestrate sharing of physical resources between VMs (correct)
- To manage network traffic
Which of the following is NOT an objective of advanced IaaS management as described?
- Demonstrate economies of scale in virtualized environments
- Present techniques for saving storage space
- Explain fully integrated management of software and hardware (correct)
- Show how VM migration operates across hosts
What does VM migration allow in terms of resource management?
Which part of the course outline focuses on techniques to reduce storage consumption?
What is the primary benefit of deduplicating RAM in co-hosted VMs?
What is the main purpose of deduplication at the main memory level?
Which concept is highlighted in providing economies of scale within IaaS management?
Which storage system primarily has a 'Only W' operation mode and does not handle mutable data?
What type of data is typically deduplicated in a backup storage system?
What challenge does virtualization attempt to address regarding non-virtualized systems?
Which of the following describes the purpose of memory-mapped files in the deduplication example?
In which scenario would deduplication in a file system be advantageous?
What aspect of modern processors aids in memory deduplication?
What is a common issue raised when multiple VMs use the same OS distribution?
Which type of storage system is most associated with high throughput but primarily operates in read mode?
What is a key advantage of VM migration compared to process migration?
Which one of the following is NOT a reason for performing VM migration?
What occurs after the guest OS is halted during VM migration?
What must be reconstructed by the hypervisor on the destination machine?
What significant data is migrated along with a virtual machine?
How often did Google reportedly perform VM migrations in 2018?
What is the purpose of the small paravirtualization layer during VM migration?
What is the total downtime observed during the migration of the Apache Web server VM?
What is one of the main challenges faced during process migration?
Which of these metrics indicates a significant performance drop during the first precopy of the web server VM?
Which statement about VM migration is incorrect?
How much memory does the VM being migrated use?
What is the primary goal of deduplication in cloud storage systems?
What does VM migration allow for in a cloud computing context?
What aspect makes VM migration simpler compared to process migration?
What was the static page size being served by the Apache Web server VM during the test?
What does inter-VM caching aim to achieve in cloud computing?
What was the throughput after further iterations of migration of the web server VM?
Which technology allows for very fast access between VM memories?
In what way does the Puma approach benefit virtual machines?
What is a likely consequence of having many copies of the same data in cloud platforms?
How does the deduplication process work?
What type of network interface card (NIC) is required for the RDMA technology?
What common benefit does deduplication provide to cloud applications?
What is the main purpose of structuring a partition into several groups?
What role does the super-block play in a partition structure?
Why is it necessary to detect when a block is not referenced?
How does deduplication of data blocks impact storage?
What does the inode and block bitmap allow in a partition structure?
What challenge arises when handling modifications to a block referenced by several files?
What is the significance of maximizing locality between metadata and data?
What is a key consideration in avoiding storing a block twice within a group?
Flashcards
Deduplication
The process of identifying and eliminating duplicate data within a storage system.
Mutable data
The type of data that can be changed or modified. This includes most data that is actively being used in a system.
Deduplication Targets
A technique that reduces the amount of storage space required by identifying and storing only unique data blocks. Multiple copies of the same data block are replaced with a single reference.
Deduplication in RAM
Deduplication in File Systems
Deduplication in VM Storage
Deduplication in Backups
Deduplication in Archives
Virtualization layer (hypervisor)
Virtual Machine (VM)
VM migration
VM memory sharing
Deduplication at the main memory level
Deduplication for VM image storage
Dynamic IaaS management
What is the benefit of virtualization?
ext3 File System Structure
Group (ext3 File System)
What is the limitation of virtualization?
Super-Block (ext3 File System)
What is VM migration?
Block Bitmap (ext3 File System)
What are the benefits of VM migration?
Inode Bitmap (ext3 File System)
Compare VM migration to process migration?
Data Deduplication (ext3 File System)
Why is VM migration simpler than process migration?
What are the technological advancements in VM migration?
Block Existence Check (ext3 File System)
What happens during VM migration?
Handling Block Modifications (ext3 File System)
Inter-VM memory sharing
RDMA (Remote Direct Memory Access)
VM Memory Management
Memory Ballooning
Page Swapping
Page Cache
Deduplication Ratio
Paravirtualization layer
Precopy migration
Precopy time
Throughput
Downtime
VM restart
Physical page table reconstruction
Study Notes
Cloud Computing - Lesson 8: IaaS Management
- Course covers advanced IaaS management techniques
- Describes economies of scale in virtualized environments
- Explains how VM migration decouples OS management from physical resource management
Announcements
- Quizzes on lectures 5 and 6 are closed
- Peer review available until Wednesday, November 20th (before class)
- Another quiz on lectures 7 and 8 (current and next lecture) will be available next week
- Final quiz on "Big Data" lectures
- Donatien will give lecture 10 on Big Data processing/stream processing (his PhD topic)
Outline
- Part A: Saving Storage Space
- VM memory sharing
- Deduplication at memory level
- Deduplication for VM image storage
- Part B: Dynamic IaaS Management
- VM migration across hosts
Recap: Virtualization
- Virtualization layer (hypervisor) manages physical resource sharing between VMs
- Each guest OS sees only the resources allocated to it
Recap: Virtualization Modes
- 1st Generation: Full virtualization by binary rewriting (software-based)
- 2nd Generation: Paravirtualization (software-based collaborative virtualization) requiring modified guest OS
- 3rd Generation: Hardware-assisted full virtualization (software+hardware-based)
VM Memory Sharing
- Hypervisor assigns fixed memory to each VM
- Predictable and simple but estimating needs can be difficult
- Ballooning: dynamic memory allocation, changing memory frames allocated to VMs
- Requires paravirtualization with a specific driver on each VM OS kernel
- Releasing memory requires access to the process memory context (page tables) and its semantics, which only the guest OS has; hence the in-guest driver.
Other Approaches
- Puma: Inter-VM memory sharing that allows a guest OS to borrow/lend free memory (not necessarily on the same physical host)
- RDMA: Remote Direct Memory Access, allowing direct (fast) access between VM memories, but requires specific NICs
Data Duplication
- Cloud platforms handle large data amounts due to multiple companies, applications, and users.
- Many copies of the same data frequently coexist across clouds and even within a single cloud.
- Duplication wastes storage space.
- Deduplication automatically removes duplicate data in a storage system.
- General principle: detect common data, maintain a single copy and pointers, for all applications and systems.
- Effectiveness is measured by the deduplication ratio (size before ÷ size after)
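The general principle and the ratio can be illustrated with a minimal Python sketch. The function names and the use of SHA-256 as the content key are assumptions made for illustration only; the sketch also ignores the space taken by the pointers themselves.

```python
import hashlib

def deduplicate(blocks):
    """Keep a single copy of each distinct block plus one pointer per original block."""
    store = {}      # digest -> the single stored copy
    pointers = []   # one digest per input block, in input order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # assumed content key
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers

def dedup_ratio(blocks, store):
    """Deduplication ratio: bytes before divided by bytes after."""
    before = sum(len(b) for b in blocks)
    after = sum(len(b) for b in store.values())
    return before / after

blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, pointers = deduplicate(blocks)
ratio = dedup_ratio(blocks, store)  # 4 blocks before, 2 distinct blocks after
```

Every original block remains readable through its pointer, while only the distinct blocks occupy space.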
Deduplication Targets
- RAM: minimal overhead, reads/writes (R/W)
- File System: low delays, high throughput (R/W)
- VM storage: low delays, high throughput (R/W, but rare)
- Backups: high throughput (R/Mostly W, but rare)
- Archives: high throughput (Only W)
Example 1: Deduplicating RAM
- Several VMs on the same physical host with the same OS type (e.g., GNU/Linux, base image from IaaS provider).
- Shared libraries and executable files, and in some cases data, also exist.
- In Docker, common filesystem (FS) layers are reused but the hypervisor has no knowledge of the FS used by VMs.
- Memory-mapped files occupy a whole number of pages, so identical files, or the same data downloaded by several VMs, produce identical pages.
- Pages holding the same data exist multiple times across VM memory spaces; deduplication stores each such page once.
VMWare ESX
- Third-generation "bare-metal" hypervisor
- Paravirtualization and hardware-assisted memory management
- Guest OS manages its own "physical" memory (which is in fact virtual during operation)
- Hypervisor maps guest "virtual physical memory" to actual physical memory (page table levels).
Deduplication in VMWare ESX: Transparent Page Sharing (TPS)
- Offline deduplication: duplicates are detected lazily by background scans.
- Scans physical memory. Pages have content checked.
- Hash function used on pages' content for faster checking/searching in a hash table.
- When hash values match, the pages are candidates for sharing; comparing their full content confirms they are identical.
- 2nd-level table updated when needed.
- Transparent to guest OS by using copy-on-write on shared pages.
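The scan described above can be sketched in Python as follows. This is an illustrative model, not ESX code: the function name, the page keys, and the use of `zlib.crc32` as the non-cryptographic hash are assumptions, and the byte-for-byte comparison after a hash match is included as a safeguard against collisions.

```python
import zlib

def scan_for_shared_pages(pages):
    """pages maps (vm, guest_page_no) -> bytes; returns each page's canonical page key."""
    by_hash = {}   # hash value -> keys of pages already kept as canonical copies
    mapping = {}   # page key -> key of the page whose frame it can share
    for key, content in pages.items():
        h = zlib.crc32(content)               # cheap hash used only to find candidates
        for candidate in by_hash.get(h, []):
            if pages[candidate] == content:   # full comparison rules out collisions
                mapping[key] = candidate
                break
        else:
            by_hash.setdefault(h, []).append(key)
            mapping[key] = key                # first page seen with this content
    return mapping

pages = {
    ("vm1", 0): b"\x00" * 4096,
    ("vm1", 1): b"lib" * 1000,
    ("vm2", 0): b"lib" * 1000,   # same content as ("vm1", 1)
    ("vm2", 1): b"x" * 4096,
}
mapping = scan_for_shared_pages(pages)
frames_needed = len(set(mapping.values()))
```

In this example four guest pages need only three physical frames, because the two pages with identical content map to the same canonical page.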
Hash Function
- Function over the content of the block; creates a value in a bounded hash space (e.g., 128 bits).
- Values should be uniformly distributed; a non-cryptographic function (e.g., not MD5) suffices, since the hash only serves to check whether two pages are the same.
Copy-on-Write
- Before VM 1 modifies page C: page A, page B, page C in VM1 and same in VM2.
- After VM 1 modifies page C: page A, page B, copy of page C, in VM1 and page A, page B, page C in VM2.
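The before/after situation can be reproduced with a small Python model. The class and method names are hypothetical, and writes replace a whole page at once for simplicity.

```python
class HostMemory:
    """Toy model of shared physical frames with copy-on-write."""

    def __init__(self):
        self.frames = {}      # frame number -> page content
        self.refcount = {}    # frame number -> number of guest pages mapped to it
        self.page_table = {}  # (vm, page name) -> frame number
        self._next_frame = 0

    def _allocate(self, content):
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = content
        self.refcount[frame] = 0
        return frame

    def map_shared(self, content, owners):
        """Back all the given guest pages with one physical frame."""
        frame = self._allocate(content)
        for owner in owners:
            self.page_table[owner] = frame
            self.refcount[frame] += 1

    def read(self, owner):
        return self.frames[self.page_table[owner]]

    def write(self, owner, content):
        frame = self.page_table[owner]
        if self.refcount[frame] > 1:
            # Shared frame: give the writer a private copy, leave the others untouched.
            self.refcount[frame] -= 1
            frame = self._allocate(content)
            self.refcount[frame] = 1
            self.page_table[owner] = frame
        else:
            self.frames[frame] = content   # private frame: write in place

mem = HostMemory()
for name in ("A", "B", "C"):
    mem.map_shared(f"page {name}".encode(), [("VM1", name), ("VM2", name)])
mem.write(("VM1", "C"), b"copy of page C, modified")
```

After the write, VM2 still reads the original page C, and pages A and B remain shared.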
Effectiveness of TPS in ESX
- Real-world page sharing in production deployments.
- Datasets of total, shared, and reclaimed memory (MB) across VMs
Performance Impact (Pshare == TPS)
- Shows performance impact results from testing on different types of workloads using PS/TPS against a baseline.
Example 2: Deduplicating a File System for VM Image Storage
- VM images are stored in Glance (OpenStack) or OpenNebula's image service.
- Number of VM Images is often high with multiple base VM images, different OS versions, and many images per user.
- VM images are often several gigabytes in size.
Potential for Deduplication in VM Images
- Study data suggests significant potential for deduplication in VM images with similar OS types/distributions (same OS like Fedora 9 but different installations).
Case Study: LiveDFS
- Adapts a legacy file system (ext3) to support deduplication.
- Uses inline deduplication to detect duplicates during write operations.
- Adapted to commodity hardware without special hardware requirements.
- Integrated into OpenStack.
LiveDFS Design Goals
- Ensure performance of VM operations
- Avoid significant impact on VM startup performance.
- Compatibility with existing tools (POSIX file system interface) for deletion operations.
- Target "low-cost" commodity hardware (memory is limited).
LiveDFS Implementation
- Extends an existing filesystem (ext3) as a kernel module.
- Minimizes modifications to original filesystem interface.
- Deduplication at fixed-size block level.
Structure of an ext3 inode
- Explains the components of an ext3 inode, its structure, function and data storage.
ext3 File System Structure
- Describes the organization of partitions by groups with metadata and block IDs for maximized locality and decreased disk seeks.
Goal: Deduplicate Data Blocks
- Describes the structure of partitions as split in groups to maximize locality and reduce disk seeks by storing metadata and data close to each other.
How To...
- Check if a block already exists in the group and avoid storing it twice.
- Manage modifications to a block referenced by several files after deduplication.
- Detect when blocks are not referenced, freeing disk space.
Detect that a Block Already Exists
- Associates 16-byte MD5 hash fingerprints with each block.
- Maintains a listing of fingerprints.
- When a new block is written, checks if the fingerprint already exists in the store.
- Checks content bit by bit if fingerprints match.
- Stores references to existing blocks and manages fingerprint list updates.
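The detection steps above can be sketched in Python. This is a simplified in-memory model, not LiveDFS code: `write_block`, the `fingerprints` dictionary, and the list standing in for the disk are all hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4096

def write_block(data, fingerprints, disk):
    """Return (block number, verdict); store the data only if it is not already present."""
    fp = hashlib.md5(data).digest()        # 16-byte fingerprint
    block_no = fingerprints.get(fp)
    if block_no is not None and disk[block_no] == data:   # confirm by comparing content
        return block_no, "EXISTING BLOCK"
    # New content. In the unlikely case of a fingerprint collision, this simplified
    # sketch just remaps the fingerprint to the newer block.
    disk.append(data)
    block_no = len(disk) - 1
    fingerprints[fp] = block_no
    return block_no, "NEW BLOCK"

fingerprints, disk = {}, []
first = write_block(b"x" * BLOCK_SIZE, fingerprints, disk)
second = write_block(b"y" * BLOCK_SIZE, fingerprints, disk)
third = write_block(b"x" * BLOCK_SIZE, fingerprints, disk)   # duplicate of the first
```

The third write returns the block number of the first write, so only two blocks end up on the simulated disk.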
Maintaining and Accessing the List of Fingerprints
- Single 4TB disk with 4KB blocks and 16-byte fingerprints.
- 2^30 blocks, requiring 2^34 bytes or 16GB for fingerprints.
- Storing fingerprints on disk to avoid memory limitations.
- Uses a two-step approach (FP filter/store) to handle write operations, allowing disk access for checking, without memory overload.
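The sizing figures above can be checked with a few lines of Python, assuming binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes), which is what the stated powers of two imply.

```python
DISK_SIZE = 4 * 2**40        # 4 TB disk
BLOCK_SIZE = 4 * 2**10       # 4 KB blocks
FINGERPRINT_SIZE = 16        # bytes per fingerprint

num_blocks = DISK_SIZE // BLOCK_SIZE
fingerprint_bytes = num_blocks * FINGERPRINT_SIZE
fingerprint_gb = fingerprint_bytes // 2**30

print(num_blocks, fingerprint_bytes, fingerprint_gb)
```

With the 8GB of RAM of the test server mentioned in the performance results, a 16GB fingerprint list clearly cannot be kept in memory, which motivates the on-disk store plus in-memory filter.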
LiveDFS Block Writing Procedure
- Presents the flow of deduplication for block writing operations (Memory Access, Verdict, Consequence).
- Identifies "NEW BLOCK" or "EXISTING BLOCK" for deciding how to write to disk.
Fingerprint Store (On Disk)
- Allocates disk blocks to store fingerprints.
- Indexed by block number.
- Does not allow direct block lookups by fingerprints; the in-memory fingerprint filter performs the lookup.
- Each entry includes a reference count.
- The counter increases when blocks are referenced.
- The block is removed if the counter drops to 0.
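A minimal Python model of such a store, indexed by block number and carrying a reference count per entry, might look as follows. The class and method names are hypothetical, and a dictionary stands in for the on-disk layout.

```python
class FingerprintStore:
    """Toy model: block number -> [fingerprint, reference count]."""

    def __init__(self):
        self.entries = {}

    def add_reference(self, block_no, fingerprint):
        """Record one more file reference to the block."""
        entry = self.entries.setdefault(block_no, [fingerprint, 0])
        entry[1] += 1
        return entry[1]

    def drop_reference(self, block_no):
        """Remove one reference; return True if the block can now be freed."""
        entry = self.entries[block_no]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[block_no]   # no file references the block any more
            return True
        return False

fp_store = FingerprintStore()
fp_store.add_reference(42, b"\x01" * 16)          # first file uses block 42
count = fp_store.add_reference(42, b"\x01" * 16)  # a second file shares it
freed_first = fp_store.drop_reference(42)
freed_second = fp_store.drop_reference(42)
```

The block is only reported as free after the last referencing file has dropped it.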
Fingerprint Filter (In Memory)
- Hash table.
- Key = first n + k bits of fingerprint; value = bucket.
- Holds multiple values per key if possible.
- Efficient lookup and possible list handling for fingerprint values.
- Stores updated entries from the block (when it is written) with new information.
- On a filter hit, check the FP store for a matching existing block; if there is no match, proceed with a normal write.
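The interplay between the in-memory filter and the fingerprint store can be sketched in Python. This is a simplified illustration: the prefix length `PREFIX_BITS`, the choice of block numbers as bucket contents, and the dictionary standing in for the on-disk store are all assumptions.

```python
import hashlib

PREFIX_BITS = 20   # hypothetical number of leading fingerprint bits used as key

def prefix(fp, bits=PREFIX_BITS):
    """Leading bits of the fingerprint, used as the hash-table key."""
    return int.from_bytes(fp, "big") >> (len(fp) * 8 - bits)

class FingerprintFilter:
    def __init__(self):
        self.buckets = {}   # prefix -> block numbers whose fingerprint has that prefix

    def insert(self, fp, block_no):
        self.buckets.setdefault(prefix(fp), []).append(block_no)

    def candidates(self, fp):
        return self.buckets.get(prefix(fp), [])

def find_existing(fp, fp_filter, fp_store):
    """fp_store maps block number -> full fingerprint (stands in for the on-disk store)."""
    for block_no in fp_filter.candidates(fp):   # cheap in-memory lookup
        if fp_store[block_no] == fp:            # confirm with the full fingerprint
            return block_no
    return None                                 # no match: normal write

fp_filter, fp_store = FingerprintFilter(), {}
known = hashlib.md5(b"x" * 4096).digest()
fp_filter.insert(known, 7)
fp_store[7] = known
hit = find_existing(known, fp_filter, fp_store)
miss = find_existing(hashlib.md5(b"y" * 4096).digest(), fp_filter, fp_store)
```

Because the filter keeps only a short prefix per block, it stays small; a prefix match is merely a hint, and the full fingerprint in the store decides.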
Generation and Size of FP Filter
- Re-generating FP filter in RAM during file system mounting.
- Authors report ~6 minutes/TB of data, indicating faster processing with SSD drives.
- Size of the filter depends on the values of n and k, influencing false positives and RAM usage.
- Size remains relatively small regardless of disk size (it remains the same), thus avoids memory overload, but must be multiplied by number of disks.
Handling Modifications to Deduplicated Blocks
- If a block is modified, the change must not be visible for other files referencing that same original block.
- Uses copy-on-write principle if reference count ≥ 2.
- Copies modified blocks to a new location, updates the inode, decrements the reference counter and then applies the write to the new block.
- This can involve fragmentation and increased disk seeks.
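The copy-on-write steps listed above can be sketched in Python. The names are hypothetical; an inode is modeled as a list of block numbers and the disk as a list of blocks. Fingerprint updates for the rewritten block are left out of this sketch.

```python
def write_file_block(inode, index, new_data, disk, refcount):
    """Write new_data to the index-th block of a file without affecting other files."""
    block_no = inode[index]
    if refcount[block_no] >= 2:
        new_no = len(disk)
        disk.append(disk[block_no])   # copy the shared block to a new location
        inode[index] = new_no         # update the inode to point at the copy
        refcount[block_no] -= 1       # one fewer file references the original
        refcount[new_no] = 1
        block_no = new_no
    disk[block_no] = new_data         # apply the write to the (now private) block

disk = [b"shared"]
refcount = {0: 2}
file_a, file_b = [0], [0]             # both files reference block 0
write_file_block(file_a, 0, b"changed", disk, refcount)
after_first = (disk[file_a[0]], disk[file_b[0]], dict(refcount))
write_file_block(file_b, 0, b"again", disk, refcount)   # refcount is 1: write in place
```

The copy lands wherever free space is available, which is why the notes mention fragmentation and extra disk seeks.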
Prefetching and Journaling
- Stream-based block writing for VMs.
- When writing one block accesses an FP store entry, there is a high probability that the following blocks will access the subsequent entries.
- Prefetching mechanism pre-reads more FP entries.
- Journaling for a series of write updates to the disk to recover a stable state.
- Updates for FP store are also integrated into the journal.
Integration in OpenStack
- LiveDFS implementation as a kernel-space module.
- Compliant with POSIX and operates transparently at the VFS level.
- Mounts VM image storage partition using LiveDFS.
Performance Results
- Testing on a single low-end server (Intel Core i5 760, 8GB RAM, 1TB HDD).
- Comparison with an unmodified ext3 file system.
- Testing with different variants (spatial locality, prefetching, journaling).
Sequential Writes
- Performance results from testing sequential write requests using multiple variants of LiveDFS compared to the Ext3FS control group.
Sequential Reads
- Performance results from testing sequential read requests using multiple variants of LiveDFS compared to the Ext3FS control group.
Sequential Duplicate Writes
- Writing a file with the same content as an existing one already on disk.
- Cost is writing only entries to the FP store (small).
- No journaling implies disk seeks are possible when blocks aren't in order (sequential).
OpenStack Integration Evaluation
- Evaluation of space saving goals.
- Impact on VM startup/saving times.
- Authors use a collection of VM images for evaluation of the OpenStack integration.
Space Usage
- VM images store data that consumes lots of memory and storage.
- Deduplication reduces the amount of data stored.
- Shows comparison between LiveDFS and EXT3FS with zero blocks.
Store/Startup Time
- Time to write a VM image, and time to start a VM.
- Highlights performance improvements from LiveDFS across various workloads/scenarios, including sequential writes and reads.
Migrating a Complex Web Server (SPECweb99)
- Benchmark involving intensive disk/network operations with many concurrent connections.
Migrating an Interactive Application (Quake 3)
- Testing with 6 concurrent players to simulate an interactive scenario.
Quake 3 Server VM Migration: Impact on Clients
- Evaluates the impact of the migration operations on the latency of client packets.
Paravirtualized Optimizations
- OS-level driver improvements for better performance results.
- Free page cache pages optimization for VM use.
- Return pages to Xen hypervisor and other VMs to decrease size of first-copy iteration.
- Kernel monitoring to stop processes with excessive memory use.
- Write Working Set monitoring for efficient process management.
Conclusion
- Overall, both deduplication and VM migration mechanisms successfully optimize IaaS utilization, improving storage/resource use and reducing costs.
Description
This quiz explores advanced techniques in IaaS management, focusing on economies of scale in virtualized environments. Students will learn about VM migration and resource management strategies for decoupled operating systems. Prepare to test your knowledge on the intricacies of virtualization and dynamic management strategies!