Nutanix AOS - Technical Overview - 200

INTRODUCTION AND OVERVIEW
Getting Started
Nutanix Cloud Infrastructure (NCI) Overview

AOS KEY CONCEPTS
Understanding Nutanix AOS
Cluster Resiliency, Data Redundancy, and Domains
Performance and Dynamic Data Placement
Data Efficiency and Storage Optimization
AOS Platform and Data Security
Disaster Recovery and Backup
Disaster Recovery Deep Dive

SUMMARY AND RESOURCES
Resources
Feedback

Lesson 1 of 11: Getting Started

Course Navigation
Review the following information to learn how to effectively navigate the course.

Scroll – Scroll through the course to progress through each of the lessons.
Click – Click on images to enlarge or reduce them. Try clicking on this image.
Use – Use any smart device to proceed through the course: phone, tablet, and computer.

Navigate Sections
Select the "hamburger" in the top left to open/close the navigation menu. Be sure to scroll to the bottom of each section and, if required, select the down arrow to continue to the next lesson.

Continue Bar
Some lessons and activities are followed by a CONTINUE bar. If you see the lock icon on the bar, go back and make sure you have completed all of the activities.

Lesson 2 of 11: Nutanix Cloud Infrastructure (NCI) Overview

AOS Core Technical Training
Nutanix Cloud Infrastructure (NCI) is a distributed infrastructure platform for enterprise IT applications. It combines compute, storage, and networking resources from a cluster of servers into a single logical pool with integrated resiliency, security, performance, and simplified administration. With NCI, organizations can efficiently deploy and manage data and applications across data center, edge, and cloud environments without the complexity or cost of traditional infrastructure. NCI enables IT to spend less time and money on infrastructure management and empowers users to manage their workloads. In this lesson, you will learn more about AOS, Nutanix Cloud Infrastructure's distributed storage layer, including its capabilities, features, and differentiation in today's IT environment.

NCI Inclusions
Select each checkbox as you review the list of items below.

Nutanix AOS: a scale-out distributed storage technology that makes hyperconverged infrastructure (HCI) and the Nutanix Cloud Platform (NCP) possible by delivering enterprise-grade capabilities with a distributed software architecture
Nutanix AHV: a lightweight cloud hypervisor built into NCI that offers enterprise-grade virtualization capabilities and built-in Kubernetes support
Nutanix Kubernetes Platform (NKP): an enterprise container solution that simplifies provisioning, operations, and lifecycle management of Kubernetes
Nutanix Disaster Recovery: a natively integrated disaster recovery (DR) solution that helps organizations stay prepared during a disaster while minimizing data loss and downtime
Prism: a management user interface (UI) that includes Prism Element and Prism Central for infrastructure management, authentication, and role-based access control (RBAC). Prism includes REST APIs for integration with existing management systems
Data and Network Security: a comprehensive data security solution that uses encryption, a built-in local key manager, and software-based firewalls for networks and applications with Flow Network Security

Nutanix Cloud Platform
The Nutanix Cloud Platform is a secure, resilient, and self-healing platform for building a hybrid multicloud infrastructure that supports all kinds of workloads and use cases across public and private clouds, multiple hypervisors and containers, all with varied compute, storage, and network requirements.

The following shows a high-level architecture of the Nutanix Cloud Platform. It visually represents how the Nutanix Cloud Platform integrates various components and services to provide a comprehensive solution for managing enterprise IT applications across hybrid multicloud environments. The vision is "One platform to run applications and data anywhere."

Layers in the image represent the following elements from top to bottom:

Applications: application types supported by the Nutanix Cloud Platform
Nutanix Central: centralized management hub with intelligent operations, self-service, and security features
Nutanix Data Services: services that provide unified storage, data services, and database management solutions
Nutanix Cloud Infrastructure: the foundation of the platform; includes various components for network flow and application security, disaster recovery, security features, and cloud infrastructures
Application and Data Portability: the platform facilitates seamless application and data movement across different environments

Building Blocks of the Nutanix Cloud Platform
The building blocks of the Nutanix Cloud Platform are Nutanix Cloud Infrastructure and Nutanix Cloud Manager. These are the products that fall under each of them to form a complete solution.

Nutanix Cloud Infrastructure (NCI)
1. AOS Scale-Out Storage
2. AHV Hypervisor
3. Virtual Networking
4. Disaster Recovery
5. Container Services
6. Data and Network Security

Nutanix Cloud Manager (NCM)
1. AI Operations (NCM Intelligent Operations)
2. Self-Service Infrastructure/App Lifecycle Management (NCM Self-Service)
3. Cost Governance
4. Security Central

Lesson 3 of 11: Understanding Nutanix AOS

AOS Basics
Today's IT environment is challenged by constant technological evolution, rapid data growth, and the inevitable failure of individual components across varied tools. This makes today's IT environment dynamic in both planned and unplanned ways. Let's learn a bit more about today's IT environment and the modern solutions AOS supplies. The challenges of today's IT environment include maintaining and managing IT systems in the face of continuous technological advancements, data growth, and the inherent fragility of complex systems.

Modern Software Solutions
For most data centers, VMware and virtualization significantly transformed the industry. IT administrators began managing their application virtual machines and no longer concerned themselves with individual server resources. Servers were aggregated into a cohesive set of resources.
However, virtualization only partially solved infrastructure stack concerns, specifically around centralized storage. Those concerns still relied on legacy tech like SAN storage, with inefficiencies such as:

Dual-controller systems, which make it difficult to scale out; faster media results in controller contention
Centralized storage with complex and rigid configurations (like a limited number of RAID groups and RAID disks), leading to hot-spot inefficiency and creating bottlenecks
SAN systems that generally rely on RAID to mitigate failures, making them inefficient and requiring extensive planning and overhead accounting

NOTE: vSAN and its competitors are based on centralized storage principles.

Nutanix solved issues with scalability, rigidity, and inefficiency by applying distributed systems principles to the data storage layer. Select each tab below to learn more.

Expect Failure – Nutanix recognizes that it is important to handle disruptions in real time. As a result, Nutanix solutions self-heal to fix issues and apply distributed systems principles to solve the problem.
Diversify Failure – Nutanix solutions also recognize that there should be no single point of failure or of control. Instead, Nutanix components have the flexibility to assume any role.
Easily Scalable – Nutanix solutions are easily scalable, with an infrastructure that can evolve as required and on demand.
Flexible Design – Nutanix solutions avoid rigid design choices and do not require special hardware assignments (e.g., disk-X with a special purpose or node-Y that handles metadata). Instead, they easily leverage advancements in hardware tech, like NVMe, SPDK, and PMem.

AOS Eliminates the Need for Centralized Storage
Nutanix solutions re-use reliable, proven technology and improve upon it where a special use case may require it. Consider Nutanix Cassandra, for example. Let's review the challenges of today's IT environment and Nutanix solutions in a graphic below.

AOS eliminates the need for centralized storage by using a distributed storage approach across multiple nodes. AOS uses a distributed approach that combines the storage resources of all nodes in a cluster to deliver the capabilities and performance you expect from traditional SAN storage, more easily and at a lower cost. AOS reduces management overhead while leveraging intelligent software to appear to a hypervisor, like VMware ESXi or Nutanix AHV, as a single, uniform storage pool. Select the hotspot (+) to learn more.

Additional AOS Features
Nutanix has expanded its features and capabilities, making AOS a leader in software-defined distributed storage. Each node in a Nutanix cluster runs a virtual machine called the Controller VM (CVM). CVMs run distributed storage and other services necessary for a cluster. Because storage and other services are distributed across the nodes in the cluster, no one entity serves as a single point of failure. Any node can assume leadership of any service. HCI adds different types of nodes based on the specific resource needs of the IT environment.
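To make the "any node can assume leadership of any service" idea concrete, here is a minimal sketch in Python. The cluster, service, and CVM names are hypothetical, and the election rule is a toy stand-in; this is not Nutanix code.

# Toy sketch of "no single point of control": any surviving CVM can
# assume leadership of a cluster service. Hypothetical names only.

class Cluster:
    def __init__(self, cvms):
        self.cvms = set(cvms)     # healthy CVMs, e.g. {"cvm-a", "cvm-b"}
        self.leaders = {}         # service name -> leader CVM

    def elect(self, service):
        # Deterministic toy election: lowest-named healthy CVM wins.
        self.leaders[service] = min(self.cvms)

    def fail(self, cvm):
        # When a CVM fails, any service it led is re-elected elsewhere.
        self.cvms.discard(cvm)
        for service, leader in self.leaders.items():
            if leader == cvm:
                self.elect(service)

cluster = Cluster(["cvm-a", "cvm-b", "cvm-c"])
for svc in ("metadata", "curator", "zookeeper"):
    cluster.elect(svc)
cluster.fail("cvm-a")       # leadership moves to a survivor
print(cluster.leaders)      # every service still has a leader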
Automated/Flexible Scaling with Hyperconverged Infrastructure (HCI)
Automated, flexible scaling helps Nutanix avoid the inefficiency of traditional 3-tiered solutions, which can create issues in allocating resources over a solution's lifecycle. Hyperconverged infrastructure (HCI) scales linearly and predictably. This enables users to buy as needed by adding nodes that scale while automatically redistributing data via a node expansion tool, illustrated here.

Node Expansion Flexibility
With Nutanix, users can add storage-heavy nodes. If additional storage is required, users can add storage-only nodes without a hypervisor license. Users can also mix and match flash and hybrid nodes within a cluster so they are not constrained by their original design. If computing resources run out faster than storage, users can add compute-only nodes. This takes the guesswork out of buying storage and designing infrastructure. HCI allows users to mix different types of nodes within the same cluster, avoiding constraints.

Review the following question, choose the best answer, and then select Submit.

Which of the following best describes Nutanix Cloud Infrastructure (NCI)?

A system which makes it difficult to scale out with faster media and which results in controller contention, with combined computing, storage, and networking resources in a single pool with integrated resiliency, security, performance, and administration
A distributed infrastructure platform for enterprise IT applications that combines computing, storage, and networking resources into a single pool with integrated resiliency, security, performance, and administration
A persistent write buffer, like a filesystem journal, that handles bursts of random writes, coalesces them, and then sequentially moves the data to an Extent Store; it combines computing, storage, and networking resources into a single pool with integrated resiliency, security, performance, and administration
A natively integrated DR solution that helps organizations stay prepared during a disaster while minimizing data loss and downtime by sequentially moving data to the Extent Store, coalescing and storing the data

Dynamic Placement and Movement of Data
One of the key elements of AOS is a unique approach to metadata. Nutanix metadata is divided into global and local stores to optimize locality, limit network traffic for lookups, and improve performance.

Global Metadata stores logical information, such as which node data is stored on.
Local Metadata stores information about the physical location of data. This provides flexibility and optimizes metadata updates.

Additional Metadata Store Features
Nutanix's distributed metadata store allows AOS to split vDisks into fine-grained data pieces. This delivers dynamic, automated data placement and management, ensuring better, more consistent performance. It also enables missing data copies to be immediately rebuilt when a failure occurs. Review the following graphics and explanations for a demonstration of this process.

Each node is responsible for storing and managing data. When data is written to one node, it is replicated across multiple nodes. The system detects the missing data if a node fails or is removed. AOS immediately starts rebuilding the missing data by copying it from the surviving nodes to maintain data integrity. In addition to an immediate rebuild on failures, the fine-grained metadata in AOS also makes it easier to do data migrations when a VM moves between nodes. This ensures the data being read by the VM is always local to where the VM is running.
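The immediate-rebuild behavior described above can be sketched as follows. This is a simplified model that assumes a toy extent-to-replica metadata map; the names and structures are illustrative, not AOS's actual metadata schema.

# Minimal sketch of metadata-driven rebuild, assuming a toy metadata
# map of extent -> set of nodes holding a replica.

RF = 2  # two copies of every extent (RF2)

metadata = {
    "extent-1": {"node-a", "node-b"},
    "extent-2": {"node-b", "node-c"},
    "extent-3": {"node-a", "node-c"},
}
nodes = {"node-a", "node-b", "node-c", "node-d"}

def rebuild_after_failure(failed_node):
    """Drop the failed node and re-replicate every under-protected extent."""
    nodes.discard(failed_node)
    for extent, replicas in metadata.items():
        replicas.discard(failed_node)
        while len(replicas) < RF:
            # Copy from any surviving replica to a node that lacks one.
            target = min(nodes - replicas)  # stand-in for fitness-based choice
            replicas.add(target)            # all nodes share the rebuild work
            print(f"re-replicating {extent} -> {target}")

rebuild_after_failure("node-b")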
Nutanix Solutions: AOS Key Concepts
Review the key concepts underlying AOS below.

Storage Container – A logical segmentation of capacity from the physical storage pool available across a cluster. Storage containers hold the virtual disks (vDisks) used by VMs. Physical storage is allocated to the storage container as needed when data is written. Storage efficiency features like compression, replication factor, and quality of service (QoS) are enabled at the container level or on a per-VM basis.
vDisk – Virtual disks, or vDisks, are created within a storage container to provide storage for VMs.
OpLog – A persistent write buffer much like a filesystem journal, the OpLog handles bursts of random writes, coalesces them, and then sequentially moves the data to an Extent Store.
Extent Store – Persistent bulk storage spanning all device tiers that are present in a cluster (NVMe, SSD, HDD).
Unified Cache – A dynamic read cache used for data and metadata that is stored in the CVM's memory. A read request for data not in the cache is read from the Extent Store and placed into the Unified Cache until evicted.

Distributed Scale-Out Storage
Review the graphic below to learn more about distributed scale-out storage. Nutanix's approach to storage simplifies management and reduces costs, while providing a scalable and flexible solution to meet growing data and compute needs.

Automated Provision and Scaling – These capabilities allow AOS to automatically add and manage storage and compute resources as needed, which reduces operational costs and management overhead, making the infrastructure more efficient compared to traditional systems like VxRail/vSAN disk groups.
Linear Performance Scaling – This enables AOS to scale performance and capacity in a linear and predictable manner by adding more nodes as required. This flexibility allows businesses to start with a small setup and grow as needed, reducing the initial investment and risk.

Review the following question, choose the best answer, and then select Submit.

Which of the following best describes the distributed systems principles Nutanix has applied to data storage to solve some of the challenges of today's IT environment?

React to failure and diversify success; be rigid but ensure scalability
Only expect and diversify failure to ensure evolution as technology changes
Only provide scalability and flexibility to ensure evolution as technology changes
Expect and diversify failure; be scalable and ensure flexibility

Lesson 4 of 11: Cluster Resiliency, Data Redundancy, and Domains

Resiliency and Data Redundancy

Cluster Resiliency
A Nutanix Cluster Administrator can set the fault tolerance (FT) level for each cluster, specifying the number of simultaneous failures a cluster can handle. Cluster Fault Tolerance refers to the number of component failures a cluster can tolerate. A failure can occur in any component in a node, like a drive or network interface.

Fault Tolerance Level 1 (FT1) – A cluster that can tolerate one component failure or "fault" at a time. The FT1 tolerance level requires 3 copies of metadata.
Fault Tolerance Level 2 (FT2) – A cluster that can tolerate two concurrent failures at any time. The FT2 tolerance level requires 5 copies of metadata.

An existing cluster can be promoted from FT1 to FT2, as a background operation, if all conditions are met.
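The metadata copy counts above follow from majority-quorum reasoning: the cluster must keep a majority of metadata copies alive through the tolerated failures. Here is a one-function sketch of that arithmetic, assuming simple majority semantics rather than Nutanix's actual consensus implementation.

# Toy illustration of why FT1 needs 3 metadata copies and FT2 needs 5:
# a majority quorum must survive the tolerated failures.

def tolerated_failures(metadata_copies):
    """Largest number of simultaneous failures that still leaves a majority."""
    majority = metadata_copies // 2 + 1
    return metadata_copies - majority

assert tolerated_failures(3) == 1   # FT1: 3 copies tolerate 1 fault
assert tolerated_failures(5) == 2   # FT2: 5 copies tolerate 2 faults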
Data Redundancy
AOS uses replication factors (RFs), also known as tunable redundancy. This ensures data redundancy and availability in the face of node or drive failures. The RF value can be set to one, two, or three. Upon write, data is synchronously replicated to either another node (RF2) or two other nodes (RF3) before the write is acknowledged as successful. This ensures the data exists in multiple independent locations for fault tolerance. All writes (both local and replica) are dynamically placed, with the following always holding true:

1. Replica Placement – Replicas are dynamically placed based on the FT domain and other factors, such as the storage and compute utilization of each node. At RF2, two copies of data are maintained. At RF3, three copies are maintained.
2. No Binding – There is no static binding that determines where replicated node data is placed. Data is replicated across all nodes, eliminating hot spots and ensuring linear performance scaling.
3. Immediate Rebuild – If there is a failure in a cluster, like a node, disk, or CVM becoming unresponsive, AOS can quickly and easily determine which data copies are no longer available. This is because of the distributed metadata store, which enables AOS to begin the rebuild immediately.
4. Auto-Read Data – Data is automatically read from other nodes in a cluster. If a node does not come back online, all affected data is automatically rebuilt from the other copy(ies) to ensure full redundancy and data resilience are restored without operator intervention. Because of the distributed architecture and intelligent copy placement, recovery from failure occurs across multiple nodes and drives, preventing hot spots.
5. Dynamic Writes – Writes are ALWAYS dynamic within the Nutanix architecture. At every new write or overwrite, the system automatically determines the best location to place it based on algorithms and disk fitness scores. One copy is always placed locally, and a replica is always placed based on dynamic decisions.

NOTE: All Nutanix writes are acknowledged ONLY after 1–2 copies are saved on remote nodes.

Learn more by reviewing the dynamic illustration below.

1. The virtual machine (VM) writes data (Write IO) and sends it to the nodes in the cluster.
2. The data is synchronously replicated to the other nodes before the write operation is acknowledged as successful.
3. The system dynamically determines the best location to store the data based on algorithms and current resource utilization.

Review the following question, choose the best answer, and then select Submit.

Which of the following best describes the data redundancy features that Nutanix AOS offers?

Dynamic data placement and writes with no static binding, auto-read data, and immediate rebuilds
Dynamic data replication and storage with automated rebuilds and static binding for consistency
Dynamic data encryption that ensures automated storage in multiple locations to avoid data loss only
Dynamic data configuration that ensures data is replicated at multiple sites to avoid data loss
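Here is a hedged sketch of the dynamic write placement described in this lesson: one copy lands locally, and replica targets are chosen by a fitness-style score. The scoring formula is invented for illustration; the real AOS algorithms weigh more factors.

# Sketch of dynamic replica placement: one copy lands locally and the
# replica target is chosen by a toy "fitness" score.

def place_write(local_node, nodes, rf=2):
    """Return the nodes that receive the write: local first, then the
    fittest remote nodes until RF copies exist."""
    targets = [local_node]
    remote = [n for n in nodes if n["name"] != local_node["name"]]
    # Toy fitness: prefer nodes with low disk and CPU utilization.
    remote.sort(key=lambda n: n["disk_used"] + n["cpu_used"])
    targets += remote[: rf - 1]
    return targets  # write is acknowledged only after all copies land

nodes = [
    {"name": "node-a", "disk_used": 0.60, "cpu_used": 0.40},
    {"name": "node-b", "disk_used": 0.30, "cpu_used": 0.20},
    {"name": "node-c", "disk_used": 0.80, "cpu_used": 0.70},
]
print([n["name"] for n in place_write(nodes[0], nodes)])  # ['node-a', 'node-b']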
Hardware Failure Resilience
Nutanix solutions are self-healing. Fixes begin immediately upon failure and are faster than traditional RAID solutions. Hardware failure resilience is fully distributed across all nodes and disks within a system's architecture. This complete participation in rebuilds helps avoid hot spots and ensures that rebuild performance will increase as cluster size grows. Rebuild progress is monitored by the Data Resiliency Widget.

Select each hotspot (+) below to learn how some cluster components participate in data resiliency.

CVM
CVM auto-pathing, which leverages HA.py, takes care of CVM failure in ESXi/Hyper-V by modifying the route from the internal 192.168.5.2 address to the external IPs of other CVMs. iSCSI redirection handles CVM failure in AHV, and the same approach is used, with graceful I/O handoff, for rolling AOS upgrades. This ensures that there is no interruption to running VMs (NOT a high availability or HA event), with potentially higher storage latency due to remote access. A CVM typically reverts to its default internal/local route once it is back up.

Disks
A disk failure is characterized as a disk being removed, a disk that stops responding (fails), or a disk with multiple I/O errors. Processes and procedures with disks are similar to node failure, and there is no movement of VMs. Stargate is the process that handles I/O. When Stargate sees I/O errors or the device fails to respond within a certain threshold, it will mark the disk offline. Once that has occurred, Hades will run S.M.A.R.T. and check the device's status. If the tests pass, the disk will be marked online; if they fail, it will remain offline. Re-replication jobs are distributed to all nodes in the cluster. If Stargate marks a disk offline multiple times (currently 3 times in an hour), Hades will stop marking the disk online even if the S.M.A.R.T. tests pass. Only the missing data needs to be replicated, regardless of drive/node/cluster capacity. This provides improved re-protection times compared to traditional SAN/RAID or HCI solutions without fine-grained data distribution.

Node Failure
In the event of a node failure, a VM HA (virtual machine high availability) event will occur, restarting the VMs on other nodes throughout the virtualization cluster. Once restarted, the VMs will continue to perform I/Os as usual, which will be handled by their local CVMs. Similar to the case of a disk failure above, a Curator scan will find the data previously hosted on the node and its respective replicas. Once the replicas are found, all nodes will participate in the reprotection.

Select each card to view both sides and use the arrows to navigate through them all.

Stargate/Hades leverage S.M.A.R.T. to evaluate disk health (and can proactively re-protect drives before failure in certain circumstances). A Curator scan occurs immediately to identify missing data/replica locations.
AOS distributes rebuild events across all disks/nodes evenly, allowing each CVM to actively participate in re-replicating missing storage.
A CVM will self-heal once the drive containing the home (CVM) partition has been replaced. A node can lose all of its drives without impacting data availability, and a cluster can continue to lose drives provided replication restores RF2 or RF3 before additional failures occur.
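The Stargate/Hades interaction around disk failures can be sketched as a small state machine. The class below is a simplified stand-in for those services, using the thresholds quoted above (S.M.A.R.T. retest, 3 offline markings within an hour); it is not AOS code.

# Sketch of the offline-marking logic described above: Stargate marks a
# disk offline on I/O errors; Hades re-tests it, and after 3 offline
# events within an hour the disk stays offline.

import time

class DiskMonitor:
    MAX_OFFLINE_PER_HOUR = 3

    def __init__(self):
        self.offline_events = []   # timestamps of offline markings
        self.online = True

    def stargate_mark_offline(self, now=None):
        now = now if now is not None else time.time()
        self.online = False
        self.offline_events.append(now)

    def hades_retest(self, smart_passed, now=None):
        """Bring the disk back only if S.M.A.R.T. passes and the disk has
        not been marked offline 3+ times in the last hour."""
        now = now if now is not None else time.time()
        recent = [t for t in self.offline_events if now - t < 3600]
        if smart_passed and len(recent) < self.MAX_OFFLINE_PER_HOUR:
            self.online = True
        return self.online

disk = DiskMonitor()
for _ in range(3):
    disk.stargate_mark_offline()
print(disk.hades_retest(smart_passed=True))   # False: 3 strikes in an hour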
Availability Domains
An availability domain is an AOS construct that determines the placement of data, metadata, and configuration data in the cluster. Combined with data and metadata resiliency, an availability domain places entities (data, metadata, and cluster configuration data) across nodes, blocks, or racks, providing protection and resilience during physical node, block, or rack failures. By combining powerful data redundancy with intelligent data placement, AOS can automatically ensure data, metadata, and configuration data resilience if a physical block or rack fails.

Availability domains provide greater cluster-level resilience without increasing the required storage capacity. Nutanix currently supports the following levels of awareness:

Disk (always)
Node (always)
Block
Rack

NOTE: A minimum of three blocks must be utilized for Block Awareness to be activated; otherwise, the cluster defaults to node awareness. It is recommended to utilize uniformly populated blocks or racks to ensure awareness is enabled and no imbalance is possible. The 3-block requirement ensures a quorum. For example, a 3450 is a block that holds 4 nodes. Distributing roles or data across blocks ensures that if a block fails or needs maintenance, the system can continue to run without interruption.

Select each hotspot (+) below to learn about levels of awareness.

Disk Awareness – Allows a cluster to lose one (RF2) or two (RF3) disks without impacting data availability. This is a standard Nutanix availability domain, and it is always enabled.
Node Awareness – Allows a cluster to lose one (RF2) or two (RF3) nodes without impacting data availability. This is a standard Nutanix availability domain, and it is always enabled.
Block Awareness – Allows a cluster to lose one (RF2) or two (RF3) complete enclosures housing multiple nodes without impacting data availability. Block Awareness can be enabled by the administrator and is honored on a best-effort basis.
Rack Awareness – Allows a cluster to lose one (RF2) or two (RF3) complete racks (potentially housing multiple blocks/nodes) without impacting data availability. Additional configuration in Prism is required to map out the physical locations of nodes across racks.

NOTE: Within a block, the redundant PSU and fans are the only shared components.
NOTE: Rack awareness requires the administrator to define the "racks" in which the blocks are placed.

Nutanix Clusters
Here's how Nutanix ensures data availability and resilience with awareness. Select each tab below to learn more.

Data – Data replicas are written to other blocks or racks in a cluster to ensure that, in the event of a failure or planned downtime, the data remains available. This is true for both RF2 and RF3 scenarios and in the event of a block or rack failure. An easy comparison would be "node awareness," wherein a replica would need to be replicated to another node, providing protection in the case of node failure. Block and rack awareness further enhance this by providing data availability assurances in the case of outages.
Metadata – Cassandra peer replication iterates through nodes in a clockwise manner throughout the ring. With block and rack awareness, peers are distributed among the blocks/racks to ensure that no two peers are on the same block or rack.
Zookeeper – Nutanix leverages Zookeeper to store essential configuration data for the cluster. This role is also distributed in a block/rack-aware manner to ensure availability in case of a failure.

Nutanix's system spreads data and important information across different blocks and racks to maintain availability and resilience even if some blocks or racks fail. Review the dynamic Availability Domain illustration below to learn more. A code sketch of the block-awareness fallback rule follows the illustration.

1. The cluster is divided into multiple blocks/racks, each containing several nodes.
2. Data, metadata, and cluster configuration information are distributed across different blocks/racks.
3. Components: Cluster Configuration (C) data ensures the cluster's configuration is always available, Metadata (M) keeps track of where data is stored, and Data (D) is the actual data used by virtual machines.
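Below is a minimal sketch of the block-awareness fallback rule: with fewer than three blocks the cluster defaults to node awareness, otherwise replicas are spread across distinct blocks. The layout structures are hypothetical.

# Sketch of the block-awareness rule above: fewer than 3 blocks means
# node awareness; otherwise replicas go to distinct blocks.

def replica_domains(block_of_node, rf):
    """Pick placement domains for RF copies, preferring distinct blocks."""
    blocks = sorted(set(block_of_node.values()))
    if len(blocks) >= 3:                      # minimum for block awareness
        return [("block", b) for b in blocks[:rf]]
    nodes = sorted(block_of_node)             # fall back to node awareness
    return [("node", n) for n in nodes[:rf]]

layout = {"node-1": "block-A", "node-2": "block-A",
          "node-3": "block-B", "node-4": "block-B"}
print(replica_domains(layout, rf=2))   # only 2 blocks -> node awareness

layout["node-5"] = "block-C"
print(replica_domains(layout, rf=2))   # 3 blocks -> block awareness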
NOTE: Block and Rack Awareness are enabled through Prism and are considered best effort. If awareness cannot be satisfied, a cluster will fall back to node awareness for any of the three entities.

Data Integrity and Availability
Nutanix ensures data remains available and intact during failures, maintains application performance during rebuilds, and recovers data quickly to minimize downtime. Review the graphic below to learn more about data integrity and availability.

Review the following question, choose the best answer, and then select Submit.

Which of the following best describes Nutanix Hardware Failure Resilience?

Nutanix solutions are self-healing, so fixes begin immediately upon failure and are faster than traditional RAID solutions. They are also fully distributed across all nodes and disks within a system.
Nutanix solutions are provided manually (remotely or physically on-site), so fixes begin upon Nutanix's notification of failure. Fixes are distributed across all nodes and disks within a system's architecture.
Nutanix solutions are fully automated, so fixes are done within just 30 minutes of failure. This is faster than traditional RAID solutions. Nutanix fixes are distributed across all nodes/disks at the site of failure.

Lesson 5 of 11: Performance and Dynamic Data Placement

Performance

Disk Breakdown
There are two key components with regard to disks, data storage, and data performance. The OpLog ensures data is written quickly and then coalesced before being moved to the Extent Store.

OpLog: Persistent Write Buffer
An OpLog is similar to a filesystem journal. It is a persistent write buffer that handles and coalesces bursts of random writes before sequentially draining the data to the Extent Store. An OpLog write on one node/CVM is synchronously replicated to another CVM's OpLog (based on RF2/RF3). An OpLog's replication targets are dynamically chosen based on load. Each OpLog is stored on/spread across the solid-state drive (SSD) tier on the CVM for fast random I/O performance.

NOTE: For sequential workloads, the OpLog is bypassed, with writes going directly to the Extent Store. If data currently sits in the OpLog and has NOT been drained, all read requests will be fulfilled from the OpLog until they have been drained. After they are drained, they will be served by the Extent Store/Unified Cache. For containers where fingerprinting (aka dedupe) has been enabled, all write I/Os will be fingerprinted using a hashing scheme, allowing them to be deduplicated based on the fingerprint in the Unified Cache.

Extent Store: Persistent Data Storage
The Extent Store is the persistent bulk storage of AOS. It does the following:

Spans all device tiers (Optane SSD, PCIe SSD, SATA SSD, HDD) and is extensible to facilitate additional devices/tiers.
Accepts data that is either being drained from the OpLog or sequential/sustained data that has bypassed the OpLog directly.

Nutanix Information Lifecycle Management (ILM) will determine tier placement dynamically based on I/O patterns, data access numbers, and individual tier weight, and will move data between tiers. The Extent Store handles long-term storage and sequential writes. The user VM reads from and writes to all devices.
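The following sketch models the write path just described: random writes land in the OpLog and are coalesced before draining to the Extent Store, while large sequential writes bypass the OpLog, and reads are served from the OpLog until data is drained. The threshold and structures are illustrative, not AOS internals.

# Sketch of the OpLog/Extent Store write path described above.

SEQUENTIAL_BYPASS_BYTES = 64 * 1024

oplog = {}         # offset -> data, pending drain
extent_store = {}  # offset -> data, persistent

def write(offset, data, sequential):
    if sequential and len(data) >= SEQUENTIAL_BYPASS_BYTES:
        extent_store[offset] = data       # bypass straight to Extent Store
    else:
        oplog[offset] = data              # buffered, replicated, coalesced

def drain_oplog():
    extent_store.update(oplog)            # sequentially move coalesced data
    oplog.clear()

def read(offset):
    # Until drained, reads for OpLog-resident data are served by the OpLog.
    return oplog.get(offset, extent_store.get(offset))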
VMs running on Nutanix AOS can read from and write to all storage devices that make up the AOS distributed storage.

1. Writes are done at a very granular level.
2. Balance between usage and performance can be achieved with more granularity.
3. Predictability can be achieved at scale.
4. Writes are dynamic based on disk fitness scores, with no hot spots.
5. There is no waste of space for caching. All devices become persistent storage.

Review the information below to deepen your understanding of the benefits of this capability. This dynamic illustration shows how virtual machines (VMs) interact with AOS to read and write data.

1. VMs run on top of a hypervisor, a software layer that manages multiple VMs on a single physical host.
2. Stargate handles all read and write requests from the VMs to the storage.
3. Each node contains multiple disks. AOS allows VMs to read from and write to all the disks across the cluster.

Read Performance: Data Locality
I/O and local data access are critical to cluster and VM performance because AOS is a converged (compute + storage) platform. VM data is served locally from the CVM of the node on which it resides and sits on local disks under the CVM's control. When a VM is moved from one hypervisor node to another, or during an HA event, the newly migrated VM's data will be served by the new local CVM. Select the hotspot (+) below to learn more about performance and data locality.

Data Locality: Additional Information
When reading old data (stored on the now-remote node/CVM), the I/O will be forwarded by the local CVM to the remote CVM. All write I/Os will occur locally right away. AOS will detect that I/Os are occurring on a different node and will migrate the data locally in the background, allowing all read I/Os to be served locally. Data will only be migrated on a read to avoid flooding the network.

Please note, data locality occurs in two main ways:

Cache Locality refers to pulling remote data into a local node's Unified Cache. This is done at a 4K granularity. In instances where there are no local replicas, requests are forwarded to the node(s) containing the replicas, which then return the data. The local node will store the data locally and return the I/O. All subsequent requests for that data will be returned from the cache.
Extent Group (eGroup) Locality refers to migrating the vDisk extent group(s) (eGroups) to be stored in the local Stargate's Extent Store. If a replica eGroup is already local, no movement is necessary. Otherwise, the actual replica eGroup will be re-localized after specific I/O thresholds are met; eGroups are not automatically re-localized or migrated, to ensure efficiency. For AES-enabled eGroups, the same horizontal migration occurs when replicas aren't local and access patterns are met.
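Cache locality can be sketched as a simple read path: check the local Unified Cache, otherwise forward the request to a node holding a replica, then keep the returned 4K piece locally for subsequent reads. The stores below are toy dicts, not AOS structures.

# Sketch of cache locality at read time, at 4K granularity.

CHUNK = 4096                      # cache locality works at 4K granularity

unified_cache = {}                # (vdisk, offset) -> data on the local node
remote_replicas = {}              # (vdisk, offset) -> data on remote nodes

def read(vdisk, offset):
    key = (vdisk, offset - offset % CHUNK)
    if key in unified_cache:              # local hit: no network traffic
        return unified_cache[key]
    data = remote_replicas[key]           # forwarded to the replica's node
    unified_cache[key] = data             # subsequent reads are local
    return data

remote_replicas[("vm1-disk", 0)] = b"A" * CHUNK
read("vm1-disk", 100)                     # first read goes remote
read("vm1-disk", 200)                     # second read hits the local cache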
Review the following multiple response question, choose all answers that apply, and then select Submit.

Which of the following are some of the benefits of VMs running on the Nutanix Distributed Storage Fabric?

Writes are done at a granular level, with a balance between usage and performance achieved that causes less granularity
The OpLog is a persistent write buffer that is a staging area for writes. It synchronously replicates data stored on the SSD tier of the CVM for fast performance.
Flexibility achieved at scale with dynamic configuration and encryption based on VM fitness scores, with no hot spots or lost data
Predictability achieved at scale, and dynamic writes that are based on disk fitness scores, with no hot spots and no waste of space for caching

RDMA: Zero-Touch Technology
To perform data replication, CVMs communicate over the network. The default stack involves kernel drivers and uses TCP for the communication between two CVMs. With RDMA, however, the NICs are passed through to the CVM, bypassing anything in the hypervisor. Within the CVM, all network traffic using RDMA uses a kernel-level driver only for the control path. All actual data I/O is done in user space without context switches. Select each tab below to learn more.

RDMA Benefits – RDMA's benefits include no context switching between user and kernel space, delivering consistent performance for latency-sensitive applications like SAP HANA and Kafka and for throughput-hungry Big Data applications.
Zero-Touch Tech – Zero-Touch RoCE technology increases ease of use and RDMA manageability without workload interruption. Zero-Touch tech also avoids switch configuration overhead via automated priority flow-control configuration.
Supported Tools – RDMA is supported for Ice Lake and later systems and those with compatible NICs, like CX5 and above.

Autonomous Extent Store (AES)
Autonomous Extent Store (AES) is a new method for writing and storing data in AOS's Extent Store. AES leverages a mix of primarily local and global metadata. Due to these metadata and data localities, AES allows for much more efficient and sustained performance. Select Start, then utilize the arrows to learn more about AES's improved Extent Store functionality.

Federated Control
With access to local metadata via AES, the Extent Store has federated control and decision-making abilities. This gives nodes control over physical layout and data placement management. This means nodes can decide or modify slice location and compaction. This also means that nodes can use node-level dedupe databases to point slices to dedupe data chunks.

Global Metadata
The AES global metadata layer is implemented using a heavily customized version of Apache Cassandra. This makes the metadata available throughout the cluster; any node in the cluster can then access the metadata information. Local metadata is implemented using RocksDB.

Metadata Locality
AES also brings locality to metadata, which reduces on-disk writes by up to 40% and reduces network use for data access by up to 75%. It also reduces CPU utilization.
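A toy model of the AES metadata split described above: a global, cluster-wide map (the Cassandra-backed layer) records which node owns an extent group, while per-node local metadata (the RocksDB-backed layer) records its physical layout. Plain dictionaries stand in for both stores; the names are hypothetical.

# Sketch of the AES local/global metadata split.

global_metadata = {                # cluster-wide, replicated: logical info
    "egroup-17": "node-b",
}
local_metadata = {                 # per node (RocksDB-backed): physical info
    "node-b": {"egroup-17": {"disk": "nvme0", "slice_offsets": [0, 32768]}},
}

def locate(egroup):
    owner = global_metadata[egroup]            # one global lookup
    layout = local_metadata[owner][egroup]     # physical detail stays local
    return owner, layout

print(locate("egroup-17"))   # ('node-b', {'disk': 'nvme0', ...})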
Accelerating AOS with Blockstore and SPDK
Blockstore enables AOS to interact directly with newer storage media like NVMe by using user-space libraries like SPDK. SPDK is capable of saturating eight (8) NVMe solid-state drives (SSDs), delivering over 3.5 million IOPS with a single CPU core, and fully utilizing new media performance. It is purpose-built for NVMe but also benefits SSDs and HDDs.

NOTE: Blockstore is proprietary to Nutanix.

Blockstore and SPDK will always provide the leanest and most efficient data path with any hypervisor. The combination of Blockstore and SPDK enables AOS to extract maximum IOPS and throughput from NVMe media. Together, Blockstore and SPDK provide consistent, lower latency for new applications. They also support more workloads with higher application density and leverage next-generation enhancements to seamlessly future-proof storage. Blockstore, SPDK, and the new generation of media brought performance improvements to Nutanix's storage solutions.

Performance Improvement with RF1
Nutanix AOS now supports storage container creation with Replication Factor 1 (RF1). This reduces cost and improves performance. It also means data is not replicated for redundancy. RF1 can be used for applications that offer resiliency at the application level, like Hadoop and SAS Analytics, and also for traditional databases like SQL, which store temporary data. None of these applications require resiliency at the platform level. This enables two key benefits; a short sketch of the container-level choice follows this section. Select each hotspot (+) to learn more about these benefits.

Storage Efficiency – When data is not replicated at the storage level, less physical storage media is needed, resulting in at least a 50% reduction in storage space for these kinds of applications. This reduces cost with non-resilient storage and provides application-level resiliency for scratch and ephemeral, non-sensitive data.
Performance Gains – I/O write performance is improved because data doesn't need to be replicated across the network, providing true data locality, even for writes. This reduces network overhead. RF1 workloads like Cloudera see completion times shortened by 3x, and SAS Analytics benefits from throughput increases of more than 50%.

Performance Boost for Scale-up Databases
Another benefit of Nutanix AOS is a performance boost for scale-up databases. With traditional applications, like SQL Server, a single large vDisk for the database and lift-and-shift workload migration without best practices historically resulted in sub-optimal performance. With a single large vDisk now accessed by multiple vDisk controller threads, I/O path and CPU optimizations improve performance. The more workloads a user drives, the more performance benefits they enable, with a 100% performance improvement vs. a single vDisk for a 70/30 read/write SQL database workload. In short, the larger the workload, the better the performance. For a performance boost for scale-up databases, the best practice is still to use multiple vDisks, which allows AOS to maximize performance across a cluster.

Nutanix AOS's boost for scale-up databases enables a 2X performance improvement for single vDisks with multiple data files and a 77% performance improvement for single vDisks with single data files. Performance is increased by 18-20% for multiple vDisks with lift and shift for traditional DB applications like SQL Server. Review the following graph to learn more. The graph shows the performance improvement of MS SQL Server transactions per minute (TPM) when sharding is used, compared to when it is not used.
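The RF1 guidance above boils down to a simple classification: only data that needs platform-level resiliency is replicated, while ephemeral or application-protected data can live in an RF1 container. The sketch below illustrates that decision; the workload descriptions are hypothetical.

# Sketch of the RF1 guidance: replicate only data that needs
# platform-level resiliency.

def container_rf(workload):
    """Pick a replication factor from a coarse workload description."""
    if workload.get("app_level_resiliency") or workload.get("ephemeral"):
        return 1        # e.g. Hadoop/SAS scratch space, SQL temp data
    return 2            # default platform-level redundancy (RF2)

print(container_rf({"name": "hadoop-scratch", "ephemeral": True}))   # 1
print(container_rf({"name": "sql-data", "ephemeral": False}))        # 2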
AHV-AOS Fast Path: iSER
In the context of performance, an additional Nutanix AOS component we need to review is the AHV-AOS Fast Path using iSER, which provides higher storage performance for VMs running on AHV. The AHV-AOS Fast Path offloads communication between AHV and AOS to the physical NIC with iSER. This removes context switches and memory transfers between user space and kernel space and builds on datapath optimizations, including an AOS datapath optimized for fast NVMe storage and AHV Turbo (Frodo) with multiple request queues for vDisks.

The benefits of iSER include the following:

AHV Turbo (Frodo) uses iSER (iSCSI Extensions for RDMA) to directly pass operations to Stargate, bypassing the AHV host and CVM kernel space. There is no context switching and there are no software copies between user and kernel space. Frodo is the iSER initiator, while Stargate is the iSER target.
Better IOPS/latency at lower CPU consumption, with an IOPS/latency improvement of 10-20%.

Review the following data path comparison between TCP and iSER to learn more. The comparison between the data paths for TCP and iSER in AHV shows how iSER improves performance.

The Approach to Dynamic Data Placement
Nutanix's approach to dynamic data placement optimizes resource usage, ensures consistent performance, and simplifies storage management, leading to cost savings and reduced complexity. Review the graphic below to learn more about dynamic data placement.

Review the following multiple response question, choose all answers that apply, and then select Submit.

Which of the following are key benefits of Nutanix's AES and Accelerated AOS?

Reduced I/O write speeds and latency due to faster communication
Increased storage efficiency with improved I/O write performance and scale-up databases
Offloaded communication and I/O latency improvements
Decreased storage efficiency with improved I/O write performance and flexible databases

Lesson 6 of 11: Data Efficiency and Storage Optimization

Storage Capacity and Optimization

Capacity Optimization: Compression
The Nutanix Capacity Optimization Engine (COE) is responsible for performing data transformations to increase data efficiency on disks. Currently, compression is one of the key features of the engine. AOS provides both inline and offline compression to suit customer and data type needs. As of AOS 5.1, inline compression is enabled by default. Inline compression will compress sequential streams of data or large I/O sizes (>64K) with zero delay when written to the Extent Store (SSD + HDD). This includes data draining from the OpLog as well as sequential data that skips the OpLog. Select each hotspot (+) below to learn more.

Offline Compression
Offline compression will initially write the data as normal (in an uncompressed state) and then leverage the Curator framework to compress the data cluster-wide. When inline compression is enabled but the I/Os are random in nature, the data will be written uncompressed in the OpLog, coalesced, and then compressed in memory before being written to the Extent Store. Nutanix leverages LZ4 and LZ4HC for data compression with AOS 5.0 and beyond. Prior to AOS 5.0, the Google Snappy compression library was leveraged, providing good compression ratios with minimal computational overhead and extremely fast compression and decompression rates.

Data Compression
Normal data will be compressed using LZ4, which provides a very good blend between compression and performance. For cold data, LZ4HC will be leveraged to provide an improved compression ratio. Cold data is characterized into two main categories:

Regular data: no read/write (R/W) access for 3 days
Immutable data (snapshots): no R/W access for 1 day

AOS Optimizes Data Storage and Performance
Review the graphic below to learn how AOS handles data compression and storage using its Capacity Optimization Engine (COE).

1. Data is initially written to the OpLog.
2. The OpLog temporarily stores and coalesces incoming write operations.
3. The Extent Store is the main storage area where data is stored after OpLog processing.
4. Inline compression happens immediately for sequential data; post-process compression happens later.
5. Read operations are served from cache or memory for faster access.
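The inline/offline compression decision can be sketched as follows. AOS uses LZ4 for normal data and LZ4HC for cold data; zlib compression levels stand in here so the example runs with the Python standard library alone.

# Sketch of the inline/offline compression decision described above.
# zlib levels stand in for LZ4 (fast) and LZ4HC (stronger, for cold data).

import zlib

INLINE_THRESHOLD = 64 * 1024      # >64K sequential I/O is compressed inline

def handle_write(data, sequential):
    # Inline path: large sequential I/O is compressed immediately.
    if sequential and len(data) > INLINE_THRESHOLD:
        return zlib.compress(data, 1)         # fast codec ("LZ4-like")
    # Random I/O is coalesced in the OpLog first and compressed on drain.
    return data

def recompress_cold(data, days_idle, immutable=False):
    # Cold data (3 days idle, or 1 day for immutable snapshots) is
    # recompressed with a stronger, slower codec ("LZ4HC-like").
    if days_idle >= (1 if immutable else 3):
        return zlib.compress(data, 9)
    return data

blob = b"sequential stream " * 8192           # ~144 KB
print(len(handle_write(blob, sequential=True)) < len(blob))   # True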
Capacity Optimization: Erasure Coding
The Nutanix platform leverages a replication factor (RF) for data protection and availability. This method provides the highest degree of availability because it does not require reading from more than one storage location or data re-computation upon failure. However, this does come at the cost of storage resources, as full copies are required. In the illustration, the block contains four nodes, each represented as a storage unit, and data (A and B) is stored in multiple copies (replicas) across different nodes. To provide a balance between availability and a reduction in the amount of storage required, AOS provides the ability to encode data using erasure codes (EC). Select each hotspot (+) to learn more.

More on Erasure Codes
Similar to the concept of RAID (levels 4, 5, 6, etc.), EC encodes a strip of data blocks on different nodes and then calculates parity. In the event of a host or disk failure, the parity can be leveraged to calculate any missing data blocks (decoding). In the case of AOS, the data block is an extent group. Based on the read nature of the data (read cold vs. read hot), the system will determine the placement of the blocks in the strip. This avoids overhead on the active write path, as the encoding is done by Curator in the background. Encoding is a post-process on write-cold data (no writes for more than 7 days) and strikes a balance between the expected overheads and RF-like redundancy.

Read Hot vs. Read Cold Data
For data that is read cold, Nutanix prefers to distribute the data blocks from the same vDisk across nodes to form the strip (same-vDisk strip). This simplifies garbage collection (GC), as the full strip can be removed if a vDisk is deleted. For read hot data, Nutanix prefers to keep the vDisk data blocks local to the node and compose the strip with data from different vDisks (cross-vDisk strip). This minimizes remote reads, as the local vDisk's data blocks can be local, and other VMs/vDisks can compose the other data blocks in the strip. In the event that a read cold strip becomes hot, AOS will try to recompute the strip and localize the data blocks.

Configurable Strips: Data and Parity
The number of data and parity blocks in a strip is configurable based on the desired failure tolerance level (FT1 or FT2). The configuration is commonly referred to as the number of data blocks / number of parity blocks. For example, "RF2-like" availability (N+1) could consist of 3 or 4 data blocks and 1 parity block in a strip (3/1 or 4/1). "RF3-like" availability (N+2) could consist of 3 or 4 data blocks and 2 parity blocks in a strip (3/2 or 4/2). The default strip sizes are 4/1 for RF2-like availability and 4/2 for RF3-like availability. These can be overridden using nCLI if desired. Strip width depends on the number of nodes and the RF.
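To make the strip arithmetic concrete, here is a toy 4/1 strip using XOR parity: four data blocks plus one parity block, so any single lost block can be recomputed from the survivors. Real AOS erasure coding operates on extent groups with the configurable strip sizes above; XOR simply keeps the math visible.

# Toy 4/1 erasure-coded strip: 4 data blocks + 1 XOR parity block.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [bytes([b] * 8) for b in (1, 2, 3, 4)]   # 4 data blocks
parity = xor_blocks(data)                        # 1 parity block

# Lose block 2, then decode it from the survivors plus parity.
survivors = [data[0], data[1], data[3], parity]
recovered = xor_blocks(survivors)
assert recovered == data[2]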
Capacity Optimization: Deduplication
The Elastic Dedupe Engine is a software-based feature of AOS that enables data deduplication in the Extent Store tiers. Streams of data are fingerprinted during ingest at a 16KB granularity (controlled by stargate_dedup_fingerprint) within a 1MB extent. Before AOS 5.11, AOS used only a SHA-1 hash to fingerprint and identify candidates for dedupe. Since AOS 5.11, AOS has used logical checksums to more easily select candidates for dedupe. The fingerprint is computed on data ingest and then stored persistently as a part of the written block's metadata.

NOTE: Contrary to traditional approaches, which utilize background scans requiring the data to be re-read, Nutanix performs the fingerprinting inline on ingest. For duplicate data that can be deduplicated in the capacity tier, the data does not need to be scanned or re-read; duplicates can simply be removed.

Select each hotspot (+) to learn more.

Fingerprinting Details
Fingerprinting is done during ingest of data with an I/O size of 64K or greater (initial I/O or when draining from the OpLog). AOS then looks at the hashes/fingerprints of each 16KB chunk within a 1MB extent, and if it finds duplicates for more than 40% of the chunks, it dedupes the entire extent. This results in many deduped extents with a reference count of 1 (and no other duplicates). To make the metadata overhead more efficient, fingerprint refcounts are monitored to track dedupability. Fingerprints with low refcounts will be discarded to minimize the metadata overhead. To minimize fragmentation, full extents are preferred for deduplication. After fingerprinting is done, a background process removes the duplicate data using the AOS MapReduce framework (Curator).

Duplicates and Deduplication
Intel acceleration is leveraged for the SHA-1 computation, which accounts for very minimal CPU overhead. In cases where fingerprinting is not done during ingest (e.g., smaller I/O sizes), fingerprinting can be done as a background process. As duplicate data is determined, based upon multiple copies of the same fingerprints, a background process will remove the duplicate data using the AOS MapReduce framework (Curator). Data being read is pulled into the AOS Unified Cache, a multi-tier/pool cache. Any subsequent requests for data having the same fingerprint will be pulled directly from the cache.

Algorithm Enhancement
With AOS 6.6, the algorithm was further enhanced such that within a 1MB extent, only chunks that have duplicates are marked for deduplication instead of the entire extent, reducing the metadata required. With AOS 6.6, changes were also made in the way dedupe metadata is stored. Before AOS 6.6, dedupe metadata was stored in a top-level vDisk block map, resulting in dedupe metadata being copied when snapshots were taken. This resulted in metadata bloat. With AOS 6.6, the metadata is stored in an extent group ID map, which is a level lower than the vDisk block map. Now, taking a snapshot of a vDisk does not copy the dedupe metadata, preventing metadata bloat.

1. Data is written to the OpLog buffer.
2. During data ingestion, if the data is 64KB or larger, a fingerprint is created and stored in the metadata.
3. Data from the OpLog is drained and written to the Extent Store, which is divided into two tiers for efficient storage management.
4. Data read operations are handled by the Unified Cache, which pulls data into memory for faster access. Subsequent read requests for data with the same fingerprint are served directly from the cache.
5. The Elastic Deduplication Engine spans both the Extent Store and the Unified Cache. It removes duplicate data in the background using the AOS MapReduce framework.
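The fingerprint-and-threshold logic can be sketched directly from the numbers above: hash each 16KB chunk of a 1MB extent and dedupe the extent when more than 40% of its chunks have been seen before. SHA-1 and the thresholds follow the text; the fingerprint store is a toy set.

# Sketch of fingerprinting with the 16KB/1MB/40% figures quoted above.

import hashlib

CHUNK = 16 * 1024
THRESHOLD = 0.40

fingerprint_store = set()   # fingerprints already seen in the cluster

def should_dedupe(extent):
    chunks = [extent[i:i + CHUNK] for i in range(0, len(extent), CHUNK)]
    fps = [hashlib.sha1(c).digest() for c in chunks]
    dup_ratio = sum(fp in fingerprint_store for fp in fps) / len(fps)
    fingerprint_store.update(fps)
    return dup_ratio > THRESHOLD

extent = b"\x00" * (1024 * 1024)
print(should_dedupe(extent))   # False: nothing seen yet
print(should_dedupe(extent))   # True: all 64 chunks are now duplicates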
Review the following question, choose the best answer, and then select Submit.

Which of the following is a benefit of the Nutanix Capacity Optimization Engine (COE)?

Deduplication with reduced metadata bloat from duplicates
Data compression, which helps to balance and improve performance
Erasure coding, which balances availability and storage reduction
All of these are correct

Lesson 7 of 11: AOS Platform and Data Security

Key AOS Platform Features

Secure Platforms and Data
It is important to establish and maintain secure configuration and ensure data is safe from loss or theft. Review the list below to learn more.

1. Identity and Access: Nutanix multi-factor authentication is done via SAML, with role-based access controls and audit logging.
2. Security Baseline and Audit: The platform includes factory-applied security baselines with standards-based configuration and native audit and self-healing. This ensures continuous compliance.
3. Data Protection: The Nutanix platform offers native data-at-rest encryption with key management, FIPS 140-2 validated encryption modules, and replication and recovery planning.
4. Regulatory and Compliance: In a compliance context, the Nutanix platform supports alignment with regulatory policies, including HIPAA, PCI DSS, NIST, GDPR, and more. The platform also offers audit and reporting tools.

Nutanix software delivers native AES-256-XTS-based data-at-rest encryption. It is hypervisor agnostic, supported on AHV, ESXi, and Hyper-V, and offers flexibility. It enables encryption at the cluster or container level for ESXi and Hyper-V, but only at the cluster level for AHV.

Key Features of Nutanix's AOS Platform
NOTE: NIST is now the de facto standard globally, and Nutanix's encryption design approach includes three main options: native software-based encryption, FIPS 140-2 Level 1 (released in AOS 5.5); Self-Encrypting Drives (SED), FIPS 140-2 Level 2; and combined software + hardware encryption. Encryption is configured at either the cluster or container level, dependent on the hypervisor type. It is configured at the cluster level for deployments using SED-based encryption, as the physical devices themselves are encrypted.

Review the information below to learn more about robust compliance and frameworks alignment. For more, visit nutanix.com/trust.

Data-at-Rest Encryption Implementation
Nutanix ensures data security through comprehensive data-at-rest encryption across its nodes and storage infrastructure. Review the dynamic illustration below to learn more.

Centralized Native KMS
Nutanix provides native key management, a local key manager (LKM), and storage capabilities (introduced in AOS 5.8) as an alternative to dedicated Key Management Server (KMS) solutions. These were introduced to negate the need for a dedicated KMS solution and simplify the environment; however, external KMS solutions are still supported. Key management is a critical piece of any data encryption solution, and multiple keys are used throughout the stack to provide a secure KMS solution. There are three types of keys:

Data Encryption Keys (DEKs) are used to encrypt the data.
Key Encryption Keys (KEKs) are used to encrypt the DEK.
The Master Encryption Key (MEK) is used to encrypt the KEK and is only applicable when using the LKM.

This graphic illustrates how Nutanix's Prism Central KMS centralizes encryption key management. A centralized system stores and manages encryption keys, and remote clusters rely on the centralized KMS for encryption key management.

NOTE: For more information on Nutanix KMS solutions, visit: https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_7:wc-security-data-encryption-wc-c.html
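The DEK/KEK hierarchy is a standard envelope-encryption pattern: the DEK encrypts the data, and the KEK encrypts (wraps) the DEK, so rotating the KEK never requires re-encrypting the data itself. The sketch below uses the third-party Python cryptography package purely for illustration; AOS itself implements AES-256-XTS data-at-rest encryption, not this library.

# Envelope-encryption sketch of the DEK/KEK hierarchy. Illustrative
# only; not Nutanix's actual implementation.

from cryptography.fernet import Fernet

kek = Fernet.generate_key()              # held by the key manager (LKM/KMS)
dek = Fernet.generate_key()              # used to encrypt the data

ciphertext = Fernet(dek).encrypt(b"vm disk contents")
wrapped_dek = Fernet(kek).encrypt(dek)   # only the wrapped DEK is stored

# To read: unwrap the DEK with the KEK, then decrypt the data.
plaintext = Fernet(Fernet(kek).decrypt(wrapped_dek)).decrypt(ciphertext)
assert plaintext == b"vm disk contents"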
Storage Policy: VM Encryption
Nutanix solutions set and manage storage policy (encryption) templates by department, application, and business. For example, QoS (IOPS) may be the setting for specific, mission-critical VMs. Nutanix solutions enable encryption for specific VMs moving from the dev environment into production by migrating workloads across clusters. This can be done without worrying about the underlying storage container configuration. Nutanix encryption keys differ across containers: VM1, CTR1, and Policy 1 can all use a different key from VM2, CTR2, and Policy 2. This simplifies storage configuration for multi-cluster-focused configurations. Review the illustration below to learn more.

This graphic illustrates how Nutanix Prism Central manages storage policies centrally and applies them flexibly across multiple clusters, ensuring secure and efficient data management.

Nutanix AOS Security
Nutanix AOS Security provides robust built-in security features that help reduce risk and operational costs, making it easier to maintain a secure infrastructure.

Review the following question, choose the best answer, and then select Submit.

Which of the following best describes key AOS Platform features?

Single-factor authentication, data protection, static set configuration, and external-based key management (KMS)
The AOS Platform provides an OpLog, which stores data. This is the main key feature of the platform.
Multi-factor authentication, data protection, standards-based configuration, and native key management (KMS)
The AOS Platform provides granular latency, which improves performance. This is the main key feature of the platform.

Lesson 8 of 11: Disaster Recovery and Backup

Data Protection, Recovery, and Resiliency

Comprehensive Business Continuity and Disaster Recovery (BCDR) Solution
Let's take a moment to review the architecture of Nutanix Data Protection & Disaster Recovery. Nutanix's architecture offers elements of Business Continuity and Disaster Recovery (BCDR), DR, and integrated backup. Each of these solutions is zero-touch, with no manual intervention required to ensure data protection. This means that customers only need to click once within Prism to fail over in the event of a disaster. This is a significant improvement over traditional disaster recovery and data protection solutions, where orchestrating a response to a disaster is arduous, with lots of manual intervention required. Select Start, then utilize the arrows to review key concepts and illustrations of Nutanix's Data Protection and Disaster Recovery Architecture.

Nutanix Data Protection and Disaster Recovery Architecture Details

Consuming Nutanix DR Technology
If a secondary data center is CapEx-heavy (capital expenditure-heavy), Nutanix DR technology can be consumed as a disaster-recovery-as-a-service (DRaaS) solution or a DRaaS-managed service. Note that Nutanix DR technology backs up and archives data using popular data protection software.

Holistic BC/DR Viewpoint
Nutanix's integrated storage backup solution offers one-click data protection and archival in the event of a failover, with failback and auto-tests. Nutanix solutions automatically orchestrate recovery plans and provide stable ecosystem partners for backup with multi-site replication.
CONTINUE

True Resiliency from A to Z and Beyond

When we think about data resiliency in the Nutanix context, it should be noted that you're in great hands! The Nutanix system was designed to handle multiple failures, from individual disk drives to whole-rack failures. Ideally, you would never have to use disaster recovery while running Nutanix software, thanks to its self-healing nature. However, if a disaster occurs, Nutanix provides a wide variety of protections based on customer requirements and needs. From protecting individual VMs to orchestrating recovery from a full data center outage, Nutanix covers it all. We even offer cloud solutions that help spin up new environments and remove the management burden of a secondary data center. Review the graphic below to learn more.

Nutanix provides robust, multi-layered resiliency solutions. This graphic details how data is protected at various levels, from individual hardware components to entire sites and the cloud.

Review the following question, choose the best answer, and then select Submit.

How is Nutanix Data Protection and Disaster Recovery consumed and managed?

As a DRaaS-managed service with tech that backs up and archives data using data protection software

As a configuration and encryption system based on data scores that determine the hot spot availability

Both of these are correct

None of these are correct

SUBMIT

Lesson 9 of 11

Disaster Recovery Deep Dive

Snapshots, Clones, and Disaster Recovery

Local and Remote Snapshots

A snapshot is a point-in-time state of entities like VMs or Volume Groups and is typically used for the restoration and replication of data. You can generate snapshots and store them either locally or remotely. They are essentially a mechanism to capture the delta changes that have occurred over time. Snapshots are primarily used for data protection and disaster recovery, but unlike backups they are not autonomous: they depend on the underlying VM infrastructure and on other snapshots for restoration. Ultimately, snapshots consume fewer resources than a full autonomous backup and capture the following:

1. The State – This includes the power state of VMs, such as powered-on, powered-off, or suspended.

2. The Data – This includes all of the files that make up the VM, as well as all data from disks, configurations, and devices, like virtual network interface cards.

Select each hotspot (+) below to learn more.

Highlights

Nutanix supports unlimited local VM snapshots and includes policy-based snapshot management. Snapshots also offer application-consistent and crash-consistent policies, with support for, and replication across, multiple hypervisors.

Benefits

Nutanix snapshots come complete with self-service file restore functionality. Because snapshots use redirect-on-write, performance is not impacted and storage is utilized efficiently. This also reduces virtualization licensing costs.

To ensure data protection and disaster recovery, data can be copied in three different ways from a primary site (where it lives) to a secondary site (the backup location).

CONTINUE

Snapshots and Clones

Nutanix's AOS provides native support for offloaded snapshots and clones, which can be leveraged via VAAI, ODX, nCLI, REST, Prism, and more. Both snapshots and clones use an effective and efficient redirect-on-write algorithm. A vDisk is composed of extents, which are logically contiguous chunks of data stored in extent groups. Extent groups are physically contiguous data stored as files on storage devices.
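Before walking through how snapshots and clones manipulate these structures, here is a toy redirect-on-write model in Python. It is a sketch under simplified assumptions (block maps as plain dictionaries, extent IDs as strings), not AOS's actual metadata format, but it shows why snapshots and clones are metadata-only operations that incur no data I/O.

```python
# Toy redirect-on-write model (illustrative only -- not AOS's metadata format).
# A vDisk is reduced to a block map: logical block -> extent ID. Snapshots
# and clones copy only the map; writes after a snapshot land in new extents.

def copy_block_map(vdisk: dict) -> dict:
    """Freeze the current block map. Only metadata is copied (no data I/O);
    the underlying extents are shared between the two vDisks."""
    return dict(vdisk)

base = {"A": "A1", "B": "B1", "C": "C1", "D": "D1"}  # base vDisk

snap = copy_block_map(base)   # point-in-time, read-only view
base["D"] = "D2"              # redirect-on-write: the update to block D goes
                              # to a new extent D2; extent D1 is never touched

clone = copy_block_map(base)  # a clone starts from the same locked block map
clone["E"] = "E1"             # new writes by the clone stay private to it

print(snap)   # {'A': 'A1', 'B': 'B1', 'C': 'C1', 'D': 'D1'} -- unchanged
print(base)   # {'A': 'A1', 'B': 'B1', 'C': 'C1', 'D': 'D2'}
print(clone)  # {'A': 'A1', 'B': 'B1', 'C': 'C1', 'D': 'D2', 'E': 'E1'}
```

The clone and Shadow Clone discussion below walks through the same mechanics on real vDisks.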
When a snapshot or clone is taken, the base vDisk is marked immutable and another vDisk is created as read/write. At that point, both vDisks have the same block map: the metadata mapping of the vDisk to its extents.

Select each tab below to learn more.

CLONES

The same methods are used for both snapshots and clones of a VM or vDisk(s). When a VM or vDisk is cloned, the current block map is locked, and the clones are created. These updates are metadata only, so no I/O takes place. The same applies to clones of clones: the previously cloned VM acts as the "Base vDisk," and upon cloning, that block map is locked and two new "clones" are created, one for the VM being cloned and another for the new clone. There is no imposed limit on the maximum number of clones.

SHADOW CLONES

Shadow Clones allow for distributed caching of particular vDisks or VM data in a "multi-reader" scenario. For example, during a virtual desktop infrastructure (VDI) deployment, many "linked clones" forward read requests to a central master or "Base VM." In the case of VMware View, this is called the replica disk and is read by all linked clones. In XenDesktop, it is called the MCS Master VM.

With Shadow Clones, AOS monitors vDisk access trends as it does for data locality. When requests occur from more than two remote CVMs (as well as the local CVM), and all of the requests are read I/O, the vDisk is marked as immutable and can then be cached locally by each CVM making read requests; these local caches are the Shadow Clones of the base vDisk. This allows VMs on each node to read the Base VM's vDisk locally. For VDI, this means the replica disk can be cached by each node, and all read requests for the base will be served locally.

Nutanix handles snapshots and clones by managing metadata and block mappings to efficiently create read-only and writable copies of virtual disks without duplicating data unnecessarily. Review the dynamic illustration below to learn more.

1. No Snapshot: The base virtual disk (vDisk) is directly mapped to data blocks (A1, B1, C1, D1).

2. Snapshot Taken: A snapshot of the base vDisk is created (vDisk1). This read-only snapshot points to the same data blocks as the base vDisk.

3. Block Updated: When a block (e.g., D1) is updated, the new data (D2) is written to a new location. The base vDisk continues to point to the original blocks, while vDisk1 points to the updated blocks.

4. Clone(s) with New Block(s) Written: When a clone (Clone 1) is created from the base vDisk, it starts with the same block map. Any new data written (E1, F1) by the clone is stored separately, allowing both the base vDisk and the clone to maintain their integrity.

CONTINUE

Disaster Recovery Orchestration

Prism Central provides a single web console for monitoring and managing multiple clusters. Nutanix protection policies and recovery plans are available in Prism Central for AHV and ESXi to orchestrate operations around migrations and unplanned failures. You can apply orchestration policies from a central location, ensuring consistency across all sites and clusters.

Select each checkbox as you review the list items below.

Availability Zones – Nutanix uses availability zones to manage new protection policies and recovery plans. On-premises availability zones include all Nutanix clusters managed by one Prism Central instance. An availability zone can also represent a region in Nutanix DRaaS.
Disaster Recovery – For disaster recovery, availability zones exist in pairs, either on-premises to on-premises or on-premises to the cloud. Clusters deployed in the cloud work the same way as on-premises availability zones.

Paired Zones – After an on-premises environment is paired to a cloud-based availability zone, Nutanix DRaaS can be leveraged. Multiple Nutanix DRaaS subscription plans are available, so customers don't need to pay the full cost of buying a secondary cluster up-front. This also reduces the time required to manage and operate additional infrastructure.

Review the following question, choose the best answer, and then select Submit.

What is the purpose of a snapshot in the context of data storage?

Snapshots refer to the updates made regarding the state of entities like VMs or Volume Groups, and they are ongoing and unending at all times, remotely.

A snapshot is a record of the architecture of a data storage platform or solution that can be used to repair and restore data during a failover or outage.

A snapshot is a point-in-time state of entities like VMs or Volume Groups used to restore and replicate data. Snapshots can be generated/stored locally or remotely.

Snapshots are automated reports on the overall architecture of a data storage platform or solution that the system generates periodically.

SUBMIT

CONTINUE

Key Recovery and Resilience Features

The features below highlight Nutanix's recovery and resilience capabilities, detailing each feature's additional information, highlights, and benefits. Review them to learn more.

Recovery Plan – Nutanix orchestrates restoring protected VMs at a backup location, either on-premises or Nutanix DRaaS-based. All specified VMs can be recovered at once or, with what is essentially runbook functionality, using power-on sequences. This enables optional configuration of interstage delays to recover applications in the required order. Recovery plans that restore applications in Nutanix DRaaS can also create the required networks during failover and assign public-facing IP addresses to VMs.
Highlights: 1-Click Failover, Failback, and Testing are key highlights of Nutanix disaster recovery orchestration.
Benefits: Reduce risk with Nutanix's easy-to-use, policy-based approach to disaster recovery (DR).

Embedded Scripts – Recovery plans allow protected VMs to run custom embedded scripts. If you have NGT installed, a custom script can be used to perform varied customizations, like changing desktop wallpaper or updating existing management software after a failover. On failover, the recovery plan provides a variety of options to change or maintain VM IP addresses. The plan will show the last octet of the VM's new IP address so users can take the correct actions.
Highlights: Auto-protect applications using Nutanix Categories and Protection Policies. You can also orchestrate Recovery Plans (runbooks).
Benefits: Recover applications from disaster immediately or days after an attack.

IP Mapping – Offset-based IP mapping tracks the last octet of VM IP addresses so they can be automatically maintained or changed based on the subnet configuration of the recovery plan, as shown in the sketch below. If the source and destination subnets are the same, the IP address can be maintained. If the destination subnet is changed to a new value, the address keeps its last octet on the new subnet.
Benefits: Restore apps selectively or site-wide using Nutanix's unique data recovery architecture.
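The offset-based mapping just described is straightforward to model. The sketch below uses Python's standard ipaddress module; the helper name and signature are invented for illustration, not a Nutanix API. A VM keeps its offset within the subnet (for a /24, its last octet) when it is recovered into a different subnet.

```python
# Offset-based IP mapping sketch (hypothetical helper, not a Nutanix API).
# Keep the VM's offset within its subnet; adopt the destination network.
import ipaddress

def map_recovery_ip(source_ip: str, src_subnet: str, dst_subnet: str) -> str:
    src_net = ipaddress.ip_network(src_subnet)
    dst_net = ipaddress.ip_network(dst_subnet)
    if src_net == dst_net:
        return source_ip  # same subnet at both sites: the IP is maintained
    # Different subnet: preserve the host offset, swap the network portion.
    offset = int(ipaddress.ip_address(source_ip)) - int(src_net.network_address)
    return str(dst_net.network_address + offset)

# A VM at 10.1.5.42/24 recovering into 192.168.9.0/24 keeps its .42 offset:
print(map_recovery_ip("10.1.5.42", "10.1.5.0/24", "192.168.9.0/24"))
# -> 192.168.9.42
```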
Synchronizing and Recovery – Recovery plans created at one location can synchronize with the paired location and work bi-directionally. After the VMs fail over from the primary location to the recovery location, the same recovery plan can be used to return the VMs to the original location using the failback mechanism. After a recovery plan is created, it can be validated and tested to ensure the system can recover if it needs to fail over.
Highlights: Nutanix provides a unified, consumer-grade interface via Prism.

Snapshot Availability – Recovery plans don't reference protection policies in their configuration information or create snapshots. However, during an unplanned failover, they rely on the availability of snapshots at the designated recovery location. These can be either manually replicated snapshots or snapshots automatically created and replicated using protection policies. Keep in mind that a planned failover doesn't require snapshots to be present at the recovery location in advance.
Benefits: Recover from the latest recovery point or a previous point in time to ensure smooth data recovery without lost information.

The network mapping portion of the recovery plan allows for full and partial subnet failover for Nutanix DRaaS.

Subnet Failover – If the source and destination availability zones have the same network mapping, the full subnet fails over so applications can retain their IP addresses. This means the domain name system (DNS) does not need to be scripted or updated. When an administrator runs a planned full subnet failover, even the MAC addresses are retained, which is crucial when dealing with applications licensed through MAC and IP addresses. If AHV's built-in IPAM is used, no installation is required in the guest VM to retain the addresses. If a DHCP server is used, or if VMs are set up with manual IP addresses, the NGT software package can simply be installed; NGT maintains network information regardless of the hypervisor.

Disaster Recovery Orchestration

Nutanix Prism Central provides a unified platform for configuring, monitoring, and managing disaster recovery operations across primary and secondary sites, ensuring data protection and business continuity. Review the dynamic illustration to learn more.

1. Prism Central includes configuration, reporting, and operations.

2. The Primary Site is where user workloads and data are stored and processed. The Secondary Site is the backup location where data is replicated for disaster recovery purposes.

3. Data and configurations are synchronized between the primary and secondary sites to ensure that the secondary site can take over in the event of a failure at the primary site.

CONTINUE

vSphere: AHV Cross-Hypervisor Disaster Recovery

Cross-hypervisor disaster recovery (CHDR) provides the ability to migrate VMs from one hypervisor to another (ESXi to AHV or AHV to ESXi) using protection domain semantics. This includes protecting VMs, generating snapshots, replicating the snapshots, and recovering the VMs from the snapshots.

Review the requirements and limitations of CHDR in the list below. Then, we'll look at a quick example.

Requirements:

Requirement 1: Nutanix supports only VMs with flat files. Nutanix does not support vSphere snapshots or delta disk files, which means that delta disks attached to a VM are lost during migration. Please note that Nutanix does not support VMs with attached volume groups or shared virtual disks.

Requirement 2: Nutanix supports IDE/SCSI and SATA disks only. PCI/NVMe disks are not supported.
Note that CHDR is supported on both sites with all AOS versions that are not end of life (EOL).

Requirement 3: Users must set the SAN policy to OnlineAll for all Windows VMs and all non-boot SCSI disks so they can automatically be brought online. To learn more about setting the SAN policy, review Bringing Multiple SCSI Disks Online.

Requirement 4: For migration from AHV to ESXi, automatic reattachment of iSCSI-based volume groups (VGs) fails after VMs are migrated to ESXi, because vCenter (and, in turn, the Nutanix cluster) does not obtain the IP addresses of the VMs. After migration, users should manually reattach VGs to VMs.

Requirement 5: Users should not enable VMware vSphere Fault Tolerance (FT) on VMs that are protected with CHDR. If it is already enabled, users should disable VMware FT, as it does not allow registration of VMs enabled with FT on any ESXi node after migration; it also results in a failure or crash of the Uhura service. To learn more about the OSs supporting UEFI and Secure Boot, review UEFI and Secure Boot Support for CHDR.

Limitations:

Limitation 1: CHDR does not support metro availability and cannot be performed on a vCenter VM.

Limitation 2: CHDR has limitations depending on the type of replication configured in a cluster. To learn more, review Limitations of Data Protection with Asynchronous Replication or Limitations of Data Protection with NearSync Replication.

Select each tab to learn more about an example vSphere scenario.

SCENARIO: A U.S.-based agriculture firm with more than 40 different sites has decided to use a mix of AHV and ESXi hypervisors for its data storage needs.

CHALLENGE: The firm needs a single solution to back up and recover all of its sites at one central digital location.

SOLUTION: Nutanix's CHDR enables this customer's hypervisor choice while also providing the ability to restore any of the VMs to the primary datacenter on AHV.

BENEFITS: Nutanix solutions lower TCO with AHV and provide the ability to restore the entire site locally. They also provide standardization across the environment.

CONTINUE

AHV Metro Availability and Replication

Metro Availability synchronously replicates data to alternate sites, ensuring that real-time data copies exist at different locations. During a disaster, VMs can fail over from a primary site to a secondary site, guaranteeing nearly 100% uptime for applications.

Metro Availability is a continuous solution that provides a global file system namespace across a container stretched between Nutanix clusters. Synchronous storage replication across independent clusters supports the stretched container using the protection domain and remote site constructs. This enables synchronous replication at the container level, with all VMs and files stored in a container replicating concurrently to another Nutanix cluster.

Select each hotspot (+) below to learn more.

Protection Domains

Each Metro Availability protection domain maps to one container. Multiple protection domains can be created to enable different policies, including bi-directional replication, in which each cluster replicates synchronously to the other. This protects mission-critical applications against complete site failure.
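These mapping rules can be summarized in a toy data model. The sketch below uses invented names and is not a Nutanix API: each protection domain maps to exactly one container and replicates with one remote site (the one-to-one relationship noted in the next paragraph), and bi-directional replication is simply two such mappings pointing at each other.

```python
# Toy model of Metro Availability protection domains (invented names;
# a sketch of the mapping rules, not a Nutanix API).
from dataclasses import dataclass

@dataclass(frozen=True)
class MetroProtectionDomain:
    name: str
    container: str    # each protection domain maps to exactly one container
    remote_site: str  # one-to-one: the single remote site it replicates with
    role: str         # "Active" replicates synchronously to "Standby"

# Bi-directional replication: Site 1 is Active for ctr-apps (replicating to
# Site 2) and Standby for ctr-db, whose Active copy lives on Site 2.
site1 = [MetroProtectionDomain("pd-apps", "ctr-apps", "site-2", "Active"),
         MetroProtectionDomain("pd-db", "ctr-db", "site-2", "Standby")]
site2 = [MetroProtectionDomain("pd-apps", "ctr-apps", "site-1", "Standby"),
         MetroProtectionDomain("pd-db", "ctr-db", "site-1", "Active")]
```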
Metro Availability supports a one-to-one relationship between protection domains, meaning that each container can replicate to one other remote site. Containers have two primary roles while enabled for Metro Availability: Active and Standby. Active containers replicate data synchronously to standby containers.

The Witness

Nutanix offers zero-touch, automated witness-based failover. Failover between sites can be manual or automatic with the use of a witness. While the connection between Site 1 and Site 2 must have a round-trip time (RTT) of five milliseconds or less, the connection between the witness and either site can have an RTT of up to 200 ms. The witness is simply a VM deployed in a separate failure domain. It uses only a small amount of resources and must reside on a different network than both Site 1 and Site 2. The witness protects against network partitions and against primary and secondary site failures. 1-Click live migration of VMs between sites allows greater flexibility for tasks like data center maintenance or disaster avoidance.

Existing Features

Metro Availability works in conjunction with existing Nutanix data management features and is policy-based for predictable outcomes. Existing features include compression, erasure coding, deduplication, and tiering. Metro Availability also enables compression over the network for synchronous replication traffic between sites, reducing the total bandwidth required.

Nutanix solutions are easy to set up and manage, removing the burden of specialized personnel and equipment. Synchronous replication delivers an RPO of zero and a near-zero RTO, and the REST API allows Nutanix workflows to be included as part of a larger DR runbook.

Metro Availability ensures high availability and disaster recovery by synchronously replicating data between a primary and a secondary site, with live migration and failover capabilities, all managed and monitored by Prism Central.

CONTINUE

Multi-Site DR and Replication Encryption

Nutanix allows remote sites to be set up and selected for use as simple backup or as both backup and disaster recovery. Remote sites are a logical construct. First, users should configure an AHV cluster, either physical or cloud-based, that functions as the snapshot destination and as a remote site from the perspective of the source cluster. On this secondary cluster, the user should similarly configure the primary cluster as a remote site before snapshots from the secondary cluster begin to replicate to it.

By configuring backup on Nutanix, users can use the remote site as a replication target. Data can be backed up to this site, and snapshots can be retrieved from it for local restoration. Note that backup alone does not enable failover protection: failover VMs cannot run directly from a backup-only remote site. Backup does support using multiple hypervisors.

DR options can be configured to use the remote site as both a backup target and a source. This enables dynamic recovery, with failover VMs running directly from the remote site. Nutanix also provides cross-hypervisor DR between AHV and ESXi clusters.

Select each hotspot (+) below to learn more.

Multi-Site DR

Nutanix Multi-Site DR highlights include:

Zero-data-loss DR

DR testing

Datacenter migration

Data corruption restore

Nutanix Multi-Site benefits include:

Eliminates complex installation and administration

Avoids vendor and hypervisor lock-in

DR Replication Encryption

DR Replication Encryption is an enterprise-readiness feature for competitive parity that helps bolster Nutanix's security story.
It is an easy-to-use, hypervisor-agnostic replication traffic encryption feature. It uses system-managed keys, with gRPC TLS for the control path and TCP TLS for the data path. Starting in version 6.8, replication encryption is enabled by default.

CONTINUE

Review the following question, choose the best answer, and then select Submit.

What is the primary purpose of or use case for Nutanix DRaaS?

To protect on-prem VMs via a managed, onsite service so users can manually choose and manage their level of protection based on workload requirements

To maximize efficiency and performance by providing data on-demand based on workload requirements with a dedicated, onsite DR setup

To achieve flexibility at scale with dynamic configuration and encryption without losing hot spot accessibility or experiencing data loss

To protect on-prem VMs to a managed service in the cloud so users can choose their level of protection based on workload needs without a dedicated DR site

SUBMIT

Lesson 10 of 11

Resources

Additional resources for AOS Technical Training can be found here:

Definitive Guide to HCI – This document describes how the Nutanix Cloud Platform works and how HCI is the foundation that makes it all happen. XPRESS

HCI eBook – Top 20 HCI Questions Answered. XPRESS

A Brief History of Nutanix – A spot page on Dheeraj Pandey's writings on Nutanix history. A great read! HISTORY

Test Drive – Check out the Nutanix platform yourself. Available for customers to explore. TEST DRIVE

Lesson 11 of 11

Feedback

We greatly appreciate any comments or feedback you can provide to help us improve the learning experience for the Sales community. Please select the following link to provide feedback to our team.

Feedback for Learning Academy on this course