Questions and Answers
AHV leverages a traditional storage stack, similar to ESXi or Hyper-V, for managing disk I/O.
False (B)
The iSCSI redirector in AHV uses NOP commands to periodically check the health of Stargates within the cluster.
True (A)
QEMU is configured to directly use the Stargate service as the iSCSI target portal.
False (B)
IDE devices are the preferred controller type for virtual disks in AHV for optimal performance.
False (B)
In the traditional I/O path, the QEMU main loop processes I/O requests concurrently using multiple threads.
False (B)
Frodo, also known as AHV Turbo Mode, is disabled by default on VMs powered on after AOS 5.5.X.
False (B)
Frodo accelerates I/O by replacing the QEMU main loop with vhost-user-scsi.
True (A)
With Frodo enabled, a VM with one vCPU will create two Frodo threads per disk device to maximize throughput.
False (B)
Acropolis IP Address Management (IPAM) relies on traditional DHCP servers in an 'unmanaged' network configuration.
True (A)
Acropolis IPAM uses VXLAN and OpenFlow to intercept and respond to DHCP requests, providing IP address management.
True (A)
VM High Availability in AHV is triggered immediately after the Acropolis Leader detects a disruption in the libvirt connection to a host.
False (B)
In an AHV cluster, VM HA guarantees VM restarts within 5 minutes of a host failure.
False (B)
In Guarantee mode, AHV HA restarts all VMs if sufficient resources are available, but does not reserve resources to guarantee restart capability.
False (B)
When using Guarantee mode with all containers at RF2, AHV reserves resources equivalent to two host's worth of memory.
False (B)
Segment-based reservations distribute the HA resource reservation evenly across all hosts in the cluster.
True (A)
In AHV, only host based reservations are automatically implemented when the Guarantee HA mode is selected.
False (B)
When a local CVM's Stargate recovers, remote Stargates immediately transfer all iSCSI sessions back to the local Stargate.
False (B)
In AHV, a virtual machine's OS sends SCSI commands directly to the physical storage devices.
False (B)
If the Acropolis Leader is running on a remote host, the Nutanix IPAM solution tunnels the DHCP request over the network via VXLAN.
True (A)
The algorithm used to determine the total number of reserved segments and per host reservation is called MTKM.
False (B)
Flashcards
AHV Storage I/O Path
AHV does not use a traditional storage stack, passing disks to VMs as raw SCSI block devices for a lightweight I/O path.
iSCSI Redirector
An iSCSI redirector on each AHV host checks Stargate health using NOP commands.
QEMU and iSCSI Redirector
QEMU is configured with the iSCSI redirector as the target, which redirects login requests to a healthy Stargate.
Preferred Controller Type
Virtio-scsi is the preferred (and default) controller type for virtual disks; IDE devices are possible but not recommended.
Frodo I/O Path
An optimized AHV I/O path (AHV Turbo Mode) providing higher throughput, lower latency, and less CPU overhead; enabled by default on VMs powered on after AOS 5.5.X.
Frodo Key Features
Replaces the QEMU main loop with vhost-user-scsi, exposes one virtual queue per vCPU, and uses a lightweight version of libiscsi.
Acropolis IPAM
Establishes a DHCP scope and assigns addresses to VMs, using VXLAN and OpenFlow rules to intercept and respond to DHCP requests.
AHV VM High Availability
Ensures VM availability during a host or block outage by restarting VMs from the failed host on other healthy nodes in the cluster.
Acropolis Leader Role in HA
Tracks host health by monitoring its libvirt connections to all cluster hosts and restarts failed VMs on healthy hosts.
HA Modes in AHV
Default mode restarts VMs on available hosts depending on resource availability; Guarantee mode reserves resources so all failed VMs can restart.
Resource Reservations for HA
Guarantee mode reserves one host's worth of resources when all containers are RF2 (FT1), or two hosts' worth if any container is RF3 (FT2).
Segment Based HA
Distributes the HA resource reservation as segments across all hosts, so each host shares a portion of the failover capacity (AOS 5.0+).
Study Notes
Storage I/O Path
- AHV does not use a traditional storage stack and passes disks to VMs as raw SCSI block devices
- AOS handles backend configuration of kvm, virsh, qemu, libvirt, and iSCSI, abstracting these from the user
- Each AHV host runs an iSCSI redirector which monitors the health of Stargates via NOP commands
- The iscsi_redirector log shows the health of each Stargate
- The local Stargate is shown via its internal address (192.168.5.254)
iSCSI Configuration
- The iSCSI redirector listens on 127.0.0.1:3261
- QEMU is configured with the iSCSI redirector as the iSCSI target portal
- Upon a login request, the redirector performs an iSCSI login redirect to a healthy Stargate (preferably the local one)
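For illustration, a minimal Python sketch of the libiscsi-style URL QEMU is pointed at when the redirector acts as the target portal; the IQN below is a made-up placeholder, not a real Nutanix target name:

```python
# Illustrative sketch only: QEMU's libiscsi block driver addresses a LUN as
# iscsi://<portal>/<target-iqn>/<lun>. Pointing the portal at the local
# redirector (127.0.0.1:3261) lets the redirector steer the login to a
# healthy Stargate. The IQN below is a hypothetical placeholder.
REDIRECTOR_PORTAL = "127.0.0.1:3261"

def vdisk_url(target_iqn: str, lun: int = 0) -> str:
    """Build the iscsi:// URL QEMU would be configured with."""
    return f"iscsi://{REDIRECTOR_PORTAL}/{target_iqn}/{lun}"

print(vdisk_url("iqn.2010-06.com.nutanix:vmdisk-example"))
# -> iscsi://127.0.0.1:3261/iqn.2010-06.com.nutanix:vmdisk-example/0
```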
iSCSI Multi-Pathing
- Virtio-scsi is the preferred controller type (default for SCSI devices)
- IDE devices are possible but not recommended
- Virtio drivers, Nutanix mobility drivers, or Nutanix guest tools must be installed for Windows to use virtio
- Modern Linux distros ship with virtio pre-installed
- If the active Stargate goes down, the iSCSI redirector marks it as unhealthy and redirects the login to another healthy Stargate
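A hedged sketch of the redirector's failover decision, not Nutanix's actual code: probe each Stargate and prefer the local one. The real redirector tracks health with iSCSI NOP commands; a plain TCP connect and the port 3261 are stand-in assumptions here:

```python
# Hedged sketch of the redirector's choice, not Nutanix's implementation.
import socket

LOCAL_STARGATE = "192.168.5.254"  # internal local Stargate address from the notes

def is_healthy(ip: str, port: int = 3261, timeout: float = 1.0) -> bool:
    """Stand-in health probe (the real check is an iSCSI NOP-OUT)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_stargate(all_stargates: list[str]) -> str:
    """Prefer the local Stargate; otherwise redirect to any healthy remote."""
    if is_healthy(LOCAL_STARGATE):
        return LOCAL_STARGATE
    for ip in all_stargates:
        if ip != LOCAL_STARGATE and is_healthy(ip):
            return ip
    raise RuntimeError("no healthy Stargate available")
```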
Local CVM Handling
- Once the local CVM’s Stargate comes back up, the remote Stargate quiesces and kills connections to remote iSCSI sessions
- QEMU then attempts an iSCSI login again and is redirected to the local Stargate
Traditional I/O Path
- VMs perform SCSI commands to virtual devices
- Virtio-scsi places requests in the guest’s memory
- The QEMU main loop handles these requests
- Libiscsi inspects each request and forwards it
- The network layer forwards requests to the local CVM (or externally if the local CVM is unavailable)
- Stargate handles the requests
Network Sessions
- qemu-kvm establishes sessions with a healthy Stargate using the local bridge and IPs
- For external communication, the external host and Stargate IPs are used
- There is one session per disk device
Inefficiencies
- The main QEMU loop is single-threaded
- libiscsi inspects every SCSI command
Frodo I/O Path (AHV Turbo Mode)
- Frodo is an optimized I/O path for AHV that allows for higher throughput, lower latency, and less CPU overhead
- It is enabled by default on VMs powered on after AOS 5.5.X
Frodo I/O Process
- VMs perform SCSI commands to virtual devices
- Virtio-scsi places requests in the guest’s memory
- Frodo handles the requests
- A custom libiscsi appends the iSCSI header and forwards the requests
- The network layer forwards requests to the local CVM (or externally if the local one is unavailable)
- Stargate handles the requests
Key Differences
- Replaces the QEMU main loop (qemu-kvm) with Frodo (vhost-user-scsi)
- Exposes multiple virtual queues (VQs) to the guest (one per vCPU) and leverages multiple threads for multi-vCPU VMs
- The standard libiscsi is replaced by a lightweight version
Performance
- VMs will have multiple queues for disk devices
- Achieves up to 3x performance increases compared to QEMU
- Results in a CPU overhead reduction of 25% for I/O
- Frodo processes are visible for each running VM (qemu-kvm process)
vCPU Considerations
- To take advantage of multiple threads / connections, a VM must have >= 2 vCPUs when powered on
- 1 vCPU UVM: 1 Frodo thread / session per disk device
- >= 2 vCPU UVM: 2 Frodo threads / sessions per disk device
- Frodo establishes sessions with a healthy Stargate using the local bridge and IPs
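The thread/session rule above can be summarized in a small sketch (an assumed simplification of the described behavior):

```python
# Minimal sketch of the rule: 1 vCPU -> 1 Frodo thread/session per disk;
# >= 2 vCPUs -> 2 Frodo threads/sessions per disk.
def frodo_sessions_per_disk(vcpus: int) -> int:
    return 1 if vcpus < 2 else 2

def total_sessions(vcpus: int, disks: int) -> int:
    return frodo_sessions_per_disk(vcpus) * disks

assert total_sessions(vcpus=1, disks=3) == 3   # 1 thread/session per disk
assert total_sessions(vcpus=4, disks=3) == 6   # 2 threads/sessions per disk
```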
IP Address Management
- Acropolis IP address management (IPAM) solution allows for the establishment of a DHCP scope and the assigning of addresses to VMs
- It uses VXLAN and OpenFlow rules to intercept DHCP requests and respond with a DHCP response
- Traditional DHCP / IPAM solutions can be used in an ‘unmanaged’ network scenario
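A rough sketch of the managed-network idea, assuming a simple MAC-to-IP lease table; the real implementation intercepts DHCP via VXLAN and OpenFlow rules rather than anything like this:

```python
# Rough sketch of a managed network's DHCP scope, keyed by VM MAC address.
import ipaddress

class ManagedNetworkPool:
    def __init__(self, cidr: str, reserved=()):
        self.free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()
                     if str(ip) not in set(reserved)]
        self.leases = {}  # MAC -> IP

    def handle_dhcp_request(self, mac: str) -> str:
        """Answer a (conceptually intercepted) DHCP request with a lease."""
        if mac not in self.leases:
            self.leases[mac] = self.free.pop(0)
        return self.leases[mac]

pool = ManagedNetworkPool("10.0.0.0/29", reserved=["10.0.0.1"])  # gateway kept aside
print(pool.handle_dhcp_request("50:6b:8d:aa:bb:cc"))  # -> 10.0.0.2
```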
VM High Availability (HA)
- AHV VM HA ensures VM availability in the event of a host or block outage
- VMs previously running on a failed host will be restarted on other healthy nodes in the cluster
- The Acropolis Leader is responsible for restarting the VMs on the healthy host(s)
- The Acropolis Leader tracks host health by monitoring its connections to libvirt on all cluster hosts
HA Process
- Once the libvirt connection goes down, a countdown to the HA restart begins
- If the libvirt connection fails to be re-established within the timeout, Acropolis restarts VMs that were running on the disconnected host, generally within 120 seconds
- If the Acropolis Leader fails or becomes partitioned, a new Acropolis Leader is elected
- If the cluster becomes partitioned, the side with quorum remains up and VMs are restarted on those hosts
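A hedged sketch of the trigger logic described above, not Acropolis internals; the 120-second timeout is an assumption taken from the "generally within 120 seconds" figure:

```python
# Sketch: when the leader's libvirt connection to a host drops, start a
# countdown; restart the host's VMs elsewhere only if it never reconnects.
import time

RECONNECT_TIMEOUT_S = 120  # assumed; the real timeout is internal to Acropolis

def monitor_host(is_connected, restart_vms_elsewhere, poll_s: float = 5.0):
    """is_connected() probes the libvirt connection; restart_vms_elsewhere()
    fails the host's VMs over to healthy nodes."""
    while True:
        if not is_connected():
            # Connection dropped: start the countdown, don't restart immediately.
            deadline = time.monotonic() + RECONNECT_TIMEOUT_S
            while time.monotonic() < deadline:
                if is_connected():       # re-established in time: no HA event
                    break
                time.sleep(poll_s)
            else:
                restart_vms_elsewhere()  # timeout expired: HA restart
                return
        time.sleep(poll_s)
```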
HA Modes
- Default: Requires no configuration, VMs restart on available hosts depending on resource availability
- Guarantee: Reserves space on AHV hosts to guarantee that all failed VMs can restart
Resource Reservations
- Guarantee mode reserves host resources for VMs
Resource Amounts:
- If all containers are RF2 (FT1): One “host” worth of resources
- If any containers are RF3 (FT2): Two “hosts” worth of resources
- When hosts have uneven memory capacities, the system uses the largest host's memory capacity when determining how much to reserve per host
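A minimal sketch of the sizing rule above, assuming memory is the reserved resource:

```python
# Assumed simplification: RF2 (FT1) -> one host's worth reserved,
# RF3 (FT2) -> two hosts' worth, sized by the largest host's memory.
def reserved_memory_gb(host_memory_gb: list[float], max_rf: int) -> float:
    hosts_worth = 1 if max_rf == 2 else 2
    return hosts_worth * max(host_memory_gb)

# Uneven cluster: the 512 GB host sets the per-host reservation size.
print(reserved_memory_gb([256, 256, 512, 384], max_rf=2))  # 512.0
print(reserved_memory_gb([256, 256, 512, 384], max_rf=3))  # 1024.0
```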
Segment Based Reservations
- Resource reservations are now segment based and are implemented when Guarantee HA mode is selected (5.0+)
- Reserved segments distribute the resource reservation across all hosts
- Each host shares a portion of the reservation for HA, ensuring the cluster has failover capacity
- VMs are restarted throughout the cluster on the remaining healthy hosts
Reservations Calculation
- The system automatically calculates the total number of reserved segments and per-host reservation
- The algorithm used for memory reservations is called MTHM
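For intuition only, a naive round-robin distribution; this is not the MTHM algorithm, just an illustration of segments being shared across hosts:

```python
# Naive sketch of "each host shares a portion of the reservation";
# the real per-host calculation is done automatically by the system.
def distribute_segments(hosts: list[str], total_segments: int) -> dict[str, int]:
    per_host = {h: 0 for h in hosts}
    for i in range(total_segments):
        per_host[hosts[i % len(hosts)]] += 1
    return per_host

print(distribute_segments(["hostA", "hostB", "hostC", "hostD"], 10))
# {'hostA': 3, 'hostB': 3, 'hostC': 2, 'hostD': 2}
```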