Summary

This document details the configuration of Cluster Protect, a feature of Nutanix Cloud Clusters running on AWS. It outlines the entities protected by the feature, the prerequisites, and the processes for using Cluster Protect. The document includes illustrations and diagrams to aid in understanding the concepts.

Full Transcript

# Cluster Protect Configuration

Nutanix Cloud Clusters (NC2) on AWS run on bare-metal instances that capitalize on local NVMe storage. Data loss can occur in failure scenarios that include, but are not limited to, Availability Zone (AZ) failures or users terminating bare-metal nodes from the AWS console.

## Cluster Protect Features

With the Cluster Protect feature, you can protect your NC2 cluster data, including Prism Central configuration, UVM data, and volume group data, with snapshots stored in AWS S3 buckets. When using Cluster Protect, you can recreate your cluster with the same configurations and recover your UVM data and Prism Central configuration from S3 buckets. The Cluster Protect feature thus helps ensure business continuity even in the event of a complete AWS AZ-level failure.

## Entities Protected by Cluster Protect

With Cluster Protect, you can protect the following entities:

- VM configuration and data snapshots, such as VM Disks and Volume Groups.
- Prism Central configuration data.
- Flow Network Security policies.
- DR configurations, such as Protection Policies, Recovery Plans, and VM categories.
- Data snapshots from EBS volumes that are attached to the cluster.

When NC2 on AWS runs AOS 6.8 or later, you can also protect your virtual Trusted Platform Module (vTPM) enabled AHV UVMs in all the replication schedules and perform their disaster recovery (DR) when planned or unplanned failure events occur. You can store snapshots of vTPM-enabled VMs in the S3 bucket.

To use the Cluster Protect feature, you must set up two Amazon S3 buckets, one to back up the UVM and Volume Groups data, and another to back up the Prism Central configuration. You must also use the AOS Ultimate or NCI Ultimate licensing tier.

You can use an S3 bucket to protect your Prism Central configuration through the Prism Central UI using the Point-in-Time Backup feature.
It schedules backups of your Prism Central deployment (and various service configurations within it) at regular intervals to an S3 bucket on AWS, allowing recovery to specific points in time.

## Cluster Protect Illustration

The illustration shows a scenario where multiple VMs run on various NC2 clusters within the same AZ. At least one Prism Central instance runs on one of the clusters and is configured to manage multiple NC2 clusters in the same AZ.

* **Availability Zone 1**: Represents the AWS Availability Zone.
* **VM**: Represents the virtual machine.
* **Cluster A, B, C, and D**: Represent NC2 clusters in the Availability Zone.
* **Prism Central (PC)**: Represents a Prism Central instance.
* **Amazon S3**: Represents an AWS S3 bucket.
* **Bucket for PC State**: Represents a bucket for storing the Prism Central configuration data.
* **Bucket for Cluster State & Guest VMs**: Represents a bucket for storing UVM and Volume Group data.

## Cluster Protect in Case of Failure Events

In the event of a failure that impacts multiple clusters, first recover a Prism Element cluster that will be used to recover Prism Central (if the failure event impacted Prism Central). Then recover the remaining failed Prism Element clusters and their associated VMs and Volume Groups from the backups in the S3 buckets.

If the failure is not AZ-wide, and the Prism Central that manages the impacted cluster is hosted on another cluster and is itself not impacted, you can restore the impacted cluster from that existing Prism Central.

## Additional Notes

* With Cluster Protect, all the VMs in a cluster are auto-protected using a single category value and are therefore recovered by a single Recovery Plan. A single Recovery Plan can recover up to 1000 entities.
* Nutanix does not support running multiple recovery plans in parallel, irrespective of the number of entities in each recovery plan.
* Currently, Cluster Protect can protect up to **five** NC2 clusters registered with one Prism Central that is in the same AWS AZ.
* Cluster Protect can protect the following services and recover the associated metadata:
  * Leap
  * Flow Network Security
  * Prism Pro (AIOps)
  * VM management
  * Cluster management
  * Identity and Access Management (IAMv1)
  * Categories
  * Networking
* The following services continue to run, but they are not protected, so data associated with them is not recovered:
  * Nutanix Files
  * NCM Self-Service
  * LCM
  * Nutanix Kubernetes Engine
  * Objects
  * Catalog
  * Images
  * VM templates
  * Reporting Template

## Prerequisites for Cluster Protect

You must meet the following requirements to use the Cluster Protect feature:

- The AOS version **must be 6.8 or higher**, and the Prism Central version **must be pc.2024.1 or higher**.
- The license tier **must be AOS Ultimate or NCI Ultimate.**
- Subnets used for Prism Central and Multicloud Snapshot Technology (MST) **must be different from the UVM subnet.**
- Clusters to be protected by Cluster Protect **must be registered with the same Prism Central instance.** These clusters **must be in the same AWS AZ as Prism Central.**
- Prism Central, which manages protected clusters, **can also be protected by Prism Central Disaster Recovery.**
- Two new AWS S3 buckets **must be manually created for:**
  - **UVM Snapshots:** S3 bucket for storing UVM snapshots.
  - **Prism Central Backup and Restore:** S3 bucket for storing Prism Central backup data.
- **Nutanix Guest Tools (NGT) must be installed on all UVMs.**
- If you have already added your AWS account in the NC2 console, you **must re-run the CloudFormation script** so that the IAM role with the required permissions to access only the S3 buckets with the nutanix-clusters prefix comes into effect.
- When creating a DHCP pool in Prism Element, **ensure that at least 3 IP addresses are kept outside the DHCP pool for MST.**
  - If you choose to use IPs from the DHCP pool, you can run the following aCLI command to reserve the IPs in a network from the DHCP pool:

    ```
    acli net.add_to_ip_blacklist <network_name> ip_list=ip_address1,ip_address2
    ```
- While deploying Prism Central, **do not change the Microservices Platform (MSP) settings** because these are required to enable MST. You must choose Private network (defaults) in the MSP configuration when prompted.
- **You must not use managed networks** for CMSP clusters with Cluster Protect enabled. The CMSP cluster is deployed in the VXLAN/kPrivateNetwork mode only.

## Limitations of Cluster Protect

Understand the following limitations while using the Cluster Protect feature:

- The Cluster Protect feature and Protection Policies **cannot be used at the same time** in the same cluster to protect the data. If a user-created protection or DR policy already protects a VM or Volume Group, it cannot also be protected with the Cluster Protect feature. If you need to use DR configurations for a cluster, you must use those protection policies instead of Cluster Protect to protect your data. Creating a new DR policy fails if the cluster is already protected using the Cluster Protect feature.
- You **cannot hibernate or terminate** the clusters that are protected by the Cluster Protect feature. You must disable Cluster Protect before triggering hibernation or termination.
- All clusters being protected **must be in the same Availability Zone.** The protected cluster and the recovered cluster must be in the same Availability Zone.
- Prism Central **must be deployed within the same Availability Zone** as the clusters it is protecting.
- The Cluster Protect feature is **available only for new cluster deployments.**
- A cluster **cannot be recovered** if your NC2 on AWS cluster uses Flow Virtual Networking.
- A recovered VDI cluster might **consume more storage space** than the protected VDI cluster originally consumed. This issue might arise because the logic that efficiently creates VDI clones is inactive during cluster recovery. It might also occur if there are multiple clones on the source that are created from the same image. As a workaround, you can add nodes to your cluster if it runs out of space during the recovery process.

## Protecting NC2 Clusters

Follow these steps to protect your NC2 clusters:

1. **Get ready for cluster protection:**
   - Create clusters in **a new VPC or an existing VPC** using the NC2 console. **Ensure that you select the option to protect the cluster.**
   - Create **two new S3 buckets** in the AWS console.
   - Deploy Prism Central on one of these clusters and then **register the remaining NC2 clusters with Prism Central.** You can also register the clusters with an existing Prism Central.
   - You **must upgrade Prism Central to pc.2024.1 or later** before using Prism Central Backup and Restore to protect the Prism Central configuration.
2. **Protect the Cluster:**
   - Protect the Prism Central configuration using the Point-in-Time Backup feature on the Prism Central UI.
   - Enable the Multicloud Snapshot Technology (MST) by running CLI commands to protect UVM data.
   - Protect NC2 clusters by running CLI commands. You can protect your NC2 clusters even without protecting the Prism Central instance that is managing these NC2 clusters; however, Nutanix recommends protecting your Prism Central instance as well.
3. **Validate:**
   - After you complete all of these steps, wait for an hour, and then check that at least one backup of Prism Central has been completed. One Prism Central backup must be completed after backing up the UVM data so that protection policies, recovery points, and so on created during UVM backups are included in the Prism Central backup.
   - Validate that the Prism Central replication to the S3 bucket has completed successfully. To do so, check the Prism Central protection status from the Prism Central web console.

## Creating S3 Buckets

You must set up two new Amazon S3 buckets with the default settings, one to back up the UVMs and volume group data, and another to back up the Prism Central configuration. These S3 buckets must be empty and used exclusively for UVM, volume group, and Prism Central backups. Ensure that public access to these S3 buckets is blocked by default.

- **S3 bucket for UVM snapshots:** The S3 bucket that will be used for storing UVM data **must have the bucket name prefixed with nutanix-clusters.** If the S3 bucket name does not have the nutanix-clusters prefix, the commands to protect the cluster fail.
- **S3 bucket for Prism Central configurations:** The new S3 bucket that you create for backing up the Prism Central configuration data **must have the mandatory configurations** listed in Configuring Amazon S3 Object Lifecycle and Policies in the Prism Central Infrastructure Guide.

## Protecting Prism Central Configuration

You can protect your Prism Central configuration to an S3 bucket through the Prism Central UI using the Point-in-Time Backup feature. It schedules backups of your Prism Central deployment and various service configurations within it to an S3 bucket on AWS, allowing recovery from specific points in time.

- Configure the Prism Central backup to protect your Prism Central configuration to an S3 bucket.
- Ensure that you have created a new AWS S3 bucket for Prism Central backup and configured it as listed in Configuring Amazon S3 Object Lifecycle and Policies.

## Disabling Cluster Protect

Follow these steps to disable Cluster Protect for Prism Central and NC2 clusters:

1. Check the Prism Central protection status from the Prism Central web console.
2. Disable the Prism Central backup and restore by following the steps in Disabling Prism Central Backup and Restore.
3. Run the following command to disable cluster protection for an NC2 cluster:

   ```
   nutanix@pcvm$ clustermgmt-cli unprotect-cluster -u cluster_uuid
   ```

   Replace cluster_uuid with the UUID of the NC2 cluster for which you want to disable Cluster Protect. You can find the UUID listed as Cluster ID under General on the cluster Summary page in the NC2 console.

## Recovering NC2 Clusters

If you have protected your NC2 clusters and the associated Prism Central by following the steps listed in Protecting NC2 Clusters, and a cluster fails for reasons such as an AWS Availability Zone (AZ) failure or all nodes being shut down from the EC2 management console, you can recover your NC2 clusters and Prism Central.

- **Before you initiate the recovery of an NC2 cluster, ensure that you have protected Prism Central, deployed Multicloud Snapshot Technology, and protected UVMs and volume groups.** Also, after completing these cluster protection steps, wait for one hour and then check that at least one backup of Prism Central is completed. One Prism Central backup must be completed after backing up the UVM data so that protection policies, recovery points, and so on created during UVM backups are included in the Prism Central backup.
- The protected cluster and the recovered cluster **must be in the same Availability Zone.**

## Setting Clusters to Failed State

When a protected cluster fails and you need to recover it, you must first set the cluster to the Failed state to initiate the recovery process.

- **The NC2 console automatically detects if an EC2 instance is deleted and then flags the cluster status as Failed.** However, the cluster might fail for a reason that NC2 does not detect.
  Therefore, perform these steps to set the cluster to the Failed state whenever you need to recover a failed cluster.
- **Sign in to the NC2 console:**
  - Sign in to https://my.nutanix.com using your My Nutanix credentials.
  - From the Workspace dropdown list, select the workspace that you want to use for NC2.
  - Go to Cloud Services > Nutanix Cloud Clusters (NC2).
  - Click Launch.
- On the Clusters page, click the name of the cluster you want to set to the Failed state.
- Ensure that the cluster Summary page shows the Cluster Protect field under General settings as Enabled.
- On the Settings page, click the Advanced tab.
- Under Cluster Failure, click Set Cluster to Failed State.
- On the confirmation page, click Yes, Set Cluster State to Failed.
- Ensure that the cluster status is changed to Failed for the cluster on the Clusters page.
- Go to the cluster Summary page to validate that the Cluster Recovery workflow is displayed.

## Recreating a Cluster

After a protected cluster fails and you set the cluster state to Failed, you must redeploy the cluster.

- **Sign in to the NC2 console:**
  - Sign in to https://my.nutanix.com using your My Nutanix credentials.
  - From the Workspace dropdown list, select the workspace that you want to use for NC2.
  - Go to Cloud Services > Nutanix Cloud Clusters (NC2).
  - Click Launch.
- On the Clusters page, click the name of the failed cluster that you want to redeploy.
- On the cluster Summary page, under Cluster Recovery, click Start Cluster Recovery.
- Click Recreate Cluster.
- Review or specify the following details on the Cluster Configuration page, and then click Next.
  - **Cluster Name:** Enter a name for the cluster. The recovery cluster name must be different from the failed cluster name; the NC2 console enforces this during recovery cluster creation.
  - **Cloud Account, Region, and Availability Zone:** These configurations from the failed cluster that you are recreating are displayed.
    Your recovery cluster will use the same configuration.
  - **Under Network Configuration:**
    - When a manually created VPC and subnets were used to deploy the failed cluster, the previously used resources are displayed. You must recreate the same VPC and subnets that you had previously created in your AWS console.
    - When a VPC and subnets created by the NC2 console were used to deploy the failed cluster, the NC2 console automatically recreates the same VPC and subnets during the cluster recovery process.
- Review the cluster summary on the Summary page, and then click Recreate Cluster.
- To continue the cluster recovery process, click Go to new cluster to navigate to the redeployed cluster.
  - The failed cluster gets terminated.
  - The cluster Summary page of the newly created cluster shows the cluster status as Creating Cluster.
  - After the cluster is created, the status changes to Recovery in Progress.

## Recovering Prism Central and MST

If the failure was not AZ-wide and one of the clusters protecting Prism Central survived, you must restore Prism Central from the cluster that it was backed up to rather than from S3.

- **If all clusters protecting Prism Central are unavailable**, then recover Prism Central from S3.

**Steps to recover Prism Central:**

1. Ensure that you have redeployed the cluster that was set to the Failed state.
2. Restore Prism Central from the S3 bucket where the Prism Central configuration was backed up.
3. Track the Prism Central recovery status in the Tasks section on the recreated Prism Element console.

**Redeploying Multicloud Snapshot Technology**

- After recreating NC2 clusters and redeploying Prism Central, you must redeploy Multicloud Snapshot Technology before recovering UVM and Volume Groups data from the S3 bucket where they were backed up.
- The Prism Central configuration must be recovered from the Prism Central S3 bucket before you recover the UVM data on the recovery clusters.

**Steps to redeploy Multicloud Snapshot Technology:**

1. Sign in to the Prism Central VM using the credentials provided while installing Prism Central.
2. Run the following CLI command:

   ```
   nutanix@pcvm$ clustermgmt-cli deploy-cloudSnapEngine --recover -b S3_bucket -r AWS_region -i IP1,IP2,IP3 -s PC-Subnet
   ```

**Recovering UVM and Volume Groups Data**

Before recovering the UVM and Volume Groups data from the S3 bucket where they were backed up, ensure the following requirements are met:

- NC2 clusters are recreated.
- Prism Central is redeployed.
- Multicloud Snapshot Technology is redeployed.
- Disaster Recovery is enabled.

**Steps to recover UVM and Volume Groups Data:**

1. Sign in to the Prism Central VM using the credentials provided while installing Prism Central.
2. Run the following command to get a list of NC2 clusters that were registered with the recovery Prism Central prior to failure:

   ```
   nutanix@pcvm$ nuclei cluster.list
   ```
3. Recreate subnets on the recovery Prism Elements.
   - Run the following command to list all the subnets associated with the protected Prism Elements:

     ```
     nutanix@pcvm$ clustermgmt-cli list-recovery-info -u UUID_OldPE
     ```
   - Recreate these subnets on the recovery Prism Elements in the same way they were originally created.
4. Run the following command to create a Recovery Plan to restore UVM data from the S3 buckets:

   ```
   nutanix@pcvm$ clustermgmt-cli create-recovery-plan -o UUID_OldPE -n UUID_NewPE
   ```
5. Execute the recovery plans:
   - Sign in to Prism Central using the credentials provided while installing Prism Central.
   - Go to Data Protection > Recovery Plans. You can identify the appropriate recovery plan to use by looking at the recovery plan name.
   - Start the recovery plan execution by triggering a failover.
6. Go to Compute & Storage > VMs to see the list of recovered VMs.
7. Run the following command on all NC2 clusters that are recovered after the cluster failure to remove the category values and protection policies associated with the old clusters that no longer exist:

   ```
   nutanix@pcvm$ clustermgmt-cli finalize-recovery -u UUID_OldPE
   ```

- Manually turn on all UVMs.

## CLI commands

The following table lists CLI commands you can use for the Cluster Protect feature end-to-end workflow. These commands and their examples are already listed in the respective procedures; this table provides additional reference information.

>*Note: In the NC2 on AWS clusters running AOS 6.8 or later and Prism Central pc.2024.1, the backup and recovery of Prism Central configuration using CLI commands has been replaced with the Point-in-Time Backup option on the Prism Central UI.*

| Purpose | Command | Command available on | Flags | Description |
|:---|:---|:---|:---|:---|
| Create a recovery plan | nutanix@pcvm$ clustermgmt-cli create-recovery-plan | Prism Central VM | -h, --help | Help for the create-recovery-plan command. |
| | nutanix@pcvm$ clustermgmt-cli create-recovery-plan -n 00000000-0000-0000-0000-000000000000 -o 00000000-0000-0000-0000-000000000001 --output string | | -n, --new_cluster_uuid string | UUID of the new recovery NC2 cluster. |
| | | | -o, --old_cluster_uuid string | UUID of the old failed NC2 cluster. |
| | | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Deploy MST, which can be used to protect NC2 clusters | nutanix@pcvm$ clustermgmt-cli deploy-cloudSnapEngine | Prism Central VM | -b, --bucket string | Name of the S3 bucket that will be used to store the backup of NC2 clusters. |
| | | | -h, --help | Help for the deploy-cloudSnapEngine command. |
| | nutanix@pcvm$ clustermgmt-cli deploy-cloudSnapEngine -b nutanix-clusters-XXXXX-XXXXX-XXXXX -r us-west-1 -i 10.0.xxx.11,10.0.xxx.12,10.0.xxx.13 -s PC-Subnet | | --recover | Deploys MST using old configuration data, if available on Prism Central. If old configuration data is unavailable, the provided or default config is used for the deployment. |
| | | | -r, --region string | Name of the AWS region where the provided S3 bucket exists. |
| | | | -i, --static_ips strings | Comma-separated list of 3 static IPs that are part of the same subnet specified by the subnet_name flag. |
| | | | -s, --subnet_name string | Name of the subnet that can be used for MST VMs. |
| | | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Delete or clean up the failed MST deployments | nutanix@pcvm$ clustermgmt-cli delete-cloudSnapEngine | Prism Central VM | -f, --force | Force delete the MST. |
| | | | -h, --help | Help for the delete-cloudSnapEngine command. |
| | | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Mark completion of recovery of a cluster | nutanix@pcvm$ clustermgmt-cli finalize-recovery | Prism Central VM | -u, --cluster_uuid string | UUID of the old NC2 cluster. |
| | nutanix@pcvm$ clustermgmt-cli finalize-recovery -u 00000000-0000-0000-0000-000000000001 --output string | | -h, --help | Help for the finalize-recovery command. |
| | | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Get a list of recovery information, such as subnets that were available on the original (failed) NC2 cluster. | nutanix@pcvm$ clustermgmt-cli list-recovery-info | Prism Central VM | -u, --cluster_uuid string | UUID of the NC2 cluster. |
| | | | -h, --help | Help for the list-recovery-info command. |
| | nutanix@pcvm$ clustermgmt-cli list-recovery-info -u 00000000-0000-0000-0000-000000000001 --verbose --output string | | --verbose | With the verbose flag, a detailed JSON output is returned. If the verbose flag is not specified, only the important fields, such as subnet name, IP Pool ranges, and CIDR, are returned. |
| | nutanix@pcvm$ clustermgmt-cli list-recovery-info -u 00000000-0000-0000-0000-000000000001 --output string | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Protect clusters against AZ failures by backing up the clusters in AWS S3. | nutanix@pcvm$ clustermgmt-cli protect-cluster | Prism Central VM | -u, --cluster_uuid string | UUID of the NC2 cluster. |
| | | | -h, --help | Help for the protect-cluster command. |
| | | | -l, --local_snapshot_count int | Local snapshot retention count. The default count is 2. |
| | | | -r, --rpo int | Protection RPO in minutes. The default RPO is 60 minutes. |
| | | | --output string | Supported output formats: ['default', 'json'] (default "default") |
| Unprotect a cluster | nutanix@pcvm$ clustermgmt-cli unprotect-cluster | Prism Central VM | -u, --cluster_uuid string | UUID of the NC2 cluster. |
| | | | -h, --help | Help for the unprotect-cluster command. |
| | nutanix@pcvm$ clustermgmt-cli unprotect-cluster -u 00000000-0000-0000-0000-000000000001 --output string | | --output string | Supported output formats: ['default', 'json'] (default "default") |
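
The requirements stated in this document for the MST deployment (a bucket name that starts with nutanix-clusters and exactly 3 static IPs) can be checked before a command is typed on the Prism Central VM. The following is a minimal dry-run sketch under stated assumptions: it only prints command lines and never runs clustermgmt-cli. The helper names (`has_required_prefix`, `count_csv`, `build_mst_deploy_cmd`, `print_recovery_cmds`) and the sample bucket, IP, and UUID values are hypothetical and are not part of any Nutanix tooling. Only the commands and flags that appear in the procedures and table above are used.

```shell
#!/bin/sh
# Dry-run sketch: validates inputs and PRINTS commands; nothing is executed.
# All function names and sample values below are hypothetical.

# Succeeds if the bucket name starts with the required nutanix-clusters prefix.
has_required_prefix() {
  case "$1" in
    nutanix-clusters*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the number of entries in a comma-separated list.
count_csv() {
  (
    IFS=','   # split on commas only
    set -f    # disable filename globbing while splitting
    set -- $1
    echo "$#"
  )
}

# Prints the MST deploy command after basic checks.
# Usage: build_mst_deploy_cmd BUCKET REGION IP_LIST SUBNET [recover]
build_mst_deploy_cmd() {
  bucket="$1"; region="$2"; ips="$3"; subnet="$4"; mode="${5:-}"
  if ! has_required_prefix "$bucket"; then
    echo "error: bucket name must start with nutanix-clusters" >&2
    return 1
  fi
  if [ "$(count_csv "$ips")" -ne 3 ]; then
    echo "error: exactly 3 static IPs are required" >&2
    return 1
  fi
  cmd="clustermgmt-cli deploy-cloudSnapEngine"
  if [ "$mode" = "recover" ]; then
    cmd="$cmd --recover"   # redeploy after a failure, as in the recovery steps
  fi
  echo "$cmd -b $bucket -r $region -i $ips -s $subnet"
}

# Prints the UVM recovery commands in the order given in the procedure.
# The recovery plan itself is executed from the Prism Central UI between
# create-recovery-plan and finalize-recovery.
# Usage: print_recovery_cmds OLD_CLUSTER_UUID NEW_CLUSTER_UUID
print_recovery_cmds() {
  old_uuid="$1"; new_uuid="$2"
  echo "nuclei cluster.list"
  echo "clustermgmt-cli list-recovery-info -u $old_uuid"
  echo "clustermgmt-cli create-recovery-plan -o $old_uuid -n $new_uuid"
  echo "clustermgmt-cli finalize-recovery -u $old_uuid"
}

# Example invocation with placeholder values.
build_mst_deploy_cmd nutanix-clusters-example us-west-1 \
  10.0.0.11,10.0.0.12,10.0.0.13 PC-Subnet recover
print_recovery_cmds UUID_OldPE UUID_NewPE
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be reviewed and then pasted into the Prism Central VM session. The checks cover only the two rules quoted above and do not verify that the IPs belong to the named subnet.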
