VMware Private AI Foundation with NVIDIA
Summary
These notes summarize the VMware Private AI Foundation with NVIDIA guide. They cover generative AI, large language models, and NVIDIA GPU configuration, including deep learning and machine learning concepts and the available GPU configuration modes. They are aimed at cloud and DevOps engineers.
Full Transcript
**VMware Private AI Foundation with NVIDIA**

**Generative AI and Large Language Models**

1. Artificial Intelligence - mimicking the intelligence or behavioral pattern of humans or any other living entity.
2. Machine Learning - a computer can learn from data without using a complex set of explicit rules; mainly based on training a model with datasets.
3. Deep Learning - a technique for performing machine learning inspired by our brain's own network of neurons.
4. Generative AI - a form of AI in which LLMs offer human-like creativity, reasoning, and language understanding.
5. LLMs have revolutionized natural language processing tasks, enabling machines to understand, generate, and interact with human language in a human-like manner.
6. LLM examples - (Chat)GPT-4, MPT, Vicuna, and Falcon have gained popularity because of their ability to process vast amounts of text data and produce coherent and contextually relevant responses.
7. Components of LLMs:
   - Deep-learning (transformer) neural nets
   - Hardware accelerators
   - Machine learning software stack
   - Pre-training tasks
   - Fine-tuning tasks
   - Inference (prompt completion) tasks

**Architecture and Configuration of NVIDIA GPUs in Private AI Foundation**

- GPUs are preferred over CPUs to accelerate computational workloads in modern high-performance computing (HPC) and machine learning or deep learning landscapes.
- A GPU has significantly more cores than a CPU, and these cores can be used for processing tasks in parallel.
- A GPU is tolerant of memory latency because it is designed for higher throughput.
- A GPU works with fewer, smaller cache layers because it has more components dedicated to computation.
- Comparison:
  - CPU-only virtualization stack: apps & VMs, hypervisor, server.
  - NVIDIA with GPU: the same stack with NVIDIA GPUs and driver software added (see the component stack later in these notes).

**Configuration Modes**

- Dynamic DirectPath I/O (passthrough) mode:
  - The entire GPU is allocated to a specific VM-based workload.
- NVIDIA vGPU (shared GPU):
  - Multiple running VM workloads (or Tanzu worker node VMs) on a host have direct access to parts of the physical GPU at the same time.
  - Time-slicing mode:
    - Workloads share a physical GPU and operate in series.
    - vGPU processing is scheduled between multiple VM-based workloads using best effort, equal shares, or fixed shares.
    - This is the default setting.
    - Supported by NVIDIA A30, A100, and H100 devices.
    - Can be configured to support one VM to one full GPU, or one VM to multiple GPUs.
    - Best used when:
      - Resource contention is not a priority.
      - You want maximum GPU utilization by running as many workloads/VMs as possible.
      - 100% of the cores can be given to a single workload for a fraction of a second.
      - Large workloads need to consume more than one physical GPU device.
  - MIG mode (Multi-Instance GPU mode):
    - Fractions a physical GPU into multiple smaller GPU instances.
    - Helps to maximize utilization of GPU devices.
    - A GPU can be fractioned into a maximum of 7 slices, each individually represented as a vGPU profile.
    - Isolates internal hardware resources and pathways in a GPU device.
    - Enabled with the nvidia-smi command at the ESXi host level, after the NVIDIA host vGPU manager driver is installed (a minimal sketch follows this list).
    - Best used for:
      - Multiple workloads that need to operate in parallel.
      - GPU resources that need to be shared by multiple VMs in parallel.
      - Allocating 1-7 physical slices of a GPU to a single workload.
      - Workloads that need a secure, dedicated, and predictable level of performance.
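As a rough illustration of the bullet above about enabling MIG from the ESXi host, the following shell sketch first checks that the NVIDIA host vGPU manager VIB is present and then turns on MIG mode for one GPU. The GPU index and grep pattern are assumptions for illustration; the exact procedure (and any required host reboot) comes from the NVIDIA/VMware documentation rather than these notes.

```shell
# Hedged sketch, run in the ESXi shell. Assumes the NVIDIA host vGPU
# manager driver (VIB) is already installed.

# Confirm the NVIDIA VIB is present on the host.
esxcli software vib list | grep -i nvidia

# Enable MIG mode on GPU index 0 (the GPU must be idle; a GPU reset or
# host reboot is typically needed before the change takes effect).
nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the device supports (up to 7 slices).
nvidia-smi mig -lgip
```

Once MIG is active, each slice surfaces as one of the MIG-backed vGPU profiles referenced later in these notes.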
**Component Stack**

- Apps and VMs
- NVIDIA compute driver (guest OS)
- NVIDIA host software (VIB)
- VMware vSphere
- NVIDIA GPU
- NVIDIA-certified system

**Workflow to Configure an NVIDIA GPU in VCF**

1. ESXi host configuration:
   - Add NVIDIA GPU PCIe device(s).
   - Enable SR-IOV.
   - Pre-image ESXi with the NVIDIA VIB.
   - Enable MIG mode if desired.
2. SDDC Manager configuration:
   - Commission host(s) into the VCF inventory.
   - Cluster assignment: assign host(s) to a workload domain cluster.
3. VM/TKG configuration:
   - Configure the vGPU profile: allocate vGPU resources (time sharing or MIG).
   - Configure the NVIDIA guest driver: install and configure the NVIDIA guest driver in the workload.

**Assigning a vGPU Profile to a VM -- Time Slicing**

- The default: equal shares of GPU resources based on preconfigured profiles.
- The profile name consists of 4 parts.

**Assigning a vGPU Profile to a VM -- MIG**

- 1-7 slices.
- The profile name consists of 5 parts.

**Creating a VM Class for a TKG Worker Node VM**

- To run a Tanzu Kubernetes Grid worker node VM with a GPU, you must create a VM class (a hedged sketch follows this section).
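The notes state that a VM class is required but do not reproduce one, so here is a minimal, hedged sketch of a vSphere with Tanzu VirtualMachineClass carrying a vGPU device. The class name and CPU/memory sizing are invented for illustration, and grid_a100-3-20c is only an example of the 5-part MIG profile naming mentioned above (a time-sliced name such as grid_a100-40c has 4 parts).

```shell
# Hedged sketch (assumed names/values): create a VM class for a GPU-enabled
# TKG worker node in vSphere with Tanzu.
kubectl apply -f - <<'EOF'
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineClass
metadata:
  name: vmclass-a100-mig            # hypothetical class name
spec:
  hardware:
    cpus: 8                          # illustrative sizing
    memory: 64Gi
    devices:
      vgpuDevices:
        # MIG-backed, 5-part profile name: grid + board + slices + GB + type.
        # A time-sliced profile such as grid_a100-40c has 4 parts.
        - profileName: grid_a100-3-20c
EOF
```

After applying the class and adding it to the Supervisor namespace, a TKG cluster definition can reference it for its worker nodes; running nvidia-smi inside a node then confirms that the guest driver sees the vGPU.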
**NVIDIA GPUDirect RDMA**

- 10x performance.
- Direct communication between NVIDIA GPUs.
- Remote Direct Memory Access (RDMA) gives direct access to GPU memory.

**GPUs for Machine Learning**

- Graphics processing units (GPUs) are preferred over CPUs to accelerate computational workloads in modern high-performance computing (HPC) and machine learning or deep learning landscapes:
  - Latency versus throughput: CPUs are optimized to reduce latency for processing tasks in a serialized way; GPUs focus on high throughput volumes.
  - A GPU has significantly more cores than a CPU. These additional cores can be used for processing tasks in parallel.
  - The GPU architecture is tolerant of memory latency because it is designed for higher throughput.
  - A GPU works with fewer, relatively small memory cache layers because it has more components dedicated to computation.

**NVIDIA NVLink**

- A piece of hardware that allows a high-speed connection between multiple GPUs on the same server.
- Provides simplified device consumption with device groups:
  - Available in VCF 5.1.
  - Groups of multiple PCIe devices share a common PCIe switch or a direct interconnect (NVLink).
  - A device group is defined at the hardware layer and presented to vSphere.
  - It is added to a virtual machine as a single unit.

**NVIDIA NVSwitch**

- Connects multiple NVLinks to provide all-to-all GPU communication at full NVLink speed, both within a single node and between nodes.
- Increases the speed of GPU-to-GPU communication for larger AI/ML workloads.
- Up to 8 GPUs can be on the same host.
- All 8 GPUs (or a subset) can be allocated to the same VM with the vSphere device-group capability.
- Communication traffic and CPU overhead are significantly reduced.

**Private AI Foundation with NVIDIA Architecture and Components**

- A platform for provisioning AI workloads on ESXi hosts with NVIDIA GPUs.
- Configure and control access to AI- and machine-learning-optimized resources for on-demand developer access.
- Secure and manage the lifecycle of AI infrastructure using familiar tools, without the need to manage a disparate AI/ML silo.
- Use vSphere vMotion migration and DRS initial placement with NVIDIA-powered GPU workloads.
- Use cases:
  - Development: cloud and DevOps engineers provision AI workloads, including Retrieval-Augmented Generation (RAG), in the form of deep learning VMs; data scientists use the platform for AI development.
  - Production: cloud admins provide DevOps engineers with a Private AI Foundation with NVIDIA environment for production-ready AI workloads on Tanzu Kubernetes Grid clusters on vSphere with Tanzu.
- Components for AI workloads in Private AI Foundation with NVIDIA:
  - vSphere Lifecycle Manager:
    - All hosts in a cluster require the same GPU device and image.
    - NVIDIA AI Enterprise suite licensing is required.
  - VCF Tanzu Kubernetes Grid:
    - GPU-enabled TKG VMs must be manually powered off before vSphere Lifecycle Manager operations.
    - Re-instantiate the TKG worker node VM on another host.
  - VCF vSphere cluster:
    - GPU-enabled VMs must be manually powered off before vSphere Lifecycle Manager operations.
    - vMotion IS supported, but for maintenance operations ONLY (non-vSphere Lifecycle Manager operations).

**Terminology**

- DirectPath I/O -- passthrough mode in which the entire GPU is allocated to a specific VM-based workload (see Configuration Modes above).
- SR-IOV -- allows a single PCIe device under a single root port to appear as multiple separate physical devices to the hypervisor or guest OS.
- Time-slicing -- workloads operate in series; contention is not a priority (also called time sharing).
- Multi-Instance GPU (MIG) -- workloads operate in parallel; maximizes utilization of GPU devices and provides dynamic scalability by fractioning a physical GPU device into smaller instances.
- NVIDIA NVSwitch -- ties multiple GPUs together by connecting NVLinks for all-to-all GPU communication.
- NVIDIA AI Enterprise Suite -- a cloud-native suite of AI and data analytics software, optimized and certified, that includes key enabling technologies for rapid deployment, management, and scalability.
- NVIDIA GPUDirect RDMA -- a direct path for data exchange between the GPU and third-party peer devices using standard features of PCIe; examples include network interfaces and video or storage adapters.
- NVLink bridge -- hardware that allows connections between multiple GPUs on the SAME server; a point-to-point connection between any 2 GPUs (see the topology sketch below).
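To connect the NVLink and NVSwitch terminology above to a running host, the stock nvidia-smi tool can print the GPU interconnect topology. This is a generic NVIDIA driver command rather than anything specific to this guide:

```shell
# Print the GPU-to-GPU interconnect matrix. Entries such as NV1..NVn mean
# the pair is linked by that many NVLinks; PIX/PHB/SYS indicate the path
# goes through PCIe or the CPU/system instead.
nvidia-smi topo -m
```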