Lecture 3 - Fall 2023 - Deep Learning
Lakehead University
2023
Summary
This document summarizes a lecture on deep learning, covering neural network fundamentals, learning approaches (supervised, semi-supervised, and unsupervised), and optimization techniques such as pruning and quantization, with references to frameworks like TensorFlow and PyTorch.
Full Transcript
HW/SW Co-Design Real-Time AI (Graduate Program, Fall 2023)
Instructor: Dr. Amin Safaei, Department of Computer Science
Duration: 13 sessions; this lecture: 180 min
Topic: Introduction to AI, Basics of Machine Learning and Deep Learning (Recap)
Slide material: Xilinx 2021

2.1 Deep Neural Network
• Neural Network
  • Multiple layers process the information
  • Layers are highly interconnected
  • Each processing node has its own small sphere of knowledge
  • The answer is read from the output layer
• Neural Network Training
  • Forward propagation
  • Backward propagation
• Deep Learning Approaches
  • Supervised Learning
    • Supervised learning is based on labeled data sets.
    • The network parameters are modified iteratively to better approximate the desired outputs.
    • Networks: deep neural networks (DNN); convolutional neural networks (CNN); recurrent neural networks (RNN), including long short-term memory (LSTM) and gated recurrent units (GRU)
  • Semi-Supervised Learning
    • Semi-supervised learning is based on partially labeled data sets.
    • RNNs, including LSTMs and GRUs, are used for semi-supervised learning.
  • Unsupervised Learning
    • Unsupervised learning is based on unlabeled data.
    • RNNs such as LSTMs, as well as reinforcement learning (RL), are also used for unsupervised learning.

2.3 Deep Neural Network (Cont.)
• Convolutional Neural Networks (CNN)
  • Designed to process images or pixel data
  • Classification networks [GoogleNet, AlexNet, ResNet]
  • Detection networks [SSD, YOLO, Fast/Faster R-CNN]
  • Segmentation networks [FPN, FCN, SegNet]
• Machine Learning Deployment
  • Training
  • Inference
• CNN Building Blocks
  • Convolution: produces a feature map, controlled by stride and padding
  • Pooling: reduces the spatial dimensions of the feature map (e.g., 2x2 pooling halves height and width)
  • Rectified Linear Unit (ReLU): an activation function in the neural network
  • CNN output: used as input to the fully connected (FC) classification layer
• AlexNet and VGGNet
  https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
  https://ethereon.github.io/netscope/#/preset/alexnet

2.4 Deep Neural Network (Cont.)
• GoogleNet with Inception module
• Residual Network (ResNet)
  https://github.com/KaimingHe/deep-residual-networks
  https://ethereon.github.io/netscope/#/preset/googlenet

2.5 Convolutional Neural Networks (CNN)
• Details and performance of each model (see slides)
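To ground the building blocks above, here is a minimal PyTorch sketch (not from the slides; the `TinyCNN` name and layer sizes are illustrative assumptions for 32x32 RGB inputs) chaining convolution, ReLU, pooling, and a fully connected classifier:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: convolution -> ReLU -> pooling -> fully connected."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolution produces a feature map; stride/padding control its size.
        self.conv = nn.Conv2d(in_channels=3, out_channels=16,
                              kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        # 2x2 max pooling halves the spatial dimensions (32x32 -> 16x16).
        self.pool = nn.MaxPool2d(kernel_size=2)
        # The CNN output feeds the fully connected classification layer.
        self.fc = nn.Linear(16 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(start_dim=1))

model = TinyCNN()
logits = model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
print(logits.shape)  # torch.Size([1, 10])
```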
Deep Learning (Graduate Program, Fall 2023; 180 min)

3.1 Deep Learning Challenges
• AlexNet
  • Weights: 61 M
  • MACs: 724 M
• Major Challenges
  • Computationally intensive
    • Deeper layers
    • Billions of compute operations per inference
  • Memory-bandwidth intensive
    • Requires high-bandwidth memory access between layers
    • Faster throughput demands more memory bandwidth
  • Deployment power efficiency
    • Fast-growing market adoption
    • Demands more power- and cost-efficient deployment solutions

3.1 Deep Learning Challenges (Cont.)
• There are several challenges in developing deep learning solutions:
  ✓ Challenge 1: Choose the right deep learning network
    • Different networks: huge networks, small networks
    • Different merits: speed, latency, energy, accuracy
  ✓ Challenge 2: Billions of multiply-accumulate operations and tens of megabytes of parameter data (a back-of-the-envelope sketch follows this list)
    • Sea of operations
    • Custom math
    • Memory hierarchy
  ✓ Challenge 3: Continuous stream of new algorithms
    • Flexibility
    • Scalability
    • Power efficiency
    • Real-time performance
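To make Challenge 2 concrete, here is a small Python sketch of standard convolution arithmetic (the `conv_layer_cost` helper is hypothetical, not from the slides). Summed over all of AlexNet's layers, this kind of count yields figures on the order of the 61 M weights and 724 M MACs quoted above:

```python
def conv_layer_cost(h_out, w_out, c_in, c_out, k_h, k_w):
    """MACs and weights for a standard 2D convolution layer.

    Each output element needs c_in * k_h * k_w multiply-accumulates,
    and there are h_out * w_out * c_out output elements.
    """
    weights = c_out * c_in * k_h * k_w               # ignoring biases
    macs = h_out * w_out * c_out * c_in * k_h * k_w
    return weights, macs

# AlexNet's first convolution: 96 filters of 11x11x3 over a 55x55 output grid.
w, m = conv_layer_cost(h_out=55, w_out=55, c_in=3, c_out=96, k_h=11, k_w=11)
print(f"conv1: {w:,} weights, {m:,} MACs")  # ~35 K weights, ~105 M MACs
```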
3.2 Deep Learning Frameworks
• A framework comprises an interface, library, or tool that:
  ✓ Allows developers to build machine learning models more easily and quickly
  ✓ Is open-source software
  ✓ Has built-in algorithms
  ✓ Provides pre-trained models
  ✓ Is used for deep learning algorithms
• Frameworks
  ✓ Caffe
  ✓ TensorFlow
  ✓ MXNet
  ✓ Darknet
  ✓ Keras
  ✓ PyTorch
• Example
  • Image recognition applications
  • ImageNet: 1000 classes, 800 images each
  • No need to write algorithms from scratch and perform training

3.3 Optimized Network Models
• ML tasks and Xilinx:
  ✓ Classification
    • Example networks: VGGNet, GoogleNet, ResNet, MobileNet
    • Applications: handwritten digit recognition, attribute recognition, license plate recognition
    • Evaluation metric: accuracy
  ✓ Detection
    • Example networks: SSD, YOLO, Faster R-CNN
    • Applications: face detection, license plate detection, pedestrian/cyclist/car detection, surveillance/automotive
    • Evaluation metrics: precision, mean average precision (mAP)
  ✓ Segmentation
    1. Semantic segmentation
    2. Instance segmentation
    • Example networks: SegNet, FPN, FCN
    • Applications: lane detection, drivable-space detection, environmental perception in automotive
    • Evaluation metrics: mean intersection over union (mIOU) for semantic segmentation; average precision (AP) for instance segmentation

3.4 AI Optimizer
• Pruning
  • Most neural networks are typically over-parameterized, with significant redundancy, to achieve a certain accuracy.
  • Inference in machine learning is computation intensive and requires high memory bandwidth to meet the low-latency and high-throughput requirements of various applications.
  • Pruning is the process of eliminating redundant weights while keeping the accuracy loss as low as possible.
• Xilinx AI Optimizer
  • The AI pruner (or VAI pruner) prunes redundant connections in neural networks and reduces the overall number of required operations.
  • Benefits:
    • The AI optimizer can reduce model size by up to 90%
    • Runtime is reduced by 1.5x to 10x
• Pruning Methods: Fine-Grained and Coarse-Grained
  • Fine-grained pruning is the simplest form: it cuts off redundant connections in the network by setting the corresponding weights to 0.
  • The resulting zero weights are randomly distributed within the kernel. Cutting off an entire 3D kernel instead is called coarse-grained pruning.
• There are three aspects to coarse-grained pruning:
  • Sparsity determination
    • How many FLOPs or weights to prune in each layer
  • Sensitivity analysis
    • Identifies layers whose accuracy drops significantly after pruning
  • Channel selection
    • The more channels a layer has, the more pruning can be done in that layer
    • Decides which channels to prune in a layer, based on various criteria
• Accuracy
  • Accuracy drops after pruning; finetuning is the standard method to recover it.
• Iterative Pruning
  • The AI pruner is designed to reduce the number of model parameters while minimizing accuracy loss.
  • Pruning causes accuracy loss; retraining (finetuning) recovers accuracy.
  • Pruning cannot reach the target size in a single pass: the reduction ratio is increased gradually in every iteration, which helps recover accuracy during the finetuning stage.
• The AI optimizer automatically prunes network models to the desired sparsity and significantly reduces the operations and parameters of CNN models without losing much accuracy.
• Generally, follow these steps to prune a model (a minimal illustration follows this list):
  1. Analyze the original baseline model.
  2. Prune the input model.
  3. Finetune the pruned model.
  4. Repeat steps 2 and 3 several times.
  5. Transform the pruned sparse model into the final dense model.
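The VAI pruner itself is proprietary tooling, but the underlying idea can be sketched with PyTorch's built-in pruning utilities (an assumed stand-in, not the Xilinx flow): fine-grained pruning zeros individual weights, while structured pruning removes whole output channels.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# Fine-grained pruning: zero the 50% of individual weights with the
# smallest L1 magnitude; zeros end up scattered through the kernels.
prune.l1_unstructured(conv, name="weight", amount=0.5)

# Coarse-grained (channel) pruning: remove entire 3D kernels by zeroing
# the 25% of output channels (dim=0) with the smallest L2 norm.
prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)

# Make the pruning permanent (bakes the mask into the weight tensor).
prune.remove(conv, "weight")

sparsity = (conv.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")
```

In the iterative flow above, steps 2 and 3 would alternate such pruning with finetuning epochs, gradually increasing the pruned fraction each round.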
3.5 Inference Process
• Quantization
  • Quantization and channel pruning techniques are employed to address these issues while achieving high performance and high energy efficiency, with little degradation in accuracy.
  • Quantization makes it possible to use integer compute units and to represent weights and activations with fewer bits, while pruning reduces the overall number of required operations.
  • Generally, 32-bit floating-point weights and activation values are used when training neural networks.
  • By converting the 32-bit floating-point weights and activations to 8-bit integer (INT8) format, the AI quantizer can reduce computing complexity without losing prediction accuracy.
  • The fixed-point network model requires less memory bandwidth, thus providing faster speed and higher power efficiency than the floating-point model.
  • The AI quantizer supports common neural network layers, including, but not limited to, convolution, pooling, fully connected, and batch normalization.
• AI Quantizer Flow (see slides)
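The FP32-to-INT8 conversion can be illustrated with a minimal symmetric-quantization sketch in plain Python (the `quantize_int8` and `dequantize` helpers are hypothetical; the actual AI quantizer flow involves calibration and is more involved):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric linear quantization of an FP32 tensor to INT8."""
    # The scale maps the largest magnitude onto the INT8 range [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation from the INT8 values."""
    return q.astype(np.float32) * scale

weights = np.random.randn(64).astype(np.float32)  # stand-in FP32 weights
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
print(f"scale={scale:.4f}, max round-trip error={error:.4f}")
```

Next Section
• Introduction to the Deep Learning Processor Unit (DPU)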