Lecture 2: HW/SW Co-Design Real-Time AI

Document Details


Uploaded by EarnestGreenTourmaline7771

Lakehead University

2023

Dr. Amin Safaei

Tags

artificial intelligence, machine learning, deep learning, neural networks

Summary

This lecture covers the fundamental aspects of AI, machine learning, and deep learning. It introduces the core concepts, describes neural network architectures such as CNNs, and details important methods used in training these networks. This is a graduate-level course from Lakehead University, Fall 2023.

Full Transcript


HW/SW Co-Design Real-Time AI | Graduate Program, Fall 2023 | Duration: 13 sessions, 180 min each
Instructor: Dr. Amin Safaei | Department: Computer Science

Course Introduction (Recap)

1.1 Inference Compute and Memory
• Across a spectrum of neural networks
(Quenton Hall, Accelerating AI Camera Development with Xilinx Vitis; Michaela Blott, Hot Chips 2018)

1.1 Inference Compute and Memory (Cont.)
• AI compute compared to CPUs and GPUs
(Quenton Hall, Accelerating AI Camera Development with Xilinx Vitis)

1.1 Inference Compute and Memory (Cont.)
• Zynq-7000
• Zynq UltraScale+ [Xilinx 2021]

1.2 Vitis AI [Xilinx 2021]

1.4 PYNQ
• PYNQ is a framework: Python Productivity for Zynq [Xilinx 2021]
• PYNQ-Z2 board [Xilinx 2021]

Introduction to AI, Basics of Machine Learning and Deep Learning

1.1 Artificial Intelligence, Machine Learning and Deep Learning
• Artificial Intelligence, or AI, is the branch of computer science concerned with making computers or systems behave like humans [Xilinx 2021].
• Machine Learning is the branch of AI in which algorithms learn and improve through the use of training data rather than explicit programming. Unlike traditional programming, which involves coding a specific set of rules, machine learning involves learning from data, not from rules [Xilinx 2021].
• Deep Learning is a branch of machine learning that uses multiple layers of neural networks to process a given input, much as the human brain does [Xilinx 2021].
• CNN • RNN • MNN
https://www.intel.ca/content/www/ca/en/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-learning.html

1.2 Introduction to Machine Learning
• Machine learning is a subfield of artificial intelligence. It involves designing a system that can learn, make decisions, and predict based on a given data set.
• What is deep learning? Deep learning is a branch of machine learning that uses multi-layer neural networks.
• Learning can be categorized into:
  • Supervised
  • Semi-supervised
  • Unsupervised
https://www.intel.ca/content/www/ca/en/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-learning.html

1.3 Machine Learning Applications
• Rapidly evolving technology
• ML applications are deployed in both the cloud and at the edge
• Accelerated using Xilinx FPGAs [Xilinx 2021]

1.3 Machine Learning Applications (Cont.)
• Edge ML applications
• Cloud ML applications [Xilinx 2021]
• Deep learning applications

1.4 Deep Neural Network
• Neural network:
  • Multiple layers process the information
  • Layers are highly interconnected
  • Each processing node has its own small sphere of knowledge
  • The answer is read from the output layer [Xilinx 2021]

1.4 Deep Neural Network (Cont.)
• Tuning a network/model requires a large amount of computational power
  • Example: 2,000 GPUs were used to train the Google Translate model
• Drivers of deep learning progress: increased computing power, huge amounts of training data, and efficient network models [Xilinx 2021]
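To make the layered structure concrete before the forward/backward propagation slides, here is a minimal NumPy sketch of forward propagation through a small fully connected network. It is illustrative only: the layer sizes, random weights, and function names are not from the slides.

```python
import numpy as np

def relu(x):
    # Ramp activation: passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Forward propagation through fully connected layers.

    Each layer mixes all of its inputs (layers are highly
    interconnected); the answer is read from the final layer.
    """
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# A toy 3-layer network: 4 inputs -> 8 hidden -> 8 hidden -> 2 outputs
sizes = [4, 8, 8, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

print(forward(rng.standard_normal(4), weights, biases))
```

Training, covered next, is the process of adjusting these weights and biases so that the output layer's answer matches known labels.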
1.4 Deep Neural Network (Cont.)
• Neural network:
  • Forward propagation
  • Backward propagation [Xilinx 2021]

1.4 Deep Neural Network (Cont.)
• Deep learning approaches:
• Supervised learning
  • Supervised learning is based on labeled data sets.
  • In the supervised learning approach, the network parameters are modified iteratively for a better approximation of the desired outputs.
  • Deep neural networks (DNNs)
  • Convolutional neural networks (CNNs)
  • Recurrent neural networks (RNNs), including long short-term memory (LSTM) and gated recurrent units (GRUs)
• Semi-supervised learning
  • Semi-supervised learning is based on partially labeled data sets.
  • Recurrent neural networks, including LSTM and GRU variants, are used for semi-supervised learning.
• Unsupervised learning
  • Unsupervised learning is based on unlabeled data.
  • RNNs such as LSTM, as well as reinforcement learning (RL), are also used for unsupervised learning.

1.4 Deep Neural Network (Cont.)
• Convolutional neural networks (CNNs)
  • Trained using a large set of labeled data
  • MNIST: handwritten digits, 60,000 samples
  • CASIA: handwritten characters, 3.9 million samples
  • CIFAR-10: 10 classes, 6,000 images each

1.4 Deep Neural Network (Cont.)
• Convolutional neural networks (CNNs) [Xilinx 2021]
  • Designed to process images or pixel data
  • Convolution: a mathematical operation on two functions that produces a third function
  • Convolves learned features with input data
  • Uses 2D convolutional layers
  • Well suited for image processing

1.5 Machine Learning
• Training: learning a new capability from existing data
• Inference: applying this capability to new data [Xilinx 2021]

1.5 Machine Learning (Cont.)
• Machine learning model: efficient deep learning architecture
  • A machine learning model is an artifact created through a training process.
  • A model trains on data with known answers.
  • After training, the model is used to predict outcomes from a new set of inputs. [Xilinx 2021]

1.5 Machine Learning (Cont.)
• Training
  • Training is the process of tuning a machine learning model for better accuracy.
  • An iterative process feeds back error information.
  • The derivative of the cost function (a measure of the error) is iteratively minimized to increase accuracy, as the sketch below illustrates. [Xilinx 2021]

1.5 Machine Learning (Cont.)
• Training
  • The training process is usually done on GPUs.
  • GPUs are strong in training because of their cost-effectiveness and their full-precision (non-optimized) implementations.
  • GPUs also benefit from high-bandwidth matrix-multiply capabilities, which are critical for training neural networks.
  • Training algorithms usually work in floating point, and training is much faster on GPUs, which can be rented on AWS.
  • After a lot of data has been passed through the model, it develops a set of weights and biases, ultimately resulting in a "trained model".
  • A trained model can then be used for inference, but its accuracy has to be evaluated before it can be deployed.
  • Xilinx solutions focus on inference, the deployment stage. At this stage, the models have already been trained on the training data sets.
https://www.fast.ai/2017/11/16/what-you-need/
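As a concrete illustration of iteratively minimizing the cost function via its derivative, here is a minimal NumPy sketch of gradient descent on a one-parameter model. The data, learning rate, and step count are made up for illustration and are not from the lecture.

```python
import numpy as np

# Toy data: y = 3x + noise; the model is y_hat = w * x with one weight w
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = 3.0 * x + 0.1 * rng.standard_normal(100)

w = 0.0    # initial weight
lr = 0.1   # learning rate

for step in range(50):
    y_hat = w * x
    grad = np.mean(2.0 * (y_hat - y) * x)  # derivative of the MSE cost w.r.t. w
    w -= lr * grad                         # step downhill along the gradient

print(f"learned w = {w:.3f}")              # approaches 3.0 as the cost shrinks
print(f"final cost = {np.mean((w * x - y) ** 2):.5f}")
```

Backward propagation in a deep network does the same thing at scale: it computes the derivative of the cost with respect to every weight and bias, and each training step moves them all slightly downhill.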
1.5 Machine Learning (Cont.)
• Training data examples:
  • MNIST: handwritten digits, 60,000 samples
  • CASIA: handwritten characters, 3.9 million samples
  • CIFAR-10: 10 classes, 6,000 images each
  • Google Street View: 600,000 samples
  • ImageNet: ~800 images per class
• Public data sets:
  • For general scopes: ImageNet, VOC, COCO
  • For facial detection and recognition: FDDB, Wider Face, LFW, MegaFace, MS-1M
  • For automotive: KITTI, Cityscapes, BDD100K

1.5 Machine Learning (Cont.)
• Inference
  • Inference uses a trained machine learning model to predict and/or estimate outcomes from new observations. [Xilinx 2021]

1.6 Convolutional Neural Networks (CNN)
• The most popular deep neural network
• Designed to process images and pixel data
  ➢ Convolution (Conv)
  ➢ Pooling (Pool)
  ➢ Fully connected (FC)
• First layers: detect low-level features
• Middle layers: extract features from the previous layers
• Last layer: output [Xilinx 2021]

1.6 Convolutional Neural Networks (CNN) (Cont.)
• Convolutions, padding, striding (illustrated at https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)
• Pooling, ReLU (illustrated at https://towardsdatascience.com/understanding-convolutions-and-pooling-in-neural-networks-a-simple-explanation-885a2d78f211)

1.6 Convolutional Neural Networks (CNN) (Cont.)
• Convolution: a mathematical operation on two functions that produces a third function
• Convolution filter: produces a feature map, subject to striding and padding
• Pooling: reduces each spatial dimension of the matrix, typically by half
• Pooling layer: inserted between successive conv layers; reduces the number of network parameters and computations
• Rectified Linear Unit (ReLU): an activation function in the neural network; a ramp function
• CNN output: used as input to the FC (classification) layer [Xilinx 2021]

1.6 Convolutional Neural Networks (CNN) (Cont.)
• AlexNet
• VGGNet
• GoogLeNet with Inception module
• Residual Network (ResNet) [Xilinx 2021]

1.7 AlexNet
• AlexNet is composed of 8 layers:
  • 5 convolutional (Conv) layers
  • 3 fully connected (FC) layers
  • A softmax output layer
• 60 million parameters; 650,000 neurons
• The main innovation of AlexNet is the Rectified Linear Unit (ReLU) as the activation function; a ReLU layer is placed after each and every Conv and FC layer.
• Input samples: 227×227×3
• Total number of weights: 61M
• First convolution layer: 11×11 kernels, stride 4
• First-layer output: 55×55×96 = 290,400 activations
• Weights per first-layer filter: 11×11×3 + 1 (bias) = 364
https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
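These first-layer numbers can be checked with the standard convolution output-size formula, out = (in − kernel + 2·padding) / stride + 1. The short Python check below is illustrative; the 34,944 total at the end is derived from the slide's per-filter count, not stated on the slides.

```python
# AlexNet first conv layer: 227x227x3 input, 11x11 kernels, stride 4, 96 filters
in_size, kernel, stride, padding, filters = 227, 11, 4, 0, 96

# Conv output size: out = (in - kernel + 2*padding) // stride + 1
out = (in_size - kernel + 2 * padding) // stride + 1
print(out)                      # 55
print(out * out * filters)      # 55*55*96 = 290,400 first-layer activations

# Weights per filter: an 11x11 kernel over 3 input channels, plus 1 bias
w_per_filter = kernel * kernel * 3 + 1
print(w_per_filter)             # 364
print(w_per_filter * filters)   # 34,944 weights in the whole first layer
```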
1.8 Visual Geometry Group (VGGNet)
• The VGG architecture consists of several layers:
  • Convolutional layers that use the ReLU activation
  • Following the activation function, a single max-pooling layer (with a 2×2 size)
  • A softmax layer
• There are three models available in VGGNet:
  • VGG-11 (11 layers), with 8 convolutional layers
  • VGG-16 (16 layers), with 13 convolutional layers
  • VGG-19 (19 layers), with 16 convolutional layers
• VGG is the most computationally expensive of these models: it contains 138M weights and requires 15.5G MACs.
https://ethereon.github.io/netscope/#/preset/alexnet

1.9 GoogLeNet
• Inception layer
  • GoogLeNet introduced a new layer called the inception layer.
  • It uses convolutions of different sizes (such as 5×5, 3×3, and 1×1) to capture details at varied scales.
• GoogLeNet consists of 22 layers in total, far more than any earlier network. However, the number of network parameters is much lower than in its predecessors, such as AlexNet or VGG.
• GoogLeNet has 7 million network parameters, fewer than AlexNet (60M) and VGG (138M).
• GoogLeNet's computation is 1.53G MACs, far lower than AlexNet and VGG.
https://ethereon.github.io/netscope/#/preset/googlenet

1.10 Residual Network (ResNet)
• The error rate for this network is 3.75%, lower than the human error rate (about 5%).
• ResNet comes in several depths (34, 50, 101, 152, and even 1202 layers), of which ResNet-50 is the most popular.
• ResNet-50 consists of:
  • 49 convolution layers
  • One fully connected layer at the end of the network
• The total numbers of weights and MACs are 25.5M and 3.9G, respectively.
https://github.com/KaimingHe/deep-residual-networks

1.11 Convolutional Neural Networks (CNN)
• Details and performance of each model: [comparison table, Xilinx 2021]

Next Section
• Deep learning challenges
• Deep learning frameworks
• AI Optimizer
• Team members
• First lab with PYNQ
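Ahead of that first lab, here is a minimal PYNQ sketch. It assumes a PYNQ-Z2 board running the stock PYNQ image with its standard base overlay; it is a sanity check that the programmable logic is alive, not lab material.

```python
# Run in a Jupyter notebook on the PYNQ-Z2 board itself.
from pynq.overlays.base import BaseOverlay
import time

base = BaseOverlay("base.bit")   # load the base bitstream into the PL

# Blink LED0 to confirm the overlay loaded and the PL is responding
for _ in range(5):
    base.leds[0].toggle()
    time.sleep(0.5)
```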
