23-24 - M2AI - DL4CV - 1 - Deep Learning 57-92.pdf
Instituto Politécnico do Cávado e do Ave
Slide 57. Training Neural Networks
- Training means optimizing the parameters so that the network produces outputs equal (or close) to the ground truth.
- Parameters are optimized using examples with corresponding ground-truth labels.
- Output neurons are supposed to estimate the ground-truth labels.

José Henrique Brito | 2Ai - EST - IPCA, PT | DL4CV @ M2AI

Slide 58. Class encoding: 2-class
- In a 2-class problem (e.g. healthy vs. cancer), the label for each sample is either 0 or 1.
- There is typically only 1 output neuron.
- The output neuron provides p = probability of class 1 or, conversely, 1 - p = probability of class 0.

Slide 59. Class encoding: multiclass
Labels may be specified as integers:

Class     Person  Cat  Dog  Car  Airplane
Encoding  0       1    2    3    4

or as "one-hot" vectors (one column per class):

Class     Person  Cat  Dog  Car  Airplane
One-hot   1       0    0    0    0
encoding  0       1    0    0    0
          0       0    1    0    0
          0       0    0    1    0
          0       0    0    0    1

Neural networks for classification usually have one output per class, so that the output can be compared with the one-hot vector of the label.

Slide 60. Sigmoid activation
- Useful for the 2-class classification layer (last layer).
- Forces values to be between 0 and 1.
- Enables us to interpret the neuron output as a class probability.

$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$

Slide 61. Sigmoid activation
Sigmoid examples (1 output neuron), with $\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$:

x      e^-x                  1 + e^-x              S(x) = 1 / (1 + e^-x)
-100   2.68811714 x 10^43    2.68811714 x 10^43    3.72007598 x 10^-44 (0.0%)
-10    22026.4658            22027.4658            0.0000453978687 (0.0%)
-1     2.71828183            3.71828183            0.268941421 (26.9%)
-0.1   1.10517092            2.10517092            0.475020813 (47.5%)
0      1.00000000            2.00000000            0.500000000 (50.0%)
0.1    0.904837418           1.90483742            0.524979187 (52.5%)
1      0.367879441           1.36787944            0.731058579 (73.1%)
10     0.0000453999298       1.00004540            0.999954602 (>99.9%)
100    3.72007598 x 10^-44   1.00000000            1.00000000 (100%)

Slide 62. Softmax activation
- Useful for the multiclass classification layer (last layer).
- Forces the sum of the elements to be equal to 1.
- Enables us to interpret the network outputs as a class probability distribution.

$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=0}^{n} e^{x_j}}$

Slide 63. Softmax activation
Softmax example (3 classes, 3 output neurons), with $\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=0}^{n} e^{x_j}}$:

x    e^x             Softmax(x)
47   2.581 x 10^20   2.581 x 10^20 / 5.443 x 10^21 = 0.047 (4.7%)
50   5.185 x 10^21   5.185 x 10^21 / 5.443 x 10^21 = 0.953 (95.3%)
12   1.628 x 10^5    1.628 x 10^5 / 5.443 x 10^21 = 2.99 x 10^-17 (0.0%)
Total = 5.443 x 10^21

x    e^x             Softmax(x)
3    2.009 x 10^1    2.009 x 10^1 / 5.185 x 10^21 = 3.874 x 10^-21 (0.0%)
50   5.185 x 10^21   5.185 x 10^21 / 5.185 x 10^21 = 1.0 (100.0%)
12   1.628 x 10^5    1.628 x 10^5 / 5.185 x 10^21 = 3.139 x 10^-17 (0.0%)
Total = 5.185 x 10^21

Slide 64. Demo
- Sigmoid and Softmax examples

Slide 65. Convolutional Neural Networks
- Using fully connected layers on images is overkill: you are trying to look at all the pixels at the same time.
- For data that has spatial continuity (images) it makes more sense to analyze regions of the input data.
- A filter or kernel uses fewer weights, which means fewer weights to train.

Slides 66-67. 2D Convolutions (Grayscale) (figures)

Slides 68-71. 2D Convolutions (RGB) (figures)

Slide 72. Types of convolutions
- Normal convolution
- Normal convolution with no padding and stride of 2
- Atrous convolution
- Transpose convolution
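The convolution variants named on this slide differ mainly in how far the kernel moves between positions (stride) and how far apart its taps are spread over the input (dilation, which is what "atrous" refers to). The following is a minimal sketch in plain Python, not taken from the slides: the function names, the 5x5 test image and the all-ones kernel are assumptions chosen for illustration, padding is left out, and transpose convolution is not covered. As in most deep learning libraries, the kernel is not flipped.

```python
def conv_output_size(n, k, stride=1, dilation=1):
    """Output length along one axis for a convolution without padding."""
    span = dilation * (k - 1) + 1  # number of input pixels the kernel reaches across
    return (n - span) // stride + 1


def conv2d(image, kernel, stride=1, dilation=1):
    """Slide `kernel` over `image` (both lists of rows); no padding, no kernel flip."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), kh, stride, dilation)
    out_w = conv_output_size(len(image[0]), kw, stride, dilation)
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    pixel = image[i * stride + a * dilation][j * stride + b * dilation]
                    total += kernel[a][b] * pixel
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 grayscale image (values 0..24) and a 3x3 kernel of ones.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

normal = conv2d(image, kernel)              # stride 1, dilation 1
strided = conv2d(image, kernel, stride=2)   # no padding, stride of 2
atrous = conv2d(image, kernel, dilation=2)  # kernel taps two pixels apart

# Height of each output feature map
print(len(normal), len(strided), len(atrous))
```

With the same 9 weights, the strided version produces a smaller feature map, while the atrous version covers the whole 5x5 input in a single position.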
Source: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

Slide 73. Typical CNN structure
Building blocks:
- Convolution
- Activation
- Pooling
(Figure: Stanford CS231n)

Slide 75. Activation (figure)

Slide 76. Pooling (figure)

Slide 77. Max pooling
- Another choice would be average pooling.

Slide 78. Keras implementation
    # Import path assumed; the slide does not show the imports.
    from keras.layers import Conv2D, MaxPooling2D
    …
    # 32 kernels of 5x5 with ReLU activation; the first layer also receives the input shape
    model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Conv2D(64, (5, 5), activation='relu'))
    # When strides is omitted, MaxPooling2D uses pool_size as the stride
    model.add(MaxPooling2D(pool_size=(2, 2)))
    …

Slide 79. Typical CNN structure
- Layers have progressively smaller feature maps (due to pooling layers) and more kernels (more feature maps).
Building blocks:
- Convolution
- Activation
- Pooling
(Figure: Stanford CS231n)

Slide 81. Frameworks
Popular frameworks:
- TensorFlow
- PyTorch
- Caffe
- Caffe2
- CNTK
- MatConvNet
- Theano
- Torch
- …

Slide 82. Agenda
- Artificial Intelligence and Computer Vision Application Domains
- Artificial Intelligence and Computer Vision tasks
- Machine Learning and Deep Learning
- Neural Networks
- Neural Networks for Classification in Computer Vision
    - LeNet, AlexNet
    - GoogLeNet, VGG
    - ResNet
- Evaluation and Metrics
- Training Neural Networks
- Implementation challenges
- Neural Networks for other Computer Vision tasks
- More Neural Networks

Slide 83. LeNet (1989)
Source: https://medium.com/@pechyonkin/key-deep-learning-architectures-lenet-5-6fc3c59e6f4

Slide 85. AlexNet (2012)
- ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winner, 2012
- Error rate (Top 5): 15% (previous state of the art: 25%)
- Acquired by Google, Jan 2013
- Deployed on Google+ Photo Tagging in May 2013

Slide 87. GoogLeNet (2014)
ILSVRC14 winners: ~6.6% Top-5 error
- GoogLeNet: composition of multi-scale dimension-reduced modules
- VGG: 16 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
Gains came from:
+ depth
+ data
+ dimensionality reduction

Slide 90. ImageNet Human Performance (figure)
Source: https://coggle.it/diagram/WgPeVuojMQABBOPO/t/machine-learning

Slide 91. Image Classification
Standard networks:
- AlexNet, VGG, GoogLeNet (Inception), ResNet, SqueezeNet, DenseNet, MobileNet, NASNet, EfficientNet
- https://github.com/weiaicunzai/awesome-image-classification
Standard datasets:
- ImageNet, MNIST, Fashion MNIST, Pascal VOC, CIFAR10, CIFAR100, KITTI

Slide 92. Demo
- Neural network inference with pretrained networks: VGG19, ResNet, MobileNet
- Arbitrary input images
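
Pretrained classifiers like the ones in this demo typically end in a softmax layer, and their predictions are read off as the k most probable classes; the Top-5 error quoted for AlexNet and GoogLeNet counts a prediction as correct when the true class is among the five most probable. The sketch below is a plain-Python illustration under stated assumptions, not the demo code from the course: it reuses the output values 47, 50 and 12 from the softmax slide, assigns them to the first three classes of the class-encoding slide purely for illustration, and `top_k` is a hypothetical helper. Subtracting the maximum before exponentiating is a common numerical-stability step that is not on the slides and does not change the result.

```python
import math


def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^-x) for a single output neuron."""
    # Fine for the slide's range of x; very negative x would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Turn a list of output-neuron values into probabilities that sum to 1."""
    peak = max(values)  # stability step (assumption, not from the slides)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, class_names, k):
    """Hypothetical helper: the k most probable (class, probability) pairs."""
    ranked = sorted(zip(probabilities, class_names), reverse=True)
    return [(name, p) for p, name in ranked[:k]]


# Output values from the softmax slide; the class assignment is illustrative only.
outputs = [47, 50, 12]
class_names = ["Person", "Cat", "Dog"]

probabilities = softmax(outputs)
for name, p in top_k(probabilities, class_names, k=2):
    print(f"{name}: {p:.3f}")

# One output neuron, as in the 2-class case
print(f"sigmoid(1) = {sigmoid(1):.3f}")
```

Because softmax depends only on the differences between the output values, the neuron with value 50 takes about 95% of the probability even though 47 is numerically close to it.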