23-24 - M2AI - DL4CV - 1 - Deep Learning 135-145.pdf
José Henrique Brito | 2Ai - EST - IPCA, PT | DL4CV @ M2AI

Layer types
Core (Input, Dense, Activation…)
Convolution (Conv1D, Conv2D, Conv3D…)
Pooling (MaxPooling1D/2D/3D, AveragePooling1D/2D/3D, GlobalMaxPooling1D/2D/3D)
Reshaping (Reshape, Flatten, Cropping1D/2D/3D, UpSampling1D/2D/3D, ZeroPadding1D/2D/3D…)
Merging (Concatenate, Average, Maximum, Minimum…)
Normalization (BatchNormalization, LayerNormalization)
Regularization (Dropout, SpatialDropout1D/2D/3D, GaussianDropout, GaussianNoise, …)

Data Normalization
Normalization changes the range of the input values, typically to [0,1], to [-1,1], or to mean = 0 and std_dev = 1.
This stabilizes the model's behavior during training and speeds up training.
The training process becomes: normalize the inputs and outputs, then train the model on the normalized inputs and outputs.
The inference process becomes: normalize the inputs, run them through the model to get normalized outputs, then denormalize the outputs.

Normalization Layers
Normalization can also be done within the network.
LayerNormalization normalizes the activations of the previous layer for each given example: it applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.
BatchNormalization normalizes the activations of the previous layer across a batch: it applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1.
Normalization norms: L1, L2.

Demo
Train a network with raw data and with normalized data.

Dropout
Main scientific advance of the Deep Learning era (AlexNet, NIPS 2012).
Dropout randomly cancels features during training, forcing the network to learn in a more generic way, when information is incomplete.
Dropout is a regularization strategy: it helps the network avoid overfitting.
SpatialDropout1D/2D/3D drops entire feature maps in 1D, 2D or 3D.
GaussianDropout multiplies activations by 1-centered Gaussian noise.
GaussianNoise adds 0-centered Gaussian noise.

Loss functions
Probabilistic losses
Regression losses
Hinge losses for "maximum-margin" classification

Probabilistic Losses
Binary Cross-entropy (log-loss, binary problems):
BCEL = −(1/N) Σ (ygt·log(ypred) + (1−ygt)·log(1−ypred))
Categorical Cross-entropy (log-loss, multiple classes, one-hot representation):
CCEL = −(1/N) Σ ygt·log(ypred)
Shape of ypred and ygt is [batch_size, num_classes].
Sparse Categorical Cross-entropy (log-loss, multiple classes, labels provided as integers):
Shape of ygt is [batch_size]; shape of ypred is [batch_size, num_classes].
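The pieces above fit together in a few lines of Keras. Below is a minimal sketch (assuming TensorFlow/Keras is available; the dataset is random dummy data, and every layer size, filter count and dropout rate is illustrative rather than taken from the slides) that normalizes the inputs, stacks several of the listed layer types with BatchNormalization and Dropout, and compiles with a sparse categorical cross-entropy loss.

```python
import numpy as np
from tensorflow.keras import layers, models

# Dummy data standing in for a real dataset: 100 RGB images, 10 classes.
x_train = np.random.rand(100, 32, 32, 3).astype("float32") * 255.0
y_train = np.random.randint(0, 10, size=(100,))

# Data normalization: rescale inputs to [0, 1]; the same transform
# must be applied again to new inputs at inference time.
x_train = x_train / 255.0

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),           # Core: Input
    layers.Conv2D(16, 3, padding="same"),      # Convolution
    layers.BatchNormalization(),               # Normalization across the batch
    layers.Activation("relu"),                 # Core: Activation
    layers.MaxPooling2D(),                     # Pooling
    layers.SpatialDropout2D(0.2),              # Regularization: drops whole feature maps
    layers.Flatten(),                          # Reshaping
    layers.Dense(64, activation="relu"),       # Core: Dense
    layers.Dropout(0.5),                       # Regularization
    layers.Dense(10, activation="softmax"),    # per-class probabilities
])

# Labels are integers and predictions are [batch_size, num_classes]
# probabilities, so the sparse categorical cross-entropy loss applies.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)
```

Swapping BatchNormalization for LayerNormalization, or Dropout for GaussianDropout/GaussianNoise, only changes the corresponding line of the stack.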
Probabilistic Losses
Binary Cross-entropy example
BCEL = −(1/N) Σ (ygt·log(ypred) + (1−ygt)·log(1−ypred))
(ε is a small constant that avoids log(0); here log2(ε) = -16.)

GT:   [0 0 1 1 0]
Pred: [0 0 0 1 1]

BCEL = -(1/5) ( [0 0 1 1 0]·log2[0+ε 0+ε 0+ε 1-ε 1-ε] + [1 1 0 0 1]·log2[1-ε 1-ε 1-ε 0+ε 0+ε] )
BCEL = -(1/5) ( [0 0 1 1 0]·[-16 -16 -16 0⁻ 0⁻] + [1 1 0 0 1]·[0⁻ 0⁻ 0⁻ -16 -16] )
BCEL = -(1/5) ( [0 0 -16 0 0] + [0 0 0 0 -16] )
BCEL = -(1/5) ( [0 0 -16 0 -16] ) = -(1/5)(-32) = 6.4

Probabilistic Losses
Categorical Cross-entropy example
CCEL = −(1/N) Σ ygt·log(ypred)

GT (one-hot, one example per row, classes a b c d):
[0 0 1 0]
[1 0 0 0]
[0 1 0 0]

Prediction (one example per row, classes a b c d):
[0    0    1    0  ]
[0.9  0    0.1  0  ]
[0.2  0.3  0.5  0  ]

CCEL = -(1/3) ( [0 0 1 0]·log[0+ε 0+ε 1 0+ε] + [1 0 0 0]·log[0.9 0+ε 0.1 0+ε] + [0 1 0 0]·log[0.2 0.3 0.5 0+ε] )
CCEL = -(1/3) ( [0 0 1 0]·[-∞ -∞ 0 -∞] + [1 0 0 0]·[-0.105 -∞ -2.303 -∞] + [0 1 0 0]·[-1.609 -1.204 -0.693 -∞] )
CCEL = -(1/3) ( 0 + (-0.105) + (-1.204) )
CCEL = -(1/3) (-1.309) = 0.436

Demo
Binary Cross-entropy Loss example
Categorical Cross-entropy Loss example
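Both worked examples can be reproduced with plain NumPy. The sketch below recomputes them; it assumes ε = 2⁻¹⁶ (the slides only write ε, but this choice gives log2(ε) = -16 as in the binary example), uses log base 2 for the binary case and the natural log for the categorical case, matching the values above.

```python
import numpy as np

eps = 2.0 ** -16  # assumed value of ε, chosen so that log2(eps) = -16

# Binary Cross-entropy example (log base 2, as in the worked example)
y_gt   = np.array([0., 0., 1., 1., 0.])
y_pred = np.array([0., 0., 0., 1., 1.])
p = np.clip(y_pred, eps, 1.0 - eps)        # 0 -> 0+eps, 1 -> 1-eps
bcel = -np.mean(y_gt * np.log2(p) + (1.0 - y_gt) * np.log2(1.0 - p))
print(round(bcel, 4))                      # 6.4

# Categorical Cross-entropy example (natural log), one example per row
y_gt = np.array([[0, 0, 1, 0],
                 [1, 0, 0, 0],
                 [0, 1, 0, 0]], dtype=float)
y_pred = np.array([[0.0, 0.0, 1.0, 0.0],
                   [0.9, 0.0, 0.1, 0.0],
                   [0.2, 0.3, 0.5, 0.0]])
p = np.clip(y_pred, eps, 1.0)              # avoid log(0)
ccel = -np.mean(np.sum(y_gt * np.log(p), axis=1))
print(round(ccel, 3))                      # 0.436
```

Note that the Keras loss classes (BinaryCrossentropy, CategoricalCrossentropy) always use the natural logarithm, so the binary result would differ from the 6.4 obtained here with log base 2.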