23-24 - M2AI - DL4CV - 1 - Deep Learning 146-174.pdf
Instituto Politécnico do Cávado e do Ave
Full Transcript
Probabilistic Losses

- Poisson loss
    loss = y_pred - y_gt * log(y_pred)
- Kullback-Leibler divergence loss
    loss = y_gt * log(y_gt / y_pred)

José Henrique Brito | 2Ai - EST - IPCA, PT | DL4CV @ M2AI

Optimizers

- SGD (Stochastic Gradient Descent), the classic choice
    w = w - learning_rate * gradient
- SGD with momentum
    velocity = momentum * velocity - learning_rate * gradient
    w = w + velocity
- SGD with Nesterov momentum
    velocity = momentum * velocity - learning_rate * gradient
    w = w + momentum * velocity - learning_rate * gradient

Optimizers

- Other optimizers use some variant of SGD or other strategies
- Other optimizers include: Adam, Adadelta, Adagrad, Adamax, Nadam, Ftrl, RMSprop
- Adam is a popular choice and is the de facto standard

Learning rate schedules

- Exponential Decay
- Polynomial Decay
- Piecewise Constant Decay
- Inverse Time Decay

https://neptune.ai/blog/how-to-choose-a-learning-rate-scheduler

Model training specifics

Building an image classifier from scratch usually involves:
- Network structure creation
- Accessing a dataset, writing a training generator and a validation generator
- Setting up callbacks for the end of each batch/epoch
  - Logging
  - Checkpoints
  - Early stopping
  - …
- Training the network on the training set
- Loading the best checkpoint
- Evaluating the network on the validation set

In the real world, you usually write 2 scripts, one for training and one for evaluation.

Callbacks

Callbacks are user-provided functions that run at the end of each batch or at the end of each epoch.

The most common and useful callbacks are:
- Logging metrics (TensorBoard)
- Saving the model (a checkpoint) at the end of each epoch if the metrics improve
- Stopping training if the monitored metric has not improved for a long time (early stopping)

Callbacks

    import datetime
    import os

    import tensorflow as tf

    # Checkpoint: save the model only when val_loss improves
    checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        modelFilePath, monitor='val_loss', verbose=1, save_best_only=True)

    # TensorBoard: write logs to a timestamped directory under logs/fit
    log_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

    # Early stopping: stop when val_loss has not improved for `patience` epochs
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)

    …

    model.fit(trainGen, steps_per_epoch=stepsPerEpoch, epochs=epochs,
              callbacks=[checkpoint_callback, tensorboard_callback, early_stopping_callback],
              validation_data=valGen, validation_steps=validationSteps)

TensorBoard

- Training takes a long time; during training you would like to have some feedback on how training is evolving
- You can define metrics that are evaluated during training
  - Metrics regarding the training set, to be evaluated at every batch
  - Metrics regarding the validation set, to be evaluated at the end of every epoch
- Metrics (plus the loss) are printed during training
- TensorBoard can automatically generate graphs for the metrics
- TensorBoard can be activated as a callback

TensorBoard

Command line:

    tensorboard --logdir logs/fit

Custom Callback

    import tensorflow as tf

    class BatchLossHistory(tf.keras.callbacks.Callback):
        """Records the training loss and accuracy reported after every batch."""

        def on_train_begin(self, logs=None):
            self.batch_losses = []
            self.batch_accuracies = []

        def on_batch_end(self, batch, logs=None):
            logs = logs or {}  # avoid a mutable default argument
            self.batch_losses.append(logs.get('loss'))
            self.batch_accuracies.append(logs.get('accuracy'))

Demo

Training with:
- Images from disk
- Callbacks

Takeaways:
- val_loss and val_accuracy are initially better than loss and accuracy
  - This is because val_loss and val_accuracy are only evaluated at the end of the epoch, after all the updates
  - The reported (training) loss and accuracy are the average training loss and training accuracy over the whole epoch, and are negatively affected by the initial (untrained) parameters

Agenda

- Artificial Intelligence and Computer Vision Application Domains
- Artificial Intelligence and Computer Vision tasks
- Machine Learning and Deep Learning
- Neural Networks
- Neural Networks for Classification in Computer Vision
- Evaluation and Metrics
- Training Neural Networks
- Implementation challenges
- Training challenges
- Transfer Learning, Data Augmentation, Synthetic Datasets
- Inference challenges, Model Compression
- Neural Networks for other Computer Vision tasks
- More Neural Networks

Training Approach

- Deep Learning is unreasonably effective
  - Throw good data at a suitable network and it will learn from it
- Get good data for your problem
  - Good = quantity + quality
  - You may have to make a compromise between quantity and quality
  - Abundant and accessible (cheap) data is often low-quality
  - High-quality data may need to be (expensively) hand labeled
- Get a pretrained network and retrain it on your data

Training challenges

Dataset building
- Large datasets
- Data quality

Dataset building tricks
- Data harvesting – Mechanical Turk
- Data augmentation
- Synthetic data – games

Training hardware
- Compute capability
- Memory size

Training tricks
- Use decent hardware
  - For mortals: laptop with Intel i7/i9 + GeForce RTX
  - For organizations: buying a physical server w/ multiple GPUs, renting a cloud server (AWS, Azure, …)
- Distributed training
- Use pretrained backbones and finetune on new data

Data augmentation

- Real examples may be reused with small random changes/effects
- This produces realistic/real additional examples at a very low cost
- Common augmentation strategies:
  - Random translation (horizontal/vertical)
  - Random rotation
  - Random flip (horizontal/vertical)
  - Random zoom
  - Random skew/tilt/stretch
  - Random noise addition
  - Random distortion
- Augmentation = regularization

Training with our own data

- For your own application you will most likely have your own data that you would like to feed the network during training
- You might also want to automatically apply augmentation to the images
- Both TensorFlow and PyTorch provide data augmentation functions

Demo

Train a custom network on data from disk with data augmentation

Custom Generators

- Sometimes the default generators do not provide all the flexibility you need
- In this case you will need to write a custom generator

Custom Generators

    import numpy as np

    # numClasses is expected to be defined elsewhere (it is not a parameter in the original)
    def train_val_Generator(batch_size, trainSetX, trainSetY, inputSize=(256, 256, 3)):
        if batch_size > 0:
            while 1:
                i = 0
                nBatches = int(np.ceil(len(trainSetX) / batch_size))
                for batchID in range(nBatches):
                    images = np.zeros(((batch_size,) + inputSize))
                    labels = np.zeros(((batch_size,) + (numClasses,)))
                    i_inBatch = 0
                    # the body of the inner loop is cut off in the transcript
                    while i_inBatch …
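
The Custom Generators listing above is cut off before the loop that fills each batch, so the sketch below is not the original code. It is a minimal, self-contained illustration of the same idea (an endless generator that yields image/label batches), written under stated assumptions: the samples are already loaded as NumPy arrays of shape `input_size`, the labels are integer class ids that get one-hot encoded, and the last batch of an epoch is allowed to be smaller than `batch_size`. The name `batch_generator` and the `num_classes` parameter are hypothetical and were chosen for this example.

```python
import numpy as np


def batch_generator(batch_size, set_x, set_y, num_classes, input_size=(256, 256, 3)):
    """Yield (images, labels) batches forever, cycling over the dataset in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = int(np.ceil(len(set_x) / batch_size))
    while True:
        i = 0  # index of the next sample to read from the dataset
        for _ in range(n_batches):
            # Assumption: the last batch of an epoch may hold fewer samples.
            n_in_batch = min(batch_size, len(set_x) - i)
            images = np.zeros((n_in_batch,) + input_size, dtype=np.float32)
            labels = np.zeros((n_in_batch, num_classes), dtype=np.float32)
            for j in range(n_in_batch):
                images[j] = set_x[i]        # assumes pre-loaded, correctly sized arrays
                labels[j, set_y[i]] = 1.0   # assumes integer class ids -> one-hot row
                i += 1
            yield images, labels
```

A generator like this can be passed to `model.fit` together with `steps_per_epoch`, as in the Callbacks listing, or inspected by hand with `next(gen)`. A real generator would typically also load images from disk, shuffle the sample order at the start of each epoch, and apply augmentation inside the inner loop.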