Deep Learning _ Techniques.pdf


Full Transcript

Deep Learning: Complementary Techniques and Strategies

AGENDA
A. Tackling Model Uncertainty
B. Tackling Model Improvement: Online/Active Learning
C. Augmenting Vision with Knowledge from Other Sources
D. Infusing Structured Knowledge into the DL Model
E. Infusing Knowledge from the Training Dataset during Inference
F. Improving Model Inference Time
G. Handling 3D Models

A. Tackling Model Uncertainty
Statement: Say a model is trained on images of cats and dogs to classify them. If given images of cars, this model might classify them as a cat or a dog, sometimes with very high confidence, instead of being able to say that "it doesn't know".
1. Model uncertainty is a measure of how good the model is at knowing what it doesn't know.
2. The problem can also be framed as the model being able to identify inputs that are very different from its training dataset.
3. Conventional deep learning models have built-in confidence scores that are supposed to convey their sense of certainty in a prediction, but these confidence scores have proven unreliable for OOD (out-of-distribution) inputs.

A. Tackling Model Uncertainty: Ensemble Models
1. A model given a different initialization will end up with a different set of weights, even when trained on the same dataset. All these models behave similarly on the training dataset, but their behaviour diverges on a sample that is very different from the training data.
2. So we can train, say, 3 models this way with different initial weights.
3. At inference, if their outputs are too different from each other, we can conclude that the image is out of distribution (a sketch of this check follows this section).
4. CONS: training and inference now involve multiple models instead of one, which increases the RAM required and the execution time.

A. Tackling Model Uncertainty: Bayesian Models
1. The deep learning models we currently use have fixed weights.
2. There is a class of models called Bayesian models in which each weight is a random distribution. At inference, for a given weight, the model takes its distribution, randomly samples a value from it and uses that value as the weight.
3. Once such a model is trained, during inference we run the same sample through it 5-10 times. Note that each time, the input interacts with a slightly different value for each weight. If the outputs across the 5-10 forward passes are very different, we can conclude that the input is out of distribution, i.e. that the model uncertainty is too high.
4. Mostly employed in medical use cases.
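Below is a minimal PyTorch sketch of the ensemble-disagreement check described above. The tiny classifier, the number of ensemble members and the disagreement threshold are illustrative assumptions rather than anything prescribed in the slides; the same loop also covers the Bayesian variant if the three fixed members are replaced by repeated stochastic forward passes of a single model.

```python
import torch
import torch.nn.functional as F

class SmallClassifier(torch.nn.Module):
    """Stand-in for the cat/dog classifier; any architecture works here."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        )
        self.head = torch.nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

# Three members trained from different random initializations (training omitted).
ensemble = [SmallClassifier() for _ in range(3)]

@torch.no_grad()
def predict_with_uncertainty(images: torch.Tensor, threshold: float = 0.15):
    """Average the members' probabilities; large disagreement suggests an OOD input."""
    probs = torch.stack([F.softmax(m(images), dim=-1) for m in ensemble])  # (3, B, C)
    mean_probs = probs.mean(dim=0)
    disagreement = probs.var(dim=0).sum(dim=-1)   # per-sample variance across members
    is_ood = disagreement > threshold             # flag inputs the ensemble disagrees on
    return mean_probs.argmax(dim=-1), disagreement, is_ood

preds, disagreement, is_ood = predict_with_uncertainty(torch.randn(4, 3, 64, 64))
```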
B. Tackling Model Improvement: Online/Active Learning
Statement: Say a model is trained on some dataset and has been deployed. In deployment, a few new instances are found that need to be included in this model.

B. Tackling Model Improvement: Annotation Approaches
1. Manually annotating the incoming data.
2. The model generates the annotation data, and the operator then corrects it if required.
3. The model has a very good estimate of its own uncertainty, annotates all images, and escalates to the operator only those images it is unsure of. Note that this requires the model to be good at gauging its own uncertainty, so as to reduce the work of the operator.

B. Tackling Model Improvement: Training Approaches
1. Train the model on the whole updated dataset, i.e. the old dataset plus the newly found instances.
2. Fine-tune the model only on the newly found instances (a sketch follows this list):
a. However, this can cause the model to forget, or perform worse on, the older data it was initially trained on.
b. We can always keep a sample of the older examples in the training and validation sets so as to avert this problem.
c. However, for our scenario, the model can always be retrained on the whole dataset at regular intervals if needed.
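A minimal sketch of training approach 2 combined with mitigation 2b: fine-tune on the new instances while replaying a small random sample of the old data to limit forgetting. The replay fraction, optimizer settings and dataset objects are assumptions made for illustration, not values from the slides.

```python
import random
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def build_finetune_loader(old_dataset, new_dataset, replay_fraction=0.1, batch_size=32):
    """Mix all new samples with a random replay subset of the old data."""
    n_replay = max(1, int(len(old_dataset) * replay_fraction))
    replay_indices = random.sample(range(len(old_dataset)), n_replay)
    mixed = ConcatDataset([new_dataset, Subset(old_dataset, replay_indices)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)

def finetune(model, old_dataset, new_dataset, epochs=3, lr=1e-4):
    loader = build_finetune_loader(old_dataset, new_dataset)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # small lr for fine-tuning
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```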
C. Augmenting Vision with Knowledge from Other Sources
Statement: Say we have a vision model that is supposed to segment a given image. Would augmenting this DL model with more knowledge, such as image descriptions or a knowledge graph, make the model's predictions better?

Approach:
1. We initially have just the image segmentation network that segments the given input image.
2. We collect text descriptions and train a text encoder model to generate text descriptions given the image. We also have a language model which converts these descriptions to embeddings.
3. During inference, given an image, we first pass it to the trained text encoder model to generate a textual description. We then pass the image through the image network layers and the textual description through the language network layers. At the end, we take the embeddings from both networks, fuse them and make the final segmentation. (Figure: image encoder and text encoder producing embeddings that are fused for the final segmentation.)
4. Note that combining knowledge from different sources is as simple as generating embeddings via separate networks and concatenating them (see the sketch after this list).
5. This works because neural networks are essentially matrices that can learn almost anything given enough data, so the idea extends to knowledge from any other source.
6. The approach described above is a simple one. There are more advanced architectures which fuse the networks at multiple junctions, or completely novel architectures focused on a particular combination of sources, which make the information fusion more comprehensive and thorough than a simple concatenation at the end.
7. Generally, though, infusing knowledge from other sources offers only a marginal improvement in accuracy in most cases; it is more useful for making the model more flexible than for making it more accurate.
8. E.g., in image classification, training requires a fixed set of classes, and if we need to add another class at inference time we have to train the model again. By augmenting the model with language and reimagining the last layer as a continuous embedding space of words onto which the model learns to project, we can do away with this constraint of retraining for a new class: the model is augmented with language during training and can map directly to the word.
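A minimal sketch of the fusion-by-concatenation idea in point 4: an image branch and a text branch each produce an embedding, the text vector is broadcast over the spatial grid, and the concatenated features feed a small segmentation head. All module shapes, dimensions and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusedSegmenter(nn.Module):
    def __init__(self, num_classes=21, img_dim=256, text_dim=256):
        super().__init__()
        # Image branch: any backbone that keeps spatial resolution.
        self.image_net = nn.Sequential(
            nn.Conv2d(3, img_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(img_dim, img_dim, 3, padding=1), nn.ReLU(),
        )
        # Text branch: embeddings of the generated caption, pooled to one vector.
        self.text_net = nn.Sequential(nn.Linear(300, text_dim), nn.ReLU())
        # Head that sees the concatenated image + text features.
        self.head = nn.Conv2d(img_dim + text_dim, num_classes, 1)

    def forward(self, image, caption_embeddings):
        img_feat = self.image_net(image)                          # (B, img_dim, H, W)
        txt_feat = self.text_net(caption_embeddings).mean(dim=1)  # (B, text_dim)
        # Broadcast the text vector over the spatial grid and concatenate.
        txt_map = txt_feat[:, :, None, None].expand(-1, -1, *img_feat.shape[2:])
        fused = torch.cat([img_feat, txt_map], dim=1)
        return self.head(fused)                                   # (B, num_classes, H, W)

model = FusedSegmenter()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 12, 300))
```

The knowledge-graph case in section D can reuse the same pattern by swapping the caption embeddings for entity embeddings looked up in the graph.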
D. Infusing Structured Knowledge into the DL Model
1. The same idea can be extended to other external knowledge sources such as knowledge graphs: pre-built graphs of facts with objects as nodes and relations as the connections between nodes.
2. Here we again train a text encoder that generates textual descriptions; we then extract the entities from the text, find their embeddings in the knowledge graph, and fuse them into the image segmentation network as above.
3. Infusing structured knowledge is a technique used to build trust in the model: it helps the model avoid making very inaccurate predictions about simple concepts.

E. Infusing Knowledge from the Training Dataset during Inference
1. In this technique, during inference, for a given input image we extract its embedding. We then find images in the training dataset whose embeddings are close to it, construct a contextual embedding by combining them, and fuse it with the normal embedding to make the final embedding (a sketch follows below).
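A minimal sketch of the retrieval step in section E: embed the query image, find its nearest neighbours among precomputed training-set embeddings, average them into a contextual embedding and concatenate it with the query embedding. The encoder, the precomputed embedding matrix and the value of k are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def contextual_embedding(encoder, image, train_embeddings, k=5):
    """train_embeddings: (N, D) embeddings precomputed over the training set."""
    query = encoder(image)                                   # (1, D) query embedding
    sims = F.cosine_similarity(query, train_embeddings)      # (N,) similarity to each training image
    neighbours = train_embeddings[sims.topk(k).indices]      # (k, D) closest training embeddings
    context = neighbours.mean(dim=0, keepdim=True)           # (1, D) contextual embedding
    # Fuse by concatenation; a downstream head would be trained on this wider vector.
    return torch.cat([query, context], dim=-1)               # (1, 2D)
```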
F. Improving Model Inference Time
1. Quantisation:
a. The weights in the model are generally float32.
b. If we convert these float32 values to float16, the execution time drops to roughly one-third of the original.
c. The drop in accuracy on data similar to the training data is very minimal, and the model works about as well as the original, but it loses some generalization capability and may behave erratically on data that is very different from the training data.
d. We can also convert these values to int8 to further decrease the execution time, but the accuracy drop would be drastic.
2. Model Pruning:
a. In deep learning, most of the weights have little to no impact on the final decision.
b. Finding these weights and removing them reduces the number of calculations to be made and decreases the inference time.
3. Knowledge Distillation:
a. Given a model, we can train a smaller model to imitate the original model and generate the same output as it.
b. The smaller model trains better this way than by being trained directly on the data, because it receives continuous probabilistic outputs from the larger model, whereas training on the original data gives it only hard 0-or-1 labels (a sketch of the distillation step follows this list).
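A minimal sketch of a knowledge-distillation training step as described in point 3: the student matches the teacher's softened probabilities in addition to the hard labels. The temperature and loss weighting are assumptions, not values from the slides. For the float16 quantisation in point 1, a plain PyTorch model can usually be cast with `model.half()` (with inputs cast to match), though the exact route depends on the deployment runtime.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, labels, optimizer, T=2.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)          # soft targets from the large model
    student_logits = student(images)
    # Soft loss: KL divergence between temperature-softened distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: the usual cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```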
G. Handling 3D Models: Annotation
1. Links:
a. 3D annotation demo
2. In the above video there are four views:
a. A free camera view where you can freely move around, change the angle, and move up and down, plus the other three views: top, side and front.
b. During annotation, the user places a cuboid of arbitrary size in the free view and adjusts its size in the individual views.
3. To make the annotation possible, building the free camera view where the user can change the angle and move up and down may be challenging if we cannot find any ready-made library.
4. It works fluently in CVAT without any lag, so there might not be any bottlenecks due to it being process-intensive.

G. Handling 3D Models: Models
1. We will need to test and integrate the 3D version of the paddle library.
a. LINK
2. Also, when it comes to 3D, alongside the implementation issues we need to thoroughly understand the limitations, which we had by default in the 2D case because of our experience.

THE END