CS826 Deep Learning Theory and Practice: Meta-Learning Algorithms
Summary
This document provides an overview of meta-learning: learning objectives, what meta-learning is, why it is useful, its main formulations, and its applications. It is structured as lecture notes / presentation slides for an undergraduate audience, covering few-shot tasks, data augmentation, and generalizing to unseen labels and attributes.
CS826 Deep Learning Theory and Practice: Meta-Learning Algorithms
Edited by Nur Naim, Oct 2021

Learning objectives
▪ Understand the concept of meta-learning:
~ Few-shot
~ One-shot
~ Zero-shot

What is meta-learning?
▪ If you have learned 100 tasks already, can you figure out how to learn more efficiently? Now having multiple tasks is a huge advantage!
▪ Meta-learning = learning to learn. In practice it is very closely related to multi-task learning.
▪ Many formulations:
~ Learning an optimizer
~ Learning an RNN that ingests experience
~ Learning a representation
▪ Meta-learning includes machine learning algorithms that learn from the output of other machine learning algorithms. (image credit: Ke Li)

Why is meta-learning a good idea?
▪ Deep reinforcement learning, especially model-free, requires a huge number of samples. If we can meta-learn a faster reinforcement learner, we can learn new tasks efficiently!
▪ What can a meta-learned learner do differently?
~ Explore more intelligently
~ Avoid trying actions that are known to be useless
~ Acquire the right features more quickly

Why work on meta-learning?
1. It brings DL closer to real-world business use cases. Companies hesitate to spend much time and money on annotating data for a solution that may not pay off, and the relevant objects are continuously replaced with new ones, so DL has to be agile.
2. It involves a bunch of exciting cutting-edge technologies.

[Figure: Traditional ML versus Transfer Learning versus Meta-Learning, https://www.researchgate.net/figure/Traditional-ML-versus-Transfer-Learning-versus-Meta-Learning_fig3_339895075. Standard learning: a learner is trained on data instances of specific classes and produces a model for that task. Meta-learning: a meta-learner is trained across many tasks and produces a task-agnostic learning strategy, which then adapts a task-specific model to each new target task.]

Meta-learning with supervised learning (image credit: Ravi & Larochelle '17)
▪ The meta-learner maps a small (few-shot) training set plus a test input (e.g., an image) to a test output (e.g., a label).
▪ How to read in the training set? Many options; RNNs can work.

Few-shot learning
▪ Offline training uses a large annotated dataset (e.g., dog, elephant, rabbit, mongoose, monkey, …).
▪ The resulting knowledge is transferred to a model for novel categories (e.g., lemur), so that at online training time the given task can be performed for those categories, each represented by just a few samples.

Two complementary routes to the few-shot regime:
▪ Meta-learning: learn a learning strategy that adjusts well to a new few-shot task, in which each category is represented by just a few examples; learn to perform classification, detection, or regression.
▪ Data augmentation: synthesize more data from the novel classes to facilitate regular learning.

Recurrent meta-learners
▪ Matching Networks, Vinyals et al., NIPS 2016 (figure reprinted from Vinyals et al., 2016). Distance-based classification, based on the similarity between the query and the support samples in the embedding space (an adaptive metric): $\hat{y} = \sum_i a(\hat{x}, x_i)\, y_i$, where $a(\hat{x}, x_i) = \mathrm{similarity}\big(f(\hat{x}, S),\, g(x_i, S)\big)$. (A code sketch combining this rule with the episode protocol closes this section.)
▪ Memory-augmented neural networks (MANN), Santoro et al., ICML 2016. Concept of episodes: reproduce the test conditions during training. An episode presents N new categories with M training examples per category, plus one query example from one of the N categories; typically N = 5 and M = 1 or 5. A Neural Turing Machine is a differentiable MANN: it learns to predict the distribution $p(y_t \mid x_t, S_{1:t-1}; \theta)$ while explicitly storing the support samples in the external memory.
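To make the episode protocol and the Matching-Networks attention rule concrete, here is a minimal, self-contained Python sketch. The random linear embedding, the toy data, and all dimensions are illustrative assumptions; in the actual method, f and g are trained networks (possibly conditioned on the support set S), not a fixed random map.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))               # stand-in embedding weights (assumption)

def embed(x):
    """Stand-in for the learned embeddings f and g (shared here, unit-normalized)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def sample_episode(X, y, n_way=5, m_shot=1):
    """One episode: N new classes, M support examples each, one query example.
    For simplicity the query may coincide with a support sample."""
    classes = rng.choice(np.unique(y), size=n_way, replace=False)
    sx, sy = [], []
    for label, c in enumerate(classes):
        idx = rng.choice(np.flatnonzero(y == c), size=m_shot, replace=False)
        sx.append(X[idx])
        sy += [label] * m_shot
    q_class = rng.integers(n_way)            # the query comes from one of the N classes
    q_idx = rng.choice(np.flatnonzero(y == classes[q_class]))
    return np.concatenate(sx), np.array(sy), X[q_idx], q_class

def matching_predict(sx, sy, qx, n_way):
    """y_hat = sum_i a(x_hat, x_i) y_i, with a = softmax over cosine similarities."""
    sims = embed(sx) @ embed(qx)             # cosine similarity (embeddings are unit-norm)
    a = np.exp(sims) / np.exp(sims).sum()    # attention weights over the support set
    onehot = np.eye(n_way)[sy]
    return a @ onehot                        # convex combination of support labels

X = rng.normal(size=(200, 64))               # toy data: 20 classes, 64-dim inputs
y = rng.integers(0, 20, size=200)
sx, sy, qx, q_class = sample_episode(X, y)   # a 5-way, 1-shot episode
print(matching_predict(sx, sy, qx, n_way=5), "true class:", q_class)
```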
Optimizers
▪ Optimize the learner to perform well after fine-tuning on the task data, where fine-tuning is a single (or a few) step(s) of gradient descent.

MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017
▪ Standard objective (task-specific, for task $T$): $\min_\theta \mathcal{L}_T(\theta)$, learned via the update $\theta' = \theta - \alpha \nabla_\theta \mathcal{L}_T(\theta)$.
▪ Meta-objective (across tasks): $\min_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$, learned via the update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$. (A minimal code sketch of this two-level update appears at the end of this section.)

Meta-SGD, Li et al., 2017 (figure reprinted from Li et al., 2017)
▪ Render $\alpha$ as a vector of the same size as $\theta$. Training objective: minimize the loss $\mathcal{L}_T(\theta')$ across tasks with respect to the meta-learner, where $\theta$ is the weight initialization and $\alpha$ is the update direction and scale.
▪ "Interestingly, the learning process can continue forever, thus enabling life-long learning, and at any moment, the meta-learner can be applied to learn a learner for any new task."

Metric learning
▪ Matching Networks, Vinyals et al., NIPS 2016. Objective: maximize the log-likelihood $\sum_{(x,y)} \log P_\theta(y \mid x, S)$ of the non-parametric softmax classifier (equivalently, minimize the cross-entropy), with $P_\theta(y \mid x, S) = \mathrm{softmax}\big(\cos\big(f(\hat{x}, S),\, g(x_i, S)\big)\big)$.
▪ Prototypical Networks, Snell et al., 2017. Each category is represented by a single prototype $c_i$: the mean of its embedded support samples. Objective: maximize the log-likelihood with the prototype-based probability $P_\theta(y = i \mid x, C) = \mathrm{softmax}_i\big(-\lVert f(\hat{x}) - c_i \rVert_2^2\big)$. (Sketched at the end of this section.)
▪ Relation Networks, Sung et al., CVPR 2018 (figure replicated from Sung et al., "Learning to Compare: Relation Network for Few-Shot Learning," CVPR 2018). Use the Siamese-network principle: concatenate the embeddings of the query and support samples; the relation module is trained to produce a score of 1 for the correct class and 0 for the others. Extends to zero-shot learning by replacing the support embedding with semantic features. (Sketched at the end of this section.)

One-shot learning

Zero-Shot Learning by Convex Combination of Semantic Embeddings
Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

Image annotation (100,000+ classes)
[Figure: labeled datasets provide training images with one-hot label vectors over the seen classes (Lion, Apple, Orange, Tiger, Bear); at test time the system must generalize to unseen images: Lion? Tiger?]

How to generalize to unseen labels
▪ Training labels: Lion, Apple, Orange, Tiger, Bear. Test labels: Wolf, Cougar, Grapefruit.
▪ We need "side information": a representation of the new classes.

Zero-shot learning by supervised attributes
▪ Wolf: mammal, quadruped, white / gray / black, big pointed ears, not spotted, …
▪ Cougar: mammal, quadruped, brown / gray, wild, long tail, not spotted, …
[Ali Farhadi et al., Describing Objects by Their Attributes, 2009] [Christoph Lampert et al., Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer, 2009]

Zero-shot learning by unsupervised label embeddings
▪ Wolf ~ Dog, Bear; Cougar ~ Cat, Tiger, Lion; Grapefruit ~ Orange, Lemon.
▪ Use an embedding of the labels in a vector space and a notion of semantic similarity in that space (sketched at the end of this section).
[Andrea Frome, Greg S. Corrado, Jon Shlens et al., DeViSE: A Deep Visual-Semantic Embedding Model, 2013] [Richard Socher et al., Zero-Shot Learning Through Cross-Modal Transfer, 2013]
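Returning to the MAML update above: here is a minimal PyTorch sketch of the two-level optimization on a toy regression learner. The model (a bare parameter vector), the task distribution, and the step sizes are illustrative assumptions, not the paper's experimental setup; in practice the inner step uses a task's support set and the outer loss its held-out query set.

```python
import torch

theta = torch.randn(10, requires_grad=True)   # shared initialization (meta-parameters)
alpha, beta = 1e-2, 1e-3                      # inner / outer step sizes (assumptions)
meta_opt = torch.optim.SGD([theta], lr=beta)

def task_loss(params, task):
    """Stand-in for L_T(theta): mean squared error on a random linear task."""
    A, b = task
    return ((A @ params - b) ** 2).mean()

# T ~ p(T): a handful of toy tasks (assumption)
tasks = [(torch.randn(5, 10), torch.randn(5)) for _ in range(8)]

for step in range(100):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for task in tasks:
        # Inner update: theta' = theta - alpha * grad_theta L_T(theta).
        # create_graph=True keeps the graph so the meta-gradient can flow
        # back through the inner gradient step (the second-order term).
        g, = torch.autograd.grad(task_loss(theta, task), theta, create_graph=True)
        theta_prime = theta - alpha * g
        # Outer objective accumulates L_T(theta') across tasks; in real MAML
        # this loss is computed on held-out (query) data of the same task.
        meta_loss = meta_loss + task_loss(theta_prime, task)
    meta_loss.backward()   # theta <- theta - beta * grad_theta sum_T L_T(theta')
    meta_opt.step()
```

Meta-SGD differs only in that alpha becomes a learned vector of the same size as theta, optimized alongside it in the outer loop.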
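A minimal sketch of the Prototypical-Networks decision rule, assuming the support and query samples have already been mapped through the learned embedding f; the toy dimensions and random values are assumptions.

```python
import numpy as np

def proto_predict(support_z, support_y, query_z, n_way):
    """P(y = i | x, C) = softmax(-||f(x) - c_i||^2) over class prototypes c_i."""
    # Each prototype c_i is the mean embedded support sample of class i.
    protos = np.stack([support_z[support_y == i].mean(axis=0) for i in range(n_way)])
    d2 = ((protos - query_z) ** 2).sum(axis=1)   # squared Euclidean distances
    logits = -d2
    p = np.exp(logits - logits.max())            # numerically stable softmax
    return p / p.sum()

# Toy usage: a 5-way, 2-shot episode with 8-dim embeddings (stand-in values).
rng = np.random.default_rng(0)
support_z = rng.normal(size=(10, 8))
support_y = np.repeat(np.arange(5), 2)
print(proto_predict(support_z, support_y, rng.normal(size=8), n_way=5))
```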
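A minimal sketch of a Relation-Network-style relation module: concatenate query and support embeddings and regress a relation score in (0, 1). The dimensions and the two-layer MLP are illustrative assumptions; the paper uses convolutional embedding and relation modules.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, emb_dim=64, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())   # score: 1 = same class, 0 = other

    def forward(self, query_emb, support_emb):
        pair = torch.cat([query_emb, support_emb], dim=-1)
        return self.net(pair).squeeze(-1)

# Trained with MSE against targets 1 (matching class) and 0 (others); for
# zero-shot learning, support_emb is replaced by a semantic (attribute) embedding.
rm = RelationModule()
q = torch.randn(5, 64)    # one query embedding repeated against 5 class embeddings
s = torch.randn(5, 64)
print(rm(q, s))           # 5 relation scores in (0, 1)
```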
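Finally, the label-embedding route to zero-shot learning described just above can be sketched in a few lines: embed every label in a semantic vector space, project the image into the same space, and retrieve the nearest unseen label by kNN (as the next slides describe). The word vectors and the image embedding below are random stand-ins (assumptions); DeViSE learns the image projection, and the label vectors come from a model such as word2vec.

```python
import numpy as np

rng = np.random.default_rng(1)
seen = ["lion", "tiger", "apple", "orange", "bear"]      # training labels
unseen = ["wolf", "cougar", "grapefruit"]                # test-time labels
s = {w: rng.normal(size=50) for w in seen + unseen}      # stand-in word vectors s(label)

def knn_labels(image_embedding, candidates, k=1):
    """Return the k candidate labels whose embeddings lie nearest to f(image)."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return sorted(candidates, key=lambda w: -cos(image_embedding, s[w]))[:k]

f_image = rng.normal(size=50)            # stand-in for the learned projection f(image)
print(knn_labels(f_image, unseen, k=1))  # nearest unseen label
```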
Semantic embedding of labels
[Figure: the seen labels embedded in a semantic space: s(bear), s(orange), s(apple), s(lion), s(tiger); unseen labels s(wolf), s(grapefruit), s(cougar) land near their semantic neighbours in the same space.]

Semantic embedding of images
[Figure: an image of a cougar is projected by f(·) into the label space, landing near s(cougar).]
► How to define the semantic embedding of labels?
► How to project images into that space?
► We use kNN search for label retrieval.

Unsupervised label embedding
▪ Words with similar contexts get similar vectors. [Tomas Mikolov et al., Efficient Estimation of Word Representations in Vector Space, 2013] (word2vec)

Zero-shot object tracking

Resources
▪ https://digitalcommons.fiu.edu/cgi/viewcontent.cgi?article=1020&context=cs_fac
▪ https://ov-research.uwaterloo.ca/MSCI641/Week10_Transfer_learning.pptx
▪ https://research.ibm.com/haifa/dept/imt/IMVC19/IMVC2019%20-%20Few-shot%20learning.pptx
▪ https://www.researchgate.net/publication/335420271_An_Introduction_to_Advanced_Machine_Learning_Meta_Learning_Algorithms_Applications_and_Promises/fulltext/5d64a11f299bf1f70b0ebbcd/An-Introduction-to-Advanced-Machine-Learning-Meta-Learning-Algorithms-Applications-and-Promises.pdf
▪ https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
▪ Google (image) – one-shot learning
▪ https://medium.com/@vishnuvijayanpv/deep-reinforcement-learning-value-functions-dqn-actor-critic-method-backpropagation-through-83a277d8c38d
▪ https://blog.fastforwardlabs.com/2020/11/15/representation-learning-101-for-software-engineers.html

References
▪ Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. NIPS 2016.
▪ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. ICML 2016.