Deep Learning and Variants_Lecture 7_20240211.pdf

Full Transcript

Deep Learning & its variants – GGU DBA
Autoencoders, Embeddings
Dr. Anand Jayaraman, Professor, upGrad; Chief Data Scientist, Agastya Data Solutions
[email protected]

Denoising autoencoders
∙ A simple regularization scheme is denoising, which involves reconstructing the original input from a corrupted version of it.
∙ Here, corruption means randomly setting a portion of the input dimensions to zero, also termed zero-mask noise.
∙ Training setup: original data → corrupted data → hidden layer → target: original data (a minimal sketch follows below).
∙ Denoising autoencoder, MNIST example: https://blog.keras.io/building-autoencoders-in-keras.html
∙ Denoising autoencoder, from: https://www.v7labs.com/blog/autoencoders-guide
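To make the zero-mask setup concrete, here is a minimal Keras sketch. It is an illustration only, not the lecture's notebook: the 784-dimensional MNIST inputs, the single 64-unit hidden layer, and the 30% masking fraction are all assumptions.

```python
# Minimal denoising-autoencoder sketch (assumed architecture, not the lecture's exact code).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Zero-mask noise: randomly set a fraction of the input dimensions to zero.
def zero_mask(x, frac=0.3):
    mask = np.random.rand(*x.shape) > frac   # keep roughly 70% of the pixels
    return x * mask

x_train_noisy = zero_mask(x_train)
x_test_noisy = zero_mask(x_test)

# Encoder compresses to a small hidden layer; decoder reconstructs the original.
inputs = keras.Input(shape=(784,))
hidden = layers.Dense(64, activation="relu")(inputs)
outputs = layers.Dense(784, activation="sigmoid")(hidden)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Key point of denoising: corrupted data in, *original* data as the target.
autoencoder.fit(x_train_noisy, x_train,
                epochs=10, batch_size=256,
                validation_data=(x_test_noisy, x_test))
```

The key detail is the fit call: the corrupted batch is the input, but the clean original is the reconstruction target.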

Autoencoder for colorization
∙ From: Autoencoder: Grayscale to color image | Kaggle

Autoencoder applications
1. Image processing: de-noising, auto-filling
2. Anomaly detection
3. Feature generation
4. Learning generative models
5. Text translation
(Source: www.cc.gatech.edu)

SegNet: Object Segmentation
∙ Object segmentation demo: https://youtu.be/e9bHTlYFwhg

CATEGORICAL EMBEDDINGS

Every categorical attribute can be embedded
∙ Say country is an attribute and there are 25 nations in your database. How do you feed them to the model?
– Assigning a number based on alphabetical order: we are telling the model that USA is more similar to Uganda than to Canada, which confuses the model.
– One-hot encoding: a priori, all countries are equidistant from each other. Main problem: increased dimensionality.
– What other options are there?

High Cardinality Categorical Variables: Target Encoding
∙ The mean value of the target is used to code the category.
∙ Why it makes sense: the numerical coding uses the impact the category has on the target as the way to encode it.
∙ Effective for high-cardinality variables.
∙ Disadvantages:
– Often leads to overfitting.
– Information leakage (have to be careful during cross-validation).
∙ Commonly used along with regularization (a minimal sketch appears after these notes).

Every categorical attribute can be embedded (continued)
∙ Revisiting the options for the country attribute:
– Assigning a number based on alphabetical order: confuses the model.
– One-hot encoding: all countries are equidistant from each other; equally bad.
– Intelligently assign values that represent the similarities and dissimilarities: very difficult to scale.
– Learn them!

High Cardinality Categorical Variables
∙ Neural networks are already determining a lot of parameters.
∙ Can they figure out a good way to embed the categorical variables as part of back-propagation?
∙ Example network: inputs BookVal, SMA, PE, Vol, … feeding an up/down prediction.

Visualizing the Categorical Embeddings (figure)

Entity Embeddings
∙ Entity embeddings reduce memory usage and speed up neural networks compared with one-hot encoding.
∙ Intrinsic properties of categorical variables can be revealed and visualized easily.
∙ The learnt embeddings can be used in other ML algorithms to represent the categorical variables.

Neural networks for embedding
∙ The purpose of the embedding is to map similar categorical levels close to each other (a minimal sketch of a learned embedding layer follows below).
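As referenced in the target-encoding notes above, here is a minimal pandas sketch of mean/target encoding, with a simple smoothing term standing in for the regularization the slide mentions. The column names (store_state, sales) and the smoothing strength m are illustrative assumptions.

```python
# Minimal target-encoding sketch (column names and smoothing are illustrative assumptions).
import pandas as pd

df = pd.DataFrame({
    "store_state": ["Bayern", "Bayern", "Berlin", "Hessen", "Berlin", "Bayern"],
    "sales":       [120,      150,      90,       110,      95,       130],
})

global_mean = df["sales"].mean()
stats = df.groupby("store_state")["sales"].agg(["mean", "count"])

# Smoothed category mean: levels with few rows are pulled toward the global mean,
# a simple regularization against overfitting rare categories.
m = 5.0  # smoothing strength (assumed)
encoding = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)

df["state_target_enc"] = df["store_state"].map(encoding)
print(df)
```

To limit the information-leakage problem noted above, the encoding should be fit on training folds only (or computed out-of-fold) and then merely applied to the held-out data.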
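And here, as referenced in the embedding notes above, is a minimal sketch of the "learn them" option: a Keras Embedding layer trained by back-propagation together with the rest of the network. The sizes (25 levels, a 4-dimensional embedding), the single numeric feature, and the binary up/down target are assumptions, not the lecture's exact model.

```python
# Minimal entity-embedding sketch (sizes and feature names are illustrative assumptions).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_countries = 25   # cardinality of the categorical attribute
embed_dim = 4      # learned embedding size (assumed)

# Inputs: one integer-coded categorical feature and one numeric feature.
country_in = keras.Input(shape=(1,), name="country_idx")
numeric_in = keras.Input(shape=(1,), name="numeric_feature")

# The embedding layer is a lookup table of trainable vectors,
# updated by back-propagation like any other weight matrix.
country_vec = layers.Embedding(input_dim=n_countries, output_dim=embed_dim,
                               name="country_embedding")(country_in)
country_vec = layers.Flatten()(country_vec)

x = layers.Concatenate()([country_vec, numeric_in])
x = layers.Dense(16, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)   # e.g. an up/down prediction

model = keras.Model([country_in, numeric_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy data just to show the call signature.
countries = np.random.randint(0, n_countries, size=(256, 1))
numeric = np.random.randn(256, 1).astype("float32")
target = np.random.randint(0, 2, size=(256, 1))
model.fit([countries, numeric], target, epochs=2, batch_size=32)

# One learned vector per categorical level, shape (25, 4).
learned_vectors = model.get_layer("country_embedding").get_weights()[0]
```

After training, the extracted vectors are what can be visualized or reused in other ML algorithms to represent the categorical variable, as the Entity Embeddings slide notes.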
Value for the provider
– Additional and probably unique personalized service for the customer
– Increase trust and customer loyalty
– Increase sales, click-through rates, conversion, etc.
– Opportunities for promotion, persuasion
– Obtain more knowledge about customers

Real-World Check
∙ Industry
– Amazon.com: increased sales through the recommendation lists (~35%)
– Netflix (DVD rental and movie streaming): generates X percent of their sales through the recommendation lists (65% of movie selections based on recommendations)
∙ There must be some value in it
– Recommendation of groups, jobs or people on LinkedIn
– Friend recommendation and ad personalization on Facebook
– Song recommendation at last.fm
– News recommendation at Forbes.com (37% increase in click-through rate)

Personalized Recommendations
∙ Recommender systems recommend items (content) based on user ratings of items.
∙ "Ratings" may be:
– Explicit, e.g. buying or rating an item
– Implicit, e.g. browsing time, number of mouse clicks

Recommender Systems: RS seen as a function
∙ Given:
– User model (e.g. ratings, preferences, demographics, situational context)
– Items (with or without description of item characteristics)
∙ Find:
– A relevance score, used for ranking.
∙ Finally:
– Recommend items that are assumed to be relevant.
∙ But:
– Remember that relevance might be context-dependent.
– Characteristics of the list itself might be important (diversity).

Collaborative Filtering (CF)
∙ The most prominent approach to generating recommendations:
– used by large, commercial e-commerce sites
– well understood; various algorithms and variations exist
– applicable in many domains (books, movies, DVDs, ...)
∙ Approach: use the "wisdom of the crowd" to recommend items.
∙ Basic assumption and idea:
– Users give ratings to catalog items (implicitly or explicitly).
– Customers who had similar tastes in the past will have similar tastes in the future.

Collaborative filtering
∙ Input: user-rating matrix (incomplete, sparse)
∙ Output: for a particular user, complete the row.
∙ The problem is mathematically equivalent to filling in missing NA values (a minimal matrix-completion sketch appears at the end of these notes).

Recommender Systems: Rossmann stores competition on Kaggle
∙ Forecast the sales as a function of …
∙ From: https://www.slideshare.net/paulskeie/entity-embeddings

Feature representations: Cat2Vec
∙ One-hot representation of categorical levels:

Level                 Representation
Bayern                1,0,0,0,0,0,0,0,0,0,0,0
Baden-Wuerttemberg    0,1,0,0,0,0,0,0,0,0,0,0
Berlin                0,0,1,0,0,0,0,0,0,0,0,0
Hamburg               …
Hessen                …
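Since the collaborative-filtering notes above frame the task as filling the missing entries of a sparse user-rating matrix, here is a minimal NumPy sketch using low-rank matrix factorization. The toy ratings, the rank, the learning rate, and the iteration count are all assumptions; real systems use more refined variants (ALS, bias terms, implicit-feedback weighting).

```python
# Minimal matrix-completion sketch via low-rank factorization (toy data, assumed hyperparameters).
import numpy as np

# User-rating matrix; np.nan marks the missing entries we want to predict.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
observed = ~np.isnan(R)

n_users, n_items = R.shape
rank = 2                                   # number of latent factors (assumed)
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, rank))   # user factors
V = rng.normal(scale=0.1, size=(n_items, rank))   # item factors

lr, reg = 0.01, 0.02
for _ in range(5000):
    # Gradient step on squared error over the observed ratings only.
    pred = U @ V.T
    err = np.where(observed, R - pred, 0.0)
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

# The completed matrix: predictions for every user-item pair,
# including the previously missing ("NA") cells.
completed = U @ V.T
print(np.round(completed, 2))
```

Each row of the printed matrix is one user's completed row: observed ratings are approximately reproduced, and the former NA cells hold the predicted ratings that would be used to rank recommendations.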
