LECTURE 1 - Introduction to Recommender Systems

Summary

This document is a lecture from the University of Milano-Bicocca, presented by Georgios Peikos, introducing Recommender Systems (RS). The lecture covers the history, paradigms, benefits, and data inputs of RS. It also explores the matrix-based representations used in building RS (the User Rating Matrix and the Item-Content Matrix) and the methods for inferring user preferences from explicit and implicit feedback. The lecture notes that RS rely on machine learning methods and closes with an overview of current trends and challenges.

Full Transcript

INFORMATION RETRIEVAL AND RECOMMENDER SYSTEMS
LECTURE 1: RECOMMENDER SYSTEMS (RS)
Georgios Peikos
University of Milano-Bicocca, Milan, Italy
Department of Informatics, Systems, and Communication (DISCo)

Before we start…
- What is a Recommender System (RS)?
- What are some examples of RSs?
- Q: Do you trust RSs and do you take recommended items into account? Most importantly, why, and what kind of personal data are you willing to share for this purpose?

Schedule Overview
- LECTURE 1: An Introduction to Recommender Systems (3 hours)
- LECTURE 2: Non-Personalised RS, Bias, Fairness (2 hours)
- LECTURE 3: Evaluation of RS (3 hours)
- LECTURE 4: Content-Based Approach (3 hours)
- LECTURE 5: Collaborative Approach (3 hours)
- LECTURE 6: Cold Start Problem and Recap (2 hours)
- 4 LABS on RS

LECTURE 1: An Introduction to RS
1. Information Seeking vs. RS and Introduction to RS
2. History and evolution
3. RS paradigms
4. Benefits and applications
5. Data Inputs
6. RS taxonomy
7. Matrices in RS
   o User Rating Matrix (URM)
   o Item-Content Matrix (ICM)
8. Inferring Preferences (Implicit and Explicit Ratings)
9. Current trends and challenges (broad overview)
10. Summary & Recap

1. Information Seeking vs. RS and Introduction to RS

Information Seeking, Retrieval and RS
Information explosion problem:
- E-commerce (e.g. Amazon, Alibaba…)
- Social networking (e.g. Facebook)
- Content sharing platforms (e.g. Instagram, Pinterest)
How to handle information overload?
In the context of obtaining information, there are two major techniques:
- Search
- Recommendation

Searching vs. Recommending
Purpose:
- Searching: the user actively looks for specific information or items based on a query they provide.
- Recommending: can stimulate users into actions, such as buying a specific book. Recommenders are also tools for dealing with information overload.
Information used:
- Searching: typically relies on the query provided by the user and matches it with available documents or items.
- Recommending: might not necessarily need detailed information about the items.
Relation to IR:
- Searching: search engines and other IR systems focus on discriminating between relevant and irrelevant documents. Many IR techniques rank documents based on their content.
- Recommending: the border between content-based recommenders and classical IR methods isn't strictly defined. For instance, while a general spam email detector or a web search engine might not be viewed as an RS, the personalization of search results blurs this boundary.
Nature of results:
- Searching: provides a list of results that match the user's query. The results are typically ranked by relevance to the query.
- Recommending: offers suggestions that might be of interest to the user. The recommendations can sometimes be unexpected, directing users to new or different categories they might not have considered.

What is a RS and what are the goals?
[Diagram; recoverable labels: Recommender Systems provide recommendations to Users.]
Q: Which RSs are you using and how well do they meet your needs?

Recommender Systems
Which product to buy? Which movie to watch? Which book to read?
Information filtering systems, aka RSs, aim to push potentially useful information, goods, or services to users.
- They provide fresh recommendations to users by compiling their preferences.
- They are becoming more popular due to the growth of websites like YouTube, Amazon, Netflix, etc.
- When used effectively, they generate revenue. How? For whom?
- They rely on machine learning methods.

What is a RS and what are the goals?
RSs help to:
- match users with items (e.g., job portals).
- ease information overload (e.g., academic research).
- assist sales (e.g., guidance, persuasion, …).
- personalize entertainment (e.g., streaming services, video games).
- support education and learning.
- drive social media engagement.
- … and more!

RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online. (Xiao & Benbasat, 2007)

The Long Tail perspective
When does the RS do a good job?
One criterion might be its ability to also recommend widely unknown or lesser-rated items that users might actually enjoy, that is, to recommend items from the long tail. This can be seen as promoting diversity and serendipity in the recommendations.
In the MovieLens 100K dataset, only 20% of the items accumulate 74% of all positive ratings (rated above 3). A well-functioning RS could work to discover and recommend the other 80% of items that might be overlooked but are still relevant and appealing to individual users.

How do RS work?
[Figure slides; the diagrams are not preserved in this transcript.]

2. History and evolution of RS

A brief history
- 1990s: first systems (e.g., GroupLens), basic algorithms
- 1995-2000: rapid commercialisation, challenges of scale
- 2000-2005: research explosion, mainstream applications
- 2006: Netflix Prize
- 2007: the first Recommender Systems conference
- 2010s: applications now common; very active research, many applications
- 2020s: continued innovation, ethical considerations, and personalization at scale

3. RS paradigms

Paradigms of RS
Recommender systems reduce information overload.
- Non-personalised: e.g. most popular, manually curated
- Collaborative: "Tell me what's popular among my peers"
- Content-based: "Show me more of the same as what I've liked"
- Knowledge-based: "Tell me what fits based on my needs"
- Hybrid: combinations of various inputs and/or composition of different mechanisms

Knowledge-based paradigm: knowledge model components (travel package example)
User knowledge:
1. Preferences: beach vs. mountain destinations, cultural experiences, etc.
2. Demographics: family status, age group, etc.
3. Previous travel history: past destinations, liked/disliked experiences.
Item knowledge (travel packages):
1. Destination attributes: climate, attractions, activities, safety level, travel cost, etc.
2. Package details: accommodation type, transportation, included meals, guided tours, etc.
Domain knowledge:
1. Seasonal considerations: best time to visit each destination, off-season discounts, etc.
2. Cultural information: local customs, language, festivals, etc.
3. Travel regulations: visa requirements, travel advisories, COVID-19 restrictions, etc.

4. Benefits and Applications

Benefits of employing RS (e.g., in business)
[Figure slide; the content is not preserved in this transcript.]

Applications of RS in different sectors

Sector | Description | Leaders
------ | ----------- | -------
E-commerce and retail | Show similar products; determine items using relevant keywords | Amazon, Alibaba, eBay
Entertainment and media | Help the user to create playlists; provide movie recommendations | Netflix, Spotify
Social platforms | Offer personalized suggestions; help the user to find similar pages or accounts | LinkedIn, Instagram, Facebook
Content searching platforms | Automated recommendations based on your browsing | Google, YouTube
Banking and finance | Give personalized financial advice to the customer; aid in saving money | American Express, J.P. Morgan
Education | Suggest relevant courses or study materials; personalised study for self-paced learning; suggest research articles or reading materials | Coursera, Udemy, Khan Academy

The Impact of RSs
- Amazon: Amazon's RS accounts for 35% of all purchases, according to McKinsey.
- YouTube: recommendations are responsible for 70% of the time people spend watching videos on YouTube.
- Netflix: recommendations account for 75% of viewing, according to McKinsey, saving $1 billion each year.
- Alibaba: during the Chinese global shopping festival of November 11, 2016, Alibaba achieved growth of up to 20% in its conversion rate using personalised landing pages, according to Alizila.
- Cross-selling and cross penetration: techniques which have helped to increase sales by 20% and profits by 30%.

The "Recommender problem"
- C := {users}
- S := {recommendable items}
- u := utility function, which measures the usefulness of item s to user c
- $u: C \times S \to R$, where the slide gives R := {recommended items}. For the maximization of u to be well defined, R has to be read as an ordered set of utility values (e.g., ratings).
For each user c, we want to choose the item s that maximizes u:
$\forall c \in C:\; s'_c = \arg\max_{s \in S} u(c, s)$

5. Data Inputs

6. RS Taxonomy

Taxonomy of recommender systems
Do you have any idea about what it means for a recommendation to be non-personalised? How could it work for a user, if it seems not tailored to his or her own "taste"?
[Taxonomy diagram; recoverable labels: Memory based, Model based.]

Recommendation Models - Examples
Used by (column labels as extracted): Commonness, Jinni, IMDb, MovieLens, Netflix, Shazam, Pandora, iTunes, Amazon, Think Analytics.
The column each check mark belongs to is not recoverable from the transcript; only the number of check marks per model survives:

Model | Check marks
----- | -----------
Collaborative Filtering | 8
Content-Based | 6
Knowledge-Based | 4
Stereotype-Based RS | 3
Ontologies and Semantic Web Technologies for RS | 2
Community Based RS | 4
Demographic Based RS | 1
Context Aware RS | 2
Conversational RS | 2
Hybrid | 3

7. Matrices in RS

Matrices in RS
- Foundational in many RS methodologies (particularly those that use collaborative and content-based techniques).
- What is a matrix in the context of RS? A structured data representation that captures and stores information about users, items, and the relationships or interactions between them.
- Matrices allow for the organization and systematic analysis of large quantities of data.
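To connect the recommender problem with the matrix view, here is a minimal sketch that stores utility values in a small user-by-item matrix and picks, for each user, the item with the highest utility. The user names, item names, and utility values are made up for illustration, and `best_item_per_user` is a hypothetical helper, not something defined in the lecture.

```python
# Hypothetical utility matrix u(c, s): one row per user c, one column per item s.
# In a real system these values would be estimated; here they are invented.
users = ["alice", "bob", "carol"]
items = ["book", "movie", "song", "game"]
utility = [
    [0.2, 0.9, 0.4, 0.1],   # alice
    [0.7, 0.3, 0.8, 0.5],   # bob
    [0.6, 0.1, 0.2, 0.95],  # carol
]


def best_item_per_user(matrix):
    """For each row (user), return the column index of the item with maximal utility."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


best = best_item_per_user(utility)
recommendations = {users[c]: items[s] for c, s in enumerate(best)}
print(recommendations)
```

In practice a system would usually return a ranked top-N list rather than a single item, and would skip items the user has already interacted with; the sketch keeps only the argmax step.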
Matrices are important for many recommendation algorithms (e.g. collaborative filtering, content-based filtering, hybrid systems…). There are many different types: User-Feature Matrix, Context Matrix, Item-Similarity Matrix, User-Similarity Matrix… HOWEVER…

Key Matrices in RS
- Item-Content Matrix (ICM)
- User Rating Matrix (URM)
Both are fundamental in building RS through the analysis of users' preferences and item characteristics. By employing these matrices, an RS can find patterns, similarities, and relationships that enable personalized and relevant recommendations. For example:
- Collaborative filtering: mainly relies on the URM to find similarities between users or items based on their interactions.
- Content-based filtering: mainly relies on the ICM to recommend items that are similar to what a user has liked in the past, based on item attributes.
- Hybrid systems: combine both collaborative and content-based approaches, utilizing both ICM and URM for more accurate and diverse recommendations.

Item-Content Matrix (ICM)
- Captures the features or attributes of items.
- Row = an item, column = a specific attribute or feature (e.g., genre, recipe ingredients).
- Values indicate the presence (or strength) of a particular item feature.
- Each number indicates how important an attribute is in characterizing an item.

User Rating Matrix (URM)
- Represents the interactions (e.g., ratings, clicks, purchases) between users and items.
- Rows often represent users, while columns represent items.
- The values in the matrix represent the interaction strength, such as a rating value.
- Sparse in nature: not all users interact with all items, leading to a matrix with many unset or zero values.
- Typical URM density < 0.01%. What does density mean? What is sparsity?

Density vs. Sparsity
- Density (D): the proportion of the user-item matrix that is filled with known values.
- Sparsity (S): the opposite of density. It represents the proportion of the user-item matrix that is unknown or unfilled, calculated as the ratio of zeros in the dataset to the total number of elements in the matrix.
With U the total number of users, I the total number of items, and R the total number of known interactions:
$D = \frac{R}{U \times I}$
$S = \frac{(U \times I) - R}{U \times I} = 1 - D$
where $U \times I$ is the total number of possible interactions and $(U \times I) - R$ is the number of unknown interactions.
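The density and sparsity formulas above can be checked on a toy URM. This is a minimal sketch: the matrix values are invented, the function names `density` and `sparsity` are hypothetical, and unknown interactions are marked with `None` (an assumption made here so that a genuine rating of 0 would not be confused with a missing entry).

```python
def density(num_interactions, num_users, num_items):
    """D = R / (U x I): share of the user-item matrix holding known values."""
    return num_interactions / (num_users * num_items)


def sparsity(num_interactions, num_users, num_items):
    """S = ((U x I) - R) / (U x I) = 1 - D: share of unknown entries."""
    return 1.0 - density(num_interactions, num_users, num_items)


# Toy URM: rows are users, columns are items, None marks an unknown interaction.
urm = [
    [5, None, None, 1],
    [None, 3, None, None],
    [4, None, None, None],
]

U = len(urm)                                         # number of users
I = len(urm[0])                                      # number of items
R = sum(v is not None for row in urm for v in row)   # known interactions

D = density(R, U, I)
S = sparsity(R, U, I)
print(f"R={R}, D={D:.3f}, S={S:.3f}")  # prints: R=4, D=0.333, S=0.667
```

Real URMs are far sparser than this toy matrix, which is why they are normally held in a sparse data structure rather than a dense list of lists.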
Typical URM density is below 0.01%:
- Netflix URM density: approx. 0.002%
- MovieLens URM density: approx. 0.005%

8. Inferring Preferences (Implicit and Explicit Ratings)

Recommending items to users
[Diagram; recoverable labels: User, Feedback, Recommender, List.]

Inferring Preferences: why is it important?
For effective recommendations, an RS must not only recognize available items but also understand user preferences. This inference forms the basis for delivering personalized and relevant suggestions.
[Diagram; recoverable labels: items, users, unknown preferences, ratings matrix; the RS produces highly relevant and personalized lists of items.]

Inferring Preferences
Feedback collection:
- Enables a deeper understanding of users
- Guides algorithm refinement
- Enhances user experience by aligning with their desires
Main feedback types:
- Explicit feedback
- Implicit feedback

Explicit Feedback (Explicit Ratings)
- Most commonly used (1 to 5 or 1 to 7 Likert response scales, likes/dislikes).
- Gives a clear understanding of user preferences and allows for direct user feedback.
- Optimal granularity of scale: different domains generally use different scales, e.g. a 10-point scale is better accepted in the movie domain.
- Multidimensional ratings: multiple ratings for different aspects (e.g. booking.com).
Main problems of explicit feedback:
- Users are not always willing to rate many items.
- The number of available ratings could be too small → sparse rating matrices → poor recommendation quality. Data imputation (predicting missing ratings)?
- How to stimulate users to rate more items? What else to use?

Implicit Feedback
Indirect indications of user preferences derived from their actions.
Examples:
- Browsing history
- Viewing time of a movie
- Song streaming count
- Click-through rates
- Purchases…
Benefits: more abundant than explicit feedback; captures user behavior without direct input.

Implicit Feedback - Example
- Combine multiple implicit feedback features to estimate the user rating.
- Standard CB and CF RS can be used afterwards.
Q: What can go wrong?
A: Pages may vary substantially in length, amount of content, etc.
This could affect the perceived implicit feedback features, so leveraging context could be important.

Context of User Feedback
- Context of the user: location, mood, and seasonality affect user preferences.
- Context of device and page: page and browser dimensions, page complexity (amount of text, links, images…), device type.
These can affect the perceived values of the user feedback.

Explicit, Implicit and Hybrid Feedback
Overview:
- Explicit feedback: users explicitly submit their rating or response to a given product, movie, music, etc. Feedback can be submitted as a rating and/or text (e.g. SHEIN), a like/dislike (YouTube), or a heart feature to add to favourites (Spotify).
- Implicit feedback (IF): indirect monitoring of the user's interaction.
- Hybrid feedback: combination of explicit and implicit feedback.
Ways to collect:
- Explicit feedback: the system asks users to rate products; website like/dislike functionality; users are requested to provide their comments as text.
- Implicit feedback: monitoring mouse movements and/or browsing patterns, dwell time, clicks, copied text, printing, the purchase process; tracked via JavaScript.
- Hybrid feedback: gather both explicit and implicit feedback from users in order to employ the hybrid technique.
Strengths:
- Explicit feedback: reflects the user's actual experience.
- Implicit feedback: relatively effortless; any event = IF.
Limitations:
- Explicit feedback: laborious and cognitively demanding; missing in small e-commerce sites; comparing two products is rare.
- Implicit feedback: any event = IF; noisy; analysis is challenging.

9. Current Trends and Challenges

2020s: current trends and topics
- Integration with Advanced AI: integration of sophisticated algorithms -> highly personalised and context-aware recommendations.
- Ethical and Bias Concerns: ensuring that recommendations are fair, unbiased, transparent, and respectful of privacy.
- Explainable AI (XAI): research and development with the aim of better explaining to users why they were recommended an item.
- Cross-domain Recommendations: the ability to provide recommendations across different domains (e.g., music, movies, shopping) offers a holistic user experience on a shared platform (such as music and podcasts on Spotify, or books, videos and items on Amazon). Cross-domain models and techniques may become more prominent.
- Real-time and Contextual Recommendations: increasing demand for real-time, situational recommendations. Efforts to adapt to real-time context (location, time of day, weather, etc.) and to the users' changing dynamics, preferences and interests.
- Privacy-Preserving Methods: with growing concerns over user privacy, privacy-preserving techniques are being developed. AI is considerably involved in developing and enhancing these methods.
- Sustainability and Social Responsibility: recommendations aligned with ethical and environmental regulations.
- Human-in-the-loop (HITL) Systems: incorporating human expertise and feedback into the recommendation process may become more common. Reinforcement learning and neuroscience (brain signals update the model; continuous learning).
- Healthcare and Social Good Applications: beyond entertainment and e-commerce, RSs are expanding into areas like healthcare and education -> this can have profound positive impacts but also brings challenges. Need for an explainable and transparent model.

10. Summary & Recap

Summary
1. Information Seeking vs. RS and Introduction to RS
2. History and evolution
3. RS paradigms
4. Benefits and applications
5. Data Inputs
6. RS taxonomy
7. Matrices in RS
   o User Rating Matrix (URM)
   o Item-Content Matrix (ICM)
8. Inferring Preferences (Implicit and Explicit Ratings)
9. Current trends and challenges (broad overview)
10. Summary & Recap

Recap Quiz: What are your answers?
- Give some examples of implicit ratings.
- Compare the advantages and disadvantages of each kind of rating.
- Name some current challenges in the field of RS.
- What are some of the benefits of employing RS?
- What is the "Netflix Prize"?
- How are information seeking and RS different?
- Give some examples of the most popular algorithms employed by RSs.
