Questions and Answers
In collaborative filtering, what distinguishes an 'active user' from other users?
- The active user has the highest similarity to all other users.
- The active user is the first user to rate an item.
- The active user has provided the most ratings.
- The active user is the one for whom predictions are being made. (correct)
Which of the following is an example of 'implicit' feedback in a collaborative filtering system?
- A user giving a product a 5-star rating.
- A user adding an item to their wishlist.
- A user writing a positive review about a movie.
- A user purchasing a product. (correct)
What is the fundamental idea behind User-based Collaborative Filtering?
- Recommending items based on the user's past purchases.
- Recommending items liked by users with similar preferences. (correct)
- Recommending items based on their inherent qualities.
- Recommending items that are frequently purchased together.
Which step is typically performed last in a user-based collaborative filtering system?
In user-based collaborative filtering, what is the role of the 'metric'?
Why is calculating user similarity a crucial step in user-based collaborative filtering?
What does a high Cosine similarity score (close to 1) between two users indicate?
Euclidean distance is used to measure the distance between two users. How is this distance typically interpreted in collaborative filtering?
In the context of collaborative filtering, what does a Pearson Correlation Coefficient of -1 between two users suggest?
Why might a simple average of neighbor ratings NOT be an ideal prediction function in collaborative filtering?
Which of the following prediction functions addresses rating scale biases between users?
What is the primary purpose of using a mean-centered prediction function in collaborative filtering?
In item-based collaborative filtering, what is being measured to make recommendations?
How does item-based collaborative filtering differ from user-based collaborative filtering?
Why is item-based collaborative filtering considered more stable than user-based collaborative filtering?
What similarity measure is 'mean-adjusted' in item-based collaborative filtering?
What is the purpose of 'Significance Weighting' in collaborative filtering?
Why is a 'discount factor' used in significance weighting?
User-based collaborative filtering is more suitable when the number of users is much ____ than the number of items.
Which of the following is a disadvantage of user-based collaborative filtering?
Which of the following is a key advantage of item-based collaborative filtering?
What does the term 'Cold Start' refer to in the context of collaborative filtering?
In collaborative filtering, what is the 'Long Tail' problem?
How does offline training help with the scaling issues in collaborative filtering?
What distinguishes Memory-based from Model-based collaborative filtering?
Which of the following collaborative filtering approaches is considered 'Memory-based'?
Which of the following collaborative filtering approaches is considered 'Model-based'?
What is the primary goal of collaborative filtering?
In collaborative filtering, what is the role of user-item ratings?
What distinguishes collaborative filtering from classification techniques?
How is collaborative filtering similar to missing value analysis?
What is 'serendipity' in the context of recommender systems?
Which of the following is not considered a scaling issue in collaborative filtering?
Which technique addresses the reliability of similarity functions when users share few common ratings?
When should a discount factor kick in during significance weighting?
Which best describes the relationship of similarity to distance?
With the Euclidean Distance formula, what does a smaller number mean?
What real world problem does cosine similarity help resolve over things like Euclidean or Manhattan?
Flashcards
Users and Items
A list of m users and a list of n items that will be rated.
Active User
The user for whom the Collaborative Filtering prediction is being made.
Metric (Collaborative Filtering)
A way of measuring the similarity between users.
Method (Collaborative Filtering)
Basic Idea (Collaborative Filtering)
User-based CF
Item-based CF
Input (Collaborative Filtering)
Output (Collaborative Filtering)
User-based CF steps
User Similarity
Euclidean distance
Cosine similarity
Pearson Correlation Coefficient
Number ranges
Prediction Functions
Underlying Assumption
Mean-centered function
Item-based prediction
Adjusted Cosine similarity
Collaborative Filtering vs Classification
Significance Weighting
Discount Factor
User-based CF Pros
User-based CF Cons
Item-based CF Pros
Item-based CF Cons
Serendipity
Cold start
Long Tail
Scaling Issues
Systems
Study Notes
- Collaborative filtering (CF) involves a list of m users and a list of n items.
- Each user has a list of items with an associated opinion or rating.
Opinion Types
- Explicit, such as a 3-star rating on the Google Play Store.
- Implicit, such as purchasing a product.
- An active user is the target for whom the CF prediction task occurs.
- Metrics measure similarity between users.
- Methods select a subset of the closest neighbors.
Collaborative Filtering
- It uses "similarities" to recommend items to the active user.
- It is a prominent, well-understood approach that utilizes various algorithms across domains like movies and e-commerce.
- User-based CF finds similar users and recommends the items they liked.
- Item-based CF recommends items similar to those the active user frequently likes.
- CF input is a matrix of user-item ratings.
- CF output is a numerical prediction indicating the degree to which the active user will like an item.
- The top-N recommended items are returned as a list.
User-Based Filtering (Basic Steps)
- Start with an active user X and an item i not yet seen by X.
- Find the k users (the nearest neighbors) who liked the same items that X liked in the past.
- Use the average of their ratings for item i to predict how much X will like it.
- Repeat this for all items X has not seen, and recommend the best-rated ones.
- This approach is known as user-based nearest-neighbor collaborative filtering.
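The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ratings` data is made up, and the similarity used here, 1 / (1 + Euclidean distance) over co-rated items, is one common choice rather than the one prescribed by these notes.

```python
import math

# Hypothetical ratings (user -> {item: rating}); not taken from the notes.
ratings = {
    "X": {"a": 5, "b": 3, "c": 4},
    "U1": {"a": 5, "b": 3, "c": 4, "i": 4},
    "U2": {"a": 1, "b": 5, "c": 2, "i": 1},
    "U3": {"a": 4, "b": 3, "c": 5, "i": 5},
}

def similarity(u, v):
    """Assumed similarity: 1 / (1 + Euclidean distance) over co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    d = math.sqrt(sum((ratings[u][j] - ratings[v][j]) ** 2 for j in common))
    return 1.0 / (1.0 + d)

def predict(active, item, k=2):
    """Average the ratings of the k most similar users who rated `item`."""
    candidates = [u for u in ratings if u != active and item in ratings[u]]
    neighbors = sorted(candidates, key=lambda u: similarity(active, u), reverse=True)[:k]
    if not neighbors:
        return None  # nobody has rated this item yet
    return sum(ratings[u][item] for u in neighbors) / len(neighbors)

print(predict("X", "i"))  # 4.5 (neighbors U1 and U3)
```

Here U1 and U3 are the two nearest neighbors of X, so the prediction is the plain average of their ratings for item `i`.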
User-Based Filtering Example
- A matrix of users and their ratings for items, with User 1 being the active user.
- The main challenge is to predict the rating of User 1 for Item 5.
- Consider issues like measuring similarity, determining neighbor count, and generating predictions from neighbor ratings.
Measuring User Similarity
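- Similarity is calculated as a measure of "anti-distance": the smaller the distance between two users, the more similar they are.
- Similarity is the inverse of distance.
- For a distance normalized to the range 0 to 1: Similarity = 1 – Distance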
Similarity Measures
- Euclidean
- Jaccard
- Cosine
- Adjusted cosine
- Raw cosine
- Pearson correlation
- Euclidean distance is determined between two vectors v1 = (3, 10) and v2 = (7, 13).
- Straight-line distance: d = √((x2 - x1)² + (y2 - y1)²) which simplifies to d = √((7-3)² + (13 – 10)²) = 5.
- Euclidean distance is found between User 1 and all other users to find k nearest neighbors.
- When Euclidean or Manhattan distance cannot correctly detect patterns, cosine similarity (or its counterpart, cosine distance) is another measure that can be used.
- CosineSim = cos(θ), where θ is the angle between the two rating vectors A and B.
- Cosine similarity is calculated as: Σ(AᵢBᵢ) / (√Σ(Aᵢ²) × √Σ(Bᵢ²)).
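As a quick check of the two formulas above, here is a minimal Python sketch. The function names are illustrative, and the vectors `a` and `b` in the second part are made-up examples, chosen to show a case where cosine similarity detects a pattern that Euclidean distance hides.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)

# The worked example from the notes: v1 = (3, 10), v2 = (7, 13).
print(euclidean_distance((3, 10), (7, 13)))  # 5.0

# Hypothetical users with the same taste but different rating scales.
a, b = (1, 2, 3), (2, 4, 6)
print(euclidean_distance(a, b))  # clearly non-zero
print(cosine_similarity(a, b))   # very close to 1
```

The second pair points in the same direction, so the cosine similarity is essentially 1 even though the Euclidean distance between the two vectors is not small.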
Pearson Correlation Coefficient (r)
- It measures the strength and direction of the linear relationship between two users' ratings, with values between -1 and 1.
- -1: strong negative correlation
- 0: no correlation
- 1: strong positive correlation
- Pearson Correlation Coefficient calculation: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
- Once the users most similar to the active user have been found, the next step is to predict the active user's rating.
- Various prediction functions include the average of neighbor ratings.
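The Pearson Correlation Coefficient described above can be sketched as follows. The function name and the handling of constant rating vectors (returning 0) are assumptions made for this illustration; the inputs are expected to be the two users' ratings for the same items, in the same order.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return cov / math.sqrt(var_x * var_y)

# Made-up ratings: same ordering of preferences, then the reverse ordering.
print(pearson([1, 2, 3], [2, 4, 6]))  # close to 1
print(pearson([1, 2, 3], [3, 2, 1]))  # close to -1
```

A user who rates every item identically has zero variance, which makes the coefficient undefined; that is the flattening case for uniform ratings, and the sketch simply returns 0 there.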
User-Based Filtering Prediction
- R15 = ((3 × 0.85) + (4 × 0.70)) / (|0.85| + |0.70|) = 3.45, so R15 ≈ 3.
- This uses the Pearson Correlation Coefficient as the similarity measure and k = 2 nearest neighbors.
- Predicted rating of User 1 is approximately 3.
- A key problem with Pearson Correlation Coefficient is the underlying assumption that users dislike what they rated below average.
- This assumption is often false.
- The correlation flattens in cases of uniformly distributed ratings.
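The R15 calculation above is a similarity-weighted average of the neighbors' ratings. A minimal sketch, with an illustrative function name, reproduces the figure from the notes:

```python
def weighted_prediction(neighbor_ratings, similarities):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    numerator = sum(r * s for r, s in zip(neighbor_ratings, similarities))
    denominator = sum(abs(s) for s in similarities)
    if denominator == 0:
        return None  # no usable neighbors
    return numerator / denominator

# Neighbor ratings 3 and 4 with similarities 0.85 and 0.70, as in the notes.
r15 = weighted_prediction([3, 4], [0.85, 0.70])
print(round(r15, 2))  # 3.45
```

The absolute values in the denominator keep the weights well-behaved when a similarity such as Pearson's is negative.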
Mean Centered Prediction Function
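- A mean-centered prediction function is required.
- Consider User 3 and their ratings for unseen Item 1 and Item 6 to determine the top-rated item.
- Without mean-centering, the predicted ratings suggest that User 3 is more likely than average to enjoy both Item 1 and Item 6.
- A mean-centered prediction function removes the bias caused by users rating on different scales.
- $\hat{r}_{a,p} = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a, b)}$, where $\bar{r}_a$ is the active user's mean rating, $N$ is the set of neighbors, $r_{b,p}$ is neighbor b's rating for item p, and $\bar{r}_b$ is neighbor b's mean rating.
- With the mean-centered function, the final predicted ratings are 3.35 for Item 1 and 0.86 for Item 6.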
- Conclusion: item 3 still appears to be the most liked item by User 3, and item 6 is now clearly the least liked item by User 3.
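A minimal sketch of the mean-centered prediction function follows. The function name, the tuple layout, and the numbers in the example are all hypothetical (the User 3 rating matrix is not reproduced in these notes), and the denominator follows the formula as written here, without absolute values.

```python
def mean_centered_prediction(active_mean, neighbors):
    """Predict a rating as the active user's mean plus a weighted offset.

    `neighbors` is a list of (similarity, neighbor_rating, neighbor_mean)
    tuples for the item being predicted.
    """
    numerator = sum(s * (r - m) for s, r, m in neighbors)
    denominator = sum(s for s, _, _ in neighbors)
    if denominator == 0:
        return active_mean  # assumption: fall back to the user's own mean
    return active_mean + numerator / denominator

# Hypothetical data: one neighbor rated the item above their own mean,
# the other rated it below theirs.
pred = mean_centered_prediction(3.0, [(0.8, 5, 4.0), (0.4, 2, 3.0)])
print(round(pred, 2))  # 3.33
```

Each neighbor contributes how far above or below their own average they rated the item, so a generous rater and a harsh rater are put on a comparable footing before their opinions are combined.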
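Item-Based Collaborative Filtering
- The idea is the same as user-based filtering, except that similarity between items, rather than between users, is used to predict ratings.
- Item-based filtering is more stable than user-based filtering.
- For User 3, predict ratings for Item 1 and Item 6.
- Use Adjusted Cosine similarity.
- It is Cosine similarity that is mean-adjusted: each rating has the mean rating of the user who gave it subtracted first.
- $\mathrm{sim}(a, b) = \frac{\sum_{u} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u} (r_{u,b} - \bar{r}_u)^2}}$, where a and b are items, the sums run over the users u who rated both, and $\bar{r}_u$ is user u's mean rating.
- Adjusted Cosine(I1, I3) = ((1.5 × 1.5) + (-1.5 × -0.5) + (-1 × -1)) / (√(1.5² + (-1.5)² + (-1)²) × √(1.5² + (-0.5)² + (-1)²)) ≈ 0.912
- The similarity of I1 is calculated with all other items.
- The R31 equation is: ((3 × 0.735) + (3 × 0.912)) / (|0.735| + |0.912|) = 3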
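The two item-based calculations above can be checked with a short Python sketch. The function names are illustrative; the inputs are the mean-centered ratings, similarities, and ratings that appear in the worked example.

```python
import math

def adjusted_cosine(dev_a, dev_b):
    """Cosine similarity of two items' mean-centered ratings.

    dev_a[i] and dev_b[i] are the deviations (rating minus that user's mean)
    given by the same user to items a and b.
    """
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    norm_a = math.sqrt(sum(x * x for x in dev_a))
    norm_b = math.sqrt(sum(y * y for y in dev_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: no information, so no similarity
    return numerator / (norm_a * norm_b)

def item_based_prediction(user_ratings, similarities):
    """Similarity-weighted average of the user's ratings for similar items."""
    numerator = sum(r * s for r, s in zip(user_ratings, similarities))
    return numerator / sum(abs(s) for s in similarities)

# Deviations for Item 1 and Item 3 from the worked example.
print(round(adjusted_cosine([1.5, -1.5, -1.0], [1.5, -0.5, -1.0]), 3))  # 0.912

# User 3 rated both similar items 3; similarities are 0.735 and 0.912.
print(round(item_based_prediction([3, 3], [0.735, 0.912]), 2))  # 3.0
```

Because User 3 gave both neighboring items the same rating, the weighted average equals that rating regardless of the similarity weights.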
Collaborative Filtering Vs Classification
- Collaborative filtering makes no distinction between dependent and independent variables, unlike typical classification.
- Collaborative Filtering resembles a missing value analysis with a larger matrix.
Improving CF
- Significance weighting increases reliability.
- The reliability of any similarity function sim(u, v) between two users u and v is often affected by the number of common ratings between u and v, i.e. |Iu ∩ Iv|.
- If the two users have only a small number of ratings in common, the similarity function sim(u, v) should include a discount factor to de-emphasize the importance of that particular user pair.
- This method is known as Significance Weighting.
- The discount factor kicks in when the number of common ratings between the two users is less than a particular threshold β.
- DiscountSim(u, v) = Sim(u, v) × min{|Iu ∩ Iv|, β} / β
- |Iu ∩ Iv| is the number of items rated by both u and v.
- Sim(u, v) is the original similarity score.
- β is the threshold value.
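The discount formula above translates directly into code. In this sketch the function name is illustrative and the threshold β = 50 is an arbitrary value chosen for the example, not one given in these notes.

```python
def discounted_similarity(sim, n_common, beta):
    """Scale a similarity down when the users share fewer than beta ratings."""
    return sim * min(n_common, beta) / beta

# Same raw similarity, different amounts of shared evidence (beta = 50 assumed).
print(discounted_similarity(0.9, 5, 50))   # heavily discounted, about 0.09
print(discounted_similarity(0.9, 80, 50))  # at or above the threshold: about 0.9
```

With only 5 common ratings the similarity is cut to a tenth of its raw value, while a pair with at least β common ratings keeps its original score.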
User-Based Pros
- More diverse recommendations
- Better if the number of users is smaller than the number of items
User-Based Cons
- Generally less stable, because user preferences change quickly
- Cannot offer in-depth analysis of an individual user
Item-Based Pros
- More accurate recommendations
- Better if the number of items is smaller than the number of users
- More stable
- Can provide in-depth analysis
Item-Based Cons
- Prone to shilling attacks (malicious user degrading an item)
- Provides less diversity
Collaborative Filtering Issues
- Serendipity: Expand the user’s taste into neighboring areas
- Cold Start: Using collaborative filtering without initial data
- Long Tail: Only popular items get rated, leaving the many less popular items with few or no ratings
- Scaling: Collaborative filtering requires a lot of computational operations
Recommender Systems
- These can be Memory-based or Model-based.
- Memory-based systems use the entire data set every time a prediction is made.
- Example: User-based Collaborative Filtering
- Model-based systems use the data once to build a model and can then predict without needing the entire data set each time.
- Example: Item-based Collaborative Filtering
- Example: Content-based Recommender System