Collaborative Filtering Techniques

Questions and Answers

In collaborative filtering, what distinguishes an 'active user' from other users?

  • The active user has the highest similarity to all other users.
  • The active user is the first user to rate an item.
  • The active user has provided the most ratings.
  • The active user is the one for whom predictions are being made. (correct)

Which of the following is an example of 'implicit' feedback in a collaborative filtering system?

  • A user giving a product a 5-star rating.
  • A user adding an item to their wishlist.
  • A user writing a positive review about a movie.
  • A user purchasing a product. (correct)

What is the fundamental idea behind User-based Collaborative Filtering?

  • Recommending items based on the user's past purchases.
  • Recommending items liked by users with similar preferences. (correct)
  • Recommending items based on their inherent qualities.
  • Recommending items that are frequently purchased together.

Which step is typically performed last in a user-based collaborative filtering system?

<p>Recommending the highest-rated items. (B)</p>

In user-based collaborative filtering, what is the role of the 'metric'?

<p>To measure the similarity between users. (D)</p>

Why is calculating user similarity a crucial step in user-based collaborative filtering?

<p>To predict the ratings of items the active user has not yet seen. (C)</p>

What does a high Cosine similarity score (close to 1) between two users indicate?

<p>The users have similar rating patterns. (A)</p>

Euclidean distance is used to measure the distance between two users. How is this distance typically interpreted in collaborative filtering?

<p>A smaller distance indicates higher similarity. (B)</p>

In the context of collaborative filtering, what does a Pearson Correlation Coefficient of -1 between two users suggest?

<p>The two users have completely opposite preferences. (C)</p>

Why might a simple average of neighbor ratings NOT be an ideal prediction function in collaborative filtering?

<p>It doesn't account for differences in user rating scales. (D)</p>

Which of the following prediction functions addresses rating scale biases between users?

<p>A mean-centered prediction function. (B)</p>

What is the primary purpose of using a mean-centered prediction function in collaborative filtering?

<p>To improve the accuracy of rating predictions. (C)</p>

In item-based collaborative filtering, what is being measured to make recommendations?

<p>The similarity between items. (D)</p>

How does item-based collaborative filtering differ from user-based collaborative filtering?

<p>Item-based CF predicts ratings based on similar items. (D)</p>

Why is item-based collaborative filtering considered more stable than user-based collaborative filtering?

<p>Item preferences change more slowly than user preferences. (B)</p>

What similarity measure is 'mean-adjusted' in item-based collaborative filtering?

<p>Adjusted Cosine Similarity. (B)</p>

What is the purpose of 'Significance Weighting' in collaborative filtering?

<p>To adjust similarity scores based on the number of common ratings. (A)</p>

Why is a 'discount factor' used in significance weighting?

<p>To de-emphasize user pairs with few common ratings. (C)</p>

User-based collaborative filtering is more suitable when the number of users is much ____ than the number of items.

<p>smaller (D)</p>

Which of the following is a disadvantage of user-based collaborative filtering?

<p>It can be unstable due to changing user preferences. (B)</p>

Which of the following is a key advantage of item-based collaborative filtering?

<p>It provides more accurate recommendations in general. (B)</p>

What does the term 'Cold Start' refer to in the context of collaborative filtering?

<p>The challenge of recommending when there is very little data. (D)</p>

In collaborative filtering, what is the 'Long Tail' problem?

<p>The system's difficulty in recommending less popular items. (D)</p>

How does offline training help with the scaling issues in collaborative filtering?

<p>By pre-computing similarities to avoid real-time calculations. (A)</p>

What distinguishes Memory-based from Model-based collaborative filtering?

<p>Memory-based systems use the entire dataset for each prediction. (A)</p>

Which of the following collaborative filtering approaches is considered 'Memory-based'?

<p>User-based Collaborative Filtering. (C)</p>

Which of the following collaborative filtering approaches is considered 'Model-based'?

<p>Item-based Collaborative Filtering. (D)</p>

What is the primary goal of collaborative filtering?

<p>To predict user preferences based on past behavior. (B)</p>

In collaborative filtering, what is the role of user-item ratings?

<p>To provide a numerical representation of user preferences. (C)</p>

What distinguishes collaborative filtering from classification techniques?

<p>Collaborative filtering lacks a distinction between independent and dependent variables. (B)</p>

How is collaborative filtering similar to missing value analysis?

<p>Both aim to predict unknown values within a data structure. (A)</p>

What is 'serendipity' in the context of recommender systems?

<p>The system's ability to recommend items the user might not have otherwise discovered. (D)</p>

Which of the following is not considered a scaling issue in collaborative filtering?

<p>Collaborative filtering requires fewer computational operations. (A)</p>

Which technique addresses the reliability of similarity functions when users share few common ratings?

<p>Significance Weighting (A)</p>

When should a discount factor kick in during significance weighting?

<p>When the number of common ratings between the two users is less than a particular threshold (A)</p>

Which best describes the relationship of similarity to distance?

<p>Similarity is the inverse of Distance (D)</p>

With the Euclidean Distance formula, what does a smaller number mean?

<p>A closer relationship (A)</p>

What real-world problem with measures like Euclidean or Manhattan distance does cosine similarity help resolve?

<p>Those distances cannot always correctly detect patterns between data points (C)</p>

Flashcards

Users and Items

A list of m users and a list of n items that will be rated.

Active User

The user for whom the collaborative filtering prediction is being made.

Metric (Collaborative Filtering)

A way of measuring the similarity between users.

Method (Collaborative Filtering)

A way of selecting a subset of users consisting of the closest neighbors.

Basic Idea (Collaborative Filtering)

Uses similarities between users to recommend items.

User-based CF

Find users most similar and recommend what they liked.

Item-based CF

Recommend items similar to the ones the user frequently likes.

Input (Collaborative Filtering)

A matrix of user-item ratings.

Output (Collaborative Filtering)

A prediction of the degree to which the active user will like an item.

User-based CF steps

Find a set of users who liked the same items as the active user and who have rated the new item.

User Similarity

Compute similarity as a measure of anti-distance.

Euclidean distance

A distance measure based on the straight-line distance between two rating vectors; a smaller distance means higher similarity.

Cosine similarity

A similarity measure based on the angle between two rating vectors; it can detect patterns between data points that Euclidean or Manhattan distance misses.

Pearson Correlation Coefficient

A measure of correlation between users.

Number ranges

The Pearson Correlation Coefficient is a number between -1 and 1.

Prediction Functions

How do we use similar users to predict the taste of an active user?

Underlying Assumption

Users may dislike items they rated below average.

Mean-centered function

A prediction function that works with each user's deviation from their own mean rating, which removes rating-scale bias.

Item-based prediction

It is the same as user-based, except using items instead of users.

Adjusted Cosine similarity

Cosine similarity that is mean-adjusted.

Collaborative Filtering vs Classification

Unlike classification, collaborative filtering makes no distinction between dependent and independent variables.

Significance Weighting

Adjusts a similarity score by the number of common ratings, because the reliability of any similarity function depends on that number.

Discount Factor

Applied to a similarity score when the number of common ratings between two users is below a threshold.

User-based CF Pros

Provides more diverse recommendations.

User-based CF Cons

It is generally not stable, because user preferences change rather quickly.

Item-based CF Pros

Provides more accurate recommendations in general.

Item-based CF Cons

Is prone to shilling attacks.

Serendipity

At times, it's good to recommend something different to the user.

Cold start

Ratings are not yet available, for example for a newly launched website or store.

Long Tail

A large number of items are unrated and hence cannot be recommended easily.

Scaling Issues

Collaborative filtering requires a lot of computational operations.

Systems

Systems can either be memory based, or model based.

Study Notes

  • Collaborative filtering (CF) involves a list of m users and a list of n items.
  • Each user has a list of items with an associated opinion or rating.

Opinion Types

  • Explicit: for example, a 3-star rating on the Google Play Store.
  • Implicit: for example, purchasing a product.
  • An active user is the target for whom the CF prediction task occurs.
  • Metrics measure similarity between users.
  • Methods select a subset of the closest neighbors.

Collaborative Filtering

  • It uses similarities between users, or between items, to recommend items to the active user.
  • It is a prominent, well-understood approach that utilizes various algorithms across domains like movies and e-commerce.
  • User-based CF finds similar users and recommends the items they liked.
  • Item-based CF recommends items similar to those the active user frequently likes.
  • CF input is a matrix of user-item ratings.
  • CF output is a numerical prediction indicating the degree to which the active user will like an item.
  • The top-N recommended items are provided as a list.

User-Based Filtering (Basic Steps)

  • Start with an active user X and an item i not yet seen by X.
  • Find the k users who are X's nearest neighbors: users who liked what X liked in the past and who have rated item i.
  • Use the average of their ratings to predict how much X will like item i.
  • Repeat this for all items X has not seen, and recommend the best-rated ones.
  • This approach is known as user-based nearest-neighbor collaborative filtering.
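
The basic steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ratings` table, the user names, and the helper names `distance`, `nearest_neighbors`, and `predict` are all hypothetical, and Euclidean distance over co-rated items stands in for whichever similarity metric is chosen.

```python
import math

# Hypothetical ratings (user -> {item: rating}); not taken from the lesson.
ratings = {
    "X": {"i1": 5, "i2": 3, "i3": 4},
    "A": {"i1": 5, "i2": 3, "i3": 4, "i5": 4},
    "B": {"i1": 1, "i2": 5, "i3": 2, "i5": 1},
    "C": {"i1": 4, "i2": 3, "i3": 5, "i5": 5},
}

def distance(u, v):
    """Euclidean distance over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    return math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common))

def nearest_neighbors(active, item, k):
    """The k users closest to `active` among those who rated `item`."""
    candidates = [u for u in ratings if u != active and item in ratings[u]]
    return sorted(candidates, key=lambda u: distance(active, u))[:k]

def predict(active, item, k=2):
    """Plain average of the neighbors' ratings for `item`."""
    neighbors = nearest_neighbors(active, item, k)
    return sum(ratings[n][item] for n in neighbors) / len(neighbors)

print(nearest_neighbors("X", "i5", k=2))  # ['A', 'C']
print(predict("X", "i5"))                 # 4.5
```

User B rates almost the opposite of X, so B is left out of the neighborhood and does not pull the prediction down.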

User-Based Filtering Example

  • The example uses a matrix of users and their ratings for items, with User 1 as the active user.
  • The main challenge is to predict the rating of User 1 for Item 5.
  • Consider issues like measuring similarity, determining neighbor count, and generating predictions from neighbor ratings.

Measuring User Similarity

  • Similarity is calculated as a measure of "anti-distance".
  • Similarity is the inverse of distance: the smaller the distance, the higher the similarity.
  • Similarity = 1 – Distance, for a distance scaled to the range 0 to 1.

Similarity Measures

  • Euclidean
  • Jaccard
  • Cosine
  • Adjusted cosine
  • Raw cosine
  • Pearson correlation
  • Example: the Euclidean distance between two vectors v1 = (3, 10) and v2 = (7, 13).
  • Straight-line distance: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, which gives $d = \sqrt{(7 - 3)^2 + (13 - 10)^2} = 5$.
  • Euclidean distance is found between User 1 and all other users to find k nearest neighbors.
  • When Euclidean or Manhattan distance cannot correctly detect patterns, cosine distance, or similarity, is another measure that can be used.
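  • Cosine similarity is the cosine of the angle θ between two rating vectors A and B: $\text{CosineSim}(A, B) = \cos(\theta)$
  • It is calculated as $\cos(\theta) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$.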
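
As a rough illustration of the two measures, the sketch below reproduces the Euclidean example for v1 and v2 and then shows a case where cosine similarity detects a pattern that distance alone does not. The function names and the second pair of vectors are assumptions made for this example only.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The worked example from the notes.
print(euclidean_distance((3, 10), (7, 13)))  # 5.0

# Hypothetical vectors with the same rating pattern but a different scale.
low, high = (1, 2, 3), (2, 4, 6)
print(euclidean_distance(low, high))  # clearly non-zero
print(cosine_similarity(low, high))   # approximately 1
```

The second pair points in the same direction, so the cosine is approximately 1 even though the Euclidean distance treats the two vectors as far apart.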

Pearson Correlation Coefficient (r)

  • It measures the strength and direction of the linear relationship between two sets of ratings, as a value between -1 and 1:
    • -1: strong negative correlation
    • 0: no correlation
    • 1: strong positive correlation
  • Pearson Correlation Coefficient calculation: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
  • Having found the users most similar to the active user, the next step is to predict the active user's rating.
  • Various prediction functions exist; the simplest is the average of the neighbors' ratings.
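
The Pearson formula translates fairly directly into code. The sketch below is a minimal version with hypothetical rating lists; the function name `pearson` is an assumption, and the sketch does not handle the case where one user gave the same rating to every item (the denominator would then be zero).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical ratings on four co-rated items.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # same pattern: close to 1
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # opposite pattern: close to -1
```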

User-Based Filtering Prediction

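  • Using the Pearson Correlation Coefficient as the similarity and k = 2 nearest neighbors, the rating of User 1 for Item 5 is predicted as a similarity-weighted average of the neighbors' ratings.
  • $R_{15} = \frac{(3 \times 0.85) + (4 \times 0.70)}{|0.85| + |0.70|} \approx 3.45$
  • The predicted rating of User 1 for Item 5 is therefore approximately 3.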
  • A key problem with Pearson Correlation Coefficient is the underlying assumption that users dislike what they rated below average.
  • This assumption is often false.
  • The correlation flattens in cases of uniformly distributed ratings.
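
The R15 calculation in this section is a similarity-weighted average, which can be checked with a short sketch. The function name `weighted_prediction` is hypothetical; the ratings (3 and 4) and similarities (0.85 and 0.70) are the ones used in the notes.

```python
def weighted_prediction(neighbor_ratings, similarities):
    """Similarity-weighted average of the neighbors' ratings."""
    numerator = sum(r * s for r, s in zip(neighbor_ratings, similarities))
    denominator = sum(abs(s) for s in similarities)
    return numerator / denominator

r15 = weighted_prediction([3, 4], [0.85, 0.70])
print(round(r15, 2))  # 3.45
```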

Mean Centered Prediction Function

  • A mean-centered prediction function is required.
  • Consider User 3 and the unseen Item 1 and Item 6; the goal is to determine which of them is the top-rated item.
  • When ratings are predicted without mean-centering, the result suggests that User 3 is more likely than average to enjoy both Item 1 and Item 6.
  • A mean-centered prediction function removes this bias.
  • $\hat{r}_{a,p} = \bar{r}_a + \frac{\sum_{b \in N} \text{sim}(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \text{sim}(a, b)}$, where a is the active user, p is the item, N is the set of neighbors, and $\bar{r}_a$, $\bar{r}_b$ are the users' mean ratings.
  • Final predicted ratings with the mean-centered equation: 3.35 for Item 1 and 0.86 for Item 6.
  • Conclusion: Item 3 still appears to be the most liked item by User 3, and Item 6 is now clearly the least liked item by User 3.
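
A minimal sketch of the mean-centered prediction function follows. The rating table behind the User 3 example is not reproduced in these notes, so the numbers below are hypothetical and only illustrate the mechanics; the function name and the tuple layout are also assumptions.

```python
def mean_centered_prediction(active_mean, neighbors):
    """Predict a rating from neighbors' deviations from their own means.

    neighbors: list of (similarity, neighbor_rating_for_item, neighbor_mean).
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(sim for sim, _, _ in neighbors)
    return active_mean + numerator / denominator

# Hypothetical case: the active user rates low on average (mean 2.0),
# while both neighbors rate high on average.
neighbors = [(0.9, 5, 4.0), (0.6, 4, 3.5)]
plain_average = sum(rating for _, rating, _ in neighbors) / len(neighbors)
centered = mean_centered_prediction(2.0, neighbors)

print(plain_average)       # 4.5
print(round(centered, 2))  # 2.8
```

The plain average simply copies the neighbors' generous scale, while the mean-centered version expresses "somewhat above average" on the active user's own scale.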

Item-Based Collaborative Filtering

  • The idea is the same as user-based CF, except that it uses the similarity between items, not the similarity between users, to predict ratings.

  • Item-based CF is more stable than user-based CF.

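  • For User 3, predict ratings for Item 1 and Item 6.

  • Use Adjusted Cosine similarity.

  • It is cosine similarity that is mean-adjusted: each rating is reduced by the mean rating of the user who gave it.

  • $\text{Sim}(a, b) = \frac{\sum_u (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_u (r_{u,a} - \bar{r}_u)^2 \sum_u (r_{u,b} - \bar{r}_u)^2}}$

  • $\text{AdjustedCosine}(I_1, I_3) = \frac{(1.5 \times 1.5) + (-1.5 \times -0.5) + (-1 \times -1)}{\sqrt{1.5^2 + (-1.5)^2 + (-1)^2}\,\sqrt{1.5^2 + (-0.5)^2 + (-1)^2}} \approx 0.912$

  • The similarity of Item 1 is calculated with all other items in the same way.

  • The R31 equation is: $R_{31} = \frac{(3 \times 0.735) + (3 \times 0.912)}{|0.735| + |0.912|} = 3$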
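
The two item-based calculations above can be checked numerically. The sketch below assumes the values 1.5, -1.5, -1 and 1.5, -0.5, -1 are the already mean-adjusted ratings of Item 1 and Item 3 from the worked example; the variable and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Mean-adjusted ratings (rating minus the rating user's mean), as in the example.
item1_adjusted = [1.5, -1.5, -1.0]
item3_adjusted = [1.5, -0.5, -1.0]

sim_i1_i3 = cosine(item1_adjusted, item3_adjusted)
print(round(sim_i1_i3, 3))  # 0.912

# Weighted average of User 3's ratings (both 3) for the two similar items.
r31 = (3 * 0.735 + 3 * 0.912) / (abs(0.735) + abs(0.912))
print(round(r31, 2))  # 3.0
```

Because User 3 gave both similar items the same rating, the weighted average equals that rating regardless of the similarity weights.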

Collaborative Filtering Vs Classification

  • Collaborative filtering lacks a distinction between dependent and independent variables, unlike typical classification.
  • Collaborative Filtering resembles a missing value analysis with a larger matrix.

Improving CF

  • Significance weighting increases the reliability of similarity scores.
  • The reliability of any similarity function sim(u, v) between two users u and v is often affected by the number of common ratings between u and v, i.e. $|I_u \cap I_v|$.
  • If the two users only have a small number of ratings in common, the similarity function sim(u, v) should include a discount factor to de-emphasize the importance of that particular user pair.
  • This method is known as Significance Weighting.
  • The discount factor kicks in when the number of common ratings between the two users is less than a particular threshold β.
  • $\text{DiscountSim}(u, v) = \text{Sim}(u, v) \cdot \frac{\min\{|I_u \cap I_v|, \beta\}}{\beta}$
  • $|I_u \cap I_v|$ is the number of common ratings between u and v.
  • Sim(u, v) is the original similarity score.
  • β is the threshold value.
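
The discount formula is small enough to sketch directly. The function name `discounted_similarity`, the similarity of 0.9, and the threshold of 10 are hypothetical values chosen for illustration.

```python
def discounted_similarity(sim, n_common, beta):
    """Scale `sim` down when fewer than `beta` ratings are shared."""
    return sim * min(n_common, beta) / beta

# Only 2 common ratings with a threshold of 10: the score is heavily discounted.
print(discounted_similarity(0.9, n_common=2, beta=10))   # approximately 0.18

# 25 common ratings: the threshold is met, so the score is unchanged.
print(discounted_similarity(0.9, n_common=25, beta=10))  # approximately 0.9
```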

User-Based Pros

  • More diverse recommendations
  • It is better if the number of users is smaller than the number of items

User-Based Cons

  • Generally less stable, because user preferences change quickly
  • Cannot offer in-depth analysis of an individual user

Item-Based Pros

  • More accurate recommendations
  • Better if the number of items is smaller than the number of users
  • More stable
  • Can provide in-depth analysis

Item-Based Cons

  • Prone to shilling attacks (for example, a malicious user submitting ratings to degrade an item)
  • Provides less diversity

Collaborative Filtering Issues

  • Serendipity: Expand the user’s taste into neighboring areas
  • Cold Start: Using collaborative filtering without initial data
  • Long Tail: Users mostly rate popular items, so a large number of items remain unrated and cannot be recommended easily
  • Scaling: Collaborative filtering requires a lot of computational operations

Recommender Systems

  • These can be memory-based or model-based.
  • Memory-based systems use the entire data set every time:
    • User-based Collaborative Filtering
  • Model-based systems use the data once and can then predict without needing the entire data set each time:
    • Item-based Collaborative Filtering
    • Content-based Recommender System
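
One way to picture the "use the data once" idea is to pre-compute item-to-item similarities offline and only look them up at prediction time. The sketch below is an assumption-laden illustration: the `ratings` table is hypothetical and dense (every user rated every item), plain cosine similarity is used, and `build_item_model` and `most_similar` are made-up helper names.

```python
import math

# Hypothetical dense ratings (user -> {item: rating}).
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}

def item_vector(item):
    """Ratings for one item, in a fixed user order."""
    return [ratings[u][item] for u in sorted(ratings)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_item_model():
    """Offline step: compute every item-item similarity once."""
    items = sorted({i for user in ratings.values() for i in user})
    return {(i, j): cosine(item_vector(i), item_vector(j))
            for i in items for j in items if i != j}

MODEL = build_item_model()

def most_similar(item):
    """Online step: a lookup in the pre-computed model, no rating data needed."""
    return max((j for (i, j) in MODEL if i == item), key=lambda j: MODEL[(item, j)])

print(most_similar("a"))  # b
```

The expensive similarity computation happens once in `build_item_model`; each later request touches only the stored similarities.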
