Association Rules Chapter 10 PDF

Document Details

Uploaded by WondrousNewOrleans

Loyalist College

Tags

association rules, data mining, machine learning, business analytics

Summary

This document covers association rules, a data mining technique used to discover relationships between items in a dataset. It explains how association rules work, their business applications in sales and marketing, retail, and medicine, and algorithms such as Apriori. A worked exercise with transaction tables illustrates the method.

Full Transcript

Association Rules
Chapter 10

Learning Objectives

Understand association rule mining.
Learn the business applications of association rule mining.
Understand association rules.
Identify the key parameters for association rules.
Follow the steps to perform the Apriori algorithm on a small data set.

What is Association Rule Mining

It is a popular unsupervised learning technique for data mining.
It is also called market basket analysis.
It helps in finding interesting relationships between items or events.
The data should be categorical for this technique to be used.
There is no dependent variable.
It uses machine learning algorithms.
e.g. “A customer who bought a flight ticket and a hotel reservation also bought a rental car plan 60 percent of

Case Study: Netflix Recommendation Engine

Netflix suggestions and recommendation engines are powered by a suite of algorithms using data about millions of customer ratings of thousands of movies. Most of these algorithms are based on the premise that similar viewing patterns represent similar user tastes. This suite of algorithms, called CineMatch, instructs Netflix's servers to process information from its databases to determine which movies a customer is likely to enjoy. The algorithm takes into account many factors about the films themselves, the customers' ratings, and the combined ratings of all Netflix users.

The company estimates that a whopping 75 percent of viewer activity is driven by recommendations. According to Netflix, these predictions were valid around 75 percent of the time, and half of Netflix users who rented CineMatch-recommended movies gave them a five-star rating.

Are Netflix customers being manipulated into seeing what Netflix wants them to see?

Business Applications of Association Rules

In sales and marketing, association rules are used for cross-marketing and cross-selling, catalog design, e-commerce site design, online advertising optimization, product pricing, and sales/promotion configurations.
In retail environments, they can be used for store design. Strongly associated items can be kept close together for customer convenience. Or they could be placed far from each other so that the customer has to walk the aisles and, by doing so, is potentially exposed to other items.
In medicine, this technique can be used to find relationships between symptoms and illnesses; diagnosis and patient characteristics/treatments; genes and their functions; etc.

What are Association Rules

If a customer buys milk, he may also buy cereal.
If a customer buys a tablet computer, then he may buy a case.
Association rules use two basic criteria: support and confidence.
Not all association rules are interesting or useful.
The goal is to find the association rules that satisfy a user-specified minimum support and a user-specified minimum confidence.
The number of association rules to act on depends upon business need. Implementing every rule in business will require some cost and effort, with some potential for gains. The strongest rules, with the highest support and confidence rates, should be used first, and the others should be progressively implemented later.

Representing Association Rules: Support and Confidence

These are the constraint measures for identifying which rules to keep and which ones to discard.

Support(X → Y) = P(X ∪ Y) = (number of transactions in which X and Y both appear) / (total number of transactions)
Confidence(X → Y) = P(X ∪ Y) / P(X)

Algorithms for Association Rules

There are a large number of algorithms available for generating association rules. The most popular are Apriori, Eclat, and FP-Growth, along with various derivatives and hybrids of the three.
All the algorithms help identify the frequent itemsets, which are then converted to association rules.

Apriori Algorithm

This is the most popular algorithm used for association rule mining.
The objective is to find subsets that are common to at least a minimum number of the itemsets.
The Apriori property is a downward closure property, which means that every subset of a frequent itemset is also a frequent itemset.
If (A, B, C, D) is a frequent itemset, then any subset such as (A, B, C) or (B, D) is also a frequent itemset.

Association Rules Exercise

There are six products: Milk, Bread, Butter, Eggs, Cookies, and Ketchup.
The objective is to use this transaction data to find associations between products, i.e. which products sell together often.

Transactions List

1    Milk, Eggs, Bread, Butter
2    Milk, Butter, Eggs, Ketchup
3    Bread, Butter, Ketchup
4    Milk, Bread, Butter
5    Bread, Butter, Cookies
6    Milk, Bread, Butter, Cookies
7    Milk, Cookies
8    Milk, Bread, Butter
9    Bread, Butter, Eggs, Cookies
10   Milk, Butter, Bread
11   Milk, Bread, Butter
12   Milk, Bread, Cookies, Ketchup

Rules

Support = 33%
Confidence level = 50%

The support level means that only itemsets occurring in at least 33 percent of all transactions are kept.
The confidence level means that, within those itemsets, rules of the form X → Y are kept only if there is at least a 50 percent chance of Y occurring given that X occurs.

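The two measures can be checked directly against the exercise data. The following is a minimal sketch, assuming the twelve transactions are encoded as Python sets; the names `TRANSACTIONS`, `support`, and `confidence` are illustrative and do not come from the slides.

```python
# The twelve transactions from the exercise, one set of products per basket.
TRANSACTIONS = [
    {"Milk", "Eggs", "Bread", "Butter"},
    {"Milk", "Butter", "Eggs", "Ketchup"},
    {"Bread", "Butter", "Ketchup"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Cookies"},
    {"Milk", "Bread", "Butter", "Cookies"},
    {"Milk", "Cookies"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Eggs", "Cookies"},
    {"Milk", "Butter", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cookies", "Ketchup"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(x, y, transactions=TRANSACTIONS):
    """Support of X and Y together, divided by the support of X alone."""
    x, y = set(x), set(y)
    return support(x | y, transactions) / support(x, transactions)


milk = support({"Milk"})
bread_butter = support({"Bread", "Butter"})
rule = confidence({"Milk", "Bread"}, {"Butter"})

print(f"Support(Milk)                     = {milk:.2%}")          # 75.00%
print(f"Support(Bread, Butter)            = {bread_butter:.2%}")  # 75.00%
print(f"Confidence(Milk, Bread -> Butter) = {rule:.2%}")          # 85.71%
```

Milk appears in 9 of the 12 baskets, so its support is 75 percent. Milk and Bread appear together 7 times, and 6 of those baskets also contain Butter, which gives the rule a confidence of 6/7.
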
Steps: Association Rules

Rules: Support = 33%, Confidence level = 50%

Step 1: Calculate support for single items

Item       Frequency   % occurred (Freq/12)
Milk       9           75%
Bread      10          83%
Eggs       3           25%
Butter     10          83%
Ketchup    3           25%
Cookies    5           42%

Eggs and Ketchup fall below the 33% support level, so only Milk, Bread, Butter, and Cookies are carried forward.

Step 2: Calculate support for 2 items

Itemset            Frequency   % occurred (Freq/12)
Milk, Bread        7           58.33%
Milk, Butter       7           58.33%
Milk, Cookies      3           25.00%
Bread, Butter      9           75.00%
Bread, Cookies     4           33.33%
Butter, Cookies    3           25.00%

Milk, Cookies and Butter, Cookies fall below the support level and are dropped.

Step 3: Calculate support for 3 items

Itemset                   Frequency   % occurred (Freq/12)
Milk, Bread, Butter       6           50.00%
Milk, Bread, Cookies      2           16.67%
Bread, Butter, Cookies    3           25.00%

Only Milk, Bread, Butter meets the support level. There is no room to create a 4-item itemset for this support level.

Step 4: Association rules for Milk, Bread, Butter

Rule (X → Y)             Times for X   Times for X,Y   S = Times for X,Y / 12   C = Times for X,Y / Times for X
Bread, Butter → Milk     9             6               50%                      67%
Milk, Bread → Butter     7             6               50%                      86%
Milk, Butter → Bread     7             6               50%                      86%
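
The four steps above can be sketched as a level-wise search. This is a simplified illustration of the Apriori idea on the exercise data, not a production implementation; the names `frequent_itemsets`, `generate_rules`, `MIN_SUPPORT`, and `MIN_CONFIDENCE` are assumptions made for the example.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Milk", "Eggs", "Bread", "Butter"},
    {"Milk", "Butter", "Eggs", "Ketchup"},
    {"Bread", "Butter", "Ketchup"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Cookies"},
    {"Milk", "Bread", "Butter", "Cookies"},
    {"Milk", "Cookies"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Eggs", "Cookies"},
    {"Milk", "Butter", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cookies", "Ketchup"},
]
MIN_SUPPORT = 0.33      # thresholds taken from the exercise
MIN_CONFIDENCE = 0.50


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for basket in TRANSACTIONS if itemset <= basket)


def frequent_itemsets(min_support=MIN_SUPPORT):
    """Return one {itemset: count} dict per itemset size (1, 2, 3, ...)."""
    n = len(TRANSACTIONS)
    items = sorted(set().union(*TRANSACTIONS))
    current = [frozenset([i]) for i in items]
    current = [s for s in current if count(s) / n >= min_support]
    levels, k = [], 1
    while current:
        levels.append({s: count(s) for s in current})
        k += 1
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if count(c) / n >= min_support]
    return levels


def generate_rules(levels, min_confidence=MIN_CONFIDENCE):
    """Return (X, Y, support, confidence) for rules X -> Y that pass."""
    n = len(TRANSACTIONS)
    counts = {s: c for level in levels for s, c in level.items()}
    rules = []
    for itemset, both in counts.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = both / counts[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((sorted(lhs), sorted(itemset - lhs), both / n, conf))
    return rules


levels = frequent_itemsets()
for x, y, s, c in generate_rules(levels):
    print(f"{', '.join(x)} -> {', '.join(y)}  S={s:.0%}  C={c:.0%}")
```

One difference from the tables is worth noting: the Step 3 table counts all three 3-item candidates, whereas this sketch discards Milk, Bread, Cookies and Bread, Butter, Cookies before counting, because each contains a 2-item subset that was already below the support level. The sketch also generates rules from every frequent itemset, so its output includes the three rules of Step 4 along with rules built from the frequent pairs.
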
