Association Rule Mining CHAP 5 PDF

Document Details

Jiawei Han and Micheline Kamber

Tags

association rule mining, data mining, frequent patterns, market basket analysis

Summary

This document provides an overview of association rule mining. It covers the concepts and techniques involved in discovering relationships between items in a dataset, including frequent patterns, market basket analysis, the Apriori algorithm, and FP-Growth, illustrated with worked examples.

Full Transcript

Module 5: Mining Frequent Patterns and Associations
Textbook: Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber

What Is a Frequent Pattern?
A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.

Why is frequent pattern mining important?
Frequent patterns are an important property of datasets and the foundation for many essential data mining tasks:
- Association, correlation, and causality analysis
- Sequential patterns
- Pattern analysis in multimedia and time-series data
- Classification: discriminative, frequent-pattern-based analysis
- Cluster analysis: frequent-pattern-based clustering

What is Association Rule Mining?
Association rule mining is a technique used in data mining to discover relationships between items in a dataset. The goal is to identify sets of items that frequently occur together.

Market-Basket Transactions
TID  Items
1    Bread, Milk
2    Bread, Curd, Butter, Eggs
3    Milk, Curd, Butter, Coke
4    Bread, Milk, Curd, Butter
5    Bread, Milk, Curd, Coke

Example association rules over these transactions:
{Curd} -> {Butter}
{Milk, Bread} -> {Eggs, Coke}
{Butter, Bread} -> {Milk}

Application of Association Rule Mining: Market Basket Analysis
Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns.
Input: a list of purchases per purchaser. The aim is to identify purchase patterns:
- What items tend to be purchased together?
- What items are purchased sequentially?
Market basket analysis also categorizes customer purchase behaviour. Benefits include:
- marketing layouts and catalogs
- selecting products for promotion
- space allocation
- product placement

How does Market Basket Analysis work?
Market basket analysis is modelled on association rule mining. Association rules are used to predict the likelihood of products being purchased together, i.e., the IF {...} THEN {...} construct. For example, IF a customer buys bread, THEN they are likely to buy butter as well. Association rules are usually represented as {Bread} -> {Butter}.

Definition: Frequent Itemset
- Itemset: a collection of one or more items, e.g. {Milk, Bread, Curd}
- k-itemset: an itemset that contains k items
- Support count σ (a number): the frequency of occurrence of an itemset, e.g. σ({Milk, Bread, Curd}) = 2 in the transactions above
- Support s (a fraction): the fraction of transactions that contain the itemset, e.g. s({Milk, Bread, Curd}) = 2/5
- Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold

Definition: Association Rule
An association rule is an expression of the form X -> Y, where X and Y are itemsets. Example: {Milk, Curd} -> {Butter}.

Example: association rules for the products Milk, Cheese, Bread, Eggs. Possible associations include:
1. If customers purchase milk they also purchase bread: {milk} -> {bread}
2. If customers purchase bread they also purchase milk: {bread} -> {milk}
3. If customers purchase milk and eggs they also purchase cheese and bread: {milk, eggs} -> {cheese, bread}
4. If customers purchase milk, cheese, and eggs they also purchase bread: {milk, cheese, eggs} -> {bread}
These rules are based on a set of customer transactions. The implication means co-occurrence, not causality!
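The support count and support fraction defined above can be computed directly from a transaction list. The following is a minimal sketch in Python; the helper names are illustrative and not part of the original slides, and the transactions are the market-basket example above.

```python
# A minimal sketch of support counting (function names are illustrative).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Curd", "Butter", "Eggs"},
    {"Milk", "Curd", "Butter", "Coke"},
    {"Bread", "Milk", "Curd", "Butter"},
    {"Bread", "Milk", "Curd", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Curd"}, transactions))  # 2
print(support({"Milk", "Bread", "Curd"}, transactions))        # 0.4
```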
Definition: Association Rule Confidence
Each rule has an associated confidence: the conditional probability of the association, i.e., the probability that customers who purchase one set of items then purchase another set of items. For example, if there were 10,000 recorded transactions containing milk, and 5,000 of those also contained bread, we would have 50% confidence in the rule {milk} -> {bread}. A minimum confidence constraint can be applied to the frequent itemsets to form rules.

Rule Evaluation Metrics
The metrics used in association rule mining for a rule X -> Y are:
- Support (s): the fraction of transactions that contain both X and Y, i.e. the number of transactions in which X and Y appear together divided by the total number of transactions
- Confidence (c): how often items in Y appear in transactions that contain X, i.e. the number of transactions containing both X and Y divided by the number of transactions containing X
Example, for {Milk, Curd} -> {Butter} over the market-basket transactions above:
s = σ({Milk, Curd, Butter}) / |T| = 2/5
c = σ({Milk, Curd, Butter}) / σ({Milk, Curd}) = 2/3

Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup threshold
- confidence ≥ minconf threshold

Finding the large itemsets by brute force:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
This is computationally prohibitive!

Association Rules
An association rule is an implication of the form X -> Y, where X, Y ⊂ I and X ∩ Y = ∅.
Example: {Milk, Biscuit} -> {FruitJuice} (s = 0.4, c = 0.67)

Mining Association Rules
Example rules over the market-basket transactions:
{Milk, Curd} -> {Butter} (s = 0.4, c = 0.67)
{Milk, Butter} -> {Curd} (s = 0.4, c = 1.0)
{Curd, Butter} -> {Milk} (s = 0.4, c = 0.67)
{Butter} -> {Milk, Curd} (s = 0.4, c = 0.67)
{Curd} -> {Milk, Butter} (s = 0.4, c = 0.5)
{Milk} -> {Curd, Butter} (s = 0.4, c = 0.5)
Observations:
- All of the above rules are binary partitions of the same itemset: {Milk, Curd, Butter}.
- Rules originating from the same itemset have identical support but can have different confidence.
- Thus, we may decouple the support and confidence requirements.

Mining Association Rules: Two-Step Approach
1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive.

Generating Association Rules: The Apriori Algorithm
In data mining, the Apriori algorithm is the classic algorithm for generating association rules, which reflect the inner relationships of a dataset. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties.

Apriori Algorithm Method:
- Let k = 1
- Generate frequent itemsets of length 1
- Repeat until no new frequent itemsets are identified:
  - Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  - Prune candidate itemsets containing subsets of length k that are infrequent
  - Count the support of each candidate by scanning the DB
  - Eliminate candidates that are infrequent, leaving only those that are frequent

Key concepts: candidate itemset, support, confidence, frequent itemsets.
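As a rough illustration of the candidate-generation loop above, here is a minimal Apriori sketch in Python. The function name and the naive pairwise join are an illustrative implementation under stated assumptions, not code from the textbook.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    transactions: list of sets of items; minsup: minimum support as a fraction.
    A minimal, unoptimized sketch of the level-wise Apriori loop.
    """
    min_count = minsup * len(transactions)
    # Frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: build (k+1)-candidates from pairs of frequent k-itemsets
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Count the support of surviving candidates with one scan of the DB
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result
```

On the five market-basket transactions above with minsup = 0.4, this sketch should return, among others, the frequent itemset {Milk, Curd, Butter} with support count 2.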
Apriori Properties: (1) antimonotonicity and (2) downward closure.

Apriori principle:
- If an itemset is frequent, then all of its subsets must also be frequent (downward closure).
- The Apriori principle holds due to the following property of the support measure: the support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.
These two properties are used to reduce the combinatorial search space for association rule generation.

Downward closure property of frequent patterns
All subsets of any frequent itemset must also be frequent.
Example: If {Tea, Biscuit, Coffee} is a frequent itemset, then all of the following itemsets are frequent: {Tea}, {Biscuit}, {Coffee}, {Tea, Biscuit}, {Tea, Coffee}, {Biscuit, Coffee}.

Apriori pruning principle
If an itemset is infrequent, its supersets should not be generated when searching for frequent itemsets.
Example: If {Tea, Biscuit} is a frequent itemset and {Coffee} is not, then only {Tea}, {Biscuit}, and {Tea, Biscuit} are frequent; supersets of {Coffee} such as {Tea, Coffee} and {Biscuit, Coffee} need not be generated.

Key concepts for association rules:
Support(A, B) = (no. of tuples having both A and B) / (total no. of tuples)
Confidence(A -> B) = (no. of tuples having both A and B) / (no. of tuples having A) = Support(A, B) / Support(A)

Frequent Itemset Generation
[Figure: the itemset lattice over items A, B, C, D, E, from the null set through all 1-itemsets, 2-itemsets, 3-itemsets, and 4-itemsets up to ABCDE.] Given n items, there are 2^n possible candidate itemsets.

Steps in the Apriori Algorithm
1. Generate frequent one-itemset patterns.
2. Generate frequent two-itemset and larger patterns.
3. Use the antimonotonicity property to prune candidates.
4. Obtain the maximal frequent itemsets at the end.
5. Generate association rules.

Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f -> L – f satisfies the minimum confidence requirement.
If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC -> D, ABD -> C, ACD -> B, BCD -> A, A -> BCD, B -> ACD, C -> ABD, D -> ABC, AB -> CD, AC -> BD, AD -> BC, BC -> AD, BD -> AC, CD -> AB
If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L -> ∅ and ∅ -> L).

Rule Generation for the Apriori Algorithm
[Figure: the lattice of rules derived from {A,B,C,D}, from ABCD -> {} down to A -> BCD, B -> ACD, C -> ABD, D -> ABC.] If a rule in the lattice (e.g. BCD -> A) has low confidence, all rules below it that move further items from the antecedent to the consequent (CD -> AB, BD -> AC, BC -> AD, D -> ABC, C -> ABD, B -> ACD) are pruned.
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent: join(CD -> AB, BD -> AC) produces the candidate rule D -> ABC. Prune the rule D -> ABC if its subset AD -> BC does not have high confidence.

Effect of Support Distribution
How should the appropriate minsup threshold be set?
- If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products).
- If minsup is set too low, mining becomes computationally expensive and the number of itemsets is very large.
Using a single minimum support threshold may not be effective.

Working of the Apriori Algorithm (min support = 50%, i.e. support count ≥ 2)
Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D and count each candidate 1-itemset:
C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1 (frequent 1-itemsets): {1}:2, {2}:3, {3}:3, {5}:3

Generate C2 from L1 and scan D:
C2: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2 (frequent 2-itemsets): {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

Generate C3 from L2 and scan D:
C3: {2 3 5}
L3 (frequent 3-itemsets): {2 3 5}:2
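A minimal sketch of this rule-generation step in Python follows; the function name is illustrative, and the small support helper is repeated here so the snippet stands alone.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def generate_rules(freq_itemset, transactions, minconf):
    """Enumerate rules X -> (L - X) from a frequent itemset L and keep those
    whose confidence meets minconf. A sketch, not an optimized implementation."""
    L = frozenset(freq_itemset)
    rules = []
    for size in range(1, len(L)):               # non-empty proper subsets only
        for antecedent in combinations(L, size):
            X = frozenset(antecedent)
            conf = support_count(L, transactions) / support_count(X, transactions)
            if conf >= minconf:
                rules.append((set(X), set(L - X), conf))
    return rules
```

Applied to the four-transaction database above and the frequent itemset {2, 3, 5} with minconf = 0.7, this should keep only {2,3} -> {5} and {3,5} -> {2} (confidence 100%), matching the worked table that follows.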
Working of the Apriori Algorithm (continued)
Any subset of a frequent itemset is also a frequent itemset.

Generate association rules from the frequent itemset {2, 3, 5}.
Number of association rules = 2^k – 2 = 2^3 – 2 = 6. Minimum confidence = 70%.
For a rule A -> B, confidence = support count of the whole set / support count of A:
Sr. No.  Association Rule   Support count of set  Support count of A  Confidence
1        {2,3} -> {5}       2                     2                   2/2 = 100%
2        {5} -> {2,3}       2                     3                   2/3 = 67%
3        {3,5} -> {2}       2                     2                   2/2 = 100%
4        {2} -> {3,5}       2                     3                   2/3 = 67%
5        {2,5} -> {3}       2                     3                   2/3 = 67%
6        {3} -> {2,5}       2                     3                   2/3 = 67%
The rules {2,3,5} -> {Φ} and {Φ} -> {2,3,5} are not counted, since Φ cannot be associated with anything.
The rules {2,3} -> {5} and {3,5} -> {2} are strong, as they satisfy the minimum confidence of 70%.

Exercise: using the market-basket transactions below with min_sup = 2 and minimum confidence = 50%, generate the association rules from the frequent itemset {b, c, e} (number of rules = 2^3 – 2 = 6) and fill in their support counts and confidences.
TID  Items
1    Bread, Milk
2    Bread, Curd, Butter, Eggs
3    Milk, Curd, Butter, Coke
4    Bread, Milk, Curd, Butter
5    Bread, Milk, Curd, Coke

Example. Note: in transaction T500, consider the count of "O" to be 2.

Bottleneck of Frequent-Pattern Mining
The Apriori algorithm is an expensive way to find support, since every count requires a pass over the whole database. Mining long patterns needs many scanning passes and generates a large number of candidates. The bottleneck is candidate generation and testing. Can we avoid candidate generation? Several improvements to Apriori address this cost:

1. Hashing-based technique
The major aim is to reduce the size of C2. It is therefore essential that the hash table is large enough that collisions are rare.

2. Transaction reduction (reducing the number of transactions scanned in future iterations)
A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets. Such a transaction can therefore be marked or removed from further consideration, because subsequent scans of the database for j-itemsets, where j > k, will not require it.

3. Partitioning
The set of transactions is divided into a number of disjoint subsets, and each partition is searched for frequent itemsets; these are called local frequent itemsets.
Phase 1: divide the n transactions into m partitions, find the frequent itemsets in each partition, and combine all local frequent itemsets to form the candidate itemsets.
Phase 2: scan the full database to find the global frequent itemsets. A sketch of this two-phase idea is shown after this list.

4. Sampling (mining on a subset of the given data)
The basic idea of the sampling approach is to pick a random sample S of the given data D and then search for frequent itemsets in S instead of D. Because we search in S rather than D, it is possible to miss some of the global frequent itemsets.
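The following is a minimal sketch of the two-phase partitioning approach in Python. It reuses the apriori and support_count helpers sketched earlier; the function name and partition layout are illustrative assumptions.

```python
def partitioned_frequent_itemsets(transactions, minsup, num_partitions):
    """Two-phase partitioning sketch: an itemset frequent in the whole DB must be
    frequent in at least one partition, so the local results form the candidates."""
    n = len(transactions)
    size = -(-n // num_partitions)          # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, n, size)]

    # Phase 1: mine each partition with the same minsup fraction and pool the
    # local frequent itemsets as global candidate itemsets.
    candidates = set()
    for part in partitions:
        candidates |= set(apriori(part, minsup))   # keys of the returned dict

    # Phase 2: one scan of the full database keeps the globally frequent ones.
    min_count = minsup * n
    result = {}
    for c in candidates:
        count = support_count(c, transactions)
        if count >= min_count:
            result[c] = count
    return result
```

The design point is that Phase 1 can over-generate but never misses a globally frequent itemset, so Phase 2 needs only a single verification scan of the full database.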
FP-Growth
FP-Growth improves on Apriori to a large extent: frequent itemset mining is possible without candidate generation, and only two scans of the database are needed. But how?

Mining Frequent Patterns Without Candidate Generation
- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure: highly condensed, yet complete for frequent pattern mining, avoiding costly database scans.
- Develop an efficient FP-tree-based frequent pattern mining method: a divide-and-conquer methodology that decomposes mining tasks into smaller ones and avoids candidate generation (sub-database tests only).

FP-Growth is simply a two-step procedure:
Step 1: Build a compact data structure called the FP-tree, using two passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.

Construct the FP-tree: Two Steps
1. Scan the transaction DB a first time, find the frequent items (single-item patterns) and order them into a list L in frequency-descending order, e.g. L = {f:4, c:4, a:3, b:3, m:3, p:3} in the format (item-name : support).
2. For each transaction, order its frequent items according to the order in L; scan the DB a second time and construct the FP-tree by inserting each frequency-ordered transaction into it.

Construct an FP-tree from a Transaction DB (min_support = 3)
TID  Items bought                 (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200  {a, b, c, f, l, m, o}        {f, c, a, b, m}
300  {b, f, h, j, o}              {f, b}
400  {b, c, k, s, p}              {c, b, p}
500  {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

Steps:
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns).
2. Order the frequent items in frequency-descending order (header table: f:4, c:4, a:3, b:3, m:3, p:3, each with a head pointer into the tree).
3. Scan the DB again and construct the FP-tree.
[Figure: the resulting FP-tree. From the root {}: f:4 - c:3 - a:3 - m:2 - p:2, with a branch b:1 - m:1 under a:3 and a branch b:1 under f:4; a second root branch c:1 - b:1 - p:1.]

Benefits of the FP-tree Structure
Completeness:
- never breaks a long pattern of any transaction
- preserves complete information for frequent pattern mining
Compactness:
- reduces irrelevant information: infrequent items are gone
- frequency-descending ordering: more frequent items are more likely to be shared
- never larger than the original database (if node-links and counts are not counted)
- Example: for the Connect-4 DB, the compression ratio can be over 100.

Mining Frequent Patterns Using the FP-tree
General idea (divide and conquer): recursively grow frequent pattern paths using the FP-tree.
Method:
- For each item, construct its conditional pattern base, and then its conditional FP-tree.
- Repeat the process on each newly created conditional FP-tree.
- Stop when the resulting FP-tree is empty or contains only a single path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern).

Step 1: From FP-tree to Conditional Pattern Base
Starting at the frequent-item header table of the FP-tree, traverse the tree by following the node-links of each frequent item, and accumulate all of the transformed prefix paths of that item to form its conditional pattern base.
Conditional pattern bases:
Item  Conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1

Step 2: Construct the Conditional FP-tree
For each pattern base, accumulate the count of each item in the base, then construct an FP-tree over the frequent items of the pattern base.
Example: the m-conditional pattern base is fca:2, fcab:1. The m-conditional FP-tree is the single path {} - f:3 - c:3 - a:3, and all frequent patterns concerning m are: m, fm, cm, am, fcm, fam, cam, fcam.

Mining Frequent Patterns by Creating Conditional Pattern Bases
Item  Conditional pattern base         Conditional FP-tree
p     {(fcam:2), (cb:1)}               {(c:3)} | p
m     {(fca:2), (fcab:1)}              {(f:3, c:3, a:3)} | m
b     {(fca:1), (f:1), (c:1)}          Empty
a     {(fc:3)}                         {(f:3, c:3)} | a
c     {(f:3)}                          {(f:3)} | c
f     Empty                            Empty
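A minimal sketch of FP-tree construction in Python; the class and field names are illustrative assumptions, and header node-links are kept as simple per-item lists rather than chained pointers.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # item name, or None for the root
        self.count = 0            # number of transactions sharing this prefix
        self.parent = parent
        self.children = {}        # item -> FPNode

def build_fp_tree(transactions, min_support):
    """Two passes: (1) count items and fix a frequency-descending order,
    (2) insert each transaction's ordered frequent items into the tree.
    min_support is an absolute support count."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = {item: [] for item in frequent}   # item -> list of nodes (node-links)
    for t in transactions:
        # Keep frequent items only, ordered by descending global frequency
        # (ties broken alphabetically, so the exact shape may differ from the figure).
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header
```

Run on the five transactions above with min_support = 3, this should produce the same shared prefix paths as the figure, up to the tie-break between f and c (both have support 4).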
Step 3: Recursively Mine the Conditional FP-tree
Starting from the m-conditional FP-tree ({} - f:3 - c:3 - a:3):
- the conditional pattern base of "am" is (fc:3), giving the am-conditional FP-tree {} - f:3 - c:3
- the conditional pattern base of "cm" is (f:3), giving the cm-conditional FP-tree {} - f:3
- the conditional pattern base of "cam" is (f:3), giving the cam-conditional FP-tree {} - f:3

Single FP-tree Path Generation
Suppose an FP-tree T has a single path P. The complete set of frequent patterns of T can be generated by enumerating all the combinations of the sub-paths of P. Example: the m-conditional FP-tree is the single path {} - f:3 - c:3 - a:3, so all frequent patterns concerning m are m, fm, cm, am, fcm, fam, cam, fcam.

Why Is Frequent Pattern Growth Fast?
Performance studies show that FP-Growth is an order of magnitude faster than Apriori and is also faster than tree-projection. Reasons:
- no candidate generation and no candidate test
- use of a compact data structure
- repeated database scans are eliminated
- the basic operations are counting and FP-tree building

FP-Growth Worked Example
Consider the transaction table with minimum support count = 2 and minimum confidence = 70%.
Step 1. Scan the DB and count each item: I1:6, I2:7, I3:6, I4:2, I5:2.
Step 2. Sort the frequent items in decreasing order of support count: I2:7, I1:6, I3:6, I4:2, I5:2.
Step 3. Scan the DB a second time, order the frequent items in each transaction according to this list, and build the FP-tree by inserting the ordered transactions one at a time:
{I2, I1, I5}, {I2, I4}, {I2, I3}, {I2, I1, I4}, {I1, I3}, {I2, I3}, {I1, I3}, {I2, I1, I3, I5}, {I2, I1, I3}
[Figure: the completed FP-tree. From the root: I2:7 with children I1:4, I3:2, and I4:1; I1:4 has children I3:2 (with child I5:1), I4:1, and I5:1; a second root branch I1:2 with child I3:2.]
To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links (I2:7, I1:6, I3:6, I4:2, I5:2).
FP-tree construction is complete. Now find the conditional pattern base and the conditional FP-tree for each item.

Construct the Conditional Pattern Base
Starting at the bottom of the frequent-item header table of the FP-tree, traverse the tree by following the node-links of each frequent item, and accumulate all of the transformed prefix paths of that item to form its conditional pattern base.
- I5: conditional pattern base {{I2, I1 : 1}, {I2, I1, I3 : 1}}; conditional FP-tree {I2:2, I1:2}
- I4: conditional pattern base {{I2, I1 : 1}, {I2 : 1}}; conditional FP-tree {I2:2}
- I3: conditional pattern base {{I2, I1 : 2}, {I2 : 2}, {I1 : 2}}; conditional FP-tree {I2:4, I1:2}, {I1:2}
- I1: conditional pattern base {{I2 : 4}}; conditional FP-tree {I2:4}

Frequent Patterns Generated
Item  Conditional pattern base           Conditional FP-tree     Frequent patterns generated
I5    {I2, I1 : 1}, {I2, I1, I3 : 1}     {I2:2, I1:2}            {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {I2, I1 : 1}, {I2 : 1}             {I2:2}                  {I2, I4: 2}
I3    {I2, I1 : 2}, {I2 : 2}, {I1 : 2}   {I2:4, I1:2}, {I1:2}    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {I2 : 4}                           {I2:4}                  {I2, I1: 4}
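A minimal sketch of the conditional-pattern-base step and the recursive mining loop in Python. It assumes the FPNode class and build_fp_tree function from the sketch above; the recursion is a simplified illustration of FP-growth, not the textbook's exact pseudocode.

```python
def conditional_pattern_base(item, header):
    """Prefix paths (as item lists) leading to each occurrence of `item`,
    each weighted by that occurrence's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Recursively mine frequent itemsets; returns {frozenset: support count}."""
    root, header = build_fp_tree(transactions, min_support)
    patterns = {}
    for item, nodes in header.items():
        support = sum(n.count for n in nodes)
        pattern = suffix | {item}
        patterns[pattern] = support
        # Build the conditional "database": each prefix path repeated `count`
        # times, then mine it recursively with the current pattern as suffix.
        cond_db = []
        for path, count in conditional_pattern_base(item, header):
            cond_db.extend([list(path)] * count)
        if cond_db:
            patterns.update(fp_growth(cond_db, min_support, pattern))
    return patterns
```

Applied to the nine ordered transactions listed above with min_support = 2, this should recover the patterns in the table, for example {I2, I1, I5} and {I2, I1, I3} with support 2.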
Example
A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.
(a) Find all frequent itemsets using FP-Growth.
(b) List all of the strong association rules.
Note: in transaction T500, consider the count of "O" to be 1.

Example: FP-tree Construction
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}
After reading TID = 1, the tree is null -> A:1 -> B:1.
After reading TID = 2, a second branch null -> B:1 -> C:1 -> D:1 is added alongside A:1 -> B:1.
[Figure: the completed FP-tree after all ten transactions. From the root: A:7 with children B:5 (which has children C:3 -> D:1 and D:1), C:1 -> D:1 -> E:1, and D:1 -> E:1; a second root branch B:3 -> C:3 with children D:1 and E:1. A header table with one pointer chain per item (A, B, C, D, E) is used to assist frequent itemset generation.]

FP-Growth on this tree
The conditional pattern base for D is P = {(A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1)}.
Recursively applying FP-growth on P gives the frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD.

Multiple-Level Association Rules
Items often form a hierarchy. Items at the lower levels are expected to have lower support. Rules regarding itemsets at the appropriate levels can be quite useful.

Multilevel Association Rule Mining
Multilevel association rule mining extends association rule mining to discover relationships between items at different levels of granularity. Two common extensions are multi-level association rules and multi-dimensional association rules.

Multiple-level association rules find relationships between items at different levels of granularity. Example hierarchy: Level 1 is Milk [support = 10%]; Level 2 contains 2% Milk [support = 6%] and Skim Milk [support = 4%]. With uniform support, min_sup = 5% is used at both levels; with reduced support, min_sup = 5% at level 1 and min_sup = 3% at level 2.

Multi-level Association: Uniform Support vs. Reduced Support
Uniform support: the same minimum support for all levels.
- Only one minimum support threshold; there is no need to examine itemsets containing any item whose ancestors do not have minimum support.
- However, lower-level items do not occur as frequently: if the support threshold is too high, low-level associations are missed; if it is too low, too many high-level associations are generated.
Reduced support: a reduced minimum support at lower levels. There are four search strategies:
- level-by-level independent
- level-cross filtering by k-itemset
- level-cross filtering by single item
- controlled level-cross filtering by single item

Multi-dimensional Association Rules
Single-dimensional rules: buys(X, "milk") => buys(X, "bread")
Multi-dimensional rules involve 2 or more dimensions or predicates:
- Inter-dimension association rules (no repeated predicates): age(X, "19-25") ∧ occupation(X, "student") => buys(X, "coke")
- Hybrid-dimension association rules (repeated predicates): age(X, "19-25") ∧ buys(X, "popcorn") => buys(X, "coke")
Categorical attributes have a finite number of possible values, with no order among the values. Quantitative attributes are numeric, with an implicit order.

THANK YOU
