Building Model with Unsupervised Techniques
Summary
This document is a presentation on building models with unsupervised techniques in machine learning. It covers topics like unsupervised learning definition, clustering methods (K-means, hierarchical, fuzzy), and the Apriori algorithm for association rule learning. The presentation provides high-level explanations and examples.
Full Transcript
Building Model with Unsupervised Techniques
ASST. PROF. SALWA OSAMA

Agenda
◦ Unsupervised - Definition
◦ Clustering
◦ K-Means
◦ Association Rules

Unsupervised - Definition
Unsupervised Learning is a machine learning technique in which the user does not need to supervise the model. Instead, the model works on its own to discover patterns and information that were previously undetected. It mainly deals with unlabeled data.
Unsupervised learning algorithms are used to:
◦ Perform more complex processing tasks compared to supervised learning.
◦ Analyze and cluster unlabeled datasets.
◦ Discover hidden patterns or data groupings without the need for human intervention.
◦ Discover similarities and differences in information, which makes them an ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.

Unsupervised Learning Tasks
◦ Find groups or clusters
◦ Reduce dimensionality
◦ Association mining
◦ Segmentation
◦ Anomaly detection

Why Unsupervised Learning?
When presented with data, an unsupervised machine will search for similarities between the data items (for example, images) and separate them into individual groups, attaching its own labels to each group.

Clustering
How do I group these documents by topic? How do I group my customers by purchase patterns?
Sort items into groups by similarity:
◦ Items in a cluster are more similar to each other than they are to items in other clusters.
◦ You need to define the properties that characterize "similarity", or its "inverse", distance.
Clustering is not a predictive method; it finds similarities and relationships.
Our example: K-means clustering.

Clustering - Types
There are different types of clustering you can utilize:
Exclusive (partitioning)
◦ In this clustering method, data are grouped in such a way that each data point can belong to one cluster only.
◦ Example: K-means
Agglomerative
◦ In this clustering technique, every data point starts as its own cluster. Iterative unions between the two nearest clusters reduce the number of clusters.
◦ Example: Hierarchical clustering
Overlapping
◦ In this technique, fuzzy sets are used to cluster the data. Each point may belong to two or more clusters with separate degrees of membership.
◦ Here, each data point is associated with an appropriate membership value.
◦ Example: Fuzzy C-Means
Probabilistic
◦ This technique uses probability distributions to create the clusters.
◦ Example: the keywords "man's shoe", "women's shoe", "women's glove" and "man's glove" can be clustered into two categories, either "shoe" and "glove" or "man" and "women".

K-Means Clustering - What is it?
Used for clustering numerical data, usually a set of measurements about objects of interest.
Input: numerical data. There must be a distance metric defined over the variable space, e.g. Euclidean distance.
Output: the center (centroid) of each discovered cluster, and the assignment of each input datum to a cluster.
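To make this input/output contract concrete, here is a minimal sketch of running K-means on numerical data. It assumes the scikit-learn implementation (the slides do not name a library), and the measurements are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Input: purely numerical measurements, one row per object, one column per
# variable (the values here are invented for illustration).
X = np.array([
    [1.0, 2.0],
    [1.5, 1.8],
    [1.0, 0.6],
    [8.0, 8.0],
    [9.0, 11.0],
    [8.5, 9.5],
])

# K-means with K = 2; distances are ordinary Euclidean distances.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Output: the center (centroid) of each discovered cluster ...
print(kmeans.cluster_centers_)
# ... and the assignment of each input datum to a cluster.
print(kmeans.labels_)
```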
Use Cases
Often an exploratory technique:
◦ Discover structure in the data
◦ Summarize the properties of each cluster
Sometimes a prelude to classification:
◦ "Discovering the classes"
Examples:
◦ The height, weight and average lifespan of animals
◦ Household income, yearly purchase amount in dollars, and number of household members of customer households

Use-Case Example - On-line Retailer
LTV: Lifetime Customer Value

K-means Clustering
K-means performs the division of objects into clusters that share similarities and are dissimilar to the objects belonging to other clusters.
The term 'K' is a number: you need to tell the system how many clusters to create. For example, K = 2 refers to two clusters. There is a way of finding the best, or optimum, value of K for a given data set.

K-means Clustering - Example
The first step in k-means clustering is the random allocation of two centroids (as K = 2). Two points are assigned as centroids. Note that the points can be anywhere, since they are chosen at random; they are called centroids, but initially they are not the central points of the data set.
The next step is to determine the distance of each data point from the randomly assigned centroids. For every point, the distance is measured from both centroids, and the point is assigned to the centroid to which it is closer. You can see the data points attached to the centroids, represented here in blue and yellow.
The next step is to determine the actual centroid for each of these two clusters. The original, randomly allocated centroid is repositioned to the actual centroid of its cluster.
This process of calculating distances and repositioning the centroids continues until we obtain our final clusters; then the centroid repositioning stops.

Distance Measures
K-means clustering supports various kinds of distance measures, such as:
◦ Euclidean distance measure
◦ Manhattan distance measure
◦ Squared Euclidean distance measure
◦ Cosine distance measure
Euclidean distance measure: the most common case is determining the distance between two points. If we have a point P and a point Q, the Euclidean distance is an ordinary straight line, the distance between the two points in Euclidean space.
Manhattan distance measure: the Manhattan distance is the simple sum of the horizontal and vertical components, i.e. the sum of the absolute differences of the coordinates.
Cosine distance measure: in this case, we take the angle between the two vectors formed by joining each point to the origin.

K-means Clustering - How It Works
The within-sum-of-squares (WSS) is used as a measure to find the optimum number of clusters that can be formed for a given data set. WSS is defined as the sum of the squared distances between each member of a cluster and its centroid. The WSS is measured for each value of K, and the value of K at which the WSS stops decreasing sharply is taken as the optimum.
Now, we draw a curve of WSS against the number of clusters. You can see that there is only a very gradual change in the value of WSS as K increases beyond 2, so you can take the elbow point as the optimal value of K: it should be two, three, or at most four. Beyond that, increasing the number of clusters does not dramatically change the WSS; it stabilizes.
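As a rough sketch of how the elbow is found in practice, the loop below fits K-means for several values of K and prints the WSS for each. It assumes scikit-learn, whose `inertia_` attribute is exactly the within-cluster sum of squares; the data is made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of 2-D points, invented for illustration.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# Measure the WSS (scikit-learn calls it inertia_) for K = 1..6.
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    print(f"K={k}  WSS={model.inertia_:.1f}")

# Plotting WSS against K and reading off the elbow (where the curve flattens)
# gives the optimum K; for this data the elbow is at K = 2.
```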
K-means Clustering - Step by Step
Step 1: The Elbow method is the best way to find the number of clusters.
Step 2: Randomly initialize two points, called the cluster centroids.
Step 3: The distance of each data point from the centroids is measured, and each data point is assigned to the centroid closest to it.
Step 4: Compute the actual centroid of the data points in the first group.
Step 5: Reposition the random centroid to the actual centroid.
Step 6: Compute the actual centroid of the data points in the second group.
Step 7: Reposition the random centroid to the actual centroid.
Step 8: Once the clusters become static, the k-means algorithm is said to have converged. The final clusters have centroids c1 and c2.

K-Means Clustering - Reasons to Choose (+) and Cautions (-)
(See the interactive demo "Visualizing K-Means Clustering" at naftaliharris.com.)
Reasons to Choose (+):
◦ Easy to implement
◦ Easy to assign new data to an existing cluster: which is the nearest cluster center?
◦ Concise output: the coordinates of the K cluster centers
Cautions (-):
◦ Doesn't handle categorical variables
◦ Sensitive to initialization (the first guess)
◦ Variables should all be measured on similar or compatible scales; not scale-invariant!
◦ K (the number of clusters) must be known or decided a priori; a wrong guess can give poor results
◦ Tends to produce "round", equi-sized clusters, which is not always desirable

Apriori Algorithm
The algorithm is named Apriori because it uses prior knowledge of frequent-itemset properties. We apply an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.
Apriori property: all subsets of a frequent itemset must themselves be frequent; equivalently, if an itemset is infrequent, all of its supersets will be infrequent.
Consider the following dataset; we will find its frequent itemsets and generate association rules from them.
Step 1: K = 1, count the support of each single item.
Step 2: K = 2, build candidate pairs from the frequent single items.
Step 3: likewise for K = 3.

Association Rule Measures
There are six transactions in total, with various different purchases that happened in your cafeteria. We can use three core measures from Association Rule Learning: Support, Confidence, and Lift.

Support
Support is the plain, basic probability of an event occurring. It is measured by the proportion of transactions in which an itemset appears; put another way, Support(A) is the number of transactions that include A divided by the total number of transactions. If we analyze the transaction table above, the support for cookie is 3 out of 6: out of a total of 6 transactions, purchases containing cookies occurred 3 times, or 50%. Support can also be computed for several items at once: the support for cookie and cake together is 2 out of 6.

Confidence
The confidence of a consequent event given an antecedent event is a conditional probability: the probability that the consequent item is purchased given that the antecedent item is already in the basket. For a rule {X -> Y}, it is measured by dividing the proportion of transactions containing both X and Y by the proportion of transactions containing X. From the transaction table above, the confidence of {cookie -> cake} is support(cookie and cake) / support(cookie) = (2/6) / (3/6) = 2/3.
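The support and confidence calculations can be sketched in a few lines of Python. The transaction table itself is not reproduced in this transcript, so the six baskets below are invented, but they are chosen to match the figures quoted above: support(cookie) = 3/6 and support(cookie, cake) = 2/6.

```python
# Six illustrative cafeteria baskets (invented; chosen to match the quoted
# support values: cookie in 3 of 6 baskets, cookie and cake together in 2 of 6).
transactions = [
    {"cookie", "cake", "tea"},
    {"cookie", "cake"},
    {"cookie", "coffee"},
    {"coffee", "tea"},
    {"tea", "sandwich"},
    {"coffee", "sandwich"},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"cookie"}))               # 0.5      (3 out of 6)
print(support({"cookie", "cake"}))       # 0.333... (2 out of 6)
print(confidence({"cookie"}, {"cake"}))  # 0.666... = (2/6) / (3/6)
```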
Lift
Lift is the observed-to-expected ratio (abbreviated o/e). Lift measures how likely an item is to be purchased when another item is purchased, while controlling for how popular both items are. It is calculated by dividing the probability of both items occurring together by the product of the probabilities of the individual items, as if there were no association between them. A lift of 1 means that the two items are independent, with no association between them; any value higher than 1 indicates an association, and the higher the value, the stronger the association. Looking at the table again, the lift of {cookie -> cake} is 2, which implies that there is indeed an association between cookies and cakes.

Now that we have covered these core concepts, we can look at an algorithm that generates itemsets from transactional data, which are then used to calculate these association rules.
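As an illustration (not the implementation from the presentation), here is a compact level-wise Apriori sketch over the same invented baskets. It enumerates all itemsets with support of at least 2/6 and then computes the lift of {cookie -> cake}, which comes out to 2, the value quoted above:

```python
# Same six invented cafeteria baskets as in the previous sketch.
transactions = [
    {"cookie", "cake", "tea"},
    {"cookie", "cake"},
    {"cookie", "coffee"},
    {"coffee", "tea"},
    {"tea", "sandwich"},
    {"coffee", "sandwich"},
]

def support(items):
    return sum(set(items) <= basket for basket in transactions) / len(transactions)

def apriori(min_support=2 / 6):
    """Level-wise search: frequent k-itemsets are used to build (k+1)-candidates."""
    items = sorted({item for basket in transactions for item in basket})
    frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        # Candidates of size k are unions of frequent (k-1)-itemsets; the support
        # check below prunes the infrequent ones before the next level is built.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

for itemset in apriori():
    print(sorted(itemset), round(support(itemset), 2))

# Lift of {cookie -> cake}: observed co-occurrence over what independence predicts.
lift = support({"cookie", "cake"}) / (support({"cookie"}) * support({"cake"}))
print(lift)  # 2.0, matching the value quoted in the text
```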