Unit-04 Unsupervised Machine Learning PDF

Document Details

Uploaded by HottestNash

Noida Institute of Engineering and Technology

2024

Dr. Raju

Tags

unsupervised machine learning, artificial intelligence, data science, machine learning algorithms

Summary

This document contains lecture notes for a course unit on unsupervised machine learning. It covers various clustering algorithms, including K-Means, K-Modes and K-Medoids, along with related concepts. The material is presented to a B.Tech 3rd-semester class.

Full Transcript

Noida Institute of Engineering and Technology, Greater Noida
Artificial Intelligence & Machine Learning
Unit 4: Unsupervised Machine Learning
Dr. Raju, Assistant Professor & HoD, Department of CSE (AIML)
B.Tech 3rd Sem., Online & Offline (Sec A)

Faculty Introduction

Name: Dr. Raju
Qualification: Ph.D.
Experience: More than 9 years
Subjects taught: Neural Networks, DBMS, Object Oriented Programming, Computer Graphics, COA, Digital Image Processing, Computer Applications

Course Outcomes (CO)

After completion of this course, students will be able to:

CO1 (K3): Choose and apply the most suitable search algorithm for a given problem to find the goal state.
CO2 (K3): Comprehend and apply feature engineering and data visualization concepts.
CO3 (K5): Critically analyze the strengths and weaknesses of various regression and classification algorithms.
CO4 (K3): Develop approaches that incorporate appropriate clustering algorithms to solve a specific data clustering problem.
CO5 (K4): Analyze efficiency using ensemble learning techniques, probabilistic learning and reinforcement learning algorithms.

(The K values indicate the Bloom's Knowledge Level of each outcome.)

Syllabus

Unit I: Introduction to AI and problem-solving methods. Introduction to AI and intelligent agents, different approaches of AI, problem solving by searching techniques: uninformed search (BFS, DFS, iterative deepening, bidirectional search), informed search (heuristic search, Greedy Best First Search, A* search), local search algorithms (Hill Climbing and Simulated Annealing), adversarial search and game playing (minimax, alpha-beta pruning), constraint satisfaction problems.

Unit II: Machine Learning & Feature Engineering. Introduction to machine learning, types of machine learning, feature engineering: features and their types, handling missing data, dealing with categorical features, working with features: feature scaling, feature selection, feature extraction: Principal Component Analysis (PCA) algorithm.

Unit III: Supervised Learning. Regression and classification: types of regression (univariate, multivariate, polynomial), mean square error, R-square error, logistic regression; regularization: bias and variance, overfitting and underfitting, L1 and L2 regularization, regularized linear regression; decision trees (ID3, C4.5, CART), confusion matrix, k-fold cross-validation, K Nearest Neighbour, support vector machine.

Unit IV: Unsupervised Machine Learning. Introduction to clustering, types of clustering: K-means clustering, K-mode, K-medoid, hierarchical clustering, single-linkage, multiple linkage, AGNES and DIANA algorithms, Gaussian mixture models, density-based clustering, DBSCAN.

Unit V: Ensemble & Reinforcement Learning. Probabilistic learning: Bayesian learning, Naive Bayes classifier, Bayesian belief networks; ensemble learning: Random Forest, Gradient Boosting, XGBoost; reinforcement learning: introduction to reinforcement learning, models of reinforcement learning: Markov decision process, Q-learning.
Course Contents / Syllabus

UNIT-IV: Unsupervised Machine Learning (8 Hours)
Introduction to clustering, types of clustering: K-means clustering, K-mode, K-medoid, hierarchical clustering, single-linkage, multiple linkage, AGNES and DIANA algorithms, Gaussian mixture models, density-based clustering, DBSCAN.

Introduction to Clustering

The task of grouping data points based on their similarity with each other is called clustering or cluster analysis. It is a type of unsupervised learning, which aims at gaining insights from unlabelled data points. Clustering forms groups of homogeneous data points from a heterogeneous dataset. Similarity is evaluated using a metric such as Euclidean distance, cosine similarity or Manhattan distance, and the points with the highest similarity scores are grouped together. Clustering can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has little or no similarity with another group."

Types of Clustering

Hard Clustering: Each data point either belongs to a cluster completely or does not belong to it at all. For example, given 4 data points to be clustered into 2 clusters, each data point will belong either to cluster 1 or to cluster 2.

Soft Clustering: Instead of assigning each data point to exactly one cluster, a probability (or likelihood) of the point belonging to each cluster is evaluated. For example, given 4 data points and 2 clusters, a probability of belonging to each of the two clusters is computed for every data point.

Partitioning Clustering

Partitioning clustering divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm.

Partitioning Clustering: K-Means

In this approach the dataset is divided into a set of k groups, where K defines the number of pre-defined groups. Cluster centers are placed so that the distance between the data points of a cluster and their own centroid is smaller than their distance to any other cluster centroid. K-Means is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters: if K=2 there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties. It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in an unlabeled dataset on its own, without any need for training. It is a centroid-based algorithm, where each cluster is associated with a centroid, and its main aim is to minimize the sum of distances between the data points and their corresponding cluster centroids.

The K-Means clustering algorithm mainly performs two tasks: it determines the best position for the K center points (centroids) by an iterative process, and it assigns each data point to its closest center; the data points near a particular center form a cluster.

Working of K-Means Algorithm

Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as initial centroids (they need not be points from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the K predefined clusters.
Step-4: Compute the variance and place a new centroid for each cluster.
Step-5: Repeat Step-3, i.e. reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
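The following is a minimal from-scratch Python sketch of these steps. It is not taken from the slides: the toy 2-D points and the choice of k=2 are made up purely for illustration, and in practice a library implementation such as scikit-learn's KMeans class would normally be used.

```python
# Minimal K-Means sketch following Steps 1-7 above.
# Illustrative only: the toy data and k=2 are made up for this example.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random points as initial centroids (here, K of the data points).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step-3/5: assign every point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-6: stop once the centroids (and hence the assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],   # points near (1, 2)
              [8.0, 8.0], [8.5, 7.6], [7.8, 8.3]])  # points near (8, 8)
labels, centroids = kmeans(X, k=2)
print(labels)      # e.g. [0 0 0 1 1 1] (cluster numbering may vary)
print(centroids)   # one centroid near each group
```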
K-Means Example

(The slides work through a numerical K-Means example step by step; the example figures are not reproduced in this transcript.)

Real-Life Example: Customer Segmentation using K-Means

A retail company wants to segment its customers to improve marketing strategies. By grouping customers based on their purchasing behavior, the company can tailor promotions and offers to meet the needs of different segments.

Colab notebook: https://colab.research.google.com/drive/1A3swLVu2aFuMmyMEhJT_O4u6FTDgkOqR#scrollTo=dve-TqHWuSaG

K-Medoids Clustering

K-Medoids and K-Means are two types of clustering mechanisms in partitioning clustering. The method was proposed by the statisticians Leonard Kaufman and Peter J. Rousseeuw. K-Medoids is an unsupervised method that clusters unlabelled data. It is an improved version of the K-Means algorithm, designed mainly to deal with K-Means' sensitivity to outliers. Compared to other partitioning algorithms, it is simple, fast and easy to implement.

Medoid: a medoid is the point in a cluster from which the sum of distances to the other data points is minimal.

Types of K-Medoids Clustering

There are three algorithms for K-Medoids clustering:
PAM (Partitioning Around Medoids)
CLARA (Clustering Large Applications)
CLARANS (Clustering Large Applications based on Randomized Search)

Key Features of K-Medoids

Robustness: K-Medoids is less sensitive to outliers than K-Means because it minimizes the sum of dissimilarities between points and the medoid, rather than the sum of squared Euclidean distances.
Distance metric: it can use various distance metrics, which makes it suitable for different types of data, including categorical data.
No assumptions about data distribution: like K-Means, K-Medoids does not assume any specific distribution of the data points.

K-Medoids Algorithm

Initialization: randomly select k objects as the initial medoids.
Assignment: assign each data point to the nearest medoid based on a chosen distance metric.
Update medoids: for each cluster, find the point that minimizes the sum of distances to all other points in that cluster and make it the new medoid.
Repeat: continue the assignment and update steps until the medoids no longer change.
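As a rough illustration of these four steps, here is a simplified from-scratch, PAM-style sketch. It is not taken from the slides: the toy data (which includes a deliberate outlier) and the use of Manhattan distance are assumptions made only for this example.

```python
# Simplified K-Medoids (PAM-style) sketch following the steps above.
# Illustrative only: the toy data, the outlier and the distance metric are made up.
import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Manhattan distances (any distance metric could be used here).
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    # Initialization: randomly select k objects as the initial medoids.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        # Update: inside each cluster, pick the member that minimizes the
        # sum of distances to all other members of that cluster.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        # Repeat until the medoids do not change.
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return labels, medoids

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.6], [7.8, 8.3],
              [100.0, 100.0]])           # deliberate outlier
labels, medoids = k_medoids(X, k=2)
print(labels)
print(X[medoids])   # medoids are actual data points, unlike K-Means centroids
```

Because the cluster representative must be an actual data point, the outlier cannot drag it away, which is the robustness property described above.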
K-Medoids Example

(The slides work through a numerical K-Medoids example step by step; the example figures are not reproduced in this transcript.)

Advantages and Disadvantages of the K-Medoids Algorithm

Advantages:
Deals with noise and outlier data effectively.
Easy to implement and simple to understand.
Faster compared to other partitioning algorithms.

Disadvantages:
Not suitable for clustering arbitrarily shaped groups of data points.
Because the initial medoids are chosen randomly, the results may vary across runs.

K-Mode Algorithm

K-Modes clustering is an unsupervised machine-learning technique used to group a set of data objects into a specified number of clusters based on their categorical attributes. The algorithm is called "K-Modes" because it uses modes (the most frequent values) rather than means or medians to represent the clusters. When K-Means is used on categorical data that has been converted to numerical form, it does not give good results, particularly for high-dimensional data. K-Modes therefore makes the following changes for categorical data:
Replace the Euclidean distance with a dissimilarity metric.
Replace the mean with the mode for the cluster centers.
Apply a frequency-based method in each iteration to update the modes.
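A minimal from-scratch sketch of this procedure is shown below. It is not taken from the slides: the toy categorical records are invented for illustration, and in practice a dedicated implementation (for example the kmodes Python package) would typically be used.

```python
# Minimal K-Modes sketch for categorical data, following the changes listed above.
# Illustrative only: the toy records (colour, size, shape) are made up.
import numpy as np

def matching_dissimilarity(record, modes):
    # Dissimilarity metric: number of attributes on which the record differs from each mode.
    return (record != modes).sum(axis=1)

def k_modes(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise the modes with k randomly chosen records.
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assign each record to the mode with the fewest mismatched attributes.
        labels = np.array([matching_dissimilarity(x, modes).argmin() for x in X])
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            # Frequency-based update: per attribute, keep the most frequent value.
            for col in range(X.shape[1]):
                values, counts = np.unique(members[:, col], return_counts=True)
                new_modes[j, col] = values[counts.argmax()]
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    return labels, modes

X = np.array([["red",   "small",  "round"],
              ["red",   "small",  "oval"],
              ["red",   "medium", "round"],
              ["blue",  "large",  "square"],
              ["blue",  "large",  "round"],
              ["green", "large",  "square"]])
labels, modes = k_modes(X, k=2)
print(labels)   # cluster index per record
print(modes)    # one mode (most frequent value per attribute) per cluster
```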
