Unit 4 Introduction to Algorithm.pdf

Full Transcript

Introduction to Algorithm: -
Machine learning algorithms are the heart of artificial intelligence systems, enabling them to learn from data and make predictions or decisions without being explicitly programmed. In essence, a machine learning algorithm is a set of rules or processes that allows an AI system to carry out tasks such as pattern recognition, classification, and prediction.

Types of Machine Learning Algorithms: -
There are three main types of machine learning algorithms: supervised, unsupervised, and reinforcement learning algorithms.

Supervised Learning Algorithms: -
Supervised learning algorithms are trained on labeled data, where the correct output is already known. The algorithm learns to map inputs to outputs based on the labeled data, enabling it to make predictions on new, unseen data. Examples of supervised learning algorithms include:
Linear Regression
Logistic Regression
Decision Tree
Random Forest
Classification Algorithm

Unsupervised Learning Algorithms: -
Unsupervised learning algorithms are used on unlabeled data to find common characteristics and distinct patterns in the dataset. These algorithms do not require labeled data and are often used for clustering, dimensionality reduction, and anomaly detection. Examples of unsupervised learning algorithms include:
K-Means Clustering
Hierarchical Clustering
Principal Component Analysis (PCA)

Reinforcement Learning Algorithms: -
Reinforcement learning algorithms are trained on data that is generated through interactions with an environment. The algorithm learns to make decisions by trial and error, receiving rewards or penalties for its actions. Examples of reinforcement learning algorithms include:
Q-Learning
Deep Q-Networks (DQN)

How Machine Learning Algorithms Work
Machine learning algorithms work by analyzing data, identifying patterns, and making predictions or decisions based on those patterns. The process typically involves the following steps:
1. Data Preparation: Collecting, cleaning, and preprocessing the data to prepare it for analysis.
2. Model Selection: Choosing the appropriate machine learning algorithm for the task at hand.
3. Training: Training the algorithm on the prepared data to enable it to learn from the data.
4. Evaluation: Evaluating the performance of the algorithm on a test dataset to check that it is making accurate predictions or decisions.
5. Deployment: Deploying the trained algorithm in a production environment to make predictions or decisions on new data.

[Figure: cycle diagram of the five steps: Data Preparation, Model Selection, Training, Evaluation, Deployment]

Benefits of Machine Learning Algorithms
Improved Accuracy: Machine learning algorithms can analyze large datasets and identify patterns that may not be apparent to humans, leading to more accurate predictions and decisions.
Increased Efficiency: Machine learning algorithms can automate tasks, freeing up time for more strategic activities.
Enhanced Customer Experience: Machine learning algorithms can be used to personalize customer experiences, leading to increased customer satisfaction and loyalty.

Classification of Algorithm
1. What is the Classification Algorithm?
The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be called targets, labels, or categories.

Unlike regression, the output variable of Classification is a category, not a numeric value, such as "Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised Learning technique, it takes labeled input data, which means each input comes with its corresponding output.

A well-known example of an ML classification algorithm is an email spam detector.
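The workflow steps and the spam-detector example above can be sketched together in a few lines of Python. This is only an illustrative sketch under stated assumptions: the messages, the "spam"/"ham" labels, and the function names (train, predict, accuracy) are invented here, and the classifier is a small word-count Naive Bayes model with add-one smoothing, picked because it fits in a short example rather than because these notes prescribe it. Deployment is not shown.

```python
import math
from collections import Counter

# Step 1, data preparation: hypothetical labeled messages, invented for illustration.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the project team", "ham"),
    ("please review the project report", "ham"),
]
# Held-out labeled messages used only for evaluation.
TEST = [
    ("claim your free money", "spam"),
    ("project meeting tomorrow", "ham"),
]


def train(examples):
    """Step 3, training: count words per class and messages per class."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Step 4, evaluation: fraction of examples classified correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)
print(predict(model, "free money"))
print(accuracy(model, TEST))
```

With data this small the accuracy figure means very little; the point is only to show where the labeled inputs, the training step, and the held-out evaluation each appear.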
The main goal of the Classification algorithm is to identify the category of a given observation, and these algorithms are mainly used to predict the output for categorical data.

Classification algorithms can be better understood using the diagram below. In the diagram, there are two classes, Class A and Class B. Members of each class have features that are similar to each other and dissimilar to those of the other class.

The algorithm which implements the classification on a dataset is known as a classifier. There are two types of Classifications:
Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier. Examples: classification of types of crops, classification of types of music.

2. Types of ML Classification Algorithms:
1. Logistic Regression: Logistic Regression is a linear model that predicts the probability of an instance belonging to one of two classes. It's a popular choice for binary classification problems and is often used as a baseline model.
2. Decision Tree: A Decision Tree is a tree-like model that splits data into subsets based on features. It's a simple, intuitive model that can handle both binary and multi-class classification problems.
3. Random Forest: A Random Forest is an ensemble learning method that combines multiple Decision Trees. It's a powerful model that can handle high-dimensional data and is often used for multi-class classification problems.
4. Support Vector Machine (SVM): An SVM is a linear or non-linear model that finds the hyperplane that maximally separates two classes. It's a popular choice for binary classification problems and can handle high-dimensional data.
5. Naive Bayes: Naive Bayes is a family of probabilistic models based on Bayes' theorem. It's a simple, efficient model that can handle multi-class classification problems and is often used for text classification.
6. K-Nearest Neighbors (KNN): KNN is a simple, non-parametric model that classifies instances based on the majority vote of their neighbors. It's a popular choice for multi-class classification problems and can handle non-linear relationships.

What is clustering?
The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis. This method falls under Unsupervised Learning, which aims at gaining insights from unlabeled data points; that is, unlike supervised learning, we don't have a target variable.

Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates similarity using a metric such as Euclidean distance, Cosine similarity, or Manhattan distance, and then groups the most similar points together.

For example, in the graph given below, we can clearly see that there are 3 circular clusters forming on the basis of distance.

Types of Clustering
There are 2 types of clustering that can be performed to group similar data points:

Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely or does not belong to it at all. For example, let's say there are 4 data points and we have to cluster them into 2 clusters. Each data point will then belong to either cluster 1 or cluster 2.

Data Points    Clusters
A              C1
B              C2
C              C2
D              C1

[Figure: a separate illustration with Data Points A, B, C, D and Clusters Cluster 1, Cluster 2, assigned as A -> Cluster 1, B -> Cluster 2, C -> Cluster 1, D -> Cluster 2]

Soft Clustering: In this type of clustering, instead of assigning each data point to exactly one cluster, a probability or likelihood of that point belonging to each cluster is evaluated. For example, let's say there are 4 data points and we have to cluster them into 2 clusters. We then evaluate, for every data point, the probability of it belonging to each of the two clusters.

Data Points    Probability of C1    Probability of C2
A              0.91                 0.09
B              0.3                  0.7
C              0.17                 0.83
D              1                    0

[Figure: a separate illustration with Data Points A, B, C, D and Clusters Cluster 1, Cluster 2, with probabilities A -> Cluster 1 (0.8), Cluster 2 (0.2); B -> Cluster 1 (0.3), Cluster 2 (0.7); C -> Cluster 1 (0.9), Cluster 2 (0.1); D -> Cluster 1 (0.4), Cluster 2 (0.6)]

Logistic Regression in Machine Learning
Logistic regression is one of the most popular Machine Learning algorithms and comes under the Supervised Learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.

Because logistic regression predicts a categorical dependent variable, the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value 0 or 1, the model gives probabilistic values which lie between 0 and 1.

Logistic Regression is very similar to Linear Regression except in how the two are used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.

In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose output stays between the two limiting values 0 and 1.

The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.

Logistic Regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete datasets.

Types of Logistic Regression:
Binomial: In binomial Logistic Regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial Logistic Regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
Ordinal: In ordinal Logistic Regression, there can be 3 or more possible ordered types of the dependent variable, such as "Low", "Medium", or "High".
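
The binomial case described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the hours-studied data, the function names (sigmoid, fit, predict_proba), the learning rate, and the number of epochs are all assumptions made up for the example. The model computes p = 1 / (1 + e^-(w*x + b)) and fits w and b by plain gradient descent on the log loss.

```python
import math


def sigmoid(z):
    """S-shaped logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Difference between the predicted probability and the 0/1 label.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that the label is 1 for input x."""
    return sigmoid(w * x + b)


# Hypothetical data: hours studied -> Pass (1) or Fail (0).
# The labels overlap around 4 to 5 hours, so the classes are not perfectly separable.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit(hours, passed)
for h in (1, 8):
    p = predict_proba(w, b, h)
    label = "Pass" if p >= 0.5 else "Fail"
    print(h, round(p, 3), label)
```

The 0.5 cut-off used to turn a probability into Pass or Fail is a common default, not a requirement; a different threshold may suit a task where one kind of error is more costly.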
