Study Material ML - Unit 3
Parul University
Summary
This document, titled 'Study Material ML -Unit 3', is a syllabus for a machine learning course. It covers supervised learning topics like classification and regression, and also touches on unsupervised learning and association rules. The material's layout seems to be meant for university-level students.
Parul Institute of Computer Application
Faculty of IT and Computer Science
Parul University

Syllabus for 5th Sem B.Sc. (IT) - BCA Programme
Machine Learning using Python (05101331), Elective subject
Prof. Nirmit Shah, Prof. Jigar Bhavsar

Unit 3: Supervised Learning: Classification and Regression
1. Supervised Learning, Introduction
2. Learning steps
3. Classification Model
4. Regression algorithms
5. Unsupervised Learning
6. Supervised vs. Unsupervised Learning
7. Association rules

Ques: Supervised Machine Learning
Ans: Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data and, on the basis of that data, predict the output. Labelled data means that each input is already tagged with the correct output.

In supervised learning, the training data provided to the machine works as the supervisor that teaches it to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as the correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.

Ques: How does Supervised Learning work?
In supervised learning, models are trained using a labelled dataset, from which the model learns about each type of data. Once the training process is completed, the model is tested on test data (a held-out portion of the labelled dataset that was not used for training), for which it predicts the output.
The working of supervised learning can be understood through the following example and diagram:

[Diagram: working of supervised learning]

Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and polygons. The first step is to train the model on each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.

After training, we test our model using the test set, and the task of the model is to identify the shape. The machine has already been trained on all types of shapes, so when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:
o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the labelled dataset into a training set, a validation set, and a test set.
o Determine the input features of the training dataset, which should carry enough information for the model to predict the output accurately.
o Determine a suitable algorithm for the model, such as a support vector machine, a decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes a validation set, held out from the training data, is needed to tune the control parameters.
o Evaluate the accuracy of the model on the test set. If the model predicts the correct output for most test examples, the model is accurate.
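The steps above can be sketched in plain Python using the shapes example. This is only a minimal illustration under stated assumptions: the tiny dataset, the two features (number of sides, whether all sides are equal), the 6/2/2 split per shape, and the "remember the most common label" model are all hypothetical choices made for this sketch, not a method taken from the course material.

```python
from collections import Counter, defaultdict

# Hypothetical labelled data: (number_of_sides, all_sides_equal) -> shape label.
SHAPES = [
    ((4, True), "square"),
    ((4, False), "rectangle"),
    ((3, False), "triangle"),
    ((6, True), "hexagon"),
]


def make_dataset(copies_per_shape=10):
    """Collect the labelled data: repeated (features, label) pairs."""
    return [(features, label)
            for features, label in SHAPES
            for _ in range(copies_per_shape)]


def split(dataset, train_per_label=6, val_per_label=2):
    """Split per label so every shape appears in train, validation and test."""
    by_label = defaultdict(list)
    for features, label in dataset:
        by_label[label].append((features, label))
    train_set, val_set, test_set = [], [], []
    for rows in by_label.values():
        train_set += rows[:train_per_label]
        val_set += rows[train_per_label:train_per_label + val_per_label]
        test_set += rows[train_per_label + val_per_label:]
    return train_set, val_set, test_set


def fit(train_set):
    """'Train' by remembering the most common label for each feature tuple."""
    votes = defaultdict(Counter)
    for features, label in train_set:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0]
            for features, counts in votes.items()}


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(model.get(features) == label for features, label in rows)
    return correct / len(rows)


train_set, val_set, test_set = split(make_dataset())
model = fit(train_set)
print("validation accuracy:", accuracy(model, val_set))
print("test accuracy:", accuracy(model, test_set))
```

In practice a real algorithm (decision tree, support vector machine, etc.) would replace the lookup in `fit`, but the order of the steps stays the same: collect, split, train, validate, test.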
Ques: Explain the types of supervised machine learning algorithms
Supervised learning can be further divided into two types of problems:

1. Regression
Regression algorithms are used when the output variable is continuous and there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc. Below are some popular regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification
Classification algorithms are used when the output variable is categorical, which means the output falls into classes such as Yes-No, Male-Female, True-False, etc. A typical application is spam filtering. Popular classification algorithms include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Advantages of Supervised Learning:
o With the help of supervised learning, the model can predict the output on the basis of prior experience.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us to solve various real-world problems such as fraud detection, spam filtering, etc.

Disadvantages of Supervised Learning:
o Supervised learning models can struggle with very complex tasks.
o Supervised learning cannot predict the correct output if the test data is very different from the training dataset.
o Training can require a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
Ques: What is Unsupervised Machine Learning?
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. Since no labels are given, the algorithm has to group the images by their similarities on its own.

Ques: Why use Unsupervised Learning?
Below are some main reasons which describe the importance of unsupervised learning:
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to the way a human learns to think from their own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it more widely applicable.
o In the real world, we do not always have input data with the corresponding output, so to solve such cases we need unsupervised learning.

Working of Unsupervised Learning
The working of unsupervised learning can be understood from the diagram below:

[Diagram: working of unsupervised learning]

Here, we have taken unlabeled input data, which means it is not categorized and the corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find the hidden patterns in it, and then a suitable algorithm is applied, such as k-means clustering, hierarchical clustering, etc.
Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:
Unsupervised learning algorithms can be further categorized into two types of problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategy more effective: for example, people who buy item X (suppose bread) also tend to purchase item Y (butter/jam). A typical example of association rules is Market Basket Analysis.

Unsupervised Learning algorithms:
Below is a list of some popular unsupervised learning algorithms:
o K-means clustering
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Note: KNN (k-nearest neighbors) is sometimes listed here, but it needs labelled data and is a supervised learning algorithm.

Advantages of Unsupervised Learning
o Unsupervised learning is used for more complex tasks as compared to supervised learning because, in unsupervised learning, we don't have labeled input data.
o Unsupervised learning is preferable as it is easier to get unlabeled data than labeled data.

Disadvantages of Unsupervised Learning
o Unsupervised learning is intrinsically more difficult than supervised learning as it does not have corresponding output.
o The result of an unsupervised learning algorithm might be less accurate as the input data is not labeled and the algorithm does not know the exact output in advance.

Ques: Give the difference between Supervised Learning and Unsupervised Learning

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning model takes direct feedback to check if it is predicting the correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be categorized into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, as we first train the model on labelled data, and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns similarly to how a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as k-means clustering, hierarchical clustering, and the Apriori algorithm.

Ques: Explain Regression Analysis in Machine Learning
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the cause-effect relationship between variables. It predicts continuous/real values such as temperature, age, salary, price, etc.
In regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression fits a line or curve through the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether the model has captured a strong relationship or not.

Some examples of regression are:
o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Terminologies Related to Regression Analysis:
o Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors which affect the dependent variable, or which are used to predict the values of the dependent variable, are called independent variables, also called predictors.
o Outliers: An outlier is an observation which has either a very low or a very high value in comparison to other observed values. An outlier may distort the result, so it should be examined and handled before fitting.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. If our algorithm does not perform well even with the training dataset, the problem is called underfitting.
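Underfitting and overfitting can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the data follows y = 2x + 1 with an artificial alternating noise of +1/-1 on the training points, the test points are noise-free, and the three "models" (a constant mean, a least-squares line, and a memorizer that copies the nearest training value) are hypothetical stand-ins for too-simple, suitable, and too-flexible models.

```python
def fit_mean(xs, ys):
    """Too simple: always predicts the average training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Too flexible: returns the target of the nearest training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x + 1 plus alternating noise (+1 on even x, -1 on odd x).
train_x = list(range(10))
train_y = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in train_x]
# Test data: unseen x values with noise-free targets.
test_x = [x + 0.25 for x in range(9)]
test_y = [2 * x + 1 for x in test_x]

models = {
    "mean (underfits)": fit_mean(train_x, train_y),
    "line": fit_line(train_x, train_y),
    "memorizer (overfits)": fit_memorizer(train_x, train_y),
}
for name, model in models.items():
    print(f"{name:22s} train MSE = {mse(model, train_x, train_y):6.2f}  "
          f"test MSE = {mse(model, test_x, test_y):6.2f}")
```

The memorizer has zero training error but a larger test error than the line (overfitting), while the mean model is poor even on the training set (underfitting).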
Why do we use Regression Analysis?
As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various real-world scenarios where we need future predictions, such as weather conditions, sales, marketing trends, etc. For such cases we need a technique which can make predictions accurately, and regression analysis, a statistical method used in machine learning and data science, serves this purpose. Below are some other reasons for using regression analysis:
o Regression estimates the relationship between the target and the independent variables.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.

Types of Regression

Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
o The relationship between variables in the linear regression model can be explained using the image below. Here we are predicting the salary of an employee on the basis of years of experience.

[Figure: linear regression of salary against years of experience]

o Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y = dependent variable (target variable), X = independent variable (predictor variable), and a and b are the linear coefficients (a is the slope and b is the intercept).

Some popular applications of linear regression are:
o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic

Logistic Regression:
o Logistic regression is another supervised learning algorithm, which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function (also called the logistic function) to map its input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
o f(x) = output between 0 and 1
o x = input to the function
o e = base of the natural logarithm
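The sigmoid function can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard logistic form f(x) = 1 / (1 + e^(-x)); the helper name `classify` and the default threshold of 0.5 are assumptions made for this sketch.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x).
    e = math.exp(x)
    return e / (1.0 + e)


def classify(x, threshold=0.5):
    """Turn the probability into a class: 1 at or above the threshold, else 0."""
    return 1 if sigmoid(x) >= threshold else 0


for x in (-4, -1, 0, 1, 4):
    print(f"x = {x:2d}  f(x) = {sigmoid(x):.3f}  class = {classify(x)}")
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5, which is why the curve has an "S" shape.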
When we provide the input values (data) to the function, it gives the S-curve as follows:

[Figure: S-curve of the sigmoid function]

o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)

Ques: What is the Classification Algorithm?
The classification algorithm is a supervised learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies each new observation into one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be called targets, labels, or categories.

Unlike regression, the output variable of classification is a category, not a value, such as "Green or Blue", "fruit or animal", etc. Since the classification algorithm is a supervised learning technique, it takes labeled input data, which means it contains inputs with the corresponding outputs.

In a classification algorithm, a discrete output function (y) is mapped to the input variable (x):
y = f(x), where y = categorical output

A good example of an ML classification algorithm is an email spam detector.

The main goal of a classification algorithm is to identify the category of a given observation, and these algorithms are mainly used to predict the output for categorical data.

Classification algorithms can be better understood using the diagram below.

[Diagram: two classes, Class A and Class B]
In the diagram, there are two classes, Class A and Class B. Items within a class have features that are similar to each other and dissimilar to those of the other class.

The algorithm which implements the classification on a dataset is known as a classifier. There are two types of classification:
o Binary Classifier: If the classification problem has only two possible outcomes, it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier. Examples: classification of types of crops, classification of types of music.

Learners in Classification Problems:
In classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most closely related data stored in the training dataset. It takes less time in training but more time for predictions. Examples: K-NN algorithm, case-based reasoning.
2. Eager Learners: Eager learners develop a classification model based on the training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction. Examples: Decision Trees, Naïve Bayes, ANN.
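The difference between the two kinds of learners can be sketched as follows. This is a hedged illustration only: `LazyKNN` is a minimal k-nearest-neighbours classifier, and `EagerCentroid` is a nearest-centroid classifier chosen here as a simple stand-in for an eager learner (it is not one of the algorithms named above). The class names and the six sample points for Class A and Class B are hypothetical.

```python
import math
from collections import Counter


class LazyKNN:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank every stored example by its distance to the query at prediction time.
        ranked = sorted(zip(self.points, self.labels),
                        key=lambda pair: math.dist(pair[0], query))
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


class EagerCentroid:
    """Eager learner: fit() builds a compact model (one centroid per class)."""

    def fit(self, points, labels):
        grouped = {}
        for point, label in zip(points, labels):
            grouped.setdefault(label, []).append(point)
        # The training data can be discarded after this step.
        self.centroids = {
            label: tuple(sum(coord) / len(members) for coord in zip(*members))
            for label, members in grouped.items()
        }
        return self

    def predict(self, query):
        # Prediction only compares the query with one centroid per class.
        return min(self.centroids,
                   key=lambda label: math.dist(self.centroids[label], query))


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]

lazy = LazyKNN(k=3).fit(points, labels)
eager = EagerCentroid().fit(points, labels)
for query in [(2, 2), (7, 9)]:
    print(query, "lazy:", lazy.predict(query), "eager:", eager.predict(query))
```

Training the lazy learner is almost free but each prediction scans all stored points, whereas the eager learner does its computation once in `fit` and predicts by comparing against two centroids.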
Types of ML Classification Algorithms:
Classification algorithms can be further divided into mainly two categories:
o Linear Models
   o Logistic Regression
   o Support Vector Machines
o Non-linear Models
   o K-Nearest Neighbours
   o Kernel SVM
   o Naïve Bayes
   o Decision Tree Classification
   o Random Forest Classification

Ques: Logistic Regression in Machine Learning
o Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value as 0 or 1, it gives probabilistic values which lie between 0 and 1.
o Logistic regression is very similar to linear regression except in how they are used. Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, whose output is bounded between the two limiting values 0 and 1.
o The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
o Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using continuous and discrete datasets.
o Logistic regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification. The image below shows the logistic function:

[Figure: the logistic (sigmoid) function]

Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map the predicted values to probabilities.
o It maps any real value into another value within the range 0 to 1.
o The value of the logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. The S-shaped curve is called the sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which decides between the outcomes 0 and 1: values above the threshold are assigned to 1, and values below the threshold are assigned to 0.

Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not have multicollinearity.

Types of Logistic Regression:
On the basis of the categories, logistic regression can be classified into three types:
o Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "Low", "Medium", or "High".

Ques: Linear Regression vs Logistic Regression
Linear regression and logistic regression are two well-known machine learning algorithms which come under the supervised learning technique.
Since both algorithms are supervised in nature, they use a labeled dataset to make predictions. The main difference between them is how they are used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems. The differences between the two algorithms are given in the table below.

Linear Regression | Logistic Regression
Linear regression is used to predict a continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
In linear regression, we find the best-fit line, by which we can easily predict the output. | In logistic regression, we find the S-curve, by which we can classify the samples.
The least squares estimation method is used for estimating the model parameters. | The maximum likelihood estimation method is used for estimating the model parameters.
The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc.
In linear regression, the relationship between the dependent variable and the independent variables must be linear. | In logistic regression, a linear relationship between the dependent and independent variables is not required.
In linear regression, there may be collinearity between the independent variables, although it makes the coefficients harder to interpret. | In logistic regression, there should not be collinearity between the independent variables.
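The contrast between the two algorithms can be sketched in plain Python. This is a minimal illustration under stated assumptions: the salary and pass/fail datasets are made up, the linear model Y = aX + b is fitted with the closed-form least squares solution, and the logistic model is fitted by gradient ascent on the log-likelihood, which is one simple way (not the only one) to carry out maximum likelihood estimation. The function names, learning rate, and number of steps are choices made for this sketch.

```python
import math


def fit_linear_least_squares(xs, ys):
    """Return (a, b) for Y = aX + b minimising the sum of squared errors."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sigmoid(z):
    """Logistic function; inputs stay moderate in this small example."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_max_likelihood(xs, ys, learning_rate=0.05, steps=20000):
    """Return (w, b) for P(y=1|x) = sigmoid(w*x + b) by gradient ascent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(w * x + b)  # gradient of the log-likelihood
            grad_w += error * x
            grad_b += error
        w += learning_rate * grad_w / n
        b += learning_rate * grad_b / n
    return w, b


# Regression: hypothetical salary (in thousands) against years of experience.
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
a, b = fit_linear_least_squares(years, salary)
print("predicted salary for 6 years:", a * 6 + b)

# Classification: hypothetical pass (1) / fail (0) against hours studied.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, c = fit_logistic_max_likelihood(hours, passed)
for h in (1, 8):
    p = sigmoid(w * h + c)
    print(f"{h} hours: P(pass) = {p:.2f}, class = {1 if p >= 0.5 else 0}")
```

The linear model returns a continuous value (a salary), while the logistic model returns a probability that is turned into a category by the threshold.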