Machine Learning Unit-1 PDF

Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation

Tags

machine learning, supervised learning, algorithms, artificial intelligence

Summary

This document provides an introduction to machine learning, specifically focusing on supervised learning. It details different types of supervised learning algorithms and their applications. The document also outlines the advantages and disadvantages of using supervised machine learning.

Full Transcript


MACHINE LEARNING: UNIT I: INTRODUCTION Types of Machine Learning: Machine learning is the branch of Artificial Intelligence that focuses on developing models and algorithms that let computers learn from data and improve from previous experience without being explicitly programmed for every task. In simple words, ML teaches the systems to think and understand like humans by learning from the data. In this article, we will explore the various types of machine learning algorithms that are important for future requirements. Machine learning is generally a training system to learn from past experiences and improve performance over time. Machine learning helps to predict massive amounts of data. It helps to deliver fast and accurate results to get profitable opportunities. Types of Machine Learning There are several types of machine learning, each with special characteristics and applications. Some of the main types of machine learning algorithms are as follows: 1. Supervised Machine Learning 2. Unsupervised Machine Learning 3. Semi-Supervised Machine Learning 4. Reinforcement Learning 1. Supervised Machine Learning Supervised learning is defined as when a model gets trained on a “Labelled Dataset”. Labelled datasets have both input and output parameters. In Supervised Learning algorithms learn to map points between inputs and correct outputs. It has both training and validation datasets labelled. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Let’s understand it with the help of an example. Example: Consider a scenario where you have to build an image classifier to differentiate between cats and dogs. If you feed the datasets of dogs and cats labelled images to the algorithm, the machine will learn to classify between a dog or a cat from these labelled images. When we input new dog or cat images that it has never seen before, it will use the learned algorithms and predict whether it is a dog or a cat. This is how supervised learning works, and this is particularly an image classification. There are two main categories of supervised learning that are mentioned below: Classification Regression Classification Classification deals with predicting categorical target variables, which represent discrete classes or labels. For instance, classifying emails as spam or not spam, or predicting whether a patient has a high risk of heart disease. Classification algorithms learn to map the input features to one of the predefined classes. Here are some classification algorithms: Logistic Regression Support Vector Machine Random Forest Decision Tree Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION K-Nearest Neighbors (KNN) Naive Bayes Regression Regression, on the other hand, deals with predicting continuous target variables, which represent numerical values. For example, predicting the price of a house based on its size, location, and amenities, or forecasting the sales of a product. Regression algorithms learn to map the input features to a continuous numerical value. Here are some regression algorithms: Linear Regression Polynomial Regression Ridge Regression Lasso Regression Decision tree Random Forest Advantages of Supervised Machine Learning Supervised Learning models can have high accuracy as they are trained on labelled data. The process of decision-making in supervised learning models is often interpretable. 
It can often be used in pre-trained models which saves time and resources when developing new models from scratch. Disadvantages of Supervised Machine Learning It has limitations in knowing patterns and may struggle with unseen or unexpected patterns that are not present in the training data. It can be time-consuming and costly as it relies on labelled data only. It may lead to poor generalizations based on new data. Applications of Supervised Learning Supervised learning is used in a wide variety of applications, including: Image classification: Identify objects, faces, and other features in images. Natural language processing: Extract information from text, such as sentiment, entities, and relationships. Speech recognition: Convert spoken language into text. Recommendation systems: Make personalized recommendations to users. Predictive analytics: Predict outcomes, such as sales, customer churn, and stock prices. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Medical diagnosis: Detect diseases and other medical conditions. Fraud detection: Identify fraudulent transactions. Autonomous vehicles: Recognize and respond to objects in the environment. Email spam detection: Classify emails as spam or not spam. Quality control in manufacturing: Inspect products for defects. Credit scoring: Assess the risk of a borrower defaulting on a loan. Gaming: Recognize characters, analyze player behavior, and create NPCs. Customer support: Automate customer support tasks. Weather forecasting: Make predictions for temperature, precipitation, and other meteorological parameters. Sports analytics: Analyze player performance, make game predictions, and optimize strategies. 2. Unsupervised Machine Learning Unsupervised learning is a type of machine learning technique in which an algorithm discovers patterns and relationships using unlabelled data. Unlike supervised learning, unsupervised learning doesn’t involve providing the algorithm with labelled target outputs. The primary goal of Unsupervised learning is often to discover hidden patterns, similarities, or clusters within the data, which can then be used for various purposes, such as data exploration, visualization, dimensionality reduction, and more. Let’s understand it with the help of an example. Example: Consider that you have a dataset that contains information about the purchases you made from the shop. Through clustering, the algorithm can group the same purchasing behaviour among you and other customers, which reveals potential customers without predefined labels. This type of information can help businesses get target customers as well as identify outliers. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION There are two main categories of unsupervised learning that are mentioned below: Clustering Association Clustering: Clustering is the process of grouping data points into clusters based on their similarity. This technique is useful for identifying patterns and relationships in data without the need for labelled examples. Here are some clustering algorithms: K-Means Clustering algorithm Mean-shift algorithm DBSCAN Algorithm Principal Component Analysis Independent Component Analysis Association Association rule learning is a technique for discovering relationships between items in a dataset. 
It identifies rules that indicate the presence of one item implies the presence of another item with a specific probability. Here are some association rule learning algorithms: Apriori Algorithm Eclat FP-growth Algorithm Advantages of Unsupervised Machine Learning It helps to discover hidden patterns and various relationships between the data. Used for tasks such as customer segmentation, anomaly detection, and data exploration. It does not require labeled data and reduces the effort of data labeling. Disadvantages of Unsupervised Machine Learning Without using labels, it may be difficult to predict the quality of the model’s output. Cluster Interpretability may not be clear and may not have meaningful interpretations. It has techniques such as autoencoders and dimensionality reduction that can be used to extract meaningful features from raw data. Applications of Unsupervised Learning Here are some common applications of unsupervised learning: Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Clustering: Group similar data points into clusters. Anomaly detection: Identify outliers or anomalies in data. Dimensionality reduction: Reduce the dimensionality of data while preserving its essential information. Recommendation systems: Suggest products, movies, or content to users based on their historical behavior or preferences. Topic modeling: Discover latent topics within a collection of documents. Density estimation: Estimate the probability density function of data. Image and video compression: Reduce the amount of storage required for multimedia content. Data preprocessing: Help with data preprocessing tasks such as data cleaning, imputation of missing values, and data scaling. Market basket analysis: Discover associations between products. Genomic data analysis: Identify patterns or group genes with similar expression profiles. Image segmentation: Segment images into meaningful regions. Community detection in social networks: Identify communities or groups of individuals with similar interests or connections. Customer behavior analysis: Uncover patterns and insights for better marketing and product recommendations. Content recommendation: Classify and tag content to make it easier to recommend similar items to users. Exploratory data analysis (EDA): Explore data and gain insights before defining specific tasks. 3. Semi-Supervised Learning Semi-Supervised learning is a machine learning algorithm that works between the supervised and unsupervised learning so it uses both labelled and unlabelled data. It’s particularly useful when obtaining labelled data is costly, time-consuming, or resource-intensive. This approach is useful when the dataset is expensive and time-consuming. Semi-supervised learning is chosen when labelled data requires skills and relevant resources in order to train or learn from it. We use these techniques when we are dealing with data that is a little bit labelled and the rest large portion of it is unlabelled. We can use the unsupervised techniques to predict labels and then feed these labels to supervised techniques. This technique is mostly applicable in the case of image data sets where usually all images are not labelled. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Let’s understand it with the help of an example. 
Example: Consider that we are building a language translation model, having labelled translations for every sentence pair can be resources intensive. It allows the models to learn from labelled and unlabelled sentence pairs, making them more accurate. This technique has led to significant improvements in the quality of machine translation services. Types of Semi-Supervised Learning Methods There are a number of different semi-supervised learning methods each with its own characteristics. Some of the most common ones include: Graph-based semi-supervised learning: This approach uses a graph to represent the relationships between the data points. The graph is then used to propagate labels from the labeled data points to the unlabeled data points. Label propagation: This approach iteratively propagates labels from the labeled data points to the unlabeled data points, based on the similarities between the data points. Co-training: This approach trains two different machine learning models on different subsets of the unlabeled data. The two models are then used to label each other’s predictions. Self-training: This approach trains a machine learning model on the labeled data and then uses the model to predict labels for the unlabeled data. The model is then retrained on the labeled data and the predicted labels for the unlabeled data. Generative adversarial networks (GANs): GANs are a type of deep learning algorithm that can be used to generate synthetic data. GANs can be used to generate unlabeled data for semi-supervised learning by training two neural networks, a generator and a discriminator. Advantages of Semi- Supervised Machine Learning It leads to better generalization as compared to supervised learning, as it takes both labeled and unlabeled data. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Can be applied to a wide range of data. Disadvantages of Semi- Supervised Machine Learning Semi-supervised methods can be more complex to implement compared to other approaches. It still requires some labeled data that might not always be available or easy to obtain. The unlabeled data can impact the model performance accordingly. Applications of Semi-Supervised Learning Here are some common applications of semi-supervised learning: Image Classification and Object Recognition: Improve the accuracy of models by combining a small set of labeled images with a larger set of unlabeled images. Natural Language Processing (NLP): Enhance the performance of language models and classifiers by combining a small set of labeled text data with a vast amount of unlabeled text. Speech Recognition: Improve the accuracy of speech recognition by leveraging a limited amount of transcribed speech data and a more extensive set of unlabeled audio. Recommendation Systems: Improve the accuracy of personalized recommendations by supplementing a sparse set of user-item interactions (labeled data) with a wealth of unlabeled user behavior data. Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a small set of labeled medical images alongside a larger set of unlabeled images. 4. Reinforcement Machine Learning Reinforcement machine learning algorithm is a learning method that interacts with the environment by producing actions and discovering errors. Trial, error, and delay are the most relevant characteristics of reinforcement learning. 
In this technique, the model keeps on increasing its performance using Reward Feedback to learn the behavior or pattern. These algorithms are specific to a particular problem e.g. Google Self Driving car, AlphaGo where a bot competes with humans and even itself to get better and better performers in Go Game. Each time we feed in data, they learn and add the data to their knowledge which is training data. So, the more it learns the better it gets trained and hence experienced. Here are some of most common reinforcement learning algorithms: Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which maps states to actions. The Q-function estimates the expected reward of taking a particular action in a given state. SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the Q- function for the action that was actually taken, rather than the optimal action. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning. Deep Q-learning uses a neural network to represent the Q-function, which allows it to learn complex relationships between states and actions. Let’s understand it with the help of examples. Example: Consider that you are training an AI agent to play a game like chess. The agent explores different moves and receives positive or negative feedback based on the outcome. Reinforcement Learning also finds applications in which they learn to perform tasks by interacting with their surroundings. Types of Reinforcement Machine Learning There are two main types of reinforcement learning: Positive reinforcement Rewards the agent for taking a desired action. Encourages the agent to repeat the behavior. Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct answer. Negative reinforcement Removes an undesirable stimulus to encourage a desired behavior. Discourages the agent from repeating the behavior. Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by completing a task. Advantages of Reinforcement Machine Learning It has autonomous decision-making that is well-suited for tasks and that can learn to make a sequence of decisions, like robotics and game-playing. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION This technique is preferred to achieve long-term results that are very difficult to achieve. It is used to solve a complex problems that cannot be solved by conventional techniques. Disadvantages of Reinforcement Machine Learning Training Reinforcement Learning agents can be computationally expensive and time- consuming. Reinforcement learning is not preferable to solving simple problems. It needs a lot of data and a lot of computation, which makes it impractical and costly. Applications of Reinforcement Machine Learning Here are some applications of reinforcement learning: Game Playing: RL can teach agents to play games, even complex ones. Robotics: RL can teach robots to perform tasks autonomously. Autonomous Vehicles: RL can help self-driving cars navigate and make decisions. Recommendation Systems: RL can enhance recommendation algorithms by learning user preferences. Healthcare: RL can be used to optimize treatment plans and drug discovery. 
Natural Language Processing (NLP): RL can be used in dialogue systems and chatbots. Finance and Trading: RL can be used for algorithmic trading. Supply Chain and Inventory Management: RL can be used to optimize supply chain operations. Energy Management: RL can be used to optimize energy consumption. Game AI: RL can be used to create more intelligent and adaptive NPCs in video games. Adaptive Personal Assistants: RL can be used to improve personal assistants. Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create immersive and interactive experiences. Industrial Control: RL can be used to optimize industrial processes. Education: RL can be used to create adaptive learning systems. Agriculture: RL can be used to optimize agricultural operations. Input and Output Variables in Machine Learning In machine learning, input variables (also called features or predictors) are the characteristics or attributes of the data that are used to make predictions or take actions. The output variable (also called the target or label) is the desired outcome or prediction that the model is trained to produce. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Input variables can be of different types, such as numerical (e.g. age, income), categorical (e.g. color, gender), or text data (e.g. reviews, tweets). These variables are used as inputs to the model, and the model uses them to make predictions about the output variable. Output variables can also be of different types, such as numerical (e.g. price, temperature), categorical (e.g. class labels, sentiment), or text data (e.g. summarization, translation). The goal of the model is to learn the mapping between the input variables and the output variable, so that it can make accurate predictions on new, unseen data. In unsupervised learning, the model is not provided with labeled output variables, and the goal is to find patterns or structure in the input variables, such as clustering or dimensionality reduction. In addition to the types of variables, it's also important to consider the format of the input and output data. For example, the input data may need to be transformed into a specific format that is compatible with the model, such as converting text data into numerical representations using techniques like bag-of-words or word embeddings. Similarly, the output data may need to be transformed into a specific format that is suitable for the task at hand, such as converting a continuous numerical value into a categorical label (e.g. "high", "medium", "low"). Another important aspect of input and output variables is feature engineering, which is the process of creating new variables or transforming existing ones to improve the performance of the model. For example, one could create new variables by combining existing ones, or by applying mathematical operations such as logarithms or polynomials. Feature engineering can be a powerful tool to improve the performance of a model, but it also requires a good understanding of the data and the problem. Another important aspect is to consider the correlation between input variables, as high correlation among input variables may cause multicollinearity, which is when two or more predictor variables in a multiple regression are highly correlated, meaning that the unique information added by each variable is small. In such cases, it's better to drop one of the correlated variables. 
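To make this concrete, here is a minimal sketch of checking the correlation between input variables with pandas; the feature names, the values, and the 0.9 threshold are made-up illustrations, not data from these notes.

import pandas as pd

# Toy numerical input variables (features); names and values are invented for illustration
data = pd.DataFrame({
    'size_sqft': [850, 900, 1200, 1500, 2000],
    'num_rooms': [2, 2, 3, 4, 5],      # strongly related to size_sqft
    'age_years': [30, 5, 12, 7, 1],
})

# Pairwise correlation between the input variables
corr = data.corr()
print(corr)

# Flag pairs of features whose absolute correlation exceeds a chosen threshold;
# one variable of such a pair is a candidate to drop to reduce multicollinearity.
threshold = 0.9
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if abs(corr.loc[col_a, col_b]) > threshold:
            print('Highly correlated pair:', col_a, 'and', col_b)

In this toy data, size_sqft and num_rooms move together almost perfectly, so the check reports them as a candidate pair for dropping one variable.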
Example of a supervised learning problem using input and output variables

An example of a supervised learning problem using input and output variables could be a model that predicts the price of a used car based on its characteristics. The input variables (or features) could include:

Year of manufacturing
Car brand
Car model
Engine size
Number of miles on the odometer
Number of previous owners

The output variable (or label) could be the price of the car.

To train the model, we would need a dataset of used cars with the corresponding values for each of the input variables and the output variable. The model would then learn the relationship between the input variables and the output variable, so that it can make predictions on new, unseen data.

Here is an example of how input and output variables could be used in a supervised learning problem using Python and the scikit-learn library (the brand and model columns are text, so they are one-hot encoded before fitting the linear model):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Input data: year, brand, model, engine size, miles, previous owners
X = pd.DataFrame(
    [[2014, 'Toyota', 'Camry', 2.5, 75000, 1],
     [2016, 'Ford', 'Fusion', 3.5, 65000, 2],
     [2018, 'Tesla', 'Model S', 100, 5000, 1],
     [2015, 'BMW', '3 Series', 2.0, 80000, 1]],
    columns=['year', 'brand', 'model', 'engine_size', 'miles', 'owners'])

# Output data: the price of each car
y = [18000, 22000, 75000, 16000]

# Categorical text features must be converted to numbers before a linear
# model can use them; one-hot encoding is one simple option.
X = pd.get_dummies(X, columns=['brand', 'model'])

# Split data into training and test sets (80% for training, 20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance (coefficient of determination R^2)
# (with only four rows the test split has a single example, so the score
# here is purely illustrative)
score = model.score(X_test, y_test)
print('R^2 score:', score)

In this example, the input variables (or features) are the year of manufacturing, car brand, car model, engine size, number of miles on the odometer and number of previous owners, which are stored in the variable X. The output variable (or label) is the price of the car, which is stored in the variable y. We use the train_test_split function to split the data into a training set and a test set, with 80% of the data being used for training and 20% for testing. The LinearRegression model is trained on the training set and then makes predictions on the test set using the predict method. Finally, the model's performance is evaluated by comparing the predicted values with the true values using the score method, which returns the coefficient of determination R^2 of the prediction. Please note that this is just an example; in real-world cases, the data may need to be preprocessed, transformed, and feature engineered to work well with the model, and it is also important to consider the correlation between input variables, as high correlation among input variables may cause multicollinearity.

Training regimes: Train and Test datasets in Machine Learning

Machine Learning is one of the booming technologies across the world that enables computers/machines to turn a huge amount of data into predictions. However, these predictions depend heavily on the quality of the data, and if we do not use the right data for our model, it will not generate the expected results. In machine learning projects, we generally divide the original dataset into training data and test data.
We train our model over a subset of the original dataset, i.e., the training dataset, and then evaluate whether it can generalize well to the new or unseen dataset or test set. Therefore, train and test datasets are the two key concepts of machine learning, where the training dataset is used to fit the model, and the test dataset is used to evaluate the model. In this topic, we are going to discuss train and test datasets along with the difference between both of them. So, let's start with the introduction of the training dataset and test dataset in Machine Learning. What is Training Dataset? The training data is the biggest (in -size) subset of the original dataset, which is used to train or fit the machine learning model. Firstly, the training data is fed to the ML algorithms, which lets them learn how to make predictions for the given task. For example, for training a sentiment analysis model, the training data could be as below: The training data varies depending on whether we are using Supervised Learning or Unsupervised Learning Algorithms. For Unsupervised learning, the training data contains unlabeled data points, i.e., inputs are not tagged with the corresponding outputs. Models are required to find the patterns from the given training datasets in order to make predictions. On the other hand, for supervised learning, the training data contains labels in order to train the model and make predictions. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION The type of training data that we provide to the model is highly responsible for the model's accuracy and prediction ability. It means that the better the quality of the training data, the better will be the performance of the model. Training data is approximately more than or equal to 60% of the total data for an ML project. What is Test Dataset? Once we train the model with the training dataset, it's time to test the model with the test dataset. This dataset evaluates the performance of the model and ensures that the model can generalize well with the new or unseen dataset. The test dataset is another subset of original data, which is independent of the training dataset. However, it has some similar types of features and class probability distribution and uses it as a benchmark for model evaluation once the model training is completed. Test data is a well-organized dataset that contains data for each type of scenario for a given problem that the model would be facing when used in the real world. Usually, the test dataset is approximately 20-25% of the total original data for an ML project. At this stage, we can also check and compare the testing accuracy with the training accuracy, which means how accurate our model is with the test dataset against the training dataset. If the accuracy of the model on training data is greater than that on testing data, then the model is said to have overfitting. The testing data should: o Represent or part of the original dataset. o It should be large enough to give meaningful predictions. Need of Splitting dataset into Train and Test set o Splitting the dataset into train and test sets is one of the important parts of data pre- processing, as by doing so, we can improve the performance of our model and hence give better predictability. 
o We can understand it this way: if we train our model on one dataset and then test it on data drawn from a completely different dataset, the model will not be able to rely on the correlations between the features that it learned, because those relationships may not hold in the second dataset, and its measured performance will drop. Hence it is important to take a single dataset and split it into two parts, i.e., a train set and a test set.

In this way, we can easily evaluate the performance of our model: if it performs well on the training data but does not perform well on the test dataset, the model is probably overfitted. For splitting the dataset, we can use the train_test_split function of scikit-learn. The below line of code can be used to split the dataset:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Overfitting and Underfitting issues

Overfitting and underfitting are the most common problems that occur in a Machine Learning model. A model is said to be overfitted when it performs quite well on the training dataset but does not generalize well to new or unseen data. The issue of overfitting occurs when the model tries to cover every data point and hence starts memorizing the noise present in the data. Because of this, it cannot generalize well to a new dataset, and the accuracy and efficiency of the model degrade. Generally, complex models have a higher chance of overfitting. There are various ways to avoid overfitting, such as using cross-validation, stopping the training early, or applying regularization.

Training data vs. Testing Data

o The main difference between training data and testing data is that training data is the subset of the original data used to train the machine learning model, whereas testing data is used to check the accuracy of the model.
o The training dataset is generally larger than the testing dataset. Common ratios for splitting train and test datasets are 80:20, 70:30, or 90:10.
o Training data is well known to the model, as it is used for training, whereas testing data is unseen/new data for the model.

How do training and testing data work in Machine Learning?

Machine Learning algorithms enable machines to make predictions and solve problems on the basis of past observations or experiences. An algorithm takes these experiences or observations from the training data that is fed to it. Further, one of the great things about ML algorithms is that they can learn and improve over time as they are trained with relevant training data. Once the model has been trained sufficiently with the relevant training data, it is tested with the test data. We can understand the whole process of training and testing in three steps, which are as follows:

1. Feed: Firstly, we train the model by feeding it the training input data.
2. Define: The training data is tagged with the corresponding outputs (in Supervised Learning), and the model transforms the training data into feature vectors (a set of data features).
3. Test: In the last step, we test the model by feeding it the test data/unseen dataset. This step ensures that the model has been trained efficiently and can generalize well.
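As a concrete illustration of this train/test workflow and the overfitting check described earlier (comparing training accuracy with testing accuracy), here is a minimal sketch; the Iris dataset, the DecisionTreeClassifier, and the 80:20 split are illustrative choices rather than part of the original notes.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset and split it 80:20 into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An unconstrained decision tree can memorize the training data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Compare training accuracy with testing accuracy
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print('Training accuracy:', train_acc)
print('Testing accuracy:', test_acc)

# If the training accuracy is much higher than the testing accuracy,
# the model is likely overfitted to the training data.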
The above process is explained using a flowchart given below: Noise – Performance Evaluation: What is noise? In Machine Learning, random or irrelevant data can result in unpredictable situations that are different from what we expected, which is known as noise. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION It results from inaccurate measurements, inaccurate data collection, or irrelevant information. Similar to how background noise can mask speech, noise can also mask relationships and patterns in data. Handling noise is essential to precise modeling and forecasting. Its effects are lessened by methods including feature selection, data cleansing, and strong algorithms. In the end, noise reduction improves machine learning models' efficacy. Causes of Noise Errors in data collection, such as malfunctioning sensors or human error during data entry, can introduce noise into machine learning. Noise can also be introduced by measurement mistakes, such as inaccurate instruments or environmental conditions. Another form of noise in data is inherent variability resulting from either natural fluctuations or unforeseen events. If data pretreatment operations like normalization or transformation are not done appropriately, they may unintentionally add noise. Inaccurate data point labeling or annotation can introduce noise and affect the learning process. Types of Noise in Machine Learning Following are the types of noises in machine learning- 1. Feature Noise: It refers to superfluous or irrelevant features present in the dataset that might cause confusion and impede the process of learning. 2. Systematic Noise: Recurring biases or mistakes in measuring or data collection procedures that cause data to be biased or incorrect. 3. Random Noise: Unpredictable fluctuations in data brought on by variables such as measurement errors or ambient circumstances. 4. Background noise: It is the information in the data that is unnecessary or irrelevant and could distract the model from the learning job. Ways to Handle Noises Noise consists of measuring errors, anomalies, or discrepancies in the information gathered. Handling noise is important because it might result in models that are unreliable and forecasts that are not correct. 1. Data preprocessing: It consists of methods to improve the quality of the data and lessen noise from errors or inconsistencies, such as data cleaning, normalization, and outlier elimination. 2. Fourier Transform: The Fourier Transform is a mathematical technique used to transform signals from the time or spatial domain to the frequency domain. In the context of noise removal, it can help identify and filter out noise by representing the signal as a combination of different frequencies.Relevant frequencies can be retained while noise frequencies can be filtered out. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION 3. Constructive Learning: Constructive learning involves training a machine learning model to distinguish between clean and noisy data instances. This approach typically requires labeled data where the noise level is known. The model learns to classify instances as either clean or noisy, allowing for the removal of noisy data points from the dataset. 4. Autoencoders: Autoencoders are neural network architectures that consist of an encoder and a decoder. 
The encoder compresses the input data into a lower-dimensional representation, while the decoder reconstructs the original data from this representation. Autoencoders can be trained to reconstruct clean signals while effectively filtering out noise during the reconstruction process. 5. Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that identifies the principal components of a dataset, which are orthogonal vectors that capture the maximum variance in the data. By projecting the data onto a reduced set of principal components, PCA can help reduce noise by focusing on the most informative dimensions of the data while discarding noise-related dimensions. Compensation techniques Dealing with noisy data are crucial in machine learning to improve model robustness and generalization performance. Two common approaches for compensating for noisy data are cross- validation and ensemble models. 1.Cross-validation: Cross-validation is a resampling technique used to assess how well a predictive model generalizes to an independent dataset. It involves partitioning the dataset into complementary subsets, performing training on one subset (training set) and validation on the other (validation set). This process is repeated multiple times with different partitions of the data. Common cross-validation methods include k-fold cross-validation and leave-one-out cross- validation. By training on different subsets of data, cross-validation helps in reducing the impact of noise in the data. It also aids in avoiding overfitting by providing a more accurate estimate of the model's performance. 2. Ensemble Models: Ensemble learning involves combining multiple individual models to improve predictive performance compared to any single model alone. Ensemble models work by aggregating the predictions of multiple base models, such as decision trees, neural networks, or other machine learning algorithms. Popular ensemble techniques include bagging (Bootstrap Aggregating), boosting, and stacking. By combining models trained on different subsets of the data or using different algorithms, ensemble models can mitigate the impact of noise in the data. Ensemble methods are particularly effective when individual models may be sensitive to noise or may overfit the data. They help in improving robustness and generalization performance by reducing the variance of the predictions. Performance Metrics in Machine Learning Evaluating the performance of a Machine learning model is one of the important steps while building an effective ML model. To evaluate the performance or quality of the model, different metrics are used, and these metrics are known as performance metrics or evaluation Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION metrics. These performance metrics help us understand how well our model has performed for the given data. In this way, we can improve the model's performance by tuning the hyper-parameters. Each ML model aims to generalize well on unseen/new data, and performance metrics help determine how well the model generalizes on the new dataset. In machine learning, each task or problem is divided into classification and Regression. Not all metrics can be used for all types of problems; hence, it is important to know and understand which metrics should be used. Different evaluation metrics are used for both Regression and Classification tasks. 
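Before looking at the individual metrics, here is a minimal sketch of the cross-validation idea described above, used to evaluate a classifier; the breast-cancer dataset, the scaler-plus-logistic-regression pipeline, and accuracy as the scoring choice are illustrative assumptions, not part of the original notes.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Labelled classification data
X, y = load_breast_cancer(return_X_y=True)

# 5-fold cross-validation: the data is split into 5 parts, and the model is
# trained on 4 parts and validated on the remaining part, 5 times in turn.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

print('Accuracy per fold:', scores)
print('Mean accuracy:', scores.mean())

# Averaging over several folds gives a more reliable performance estimate
# and reduces the influence of a single noisy or unlucky split.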
In this topic, we will discuss metrics used for classification and regression tasks.

1. Performance Metrics for Classification

In a classification problem, the category or class of the data is identified based on the training data. The model learns from the given dataset and then classifies new data into classes or groups based on that training. It predicts class labels as the output, such as Yes or No, 0 or 1, Spam or Not Spam, etc. To evaluate the performance of a classification model, different metrics are used, and some of them are as follows:

o Accuracy
o Confusion Matrix
o Precision
o Recall
o F-Score
o AUC (Area Under the Curve)

I. Accuracy

The accuracy metric is one of the simplest classification metrics to implement, and it can be determined as the number of correct predictions divided by the total number of predictions. To implement an accuracy metric, we can compare ground truth and predicted values in a loop, or we can use the scikit-learn module for this.

When to Use Accuracy?

It is good to use the Accuracy metric when the target variable classes in the data are approximately balanced. For example, suppose 60% of the images in a fruit dataset are of Apple and 40% are of Mango. Because the two classes are reasonably balanced, a model that reaches, say, 97% accuracy on this data is genuinely classifying both fruits well.

When not to use Accuracy?

It is recommended not to use the Accuracy measure when the target variable mostly belongs to one class. For example, suppose there is a disease-prediction model in which, out of 100 people, only five people have the disease and 95 people do not. In this case, if our model predicts that every person is disease-free (which is a bad prediction), the Accuracy measure will still be 95%, which is misleading.

II. Confusion Matrix

A confusion matrix is a tabular representation of the prediction outcomes of a binary classifier, used to describe the performance of the classification model on a set of test data for which the true values are known. The confusion matrix is simple to implement, but the terminologies used in it can be confusing for beginners. A typical confusion matrix for a binary classifier is laid out as below (it can be extended to classifiers with more than two classes):

                 Predicted: No      Predicted: Yes
Actual: No       True Negative      False Positive
Actual: Yes      False Negative     True Positive

We can determine the following from the matrix:

o In the matrix, the columns are for the predicted values, and the rows specify the actual values. Here both actual and predicted values take two possible classes, Yes or No. So, if we are predicting the presence of a disease in a patient, Yes in the Prediction column means the patient has the disease, and No means the patient does not.
o In this example, the total number of predictions is 165, out of which the model predicted Yes 110 times and No 55 times.
o In reality, there are 60 cases in which the patients do not have the disease and 105 cases in which they do.

In general, the table is divided into four terminologies, which are as follows:

1. True Positive (TP): the model predicted Yes, and the actual value is also Yes.
2. True Negative (TN): the model predicted No, and the actual value is also No.
3. False Positive (FP): the model predicted Yes, but the actual value is No.
4. False Negative (FN): the model predicted No, but the actual value is Yes.

III. Precision

The precision metric is used to overcome a limitation of Accuracy. Precision determines the proportion of positive predictions that were actually correct. It is calculated as the number of true positives divided by the total number of positive predictions (true positives plus false positives):

Precision = TP / (TP + FP)

IV. Recall or Sensitivity

Recall is similar to the Precision metric; however, it measures the proportion of actual positives that were identified correctly. It is calculated as the number of true positives divided by the total number of actual positives, whether correctly predicted as positive or incorrectly predicted as negative (true positives plus false negatives). The formula for calculating Recall is given below:

Recall = TP / (TP + FN)

When to use Precision and Recall?

From the above definitions of Precision and Recall, we can say that recall describes the performance of a classifier with respect to false negatives, whereas precision describes its performance with respect to false positives. So, if we want to minimize false negatives, Recall should be as close to 100% as possible, and if we want to minimize false positives, Precision should be as close to 100% as possible. In simple words, maximizing precision minimizes FP errors, and maximizing recall minimizes FN errors.

V. F-Score

The F-score or F1 Score is a metric for evaluating a binary classification model on the basis of the predictions made for the positive class. It is calculated from Precision and Recall and provides a single score that represents both. The F1 Score is the harmonic mean of precision and recall, assigning equal weight to each of them. The formula for calculating the F1 score is given below:

F1 Score = 2 x (Precision x Recall) / (Precision + Recall)

When to use F-Score?

Since the F-score makes use of both precision and recall, it should be used when both of them matter for evaluation but one (precision or recall) is slightly more important to consider than the other, for example when false negatives are comparatively more costly than false positives, or vice versa.

VI. AUC-ROC

Sometimes we need to visualize the performance of the classification model on a chart; for this, we can use the AUC-ROC curve. It is one of the popular and important metrics for evaluating the performance of a classification model. First, let's understand the ROC (Receiver Operating Characteristic) curve. The ROC curve is a graph that shows the performance of a classification model at different threshold levels. The curve is plotted between two parameters:

o True Positive Rate
o False Positive Rate

Foundations of Supervised Learning: Decision Tree

Understanding Decision Tree

A decision tree is a graphical representation of different options for solving a problem and shows how different factors are related.
It has a hierarchical tree structure that starts with one main question at the top, called a node, which then branches out into different possible outcomes, where:

Root Node is the starting point that represents the entire dataset.
Branches: These are the lines that connect nodes. They show the flow from one decision to another.
Internal Nodes are points where decisions are made based on the input features.
Leaf Nodes: These are the terminal nodes at the end of branches that represent final outcomes or predictions.

Decision trees also support decision-making by visualizing outcomes. You can quickly evaluate and compare the "branches" to determine which course of action is best for you.

Now, let's take an example to understand the decision tree. Imagine you want to decide whether to drink coffee based on the time of day and how tired you feel. First the tree checks the time of day: if it's morning, it asks whether you are tired. If you're tired, the tree suggests drinking coffee; if not, it says there's no need. Similarly, in the afternoon the tree again asks if you are tired. If you are, it recommends drinking coffee; if not, it concludes no coffee is needed.

Classification of Decision Tree

We have mainly two types of decision tree, based on the nature of the target variable: classification trees and regression trees.

Classification trees: They are designed to predict categorical outcomes, meaning they classify data into different classes. For example, they can determine whether an email is "spam" or "not spam" based on various features of the email.

Regression trees: These are used when the target variable is continuous. They predict numerical values rather than categories. For example, a regression tree can estimate the price of a house based on its size, location, and other features.

How Decision Trees Work?

A decision tree starts with a main question known as the root node. This question is derived from the features of the dataset and serves as the starting point for decision-making. From the root node, the tree asks a series of yes/no questions. Each question is designed to split the data into subsets based on specific attributes. For example, if the first question is "Is it raining?", the answer will determine which branch of the tree to follow. Depending on the response to each question, you follow different branches: if your answer is "Yes," you might proceed down one path; if "No," you will take another path. This branching continues through a sequence of decisions. As you follow each branch, you get more questions that break the data into smaller groups. This step-by-step process continues until there are no more helpful questions. You reach the end of a branch, where you find the final outcome or decision. It could be a classification (like "spam" or "not spam") or a prediction (such as an estimated price).

Advantages of Decision Trees

Simplicity and Interpretability: Decision trees are straightforward and easy to understand. You can visualize them like a flowchart, which makes it simple to see how decisions are made.
Versatility: They can be used for different types of tasks and work well for both classification and regression.
No Need for Feature Scaling: They don't require you to normalize or scale your data.
Handles Non-linear Relationships: It is capable of capturing non-linear relationships between features and target variables. Disadvantages of Decision Trees Overfitting: Overfitting occurs when a decision tree captures noise and details in the training data and it perform poorly on new data. Instability: instability means that the model can be unreliable slight variations in input can lead to significant differences in predictions. Bias towards Features with More Levels: Decision trees can become biased towards features with many categories focusing too much on them during decision-making. This can cause the model to miss out other important features led to less accurate predictions. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION Applications of Decision Trees Loan Approval in Banking: A bank needs to decide whether to approve a loan application based on customer profiles. o Input features include income, credit score, employment status, and loan history. o The decision tree predicts loan approval or rejection, helping the bank make quick and reliable decisions. Medical Diagnosis: A healthcare provider wants to predict whether a patient has diabetes based on clinical test results. o Features like glucose levels, BMI, and blood pressure are used to make a decision tree. o Tree classifies patients into diabetic or non-diabetic, assisting doctors in diagnosis. Predicting Exam Results in Education : School wants to predict whether a student will pass or fail based on study habits. o Data includes attendance, time spent studying, and previous grades. o The decision tree identifies at-risk students, allowing teachers to provide additional support. Errors in Machine Learning? In machine learning, an error is a measure of how accurately an algorithm can make predictions for the previously unknown dataset. On the basis of these errors, the machine learning model is selected that can perform best on the particular dataset. There are mainly two types of errors in machine learning, which are: Reducible errors: These errors can be reduced to improve the model accuracy. Such errors can further be classified into bias and Variance. Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION What is Bias? While making predictions, a difference occurs between prediction values made by the model and actual values/expected values, and this difference is known as bias errors or Errors due to bias. o Low Bias: A low bias model will make fewer assumptions about the form of the target function. o High Bias: A model with a high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. A high bias model also cannot perform well on new data. Ways to reduce High Bias: High bias mainly occurs due to a much simple model. Below are some ways to reduce the high bias: o Increase the input features as the model is underfitted. o Decrease the regularization term. o Use more complex models, such as including some polynomial features. What is a Variance Error? The variance would specify the amount of variation in the prediction if the different training data was used. In simple words, variance tells that how much a random variable is different from its expected value. 
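To make the bias and variance errors concrete, here is a minimal sketch that fits a deliberately simple model and a deliberately flexible model to the same noisy data and compares their training and test scores; the synthetic sine-wave data and the polynomial degrees are illustrative assumptions, not part of the original notes.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic noisy data: y is a sine wave plus random noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for degree in (1, 4, 15):
    # degree 1: very simple model (high bias, tends to underfit)
    # degree 15: very flexible model (high variance, tends to overfit)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print('degree', degree,
          'train R^2', round(model.score(X_train, y_train), 2),
          'test R^2', round(model.score(X_test, y_test), 2))

The very simple model typically scores poorly on both sets (bias error), while the very flexible one typically scores well on the training set but worse on the test set (variance error).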
Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good in understanding the hidden mapping between inputs and output variables. Variance errors are either of low variance or high variance. Inductive bias: Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION In the realm of machine learning, the concept of inductive bias plays a pivotal role in shaping how algorithms learn from data and make predictions. It serves as a guiding principle that helps algorithms generalize from the training data to unseen data, ultimately influencing their performance and decision-making processes. In this article, we delve into the intricacies of inductive bias, its significance in machine learning, and its implications for model development and interpretation. Types of Inductive Bias Inductive bias can manifest in various forms, depending on the algorithm and its underlying assumptions. Some common types of inductive bias include: 1. Bias towards simpler explanations: Many machine learning algorithms, such as decision trees and linear models, have a bias towards simpler hypotheses. They prefer explanations that are more parsimonious and less complex, as these are often more likely to generalize well to unseen data. 2. Bias towards smoother functions: Algorithms like kernel methods or Gaussian processes have a bias towards smoother functions. They assume that neighboring points in the input space should have similar outputs, leading to smooth decision boundaries. 3. Bias towards specific types of functions: Neural networks, for example, have a bias towards learning complex, nonlinear functions. This bias allows them to capture intricate patterns in the data but can also lead to overfitting if not regularized properly. 4. Bias towards sparsity: Some algorithms, like Lasso regression, have a bias towards sparsity. They prefer solutions where only a few features are relevant, which can improve interpretability and generalization. Importance of Inductive Bias Inductive bias is crucial in machine learning as it helps algorithms generalize from limited training data to unseen data. Without a well-defined inductive bias, algorithms may struggle to make accurate predictions or may overfit the training data, leading to poor performance on new data. Geometry and nearest neighbors: K-Nearest Neighbor(KNN) Algorithm for Machine Learning o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning technique. o K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into the category that is most similar to the available categories. o K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means when new data appears then it can be easily classified into a well suite category by using K- NN algorithm. o K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the Classification problems. o K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data. Prepared by Mrs. 
JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION o It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it stores the dataset and at the time of classification, it performs an action on the dataset. o KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data into a category that is much similar to the new data. Example: Suppose, we have an image of a creature that looks similar to cat and dog, but we want to know either it is a cat or dog. So for this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the similar features of the new data set to the cats and dogs images and based on the most similar features it will put it in either cat or dog category. Why do we need a K-NN Algorithm? Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1, so this data point will lie in which of these categories. To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular dataset. Consider the below diagram: How does K-NN work? Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION The K-NN working can be explained on the basis of the below algorithm: o Step-1: Select the number K of the neighbors o Step-2: Calculate the Euclidean distance of K number of neighbors o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance. o Step-4: Among these k neighbors, count the number of the data points in each category. o Step-5: Assign the new data points to that category for which the number of the neighbor is maximum. o Step-6: Our model is ready. Suppose we have a new data point and we need to put it in the required category. Consider the below image: Firstly, we will choose the number of neighbors, so we will choose the k=5. Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as: Prepared by Mrs. JV.Pesha, Assistant Professor, Department of Robotics and Automation MACHINE LEARNING: UNIT I: INTRODUCTION o By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image: o As we can see the 3 nearest neighbors are from category A, hence this new data point must belong to category A. o There is no particular way to determine the best value for "K", so we need to try some values to find the best out of them. The most preferred value for K is 5. o A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers in the model. o Large values for K are good, but it may find some difficulties. Advantages of KNN Algorithm: o It is simple to implement. o It is robust to the noisy training data o It can be more effective if the training data is large. Disadvantages of KNN Algorithm: o Always needs to determine the value of K which may be complex some time. o The computation cost is high because of calculating the distance between the data points for all the training samples. What is Logistic Regression? 
What is Logistic Regression?

Logistic regression is used for binary classification. It uses the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1. For example, if we have two classes, Class 0 and Class 1, and the value of the logistic function for an input is greater than 0.5 (the threshold value), then the input belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, and so on. However, instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).

Logistic Function – Sigmoid Function

The sigmoid function is a mathematical function used to map the predicted values to probabilities. It maps any real value into a value within the range 0 to 1: sigmoid(z) = 1 / (1 + e^(−z)). The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a curve like the "S" shape. This S-shaped curve is called the sigmoid function or the logistic function.

In logistic regression, we use the concept of a threshold value, which defines the boundary between the two classes: values above the threshold tend towards 1, and values below the threshold tend towards 0.

Types of Logistic Regression

On the basis of the categories, logistic regression can be classified into three types:

1. Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

Terminologies involved in Logistic Regression

Here are some common terms involved in logistic regression:

Independent variables: The input characteristics or predictor factors used to predict the dependent variable.
Dependent variable: The target variable in a logistic regression model, which we are trying to predict.
Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms the input variables into a probability value between 0 and 1, which represents the likelihood of the dependent variable being 1 or 0.
Odds: The ratio of something occurring to something not occurring. It is different from probability, which is the ratio of something occurring to everything that could possibly occur.
Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log-odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
Coefficient: The logistic regression model's estimated parameters, which show how the independent variables relate to the dependent variable.
Intercept: A constant term in the logistic regression model, which represents the log-odds when all independent variables are equal to zero.
Maximum likelihood estimation: The method used to estimate the coefficients of the logistic regression model, which maximizes the likelihood of observing the data given the model.
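As a rough sketch of how these pieces fit together — the sigmoid output, the 0.5 threshold, and the learned coefficients and intercept — assuming scikit-learn and a synthetic dataset (not data from this unit):

# A minimal logistic regression sketch, assuming scikit-learn is available.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=1)

model = LogisticRegression().fit(X, y)      # coefficients found by maximum likelihood

probs = model.predict_proba(X[:5])[:, 1]    # sigmoid outputs: probabilities of Class 1
preds = (probs > 0.5).astype(int)           # apply the 0.5 threshold

print("Probabilities:", np.round(probs, 3))
print("Predicted classes:", preds)
print("Coefficients:", model.coef_[0], "Intercept:", model.intercept_[0])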
Perceptron in Machine Learning

In machine learning and artificial intelligence, the perceptron is one of the most commonly encountered terms. It is a primary step in learning machine learning and deep learning technologies, and it consists of a set of weights, input values or scores, and a threshold. The perceptron is a building block of an Artificial Neural Network.

The perceptron is a linear machine learning algorithm used for the supervised learning of various binary classifiers. The algorithm enables a neuron to learn and process the training elements one by one.

What is the Perceptron model in Machine Learning?

The perceptron is a machine learning algorithm for the supervised learning of various binary classification tasks. The perceptron is also understood as an artificial neuron or neural network unit that helps to detect certain computations on input data, for example in business intelligence.

The perceptron model is also treated as one of the best and simplest types of artificial neural networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.

o Input Nodes or Input Layer: This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Weight and Bias: The weight parameter represents the strength of the connection between units. This is another important parameter of the perceptron's components. The weight is directly proportional to the strength of the associated input neuron in deciding the output. The bias can be thought of as the intercept term in a linear equation.
o Activation Function: This is the final and most important component, which determines whether the neuron will fire or not. The activation function can be considered primarily as a step function.

Types of Activation functions:

o Sign function
o Step function
o Sigmoid function

The data scientist chooses the activation function based on the problem statement and the desired outputs. The activation function used in a perceptron model may differ (e.g., sign, step, or sigmoid) depending on whether the learning process is slow or suffers from vanishing or exploding gradients.
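A small sketch of the three activation functions listed above, written with NumPy; the function names are illustrative, not standard library names:

# Minimal sketches of the sign, step, and sigmoid activation functions using NumPy.
import numpy as np

def sign_fn(z):
    return np.where(z >= 0, 1, -1)        # outputs in {-1, +1}

def step_fn(z):
    return np.where(z >= 0, 1, 0)         # outputs in {0, 1}

def sigmoid_fn(z):
    return 1.0 / (1.0 + np.exp(-z))       # smooth output in (0, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sign_fn(z), step_fn(z), np.round(sigmoid_fn(z), 3))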
How does Perceptron work?

In machine learning, the perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function. The perceptron model begins by multiplying all input values by their weights and adding these products together to create the weighted sum. This weighted sum is then applied to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.

The step function or activation function plays a vital role in ensuring that the output is mapped between the required values, (0, 1) or (-1, 1). It is important to note that the weight of an input is indicative of the strength of that node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.

The perceptron model works in two important steps, as follows:

Step-1

In the first step, multiply all input values by their corresponding weight values and then add the products to determine the weighted sum. A special term called the bias 'b' is added to this weighted sum to improve the model's performance. Mathematically, we can calculate the weighted sum as follows:

∑ wi * xi + b

Step-2

In the second step, an activation function is applied to the weighted sum obtained above, which gives an output either in binary form or as a continuous value, as follows:

Y = f(∑ wi * xi + b)

Types of Perceptron Models:

1. Single Layer Perceptron Model:

This is one of the simplest types of artificial neural networks (ANN). A single-layer perceptron model consists of a feed-forward network and includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to classify linearly separable data with binary outcomes.

In a single-layer perceptron model, the algorithm has no prior knowledge of the data, so it begins with randomly allocated values for the weight parameters. It then sums the weighted inputs. If the total sum of all weighted inputs is more than a pre-determined value, the model is activated and shows the output value as +1. If the outcome matches the expected or threshold value, the performance of the model is considered satisfactory, and the weights are not changed. However, the model will still make errors on some inputs; hence, to find the desired output and minimize errors, the weights must be adjusted.

"A single-layer perceptron can learn only linearly separable patterns."

2. Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model has the same basic structure but a greater number of hidden layers. The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages as follows:

o Forward Stage: In the forward stage, activations flow from the input layer through the hidden layers and terminate at the output layer.
o Backward Stage: In the backward stage, the weight and bias values are modified as per the model's requirements. The error between the actual output and the desired output is propagated backwards, starting at the output layer and ending at the input layer.

Hence, a multi-layered perceptron model is considered as multiple artificial neural network layers in which the activation function does not remain linear, unlike in a single-layer perceptron model. Instead of a linear function, activation functions such as sigmoid, TanH, ReLU, etc. can be used.

A multi-layer perceptron model has greater processing power and can handle both linear and non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR and NOR.
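To make the single-layer case concrete, here is a from-scratch sketch, assuming NumPy, of the perceptron learning rule with a step activation, trained on the linearly separable AND gate; the variable names and learning rate are illustrative choices:

# A minimal single-layer perceptron sketch trained on the AND gate, using NumPy.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND targets

w = np.zeros(2)        # weights, initialised to zero for simplicity
b = 0.0                # bias
lr = 0.1               # learning rate

def step(z):
    return 1 if z > 0 else 0

for epoch in range(20):                           # a few passes are enough for AND
    for xi, target in zip(X, y):
        pred = step(np.dot(w, xi) + b)            # weighted sum + bias, then activation
        error = target - pred
        w += lr * error * xi                      # perceptron weight update
        b += lr * error                           # bias update

print("Weights:", w, "Bias:", b)
print("Predictions:", [step(np.dot(w, xi) + b) for xi in X])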
Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear problems.
o It works well with both small and large input data.
o It provides quick predictions after training.
o It achieves a comparable accuracy ratio with large as well as small datasets.

Disadvantages of Multi-Layer Perceptron:

o In a multi-layer perceptron, computations are difficult and time-consuming.
o In a multi-layer perceptron, it is difficult to determine how much each independent variable affects the dependent variable.
o The functioning of the model depends on the quality of the training.

Perceptron Function:

The perceptron function 'f(x)' is obtained by multiplying the input 'x' with the learned weight coefficients 'w' and adding the bias 'b'. Mathematically, we can express it as follows:

f(x) = 1 if w · x + b > 0, otherwise f(x) = 0

Binary Classification:

Classification teaches a machine to sort things into categories. It learns by looking at examples with labels (like emails marked "spam" or "not spam"). After learning, it can decide which category new items belong to, for example identifying whether a new email is spam or not.

For example, a classification model might be trained on a dataset of images labeled as either dogs or cats, and it can then be used to predict the class of new and unseen images as dogs or cats based on their features such as color, texture and shape.

In the plot used to explain classification, the horizontal axis represents the combined values of the color and texture features, and the vertical axis represents the combined values of the shape and size features. Each colored dot in the plot represents an individual image, with the color indicating whether the model predicts the image to be a dog or a cat.

The shaded areas in the plot show the decision boundary, which is the line or region that the model uses to decide which category (dog or cat) an image belongs to. The model classifies images on one side of the boundary as dogs and on the other side as cats, based on their features.

Types of Classification

When we talk about classification in machine learning, we are talking about the process of sorting data into categories based on specific features or characteristics. There are different types of classification problems depending on how many categories (or classes) we are working with and how they are organized. The main classification types in machine learning are:

1. Binary Classification

This is the simplest kind of classification. In binary classification, the goal is to sort the data into two distinct categories. Think of it as a simple choice between two options. Imagine a system that sorts emails into either spam or not spam. It works by looking at different features of the email, such as certain keywords or sender details, and decides whether the email is spam or not. It only chooses between these two options.

2. Multiclass Classification

Here, instead of just two categories, the data needs to be sorted into more than two categories. The model picks the one that best matches the input. Think of an image recognition system that sorts pictures of animals into categories like cat, dog, and bird.
Basically, the machine looks at the features in the image (such as shape, color, or texture) and chooses which animal the picture is most likely to be, based on the training it received.

3. Multi-Label Classification

In multi-label classification, a single piece of data can belong to multiple categories at once. Unlike multiclass classification, where each data point belongs to only one class, multi-label classification allows a data point to belong to multiple classes. For example, a movie recommendation system could tag a movie as both action and comedy. The system checks various features (such as the movie plot, actors, or genre tags) and assigns multiple labels to a single piece of data, rather than just one. Multi-label classification is relevant in specific use cases, but it is not as crucial for a starting overview of classification.

How does Classification in Machine Learning Work?

Classification involves training a model using a labeled dataset, where each input is paired with its correct output label. The model learns patterns and relationships in the data, so it can later predict the category or class of new, unseen inputs. Here is how it works:

1. Data Collection: You start with a dataset where each item is labeled with the correct class (for example, "cat" or "dog").
2. Feature Extraction: The system identifies features (like color, shape, or texture) that help distinguish one class from another. These features are what the model uses to make predictions.
3. Model Training: The classification algorithm uses the labeled data to learn how to map the features to the correct class. It looks for patterns and relationships in the data.
4. Model Evaluation: Once the model is trained, it is tested on new, unseen data to check how accurately it can classify the items. Depending on the problem and its requirements, different metrics can be used to measure performance.
5. Prediction: After being trained and evaluated, the model can be used to predict the class of new data based on the features it has learned.

If the quality metric is not satisfactory, the ML algorithm or its hyperparameters can be adjusted and the model retrained. This iterative process continues until a satisfactory performance is achieved. A minimal end-to-end sketch of this workflow is given below.

In short, classification in machine learning is all about using existing labeled data to teach the model how to predict the class of new, unlabeled data based on the patterns it has learned.
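The sketch below walks through the workflow just described — labeled data, a train/test split, training, evaluation, and prediction — assuming scikit-learn and its built-in Iris dataset; the dataset, classifier choice, and sample values are stand-in assumptions, not material from this unit:

# A minimal end-to-end classification workflow sketch, assuming scikit-learn is available.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data collection: a labeled dataset (features X, class labels y).
X, y = load_iris(return_X_y=True)

# 2-3. Split the data and train the model on the labeled training portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 4. Model evaluation on unseen data.
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# 5. Prediction for a new, unseen input (feature values are illustrative).
print("Predicted class:", clf.predict([[5.1, 3.5, 1.4, 0.2]])[0])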
Examples of Machine Learning Classification in Real Life

Classification algorithms are widely used in many real-world applications across various domains, including:

Email spam filtering: Classifying incoming emails as spam or not spam based on their content and sender details.
Credit risk assessment: Algorithms predict whether a loan applicant is likely to default by analyzing factors such as credit score, income, and loan history. This helps banks make informed lending decisions and minimize financial risk.
Medical diagnosis: Machine learning models classify whether a patient has a certain condition (e.g., cancer or diabetes) based on medical data such as test results, symptoms, and patient history. This aids doctors in making quicker, more accurate diagnoses, improving patient care.
Image classification: Applied in fields such as facial recognition, autonomous driving, and medical imaging.
Sentiment analysis: Determining whether the sentiment of a piece of text is positive, negative, or neutral. Businesses use this to understand customer opinions, helping to improve products and services.
Fraud detection: Algorithms detect fraudulent activities by analyzing transaction patterns and identifying anomalies, which is crucial in protecting against credit card fraud and other financial crimes.
Recommendation systems: Used to recommend products or content based on past user behavior, such as suggesting movies on Netflix or products on Amazon. This personalization boosts user satisfaction and sales for businesses.
