Support Vector Machines (SVM)
Summary
These lecture notes provide an introduction to support vector machines (SVMs), covering the basic concepts, working principle, advantages, disadvantages, and mathematical underpinnings of the algorithm. The transcript also includes companion lectures on the Naïve Bayes classifier, K-means and hierarchical clustering, and the ethics and risks of machine learning. The notes are suitable for undergraduate students studying machine learning.
Full Transcript
SUPPORT VECTOR MACHINE (SVM)
EE5253 - Machine Learning
Ms. Yugani Gamlath, Lecturer, Department of Electrical & Information Engineering, Faculty of Engineering, University of Ruhuna.

CONTENT
- Introduction to Support Vector Machine
- SVM Working Principle
- Support Vectors, Hyperplane, Margin
- Hard Margin and Soft Margin
- What about Linearly Inseparable Datasets?
- SVM Kernel Trick
- Popular SVM Kernels
- Advantages and Disadvantages of SVM
- SVM Maths

INTRODUCTION TO SVM
- Supervised learning algorithm.
- SVM offers very high accuracy compared to other classification algorithms.
- Can be used for both classification and regression problems.
- Very good at handling non-linear input spaces.
- Used in a variety of applications such as face detection, intrusion detection, classification of emails, news articles and web pages, handwriting recognition, text categorization, and image classification.

SVM WORKING PRINCIPLE
[Figures: a hyperplane separating two classes; a few possible hyperplanes for the same data; selecting the hyperplane with the maximum margin.]

HOW DOES SVM WORK?
1. The main objective is to segregate the given dataset in the best possible way.
2. The distance between the nearest points of either class is known as the margin.
3. The objective is to select a hyperplane with the maximum possible margin between support vectors in the given dataset.
SVM searches for the maximum marginal hyperplane (MMH) in the following steps:
1. Generate hyperplanes that segregate the classes in the best way.
2. Select the hyperplane with the maximum separation from the nearest data points of either class.

SUPPORT VECTORS, HYPERPLANE, MARGIN
Support Vectors
- Support vectors are the data points that are closest to the hyperplane.
- These points define the separating line better by calculating margins.
- These points are more relevant to the construction of the classifier.
Hyperplane
- A hyperplane is a decision plane (decision boundary) that separates a set of objects having different class memberships.
Margin
- A margin is the gap between the two lines through the closest class points. It is calculated as the perpendicular distance from the line to the support vectors (closest points).
- A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.

HARD MARGIN AND SOFT MARGIN
Hard Margin (maximum margin classifiers):
- If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible.
Soft Margin (allow misclassifications):
- As most real-world data are not fully linearly separable, we allow some margin violations to occur; this is called soft margin classification. It is better to have a large margin, even though some constraints are violated.
- Margin violation means choosing a hyperplane that allows some data points to stay either on the incorrect side of the hyperplane or between the margin and the correct side of the hyperplane.

WHAT ABOUT LINEARLY INSEPARABLE DATASETS?
[Figures: datasets that no straight line can separate.]

SVM KERNEL TRICK
- The kernel trick is a function that can be used to transform a dataset into a higher-dimensional space so that the data becomes linearly separable.
- The kernel trick effectively converts a non-separable problem into a separable one by increasing the number of dimensions in the problem space and mapping the data points into the new problem space.
[Figure: mapping 1D data into 2D so that it becomes linearly separable.]

POPULAR SVM KERNELS
[Figure: commonly used kernel functions.]
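To make the soft margin and the kernel trick concrete, here is a minimal scikit-learn sketch (not part of the original slides). The datasets, the C value, and the choice of the RBF kernel are illustrative assumptions.

```python
# A minimal sketch: a soft-margin linear SVM, and the kernel trick applied
# to linearly inseparable data. Datasets and parameters are illustrative.
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Soft margin: C trades margin width against margin violations
# (smaller C = wider margin, more violations tolerated).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("number of support vectors:", len(linear_svm.support_vectors_))

# Kernel trick: concentric circles cannot be separated by any straight
# line in 2D; an RBF kernel implicitly maps them to a higher-dimensional
# space where a separating hyperplane exists.
X2, y2 = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X2, y2)
print("RBF training accuracy:", rbf_svm.score(X2, y2))
```

Only the few support vectors are kept by the fitted model, which is why SVMs are described as memory efficient below.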
ADVANTAGES OF SVM
- Can be used for both regression and classification problems.
- Memory-efficient algorithm: SVM depends on very few support vectors and takes up very little memory.
- Works well with high-dimensional data.
- Integration of kernel methods makes SVMs very flexible and able to adapt to many types of data.
- Works well on small datasets, with high accuracy and good performance.
- The regularization parameter helps avoid overfitting and bias problems in the model.

DISADVANTAGES OF SVM
- Not suited for large datasets: the computational cost can be very high for a large number of training samples.
- The training time for an SVM can also be very high, depending on the dataset.
- SVM does not perform very well when the dataset has more noise.

SVM MATHS
[The equation slides survive only as fragments (+1, -1, ||w||); the standard relations they depict are:]
- Separating hyperplane: $w \cdot x + b = 0$
- Margin boundaries through the support vectors: $w \cdot x + b = +1$ and $w \cdot x + b = -1$
- Margin width: $\frac{2}{\lVert w \rVert}$
- Training objective: minimize $\frac{1}{2}\lVert w \rVert^2$ subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$.

THANK YOU…

NAÏVE BAYES CLASSIFIER
EE5253 - Machine Learning
Ms. Yugani Gamlath, Lecturer, Department of Electrical & Information Engineering, Faculty of Engineering, University of Ruhuna.

CONTENT
- Introduction to Naïve Bayes
- Conditional Probability
- Bayes Theorem
- Why is it called Naïve Bayes?
- Examples and Calculations
- Assumptions
- Types of Naïve Bayes
- Applications
- Advantages and Disadvantages
- Math behind Naïve Bayes

INTRODUCTION TO NAÏVE BAYES
- Supervised learning algorithm.
- Classification algorithm based on Bayes theorem.
- Works for binary (2-class) and multi-class classification problems.
- It is a probabilistic classifier: it predicts on the basis of the probability of an object.
- It is mainly used in text classification.

CONDITIONAL PROBABILITY
[Slide figure; the conditional probability of A given B is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.]

BAYES THEOREM
$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$, i.e. Posterior = (Likelihood × Prior) / Evidence.
[Slides: proof of Bayes theorem and two worked examples.]

WHY IS IT CALLED NAÏVE BAYES?
- Naïve: it is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
- Bayes: it is called Bayes because it depends on the principle of Bayes theorem.

ASSUMPTIONS
- The Naïve Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of a feature does not influence the presence or absence of any other feature (features are independent of one another).
- It also assumes that all features contribute equally to the outcome.

EXAMPLE 01
[Figures: frequency and likelihood tables of 'Color', 'Type' and 'Origin' for a stolen-car dataset.]
Given the 3 predictors X = (Red, SUV, Domestic), calculate the posterior probabilities P(Yes | X) and P(No | X):
P(X | Yes) × P(Yes) = 0.048 × 1/2 = 0.024
P(X | No) × P(No) = 0.144 × 1/2 = 0.072
Since 0.072 > 0.024, given the features Red, SUV and Domestic, the example gets classified as 'No'. That means the car is not stolen.

EXAMPLE 02
[Slides: a second worked example.]

STEPS FOR CALCULATIONS
- First, create a frequency table for each attribute against the target.
- Then, create the likelihood tables.
- Finally, use the Naïve Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
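As a check on Example 01 and the steps above, here is a small plain-Python sketch. The per-feature likelihoods are assumed reconstructions (the original likelihood tables were figures), chosen so that the products reproduce the 0.048 and 0.144 values quoted on the slide.

```python
# Naive Bayes posterior for Example 01 (stolen-car data). The likelihood
# values below are assumed reconstructions of the slide's tables, chosen
# to match the quoted 0.048 / 0.144 products.
priors = {"Yes": 1 / 2, "No": 1 / 2}
likelihoods = {
    "Yes": {"Red": 3 / 5, "SUV": 1 / 5, "Domestic": 2 / 5},  # product = 0.048
    "No":  {"Red": 2 / 5, "SUV": 3 / 5, "Domestic": 3 / 5},  # product = 0.144
}
x = ["Red", "SUV", "Domestic"]  # the 3 predictors

scores = {}
for cls in priors:
    p = priors[cls]
    for feature in x:
        p *= likelihoods[cls][feature]  # naive independence assumption
    scores[cls] = p

print(scores)                               # approx {'Yes': 0.024, 'No': 0.072}
print("prediction:", max(scores, key=scores.get))  # 'No' -> car not stolen
```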
TYPES OF NAÏVE BAYES
- Bernoulli is used when all the features are binary: 0/1, yes/no features only.
- Multinomial is used when the values of the features are discrete or completely categorical, with different figures representing different categories.
- Gaussian is used when the features contain continuous values.

APPLICATIONS
- Spam filtering
- Text classification
- Sentiment analysis
- Recommendation systems
- Real-time prediction
- News classification
- Medical diagnosis
- Weather prediction

ADVANTAGES
- Fast and simple ML algorithm; easy to implement.
- It can be used for binary classification as well as multi-class classification.
- It performs well in multi-class predictions compared to the other algorithms.
- It is the most popular choice for text classification problems.
- As it is fast, it can be used for real-time predictions.

DISADVANTAGES
- It assumes all features are unrelated or independent, so it cannot learn the relationships between features.

MATH BEHIND NAÏVE BAYES
Bayes theorem can be rewritten as follows, where the variable y is the class variable and X = (x_1, x_2, …, x_n) represents the parameters/features:
$P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}$
By substituting for X and expanding using the chain rule (with the naïve independence assumption) we get:
$P(y \mid x_1, \ldots, x_n) = \frac{P(x_1 \mid y) \cdots P(x_n \mid y)\, P(y)}{P(x_1) \cdots P(x_n)}$
Now the values for each term can be obtained by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore the denominator can be removed and proportionality injected:
$P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$
Now we have to find the class variable y with maximum probability:
$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$
Using the above function, we can obtain the class, given the predictors/features.
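The three variants listed under "Types of Naïve Bayes" above map directly onto scikit-learn estimators. A hedged sketch follows; the randomly generated data is purely illustrative.

```python
# Mapping the three Naive Bayes types to scikit-learn estimators
# (a sketch, not from the slides; data is random and illustrative).
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)               # binary class labels

X_binary = rng.integers(0, 2, size=(100, 4))   # yes/no features  -> Bernoulli
X_counts = rng.integers(0, 10, size=(100, 4))  # discrete counts  -> Multinomial
X_real = rng.normal(size=(100, 4))             # continuous       -> Gaussian

for model, X in [(BernoulliNB(), X_binary),
                 (MultinomialNB(), X_counts),
                 (GaussianNB(), X_real)]:
    model.fit(X, y)
    print(type(model).__name__, "train accuracy:", round(model.score(X, y), 2))
```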
THANK YOU…

K-MEANS CLUSTERING & HIERARCHICAL CLUSTERING

CONTENT
- K-Means Clustering
- Hierarchical Clustering

WHAT IS CLUSTERING?
- The goal of clustering is to divide a set of data points into a number of groups, so that the data points within each group are more comparable to one another and different from the data points in the other groups.
- It is essentially a grouping of things based on how similar and different they are to one another.
[Figure: clustering example.]

APPLICATIONS OF CLUSTERING
- Customer segmentation
- Search engines
- Recommendation systems
- Social network analysis
- Identifying fake news
- Identifying criminal activities
- Document analysis
- Market segmentation

POPULAR CLUSTERING ALGORITHMS
- K-Means Clustering
- Hierarchical Clustering
- Mean Shift Clustering
- DBSCAN (Density-Based Spatial Clustering)

K-MEANS CLUSTERING ALGORITHM…

INTRODUCTION TO K-MEANS CLUSTERING
- It is an unsupervised learning algorithm that is used to solve clustering problems. It groups an unlabeled dataset into different clusters.
- Here K defines the number of pre-defined clusters that need to be created.
- It is a centroid-based algorithm, where each cluster is associated with a centroid.
- It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group with similar properties.
[Figure: example of K-means clusters.]

HOW DOES THE K-MEANS ALGORITHM WORK?
The K-means clustering algorithm involves the following steps:
Step 1: Choose the number of clusters K.
Step 2: Randomly select K data points as cluster centers.
Step 3: Using the Euclidean distance formula, measure the distance between each data point and each cluster center.
Step 4: Assign each data point to the cluster whose center is nearest to it.
Step 5: Re-compute the center of each newly formed cluster. The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
Step 6: Keep repeating Steps 3 to 5 until any of the following stopping criteria is met:
✓ Data points stay in the same clusters
✓ The maximum number of iterations is reached
✓ The centers of the newly formed clusters do not change

[Figures: graphical representation of the steps; flow chart for K-means clustering.]

EXAMPLE
Data points: P1(1,3), P2(2,2), P3(5,8), P4(8,5), P5(3,9), P6(10,7), P7(3,3), P8(9,4), P9(3,7)
First, we take our K value as 3 and assume that our initial cluster centers are P7(3,3), P9(3,7), P8(9,4) as C1, C2, C3.
Step 1: Find the distance between the data points and the centroids; each data point moves to the cluster whose centroid is nearest.

Iteration 1: calculate the distance between the data points and the centers (C1, C2, C3).
For P1: C1P1 = sqrt[(1–3)² + (3–3)²] = sqrt[4] = 2; C2P1 = sqrt[(1–3)² + (3–7)²] = sqrt[20] ≈ 4.5; C3P1 = sqrt[(1–9)² + (3–4)²] = sqrt[65] ≈ 8.1
For P2: C1P2 = sqrt[(2–3)² + (2–3)²] = sqrt[2] ≈ 1.4; C2P2 = sqrt[(2–3)² + (2–7)²] = sqrt[26] ≈ 5.1; C3P2 = sqrt[(2–9)² + (2–4)²] = sqrt[53] ≈ 7.3
For P3: C1P3 = sqrt[(5–3)² + (8–3)²] = sqrt[29] ≈ 5.3; C2P3 = sqrt[(5–3)² + (8–7)²] = sqrt[5] ≈ 2.2; C3P3 = sqrt[(5–9)² + (8–4)²] = sqrt[32] ≈ 5.7
Similarly for the other distances. Assigning each point to its nearest center gives:
Cluster 1 => P1(1,3), P2(2,2), P7(3,3)
Cluster 2 => P3(5,8), P5(3,9), P9(3,7)
Cluster 3 => P4(8,5), P6(10,7), P8(9,4)
We re-compute the new cluster centers by taking the mean of all the points contained in each particular cluster:
New center of Cluster 1 => ((1+2+3)/3, (3+2+3)/3) = (2, 2.7)
New center of Cluster 2 => ((5+3+3)/3, (8+9+7)/3) = (3.7, 8)
New center of Cluster 3 => ((8+10+9)/3, (5+7+4)/3) = (9, 5.3)
Iteration 1 is over.

Iteration 2: take the new centroids C1(2,2.7), C2(3.7,8), C3(9,5.3) and repeat the same steps, calculating the distance between the data points and the new center points to find the cluster groups.
For P1: C1P1 = sqrt[(1–2)² + (3–2.7)²] = sqrt[1.09] ≈ 1.0; C2P1 = sqrt[(1–3.7)² + (3–8)²] = sqrt[32.29] ≈ 5.7; C3P1 = sqrt[(1–9)² + (3–5.3)²] = sqrt[69.29] ≈ 8.3
Similarly for the other distances. The clusters and centers come out unchanged:
Cluster 1 => P1(1,3), P2(2,2), P7(3,3), center (2, 2.7)
Cluster 2 => P3(5,8), P5(3,9), P9(3,7), center (3.7, 8)
Cluster 3 => P4(8,5), P6(10,7), P8(9,4), center (9, 5.3)
We got the same centroids and cluster groups, so K-means stops iterating: since the same clusters repeat, there is no need to continue, and the last iteration is displayed as the best cluster grouping for this dataset.
[Figure: the graph compares iterations 1 and 2.]
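The hand-worked example can be reproduced with scikit-learn's KMeans by starting from the same initial centers. A sketch follows; the specific parameter settings (explicit init array, n_init=1) are one reasonable configuration, not something the slides prescribe.

```python
# Re-running the worked example with scikit-learn's KMeans.
# Initial centers are C1=P7(3,3), C2=P9(3,7), C3=P8(9,4), as in the slides.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 3], [2, 2], [5, 8], [8, 5], [3, 9],
                   [10, 7], [3, 3], [9, 4], [3, 7]])      # P1 .. P9
init_centers = np.array([[3, 3], [3, 7], [9, 4]])          # C1, C2, C3

kmeans = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(points)
print(kmeans.cluster_centers_)  # approx (2, 2.7), (3.7, 8), (9, 5.3)
print(kmeans.labels_)           # cluster index assigned to each of P1..P9
```

The fitted centers match the hand computation above, confirming that the algorithm converged after the second iteration.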
HOW TO CHOOSE THE OPTIMAL VALUE OF K?
These methods can be used to find the optimal K value:
- The Elbow Method
- The Silhouette Method

ELBOW METHOD
- The elbow method is one of the most popular ways to find the optimal number of clusters. This method uses the concept of the WCSS value.
- WCSS stands for Within-Cluster Sum of Squares, which measures the total variation within a cluster. [The 'WCSS value' slide shows the formula:]
$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$, where $\mu_k$ is the centroid of cluster $C_k$.
- To measure the distance between data points and a centroid, we can use the Euclidean distance.

STEPS OF ELBOW METHOD
1. Execute K-means clustering on the given dataset for different K values (ranging from 1 to 10).
2. For each value of K, calculate the WCSS value.
3. Plot a curve of the calculated WCSS values against the number of clusters K.
4. The sharp point of bend, where the plot looks like an arm's elbow, is considered the best value of K.
[Figure: elbow method plot for the optimal K.]
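Following the steps above, here is a short sketch that computes the WCSS for K = 1..10 and plots the curve; scikit-learn exposes the WCSS of a fitted model as `inertia_`. The dataset is an illustrative assumption.

```python
# Elbow-method sketch: record the WCSS (KMeans.inertia_) for K = 1..10
# and plot it; look for the "elbow" bend. Dataset is illustrative.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)   # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()                     # the bend in the curve suggests the best K
```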
ADVANTAGES & DISADVANTAGES OF K-MEANS CLUSTERING ALGORITHM
Advantages:
- It is very simple to implement.
- It is scalable and fast on larger datasets.
- It adapts to new examples very easily.
Disadvantages:
- Choosing the K value manually is a tough job.
- As the number of dimensions increases, its scalability decreases.
- Sensitive to initial centroids.
- Sensitive to outliers.

HIERARCHICAL CLUSTERING ALGORITHM…

INTRODUCTION TO HIERARCHICAL CLUSTERING
- An unsupervised machine learning algorithm, used to group an unlabeled dataset into clusters.
- In this algorithm, we develop the hierarchy of clusters in the form of a tree; this tree-shaped structure is known as a dendrogram.
- In K-means clustering we saw the challenge of having to predetermine the number of clusters. Hierarchical clustering solves this challenge: we do not need prior knowledge of the number of clusters.

TYPES OF HIERARCHICAL CLUSTERING
- Agglomerative Clustering: a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.
- Divisive Clustering: the reverse of the agglomerative algorithm, a top-down approach.

WHAT IS A DENDROGRAM?
- A dendrogram is a type of tree diagram showing hierarchical relationships between different sets of data.
- A dendrogram contains the memory of the hierarchical clustering algorithm: just by looking at a dendrogram you can tell how a cluster was formed.
[Figure: the left part shows how clusters are created in agglomerative clustering; the right part shows the corresponding dendrogram.]

THANK YOU…

ETHICS & RISKS OF MACHINE LEARNING

CONTENT
- Introduction to Machine Learning Ethics
- Risks and Ethical Challenges in Machine Learning
- Key Ethical Principles
- Practical Use Cases
- Regulatory and Legal Frameworks
- Best Practices for Mitigating Risks
- Conclusion

INTRODUCTION TO MACHINE LEARNING ETHICS
- Machine learning ethics is the application of ethical principles to the development, deployment, and use of ML algorithms.
- ML systems impact decision-making in sensitive areas (e.g., healthcare, law, finance) with the potential for harm, bias, or discrimination.
- Challenges: balancing innovation with ethical concerns like privacy, transparency, and fairness.

RISKS AND ETHICAL CHALLENGES IN MACHINE LEARNING
- Bias and Discrimination: machine learning models can perpetuate social inequalities if trained on biased data.
- Lack of Transparency: many ML algorithms, especially deep learning models, are not transparent in their decision-making process.
- Privacy Concerns: using personal data without adequate safeguards risks violating individuals' privacy.
- Security: ML models are susceptible to adversarial attacks, where small changes to input data lead to incorrect outputs.
- Autonomy: over-reliance on automated decision-making can undermine human oversight.

KEY ETHICAL PRINCIPLES
- Fairness: ensure ML systems are inclusive and do not favor any specific group.
- Accountability: define who is responsible for the outcomes of ML models.
- Transparency: make ML decisions explainable, so people can understand how conclusions are reached.
- Privacy: protect personal data and minimize invasive data collection.
- Non-maleficence: ensure ML systems do no harm.

BIAS AND DISCRIMINATION IN MACHINE LEARNING
- Problem: ML models learn from historical data, which can contain biases. These biases get encoded in the models, leading to discriminatory outcomes.
- Example 1, COMPAS Algorithm: this risk assessment tool used in the US criminal justice system was biased against African Americans, labeling them as higher-risk for reoffending.
- Example 2, Facial Recognition Software: studies have shown that many facial recognition algorithms have higher error rates for people with darker skin tones.
- Discussion: how can we mitigate bias in ML models?

LACK OF TRANSPARENCY AND EXPLAINABILITY
- Problem: deep learning models, especially those used in sensitive sectors like healthcare, are difficult to interpret.
- Example, Healthcare AI: ML models predicting cancer or heart disease from medical scans offer little insight into how they arrive at their decisions.
- Discussion: why is explainability crucial for trust in AI systems, especially in healthcare?

PRIVACY CONCERNS IN MACHINE LEARNING
- Problem: many ML models rely on huge amounts of personal data, such as browsing history, social media activity, and biometric data. This raises concerns over how data is collected, stored, and used.
- Example, Target Case (2012): Target used machine learning to predict when women were pregnant, leading to privacy invasions and unintentionally exposing sensitive information.
- Discussion: how can we protect privacy while still leveraging the power of ML?

SECURITY RISKS IN MACHINE LEARNING
- Problem: adversarial attacks involve adding small, imperceptible changes to input data that lead ML models to make incorrect predictions.
- Example: researchers have shown that adding a few pixels to an image can cause an autonomous vehicle to misidentify a stop sign as a yield sign, with potentially dangerous outcomes.
- Discussion: what are the consequences of these attacks in critical systems like autonomous driving or cybersecurity?

AUTONOMY AND HUMAN OVERSIGHT
- Problem: automated decision-making systems can replace human judgment in high-stakes scenarios, such as medical diagnosis, loan approvals, or autonomous weapons.
- Example: in autonomous weapons, AI systems may make lethal decisions without human input, leading to moral and ethical concerns.
- Discussion: how should humans maintain oversight over AI-driven decisions in life-or-death scenarios?

REGULATORY AND LEGAL FRAMEWORKS
Existing regulations:
- GDPR (EU): focuses on data privacy and protection, with specific rules about automated decision-making.
- EU AI Act: aims to regulate AI based on the risk posed, with different rules for high-risk and low-risk AI systems.
Future of AI regulation:
- Explore ongoing efforts to create global standards for ethical AI development.

PRACTICAL USE CASE 1: HEALTHCARE
- AI in Medical Diagnostics: machine learning models can analyze medical images (e.g., X-rays, MRIs) to detect diseases such as cancer or pneumonia.
- Ethical Risk: bias in training data can lead to unequal healthcare outcomes (e.g., underdiagnosis in minority populations).
- Mitigation Strategy: ensure that training data is representative and includes diverse patient populations.

PRACTICAL USE CASE 2: AUTONOMOUS VEHICLES
- AI in Self-Driving Cars: machine learning enables cars to make real-time decisions about driving (e.g., when to stop, turn, or avoid obstacles).
- Ethical Risk: in accident scenarios, ML systems may face ethical dilemmas (e.g., who to prioritize in an unavoidable crash).
- Mitigation Strategy: develop ethical decision-making frameworks for autonomous vehicles, ensuring that human values are encoded into their programming.

PRACTICAL USE CASE 3: RECRUITMENT AND HIRING
- AI in Hiring: companies use AI systems to screen resumes, schedule interviews, and evaluate candidates.
- Ethical Risk: if the training data is biased (e.g., underrepresentation of women or minority groups), AI systems may perpetuate discriminatory hiring practices.
- Mitigation Strategy: regular audits and bias testing of hiring algorithms.

BEST PRACTICES FOR MITIGATING RISKS
- Bias Audits: regularly check algorithms for potential bias in predictions (a minimal sketch follows this list).
- Explainability Tools: use techniques like LIME (Local Interpretable Model-Agnostic Explanations) or SHAP (Shapley Additive Explanations) to make ML models more interpretable.
- Privacy-Preserving Techniques: implement privacy-enhancing technologies like differential privacy, homomorphic encryption, federated learning, and hybrid approaches.
- Human-in-the-Loop: ensure that human oversight remains a key part of AI decision-making processes.
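As one concrete, hypothetical illustration of the bias-audit practice named above, the sketch below compares a model's accuracy and positive-prediction rate across groups defined by a sensitive attribute. All data and variable names are made up for illustration; real audits use proper metrics and real predictions.

```python
# Minimal bias-audit sketch (hypothetical data): compare per-group accuracy
# and positive-prediction rate for a model's outputs.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])   # model predictions
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # sensitive attribute

for g in np.unique(group):
    mask = group == g
    acc = (y_true[mask] == y_pred[mask]).mean()
    rate = y_pred[mask].mean()                 # positive-prediction rate
    print(f"group {g}: accuracy={acc:.2f}, positive rate={rate:.2f}")
# Large gaps between groups flag potential bias worth deeper auditing.
```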
SELF LEARNING ACTIVITIES
- Activity 1: Write down the purpose, approach, and characteristics of the LIME and SHAP techniques separately.
- Activity 2: Explain the privacy-enhancing technologies: differential privacy, homomorphic encryption, federated learning, and hybrid approaches.
- Activity 3: Explain the "Critical Role of Bias Audits".
- Activity 4: Search for and explain the GDPR (EU) and the EU AI Act.
- Activity 5: Search for and learn more about the COMPAS algorithm and the Target Case (2012).
- Activity 6: Search for and explain the ethical issues of deepfake technology and Midjourney AI.

CONCLUSION
- Ethical machine learning is about balancing innovation with responsibility, fairness, and accountability.
- As AI systems become more complex, we need continued focus on ethics, updated regulations, and a culture of responsibility in AI development.

REFERENCE VIDEO
https://www.youtube.com/watch?v=1oeoosMrJz4

REFERENCE ARTICLES
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
https://www.aclu-mn.org/en/news/biased-technology-automated-discrimination-facial-recognition
https://montrealethics.ai/when-algorithms-infer-pregnancy-or-other-sensitive-information-about-people/
https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence
https://securiti.ai/impact-of-the-gdpr-on-artificial-intelligence/

THANK YOU…