02 Machine Learning Overview.pdf
Machine Learning Overview

Foreword
⚫ Machine learning is a core research field of AI and a prerequisite for deep learning. This chapter introduces the main concepts of machine learning, its types, its overall process, and its common algorithms.
Huawei Confidential

Objectives
Upon completion of this course, you will be able to:
⚫ Master the definition of a learning algorithm and the machine learning process.
⚫ Know common machine learning algorithms.
⚫ Understand concepts such as hyperparameters, gradient descent, and cross validation.

Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case Study

⚫ Artificial intelligence is the capability of a computer system to mimic human cognitive functions such as learning and problem-solving.
⚫ Machine learning is a branch of artificial intelligence that uses algorithms to extract patterns from data and then predict future trends.
⚫ Deep learning is a branch of the broader field of machine learning that uses neural networks to solve problems: "a very large neural network that is trained using a very large amount of data".
⚫ Data science is a field that studies data and how to extract meaning from it, using a series of methods, algorithms, systems, and tools to extract insights from structured and unstructured data.

[Figure slide: Machine learning vs human learning]

[Figure slide: Machine learning vs Deep learning]

Machine Learning
⚫ Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning". He defined machine learning as the "field of study that gives computers the capability to learn without being explicitly programmed".
Arthur Samuel (1959)
[Figure: Program (Model)]

Machine Learning Algorithms (1)
⚫ A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
⚫ Example: playing checkers.
  ◼ E = the experience of playing many games of checkers.
  ◼ T = the task of playing checkers against the computer.
  ◼ P = the probability that the program (the computer) will win the next game.
[Figure: Data (Experience E), Learning algorithms (Task T), Basic understanding (Measure P)]

Machine Learning Algorithms (2)
[Figure: Experience, Induction, Regularity; Historical data, Training, Model; new problems or new data are the input, and the prediction is the future or future attributes]
Created by: Jim Liang

Differences Between Machine Learning Algorithms and Traditional Rule-Based Algorithms
⚫ Rule-based algorithms
  ◼ Explicit programming is used to solve problems.
  ◼ Rules can be manually specified.
⚫ Machine learning
  ◼ Samples are used for training.
  ◼ The decision-making rules are complex or difficult to describe.
  ◼ Rules are automatically learned by machines.
[Figure: Training data, Machine learning, Model; New data, Model, Prediction]

Application Scenarios of Machine Learning (1)
[Figure: rule complexity (simple to complex) against scale of the problem (small to large), with regions labelled Manual rules, Machine learning algorithms, Simple problems, and Rule-based algorithms]

Application Scenarios of Machine Learning (2)
⚫ The solution to a problem is complex, or the problem may involve a large amount of data without a clear data distribution function.
⚫ Machine learning can be used in the following scenarios:

[Figure slide: Application Scenarios of Machine Learning (3)]

Rational Understanding of Machine Learning Algorithms
[Figure: Ideal: target function $f: X \rightarrow Y$. Actual: training data $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$, learning algorithms, hypothesis function $g \approx f$]
⚫ The target function f is unknown. Learning algorithms cannot obtain a perfect function f.
⚫ The hypothesis function g is assumed to approximate f, but it may differ from f.

Machine Learning Types
⚫ Supervised learning: Obtain an optimal model with the required performance by training on samples of known categories. Then use the model to map all inputs to outputs and check the outputs in order to classify unknown data.
⚫ Unsupervised learning: For unlabeled samples, the learning algorithms model the input datasets directly. Clustering is a common form of unsupervised learning. We only need to put highly similar samples together, calculate the similarity between new samples and existing ones, and classify them by similarity.
⚫ Semi-supervised learning: In one task, a machine learning model automatically uses a large amount of unlabeled data to assist the learning from a small amount of labeled data.
⚫ Reinforcement learning: An area of machine learning concerned with how agents ought to take actions in an environment to maximize some notion of cumulative reward. The difference between reinforcement learning and supervised learning is the teacher signal.
The reinforcement signal provided by the environment evaluates the action taken (as a scalar signal) rather than telling the learning system how to perform the correct action.

Main Problems Solved by Machine Learning
⚫ Machine learning can deal with many types of tasks.
  ◼ Classification: A computer program needs to specify which of k categories an input belongs to. To accomplish this task, learning algorithms usually output a function $f: \mathbb{R}^n \rightarrow \{1, 2, \dots, k\}$. For example, image classification algorithms in computer vision handle classification tasks.
  ◼ Regression: A computer program predicts a numeric output for the given input. Learning algorithms typically output a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$. Examples are predicting the claim amount of an insured person (to set the insurance premium) or predicting a security price.
  ◼ Clustering: A large amount of data from an unlabeled dataset is divided into multiple categories according to the internal similarity of the data. Data in the same category is more similar than data in different categories. This can be used in scenarios such as image retrieval and user profile management.
⚫ Classification and regression are the two main types of prediction, together accounting for 80% to 90% of prediction tasks.
⚫ The output of classification is a discrete category value, and the output of regression is a continuous value.

Supervised Learning
[Figure: samples, each with data features (Feature 1 ... Feature n) and a label (Goal), are fed to a supervised learning algorithm]

Weather   Temperature   Wind Speed   Enjoy Sports
Sunny     Warm          Strong       Yes
Rainy     Cold          Fair         No
Sunny     Cold          Weak         Yes

Supervised Learning - Regression Questions
⚫ Regression: captures how the attribute values of samples in a dataset behave. The dependency between attribute values is discovered by expressing the sample mapping as a function.
  ◼ How much will I benefit from the stock next week?
  ◼ What's the temperature on Tuesday?

Supervised Learning - Classification Questions
⚫ Classification: maps samples in a sample dataset to a specified category by using a classification model.
  ◼ Will there be a traffic jam on XX road during the morning rush hour tomorrow?
  ◼ Which method is more attractive to customers: a $5 voucher or 25% off?

Unsupervised Learning
[Figure: samples with data features only (Feature 1 ... Feature n) are fed to an unsupervised learning algorithm, which outputs their internal similarity]

Monthly Consumption   Commodity          Consumption Time
1000–2000             Badminton racket   6:00–12:00
500–1000              Basketball         18:00–24:00
1000–2000             Game console       00:00–6:00
Category: Cluster 1, Cluster 2

Unsupervised Learning - Clustering Questions
⚫ Clustering: classifies samples in a sample dataset into several categories based on the clustering model. The similarity of samples belonging to the same category is high.
  ◼ Which audiences like to watch movies of the same subject?
  ◼ Which of these components are damaged in a similar way?

Semi-Supervised Learning
[Figure: samples with data features (Feature 1 ... Feature n), of which one has a label (Goal) and the others are Unknown, are fed to semi-supervised learning algorithms]

Weather   Temperature   Wind Speed   Enjoy Sports
Sunny     Warm          Strong       Yes
Rainy     Cold          Fair         /
Sunny     Cold          Weak         /

Reinforcement Learning
⚫ Reinforcement learning uses a series of actions to maximize the reward function to learn models.
⚫ Both good and bad behaviors can help reinforcement learning in model learning.
⚫ For example, autonomous vehicles learn by continuously interacting with the environment.
[Figure: loop between Model and Environment, labelled with action $a_t$, status $s_t$, reward or punishment $r_t$, and the next values $r_{t+1}$ and $s_{t+1}$]
The model perceives the environment, takes actions, and makes adjustments and choices based on the status and the reward or punishment.
Reinforcement Learning - Best Behavior
⚫ Reinforcement learning: always looks for best behaviors. Reinforcement learning is targeted at machines or robots.
  ◼ Autopilot: Should it brake or accelerate when the yellow light starts to flash?
  ◼ Cleaning robot: Should it keep working or go back for charging?

Machine Learning Process
Data collection → Data cleansing → Feature extraction and selection → Model training → Model evaluation → Model deployment and integration
(with feedback and iteration)

Basic Machine Learning Concept - Dataset
⚫ Dataset: a collection of data used in machine learning tasks. Each data record is called a sample. Events or attributes that reflect the performance or nature of a sample in a particular aspect are called features.
⚫ Training set: a dataset used in the training process, where each sample is referred to as a training sample. The process of creating a model from data is called learning (training).
⚫ Test set: the dataset used in testing, where each sample is called a test sample. Testing is the process of using the model obtained after learning for prediction.
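The split into a training set and a test set described above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function name `train_test_split`, the `test_ratio` and `seed` parameters, and the toy samples are all assumptions made for the example.

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and return (training set, test set).

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset: each sample is (feature, label); the values are made up.
dataset = [(x, 2 * x) for x in range(10)]

train_set, test_set = train_test_split(dataset, test_ratio=0.2)
print(len(train_set), "training samples,", len(test_set), "test samples")
```

Shuffling before splitting is a common choice so that the test set is not just the tail of the data, but it is only one option; ordered data such as time series usually calls for a different split.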
Checking Data Overview
⚫ Typical dataset form

       Feature 1   Feature 2          Feature 3   Label
No.    Area        School Districts   Direction   House Price   Set
1      100         8                  South       1000          Training set
2      120         9                  Southwest   1300          Training set
3      60          6                  North       700           Training set
4      80          9                  Southeast   1100          Training set
5      95          3                  South       850           Test set
...    ...         ...                ...         ...

Importance of Data Processing
⚫ Data is crucial to models. It is the ceiling of model capabilities. Without good data, there is no good model.
⚫ Data preprocessing
  ◼ Data cleansing: Fill in missing values, and detect and eliminate causes of dataset exceptions.
  ◼ Data normalization: Normalize data to reduce noise and improve model accuracy.
  ◼ Data dimension reduction: Simplify data attributes to avoid dimension explosion.

Data Normalization
⚫ Normalization in machine learning is the process of rescaling data into the range [0, 1] (or any other range) or simply transforming data onto the unit sphere. Some machine learning algorithms benefit from normalization and standardization, particularly when Euclidean distance is used.

Workload of Data Cleansing
⚫ Statistics on data scientists' work in machine learning
  ◼ 60% Cleansing and sorting data
  ◼ 19% Collecting datasets
  ◼ 9% Mining patterns from data
  ◼ 5% Others
  ◼ 4% Optimizing models
  ◼ 3% Remodeling training datasets
Source: CrowdFlower Data Science Report 2016

Data Cleansing
⚫ Most machine learning models process features, which are usually numeric representations of input variables that can be used in the model.
⚫ In most cases, the collected data can be used by algorithms only after being preprocessed. The preprocessing operations include the following:
  ◼ Data filtering
  ◼ Processing of missing data
  ◼ Processing of possible exceptions, errors, or abnormal values
  ◼ Combination of data from multiple data sources
  ◼ Data consolidation

Dirty Data (1)
⚫ Generally, real data may have some quality problems.
  ◼ Incompleteness: contains missing values or data that lacks attributes.
  ◼ Noise: contains incorrect records or exceptions.
  ◼ Inconsistency: contains inconsistent records.

Dirty Data (2)

#    Id       Name     Birthday     Gender   IsTeacher   #Students   Country       City
1    111      John     31/12/1990   M        0           0           Ireland       Dublin
2    222      Mery     15/10/1978   F        1           15          Iceland
3    333      Alice    19/04/2000   F        0           0           Spain         Madrid
4    444      Mark     01/11/1997   M        0           0           France        Paris
5    555      Alex     15/03/2000   A        1           23          Germany       Berlin
6    555      Peter    1983-12-01   M        1           10          Italy         Rome
7    777      Calvin   05/05/1995   M        0           0           Italy         Italy
8    888      Roxane   03/08/1948   F        0           0           Portugal      Lisbon
9    999      Anne     05/09/1992   F        0           5           Switzerland   Geneva
10   101010   Paul     14/11/1992   M        1           26          Ytali         Rome

Problems marked on the slide:
  ◼ Missing value (row 2, City)
  ◼ Invalid value (row 5, Gender)
  ◼ Invalid duplicate item (rows 5 and 6, Id)
  ◼ Incorrect format (row 6, Birthday)
  ◼ Value that should be in another column (row 7, City)
  ◼ Attribute dependency (row 9, IsTeacher and #Students)
  ◼ Misspelling (row 10, Country)

Data Conversion
⚫ After being preprocessed, the data needs to be converted into a representation suitable for the machine learning model. Common data conversion forms include the following:
  ◼ For classification, category data is encoded into a corresponding numerical representation.
  ◼ Numeric data is converted to category data to reduce the number of distinct values of a variable (for example, age segmentation).
  ◼ Other data
    ◼ In text, each word is converted into a word vector through word embedding (generally using the word2vec model, the BERT model, and so on).
    ◼ Image data is processed (color space, grayscale, geometric change, Haar feature, and image enhancement).
  ◼ Feature engineering
    ◼ Normalize features to ensure the same value ranges for input variables of the same model.
    ◼ Feature expansion: Combine or convert existing variables to generate new features, such as the average.

Necessity of Feature Selection
⚫ Generally, a dataset has many features, some of which may be redundant or irrelevant to the value to be predicted.
Feature selection is necessary in the following aspects:
  ◼ Simplify models to make them easy for users to interpret.
  ◼ Reduce the training time.
  ◼ Avoid dimension explosion.
  ◼ Improve model generalization and avoid overfitting.

[Figure slides: Feature Selection Methods]

Feature Selection Methods - Filter
⚫ Filter methods are independent of the model during feature selection. By evaluating the correlation between each feature and the target attribute, these methods use a statistical measure to assign a score to each feature. Features are then sorted by score, which helps decide which features to keep or eliminate.
⚫ Procedure of a filter method: Traverse all features → Select the optimal feature subset → Train models → Evaluate the performance
⚫ Common methods
  ◼ Pearson correlation coefficient
  ◼ Chi-square coefficient
  ◼ Mutual information
⚫ Limitations: The filter method tends to select redundant variables because the relationship between features is not considered.

Feature Selection Methods - Wrapper
⚫ Wrapper methods use a prediction model to score feature subsets. They treat feature selection as a search problem in which different combinations are evaluated and compared. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy.
⚫ Procedure of a wrapper method: Traverse all features → Generate a feature subset → Train models → Evaluate models → Select the optimal feature subset
⚫ Common methods
  ◼ Recursive feature elimination (RFE)
⚫ Limitations: Wrapper methods train a new model for each subset, resulting in a huge number of computations. The feature set with the best performance is usually specific to one type of model.

Feature Selection Methods - Embedded
⚫ Embedded methods consider feature selection as a part of model construction.
The most common type of embedded feature selection method is the regularization method. Regularization methods, also called penalization methods, introduce additional constraints into the optimization of a predictive algorithm. These constraints bias the model toward lower complexity and reduce the number of features.
⚫ Procedure of an embedded method: Traverse all features → Generate a feature subset → Train models + Evaluate the effect → Select the optimal feature subset
⚫ Common methods
  ◼ Lasso regression
  ◼ Ridge regression

Overall Procedure of Building a Model
After data cleansing and feature extraction, we need to start building the model. The general procedure for building a model (supervised learning) is as follows.

Model Building Procedure
1. Data splitting: Divide data into training sets, test sets, and validation sets.
2. Model training: Use cleansed and feature-engineered data to train a model.
3. Model verification: Use validation sets to validate the model validity.
4. Model test: Use test data to evaluate the generalization capability of the model in a real environment.
5. Model deployment: Deploy the model in an actual production scenario.
6. Model fine-tuning: Continuously tune the model based on the actual data of a service scenario.

Examples of Supervised Learning - Learning Phase
⚫ Use the classification model to predict whether a person is a basketball player.
⚫ Task: Use a classification model to predict whether a person is a basketball player given specific features.
⚫ The service data (cleansed features and tags) is split into a training set and a test set. Name, City, and Age are the features (attributes), and Label is the target.

Name       City       Age   Label   Set
Mike       Miami      42    yes     Training set
Jerry      New York   32    no      Training set
Bryan      Orlando    18    no      Training set
Patricia   Miami      45    yes     Training set
Elodie     Phoenix    35    no      Training set
Remy       Chicago    72    yes     Test set
John       New York   48    yes     Test set

  ◼ Training set: The model searches for the relationship between features and targets.
  ◼ Test set: Use new data to verify the model validity.
  ◼ Model training: Each feature or a combination of several features can provide a basis for a model to make a judgment.

Examples of Supervised Learning - Prediction Phase
⚫ Unknown data: for recent (new) data, it is not known whether the people are basketball players.
⚫ Application model:
  ◼ IF city = Miami → Probability = +0.7
  ◼ IF city = Orlando → Probability = +0.2
  ◼ IF age > 42 → Probability = +0.05*age + 0.06
  ◼ IF age ≤ 42 → Probability = +0.01*age + 0.02
⚫ Possibility prediction: Apply the model to the new data to predict whether each person is a basketball player.

Name       City      Age   Label   Prediction
Marine     Miami     45    ?       0.3
Julien     Miami     52    ?       0.9
Fred       Orlando   20    ?       0.6
Michelle   Boston    34    ?       0.5
Nicolas    Phoenix   90    ?       0.4

What Is a Good Model?
Which factors are used to judge a model? The generalization capability is the most important factor. The last three are engineering factors.
  ◼ Generalization capability: Can it accurately predict the actual service data (real problem data)?
  ◼ Interpretability: Is the prediction result easy to interpret?
  ◼ Prediction speed: How long does it take to predict each piece of data?
  ◼ Practicability: Is the prediction rate still acceptable when the service volume increases with a huge data volume?

Model Validity (1)
⚫ Generalization capability: The goal of machine learning is that the model obtained after learning should perform well on new samples, not just on samples used for training. The capability of applying a model to new samples is called generalization or robustness.
⚫ Error: difference between the sample result predicted by the model obtained after learning and the actual sample result.
  ◼ Training error: error that you get when you run the model on the training data.
  ◼ Generalization error: error that you get when you run the model on new samples.
Obviously, we prefer a model with a smaller generalization error.
⚫ Underfitting: occurs when the model or the algorithm does not fit the data well enough.
⚫ Overfitting: occurs when the training error of the model obtained after learning is small but the generalization error is large (poor generalization capability).

Model Validity (2)
⚫ Model capacity: a model's capability of fitting functions, also called model complexity.
  ◼ When the capacity suits the task complexity and the amount of training data provided, the algorithm effect is usually optimal.
  ◼ Models with insufficient capacity cannot solve complex tasks, and underfitting may occur.
  ◼ A high-capacity model can solve complex tasks, but overfitting may occur if the capacity is higher than the task requires.
[Figure: three fits. Underfitting: not all features are learned. Good fitting. Overfitting: noise is learned.]

Overfitting Cause - Error
⚫ $\text{Total error of final prediction} = \text{Bias}^2 + \text{Variance} + \text{Irreducible error}$
⚫ Generally, the prediction error can be divided into two types:
  ◼ Error caused by "bias"
  ◼ Error caused by "variance"
⚫ Variance:
  ◼ Offset of the prediction result from the average value
  ◼ Error caused by the model's sensitivity to small fluctuations in the training set
⚫ Bias:
  ◼ Difference between the expected (or average) prediction value and the correct value we are trying to predict
[Figure: Variance and Bias]

Variance and Bias
⚫ Combinations of variance and bias are as follows:
  ◼ Low bias & low variance → good model
  ◼ Low bias & high variance
  ◼ High bias & low variance
  ◼ High bias & high variance → poor model
⚫ Ideally, we want a model that can accurately capture the rules in the training data and generalize to unseen (new) data. However, it is usually impossible for the model to do both at the same time.

Model Complexity and Error
⚫ As the model complexity increases, the training error decreases.
⚫ As the model complexity increases, the test error decreases to a certain point and then increases again, forming a convex curve.
[Figure: training error and testing error plotted against model complexity; the low-complexity side is labelled high bias & low variance, the high-complexity side low bias & high variance]

Machine Learning Performance Evaluation - Regression
⚫ Mean Absolute Error (MAE): the closer the MAE is to 0, the better the model fits the training data.
$MAE = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$
⚫ Mean Square Error (MSE)
$MSE = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$
⚫ The value range of $R^2$ is $(-\infty, 1]$. A larger value indicates that the model fits the training data better. TSS indicates the difference between samples. RSS indicates the difference between the predicted value and the sample value.
$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}$

Machine Learning Performance Evaluation - Classification (1)
⚫ Terms and definitions:
  ◼ P: positive, the number of real positive cases in the data.
  ◼ N: negative, the number of real negative cases in the data.
  ◼ TP: true positive, the number of positive cases that are correctly classified by the classifier.
  ◼ TN: true negative, the number of negative cases that are correctly classified by the classifier.
  ◼ FP: false positive, the number of negative cases that the classifier incorrectly labels as positive.
  ◼ FN: false negative, the number of positive cases that the classifier incorrectly labels as negative.

Confusion matrix
              Estimated yes   Estimated no   Total
Actual yes    TP              FN             P
Actual no     FP              TN             N
Total         P'              N'             P + N

⚫ Confusion matrix: at least an $m \times m$ table. Entry $CM_{i,j}$ of the first $m$ rows and $m$ columns indicates the number of cases that actually belong to class $i$ but are classified into class $j$ by the classifier.
  ◼ Ideally, for a high-accuracy classifier, most prediction values should be located on the diagonal from $CM_{1,1}$ to $CM_{m,m}$ of the table, while values outside the diagonal are 0 or close to 0.
That is, FP and FN are close to 0.

Machine Learning Performance Evaluation - Classification (2)

Measurement                                               Ratio
Accuracy, recognition rate                                $\frac{TP + TN}{P + N}$
Error rate, misclassification rate                        $\frac{FP + FN}{P + N}$
Sensitivity, true positive rate, recall                   $\frac{TP}{P}$
Specificity, true negative rate                           $\frac{TN}{N}$
Precision                                                 $\frac{TP}{TP + FP}$
$F_1$, harmonic mean of recall and precision              $\frac{2 \times precision \times recall}{precision + recall}$
$F_\beta$, where $\beta$ is a non-negative real number    $\frac{(1 + \beta^2) \times precision \times recall}{\beta^2 \times precision + recall}$

Example of Machine Learning Performance Evaluation
⚫ We have trained a machine learning model to identify whether the object in an image is a cat. Now we use 200 images to verify the model performance. Among the 200 images, the objects in 170 images are cats, while the others are not. The model identifies the objects in 160 images as cats and the others as not cats.

              Estimated yes   Estimated no   Total
Actual yes    140             30             170
Actual no     20              10             30
Total         160             40             200

  ◼ Precision: $P = \frac{TP}{TP + FP} = \frac{140}{140 + 20} = 87.5\%$
  ◼ Recall: $R = \frac{TP}{P} = \frac{140}{170} \approx 82.4\%$
  ◼ Accuracy: $ACC = \frac{TP + TN}{P + N} = \frac{140 + 10}{170 + 30} = 75\%$

Machine Learning Training Method - Gradient Descent (1)
[Figure: Cost surface]
⚫ The gradient descent method uses the negative gradient direction at the current position as the search direction, which is the direction of steepest descent. The formula is as follows:
$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$
⚫ In the formula, $\eta$ indicates the learning rate and $i$ indicates the number of the data record. The weight parameter $w$ is updated in each iteration.
⚫ Convergence: The value of the objective function changes very little, or the maximum number of iterations is reached.

Machine Learning Training Method - Gradient Descent (1)
[Figure: Cost surface]
⚫ If the second-order derivative is equal to 0, the curvature is said to be linear.
⚫ If the second-order derivative is greater than 0, the curvature is said to be moving upward.
⚫ If the second-order derivative is less than 0, the curvature is said to be moving downward.
The update step is repeated until the cost function converges. By convergence we mean that the gradient of the cost function is equal to 0.
$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$

Machine Learning Training Method - Gradient Descent (2)
⚫ Batch Gradient Descent (BGD) uses all $m$ samples in the dataset to update the weight parameter based on the gradient value at the current point.
$w_{k+1} = w_k - \eta \frac{1}{m} \sum_{i=1}^{m} \nabla f_{w_k}(x^i)$
⚫ Stochastic Gradient Descent (SGD) randomly selects one sample in the dataset to update the weight parameter based on the gradient value at the current point.
$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$
⚫ Mini-Batch Gradient Descent (MBGD) combines the features of BGD and SGD and uses the gradients of $n$ samples in the dataset to update the weight parameter.
$w_{k+1} = w_k - \eta \frac{1}{n} \sum_{i=t}^{t+n-1} \nabla f_{w_k}(x^i)$

Machine Learning Training Method - Gradient Descent (3)
⚫ Comparison of three gradient descent methods
  ◼ In SGD, the samples selected for each training step are random. This makes the loss function unstable and can even cause movement in the reverse direction when the loss function approaches its lowest point.
  ◼ BGD has the highest stability but consumes too many computing resources.
  ◼ MBGD is a method that balances SGD and BGD.

BGD    Uses all training samples for training each time.
SGD    Uses one training sample for training each time.
MBGD   Uses a certain number of training samples for training each time.
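The three update rules differ only in how many samples contribute to each gradient, which a single routine with a batch-size argument can illustrate. The sketch below is an assumption-laden example rather than the course's implementation: it uses a squared-error loss on synthetic, noise-free data, NumPy, and hypothetical names (`gradient`, `descend`, `batch_size`). It also shuffles the data once per epoch and walks through it in batches, which is a common practical variant of "randomly selecting" samples.

```python
import numpy as np


def gradient(w, X, y):
    """Gradient of the loss 1/(2n) * sum((Xw - y)^2) over the n samples passed in."""
    return X.T @ (X @ w - y) / len(y)


def descend(X, y, batch_size, eta=0.1, epochs=200, seed=0):
    """Gradient descent with `batch_size` samples per update.

    batch_size == len(y) behaves like BGD, batch_size == 1 like SGD,
    and anything in between like MBGD.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(epochs):
        order = rng.permutation(m)  # visit the samples in a new random order
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            w = w - eta * gradient(w, X[idx], y[idx])  # w_{k+1} = w_k - eta * gradient
    return w


# Synthetic, noise-free data generated from y = 2*x1 - 3*x2 (made up for the example).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(100, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

w_bgd = descend(X, y, batch_size=len(y))
w_sgd = descend(X, y, batch_size=1)
w_mbgd = descend(X, y, batch_size=10)
print("BGD: ", w_bgd)
print("SGD: ", w_sgd)
print("MBGD:", w_mbgd)
```

Because the data here contains no noise, all three variants should approach the same weights; on noisy data the SGD estimate would typically keep fluctuating around the minimum, which is the instability described in the comparison.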
Parameters and Hyperparameters in Models
⚫ The model contains not only parameters but also hyperparameters. The purpose of hyperparameters is to enable the model to learn the optimal parameters.
  ◼ Parameters are automatically learned by models.
  ◼ Hyperparameters are manually set.
[Figure: Training produces the Model. Model parameters are "distilled" from data. Use hyperparameters to control training.]

Hyperparameters of a Model
⚫ Model hyperparameters are external configurations of models.
  ◼ Often used in model parameter estimation processes.
  ◼ Often specified by the practitioner.
  ◼ Can often be set using heuristics.
  ◼ Often tuned for a given predictive modeling problem.
⚫ Common model hyperparameters
  ◼ $\lambda$ during Lasso/Ridge regression
  ◼ Learning rate for training a neural network, number of iterations, batch size, activation function, and number of neurons
  ◼ $C$ and $\sigma$ in support vector machines (SVM)
  ◼ K in k-nearest neighbor (KNN)
  ◼ Number of trees in a random forest

Hyperparameter Search Procedure and Method
⚫ Procedure for searching hyperparameters
  1. Divide a dataset into a training set, validation set, and test set.
  2. Optimize the model parameters using the training set based on the model performance indicators.
  3. Search for the model hyperparameters using the validation set based on the model performance indicators.
  4. Perform step 2 and step 3 alternately. Finally, determine the model parameters and hyperparameters and assess the model using the test set.
⚫ Search algorithm (step 3)
  ◼ Grid search
  ◼ Random search
  ◼ Heuristic intelligent search
  ◼ Bayesian search

Hyperparameter Searching Method - Grid Search
[Figure: Grid search, with axes Hyperparameter 1 and Hyperparameter 2]
⚫ Grid search attempts to exhaustively search all possible hyperparameter combinations, which form a grid of hyperparameter values.
⚫ In practice, the range of hyperparameter values to search is specified manually.
⚫ Grid search is an expensive and time-consuming method.
2 This method works well when the number of hyperparameters 1 is relatively small. Therefore, it is applicable to generally machine learning algorithms but inapplicable to neural networks 0 1 2 3 4 5 (see the deep learning part). Hyperparameter 2 71 Huawei Confidential Hyperparameter Searching Method - Random Search ⚫ When the hyperparameter search space is large, random search is better than grid search. Random search ⚫ In random search, each setting is sampled from the distribution of possible parameter values, in an attempt to find the best subset of hyperparameters. Parameter 1 ⚫ Note: Search is performed within a coarse range, which then will be narrowed based on where the best result appears. Some hyperparameters are more important than others, and the Parameter 2 search deviation will be affected during random search. 72 Huawei Confidential Cross Validation (1) ⚫ Cross validation: It is a statistical analysis method used to validate the performance of a classifier. The basic idea is to divide the original dataset into two parts: training set and validation set. Train the classifier using the training set and test the model using the validation set to check the classifier performance. ⚫ k-fold cross validation (𝐊 − 𝐂𝐕): Divide the raw data into 𝑘 groups (generally, evenly divided). Use each subset as a validation set, and use the other 𝑘 − 1 subsets as the training set. A total of 𝑘 models can be obtained. Use the mean classification accuracy of the final validation sets of 𝑘 models as the performance indicator of the 𝐾 − 𝐶𝑉 classifier. 73 Huawei Confidential Cross Validation (2) Entire dataset Training set Test set Training set Validation set Test set ⚫ Note: The K value in K-fold cross validation is also a hyperparameter. 74 Huawei Confidential Cross Validation (2) 75 Huawei Confidential Cross Validation (2) 76 Huawei Confidential Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. 
Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case Study

Machine Learning Algorithm Overview
⚫ Supervised learning
◼ Classification: logistic regression, SVM, neural network, decision tree, random forest, GBDT, KNN, naive Bayes
◼ Regression: linear regression, SVM, neural network, decision tree, random forest, GBDT, KNN
⚫ Unsupervised learning
◼ Clustering: K-means, hierarchical clustering, density-based clustering
◼ Others: association rules, principal component analysis (PCA), Gaussian mixture model (GMM)
⚫ Abbreviations: SVM = support vector machine; GBDT = gradient boosting decision tree; KNN = k-nearest neighbor.

Linear Regression (1)
⚫ Linear regression: a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more variables.
⚫ Linear regression is a type of supervised learning.
(Figures: unary linear regression; multi-dimensional linear regression.)

Linear Regression (2)
⚫ The model function of linear regression is as follows, where $w$ is the weight parameter, $b$ is the bias, and $x$ is the sample attribute.
$$h_w(x) = w^T x + b$$
⚫ The relationship between the value predicted by the model and the actual value is as follows, where $y$ is the actual value and $\varepsilon$ is the error.
$$y = w^T x + b + \varepsilon$$
⚫ The error $\varepsilon$ is the combined effect of many independent factors, so by the central limit theorem it follows a normal distribution. From the normal distribution function and maximum likelihood estimation, the loss function of linear regression is as follows, where the sum runs over the $m$ training samples:
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2$$
⚫ To make the predicted value close to the actual value, we need to minimize the loss value. We can use the gradient descent method to calculate the weight parameter $w$ at which the loss function reaches its minimum, and then complete model building.
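The step from the normal-error assumption to the squared loss can be spelled out as follows. This is only a sketch under assumptions that the slides do not state: the $m$ samples are independent, the errors share one fixed variance $\sigma^2$, and the superscript $(i)$ (introduced here for illustration) indexes the samples.

```latex
\begin{aligned}
p\left(y^{(i)} \mid x^{(i)}; w\right)
  &= \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\left(-\frac{\left(y^{(i)} - h_w(x^{(i)})\right)^2}{2\sigma^2}\right) \\
\ln L(w)
  &= \sum_{i=1}^{m} \ln p\left(y^{(i)} \mid x^{(i)}; w\right)
   = -m \ln\left(\sqrt{2\pi}\,\sigma\right)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right)^2 \\
\arg\max_{w} \ln L(w)
  &= \arg\min_{w} \frac{1}{2m} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right)^2
   = \arg\min_{w} J(w)
\end{aligned}
```

The first term of the log-likelihood does not depend on $w$, and rescaling the second term by a positive constant does not move its minimizer, so maximizing the likelihood is the same as minimizing $J(w)$. The factor $\frac{1}{2m}$ is a convention that keeps the gradient tidy.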
Linear Regression Extension - Polynomial Regression
⚫ Polynomial regression is an extension of linear regression. A dataset is often too complex to be fitted by a straight line, so obvious underfitting occurs if the original linear regression model is used. The solution is to use polynomial regression.
$$h_w(x) = w_1 x + w_2 x^2 + \dots + w_n x^n + b$$
⚫ Here, the highest power $n$ is the degree of the polynomial regression.
⚫ Polynomial regression still counts as linear regression because the model remains linear in its weight parameters $w$; the nonlinearity appears only in the features.
(Figure: comparison between linear regression and polynomial regression.)

Linear Regression and Overfitting Prevention
⚫ Regularization terms can be used to reduce overfitting. The value of $w$ cannot be too large or too small in the sample space. You can add a sum-of-squares penalty to the target function.
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_2^2$$
⚫ Regularization terms (norm): The regularization term here is called the L2-norm. Linear regression that uses this loss function is also called Ridge regression.
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_1$$
⚫ Linear regression with the absolute-value (L1-norm) penalty is called Lasso regression.

Logistic Regression (1)
⚫ Logistic regression: The logistic regression model is used to solve classification problems. The model is defined as follows:
$$P(Y = 1 \mid x) = \frac{e^{wx + b}}{1 + e^{wx + b}}$$
$$P(Y = 0 \mid x) = \frac{1}{1 + e^{wx + b}}$$
where $w$ is the weight, $b$ is the bias, and $wx + b$ is regarded as a linear function of $x$. Compare the two probability values. The class with the higher probability is the class of $x$.

Logistic Regression (2)
⚫ Both the logistic regression model and linear regression model are generalized linear models.
Logistic regression adds a nonlinear factor (the sigmoid function) on top of linear regression and sets a threshold, so it can handle binary classification problems.
⚫ From the model function of logistic regression, maximum likelihood estimation gives the following loss function:
$$J(w) = -\frac{1}{m} \sum \left( y \ln h_w(x) + (1 - y) \ln\left(1 - h_w(x)\right) \right)$$
⚫ where $w$ is the weight parameter, $m$ is the number of samples, $x$ is the sample, and $y$ is the real value. The values of all the weight parameters $w$ can also be obtained through the gradient descent algorithm.

Logistic Regression Extension - Softmax Function (1)
⚫ Logistic regression applies only to binary classification problems. For multi-class classification problems, use the Softmax function.
(Figure: a binary classification problem asks "Male? Female?"; a multi-class classification problem asks "Grape? Orange? Apple? Banana?")

Logistic Regression Extension - Softmax Function (2)
⚫ Softmax regression is a generalization of logistic regression that we can use for K-class classification.
⚫ The Softmax function maps a K-dimensional vector of arbitrary real values to another K-dimensional vector of real values, where each vector element is in the interval (0, 1).
⚫ The regression probability function of Softmax is as follows:
$$p(y = k \mid x; w) = \frac{e^{w_k^T x}}{\sum_{l=1}^{K} e^{w_l^T x}}, \quad k = 1, 2, \dots, K$$

Logistic Regression Extension - Softmax Function (3)
⚫ Softmax assigns a probability to each class in a multi-class problem. These probabilities must add up to 1.
◼ For each sample, Softmax gives the probability that it belongs to each particular class.
⚫ Example:
Category    Probability
Grape?      0.09
Orange?     0.22
Apple?      0.68
Banana?     0.01
Sum of all probabilities: 0.09 + 0.22 + 0.68 + 0.01 = 1. Most probably, this picture is an apple.

Decision Tree
⚫ A decision tree is a tree structure (a binary tree or a non-binary tree).
Each non-leaf node represents a test on a feature attribute. Each branch represents the output of a feature attribute in a certain value range, and each leaf node stores a category. To use the decision tree, start from the root node, test the feature attributes of the item to be classified, follow the matching output branches, and use the category stored on the leaf node as the final result.
(Figure: example decision tree.)
Root
  Short
    Cannot squeak: might be a squirrel
    Can squeak: might be a rat
  Tall
    Long neck: might be a giraffe
    Short neck
      Long nose: might be an elephant
      Short nose
        On land: might be a rhinoceros
        In water: might be a hippo

Decision Tree Structure
(Figure: a root node branches into internal nodes, which branch into further internal nodes and leaf nodes.)

Key Points of Decision Tree Construction
⚫ To create a decision tree, we need to select attributes and determine the tree structure between feature attributes. The key step of constructing a decision tree is to split the data on each feature attribute, compare the resulting subsets in terms of 'purity', and select the attribute that gives the highest 'purity' as the split point for dataset division.
⚫ The metrics that quantify 'purity' include the information entropy and the Gini index. The formulas are as follows:
$$H(X) = -\sum_{k=1}^{K} p_k \log_2(p_k) \qquad \mathrm{Gini} = 1 - \sum_{k=1}^{K} p_k^2$$
⚫ where $p_k$ is the probability that the sample belongs to class $k$ (there are $K$ classes in total). A greater difference between the purity before a split and the purity after it indicates a better split.
⚫ Common decision tree algorithms include ID3, C4.5, and CART.

Decision Tree Construction Process
⚫ Feature selection: Select a feature from the features of the training data as the split standard of the current node. (Different standards generate different decision tree algorithms.)
⚫ Decision tree generation: Generate internal nodes from the top down based on the selected features, and stop when the dataset can no longer be split.
⚫ Pruning: A decision tree easily overfits unless the necessary pruning (pre-pruning and post-pruning) is performed to reduce the tree size and optimize its node structure.

Decision Tree Example
⚫ The following table and figure show a classification made with a decision tree. The classification result (Cheat) is affected by three attributes: Refund, Marital Status, and Taxable Income.
Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125,000          No
2     No       Married          100,000          No
3     No       Single           70,000           No
4     Yes      Married          120,000          No
5     No       Divorced         95,000           Yes
6     No       Married          60,000           No
7     Yes      Divorced         220,000          No
8     No       Single           85,000           Yes
9     No       Married          75,000           No
10    No       Single           90,000           Yes
(Figure: a decision tree with test nodes Refund, Marital Status, and Taxable Income, and leaf nodes labeled No or Yes.)

SVM
⚫ SVM is a binary classification model. Its basic form is a linear classifier with the largest margin in the feature space. SVMs can also use kernel tricks, which make them nonlinear classifiers. The SVM learning algorithm finds the optimal solution of a convex quadratic programming problem.
(Figure: data plotted by height and weight is projected from a low-dimensional space, where segmentation is complex, into a high-dimensional space, where segmentation is easy.)

Linear SVM (1)
⚫ How do we split the red and blue datasets with a straight line?
(Figure: a two-dimensional dataset and two alternative straight lines for binary classification.)
Both the left and right methods can be used to divide the datasets. Which of them is correct?

Linear SVM (2)
⚫ Straight lines are used to divide data into different classes. In fact, many different straight lines can divide the data. The core idea of the SVM is to find the straight line that keeps the points closest to it as far away from it as possible. This gives the model strong generalization capability. These closest points are called support vectors.
⚫ In two-dimensional space, we use straight lines for segmentation. In high-dimensional space, we use hyperplanes for segmentation.
(Figure: the distance between the support vectors is as large as possible.)

Nonlinear SVM (1)
⚫ How do we classify a dataset that is not linearly separable?
(Figure: a linear SVM works well for linearly separable datasets; nonlinear datasets cannot be split with straight lines.)

Nonlinear SVM (2)
⚫ Kernel functions are used to construct nonlinear SVMs.
⚫ Kernel functions allow algorithms to fit the maximum-margin hyperplane in a transformed high-dimensional feature space.
⚫ Common kernel functions:
◼ Linear kernel function
◼ Polynomial kernel function
◼ Gaussian kernel function
◼ Sigmoid kernel function
(Figure: mapping from the input space to a high-dimensional feature space.)

KNN Algorithm (1)
⚫ The KNN classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. According to this method, if the majority of the k samples most similar to a given sample (its nearest neighbors in the feature space) belong to a specific category, the sample also belongs to this category.
(Figure: the target category of point ? varies with the number of nearest neighbors considered.)

KNN Algorithm (2)
⚫ Because the prediction result is determined by the number and weights of neighbors in the training set, the KNN algorithm has simple logic.
⚫ KNN is a non-parametric method that is usually used on datasets with irregular decision boundaries.
◼ The KNN algorithm generally uses majority voting for classification prediction and the average value for regression prediction.
⚫ KNN requires a huge number of computations.

KNN Algorithm (3)
⚫ Generally, a larger k value reduces the impact of noise on classification, but blurs the boundary between classes.
◼ A larger k value means a higher probability of underfitting because the segmentation is too rough.
A smaller k value means a higher probability of overfitting because the segmentation is too fine.
(Figure: the boundary becomes smoother as the value of k increases. As k increases to infinity, all data points eventually become all blue or all red.)

Naive Bayes (1)
⚫ Naive Bayes algorithm: a simple multi-class classification algorithm based on the Bayes theorem.
⚫ Class conditional independence: The Bayes classifier assumes that the effect of an attribute value on a given class is independent of the values of other attributes. This assumption simplifies the calculation, and it is in this sense that the classifier is "naive".
⚫ The Bayes classifier, featuring high accuracy and speed, can be applied to large databases.

Naive Bayes (1)
⚫ It assumes that features are independent of each other. For given sample features $X_1, \dots, X_n$, the probability that a sample belongs to a category $C_k$ is:
$$P(C_k \mid X_1, \dots, X_n) = \frac{P(X_1, \dots, X_n \mid C_k)\, P(C_k)}{P(X_1, \dots, X_n)}$$
◼ $X_1, \dots, X_n$ are data features, which are usually described by measurement values of m attribute sets. For example, the color feature may have three attributes: red, yellow, and blue.
◼ $C_k$ indicates that the data belongs to a specific category $C$.
◼ $P(C_k \mid X_1, \dots, X_n)$ is the posterior probability of $C_k$ given the features $X_1, \dots, X_n$.
◼ $P(C_k)$ is the prior probability, which is independent of $X_1, \dots, X_n$.
◼ $P(X_1, \dots, X_n)$ is the prior probability of the features $X$.

Naive Bayes (2)
⚫ Independence assumption of features.
◼ For example, if a fruit is red, round, and about 10 cm (3.94 in.) in diameter, it can be considered an apple.
◼ A Naive Bayes classifier considers that each feature contributes independently to the probability that the fruit is an apple, regardless of any possible correlation between the color, roundness, and diameter.

Ensemble Learning
⚫ Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to solve the same problem.
When multiple learners are used, the combined generalization capability can be much stronger than that of a single learner.
⚫ If you ask thousands of random people a complex question and then summarize their answers, the summarized answer is better than an expert's answer in most cases. This is the wisdom of the crowd.
(Figure: the training set is split into Dataset 1, Dataset 2, ..., Dataset m; each trains Model 1, Model 2, ..., Model m; model synthesis combines them into one large model.)

Classification of Ensemble Learning
⚫ Bagging (Random Forest)
◼ Independently builds several base learners and then averages their predictions.
◼ On average, the composite learner is usually better than a single base learner because of its smaller variance.
⚫ Boosting (AdaBoost, GBDT, and XGBoost)
◼ Constructs base learners in sequence to gradually reduce the bias of the composite learner.
◼ The composite learner can fit the data well, which may also cause overfitting.

Ensemble Methods in Machine Learning (1)
⚫ Random forest = Bagging + CART decision tree
⚫ Random forests build multiple decision trees and merge them to make predictions more accurate and stable.
◼ Random forests can be used for classification and regression problems.
(Figure: bootstrap sampling draws Data subset 1, Data subset 2, ..., Data subset n from all training data; a decision tree is built on each subset and produces Prediction 1, Prediction 2, ..., Prediction n; the predictions are aggregated into the final prediction by majority voting for classification or by averaging for regression.)

Ensemble Methods in Machine Learning (2)
⚫ GBDT is a type of boosting algorithm.
⚫ In this additive model, the predicted value equals the sum of the results of all the base learners. In essence, each new base learner fits the residual of the error function with respect to the current predicted value. (The residual is the error between the predicted value and the actual value.)
⚫ During model training, GBDT requires that the sample loss for model prediction be as small as possible.
(Figure: for an actual value of 30 years old, the first learner predicts 20 years old, leaving a residual of 10 years old; the second learner predicts 9 years old, leaving a residual of 1 year old; the third learner predicts 1 year old.)

Unsupervised Learning - K-means
⚫ K-means clustering aims to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean. The mean serves as the prototype of the cluster.
⚫ For the k-means algorithm, specify the final number of clusters (k). Then, divide n data objects into k clusters. The clusters obtained meet the following conditions: (1) Objects in the same cluster are highly similar. (2) Objects in different clusters have low similarity.
(Figure: K-means clustering on axes x1 and x2. The data is not tagged. K-means clustering can automatically classify datasets.)

Unsupervised Learning - Hierarchical Clustering
⚫ Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering structure. The dataset division may use a "bottom-up" aggregation policy or a "top-down" splitting policy. The hierarchy of clustering is represented as a tree graph. The root is the single cluster that contains all samples, and each leaf is a cluster that contains only one sample.

Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case Study

Comprehensive Case
⚫ Assume that there is a dataset containing the house areas and prices of 21,613 housing units sold in a city. Based on this data, we can predict the prices of other houses in the city.
Dataset:
House Area   Price
1,180        221,900
2,570        538,000
770          180,000
1,960        604,000
1,680        510,000
5,420        1,225,000
1,715        257,500
1,060        291,850
1,160        468,000
1,430        310,000
1,370        400,000
1,810        530,000
…            …

Problem Analysis
⚫ This case contains a large amount of data, including the input x (house area) and the output y (price), which is a continuous value. We can therefore use regression, a form of supervised learning. Draw a scatter chart based on the data and use linear regression.
⚫ Our goal is to build a model function h(x) that approximates, as closely as possible, the function that expresses the true distribution of the dataset.
⚫ Then, use the model to predict unknown price data.
(Figure: the dataset is fed to a learning algorithm that produces h(x). The input x is the feature, house area; the output y is the label, price. A scatter chart plots price against house area.)
Unary linear regression function:
$$h(x) = w_0 + w_1 x$$

Goal of Linear Regression
⚫ Linear regression aims to find a straight line that best fits the dataset.
⚫ Linear regression is a parameter-based model. Here, the parameters to learn are $w_0$ and $w_1$. Once these two parameters are found, we have the best model.
(Figure: several candidate lines $h(x) = w_0 + w_1 x$ on plots of price against house area. Which line has the best parameters?)

Loss Function of Linear Regression
⚫ To find the optimal parameters, construct a loss function and find the parameter values at which the loss function reaches its minimum.
Loss function of linear regression:
$$J(w) = \frac{1}{2m} \sum \left( h(x) - y \right)^2$$
Goal:
$$\arg\min_{w} J(w) = \arg\min_{w} \frac{1}{2m} \sum \left( h(x) - y \right)^2$$
where $m$ is the number of samples, $h(x)$ is the predicted value, and $y$ is the actual value.
(Figure: the error is the vertical distance between each data point and the regression line on a plot of price against house area.)

Gradient Descent Method
⚫ The gradient descent algorithm finds the minimum value of a function through iteration.
⚫ It starts from a randomly initialized point on the loss function and then follows the negative gradient direction to find the global minimum of the loss function. The parameter values at that point are the optimal parameter values.
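The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the function names `gradient_descent` and `loss`, the learning rate `lr`, the step count `steps`, and the `seed` are all assumptions chosen for this example. It uses only the twelve rows shown in the dataset table, rescaled (area in thousands, price in hundred-thousands) so that one fixed learning rate converges; the rescaling is also an assumption.

```python
import random


def loss(xs, ys, w0, w1):
    """Loss J(w) = 1/(2m) * sum((h(x) - y)^2) with h(x) = w0 + w1 * x."""
    m = len(xs)
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, lr=0.05, steps=5000, seed=0):
    """Fit h(x) = w0 + w1 * x by repeatedly stepping against the gradient of J."""
    rng = random.Random(seed)
    # Random initial position of (w0, w1); the range is an arbitrary choice.
    w0, w1 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    m = len(xs)
    for _ in range(steps):
        errors = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to w0 and w1.
        grad_w0 = sum(errors) / m
        grad_w1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Move in the negative gradient direction.
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1


# The twelve (house area, price) rows listed in the dataset table.
areas = [1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1160, 1430, 1370, 1810]
prices = [221900, 538000, 180000, 604000, 510000, 1225000,
          257500, 291850, 468000, 310000, 400000, 530000]

# Rescaling is an assumption made here for numerical stability only.
xs = [a / 1000 for a in areas]
ys = [p / 100000 for p in prices]

w0, w1 = gradient_descent(xs, ys)
print(f"h(x) = {w0:.3f} + {w1:.3f} * x (scaled units), "
      f"loss = {loss(xs, ys, w0, w1):.4f}")
```

Because only twelve sample rows are used, the fitted line need not match a model trained on the full 21,613-row dataset. With unscaled inputs the same loop would need a much smaller learning rate, which is one reason feature scaling is common before gradient descent.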
(Figure: cost surface.)
◼ Point A: the position of $w_0$ and $w_1$ after random initialization. $w_0$ and $w_1$ are the required parameters.
◼ A-B connection line: a path formed by descents in the negative gradient direction. Upon each descent, the values of $w_0$ and $w_1$ change, and the regression line also changes.
◼ Point B: the global minimum of the loss function. The final values of $w_0$ and $w_1$ are found here.

Iteration Example
⚫ The following is an example of a gradient descent iteration. As the red points on the loss function surface gradually approach the lowest point, the red linear regression line fits the data better and better. At the lowest point, we obtain the best parameters.

Model Debugging and Application
⚫ After the model is trained, test it with the test set to ensure the generalization capability.
⚫ If overfitting occurs, use Lasso regression or Ridge regression with regularization terms and tune the hyperparameters.
⚫ If underfitting occurs, use a more complex regression model, such as GBDT.
⚫ Note: For real data, pay attention to the roles of data cleansing and feature engineering.
The final model result is as follows:
$$h(x) = 280.62x - 43581$$
(Figure: the fitted line on a plot of price against house area.)

Summary
⚫ First, this course describes the definition and classification of machine learning, as well as the problems machine learning solves. Then, it introduces key knowledge points of machine learning, including the overall procedure (data collection, data cleansing, feature extraction, model training and evaluation, and model deployment), common algorithms (linear regression, logistic regression, decision tree, SVM, naive Bayes, KNN, ensemble learning, K-means, etc.), the gradient descent algorithm, and parameters and hyperparameters.
⚫ Finally, a complete machine learning process is presented through a case that uses linear regression to predict house prices.

Quiz 1.
(True or false) Gradient descent iteration is the only method of machine learning algorithms. ( )
A. True
B. False
2. (Single-answer question) Which of the following algorithms is not a supervised learning algorithm? ( )
A. Linear regression
B. Decision tree
C. KNN
D. K-means

Recommendations
⚫ Online learning website
◼ https://e.huawei.com/en/talent/#/
⚫ Huawei Knowledge Base
◼ https://support.huawei.com/enterprise/en/knowledge?lang=en

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.
Copyright © 2020 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.