T1 Introduction to Machine Learning PDF
Universidade de Coimbra
Jorge Henriques
Summary
This document is an introduction to machine learning and covers various aspects. It details the machine learning processing steps, examples, and the evolution of the field. The document also touches upon the differences between hard and soft computing.
Full Transcript
AC – Aprendizagem Computacional / Machine Learning
T1 – Introduction to Machine Learning
Jorge Henriques [email protected]
Departamento de Engenharia Informática, Faculdade de Ciências e Tecnologia
https://in.pinterest.com/pin/772719248567487299/

▪ Note
▪ These slides summarize the subjects covered in the classes.
▪ They do not present or detail all the necessary information.
▪ They should not be taken as the only study material.
JH | AC | T1. Introduction

▪ Note
▪ These slides were prepared based on the ones presented by Prof. Antonio Dourado (2023/24).
▪ Thank you to Prof. Dourado for permitting their use.

Contents
▪ 1| Introduction
▪ 2| Machine learning processing steps
▪ 3| Example
▪ 4| Learning process
▪ 5| Hard and soft computing
▪ 6| Machine Learning evolution
▪ 7| Conclusion
▪ Bibliography

1| Introduction
▪ Machine learning
▪ “Machine Learning (ML) is a branch of AI/computer science that aims to develop algorithms capable of learning and improving from experience, without being explicitly programmed for each task.” *
▪ “These algorithms are designed to analyse and interpret data, identifying patterns, making predictions, making decisions, based on the information they have been trained on.” **
* Q. Bi, K. E. Goodman, J. Kaminsky, and J. Lessler, “What is machine learning? a primer for the epidemiologist,” American Journal of Epidemiology, vol. 188, 2019
** K. P. Murphy, Machine learning: a probabilistic perspective (adaptive computation and machine learning series), vol. 621485037. MIT Press, 2012.
▪ AI (Artificial Intelligence) versus ML (Machine Learning) versus DL (Deep Learning)
  o Artificial Intelligence: techniques (symbolic and data-based) that enable machines to mimic human behaviour
  o Machine Learning: subset of AI techniques, based on data, that enable machines to improve with experience
  o Deep Learning: subset of ML techniques that make multi-layer neural networks feasible
https://www.bindt.org/admin/Downloads/Niels%20Jeppesen%20Presentation%20NDT%20Workshop%20Blyth.pdf

▪ Machine learning: roots in artificial intelligence
  o Automatic methods capable of learning from data
  o Focused on improving the performance of learning methods
  o Able to deal with large volumes of heterogeneous, incremental data (online, adaptive)
▪ Machine learning models/techniques
  o Clustering
  o Neural networks
  o Decision trees
  o Fuzzy systems
  o Deep learning methods
  o ...

▪ Statistics versus machine learning
Aspect | Statistics | Machine learning
Goal | Inference from a sample set | Efficient algorithms to solve optimization problems
Roots | Mathematics | Artificial intelligence
Approach | More theory-based | More heuristic
Focus | Testing hypotheses (confidence intervals, validation methods, …) | Improving the performance of learning (sensitivity, specificity, accuracy)
Origin of the data | Primary | Secondary
Volume of data | Low | High
Presence of noise/artifacts | None | Frequent

▪ ML: extraction of patterns and knowledge from data
▪ Which patterns, which knowledge? Descriptive or predictive.
▪ Descriptive
  o Explanatory: find patterns that describe the data
  o Unsupervised (the problem is formulated without knowledge of the target)
  o Typical methods: clustering, association rules
  1. Clustering
    o Group similar data together into clusters
    o Ex. patients with similar characteristics (stratification of patients)
▪ Predictive
  o Use some variables to predict other variables or future values
  o Supervised (the problem is formulated knowing the target)
  o Typical methods: classification, prediction
  2. Classification
    o Map data into predefined groups or known classes
    o Ex. classification of arrhythmias; diagnosing the occurrence of a disease/event
  3. Regression
    o Predict the value of a (continuous) variable based on the values of other variables
    o Ex. cardiovascular (CV) risk assessment based on risk factors (hypertension, smoking, diabetes, obesity → CVD risk)

▪ Static / temporal data
▪ Static data: each set of values does not depend on the others
▪ Temporal data: sequences of data
  o Gene sequence
  o Time series (data ordered with respect to time)
  4. Similarity search (indexing)
    o Retrieve the sequences that are similar to a pattern (an indexing problem)
    o Involves the definition of time series similarity
    o Ex. similarity between disease progressions
  5. Predict future values
    o Forecast a specific value for future data
    o Early detection of a specific event
    o Trends: long trends, cyclic trends or seasonal trends
  6. Segmentation
    o Ex. ECG segmentation: identification/delineation of the main components (peaks, main waves and intervals) in an ECG (electrocardiogram, the electrical activity of the heart)
[Figure: example time series, value versus time]

▪ Typical problem
  o Inputs (X): variables/attributes/features/characteristics used as inputs for the model
  o Outputs (T): labels/target, the desired value
  o The task: solved by means of clustering, classification, prediction, ...

▪ Classification of fruits
  o Based on the characteristics of a fruit, develop a system for automatic classification.
  o Multi-class (3 classes)
  Inputs | X = {size, color, weight, form} | Characteristics
  Output | T = {apple, pear, orange} | One among three fruits (classes)

▪ Fraud detection: VISA card transaction
  o Based on the information of the transaction, determine whether a transaction is fraudulent or not.
  o Binary (two classes)
  Inputs | X = {type of transaction, location, type of purchase, value, customer's profile, ...}
  Output | T = fraud {no, yes} = {0, 1}

▪ Cardiac risk
  o Based on a set of characteristics of an individual (risk factors), develop an automated system for the classification of cardiovascular risk.
  Inputs | X = {age, gender, heart rate, blood pressure, weight, cholesterol}
  Output | T = {cardiac risk} = {low risk, intermediate risk, high risk}
▪ Profit (regression)
  o A credit institution wants to calculate the amount of daily profit based on data from its daily operations.
  Inputs | The data that characterize each transaction (type of transaction, value, profile of the customers, ...)
  Output | Daily profit

▪ Euromilhões (prediction)
  o Based on the history of previous results, predict the numbers for the Euromilhões in the next week.
  Inputs | Past numbers, mean values, standard deviation, ...
  Output | Numbers for the next week ???

2| ML processing steps
▪ Steps
  1. Data selection: select data from various sources
  2. Pre-processing: cleaning and preparing data
  3. Data transform: map complex objects (e.g. time series) to simple features
  4. Modelling: clustering, classification, prediction
  5. Evaluation and interpretation: assess the results and present them to the user in a meaningful manner
▪ Pipeline: data sources → data selection → pre-processing → data transform → modelling → evaluation → decision
▪ 1. Data selection
  o Which data are available?
  o From the available data, which subset may be used? Privacy concerns, for example!
  o Which characteristics/features are needed to solve our problem?
  o Ex. to estimate cardiovascular risk, do we have access to the patient's cardiac variables? Is the color of the eyes relevant?

▪ Data types
  1| Binary: two possible values | {no, yes}. Ex. gender | {male, female} = {0, 1}
  2| Categorical: a finite number of values (discrete values)
    o Unordered (nominal): cannot be compared. Ex. colors {red, green, yellow}
    o Ordered (ordinal): can be compared. Ex. valuation of a service {very poor, poor, fair, good, very good}
  3| Continuous: a real value, with infinitely many possibilities. Ex. temperature, height, or weight; temperature = 23.37 or 36.21

▪ Static versus temporal (dynamic) data
  o Static: each registered value is independent of the others. Ex. the weight and height of the students in a class (one row per student: Student 1, Student 2, ..., Student n)
  o Temporal, dynamic: the data at a given instant depend on the previous values. Ex. the weight of an individual collected on a daily basis
[Figure: body weight of José Manuel over time]

▪ 2. Pre-processing
  o Also known as data cleaning or data preparation
  o Ensures that the data are of suitable quality for analysis
  o How to deal with missing values?
  o How to deal with abnormal values (outliers)?
  o How to reduce the noise present in the data?
▪ Outliers
  o Abnormal values, different from the typical values
  o Ex. telemonitoring with daily measurements: the typical weight is around 75 kg; one day the weight is 140 kg!
▪ Noise
  o Adverse effect on the original signal
  o High-frequency / low-frequency noise
[Figure: the same signal with no noise, with high-frequency noise, and with low-frequency noise]
▪ Missing values
  o Ex. daily weight measurement: for a couple of days, the values were not collected.
  o Ex. patient data: Glucose and Age are missing for some patients.

▪ 3. Data transform
  o Also known as feature extraction or feature engineering
  o It is important to use characteristics/features that “summarize” the original data, since this facilitates the modelling process.
  o Compute features from data: can a time series acquired over one hour be summarized using a small set of parameters?
  o Some attributes can be redundant, and there is no sense in using them!
  o A collection of data or a signal may have a high dimension: transform the original data to a lower dimension.
▪ Feature extraction
  o A sine wave (time series) can be characterized by two parameters. It is not necessary to use the entire wave; it is sufficient to use two features: the amplitude and the period.
  o Ex. characterize the ECG using a set of relevant parameters:
    o ST segment deviation (associated with ischemia and myocardial infarction)
    o PR interval (e.g. PR interval = 0.2 s)
▪ Redundant attributes
  o Use three features: height, weight, and body mass index (BMI)?
  o The BMI can be computed from the weight and the height, so it makes no sense to use BMI, weight, and height simultaneously!
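The sine-wave case from the feature extraction slide can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name `sine_features`, the sampling step `dt` and the synthetic signal (amplitude 3, period 0.5 s) are hypothetical and do not come from the slides. The amplitude is estimated as half the peak-to-peak range, and the period as the average spacing between upward crossings of the mean level.

```python
import math


def sine_features(samples, dt):
    """Estimate (amplitude, period) of a sampled, roughly sinusoidal signal."""
    amplitude = (max(samples) - min(samples)) / 2.0
    mean = sum(samples) / len(samples)

    # Times at which the signal crosses its mean level going upwards,
    # located by linear interpolation between neighbouring samples.
    crossings = []
    for k in range(1, len(samples)):
        prev = samples[k - 1] - mean
        curr = samples[k] - mean
        if prev < 0.0 <= curr:
            frac = -prev / (curr - prev)
            crossings.append((k - 1 + frac) * dt)

    if len(crossings) < 2:
        raise ValueError("need at least two upward crossings to estimate the period")
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return amplitude, period


# Hypothetical signal: 1000 samples, 10 ms apart, amplitude 3, period 0.5 s.
dt = 0.01
signal = [3.0 * math.sin(2 * math.pi * (k * dt) / 0.5 + 0.3) for k in range(1000)]
amplitude, period = sine_features(signal, dt)
print(f"amplitude ~ {amplitude:.3f}, period ~ {period:.3f} s")
```

The 1000 samples are reduced to two numbers, which is the point of the slide: the model can then work with the two features instead of the whole wave.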
▪ 4. Model
  o Building a model to solve a specific problem, based on the available features/attributes:
    o Linear / non-linear
    o Unsupervised / supervised / reinforcement
    o Clustering / classification / regression / prediction

▪ Linear / non-linear methods
  o A set of values separated by a boundary defined by a linear or non-linear function
  o Linear: y = mx + b
  o Non-linear: y = f(mx + b), with f a non-linear function

▪ Supervised / unsupervised
  o Unsupervised: the desired outputs are not known! It is only possible to organize the data (X) into "clusters" or "classes". Ex. patients with similar symptoms
  o Supervised: the data consist of inputs (X) and the desired outputs (T). Ex. classification of customers into two classes {good, bad}
▪ Orientation of each type of learning
  o Unsupervised: oriented to grouping
  o Supervised: oriented to the task
  o Reinforcement learning: oriented to the action
https://medium.com/@ga3435/reinforcement-learning-from-human-feedback-rlhf-a911730e8732
▪ Clustering
  o The classes are not known
  o Find objects that share common characteristics
  o Ex. grouping objects based on their similarities (distances)
  o Ex. grouping patients into classes of similar characteristics
▪ Classification
  o Identify the class to which an object belongs.
  o The classes are defined a priori (finite number)
  o Ex. binary classification | {yes, no} (heart disease / healthy)
  o Ex. multi-class | identifying a fruit as one of the classes {pears, apples, oranges}
▪ Regression
  o Estimate a (continuous) value
  o The estimation is performed using other variables/features at the same instant
  o Ex. estimating the temperature at a given instant based on other weather conditions
  o Ex. determining a model/function that approximates a set of values (which y corresponds to X = 32?)
▪ Prediction / forecasting
  o Predict a future value based on historical data (past and present).
  o May involve classification or regression
  o Ex. estimating tomorrow's temperature based on today's and previous days' temperatures
  o Note: the term "prediction" is often used as a synonym for "regression".

▪ 5. Evaluation
  o How to measure the quality of a model / how to validate it? Ex. measures of sensitivity and specificity
  o What is the percentage of examples that the model classifies correctly? Ex. out of 50 fruits, the classification system correctly identifies 45 (90%).
  o What is the error between the actual value and the model's prediction? Ex. the model estimates the profit as 120 Euros, while the actual profit is 150 Euros (thus an error of 30 Euros).
▪ Classification: confusion matrix for binary classification {0, 1}
  o TP | True positives: those that are “1” and that the classifier classified as “1”
  o TN | True negatives: those that are “0” and that the classifier classified as “0”
  o FP | False positives: those that are “0” and that the classifier classified as “1”
  o FN | False negatives: those that are “1” and that the classifier classified as “0”
  Estimated \ Actual | T = 0 | T = 1
  Estimated = 0 | TN | FN
  Estimated = 1 | FP | TP
▪ Metrics
  o SE | Sensitivity / recall: the percentage of the total positives that is classified correctly, $SE = \frac{TP}{TP + FN}$
  o SP | Specificity: the percentage of the total negatives that is classified correctly, $SP = \frac{TN}{TN + FP}$
  o PR | Precision: precision of the positive estimations, $PR = \frac{TP}{TP + FP}$
  o AC | Accuracy: global capacity of the model, $AC = \frac{TP + TN}{TN + TP + FN + FP} = \frac{TP + TN}{N}$
  o F1 score: $F1 = 2\,\frac{Precision \cdot Recall}{Precision + Recall}$
▪ Regression
  o SSE | sum of squared errors, here averaged over the N examples: $error = \frac{1}{N}\sum_{i=1}^{N}(T_i - \hat{T}_i)^2$
  o R2 | coefficient of determination: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, with $SS_{tot} = \sum_{i=1}^{N}(T_i - \bar{T})^2$ and $SS_{res} = \sum_{i=1}^{N}(T_i - \hat{T}_i)^2$
  o $T_i$ is the actual (real) value, $\hat{T}_i$ the estimated value, and $\bar{T}$ the mean of the actual values.

2| ML goals
▪ This course, Machine Learning, runs in the 1st semester; the Pattern Recognition course runs in the 2nd semester.
▪ Topics marked on the processing pipeline: preprocessing, features, dimensionality reduction, modelling, pattern recognition
▪ From data to model: data → clean data → relevant features (tabular data, numerical values) → machine learning model (inputs → output)
▪ Model tasks: clustering, classification, regression, prediction
▪ Tabular data
  Instance | Attribute 1 | … | Attribute j | … | Attribute M | Target (1..Q)
  Instance 1 | X11 | … | X1j | … | X1M | T1
  Instance 2 | X21 | … | X2j | … | X2M | T2
  Instance i | Xi1 | … | Xij | … | XiM | Ti
  Instance N | XN1 | … | XNj | … | XNM | TN
  o Xij: value of attribute j for instance i; Ti: target of instance i; i = 1..N instances; j = 1..M attributes
  o Unsupervised methods use only the attributes X; supervised methods also use the target T.

3| Example
▪ Machine learning processing steps
  1| Data selection
  2| Pre-processing
  3| Data transform
  4| Modelling
  5| Validation
▪ Example: diagnosing diabetes
https://medium.com/edureka/what-is-data-science-a-beginners-guide-to-data-science-e684c2d6752d
▪ Diabetes diagnosis
  o Classification problem (binary)
  o Target = {NO, YES}
  o Based on a set of variables (X), is it possible to automatically diagnose the presence of diabetes?
  o If so, adequate clinical interventions can be taken, …
▪ 1| Data selection
  o We assume that the data are available in the patient's clinical record, as tabular data.
  o Thus, we assume the existence of N examples, each with a given set of attributes X:
    o x1| Age | continuous
    o x2| Gender | binary
    o x3| Blood pressure | continuous
    o x4| Glucose | continuous
    o x5| Weight | continuous
    o x6| Pregnant | binary
    o ...
  o In addition, for these N examples, the diagnosis is assumed to be known: a supervised problem.
  o Target T = {0, 1} = {negative, positive} for diabetes
▪ 2| Pre-processing
  o Once the data are available, we need to clean and prepare them. There may be inconsistencies, such as missing values, incorrect values, …
  o For example, missing values should be replaced with appropriate values, and outliers should be detected and handled.
▪ 3| Data transform
  o At this stage, the original data must be summarized to facilitate the classification task.
  o For example, statistical information: the mean, the median, the standard deviation, the correlation between the inputs and the desired output
▪ 4| Modelling
  o Based on the characteristics or features, a model must be developed to classify diabetes as {0, 1}.
  o If interpretability is a key requirement, a decision tree can be a good option. In this case, the most important parameter is the glucose level, which is tested at the root node.
▪ 5| Validation
  o To evaluate the model's performance in predicting diabetes (classified as {0, 1}), compare its predictions with the actual outcomes, where the patient either has diabetes or does not.
  o To quantify the performance of the model: in how many of the cases is the estimation correct?
    o Accuracy = 80 %
    o Sensitivity = 90 %
    o Specificity = 78 %
    o ...
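The evaluation metrics used in this example (sensitivity, specificity, precision, accuracy and F1, all derived from the confusion matrix counts TP, TN, FP, FN) can be computed with a few lines of code. This is a minimal sketch: the function names and the ten labels below are made up for illustration and are not the diabetes data, and it assumes both classes occur and that at least one positive is estimated, so that no denominator is zero.

```python
def confusion_counts(targets, estimates):
    """Count TP, TN, FP, FN for binary labels in {0, 1}."""
    tp = tn = fp = fn = 0
    for t, y in zip(targets, estimates):
        if t == 1 and y == 1:
            tp += 1
        elif t == 0 and y == 0:
            tn += 1
        elif t == 0 and y == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(targets, estimates):
    """Return SE, SP, PR, AC and F1 as defined from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(targets, estimates)
    se = tp / (tp + fn)                   # sensitivity / recall
    sp = tn / (tn + fp)                   # specificity
    pr = tp / (tp + fp)                   # precision
    ac = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    f1 = 2 * pr * se / (pr + se)          # F1 score
    return {"SE": se, "SP": sp, "PR": pr, "AC": ac, "F1": f1}


# Hypothetical labels: 4 actual positives and 6 actual negatives.
targets = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
estimates = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(targets, estimates)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

With these labels TP = 3, FN = 1, TN = 4 and FP = 2, so the accuracy (0.70) hides the fact that sensitivity (0.75) and specificity (about 0.67) differ, which is why several metrics are usually reported together.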
4| Learning process
▪ Least mean squares
  o Training data: features (inputs) and targets T
  o Model (structure): y = mx + b, producing the output Yi for each target Ti
  o Parameters: m, b, adjusted by the optimization process
  o Loss function (criterion): $error = \frac{1}{N}\sum_{i=1}^{N}(T_i - Y_i)^2$

▪ The Inductive Learning Hypothesis (Mitchell, p. 23)
  o Any hypothesis (model) that approximates the target function well over a sufficiently large set of training examples should also approximate the target function well for other, unobserved examples.
  o This is the basis of all machine learning algorithms.
  o Capability of generalization: to work on unseen data!

▪ Generalization
  o Definition: capability of a classifier to provide acceptable results during operation, that is, with data not used in the training process
  o After the training process, a decision boundary is obtained.
  o Which classification model is better in terms of generalization?
    o Which one provides better training results? Hypothesis A or B?
    o And in the operation phase? Hypothesis A or B?
[Figure: training results for hypothesis A and for hypothesis B]

▪ Evaluation: training / validation (test) data
▪ Hold-out
  o The entire dataset is generally divided into two independent sets:
    o Training set (e.g., 2/3), used for building the model
    o Testing set (e.g., 1/3), used for the testing phase
  o The idea is to test the model and verify whether it performs well on data that it has "never seen" before.
▪ Cross-validation (k-fold)
  o The data are divided into K partitions and the procedure is repeated K times, each time testing on a different partition.
  o More robust than hold-out
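The two splitting schemes above, hold-out and k-fold cross-validation, can be sketched as index bookkeeping. The sketch assumes that shuffling before splitting is acceptable (it would not be for time series); the function names, the seed and the 30-example dataset size are illustrative choices, not part of the slides.

```python
import random


def hold_out_split(n_examples, train_fraction=2 / 3, seed=0):
    """Shuffle the example indices and split them into training and test sets."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_examples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_splits(n_examples, k, seed=0):
    """Return k (train, test) index pairs; each example is tested exactly once."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint partitions
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical dataset of 30 examples.
train_idx, test_idx = hold_out_split(30)
splits = k_fold_splits(30, k=5)
print(len(train_idx), len(test_idx))             # sizes of the hold-out sets
print([len(test) for _, test in splits])         # test-fold sizes
```

In the k-fold case the model would be trained and evaluated once per pair, and the K results averaged; that repetition is what makes the estimate less dependent on one particular split than hold-out.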
▪ Learning
  o To infer a function/mapping/model and its parameters using a training set of examples.
  1| Each example contains inputs and outputs of the function to be inferred.
  2| The structure of the model is assumed (hyperparameters).
  3| Given that model, the learning process computes a set of parameters.

▪ Ex. Regression model
  o Given a set of examples, a learning algorithm (least mean squares error) computes the coefficients/parameters of the model.
  1| Examples: training data (inputs X, targets T)
  2| Model structure: Y = mX + b, with parameters {m, b}
  3| Learning process: minimize the criterion $E = \frac{1}{N}\sum_{i=1}^{N}(T(i) - y(i))^2$
  o Parameters obtained in the example: m = 1.10, b = 0.82
  1. reta.m

▪ Ex. structure: a neural network
  o Hyperparameters: number of neurons, number of layers, activation functions, …
  o Parameters: weights between neurons, …
  o Criterion (loss function): minimization of squared errors
  o Learning algorithm: backpropagation

▪ Data characteristics: training data set {X, T}
  o Ex. classification of a fruit as apple or pear from its size, color and form
  o Data balance: the training examples must include positive and negative cases (ideally in the same percentage).
    o When classifying fruits (pears, apples), the number of pears should be the same as the number of apples.
    o With only negative instances (pears), it is not possible to develop a classifier for positive instances (apples).
  o Representative: the training data must be representative of all possible examples that can occur.
    o Fruits with all combinations of inputs {size, color, form}
    o This is “impossible”, since future instances are not known at present.
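The regression example in this section (structure Y = mX + b, parameters {m, b} chosen to minimize the mean squared error) has a closed-form least-squares solution, sketched below. The five data points are invented for illustration, so the fitted values differ from the m = 1.10, b = 0.82 shown on the slide, and the function names are hypothetical. The sketch also reports the two regression metrics mentioned in the evaluation step, the mean squared error and R².

```python
def fit_line(xs, ts):
    """Least-squares estimates of m and b for the structure y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    m = sxt / sxx
    b = mean_t - m * mean_x
    return m, b


def mean_squared_error(ts, ys):
    """Criterion E = (1/N) * sum((T(i) - y(i))^2)."""
    return sum((t - y) ** 2 for t, y in zip(ts, ys)) / len(ts)


def r_squared(ts, ys):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_t = sum(ts) / len(ts)
    ss_res = sum((t - y) ** 2 for t, y in zip(ts, ys))
    ss_tot = sum((t - mean_t) ** 2 for t in ts)
    return 1.0 - ss_res / ss_tot


# Hypothetical training data (inputs xs, targets ts).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [0.9, 1.8, 3.1, 4.0, 5.2]

m, b = fit_line(xs, ts)
ys = [m * x + b for x in xs]
mse = mean_squared_error(ts, ys)
r2 = r_squared(ts, ys)
print(f"m = {m:.2f}, b = {b:.2f}, E = {mse:.4f}, R2 = {r2:.4f}")
```

Here the structure (a straight line) is fixed in advance and only the parameters are learned, which mirrors the split between steps 2| and 3| of the learning process.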
5| Hard computing and soft computing
▪ The term “soft computing” is often used to name some ML models, in contrast with other possible models of “hard computing”.
▪ How to develop an automatic pilot for an automobile? (hard computing)
  o Use the equations of mechanics and of fluid dynamics to obtain a mathematical model of the car-road system.
  o The computer of the automatic pilot uses these equations to control and guide the car.
  o How complex would the model be?
  o Would it be possible to write exact equations for that?
▪ Development of an automatic pilot for an automobile
  o Does the human driver use such equations to drive?
  o How do we drive? How did we learn?
  o Is it possible to design an automatic pilot that in some way learns and behaves like a human driver?
  o How to compute in a way similar to the human driver?
▪ Soft computing: methodologies for machine learning capable of
  o tolerating imprecision, uncertainty, partial truth, and approximation
▪ with the aim of obtaining
  o tractability, robustness, and low-cost computational solutions
▪ “... The role model for soft computing is the human mind. ...” (“A Definition of Soft Computing”, adapted from L.A. Zadeh)
http://www.soft-computing.de/def.html
▪ Biological neurons: inspiration for artificial neural network models.
Introduction https://www.researchgate.net/figure/a-Biological-neuron-b-Artificial-neuron_fig1_363833676 5| Hard computing and soft computing 82 ▪ Fuzzy brain: young theory of visual receptors Intensity of the activity of each receptor in each color (wavelength). Each receptor is maximally excited by one color, and progressively less by the adjacent colors receptor receptor. receptor JH | AC | T1. Introduction Red Orange Yellow Green Blue Violet (from Erikson, R., M.I. Chalaru, C.V. Buhusi, The Brain as a Fuzzy Machine: a Modelling Problem, in Theodorescu et al. (Editors), Fuzzy and Neuro-Fuzzy in Medicine, CRC Press, 1999 Contents 83 ▪ 1| Introduction ▪ 2| Machine learning processing steps ▪ 3| Example ▪ 4| Learning process ▪ 5| Hard and soft computing ▪ 6| Machine Learning evolution ▪ 7| Conclusion JH | AC | T1. Introduction ▪ Bibliography 6| Machine Learning evolution 84 Darmstadt JH | AC | T1. Introduction Neuron Perceptron Multilayer NN Convolutional NN Deep NN GAI https://medium.com/@lmpo/a-brief-history-of-ai-with-deep-learning-26f7948bc87b 6| Machine Learning evolution 85 ▪ Note: This is a brief “history” ! ▪ 1| Linear/non-linear models ▪ 2| Connectionist models ▪ 3| Generative AI JH | AC | T1. Introduction 6| Machine Learning evolution 86 ▪ Note: This is a brief “history” ! ▪ 1| Linear/non-linear models ▪ 2| Connectionist models ▪ 3| Generative AI JH | AC | T1. Introduction 6| Machine Learning evolution 87 ▪ 1| Linear / non-linear models ▪ Linear models (regressive) presents limitations Linear Non linear y = mx + b y = (mx + b) JH | AC | T1. Introduction 6| Machine Learning evolution 88 ▪ 1| Linear / non-linear models ▪ As result, several alternatives (non-linear models) have been proposed Bayesian models Decision trees Support vector machines Neural networks – inspired in human brain Fuzzy systems Deep models … … JH | AC | T1. Introduction … 6| Machine Learning evolution 89 ▪ Note: This is a brief “history” ! 
▪ 1| Linear/non-linear models
▪ 2| Connectionist models
▪ 3| Generative AI
▪ 2| Connectionist models (the basic element is the neuron)
▪ The human brain is capable of efficiently performing extremely complex tasks:
  ▪ Pattern recognition (e.g. faces)
  ▪ Inductive reasoning
  ▪ Arithmetic calculation
  ▪ Learning from examples
▪ Brain: pattern recognition tasks, learning from examples.
▪ Neuron biology, basic structure: inputs, processing elements, outputs.
▪ Biological neuron / artificial neuron
  ▪ Biological neuron: dendrites, synapses, nucleus, axon.
  ▪ Artificial neuron: inputs, weights, sum + activation, output.
▪ Artificial neural network
  ▪ A set of neurons organized in layers: input, hidden, and output layers.
  ▪ (Figure source for the neuron and network diagrams: https://www.geeksforgeeks.org/artificial-neural-networks-and-its-applications/)
  ▪ One of the most relevant aspects of multilayer neural networks is their ability to approximate non-linear mappings [Cybenko, 1989]: a neural network with sigmoidal activation functions and at least one hidden layer can approximate, to an arbitrary degree of accuracy, any continuous non-linear function $\mathbb{R}^m \to \mathbb{R}^p$ on a compact domain, provided the hidden layer has enough neurons.
▪ Deep models
  ▪ Traditional ML requires a feature extraction process carried out by humans.
  ▪ A deep model treats feature extraction as an internal (automatic) process.
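The artificial neuron (inputs, weights, sum + activation, output) and the layered network described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original slides: the function names, the 2-2-1 layout, and every weight value are hypothetical choices made for the example, and no training is performed.

```python
import math


def sigmoid(z):
    """Sigmoidal activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear activation, used here for the output neuron."""
    return z


def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of the inputs, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a set of neurons that all receive the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical, hand-picked weights (assumption: chosen only for illustration).
HIDDEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_BIASES = [0.0, 0.0]
OUTPUT_WEIGHTS = [1.0, 1.0]
OUTPUT_BIAS = -1.0


def network(x):
    """Tiny network: 2 inputs -> 2 hidden sigmoid neurons -> 1 linear output."""
    hidden = layer(x, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return neuron(hidden, OUTPUT_WEIGHTS, OUTPUT_BIAS, activation=identity)
```

With the identity activation a neuron reduces to the linear model $y = mx + b$; replacing it with the sigmoid gives the non-linear form. In practice the weights would be learned from data rather than fixed by hand.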
▪ Deep models, structure: features are automatically extracted.
▪ Feedforward neural networks
  ▪ Implement static mappings: $y(k) = f(x(k))$.
  ▪ On their own, they are not able to deal with temporal data / time series.
▪ Recurrent neural networks
  ▪ Internal/external recurrence: delayed copies of the state, $h(k-1)$, and of the output, $y(k-1)$, are fed back as inputs.
  ▪ $y(k) = f(x(k), h(k), h(k-1), y(k-1), \ldots)$: present + past values.
  ▪ Long-term dependencies? In theory, RNNs are capable of handling these “long-term dependencies”. In practice, they are difficult to train: with backpropagation, the gradient vanishes (vanishing gradient).
▪ LSTM: Long Short-Term Memory
  ▪ Overcomes the “long-term” difficulty.
  ▪ Uses “gates”.
▪ Note: This is a brief “history”!
▪ 1| Linear/non-linear models
▪ 2| Connectionist models
▪ 3| Generative AI
▪ Generative AI
  ▪ Using AI to create new content, such as text, images, music, audio, and videos.
  ▪ Creating something “new and original” based on what it has learned.
  ▪ For example, creating faces of fictional people that look completely real.
  ▪ Creating texts on a specific topic.
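Returning to the connectionist models above, the difference between the static mapping $y(k) = f(x(k))$ of a feedforward network and the recurrence of an RNN can be illustrated with a single scalar unit. The sketch below is an assumption-laden toy, not the slides' own model: the `tanh` activation, the scalar state, and the weights `w_in = 0.5` and `w_rec = 0.9` are hypothetical choices. The last function multiplies the per-step derivatives of the state to show why the gradient tends to vanish over long sequences.

```python
import math


def feedforward_step(x, w=0.5, b=0.0):
    """Static mapping y(k) = f(x(k)): the output depends only on the current input."""
    return math.tanh(w * x + b)


def recurrent_run(xs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Internal recurrence: h(k) = tanh(w_in * x(k) + w_rec * h(k-1)), with y(k) = h(k)."""
    h = h0
    ys = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)  # the delayed state h(k-1) is fed back
        ys.append(h)
    return ys


def state_gradient(xs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Derivative of the final state with respect to h0, as a product of per-step factors.

    Each step contributes d h(k) / d h(k-1) = w_rec * (1 - h(k)^2).
    """
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        grad *= w_rec * (1.0 - h * h)
    return grad


# A single impulse followed by zeros: only the recurrent unit "remembers" it.
sequence = [1.0, 0.0, 0.0]
static_outputs = [feedforward_step(x) for x in sequence]
recurrent_outputs = recurrent_run(sequence)
```

Here the feedforward outputs return to zero as soon as the input does, while the recurrent outputs stay positive because the past state is fed back. Since every per-step factor in `state_gradient` has magnitude at most `w_rec` (below 1 in this sketch), the product shrinks geometrically with sequence length, which is the difficulty that gated architectures such as LSTM were designed to ease.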
▪ Large Language Models (LLMs) / Transformers
  ▪ In 2017, researchers at Google proposed a mechanism (attention) that captures the relationships between the words of a text, at their different positions, in order to predict the next word.
  ▪ By paying special attention to how the words appear in the text, one gets a good idea of what the next words might be.
  ▪ It is not necessary to use recurrence to represent memory!
  ▪ Transformer: the set of “numbers” that correlate the words is fed into a neural network (the transformer), which calculates the probability of each candidate next word for the output.
  ▪ Next word? The model predicts the next word based on probabilities!
  ▪ Architecture: a neural network based on an attention mechanism. The structure consists of an encoder and a decoder, each containing several layers.
▪ GPT: Generative Pre-trained Transformer
  ▪ GPT-3 is a neural network (transformer) designed to process and generate natural-language text, trained on examples of various types of textual data. It has 175 billion parameters and can perform tasks such as text generation, automatic translation, and document summarization.
  ▪ The figure of 100 trillion parameters sometimes quoted for GPT-4 is unconfirmed; its parameter count has not been officially disclosed.
7| Conclusions
▪ ML model / algorithms
▪ Data can be classified into two main types: numerical and symbolic. In this ML course we will work with numerical data. Symbolic data will be studied in the Artificial Intelligence course.
▪ There are several algorithms (models) for ML.
▪ There is no rule for selecting the “best one” for a given problem. The choice depends on experience, trial and error, art and science.
▪ Train and test a candidate model to assess its generalization capability.
▪ Patience, resilience and enthusiasm are the keys to success in ML.
▪ Main goal of this first module (1| Introduction)
▪ Introductory concepts: you should be familiar with …
▪ Data Examples/O