Lecture 2: Loss Functions, Evaluation, and Linear Regression
CS-ELEC1A: Advanced Intelligent Systems
Loss Functions, Evaluation, and Linear Regression

What was discussed in the previous meeting?

What is Artificial Intelligence?
Artificial Intelligence is the study and creation of machines that perform tasks normally associated with intelligence. People from varying backgrounds have their own reasons for being interested in AI.

Why is Artificial Intelligence relevant?
Capabilities that were traditionally human can be carried out in software inexpensively and at scale. AI can be applied to every sector to enable new possibilities and efficiencies.

What is Machine Learning?
Machine learning is a branch of Artificial Intelligence that focuses on the use of data and algorithms to imitate the way that humans learn.

What goes in the Machine Learning Workflow?
Preprocessing the data.
Creating models specific to the task.
Evaluating whether the model has performed as expected.

Recap on the Machine Learning Setup

What are the elements required for machine learning?

Output Space: The target variable that we want to estimate. These are all the possible outputs that the model can generate based on the inputs.

Hypothesis: Some speculative relationship between the input space and the output space. It is expressed as a collection of parameters characterizing the behavior of the model.

Input Space: This is the input data. The inputs can be called variables, features, or attributes. The input space comprises all potential sets of values for the input.

Waze Scenario

Scenario: Suppose you are travelling from your home to UST to attend your morning classes. Your class is at 9:00 AM and Waze has estimated that you will arrive at 8:30 AM. However, due to heavy traffic, you arrived at 8:50 AM.

Questions:
What were the inputs to Waze?
What did Waze estimate?
How far was Waze's estimate from the actual time of arrival?

Loss Functions

What are Loss Functions?
A loss function computes the distance between the current output of the algorithm and the expected output.

More formally: In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function.

Why do we need this?
It quantifies the difference between predicted and actual values in a machine learning model.

If you recall from last lecture…
Modelling: Mathematically speaking, a model is a description of a system using mathematical concepts and language. It is a mathematical representation of objects and their relationships.

Inputs → Function → Target Output

Given a model output, how do we know whether it gave the right output?
Suppose a model predicts whether a picture contains a cat or a dog.
Input → Model → Output: Cat → Simple Loss Function ← True Value: Dog

Suppose a model, Waze, predicts the estimated time of arrival.

Input → Model → Output: 8:30 → Simple Loss Function ← True Value: 8:50
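The two diagrams above can be sketched in a few lines of Python. The slides' own "simple loss function" did not survive in the transcript, so the choices here are assumptions: a 0/1 loss for the cat-versus-dog label and an absolute difference in minutes for the Waze arrival time. The function names are hypothetical.

```python
def zero_one_loss(predicted_label: str, true_label: str) -> int:
    """Return 0 when the prediction matches the true label, otherwise 1."""
    return 0 if predicted_label == true_label else 1


def minutes_since_midnight(clock_time: str) -> int:
    """Convert an 'H:MM' string (24-hour clock) to minutes since midnight."""
    hours, minutes = clock_time.split(":")
    return int(hours) * 60 + int(minutes)


def absolute_error_minutes(predicted: str, actual: str) -> int:
    """Absolute gap, in minutes, between a predicted and an actual clock time."""
    return abs(minutes_since_midnight(actual) - minutes_since_midnight(predicted))


# Cat/dog example: the model said "Cat" but the true value is "Dog".
cat_dog_loss = zero_one_loss("Cat", "Dog")

# Waze example: estimated arrival 8:30, actual arrival 8:50.
waze_loss = absolute_error_minutes("8:30", "8:50")

print(cat_dog_loss)  # 1 (the prediction was wrong)
print(waze_loss)     # 20 (minutes between estimate and actual arrival)
```

Both functions return 0 for a perfect prediction and a larger number as the prediction gets worse, which is the property the lecture asks of a loss.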
What are the most commonly used loss functions?

Regression Problems: Mean Squared Error, Root Mean Squared Error, Sum of Squared Error
Classification Problems: Binary Cross Entropy Loss, Cross Entropy Loss

Waze Scenario

Questions:
What were the inputs to Waze?
What did Waze estimate?
How far was Waze's estimate from the actual time of arrival?
What factors do you think Waze considered for estimating?
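For reference, the three regression losses and the binary cross entropy loss named in this lecture can be sketched with their standard textbook definitions. This is a minimal pure-Python sketch, not the course's own implementation; the function names, the `eps` clamp, and the sample numbers are assumptions made for illustration.

```python
import math


def sum_squared_error(targets, predictions):
    """SSE: sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


def mean_squared_error(targets, predictions):
    """MSE: the SSE divided by the number of examples."""
    return sum_squared_error(targets, predictions) / len(targets)


def root_mean_squared_error(targets, predictions):
    """RMSE: square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(targets, predictions))


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean BCE for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Made-up numbers, purely for illustration.
sse = sum_squared_error([3.0, 5.0], [2.0, 7.0])         # 1 + 4 = 5.0
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])        # 5.0 / 2 = 2.5
rmse = root_mean_squared_error([3.0, 5.0], [2.0, 7.0])  # sqrt(2.5)
bce = binary_cross_entropy([1, 0], [0.5, 0.5])          # log(2)
```

The three regression losses rank models identically on a fixed dataset, since MSE and RMSE are monotonic transformations of SSE; they differ only in scale and units.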
Recap on the Machine Learning Setup

What are the elements required for machine learning?

Output Space: The target variable that we want to estimate. These are all the possible outputs that the model can generate based on the inputs.

Hypothesis: Some speculative relationship between the input space and the output space. It is expressed as a collection of parameters characterizing the behavior of the model.

Input Space: This is the input data. The inputs can be called variables, features, or attributes. The input space comprises all potential sets of values for the input.

How can we…

What is Regression?
Regression is a statistical technique that relates a dependent variable to one or more independent variables.

What do the problems earlier have in common?
The targets or predictions are continuous variables (e.g. house prices, stock prices, etc.).

What do we need to predict these outputs?
Features: These are the inputs. We can call them x (or a vector when there are several).
Training Samples: Many samples of the inputs for which the target is known.
Model: A function that models the relationship between the inputs and the targets.
Loss: Tells how well the model approximates the target, given the training examples.
Optimization: A way of finding the parameters of our model that minimize the loss function.

Here's a Simple 1-D Regression…
Circles are data points (i.e. training examples) that are provided.
The data points are uniformly spaced in x but may be displaced in the target by some noise.

Recall:
The function f is the model that the algorithm wants to estimate.
There is a certain error or noise term that doesn't allow us to perfectly estimate the target outputs.
The target therefore also involves error, since the algorithm isn't perfect.

The green line is the true curve, which we don't know.
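The 1-D regression picture described above (uniformly spaced inputs, targets displaced by noise around an unknown true curve) can be imitated with synthetic data. In this sketch the sine curve, the Gaussian noise, and the names `true_curve` and `make_noisy_samples` are all assumptions chosen for illustration; the lecture does not say what the true curve or noise distribution is.

```python
import math
import random


def true_curve(x):
    """Hypothetical stand-in for the unknown true curve."""
    return math.sin(2 * math.pi * x)


def make_noisy_samples(n, noise_std, seed=0):
    """Return n inputs evenly spaced on [0, 1] and their noisy targets."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    xs = [i / (n - 1) for i in range(n)]
    # Each target is the true curve's value plus a Gaussian noise term.
    ts = [true_curve(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ts


xs, ts = make_noisy_samples(n=10, noise_std=0.3)
for x, t in zip(xs, ts):
    print(f"x = {x:.2f}  target = {t:+.3f}  true curve = {true_curve(x):+.3f}")
```

A learning algorithm only ever sees the `(x, target)` pairs; the `true_curve` column is printed here only to show how far the noise moves each point.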
Goal: We want to fit a curve to the data points.

Working Example: Boston Housing Data

Objective: Estimate the median house price in a neighborhood based on neighborhood statistics.

Features, Training Data, Model, Loss, Optimization

First possible feature: Per Capita Crime Rate.
Do you think this is a good feature?

How to represent the data?
Data is described as pairs of an input feature and a target output.
The input feature is the per capita crime rate.
The target output is the median house price.
The index simply indicates the training example (we have N examples).

What does it look like for our example?
Each dot in the plot is one data point. The median house price is the target output and the per capita crime rate is the input feature.
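The pair representation described above can be sketched as a list of `(input, target)` tuples. The numbers below are made up for illustration and are NOT taken from the Boston Housing data; the variable names are likewise hypothetical.

```python
from typing import List, Tuple

# Made-up illustrative values, not real Boston Housing records.
crime_rate = [0.1, 0.5, 2.0, 8.0]        # input feature: per capita crime rate
median_price = [30.0, 25.0, 20.0, 12.0]  # target output: median house price

# The dataset is a collection of N (input, target) pairs.
dataset: List[Tuple[float, float]] = list(zip(crime_rate, median_price))
N = len(dataset)

# The index n only says which training example we are looking at.
for n, (x_n, t_n) in enumerate(dataset, start=1):
    print(f"example {n} of {N}: crime rate = {x_n}, median price = {t_n}")
```

Plotting these pairs with the crime rate on one axis and the median price on the other gives the scatter plot the lecture refers to, with one dot per pair.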
Training Data Model Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What model do we use? Features We have 1 feature / variable. Is there any model that you can think of? Training Data Model Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What model do we use? Features We have 1 feature / variable. Is there any model that you can think of? Training Data Model Loss In practice, at this point, the data should have been split already to training and testing sets. The model Optimization should map the input to Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features Training Data Model Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data Model Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data What are the sources of noise? Model Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data What are the sources of noise? 
Model Imprecision in data Attributes (input noise) Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data What are the sources of noise? Model Imprecision in data Attributes (input noise) Errors in data targets (mislabelling) Loss Optimization Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data What are the sources of noise? Model Imprecision in data Attributes (input noise) Errors in data targets (mislabelling) Loss Additional attributes not taken into account by data attributes, affect Optimization target values (latent variables) Working Example: Boston Housing Data Objective: Estimate median house price in a neighborhood based on neighborhood statistics What about noise? Features A simple model typically does not exactly fit the data – lack of fit can be considered noise Training Data What are the sources of noise? Model Imprecision in data Attributes (input noise) Errors in data targets (mislabelling) Loss Additional attributes not taken into account by data attributes, affect Optimization target values (latent variables) Model may be too simple to account for data targets Identifying the Loss Function Objective: Estimate median house price in a neighborhood based on neighborhood statistics What loss do we use? Features Training Data Model Loss Optimization Identifying the Loss Function Objective: Estimate median house price in a neighborhood based on neighborhood statistics What loss do we use? 
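The one-feature setup and the idea of noise as lack of fit can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes a linear model y(x) = w0 + w1 * x, the data values and weights are made up (they are not the Boston Housing data), and the names `predict` and `residuals` are hypothetical.

```python
# Hypothetical training pairs (x, t): x = per capita crime rate,
# t = median house price. The numbers are invented for illustration.
data = [(0.1, 30.0), (0.5, 26.0), (2.0, 20.0), (8.0, 12.0)]


def predict(x, w0, w1):
    """Assumed linear model with one feature: y(x) = w0 + w1 * x."""
    return w0 + w1 * x


def residuals(data, w0, w1):
    """Lack of fit t - y(x) for every example; nonzero values are treated as noise."""
    return [t - predict(x, w0, w1) for x, t in data]


# An arbitrary (not fitted) choice of weights.
w0, w1 = 28.0, -2.0
for (x, t), r in zip(data, residuals(data, w0, w1)):
    print(f"x={x:4.1f}  t={t:5.1f}  y(x)={predict(x, w0, w1):5.1f}  residual={r:+5.1f}")
```

No straight line passes through all four points, so some residuals stay nonzero whatever weights are chosen.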
Identifying the Loss Function
What loss do we use?
Using Sum of Squared Error Loss
The loss function measures the squared error between the model's predictions and the true labels:
$$\ell(w) = \sum_{n=1}^{N} \left[ t^{(n)} - y(x^{(n)}) \right]^2$$
$y(x^{(n)})$ is the model's prediction of the output given feature $x^{(n)}$.
$t^{(n)}$ is the true label of the example.
Why do we need to square the difference?
For a particular hypothesis ($y(x)$ defined by a choice of $w$, drawn in red), what does the loss represent geometrically?
The loss for the red hypothesis is the sum of the squared vertical errors.
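The sum of squared errors loss from the preceding slides can be sketched as follows. The linear form y(x) = w0 + w1 * x, the data, and the helper names `sse_loss` and `signed_error_sum` are assumptions made for illustration. The second helper hints at one common answer to "why square the difference?": without squaring, positive and negative errors cancel.

```python
def sse_loss(data, w0, w1):
    """Sum over all examples of (t - y(x))**2, with y(x) = w0 + w1 * x."""
    return sum((t - (w0 + w1 * x)) ** 2 for x, t in data)


def signed_error_sum(data, w0, w1):
    """Same sum without squaring, to show how errors of opposite sign cancel."""
    return sum(t - (w0 + w1 * x) for x, t in data)


# Hypothetical (x, t) pairs, chosen so the errors are +1, -1 and 0.
data = [(1.0, 4.0), (2.0, 4.0), (3.0, 7.0)]
w0, w1 = 1.0, 2.0  # an arbitrary hypothesis y(x) = 1 + 2x

print("signed sum :", signed_error_sum(data, w0, w1))  # errors cancel to 0.0
print("squared sum:", sse_loss(data, w0, w1))          # 1 + 1 + 0 = 2.0
```

The signed sum reports a perfect fit even though two predictions are off by one unit, whereas the squared sum does not.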
Optimizing the Objective
What are we looking for?
We need to find weights $w$ that minimize the loss.
Using Gradient Descent for one example
Initialize $w$ (e.g. random initialization)
Update: Repeatedly update $w$ based on the gradient:
$$w \leftarrow w - \lambda \frac{\partial \ell}{\partial w}$$
$\lambda$ is the learning rate, and $\partial \ell / \partial w$ is the partial derivative of the loss with respect to the weights.
For a single case, this gives the least mean squares (LMS) update rule. After working out the derivative, it equates to the following:
$$w \leftarrow w + 2\lambda \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)}$$
Note: As the error approaches 0, so does the update ($w$ stops changing).
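A minimal sketch of the single-example update described above, assuming a one-feature linear model y(x) = w0 + w1 * x and the squared error of one example as the loss. The example values, the learning rate `lr`, and the function name `lms_update` are hypothetical.

```python
def lms_update(w0, w1, x, t, lr):
    """One gradient-descent step on the squared error of a single example."""
    error = t - (w0 + w1 * x)  # t - y(x)
    # d/dw0 (error**2) = -2 * error and d/dw1 (error**2) = -2 * error * x,
    # so stepping against the gradient adds 2 * lr * error (times x for w1).
    w0 = w0 + 2 * lr * error
    w1 = w1 + 2 * lr * error * x
    return w0, w1


x, t = 2.0, 5.0    # one hypothetical training example
w0, w1 = 0.0, 0.0  # initialization (zeros here; random values also work)
lr = 0.05          # learning rate

for step in range(50):
    w0, w1 = lms_update(w0, w1, x, t, lr)

error = t - (w0 + w1 * x)
print("weights:", (w0, w1))
print("remaining error:", error)
```

With these values the error shrinks by a constant factor at every step, so the updates shrink with it, which is the behaviour the note on the slide points out.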
Optimizing Across the Dataset
Across the entire dataset, the same update can be applied in bulk or one example at a time.
Two ways to generalize this for all examples in the training set:
1. Batch updates: Sum or average the updates across every example $n$, then change the parameter values:
$$w \leftarrow w + 2\lambda \sum_{n=1}^{N} \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)}$$
2. Stochastic or online updates: Update the parameters for each training case in turn, according to its own gradient:
$$w \leftarrow w + 2\lambda \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)}$$
The underlying assumption is that each sample is independent and identically distributed (i.i.d.).

Visualized Optimization Process (figure)
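The two ways of generalizing the update to the whole training set can be compared side by side. This is a sketch under assumptions: a one-feature linear model y(x) = w0 + w1 * x, the summed (not averaged) form of the batch update, noise-free made-up data, and hypothetical function names `batch_epoch` and `stochastic_epoch`.

```python
def batch_epoch(data, w0, w1, lr):
    """Batch update: sum the per-example terms, then change the weights once."""
    sum0 = sum(t - (w0 + w1 * x) for x, t in data)
    sum1 = sum((t - (w0 + w1 * x)) * x for x, t in data)
    return w0 + 2 * lr * sum0, w1 + 2 * lr * sum1


def stochastic_epoch(data, w0, w1, lr):
    """Stochastic/online update: change the weights after every single example."""
    for x, t in data:
        error = t - (w0 + w1 * x)
        w0, w1 = w0 + 2 * lr * error, w1 + 2 * lr * error * x
    return w0, w1


# Hypothetical noise-free data generated from t = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
lr = 0.02

wb = (0.0, 0.0)  # weights trained with batch updates
ws = (0.0, 0.0)  # weights trained with stochastic updates
for _ in range(2000):
    wb = batch_epoch(data, *wb, lr)
    ws = stochastic_epoch(data, *ws, lr)

print("batch     :", wb)
print("stochastic:", ws)
```

On this data both variants approach w0 = 1 and w1 = 2. With noisy data and a fixed learning rate, the stochastic variant would typically keep fluctuating around the minimum instead of settling exactly.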
Working Example: Boston Housing Data
Can the single-feature model be extended?
One method of extending the model is to consider other input dimensions:
$$y(x) = w_0 + w_1 x_1 + w_2 x_2$$
$x$ here is a vector now.
In the Boston house pricing example, number of rooms can also be explored as a feature.

Working with Multidimensional Inputs
How do we represent multiple features / multidimensional inputs?
Each house is a data point $n$, with observations indexed by $j$:
$$x^{(n)} = \left( x_1^{(n)}, \ldots, x_d^{(n)} \right)$$
We can incorporate the bias $w_0$ into $w$ by using $x_0 = 1$; then
$$y(x) = w_0 + \sum_{j=1}^{d} w_j x_j = w^T x$$
Basically the same, but accounting for more features in the input data.
We can then solve for $w = (w_0, w_1, \ldots, w_d)$. How?
We can use gradient descent to solve for each coefficient, or compute $w$ analytically.
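The bias trick and the gradient-descent route for multidimensional inputs can be sketched as below. Everything concrete here is an assumption for illustration: the two features stand in for crime rate and number of rooms, the data are generated from made-up weights, and `add_bias`, `predict`, and `fit_gd` are hypothetical names. Only the gradient-descent option is shown, not the analytic solution.

```python
def add_bias(x):
    """Prepend x0 = 1 so that the bias w0 becomes an ordinary weight."""
    return [1.0] + list(x)


def predict(w, x):
    """y(x) = w^T x, where x already carries the leading 1."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_gd(X, T, lr=0.01, epochs=5000):
    """Batch gradient descent on the sum of squared errors."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad_terms = [0.0] * len(w)
        for x, t in zip(X, T):
            error = t - predict(w, x)
            for j, xj in enumerate(x):
                grad_terms[j] += error * xj
        w = [wj + 2 * lr * gj for wj, gj in zip(w, grad_terms)]
    return w


# Hypothetical inputs (x1, x2) and targets generated from t = 2 - x1 + 3 * x2.
raw = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (0.0, 2.0)]
T = [2.0 - x1 + 3.0 * x2 for x1, x2 in raw]
X = [add_bias(x) for x in raw]

w = fit_gd(X, T)
print("learned weights (w0, w1, w2):", w)
```

Because the targets were generated without noise, the learned weights should come out close to 2, -1 and 3.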
Increasing Complexity
A linear model may not be good enough, so the model can be made more flexible.
We can create a more complicated model by defining input variables that are combinations of components of $x$.
An $M$-th order polynomial function of a one-dimensional feature $x$:
$$y(x, w) = w_0 + \sum_{j=1}^{M} w_j x^j$$
where $x^j$ is the $j$-th power of $x$.
We can use the same approach to optimize for the weights $w$.
Which fit is the best?

Generalization
What is generalization?
Generalization is the model's ability to predict the held-out data.
What is happening?
Our model with $M = 9$ overfits the data (it also models the noise).
This is not a problem if we have lots of training examples.
Let's look at the estimated weights for various $M$ in the case of fewer examples.
The weights are becoming huge to compensate for the noise.
One way of dealing with this is to encourage the weights to be small (this way no input dimension will have too much influence on prediction). This is called regularization.
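The polynomial model discussed above stays linear in the weights, so the same gradient-descent fitting applies once each scalar input is expanded into its powers. The sketch below assumes a second-order polynomial and noise-free made-up data; `poly_features`, `predict`, and `fit_gd` are hypothetical names. It illustrates the feature expansion only and does not reproduce the overfitting experiment with order 9.

```python
def poly_features(x, M):
    """Map a scalar x to [1, x, x**2, ..., x**M]."""
    return [x ** j for j in range(M + 1)]


def predict(w, phi):
    """y = sum_j w_j * x**j, computed as a dot product with the features."""
    return sum(wj * pj for wj, pj in zip(w, phi))


def fit_gd(Phi, T, lr, epochs):
    """Batch gradient descent on the sum of squared errors."""
    w = [0.0] * len(Phi[0])
    for _ in range(epochs):
        sums = [0.0] * len(w)
        for phi, t in zip(Phi, T):
            error = t - predict(w, phi)
            for j, pj in enumerate(phi):
                sums[j] += error * pj
        w = [wj + 2 * lr * sj for wj, sj in zip(w, sums)]
    return w


# Hypothetical data generated from t = 1 + 2x - 3x^2.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ts = [1.0 + 2.0 * x - 3.0 * x ** 2 for x in xs]

M = 2
Phi = [poly_features(x, M) for x in xs]
w = fit_gd(Phi, ts, lr=0.05, epochs=2000)
print("learned weights (w0, w1, w2):", w)
```

Raising `M` only changes the feature map. With higher orders, plain gradient descent tends to converge slowly because the powers of x are strongly correlated.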
Regularization
Goal: Select the appropriate model complexity automatically
Standard approach: Regularization
$$\tilde{\ell}(w) = \sum_{n=1}^{N} \left[ t^{(n)} - y(x^{(n)}) \right]^2 + \alpha\, w^T w$$
The second term, $\alpha\, w^T w$, is the penalty term.
Intuition: Since we are minimizing the loss, the second term will encourage smaller values in $w$.
The penalty on the squared weights is known as ridge regression in statistics.
Leads to a modified update rule for gradient descent:
$$w \leftarrow w + 2\lambda \left[ \sum_{n=1}^{N} \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)} - \alpha w \right]$$
Choose $\alpha$ carefully.
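The effect of a penalty on the squared weights can be sketched with a small experiment. The assumptions are: a one-feature linear model y(x) = w0 + w1 * x, a loss equal to the sum of squared errors plus `alpha` times the sum of squared weights, batch gradient descent, and made-up data. The name `ridge_fit` is hypothetical. The bias w0 is penalized here to keep the sketch short; in practice it is often left out of the penalty.

```python
def ridge_fit(data, alpha, lr=0.02, epochs=2000):
    """Batch gradient descent on: sum of squared errors + alpha * (w0**2 + w1**2)."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        sum0 = sum(t - (w0 + w1 * x) for x, t in data)
        sum1 = sum((t - (w0 + w1 * x)) * x for x, t in data)
        # The penalty contributes an extra -alpha * w term to each update.
        w0, w1 = (w0 + 2 * lr * (sum0 - alpha * w0),
                  w1 + 2 * lr * (sum1 - alpha * w1))
    return w0, w1


# Hypothetical data generated from t = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

for alpha in (0.0, 1.0, 10.0):
    w0, w1 = ridge_fit(data, alpha)
    print(f"alpha={alpha:4.1f}  w0={w0:.4f}  w1={w1:.4f}")
```

Larger values of `alpha` pull both weights toward zero, at the cost of a worse fit to the training data, which is why the strength of the penalty has to be chosen with care.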
In Summary
Recap of today's worked example:
Features: Per Capita Crime Rate and Average Number of Rooms
Training Data Structure: sample data for Neighborhood #1, Neighborhood #2, and Neighborhood #3
Model: Multivariate Linear Regression
Loss: Sum of Squares Error
Optimization: Gradient Descent

Multivariate Linear Regression with Gradient Descent
Step #1: Initialize Weights
Step #2: Update Weights, first using Neighborhood #1, then Neighborhood #2

We have learned…
What loss functions are and how linear regression is done.
We have identified the necessary components for linear regression: Features, Training Data, Model, Loss, and Optimization.
We discussed concepts such as noise, generalization, and regularization.