38 Questions
What is the primary goal of gradient descent?
To minimize the cost function
What is the role of the learning rate α in gradient descent?
It controls how much the parameters are adjusted with respect to the gradient
What is the gradient in the context of gradient descent?
The vector of partial derivatives of the cost function with respect to each parameter
What is the purpose of taking the derivative of the cost function in gradient descent?
To move downward toward the pits or valleys in the graph
What is the name of the function that is minimized in machine learning?
Cost function
What is the name of the variables in the model that are adjusted to minimize the cost function?
Parameters
What is the correct way to update the parameters in gradient descent?
Update all parameters simultaneously
What is the name of the algorithm that is used to find the minimum of the cost function?
Gradient descent algorithm
What is the primary goal of Linear Regression?
To model the relationship between a dependent variable and one or more independent variables
What is the term for the simplest form of Linear Regression?
Simple Linear Regression
What does the term 'm' represent in Linear Regression?
The number of training examples
What happens to the value of θj when the slope is positive?
It decreases
What is the term for the 'input' variable in Linear Regression?
Feature variable
What is the hypothesis in Linear Regression?
A function that predicts the target variable
Why do we not need to decrease the learning rate α over time?
Because gradient descent will automatically take smaller steps as we approach a local minimum
What is the relationship between the dependent variable and independent variables in Linear Regression?
Linear
What is the intuition behind the convergence of gradient descent with a fixed step size α?
The derivative of the cost function approaches 0
How can we find the best learning rate?
By trying several values and plotting the learning curve
What happens to the cost function after each iteration if gradient descent is working optimally?
It decreases
When does gradient descent converge?
When the gradient descent fails to reduce the cost function
Why is it difficult to estimate the number of iterations required for gradient descent to converge?
Because the number of iterations varies considerably
What is hyperparameter tuning?
The process of trying several values of learning rate and plotting the learning curve
What is the condition for declaring convergence in an automatic convergence test?
The cost function decreases by less than a small threshold in one iteration
What type of gradient descent uses all the training examples in each step?
Batch Gradient Descent
What is the value of J(0,0) in the given example?
4.8
What is the update rule for θ0 in the given example?
θ0 = θ0 − (α/5) · Σ(hθ(x(i)) − y(i))
What is the value of θ1 after the first iteration in the given example?
1
What is the cost function J(0.28, 1) in the given example?
0.3952
What is the purpose of stochastic gradient descent?
To reduce computation time
What is the type of regression used when there are multiple features?
Multiple Linear Regression
What is the purpose of defining x0 as 1 in the notation for multivariate linear regression?
To represent the intercept term in the regression equation
What is the role of the variables x1, x2, x3, and x4 in the multivariate linear regression example?
They are the input features of the training examples
What is the relationship between the number of features and the number of inputs in the multivariate linear regression example?
The number of inputs is equal to the number of features plus one (the extra input x0 = 1 represents the intercept)
What is the purpose of the cost function in multivariate linear regression?
To minimize the difference between predicted and actual values
What is the benefit of using multiple features in the housing price prediction example?
It increases the accuracy of the predictions
What is the purpose of the training examples in the multivariate linear regression example?
To estimate the model parameters
What is the relationship between the input features and the output values in the multivariate linear regression example?
The input features have a linear relationship with the output values
What is the purpose of gradient descent in the context of multivariate linear regression?
To minimize the cost function
Study Notes
Linear Regression
- Linear regression is a supervised learning technique used to model the relationship between a dependent variable (target) and one or more independent variables (features).
- The goal is to predict the value of the dependent variable based on the values of the independent variables.
Simple Linear Regression
- Simple linear regression models the relationship between two variables (1 feature and the target) by fitting a linear equation to observed data.
- Notations:
  - m = number of training examples
  - x's = "input" variable / feature
  - y's = "output" variable / "target" variable
  - (x, y) = one training example
  - (x(i), y(i)) = the i-th training example
Hypothesis
- Parameters: θ's are the variables in the model that are adjusted to minimize the cost function.
- How to choose θ's? It is a crucial part of training models.
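For simple linear regression the hypothesis is hθ(x) = θ0 + θ1·x. A minimal Python sketch (the parameter values below are illustrative, not from the notes):

```python
def hypothesis(theta0, theta1, x):
    """Simple linear regression hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameters: intercept 1.0, slope 2.0
print(hypothesis(1.0, 2.0, 3.0))  # 7.0
```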
Gradient Descent
- Objective (Cost) Function: the function that you want to minimize, typically the loss function, which measures the difference between the model's predictions and the actual values.
- Parameters: the variables in the model that are adjusted to minimize the cost function.
- Gradient: the vector of partial derivatives of the cost function with respect to each parameter.
- Learning Rate α: a hyperparameter that controls how much the parameters are adjusted with respect to the gradient during each update.
- Gradient descent works by moving downward toward the pits or valleys in the graph to find the minimum value.
- It seeks to reach the minimum of the cost function and find the best-fit values for the parameters by adjusting the parameters in the direction of the steepest descent.
Gradient Descent Algorithm
- Simultaneous update: update θ0 and θ1 simultaneously.
- Correct update rule: θj = θj − α · (slope), where the slope is the partial derivative ∂J/∂θj.
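One simultaneous update step can be sketched in Python, assuming the standard squared-error cost J(θ0, θ1) = (1/2m)·Σ(hθ(x(i)) − y(i))² (the dataset and α below are illustrative):

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient descent update for simple linear regression."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    # Partial derivatives of J with respect to theta0 and theta1
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update: both new values are computed from the OLD thetas
    return theta0 - alpha * grad0, theta1 - alpha * grad1

t0, t1 = gradient_step(0.0, 0.0, [1, 2, 3], [1, 2, 3], 0.1)  # t0 = 0.2, t1 ≈ 0.467
```

Computing both gradients before either parameter is overwritten is exactly what "simultaneous update" means; updating θ0 first and then reusing it inside θ1's gradient would be the incorrect, sequential variant.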
Convergence of Gradient Descent
- Gradient descent can converge to a local minimum, even with the learning rate α fixed.
- As we approach a local minimum, gradient descent will automatically take smaller steps.
- No need to decrease α over time.
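This behaviour can be seen on a one-dimensional example: with α fixed, the step α·(slope) shrinks on its own because the derivative approaches 0 near the minimum. A minimal sketch (the cost J(t) = t² and the starting point are illustrative):

```python
def steps_shrink_demo():
    """Minimise J(t) = t**2 (derivative 2t) from t = 1 with a FIXED alpha.
    Returns the size of each step taken."""
    alpha, t = 0.1, 1.0
    step_sizes = []
    for _ in range(5):
        grad = 2 * t           # derivative of J at the current point
        step = alpha * grad    # step size with alpha held constant
        t -= step
        step_sizes.append(step)
    return step_sizes

sizes = steps_shrink_demo()  # each step is smaller than the last
```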
How to Find the Best Learning Rates
- There is no formula to find the right learning rate.
- Try several values of learning rate and for each value plot the number of iterations versus the cost function.
- This is called hyperparameter tuning.
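A sketch of this tuning procedure for a simple one-feature dataset: run gradient descent for a fixed number of iterations at each candidate α and record the cost curve. The candidate values 0.001, 0.01, 0.1 are common starting points, not values prescribed by the notes:

```python
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def run(alpha, xs, ys, iters=50):
    """Run gradient descent and return the cost after each iteration."""
    t0 = t1 = 0.0
    m = len(xs)
    curve = []
    for _ in range(iters):
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(errs) / m
        g1 = sum(e * x for e, x in zip(errs, xs)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
        curve.append(cost(t0, t1, xs, ys))
    return curve

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
# Try several learning rates; plotting each curve against the iteration
# number shows which alpha drives the cost down fastest without diverging.
curves = {a: run(a, xs, ys) for a in (0.001, 0.01, 0.1)}
```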
The Number of Iterations
- The cost function will decrease after each iteration if the gradient descent is working optimally.
- Gradient descent converges when it fails to reduce the cost function and stays at the same level.
- The number of iterations required for gradient descent to converge varies considerably.
Making Sure Gradient Descent is Working Correctly
- Example automatic convergence test: declare convergence if the cost function decreases by less than a certain value in one iteration.
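A minimal sketch of this automatic convergence test, assuming a simple one-feature dataset and an illustrative threshold eps:

```python
def gradient_descent_until_converged(xs, ys, alpha=0.1, eps=1e-6, max_iters=10000):
    """Run gradient descent, declaring convergence when the cost
    decreases by less than eps in one iteration."""
    m = len(xs)
    t0 = t1 = 0.0

    def cost(a, b):
        return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    prev = cost(t0, t1)
    for i in range(max_iters):
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        t0, t1 = (t0 - alpha * sum(errs) / m,
                  t1 - alpha * sum(e * x for e, x in zip(errs, xs)) / m)
        cur = cost(t0, t1)
        if prev - cur < eps:  # automatic convergence test
            return t0, t1, i + 1
        prev = cur
    return t0, t1, max_iters

t0, t1, iters = gradient_descent_until_converged([1, 2, 3], [2, 4, 6])
```

Because the number of iterations needed varies so much between problems, a stopping test like this is usually preferable to hard-coding an iteration count.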
Gradient Descent Types
- 1. Batch Gradient Descent: each step of gradient descent uses all m training examples.
- 2. Stochastic Gradient Descent (SGD): calculate the gradient using just a random small part of the observations instead of all of them.
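The two variants can be sketched side by side for simple linear regression. Here SGD updates on one random example at a time, the extreme case of "a small part of the observations" (the dataset and α are illustrative):

```python
import random

def batch_step(theta0, theta1, xs, ys, alpha):
    """Batch gradient descent: one step uses all m training examples."""
    m = len(xs)
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    return (theta0 - alpha * sum(errs) / m,
            theta1 - alpha * sum(e * x for e, x in zip(errs, xs)) / m)

def sgd_epoch(theta0, theta1, xs, ys, alpha, rng):
    """Stochastic gradient descent: each update uses a single random example."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    for i in order:
        err = theta0 + theta1 * xs[i] - ys[i]
        theta0, theta1 = theta0 - alpha * err, theta1 - alpha * err * xs[i]
    return theta0, theta1

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
t0, t1 = sgd_epoch(0.0, 0.0, xs, ys, alpha=0.05, rng=random.Random(0))
```

SGD saves computation because each update touches one example instead of all m, at the price of a noisier path toward the minimum.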
Linear Regression with Multiple Variables
- Multiple features (variables) example: housing price prediction.
- Cost function for multivariate linear regression: J(θ0, θ1, ..., θn) = (1/2m) · Σ(y(i) − (θ0 + θ1·x1(i) + ... + θn·xn(i)))².
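A sketch of this cost computation, with x0 = 1 prepended to each example as in the notation above (the data and θ values are illustrative):

```python
def predict(theta, x):
    """h(x) = theta0*x0 + theta1*x1 + ... + thetan*xn, with x0 = 1 prepended."""
    return sum(t * xi for t, xi in zip(theta, [1.0] + list(x)))

def cost(theta, X, ys):
    """J(theta) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(X)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(X, ys)) / (2 * m)

# Two features; theta = [intercept, theta1, theta2] (values illustrative)
X = [(1.0, 2.0), (2.0, 0.0)]
ys = [5.0, 3.0]
theta = [1.0, 2.0, 0.0]
print(cost(theta, X, ys))  # 2.0
```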
Gradient Descent for Multiple Variables
- Gradient descent algorithm for multiple variables: update each parameter θj simultaneously.
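A minimal sketch of the full loop for multiple variables, with all θj updated simultaneously (the synthetic dataset and hyperparameters are illustrative):

```python
def gradient_descent_multi(X, ys, alpha=0.1, iters=200):
    """Gradient descent for multivariate linear regression.
    Each row of X is one training example; x0 = 1 is prepended."""
    rows = [[1.0] + list(x) for x in X]
    m, n = len(rows), len(rows[0])
    theta = [0.0] * n
    for _ in range(iters):
        errs = [sum(t * xi for t, xi in zip(theta, row)) - y
                for row, y in zip(rows, ys)]
        # Simultaneous update: every gradient is computed from the CURRENT theta
        grads = [sum(e * row[j] for e, row in zip(errs, rows)) / m
                 for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

# Fit y = 1 + 2*x1 + 3*x2 on a few synthetic points
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [1.0, 3.0, 4.0, 6.0]
theta = gradient_descent_multi(X, ys, alpha=0.5, iters=500)
```

Since this synthetic data is exactly linear, θ converges to roughly [1, 2, 3], the parameters used to generate it.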