Linear Regression: Dependent & Independent Variables

Questions and Answers

In linear regression, which term describes the variable whose value is being predicted?

  • Independent variable
  • Covariate
  • Predictor variable
  • Response variable (correct)

What is the purpose of the “least squares” method in simple linear regression?

  • To maximize the sum of residuals.
  • To minimize the sum of squared residuals. (correct)
  • To maximize the number of predictors.
  • To minimize the absolute value of residuals.
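
As a quick illustration of the least-squares criterion, the sketch below (assuming NumPy; the paired data values are hypothetical) fits a line with `np.polyfit` and checks that perturbing the fitted slope can only increase the sum of squared residuals:

```python
import numpy as np

# Toy paired data (hypothetical values for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.polyfit with degree 1 computes the least-squares line y = slope*x + intercept.
slope, intercept = np.polyfit(x, y, 1)

def ssr(m, b):
    """Sum of squared residuals for the line y = m*x + b."""
    residuals = y - (m * x + b)
    return float(np.sum(residuals ** 2))

best = ssr(slope, intercept)
worse = ssr(slope + 0.5, intercept)  # any other line does no better
print(best < worse)  # True
```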

What is the role of coefficients (weights) in the linear equation produced by linear regression?

  • To represent the error term in the prediction.
  • To standardize the dependent variable.
  • To normalize the input data.
  • To quantify the strength and direction of the relationship between independent and dependent variables. (correct)

What makes linear regression's status as a long-established statistical procedure advantageous?

  • Its properties are well understood, and training can be done quickly. (correct)

In the context of linear regression, what is the design matrix?

  • A matrix of observations on predictor variables. (correct)

How does multiple linear regression differ from simple linear regression?

  • Multiple linear regression involves one dependent variable and multiple independent variables. (correct)

Which of the following equations represents a multiple linear regression model?

  • $y = \theta_0 + \theta_1 * x_1 + \theta_2 * x_2 + \dots + \theta_n * x_n$ (correct)

In polynomial linear regression, what transformation is applied to the independent variable?

  • Polynomial (correct)

Given a polynomial linear regression model $y = \theta_0 + \theta_1 * x + \theta_2 * x^2$, what does the term $\theta_2$ represent?

  • The coefficient of the squared term (correct)

When fitting a polynomial regression model, how does increasing the degree of the polynomial affect the model's fit to the data?

  • It can result in a more complex model that fits the training data better, but may overfit. (correct)
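
To see the effect of polynomial degree on training fit, here is a small sketch (NumPy assumed; the noisy sine data is made up for illustration) showing that higher-degree polynomials never fit the training data worse, which is exactly what opens the door to overfitting:

```python
import numpy as np

# Noisy samples of a sine curve (made-up data for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return float(np.mean((y - pred) ** 2))

# Training error never increases with degree; very high degrees chase noise.
errors = [train_mse(d) for d in (1, 3, 9)]
print(errors[0] >= errors[1] >= errors[2])  # True
```

The training error shrinking with degree says nothing about error on new data, which is why the higher-degree fits may overfit.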

What is the primary goal of using a cost function in linear regression?

  • To minimize the difference between predicted and actual output values. (correct)

In the context of cost functions for linear regression, what does Mean Squared Error (MSE) measure?

  • The average squared difference between the predicted and actual values. (correct)

How is Mean Absolute Error (MAE) different from Mean Squared Error (MSE) as a cost function?

  • MAE calculates the absolute differences, while MSE squares the differences. (correct)
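
The difference can be made concrete with a short sketch (NumPy assumed; the values are hypothetical):

```python
import numpy as np

# Hypothetical predictions against actual values.
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.0, 9.0])

errors = predicted - actual            # [-0.5, 0.0, 2.0]
mae = float(np.mean(np.abs(errors)))   # (0.5 + 0.0 + 2.0) / 3 ≈ 0.833
mse = float(np.mean(errors ** 2))      # (0.25 + 0.0 + 4.0) / 3 ≈ 1.417

# MSE punishes the single large error (2.0) much more heavily than MAE.
print(mae, mse)
```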

What is the purpose of Gradient Descent in the context of linear regression?

  • To minimize the cost function by iteratively updating the model parameters. (correct)

Stochastic Gradient Descent (SGD) differs from standard Gradient Descent (GD) mainly in:

  • SGD uses random samples for each iteration, while GD uses the entire dataset. (correct)

Which of the following is true about Mini-batch Gradient Descent?

  • It is a compromise between Stochastic Gradient Descent and Batch Gradient Descent. (correct)

What does the learning rate ($\alpha$) control in Gradient Descent?

  • The magnitude of the update to the parameters. (correct)

In Gradient Descent, what is the consequence of setting the learning rate ($\alpha$) too large?

  • The algorithm may fail to converge, or even diverge. (correct)

What is a common strategy for choosing an appropriate learning rate ($\alpha$) for Gradient Descent?

  • Incrementally test values such as 0.001, 0.01, 0.1, 1, etc. (correct)
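
That strategy can be sketched as a small sweep over candidate rates on a one-parameter toy model (assumptions: NumPy, made-up data where the true coefficient is 2):

```python
import numpy as np

# One-parameter toy model h(x) = theta * x; the data are hypothetical,
# generated so the true coefficient is 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def final_cost(alpha, steps=50):
    """Run gradient descent for `steps` iterations and return the final cost."""
    theta = 0.0
    for _ in range(steps):
        grad = np.mean((theta * x - y) * x)  # dJ/dtheta for J = (1/2m) * sum(...)
        theta = theta - alpha * grad
        if not np.isfinite(theta):
            return float("inf")
    return float(np.mean((theta * x - y) ** 2) / 2)

# Sweep candidate rates: very small rates converge slowly, and a too-large
# rate diverges (the cost blows up instead of shrinking).
for alpha in (0.001, 0.01, 0.1, 1.0):
    print(alpha, final_cost(alpha))
```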

In the context of Gradient Descent, what does it mean for J($\theta$) to decrease on every iteration for a sufficiently small learning rate?

  • The model is converging to an optimal solution. (correct)

Why is feature scaling important in linear regression with gradient descent?

  • It speeds up convergence by making the cost function easier to optimize. (correct)

What is the purpose of mean normalization in feature scaling?

  • To make features have approximately zero mean. (correct)

Which formula accurately reflects mean normalization?

  • $x_1 = \frac{size - 1000}{2000}$ (correct)
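
A minimal sketch of mean normalization, assuming NumPy and hypothetical house sizes (so the mean and range differ from the 1000 and 2000 in the formula above):

```python
import numpy as np

# Hypothetical house sizes in square feet.
sizes = np.array([800.0, 1000.0, 1200.0, 600.0, 2400.0])

# Mean normalization: subtract the mean, divide by the range (max - min).
mean = sizes.mean()
value_range = sizes.max() - sizes.min()
normalized = (sizes - mean) / value_range

print(normalized.mean())  # approximately 0: the feature now has zero mean
```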

What is a limitation of 'Batch' Gradient Descent?

  • Each step requires calculating gradients from all training examples, which can be inefficient. (correct)

If a linear regression model underfits the training data, what could be a potential solution?

  • Introduce polynomial features. (correct)

What does the hypothesis function $h_\theta(x)$ represent in linear regression?

  • The predicted value based on input features and parameters (correct)

In multiple linear regression, why is it important to consider interaction terms (e.g., $x_1 * x_2$) between independent variables?

  • To account for situations where the effect of one independent variable depends on the value of another (correct)

Suppose you are using gradient descent for linear regression and notice that the cost function, J($\theta$), increases over several iterations. What is a likely cause?

  • The learning rate, $\alpha$, is set too high. (correct)

You have a dataset with housing prices and features like size (in square feet) and the number of bedrooms and notice that these are on very different scales. What preprocessing step should you perform?

  • Perform feature scaling to ensure that all features have a similar range of values (correct)

In linear regression, what does a high value of the cost function typically indicate?

  • The model does not fit the data well. (correct)

What kind of problems can Linear Regression be applied to?

  • Various areas in business and academic study (correct)

In the Multiple Linear Regression formula $h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n$, why is $x_0$ set to 1?

  • For convenience of notation (correct)

What is supervised learning?

  • Given the "right answer" for each example in the data (correct)

Which process is best suited for a high number of examples?

  • Stochastic Gradient Descent (SGD) (correct)

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Why is it useful?

  • A combination of the other options (correct)

Which kind of model is relatively easier to work with?

  • Linear-regression models (correct)

The cost function is crucial in linear regression. Which quantity does it account for?

  • The difference between the predicted output of the model and the true output (correct)

In the simultaneous update:

$\text{temp}_0 := \theta_0 - \alpha \cdot \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\text{temp}_1 := \theta_1 - \alpha \cdot \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0 := \text{temp}_0$
$\theta_1 := \text{temp}_1$

what does the element $\alpha$ represent?

  • The learning rate (correct)
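
The simultaneous-update rule can be sketched in Python as follows (the toy data and the choice α = 0.1 are assumptions for illustration):

```python
import numpy as np

# Toy data for h_theta(x) = theta0 + theta1 * x (hypothetical values,
# generated with theta0 = 1 and theta1 = 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta0, theta1 = 0.0, 0.0
alpha = 0.1  # an assumed learning rate, small enough to converge here

for _ in range(2000):
    pred = theta0 + theta1 * x
    # Compute both updates from the *current* parameters first...
    temp0 = theta0 - alpha * np.mean(pred - y)
    temp1 = theta1 - alpha * np.mean((pred - y) * x)
    # ...then assign simultaneously, as the update rule requires.
    theta0, theta1 = temp0, temp1

print(round(theta0, 3), round(theta1, 3))  # ≈ 1.0 2.0
```

Computing both temporaries before assigning is the point of the rule: updating θ₀ first and then using the new θ₀ when computing θ₁'s gradient would not be a simultaneous update.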

When is linear regression useful?

  • Because it is a long-established statistical procedure whose properties are well understood, and its models can be trained very quickly. (correct)

What is the risk of a very large learning rate (alpha)?

  • It may fail to converge, or even diverge. (correct)

Flashcards

Linear Regression Model

Describes the relationship between a dependent variable and independent variables.

Dependent Variable

The variable being predicted in a linear regression model.

Independent Variables

Variables used to predict the dependent variable.

Covariates

Alternative name for independent variables, especially if continuous.

Predictor Variables

Another name for independent variables, emphasizing their role.

Design Matrix

Matrix of predictor variable observations.

Coefficients (Weights)

Values that determine the impact/slope of linear equation variables.

Linear Regression Goal

Fitting a line by minimizing differences.

Least Squares Method

Technique for finding the best-fit line by minimizing squared errors.

Mean Squared Error (MSE)

A metric that quantifies the average squared difference between predicted and actual values.

Mean Absolute Error (MAE)

A metric that represents the average of the absolute differences between predicted and actual values.

Gradient Descent (GD)

Algorithm to minimize the cost function.

Stochastic GD (SGD)

GD variant using random samples, good for large datasets.

Mini-batch GD

GD variant, using small random subsets for computing gradients.

Learning Rate

The step size that controls how much model weights are updated.

Feature Scaling (Normalization)

Transformation to bring numeric columns to a standard measurement.

Mean normalization

A feature scaling method that sets the mean to zero.

Simple Linear Regression

Linear regression with a single independent variable.

Multiple Linear Regression

Linear regression with multiple independent variables.

Polynomial Linear Regression

A form of regression where the relationship between independent and dependent variables is modeled as an nth degree polynomial.

Batch Gradient Descent

A gradient descent variant in which the entire training dataset is used to compute each single iteration (parameter update) of the algorithm.

Study Notes

Linear Regression Overview

  • Linear regression shows the relationship between a dependent variable (y) and one or more independent variables (X)
  • The dependent variable is also called the response variable
  • Independent variables are also called explanatory or predictor variables
  • Continuous predictor variables are also called covariates
  • Categorical predictor variables are also called factors/features/attributes
  • The matrix X is called the design matrix
  • The analysis estimates the coefficients (weights/synaptic weights, "theta", θ) of the linear equation
  • The equation involves one or more independent variables that best predict the value of the dependent variable
  • Linear regression fits a straight line or surface minimizing the discrepancies between predicted and actual output values
  • Simple linear regression uses a "least squares" method to find the best-fit line for the paired data

Why Linear Regression is Important

  • Linear regression models are simple and provide an easy-to-interpret mathematical formula for predictions
  • Linear regression can be applied to business and academic study, including biological, behavioral, environmental, social sciences and business
  • Linear-regression models have a long track record of reliably forecasting future values
  • Linear regression is a long-established statistical procedure, with well-understood models that can be trained quickly

Linear Regression Types

  • Simple Linear Regression (Univariate): y = θ₀ + θ₁ * x₁
  • Multiple Linear Regression: y = θ₀ + θ₁ * x₁ + θ₂ * x₂ + ... + θₙ * xₙ
  • Polynomial Linear Regression (Multivariate): y = θ₀ + θ₁ * x₁ + θ₂ * x₁² + ... + θₙ * x₁ⁿ
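
The three model types above can be written as plain hypothesis functions; the coefficients below are hypothetical, chosen only to show the shape of each equation:

```python
# Hypothetical coefficients [theta0, theta1, theta2], chosen for illustration.
theta = [1.0, 2.0, 0.5]

def simple(x1):
    # Simple: y = theta0 + theta1 * x1
    return theta[0] + theta[1] * x1

def multiple(x1, x2):
    # Multiple: y = theta0 + theta1 * x1 + theta2 * x2
    return theta[0] + theta[1] * x1 + theta[2] * x2

def polynomial(x1):
    # Polynomial: y = theta0 + theta1 * x1 + theta2 * x1^2
    return theta[0] + theta[1] * x1 + theta[2] * x1 ** 2

print(simple(2.0))         # 1 + 2*2          = 5.0
print(multiple(2.0, 4.0))  # 1 + 2*2 + 0.5*4  = 7.0
print(polynomial(2.0))     # 1 + 2*2 + 0.5*4  = 7.0
```

Note that the polynomial model is still linear in the parameters θ, which is why it can be fit with ordinary linear-regression machinery applied to the transformed features (x₁, x₁², ...).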

Real-World Impact

  • An example was given, where you're holding a bag filled with $86,400, and suddenly, someone snatches $10 from it
  • It was asked, would you drop everything, risk the rest, and sprint after them for that small amount?
  • Imagine designing a machine learning model for an autonomous humanoid robot to handle such a situation.
  • What threshold would you set for it to take action—burning energy and resources—to chase down the thief?
  • Should it activate for every loss, or only when the stakes are high enough?
  • How would you balance efficiency, decision-making, and resource management?

Project Proposal Structure

  • Vital for gaining evaluators’/readers’/reviewers’ trust while convincing them the work is important/worth the investment
  • Should show that you/your team can complete the proposed work
  • Use bold fonts and highlight paragraph titles/sections
  • The project proposal is central and should be written first
  • Can be an abbreviated version of the full project to guide implementation and writing
  • Arial font, size 11, all margins 0.5", single-spaced text of 45-55 lines

Introductory Paragraph

  • Introduces the research subject, capturing attention quickly
  • Describes the significant gap in knowledge relating to the critical need of specific stakeholders/vendors
  • Include the following in the introduction:
  • First Sentence/Hook: Briefly describe what the proposal is about, conveying importance/urgency
  • Explain WHAT the research topic is and WHY it is critical
  • What is Known: State current knowledge briefly (3-5 sentences) grounding reader in the research subject and providing necessary details
  • Gap in Knowledge: Clearly state info that is not known and that your research will address
  • The Critical Need: Present the knowledge (hypothesis-driven), or treatment that you propose to develop
  • Emphasize the significance of the problem you are addressing, and ensure your research proposes the next logical step

Second Paragraph

  • The goal is to introduce the solution to fill the identified knowledge gap, convincing evaluators that you have the solution and expertise to achieve it
  • Wording should be simple, relevant, and direct
  • Include your long-term goal and how long your task will take

Aims (Goals)

  • Briefly describe each aim to test your hypothesis; aims should be related but not interdependent, so the failure of one does not undermine the others
  • Describe the experimental approach and how each aim will help the bigger hypothesis, ideally with 2-4 aims described individually, for example:
    • Giving each aim an active title that clearly states the objective in relationship to the hypothesis,
    • With a summary of the experimental approach and anticipated outcomes
    • Including a smaller hypothesis and why the aim is valuable/testable and independent of the other points,
    • Using headers and/or bullets to delineate each aim specifically

Final Paragraph

  • Final paragraph is vital for the impact of your proposal, where you show the general information and global significance in relation to the fine details
  • Ending on fine details leaves the proposal like an hourglass balanced on its narrow end: unstable and unsupported
  • Include plain statements on innovation and outcomes, to highlight impact (new treatment/tool) for the people or subjects

Cost Function

  • Aim is to choose θ₀, θ₁ so that $h_\theta(x)$ is close to y for training examples (x, y)
  • J(θ₀, θ₁) or J(θ₁) need to be minimized

Simplified Cost Function

  • $h_\theta(x) = \theta_1 x$
  • Goal: choose the parameter θ₁ to minimize the cost
  • Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
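
A minimal sketch of this cost function, assuming NumPy and toy data chosen so that θ₁ = 1 is a perfect fit:

```python
import numpy as np

def cost(theta1, x, y):
    """J(theta1) = (1 / 2m) * sum((h(x_i) - y_i)^2) with h(x) = theta1 * x."""
    m = x.size
    return float(np.sum((theta1 * x - y) ** 2) / (2 * m))

# Toy data perfectly fit by theta1 = 1 (hypothetical values).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

print(cost(1.0, x, y))  # 0.0 for the perfect fit
print(cost(0.0, x, y))  # (1 + 4 + 9) / 6 ≈ 2.333
```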
