Questions and Answers
Inflexible models are often more interpretable than flexible models.
True (A)
What is the most commonly used measure for the quality of fit?
mean squared error
The ideal learning method achieves low _____ and low bias.
variance
What is the primary purpose of cross-validation?
What does the Bayes error rate represent?
Match the following regression types with their characteristics:
The expected test error rate is minimized by the _____ classifier.
The R-squared statistic lies between 0 and 1.
Define collinearity.
What does the Akaike Information Criterion (AIC) assess?
What is the main advantage of decision trees?
The _____ method combines predictions from multiple weak learners to create a strong prediction.
What does PCA stand for?
Which of the following is a component of the K-means clustering algorithm?
Flashcards
Prediction accuracy vs. model interpretability
Balance between a model's ability to accurately predict and how easily its logic can be understood.
Mean Squared Error (MSE)
Commonly used metric to assess how well a statistical model describes the actual data. Calculated by averaging the squared differences between predicted and actual values.
Training MSE
The MSE calculated from the data used to train a model. Minimizing this can lead to overfitting.
Test MSE
Cross-validation
U-shape of test MSE
Expected Test MSE Decomposition
Irreducible Error
Bias
Variance
Test Error Rate
Bayes Classifier
Bayes Decision Boundary
Bayes Error Rate
K-Nearest Neighbors (KNN) Classifier
Choice of K in KNN
Simple Linear Regression
Residual Sum of Squares (RSS)
Residual Standard Error (RSE)
Wald Confidence Intervals
Hypothesis Testing in Regression
R² Statistic
Multiple Linear Regression
Ordinary Least Squares (OLS) Estimator
Hypothesis Test for Multiple Predictors
Variable Selection
Forward Selection
Backward Selection
Mixed Selection
Interpreting R² in Multiple Regression
Multiple Regression RSE
Qualitative Predictors (Factors)
Interaction Term
Residual Plots
Heteroscedasticity
Outlier
Studentized Residuals
Study Notes
- Prediction accuracy and model interpretability often involve a trade-off
- Inflexible models are usually more interpretable than flexible ones
- Mean squared error (MSE) measures the quality of fit
MSE Formula
- MSE = (1/n) * Σ(yi - f(xi))^2, where the sum is from i=1 to n
- f represents the fitted model
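The MSE formula above can be sketched in a few lines of Python. The function name `mse` and the sample values are hypothetical and chosen only for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


# Hypothetical observations and fitted values, for illustration only.
y = [3.0, 5.0, 7.0]
y_hat = [2.0, 5.0, 9.0]
example_mse = mse(y, y_hat)  # (1 + 0 + 4) / 3
```

Evaluated on the data used for fitting, this gives the training MSE; evaluated on held-out data, it gives an estimate of the test MSE.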
Training vs. Test MSE
- Training MSE is minimized to find f
- Test MSE is minimized by the ideal f
- Cross-validation estimates test MSE
- Test MSE typically follows a U-shape, decreasing with flexibility before increasing again
Expected Test MSE Decomposition
- E(y0 - f(x0))^2 = var(f(x0)) + [Bias(f(x0))]^2 + var(ϵ)
- Ideal learning methods balance low variance and low bias
- Irreducible error (var(ϵ)) serves as a lower bound for expected test MSE
- Bias arises from simplifying complex relationships with simple models
- Greater model flexibility generally reduces bias, but increases variance
Variance
- Variance measures the change in f with different training datasets
- More flexible statistical methods tend to have higher variance
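The bias and variance terms can be estimated by simulation: refit a method on many training sets and look at how its prediction at a fixed point x0 behaves. The sketch below is an illustration under assumptions that are not in the notes: a hypothetical true function f(x) = 2x, Gaussian noise, an inflexible method that always predicts the training mean, and a flexible method that predicts the response of the single nearest training point.

```python
import random
import statistics


def true_f(x):
    # Hypothetical true regression function, assumed for this sketch.
    return 2.0 * x


def simulate_predictions(x0=0.9, n_train=20, n_reps=500, sigma=1.0, seed=0):
    """Refit both methods on n_reps training sets; collect predictions at x0."""
    rng = random.Random(seed)
    mean_preds, nn_preds = [], []
    for _ in range(n_reps):
        xs = [rng.random() for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
        # Inflexible method: ignore x and predict the training mean.
        mean_preds.append(statistics.fmean(ys))
        # Flexible method: predict the response of the nearest training point.
        nearest = min(range(n_train), key=lambda i: abs(xs[i] - x0))
        nn_preds.append(ys[nearest])
    return mean_preds, nn_preds


def bias_and_variance(preds, target):
    """Estimated bias and variance of the predictions at x0."""
    return statistics.fmean(preds) - target, statistics.pvariance(preds)


mean_preds, nn_preds = simulate_predictions()
bias_mean, var_mean = bias_and_variance(mean_preds, true_f(0.9))
bias_nn, var_nn = bias_and_variance(nn_preds, true_f(0.9))
```

In this setup the inflexible method should show the larger bias and the flexible method the larger variance, which is the trade-off described above.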
Classification Setting
- Training MSE is replaced by training error rate: (1/n) * Σ1{yi ≠ ŷi}, where the sum is from i=1 to n
Ideal Classifier
- Minimizes test error rate
- Achieved by the Bayes classifier, which assigns observations to the class j with the largest P(Y = j | X = x0)
- Example: In a two-class scenario, assigns to class 1 if P(Y = 1 | X = x0) > 0.5
- In the two-class case, the Bayes decision boundary is where the conditional probability equals 0.5
- Bayes error rate is the lowest possible test error rate
Bayes Error Rate
- 1 - E(max P(Y = j | X)), where the max is over j
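When the conditional probabilities are known, the Bayes classifier and the Bayes error rate can be computed directly. The sketch below assumes a hypothetical two-class problem in which X takes only three values; the probabilities are invented for illustration, since in practice P(Y = j | X) is unknown.

```python
# Hypothetical discrete X: value -> (P(X = x), P(Y = 1 | X = x)).
dist = {"a": (0.5, 0.9), "b": (0.3, 0.4), "c": (0.2, 0.5)}


def bayes_classify(p1):
    """Two-class Bayes classifier: class 1 if P(Y = 1 | X = x0) > 0.5, else class 0."""
    return 1 if p1 > 0.5 else 0


def bayes_error_rate(dist):
    """1 - E[max_j P(Y = j | X)], with the expectation taken over X."""
    expected_max = sum(px * max(p1, 1.0 - p1) for px, p1 in dist.values())
    return 1.0 - expected_max


labels = {x: bayes_classify(p1) for x, (_, p1) in dist.items()}
error = bayes_error_rate(dist)  # 1 - (0.45 + 0.18 + 0.10)
```

The value "c" sits exactly on the decision boundary; this sketch assigns it to class 0, but either choice gives the same error rate.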
KNN Classifier
- K-nearest neighbors classifier (KNN) estimates the conditional distribution for classification
- P(Y = j | X = x0) = (1/K) * Σ1{yi = j}, where the sum is over i ∈ N0
- N0 represents K points closest to x0 in the training data
- KNN classifies x0 based on the largest estimated conditional probability
- Higher K values lead to smoother classifiers with higher bias and lower variance; lower K values give more flexible classifiers with lower bias and higher variance
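A minimal KNN classifier following the steps above might look like this. Euclidean distance and the toy training data are assumptions made for the sketch; ties between classes are broken in favor of the class encountered first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x0, k):
    """Return the predicted class at x0 and the estimated class probabilities."""
    # Indices of training points sorted by Euclidean distance to x0.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x0))
    neighbors = order[:k]  # the set N0
    # Estimated P(Y = j | X = x0): fraction of the K neighbors in class j.
    counts = Counter(train_y[i] for i in neighbors)
    probs = {label: c / k for label, c in counts.items()}
    return max(probs, key=probs.get), probs


# Hypothetical training data: two well-separated groups.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]

label_near_a, probs_near_a = knn_predict(train_X, train_y, (1, 1), k=3)
label_k5, probs_k5 = knn_predict(train_X, train_y, (1, 1), k=5)
label_near_b, _ = knn_predict(train_X, train_y, (4, 4), k=3)
```

With k=5 the neighborhood of (1, 1) has to reach into the other group, so the estimated probability for class A drops from 1.0 to 0.6.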
Simple Linear Regression
- Simple linear regression model: Y = β0 + β1X + ϵ
- βi estimates are found by minimizing residual sum of squares (RSS)
RSS Formula
- RSS = Σe_i^2 = Σ(yi - ŷi)^2, where the sum is from i=1 to n
Minimizers
- β̂1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2, where the sum is from i=1 to n
- β̂0 = ȳ - β̂1x̄
- ȳ = (1/n) * Σyi where the sum is from i=1 to n
- x̄ = (1/n) * Σxi where the sum is from i=1 to n
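The closed-form minimizers above translate directly into code. The function name `simple_ols` and the data are hypothetical; the data lie exactly on the line y = 1 + 2x so the result is easy to check by hand.

```python
def simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model Y = b0 + b1 * X + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data generated from y = 1 + 2x with no noise.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = simple_ols(x, y)
```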
se(β̂0)^2 Formula
- se(β̂0)^2 = σ^2 * [(1/n) + (x̄^2 / Σ(xi - x̄)^2)], where the sum is from i=1 to n and σ^2 = var(ϵ)
se(β̂1)^2 Formula
- se(β̂1)^2 = σ^2 / Σ(xi - x̄)^2, where the sum is from i=1 to n and σ^2 = var(ϵ)
- Formulas assume uncorrelated errors with equal variance σ^2
- se(β̂1) is smaller when the xi values are more spread out
σ^2 Estimate
- σ is typically unknown, but it can be estimated by the residual standard error (RSE), so σ^2 is estimated by RSE^2
RSE Formula
- RSE = sqrt(RSS / (n-2))
Wald Confidence Intervals
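- Built from the standard errors; an approximate 95% interval for β1 is β̂1 ± 2 * se(β̂1)
Hypothesis Tests
- Standard errors are also used to test hypotheses on the coefficients
Null Hypothesis Test
- H0: β1 = 0, H1: β1 ≠ 0
- Assesses whether there is a relationship between X and Y
t-statistic Formula
- t = β̂1 / se(β̂1)
- Under H0, t follows a t distribution with n - 2 degrees of freedom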
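The standard errors, the RSE, and the t-statistic can be computed together once the line has been fitted. The sketch below plugs the RSE in for the unknown σ and uses the common approximation β̂1 ± 2 * se(β̂1) for a 95% interval; the function name and data are hypothetical.

```python
import math


def simple_regression_inference(x, y):
    """Fit Y = b0 + b1 * X and return estimates, standard errors, and the t-statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rse = math.sqrt(rss / (n - 2))  # estimate of sigma
    se_b0 = rse * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    se_b1 = rse / math.sqrt(sxx)
    return {
        "b0": b0, "b1": b1, "rss": rss, "rse": rse,
        "se_b0": se_b0, "se_b1": se_b1,
        "t": b1 / se_b1,                              # tests H0: beta1 = 0
        "ci_b1": (b1 - 2 * se_b1, b1 + 2 * se_b1),    # approximate 95% interval
    }


# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
result = simple_regression_inference(x, y)
```

A large absolute t-statistic, compared against the t distribution with n - 2 degrees of freedom, is evidence against H0: β1 = 0.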
Model Fit Quantification
- RSE and R^2 statistic
R^2 Formula
- R^2 = (TSS - RSS) / TSS = 1 - (RSS / TSS)
- States the proportion of variance explained
- TSS = Σ(yi - ȳ)^2, the total sum of squares, from i=1 to n
- R^2 always lies between 0 and 1
- R^2 = r^2 in simple linear regression, where r is sample correlation between X and Y
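The identity between R^2 and the squared sample correlation in simple linear regression can be checked numerically. The function name and data below are hypothetical.

```python
def r_squared_and_correlation(x, y):
    """Return (R^2 of the least-squares line, sample correlation r between x and y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - rss / tss
    r = sxy / (sxx * tss) ** 0.5
    return r_squared, r


# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
r_squared, r = r_squared_and_correlation(x, y)
```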
Multiple Linear Regression Overview
- Model form: Y = β0 + Σ(βk * Xk) + ϵ, where sum is from k=1 to p
- Xj represents the jth predictor, and βj quantifies the association
- Each βj is the average effect on Y of a unit increase in Xj, holding other predictors constant
Prediction Formula
- Given estimators β̂0, β̂1,..., β̂p, the predictions take the form: ŷ = β̂0 + Σ(β̂k * xk), where sum is from k=1 to p
Multiple Linear Regression Minimization
- Minimizes the sum of squared residuals
- RSS = Σ(yi - ŷi)^2 = Σ(yi - β̂0 - Σ(β̂k * xik))^2
OLS Estimator Formula
- Given the full rank design matrix X, the OLS estimator is given by the equation: β̂_OLS = (X^T X)^(-1) X^T Y
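The OLS estimator can be sketched with the standard library alone. Rather than forming the inverse explicitly, this version solves the normal equations (X^T X) β = X^T Y by Gaussian elimination, which gives the same estimator when X has full rank. The helper names and the noise-free data are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: the design matrix is not full rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """OLS coefficients [b0, b1, ..., bp]; X holds one row of predictors per observation."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 with no noise.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
beta = ols(X, y)
```

In practice a numerical library would normally be used for this step; the hand-written solver only keeps the sketch dependency-free.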
Hypothesis Test Overview
- Answers if there is a relationship between response and predictors
- Null hypothesis: H0: β1 = β2 = · · · = βp = 0
Alternative Hypothesis
- At least one βk for k > 0 is not zero
F-statistic Formula
- F = [(TSS - RSS) / p] / [RSS / (n - p - 1)]
- TSS = Σ(yi - ȳ)^2, from i=1 to n
Model Assumptions
- If model assumptions are correct, then E[RSS/(n - p - 1)] = σ^2
- If H0 holds, then E[(TSS - RSS)/p] = σ^2
F-Statistic Relationship With Response
- If there is no relationship between the response and predictors, we expect the F-statistic to be close to 1
- If the alternative hypothesis holds, then E[(TSS - RSS)/p] > σ^2, so we expect F to be greater than 1
Hypothesis Test on Subset of Coefficients
- Sometimes we instead want to test whether a particular subset of q of the coefficients is zero, that is, H0: βp−q+1 = βp−q+2 = · · · = βp = 0
- Let RSS0 denote the residual sum of squares of the model that uses all variables except the last q; the test is then based on the following F-statistic
F-Statistic Formula (Subset)
- F = [(RSS0 − RSS) / q] / [RSS / (n − p − 1)]
- This F-statistic reports the partial effect of adding these q variables. In all of these cases, when comparing nested models with p1 < p2 predictors, we compare F with the F distribution with (p2 − p1, n − p2 − 1) degrees of freedom
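Both F-statistics are simple ratios once the sums of squares are available. In the sketch below the overall test is written as the special case of the subset test in which all p predictors are dropped, so that RSS0 equals TSS. The function names and the numeric inputs are hypothetical.

```python
def partial_f_statistic(rss0, rss, n, p, q):
    """F-statistic for H0: the last q of the p coefficients are zero.

    rss0 is the RSS of the smaller model (without the q variables),
    rss is the RSS of the full model with p predictors.
    """
    return ((rss0 - rss) / q) / (rss / (n - p - 1))


def overall_f_statistic(tss, rss, n, p):
    """F-statistic for H0: all p coefficients are zero (intercept-only model has RSS = TSS)."""
    return partial_f_statistic(tss, rss, n, p, q=p)


# Hypothetical sums of squares, for illustration only.
f_overall = overall_f_statistic(tss=100.0, rss=40.0, n=23, p=2)          # (60 / 2) / (40 / 20)
f_partial = partial_f_statistic(rss0=50.0, rss=40.0, n=23, p=2, q=1)     # (10 / 1) / (40 / 20)
```

A value far above 1 would then be compared with the F distribution with (q, n - p - 1) degrees of freedom.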
Variable Selection
- Determining which predictors are associated with the response, in order to fit a single model involving only those predictors
- Various statistics can be used to judge the quality of a model, including Mallows's Cp, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and adjusted R^2
Efficiency Approach
- Since there are a total of 2^p models that contain subsets of p variables, it is usually infeasible to try every subset, so more efficient approaches must be taken
Three Classical Approaches
- FORWARD SELECTION: Begin with the null model (intercept only), then fit p simple linear regressions and add to the null model the variable that results in the lowest RSS. We then add to that model the variable that results in the lowest RSS for the new two-variable model, and continue until some stopping rule is satisfied
- BACKWARD SELECTION: We start with all variables in the model and remove the variable with the largest p-value. The new (p − 1)-variable model is fit, the variable with the largest p-value is removed, and we continue this procedure until a stopping rule is reached
- MIXED SELECTION: This is a combination of forward and backward selection. Start with no variables, then add the variables that provide the best fit one by one. If at any point the p-value for one of the variables in the model rises above a threshold, we remove that variable from the model. We continue to perform forward and backward steps until all variables in the model have a sufficiently low p-value.
Forward Selection
- Forward selection is a greedy approach and can include variables that later become redundant
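Forward selection as described above can be sketched as a greedy loop that, at each step, adds the predictor giving the lowest RSS. The stopping rule here is a fixed maximum number of variables (`max_vars`), which is an assumption of this sketch; the notes leave the stopping rule open. The helper names and data are hypothetical.

```python
def fit_rss(columns, y):
    """RSS of a least-squares fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    M = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(predictors, y, max_vars):
    """Greedily add the predictor with the lowest RSS until max_vars are selected."""
    selected, remaining = [], list(predictors)
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda name: fit_rss(
            [predictors[s] for s in selected] + [predictors[name]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: y depends strongly on x1, weakly on x2, and not on x3.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
}
y = [4.1, 6.4, 11.05, 13.45, 18.0, 20.5]
selected = forward_selection(predictors, y, max_vars=2)
```

Because the loop never revisits earlier choices, a variable added early stays in the model even if later additions make it redundant.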
Multiple Linear Regression
- Backward selection cannot be used if p > n, whereas forward selection can always be used
- In multiple linear regression, R^2 = cor(Y, Ŷ)^2. An R^2 value close to 1 indicates that the model explains a large portion of the variance in the response variable
R^2 in Machine Learning
- The R^2 statistic always increases (or at least never decreases) when more variables are added to the model, even if those variables are only weakly associated with the response, because adding a variable can only lower the residual sum of squares on the training data
Residual Standard Error Formula
- For multiple linear regression, the RSE is defined as RSE = sqrt(RSS / (n - p - 1))
- Models with more variables can have higher RSE if the decrease in RSS is small relative to the increase in p
Trade Off in Machine Learning
- The inaccuracy in the coefficient estimates is related to the reducible error, and one can compute a confidence interval to quantify it
Error in Machine Learning
- Even if the true values for β0, β1, · · · , βp are known, the response cannot be predicted perfectly because of the random error ϵ
- The error due to ϵ is called the irreducible error. Prediction intervals are always wider than confidence intervals since they incorporate both the error in the estimate of the population regression function and the randomness of the individual point
Predictors in Machine Learning
- Sometimes qualitative predictors known as factors are used
- If a factor has two levels, we can create a dummy variable or an indicator
- When a qualitative predictor has more than two levels, we can create additional dummy variables
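Dummy coding can be sketched as follows: one level serves as the baseline and every other level gets its own 0/1 indicator column. Choosing the alphabetically first level as the default baseline is an assumption of this sketch, and the function name and data are hypothetical.

```python
def dummy_encode(values, baseline=None):
    """Return {level: 0/1 indicator column} for every level except the baseline."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumed default: alphabetically first level
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


# A hypothetical three-level factor needs two dummy variables.
region = ["east", "west", "south", "east"]
dummies = dummy_encode(region)

# A two-level factor needs a single dummy variable.
owner = ["yes", "no", "no"]
owner_dummy = dummy_encode(owner, baseline="no")
```

Observations at the baseline level have zeros in every dummy column, so their mean response is carried by the intercept.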
Assumptions of Linear Regression
- The standard linear regression model has several highly restrictive assumptions that are often violated
- It assumes that the relationship between the predictors and response is additive and linear
- One way of relaxing the additive assumption is by introducing an interaction term
- The predictor X1 X2 is said to be an interaction term constructed by computing the product of X1 and X2
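With an interaction term the model becomes Y = β0 + β1X1 + β2X2 + β3X1X2 + ϵ, so the effect of a unit increase in X1 is β1 + β3X2 and depends on the value of X2. The sketch below illustrates this with hypothetical coefficient values.

```python
def predict_with_interaction(b0, b1, b2, b3, x1, x2):
    """Prediction from a model with main effects and the interaction term x1 * x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * (x1 * x2)


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 0.5


def effect_of_x1(x2):
    """Change in the prediction when x1 increases from 0 to 1 at a fixed x2."""
    return (predict_with_interaction(b0, b1, b2, b3, 1.0, x2)
            - predict_with_interaction(b0, b1, b2, b3, 0.0, x2))


effect_at_0 = effect_of_x1(0.0)  # b1 + b3 * 0
effect_at_4 = effect_of_x1(4.0)  # b1 + b3 * 4
```

Without the interaction term (b3 = 0) the two effects would be identical, which is what the additive assumption requires.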
List Of Common Problems With Machine Learning
- Non-linearity of the response-predictor relationships
- Correlation of error terms
- Non-constant variance of error terms
- Outliers
- High-leverage points
- Collinearity
- If the true relationship is far from linear, then all conclusions drawn from the fit are suspect