Generalized Linear Models (GLMs)

Questions and Answers

Explain two ways a GLM generalizes a linear model.

Distribution of the target variable: The target variable in a GLM can follow any member of the exponential family of distributions, which includes both continuous and discrete distributions such as the Binomial, Poisson, Normal, Gamma, and inverse Gaussian. Relationship between the target mean and linear predictor: A GLM uses a "link function" g to relate the target mean µ to the linear combination of predictors, g(µ) = η.
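
As a minimal sketch of both generalizations, using statsmodels on synthetic data (the coefficients and sample size below are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))  # intercept + two predictors
mu = np.exp(X @ np.array([0.5, 0.3, -0.2]))     # log link: mean = exp(linear predictor)
y = rng.poisson(mu)                             # exponential-family (Poisson) target

# Generalization 1: the target follows a Poisson distribution.
# Generalization 2: the log link (the Poisson default) relates the mean
# to the linear combination of predictors.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # estimates close to [0.5, 0.3, -0.2]
```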

Explain why a linear model can be regarded as a special case of a GLM.

In the special case of a GLM where the target variable is normally distributed and the link function is the identity function g(µ) = µ, we recover the linear model. In this sense, a GLM is a "generalized" version of a linear model.
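
A quick check of this equivalence (a sketch with made-up data) is to fit the same design with OLS and with a Gaussian-family GLM, whose default link is the identity, and compare the estimates:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default
print(np.allclose(ols.params, glm.params))  # True: the two fits coincide
```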

Describe the characteristics of a Tweedie distribution.

It is a Poisson sum of Gamma random variables. It has a discrete probability mass at zero and a probability density function on the positive real line.
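
Both characteristics can be seen by simulating the Poisson-sum-of-Gammas construction directly (the parameters below are arbitrary). statsmodels exposes the Tweedie family through a variance-power parameter, where values strictly between 1 and 2 correspond to this compound Poisson-Gamma case:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
counts = rng.poisson(1.0, size=n)  # Poisson number of Gamma summands
y = np.array([rng.gamma(2.0, 1.0, size=k).sum() for k in counts])
print((y == 0).mean())  # visible probability mass at exactly zero

X = sm.add_constant(rng.normal(size=n))
fit = sm.GLM(y, X, family=sm.families.Tweedie(var_power=1.5)).fit()
print(fit.params)
```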

Explain whether or not the log link can be used when some of the observations of the target variable are zero.

Zero values among the target observations do NOT necessarily invalidate the use of the log link, because the link is applied to the target mean, not to the individual observations. What matters is the target distribution: the Gamma and inverse Gaussian distributions do not permit zero values, so the zeros would need to be adjusted, but the Poisson distribution allows zeros, so a log-link Poisson GLM can be used with no issues.
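
A small illustration with a simulated Poisson target: the fit succeeds despite zeros in the data because the log link acts on the fitted mean, which is always positive:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=300))
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1]))
print((y == 0).sum())  # plenty of zero observations

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # log link by default
# The link applies to the mean: every fitted mean is exp(eta) > 0,
# even though individual observations can be zero.
print(fit.fittedvalues.min() > 0)  # True
```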

Explain two differences between weights and offsets when applied to GLM.

The form of the target variable: For weights, the target variable is averaged by exposure; for offsets, the observations are values aggregated over the exposure units. How exposure affects the target variable: Due to the averaging, the variance of each observation is inversely related to the size of its exposure, which serves as the weight for that observation; the weights do not appear in the model equation and do not directly affect the target mean. The exposure, when serving as an offset, is in direct proportion to the target mean and appears in the model equation, but otherwise leaves the variance unaffected.
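
The distinction can be sketched in statsmodels (synthetic data): exposure enters either through the offset argument, on the log scale, when modeling aggregated counts, or through var_weights when modeling the averaged target. The two fits should produce essentially the same coefficient estimates:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
exposure = rng.uniform(0.5, 5.0, size=n)  # e.g. policy-years per observation
X = sm.add_constant(rng.normal(size=n))
counts = rng.poisson(exposure * np.exp(0.1 + 0.4 * X[:, 1]))

# Offset: aggregated counts; log(exposure) enters the model equation.
offset_fit = sm.GLM(counts, X, family=sm.families.Poisson(),
                    offset=np.log(exposure)).fit()

# Weights: averaged target (counts per exposure unit); exposure acts as a
# variance weight and does not appear in the model equation.
rate = counts / exposure
weight_fit = sm.GLM(rate, X, family=sm.families.Poisson(),
                    var_weights=exposure).fit()

print(offset_fit.params)
print(weight_fit.params)  # essentially identical estimates
```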

State the statistical method typically used to estimate the parameters of GLM.

Maximum Likelihood Estimation (MLE) is used: the parameter estimates are chosen so as to maximize the likelihood of observing the given data, which is typically achieved by running an optimization algorithm.
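
As an illustrative sketch, the same estimates can be recovered by handing the negative log-likelihood to a generic optimizer; the data and coefficients are synthetic:

```python
import numpy as np
import statsmodels.api as sm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(300, 1)))
y = rng.poisson(np.exp(X @ np.array([0.3, 0.7])))

def neg_log_lik(beta):
    eta = X @ beta
    # Poisson log-likelihood, dropping the log(y!) term (constant in beta)
    return -(y * eta - np.exp(eta)).sum()

mle = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(mle.x)
print(glm.params)  # essentially the same estimates
```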

Explain the problem with deviance as a model selection criterion.

The deviance of a GLM parallels the RSS of a linear model in the sense that it is merely a goodness-of-fit measure on the training set and always decreases when new predictors are added, so by itself it cannot guard against overfitting.
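
A sketch of the problem: adding even pure-noise predictors drives the training deviance down (synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x_real = rng.normal(size=n)
y = rng.poisson(np.exp(0.2 + 0.5 * x_real))

X = sm.add_constant(x_real)
for _ in range(3):
    print(sm.GLM(y, X, family=sm.families.Poisson()).fit().deviance)
    X = np.column_stack([X, rng.normal(size=n)])  # append a pure-noise predictor
```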

Explain the limitations of the likelihood ratio test as a model selection method.

It can only be used to compare one pair of GLMs at a time, and the simpler GLM must be a special case of, or nested within, the more complex GLM.
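
A sketch of the test for two nested Poisson GLMs (synthetic data), using the standard chi-square construction for the likelihood ratio statistic:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(7)
n = 300
X_full = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.poisson(np.exp(0.1 + 0.6 * X_full[:, 1]))  # second predictor is irrelevant

reduced = sm.GLM(y, X_full[:, :2], family=sm.families.Poisson()).fit()  # nested
full = sm.GLM(y, X_full, family=sm.families.Poisson()).fit()

lr_stat = 2 * (full.llf - reduced.llf)
df = full.df_model - reduced.df_model
print(chi2.sf(lr_stat, df))  # large p-value: the extra predictor adds little
```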

Explain how regularization for GLM works.

For GLMs, the regularized model results from minimizing a penalized objective function in which the deviance takes the place of the RSS.
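
A sketch using statsmodels' elastic-net routine for GLMs, which penalizes a deviance-based objective; the alpha and L1_wt values here are arbitrary:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = sm.add_constant(rng.normal(size=(200, 5)))
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1]))  # only one informative predictor

model = sm.GLM(y, X, family=sm.families.Poisson())
lasso_fit = model.fit_regularized(alpha=0.1, L1_wt=1.0)  # L1_wt=1 -> pure lasso
print(lasso_fit.params)  # noise coefficients shrunk toward (or exactly to) zero
```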

Explain the importance of setting a cutoff for a binary classifier.

A binary classifier merely produces a predicted probability that the event of interest occurs. To translate these probabilities into predicted classes, i.e., whether the event is predicted to happen or not, we need a pre-specified cutoff.
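
A minimal illustration with made-up probabilities:

```python
import numpy as np

probs = np.array([0.10, 0.35, 0.62, 0.80])  # hypothetical predicted probabilities
cutoff = 0.5
predicted_class = (probs > cutoff).astype(int)  # probabilities -> 0/1 classes
print(predicted_class)  # [0 0 1 1]
```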

Explain the relationship between accuracy, sensitivity, and specificity.

Accuracy is the weighted average of sensitivity and specificity, where the weights are the proportions of observations belonging to the positive and negative classes, respectively.
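
A small numeric check of this identity on an invented set of predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # 30% positive, 70% negative
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)        # 2/3
specificity = tn / (tn + fp)        # 5/7
accuracy = (tp + tn) / len(y_true)  # 7/10
print(np.isclose(accuracy, 0.3 * sensitivity + 0.7 * specificity))  # True
```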

Explain how the cutoff of a binary classifier affects the sensitivity and specificity.

In general, the selection of the cutoff involves a trade-off between having high sensitivity and having high specificity. Extreme case 1: if the cutoff is 0, all probabilities exceed the cutoff and everyone is predicted to be positive, so sensitivity is 1 and specificity is 0. Extreme case 2: if the cutoff is 1, all probabilities fall below the cutoff and everyone is predicted to be negative, so sensitivity is 0 and specificity is 1.
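
A sketch sweeping the cutoff over simulated probabilities, showing the two extremes and the trade-off in between:

```python
import numpy as np

rng = np.random.default_rng(9)
probs = rng.uniform(size=1000)                    # simulated predicted probabilities
y = (rng.uniform(size=1000) < probs).astype(int)  # labels consistent with the probs

for cutoff in [0.0, 0.25, 0.5, 0.75, 1.0]:
    pred = (probs > cutoff).astype(int)
    sens = (pred[y == 1] == 1).mean()
    spec = (pred[y == 0] == 0).mean()
    print(f"cutoff={cutoff:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```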

Explain the problem with unbalanced data.

The problem with unbalanced data is that a classifier implicitly places more weight on the majority class and tries to match the training observations in that class, without paying enough attention to the minority class. This is problematic when the minority class is the positive class, in other words, the class of interest.

Explain how undersampling and oversampling work to make unbalanced data more balanced.

Undersampling works by drawing fewer observations from the negative class while retaining all of the positive observations. The drawback is that the classifier, now based on less data and less information about the negative class, may be less robust and more prone to overfitting. Oversampling keeps all of the original data but resamples (with replacement) the positive class.
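
Both schemes can be sketched with scikit-learn's resample utility on synthetic unbalanced data:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(10)
X = rng.normal(size=(1000, 3))
y = (rng.uniform(size=1000) < 0.1).astype(int)  # roughly 10% positive class

X_pos, y_pos = X[y == 1], y[y == 1]
X_neg, y_neg = X[y == 0], y[y == 0]

# Undersampling: keep all positives, draw fewer negatives without replacement.
X_neg_dn, y_neg_dn = resample(X_neg, y_neg, replace=False,
                              n_samples=len(y_pos), random_state=0)

# Oversampling: keep all data, re-draw the positives with replacement.
X_pos_up, y_pos_up = resample(X_pos, y_pos, replace=True,
                              n_samples=len(y_neg), random_state=0)

print(len(y_pos) + len(y_neg_dn), len(y_pos_up) + len(y_neg))  # balanced sizes
```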

Explain why oversampling must be performed after splitting the full data into training and test data.

Oversampling must be performed after the training/test split; otherwise some of the duplicated positive observations may appear in both the training and test sets. The test set would then not be truly unseen by the trained classifier, defeating the purpose of the split.
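
A sketch of the correct ordering, splitting first and oversampling only the training portion (synthetic data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(11)
X = rng.normal(size=(1000, 3))
y = (rng.uniform(size=1000) < 0.1).astype(int)

# Split first, so duplicated positives can never leak into the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

X_pos_up, y_pos_up = resample(X_tr[y_tr == 1], y_tr[y_tr == 1], replace=True,
                              n_samples=int((y_tr == 0).sum()), random_state=0)
X_train_bal = np.vstack([X_tr[y_tr == 0], X_pos_up])
y_train_bal = np.concatenate([y_tr[y_tr == 0], y_pos_up])
# (X_te, y_te) remain untouched and truly unseen by the balanced training set.
```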

Explain one reason for using oversampling over undersampling, and one reason for using undersampling over oversampling.

Oversampling is preferred when we want to retain all of the information about the negative class; undersampling is preferred when the training data is excessively large, to ease the computational burden and reduce run time.

Flashcards

How GLMs generalize linear models

GLMs generalize linear models by allowing the target variable to follow distributions from the exponential family and using a link function to relate the target mean to a linear combination of predictors.

Linear model as a GLM case

A linear model is a special case of a GLM when the target variable is normally distributed and the link function is the identity function.

Tweedie distribution characteristics

A Tweedie distribution is a Poisson sum of Gamma random variables; it has a discrete probability mass at zero and a probability density function on the positive real line.

Using log link with zero values

The log link can be used even if the target has zero observations, since it is applied to the target mean; just ensure the target distribution allows zero values (e.g., Poisson).

Weights vs. Offsets in GLM

Weights: target averaged by exposure; exposure affects observation variance. Offsets: exposure directly proportional to the target mean in the model equation.

Parameter estimation

MLE chooses parameter estimates to maximize the likelihood of observing the given data, often achieved via an optimization algorithm.

Problem with deviance

The deviance is a goodness-of-fit measure on the training set and always decreases when new predictors are added, similar to RSS in linear models.

Limitations of likelihood ratio test

The likelihood ratio test can only compare one pair of GLMs at a time, and the simpler GLM must be a special case or nested within the more complex GLM.

Regularization for GLMs

Regularization for GLMs minimizes a penalized objective function with deviance instead of RSS.

Importance of a cutoff

A cutoff translates probabilities into predicted classes for binary classifiers.

Accuracy, sensitivity, specificity links

Accuracy is the weighted average of sensitivity and specificity, with weights being the proportions of observations in each class.

Cutoff effect on binary classifier

Lowering cutoff increases sensitivity & decreases specificity. Raising cutoff decreases sensitivity & increases specificity.

Problem with imbalanced data

Unbalanced data causes the classifier to favor the majority class, potentially missing the importance of the minority class.

Undersampling/Oversampling in short.

Undersampling reduces the majority class, risking a less robust, overfit-prone classifier; oversampling resamples the minority class with replacement, retaining all original data.

When should you oversample?

Oversampling must be done after the split to avoid testing on 'seen' data.

Oversampling vs. Undersampling: Reasons

Oversampling retains all data; undersampling eases computational burden.

Study Notes

Generalizing Linear Models with GLMs

  • GLMs broaden linear models by allowing target variables to follow exponential family distributions, including continuous and discrete types like Binomial, Poisson, Normal, Gamma, and inverse Gaussian
  • GLMs use a "link function" to relate the target mean to a linear combination of predictors

GLMs as Generalizations of Linear Models

  • A GLM simplifies to a linear model when the target variable is normally distributed and uses an identity link function where g(µ) = µ

Characteristics of Tweedie Distribution

  • Tweedie distributions are Poisson sums of Gamma random variables
  • They exhibit a discrete probability mass at zero with a probability density function on the positive real line

Using the Log Link with Zero Target Values

  • Zero values in the target variable do not automatically invalidate the use of a log link
  • The log link is applied to the target mean, not directly to the observations
  • Distributions like the Gamma and inverse Gaussian require adjustments to handle zero values, but the Poisson accommodates them

Weights vs. Offsets in GLMs

  • For weights, the target variable should be averaged by exposure
  • For offsets, the observations are values aggregated over the exposure units
  • Averaging causes each observation's variance to be inversely related to the size of exposure, which acts as the weight
  • Weights do not appear directly in the model equation and do not affect the target mean, while offsets are directly proportional to the target mean and appear in the equation, leaving variance unaffected

Parameter Estimation Method for GLMs

  • Maximum Likelihood Estimation (MLE) estimates parameters to maximize the likelihood of observed data
  • This is typically achieved via optimization algorithms

Deviance as a Model Selection Criterion

  • Deviance in GLMs is similar to Residual Sum of Squares (RSS) in linear models
  • It measures the goodness-of-fit on the training set
  • Deviance always decreases when predictors are added, regardless of their usefulness

Limitations of Likelihood Ratio Test

  • Likelihood ratio tests can only compare one pair of GLMs at a time
  • The simpler GLM must be a special case of the more complex one

Regularization in GLMs

  • Regularization in GLMs minimizes a penalized objective function using deviance instead of RSS

Importance of Setting a Cutoff for Binary Classifiers

  • Binary classifiers predict the probability of an event
  • A pre-specified cutoff is necessary to translate probabilities into predicted classes

Accuracy, Sensitivity, and Specificity

  • Accuracy is a weighted average of sensitivity and specificity
  • Weights are determined by the proportions of observations in each class

Cutoff Effect on Sensitivity and Specificity

  • The cutoff selection involves a trade-off between sensitivity and specificity
  • Setting a cutoff of 0 predicts all outcomes as positive, resulting in sensitivity of 1 and specificity of 0
  • Setting a cutoff of 1 predicts all outcomes as negative, resulting in sensitivity of 0 and specificity of 1

Problem of Unbalanced Data

  • Classifiers can implicitly overweight the majority class, focusing on training observations from that class and neglecting the minority class
  • This is an issue if the minority class is the class of interest

Balancing Unbalanced Data with Undersampling and Oversampling

  • Undersampling reduces observations from the negative class while keeping all positive observations, but the classifier might be less robust and prone to overfitting due to reduced data
  • Oversampling retains all original data while oversampling the positive class with replacement

Timing of Oversampling

  • Oversampling must occur after splitting data into training/test sets
  • If done beforehand, the test set may not be truly unseen because observations may appear in both sets

Reasons for Choosing Oversampling or Undersampling

  • Oversampling is preferable for retaining full information about the negative class
  • Undersampling is preferable for easing computational load and reducing runtime when training data is very large
