Generalized Linear Models (GLMs)

Questions and Answers

Explain two ways a GLM generalizes a linear model.

Distribution of the target variable: The target variable in a GLM can follow any member of the exponential family of distributions, which includes both continuous and discrete distributions such as the Binomial, Poisson, Normal, Gamma, and inverse Gaussian. Relationship between the target mean and linear predictor: A GLM uses a "link function" g to relate the target mean µ to the linear combination of predictors, g(µ) = η.
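
As a minimal sketch of both generalizations, using statsmodels on synthetic data (the coefficients and sample size below are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))  # intercept + two predictors
mu = np.exp(X @ np.array([0.5, 0.3, -0.2]))     # log link: mean = exp(linear predictor)
y = rng.poisson(mu)                             # exponential-family (Poisson) target

# Generalization 1: the target follows a Poisson distribution.
# Generalization 2: the log link (the Poisson default) relates the mean
# to the linear combination of predictors.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)  # estimates close to [0.5, 0.3, -0.2]
```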

Explain why a linear model can be regarded as a special case of a GLM.

In the special case of a GLM where the target variable is normally distributed and the link function is the identity function g(µ) = µ, we recover the linear model. In this sense, a GLM is a "generalized" version of a linear model.
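
A quick check of this equivalence (a sketch with made-up data) is to fit the same design with OLS and with a Gaussian-family GLM, whose default link is the identity, and compare the estimates:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default
print(np.allclose(ols.params, glm.params))  # True: the two fits coincide
```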

Describe the characteristics of a Tweedie distribution.

It is a Poisson sum of Gamma random variables. It has a discrete probability mass at zero and a probability density function on the positive real line.
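
Both characteristics can be seen by simulating the Poisson-sum-of-Gammas construction directly (the parameters below are arbitrary). statsmodels exposes the Tweedie family through a variance-power parameter, where values strictly between 1 and 2 correspond to this compound Poisson-Gamma case:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
counts = rng.poisson(1.0, size=n)  # Poisson number of Gamma summands
y = np.array([rng.gamma(2.0, 1.0, size=k).sum() for k in counts])
print((y == 0).mean())  # visible probability mass at exactly zero

X = sm.add_constant(rng.normal(size=n))
fit = sm.GLM(y, X, family=sm.families.Tweedie(var_power=1.5)).fit()
print(fit.params)
```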

Explain whether or not the log link can be used when some of the observations of the target variable are zero.

Zero values among the target observations do NOT necessarily invalidate the use of the log link, because the link is applied to the target mean, not to the individual observations. What matters is the target distribution: the Gamma and inverse Gaussian distributions do not permit zero values, so the zeros would need to be adjusted, but the Poisson distribution allows zeros, so a log-link Poisson GLM can be used with no issues.
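
A small illustration with a simulated Poisson target: the fit succeeds despite zeros in the data because the log link acts on the fitted mean, which is always positive:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=300))
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1]))
print((y == 0).sum())  # plenty of zero observations

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # log link by default
# The link applies to the mean: every fitted mean is exp(eta) > 0,
# even though individual observations can be zero.
print(fit.fittedvalues.min() > 0)  # True
```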

Explain two differences between weights and offsets when applied to GLM.

The form of the target variable: For weights, the target variable is averaged by exposure; for offsets, the observations are values aggregated over the exposure units. How exposure affects the target variable: Due to the averaging, the variance of each observation is inversely related to the size of its exposure, which serves as the weight for that observation; the weights do not appear in the model equation and do not directly affect the target mean. The exposure, when serving as an offset, is in direct proportion to the target mean and appears in the model equation, but otherwise leaves the variance unaffected.
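
The distinction can be sketched in statsmodels (synthetic data): exposure enters either through the offset argument, on the log scale, when modeling aggregated counts, or through var_weights when modeling the averaged target. The two fits should produce essentially the same coefficient estimates:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
exposure = rng.uniform(0.5, 5.0, size=n)  # e.g. policy-years per observation
X = sm.add_constant(rng.normal(size=n))
counts = rng.poisson(exposure * np.exp(0.1 + 0.4 * X[:, 1]))

# Offset: aggregated counts; log(exposure) enters the model equation.
offset_fit = sm.GLM(counts, X, family=sm.families.Poisson(),
                    offset=np.log(exposure)).fit()

# Weights: averaged target (counts per exposure unit); exposure acts as a
# variance weight and does not appear in the model equation.
rate = counts / exposure
weight_fit = sm.GLM(rate, X, family=sm.families.Poisson(),
                    var_weights=exposure).fit()

print(offset_fit.params)
print(weight_fit.params)  # essentially identical estimates
```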

State the statistical method typically used to estimate the parameters of GLM.

Maximum Likelihood Estimation (MLE) is used: the parameter estimates are chosen so as to maximize the likelihood of observing the given data, which is typically achieved by running an optimization algorithm.
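
As an illustrative sketch, the same estimates can be recovered by handing the negative log-likelihood to a generic optimizer; the data and coefficients are synthetic:

```python
import numpy as np
import statsmodels.api as sm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(300, 1)))
y = rng.poisson(np.exp(X @ np.array([0.3, 0.7])))

def neg_log_lik(beta):
    eta = X @ beta
    # Poisson log-likelihood, dropping the log(y!) term (constant in beta)
    return -(y * eta - np.exp(eta)).sum()

mle = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(mle.x)
print(glm.params)  # essentially the same estimates
```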

Explain the problem with deviance as a model selection criterion.

The deviance of a GLM parallels the RSS of a linear model in the sense that it is merely a goodness-of-fit measure on the training set and always decreases when new predictors are added, so by itself it cannot guard against overfitting.
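
A sketch of the problem: adding even pure-noise predictors drives the training deviance down (synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x_real = rng.normal(size=n)
y = rng.poisson(np.exp(0.2 + 0.5 * x_real))

X = sm.add_constant(x_real)
for _ in range(3):
    print(sm.GLM(y, X, family=sm.families.Poisson()).fit().deviance)
    X = np.column_stack([X, rng.normal(size=n)])  # append a pure-noise predictor
```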

Explain the limitations of the likelihood ratio test as a model selection method.

It can only be used to compare one pair of GLMs at a time, and the simpler GLM must be a special case of, or nested within, the more complex GLM.
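
A sketch of the test for two nested Poisson GLMs (synthetic data), using the standard chi-square construction for the likelihood ratio statistic:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(7)
n = 300
X_full = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.poisson(np.exp(0.1 + 0.6 * X_full[:, 1]))  # second predictor is irrelevant

reduced = sm.GLM(y, X_full[:, :2], family=sm.families.Poisson()).fit()  # nested
full = sm.GLM(y, X_full, family=sm.families.Poisson()).fit()

lr_stat = 2 * (full.llf - reduced.llf)
df = full.df_model - reduced.df_model
print(chi2.sf(lr_stat, df))  # large p-value: the extra predictor adds little
```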

Explain how regularization for GLM works.

For GLMs, the regularized model results from minimizing a penalized objective function in which the deviance takes the place of the RSS.
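
A sketch using statsmodels' elastic-net routine for GLMs, which penalizes a deviance-based objective; the alpha and L1_wt values here are arbitrary:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = sm.add_constant(rng.normal(size=(200, 5)))
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1]))  # only one informative predictor

model = sm.GLM(y, X, family=sm.families.Poisson())
lasso_fit = model.fit_regularized(alpha=0.1, L1_wt=1.0)  # L1_wt=1 -> pure lasso
print(lasso_fit.params)  # noise coefficients shrunk toward (or exactly to) zero
```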

Explain the importance of setting a cutoff for a binary classifier.

A binary classifier merely produces a predicted probability that the event of interest occurs. To translate these probabilities into predicted classes, i.e., whether the event is predicted to happen or not, we need a pre-specified cutoff.
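
A minimal illustration with made-up probabilities:

```python
import numpy as np

probs = np.array([0.10, 0.35, 0.62, 0.80])  # hypothetical predicted probabilities
cutoff = 0.5
predicted_class = (probs > cutoff).astype(int)  # probabilities -> 0/1 classes
print(predicted_class)  # [0 0 1 1]
```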

Explain the relationship between accuracy, sensitivity, and specificity.

Accuracy is the weighted average of sensitivity and specificity, where the weights are the proportions of observations belonging to the positive and negative classes, respectively.
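
A small numeric check of this identity on an invented set of predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # 30% positive, 70% negative
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)        # 2/3
specificity = tn / (tn + fp)        # 5/7
accuracy = (tp + tn) / len(y_true)  # 7/10
print(np.isclose(accuracy, 0.3 * sensitivity + 0.7 * specificity))  # True
```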

Explain how the cutoff of a binary classifier affects the sensitivity and specificity.

In general, the selection of the cutoff involves a trade-off between having high sensitivity and having high specificity. Extreme case 1: if the cutoff is 0, all probabilities exceed the cutoff and everyone is predicted to be positive, so sensitivity is 1 and specificity is 0. Extreme case 2: if the cutoff is 1, all probabilities fall below the cutoff and everyone is predicted to be negative, so sensitivity is 0 and specificity is 1.
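
A sketch sweeping the cutoff over simulated probabilities, showing the two extremes and the trade-off in between:

```python
import numpy as np

rng = np.random.default_rng(9)
probs = rng.uniform(size=1000)                    # simulated predicted probabilities
y = (rng.uniform(size=1000) < probs).astype(int)  # labels consistent with the probs

for cutoff in [0.0, 0.25, 0.5, 0.75, 1.0]:
    pred = (probs > cutoff).astype(int)
    sens = (pred[y == 1] == 1).mean()
    spec = (pred[y == 0] == 0).mean()
    print(f"cutoff={cutoff:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```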

Explain the problem with unbalanced data.

The problem with unbalanced data is that a classifier implicitly places more weight on the majority class and tries to match the training observations in that class, without paying enough attention to the minority class. This is problematic when the minority class is the positive class, in other words, the class of interest.

Explain how undersampling and oversampling work to make unbalanced data more balanced.

Undersampling works by drawing fewer observations from the negative class while retaining all of the positive observations. The drawback is that the classifier, now based on less data and less information about the negative class, may be less robust and more prone to overfitting. Oversampling keeps all of the original data but resamples (with replacement) the positive class.
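
Both schemes can be sketched with scikit-learn's resample utility on synthetic unbalanced data:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(10)
X = rng.normal(size=(1000, 3))
y = (rng.uniform(size=1000) < 0.1).astype(int)  # roughly 10% positive class

X_pos, y_pos = X[y == 1], y[y == 1]
X_neg, y_neg = X[y == 0], y[y == 0]

# Undersampling: keep all positives, draw fewer negatives without replacement.
X_neg_dn, y_neg_dn = resample(X_neg, y_neg, replace=False,
                              n_samples=len(y_pos), random_state=0)

# Oversampling: keep all data, re-draw the positives with replacement.
X_pos_up, y_pos_up = resample(X_pos, y_pos, replace=True,
                              n_samples=len(y_neg), random_state=0)

print(len(y_pos) + len(y_neg_dn), len(y_pos_up) + len(y_neg))  # balanced sizes
```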

Explain why oversampling must be performed after splitting the full data into training and test data.

Oversampling must be performed after the training/test split; otherwise some of the duplicated positive observations may appear in both the training and test sets. The test set would then not be truly unseen by the trained classifier, defeating the purpose of the split.
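
A sketch of the correct ordering, splitting first and oversampling only the training portion (synthetic data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(11)
X = rng.normal(size=(1000, 3))
y = (rng.uniform(size=1000) < 0.1).astype(int)

# Split first, so duplicated positives can never leak into the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

X_pos_up, y_pos_up = resample(X_tr[y_tr == 1], y_tr[y_tr == 1], replace=True,
                              n_samples=int((y_tr == 0).sum()), random_state=0)
X_train_bal = np.vstack([X_tr[y_tr == 0], X_pos_up])
y_train_bal = np.concatenate([y_tr[y_tr == 0], y_pos_up])
# (X_te, y_te) remain untouched and truly unseen by the balanced training set.
```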

Explain one reason for using oversampling over undersampling, and one reason for using undersampling over oversampling.

Oversampling is preferred when we want to retain all of the information about the negative class; undersampling is preferred when the training data is excessively large, to ease the computational burden and reduce run time.

Flashcards

How GLMs generalize linear models

GLMs generalize linear models by allowing the target variable to follow distributions from the exponential family and using a link function to relate the target mean to a linear combination of predictors.

Linear model as a GLM case

A linear model is a special case of a GLM when the target variable is normally distributed and the link function is the identity function.

Tweedie distribution characteristics

A Tweedie distribution is a Poisson sum of Gamma random variables; it has a discrete probability mass at zero and a probability density function on the positive real line.

Using log link with zero values

The log link can be used even if the target has zero observations, since it is applied to the target mean; just ensure the target distribution allows zero values (e.g., Poisson).

Weights vs. Offsets in GLM

Weights: target averaged by exposure; exposure affects observation variance. Offsets: exposure directly proportional to the target mean in the model equation.

Parameter estimation

MLE chooses parameter estimates to maximize the likelihood of observing the given data, often achieved via an optimization algorithm.

Problem with deviance

The deviance is a goodness-of-fit measure on the training set and always decreases when new predictors are added, similar to RSS in linear models.

Limitations of likelihood ratio test

The likelihood ratio test can only compare one pair of GLMs at a time, and the simpler GLM must be a special case or nested within the more complex GLM.

Regularization for GLMs

Regularization for GLMs minimizes a penalized objective function with deviance instead of RSS.

Importance of a cutoff

A cutoff translates probabilities into predicted classes for binary classifiers.

Accuracy, sensitivity, specificity links

Accuracy is the weighted average of sensitivity and specificity, with weights being the proportions of observations in each class.

Cutoff effect on binary classifier

Lowering cutoff increases sensitivity & decreases specificity. Raising cutoff decreases sensitivity & increases specificity.

Problem with imbalanced data

Unbalanced data causes the classifier to favor the majority class, potentially missing the importance of the minority class.

Undersampling/Oversampling in short.

Undersampling reduces the majority class, risking a less robust, overfit-prone classifier; oversampling resamples the minority class with replacement, retaining all original data.

When should you oversample?

Oversampling must be done after the split to avoid testing on 'seen' data.

Oversampling vs. Undersampling: Reasons

Oversampling retains all data; undersampling eases computational burden.

Study Notes

Generalizing Linear Models with GLMs

  • GLMs broaden linear models by allowing target variables to follow exponential family distributions, including continuous and discrete types like Binomial, Poisson, Normal, Gamma, and inverse Gaussian
  • GLMs use a "link function" to relate the target mean to a linear combination of predictors

GLMs as Generalizations of Linear Models

  • A GLM simplifies to a linear model when the target variable is normally distributed and uses an identity link function where g(µ) = µ

Characteristics of Tweedie Distribution

  • Tweedie distributions are Poisson sums of Gamma random variables
  • They exhibit a discrete probability mass at zero with a probability density function on the positive real line

Using the Log Link with Zero Target Values

  • Zero values in the target variable do not automatically invalidate the use of a log link
  • The log link is applied to the target mean, not directly to the observations
  • Distributions like the Gamma and inverse Gaussian require adjustments to handle zero values, but the Poisson accommodates them

Weights vs. Offsets in GLMs

  • For weights, the target variable should be averaged by exposure
  • For offsets, the observations are values aggregated over the exposure units
  • Averaging causes each observation's variance to be inversely related to the size of exposure, which acts as the weight
  • Weights do not appear directly in the model equation and do not affect the target mean, while offsets are directly proportional to the target mean and appear in the equation, leaving variance unaffected

Parameter Estimation Method for GLMs

  • Maximum Likelihood Estimation (MLE) estimates parameters to maximize the likelihood of observed data
  • This is typically achieved via optimization algorithms

Deviance as a Model Selection Criterion

  • Deviance in GLMs is similar to Residual Sum of Squares (RSS) in linear models
  • It measures the goodness-of-fit on the training set
  • Deviance always decreases when predictors are added, regardless of their usefulness

Limitations of Likelihood Ratio Test

  • Likelihood ratio tests can only compare one pair of GLMs at a time
  • The simpler GLM must be a special case of the more complex one

Regularization in GLMs

  • Regularization in GLMs minimizes a penalized objective function using deviance instead of RSS

Importance of Setting a Cutoff for Binary Classifiers

  • Binary classifiers predict the probability of an event
  • A pre-specified cutoff is necessary to translate probabilities into predicted classes

Accuracy, Sensitivity, and Specificity

  • Accuracy is a weighted average of sensitivity and specificity
  • Weights are determined by the proportions of observations in each class

Cutoff Effect on Sensitivity and Specificity

  • The cutoff selection involves a trade-off between sensitivity and specificity
  • Setting a cutoff of 0 predicts all outcomes as positive, resulting in sensitivity of 1 and specificity of 0
  • Setting a cutoff of 1 predicts all outcomes as negative, resulting in sensitivity of 0 and specificity of 1

Problem of Unbalanced Data

  • Classifiers can implicitly overweight the majority class, focusing on training observations from that class and neglecting the minority class
  • This is an issue if the minority class is the class of interest

Balancing Unbalanced Data with Undersampling and Oversampling

  • Undersampling reduces observations from the negative class while keeping all positive observations, but the classifier might be less robust and prone to overfitting due to reduced data
  • Oversampling retains all original data while oversampling the positive class with replacement

Timing of Oversampling

  • Oversampling must occur after splitting data into training/test sets
  • If done beforehand, the test set may not be truly unseen because observations may appear in both sets

Reasons for Choosing Oversampling or Undersampling

  • Oversampling is preferable for retaining full information about the negative class
  • Undersampling is preferable for easing computational load and reducing runtime when training data is very large
