Logistic and Poisson Models


Questions and Answers

Why might a researcher choose a logistic regression over a linear regression model? Explain your answer.

Logistic regression is used when the outcome variable is binary (or, with extensions, categorical), while linear regression is appropriate for continuous outcomes. In human subjects research we often work with binary or categorical outcomes, and for those a linear regression is not appropriate: it can predict values outside the possible range of the outcome.

How does the interpretation of a coefficient β in a logistic regression with a continuous predictor X differ from its interpretation in a linear regression model?

In logistic regression, e to the power of β is an odds ratio: the factor by which the odds of the outcome are multiplied for every one-unit increase in X. In linear regression, β itself is the additive change in the outcome Y for every one-unit increase in X.

What key assumption regarding the relationship between the mean and variance must be checked when considering a Poisson regression, and what alternative model is suggested if this assumption is violated?

Poisson regression assumes that the mean and variance of the outcome are equal. If the outcome is overdispersed (its variance is much greater than its mean), a negative binomial model may be appropriate.

In the context of a study using Zou's modified Poisson regression, why might a researcher choose this method over traditional logistic regression when modeling a binary outcome?

Zou's modified Poisson regression estimates relative risks (RRs) rather than odds ratios (ORs), and RRs are easier to interpret. It estimates RRs well even when the outcome is not rare, whereas the OR from logistic regression approximates the RR only when the outcome is rare.

Briefly explain the function and importance of a conceptual model. Provide an example in your explanation of one way they are often used by researchers.

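A conceptual model is a visual tool that represents research questions or hypotheses about the relationships between measured variables. It makes the research questions explicit and specifies the expected relationships before any model is fit. For example, researchers often draw a path diagram with arrows from predictors (such as a county's poverty rate) to an outcome (such as overdose deaths) to show which relationships the regression will test.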

Flashcards

What is a Link Function?

A transformation applied to the outcome variable (more precisely, its expected value) so that it has a linear relationship with the predictors.

What is Logistic Regression?

A regression model where the outcome variable is binary.

What is the logit function?

A function that transforms probabilities to a log-odds scale, used in logistic regression.

What is the Odds Ratio?

The ratio of the odds of an event among the exposed to the odds of the event among the unexposed.


What is Poisson Regression?

A regression model used to model count data (i.e. number of events).


Study Notes

CHS 729 Week 8: Logistic and Poisson Models

  • The session covers generalized linear models, specifically logistic and Poisson regression.
  • The material includes aligning research questions with justified methods, using conceptual models/path diagrams, and method planning.
  • Final projects and midcourse evaluation debriefs fill out the agenda.

Statistical Models

  • These models relate an outcome (Y) to variables (X1, X2,...).
  • The outcome (Y) can also be termed the response or dependent variable.
  • X-variables are referred to as predictors, explanatory variables, or independent variables.
  • Models take the form Y = f(X) + ϵ; f() is a function applied to predictors, and ϵ is the model's error.

Simple Linear Regression

  • Regresses outcome Y on one predictor X
  • Linear regression attempts to fit a linear function f() where Y = f(X).
  • The linear function is Y = mX + b, where m = slope, and b = intercept
  • Slope (m) represents how much Y changes for each unit change in X.
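As a minimal sketch (in Python with only the standard library, although the course itself works in R), the slope and intercept of a simple linear regression can be computed by ordinary least squares. The data points are hypothetical and lie exactly on Y = 2X + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares slope (m) and intercept (b) for Y = mX + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared X deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data lying exactly on Y = 2X + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```

Here m = 2.0 means Y rises by 2 for each one-unit increase in X.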

Limitations of Linear Regression

  • Linear regression can be unsuitable if the assumptions are not met.
  • Categorical outcomes render linear regression inappropriate.
  • Even numeric outcomes, such as days of heroin use, can violate model assumptions when their distribution is far from normal (e.g., heavily skewed).
  • We need regression tools that still work where linear regression is insufficient.

Linear Model Components

  • Linear regression is expressed by Y = 𝛽0 + 𝛽1X1 + 𝛽2X2 + … + 𝛽nXn
  • Two additional components often remain hidden:
    • Random component: the assumed error distribution of the outcome variable Y.
    • Link function: a transformation of the outcome Y that ensures a linear relationship with the predictors (X1, X2,...).

Linear Regression Specifics

  • In linear regression:
    • The random component is a continuous, normally distributed outcome (with constant variance of the error terms).
    • The link function is the identity link: no transformation is applied, and Y is used as is.

Generalized Linear Model (GLM)

  • This model takes the form link(Y) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + … + 𝛽nXn.
  • link(Y) is chosen to transform Y such that:
    • Y follows a specified distribution (the random component), and
    • a linear relationship exists between each predictor X and the transformed outcome link(Y).
  • Different choices of distribution and link give different regression types, so understanding them is vital.

Logistic Regression Scenario

  • Example: a binary (Yes/No) outcome.
  • Setting Yes = 1 and No = 0 allows plotting the outcome Y against a predictor X.
  • A linear regression line does not fit the 0/1 scale of the data, and no linear relationship may be found.
  • Using the fitted line, X values over ~650 would predict a negative Y.
  • That is impossible, because Y can only be 0 or 1.

Standard Logistic Function

  • It is defined as f(x) = 1 / (1 + e^(-x)).
  • Its values always lie between 0 and 1.
  • This is the kind of function we want when outcomes can only be 0 or 1: its output never leaves that range.
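A minimal Python sketch of the standard logistic function, evaluated at a few illustrative inputs, shows that its output stays between 0 and 1 and equals 0.5 at x = 0:

```python
import math

def logistic(x):
    """Standard logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1
for x in (-10, -2, 0, 2, 10):
    print(x, round(logistic(x), 4))
```

The curve is symmetric around x = 0: f(-x) = 1 - f(x).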

Altering Outcomes

  • The function still has values between 0 and 1.
  • When thinking of 1 = success, it is possible to frame the function as a probability of success, p(Y = 1).
  • In the example, the logistic curve models the probability of passing an exam.
  • The data points are either 0 (fail) or 1 (pass).
  • According to the model, studying for 3 hours means there is about a 62.5% chance of passing.

Logistic Function Definition

  • Writing the logistic function in terms of the predictors yields p(Y = 1) = 1 / (1 + e^-(𝛽0 + 𝛽1X1 + 𝛽2X2 + …)).
  • The problem is that the relationship between the outcome p(Y) and the predictors X1, X2,... is not linear.
  • To solve this, the outcome is transformed with the logit function.
  • The logit is the inverse of p(Y = 1) = 1 / (1 + e^-(𝛽0 + 𝛽1X1 + 𝛽2X2 + …)).
  • Rearranging gives: logit(p) = ln(p / (1-p)) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + ...
  • The logit function is a link: it transforms the outcome so that the relationship between the outcome (log-odds) and the predictors is linear.
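The round trip between the logit and its inverse can be checked numerically. This Python sketch assumes hypothetical coefficients (b0 = -1.0, b1 = 0.5) and a predictor value of 3, purely for illustration:

```python
import math

def inv_logit(eta):
    # p(Y = 1) = 1 / (1 + e^-(eta)), where eta = b0 + b1*X1 + ... (the linear predictor)
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    # logit(p) = ln(p / (1 - p)): the log-odds
    return math.log(p / (1.0 - p))

b0, b1 = -1.0, 0.5   # hypothetical coefficients
x = 3.0
eta = b0 + b1 * x    # linear predictor on the log-odds scale
p = inv_logit(eta)   # probability scale
print(round(p, 4), round(logit(p), 10))
```

Applying logit() to the probability recovers the linear predictor, which is why the model is linear on the log-odds scale.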

Odds of a Binary Outcome

  • The outcome is either 0 or 1.
  • The value p(Y=1) / p(Y=0) is the odds of the outcome.
  • p(Y=1) / (1-p(Y=1)) is the same as p(Y=1) / p(Y=0), since p(Y=0) = 1 - p(Y=1).
  • p(Y=1) / p(Y=0) is thus the ratio of the probability of success (Y = 1) to the probability of failure (Y = 0).

Odds

  • Odds represent the ratio of something happening to it not happening.
  • Example: you have won 10 chess games against Bob and Bob has won 8, which translates to 10:8 odds of you winning the next game.
  • Odds and probability are related.
  • You won 10 of 18 games: probability ≈ 0.5556.
  • You lost 8 of 18 games: probability ≈ 0.4444.
  • Taking the ratio yields the odds: (10/18) / (8/18) = 10/8 = 1.25.
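The chess arithmetic can be reproduced exactly with Python's fractions module (a sketch using the counts from the example):

```python
from fractions import Fraction

wins, losses = 10, 8
games = wins + losses
p_win = Fraction(wins, games)   # 10/18
p_loss = 1 - p_win              # 8/18
odds = p_win / p_loss           # (10/18) / (8/18) = 10/8
print(float(p_win), float(p_loss), odds, float(odds))
```

The 18 in each denominator cancels, so the odds depend only on the counts of wins and losses.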

Log-Odds

  • Log-odds are the natural log of the odds of success (Y = 1).
  • ln(p(Y=1) / p(Y=0)) is the log of the odds.
  • Fitting a logistic regression produces a function of the form ln(p(Y = 1) / (1 - p(Y = 1))) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + …

Logistic Regression Models

  • Can be fit within the generalized linear model framework when the outcome is dichotomous.
  • A logit link function is used to fit the model: logit(p) = ln(p(Y = 1) / (1 - p(Y = 1))) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + ...
  • This establishes a linear relationship between each X variable and the log-odds of outcome Y.
  • Once the model is fit, each 𝛽 needs to be interpreted.

Interpreting Beta

  • Example: a simple logistic regression of heart disease on cigarette smoking.
  • The fitted model has the form ln(p / (1-p)) = 0.3 + 0.5X.
  • When X is categorical, a referent group is defined; the other groups are compared to it.
  • Not smoking is selected as the referent, as it is the non-exposure group.

Interpreting Beta for Categorical Variable X

  • ln(p / (1-p)) = 0.3 + 0.5Xsmoking: in this example, smoking increases the log-odds of heart disease by 0.5.
  • However, log-odds are not easy for everyone to interpret.
  • The effect on the odds is found by exponentiating the 𝛽 coefficient: e^𝛽 = e^0.5 ≈ 1.65.
  • People who smoke have 1.65 times the odds of heart disease of people who do not smoke.
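A short Python check, using the example coefficients 0.3 and 0.5, shows that the ratio of the two groups' odds equals e^𝛽:

```python
import math

b0, b_smoking = 0.3, 0.5  # coefficients from the example model

def odds(x_smoking):
    # odds = e^(log-odds) = e^(b0 + b_smoking * x)
    return math.exp(b0 + b_smoking * x_smoking)

# Smokers (x = 1) vs the non-smoking referent group (x = 0)
odds_ratio = odds(1) / odds(0)
print(round(odds_ratio, 2))
print(math.isclose(odds_ratio, math.exp(b_smoking)))
```

The intercept cancels in the ratio, which is why the OR depends only on 𝛽.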

Interpreting Beta for Continuous Variable X

  • ln(p / (1-p)) = 0.3 + 0.5Xage, where X is continuous; here, age in years.
  • Exponentiating the 𝛽 coefficient gives e^𝛽 = e^0.5 ≈ 1.65.
  • The odds of the outcome are 1.65 times higher for every one-unit (one-year) increase in age.

Interpreting Beta for Multiple Predictors

  • ln(p / (1-p)) = 0.3 + 0.5Xage + 1.2Xgender; as in linear regression, the coefficients are additive on the log-odds scale.
  • Once exponentiated, their effects are multiplicative.
  • The gender coefficient is for women; men are the referent.
  • Comparing a woman with a man one year younger, the odds are multiplied: e^(𝛽age+𝛽gender) = e^(0.5+1.2) = e^0.5 * e^1.2 ≈ 1.65 * 3.32 ≈ 5.47.
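The multiplicative combination can be verified with the example coefficients (Python sketch):

```python
import math

b_age, b_gender = 0.5, 1.2
or_age = math.exp(b_age)        # per one-year increase in age
or_gender = math.exp(b_gender)  # women vs men (referent)
# Adding on the log-odds scale = multiplying on the odds scale
combined = math.exp(b_age + b_gender)
print(round(or_age, 2), round(or_gender, 2), round(combined, 2))
```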

Odds Ratio vs Risk Ratio

  • We commonly use the odds ratio because it is what logistic regression estimates directly from the data.
  • However, the odds ratio is notoriously hard for many people to understand.
  • People often want to know how likely an outcome is for certain groups, such as different age groups.
  • That is, people are generally interested in the risk ratio (or relative risk).

Risk vs Odds Definitions

  • A 2x2 table is valuable for conceptualizing the difference between relative risk and the odds ratio. In the treatment row, a = events and b = non-events; in the control row, c = events and d = non-events.
  • Relative risk (RR) is (risk of event in treatment) / (risk of event in control): [a/(a + b)] / [c/(c + d)] = (ac + ad) / (ac + bc).
  • The odds ratio (OR) is (odds of event in treatment) / (odds of event in control): (a/b) / (c/d) = ad/bc.
  • We usually find RR more important because it tells us the likelihood of the outcome in a specific group.
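A sketch of both measures on a hypothetical 2x2 table (20 of 100 with the event under treatment, 10 of 100 under control), using exact fractions so the expanded forms can be checked against the definitions:

```python
from fractions import Fraction as F

# Hypothetical 2x2 table: rows = treatment/control, columns = event/no event
a, b = 20, 80   # treatment: 20 events, 80 non-events
c, d = 10, 90   # control:   10 events, 90 non-events

rr = F(a, a + b) / F(c, c + d)   # risk in treatment / risk in control
or_ = F(a, b) / F(c, d)          # odds in treatment / odds in control

# The expanded forms give the same values as the definitions
assert rr == F(a * c + a * d, a * c + b * c)
assert or_ == F(a * d, b * c)
print(float(rr), float(or_))
```

With a 10% to 20% risk, the OR (2.25) already overstates the RR (2.0).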

The Odds Ratio

  • The OR is a good, reasonable approximation of the RR if the outcome is rare.
  • If the outcome is rare, the event counts a (treatment) and c (control) will be low relative to the non-event counts b and d.
  • With a and c both low, the product ac will be small relative to ad and bc.
  • Since RR = (ac + ad)/(ac + bc) and OR = ad/bc, dropping the small ac terms makes the two values approximately equal.
  • So if the outcome is rare (under about 5% of observations), the OR is quite similar to the quantity we care about, the RR.
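To see the rare-outcome approximation at work, this Python sketch compares RR and OR on two hypothetical tables (1,000 people per group) that share the same RR of 2, one with a rare outcome and one with a common outcome:

```python
def rr_and_or(a, b, c, d):
    # a, b = treatment events/non-events; c, d = control events/non-events
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a * d) / (b * c)
    return rr, or_

rare = rr_and_or(20, 980, 10, 990)       # 2% vs 1% risk
common = rr_and_or(400, 600, 200, 800)   # 40% vs 20% risk

for label, (rr, or_) in (("rare", rare), ("common", common)):
    print(label, round(rr, 3), round(or_, 3))
```

The rare table gives an OR of about 2.02, while the common table gives an OR of about 2.67 for the same RR.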

Logistic Regression

  • It is a generalized linear model for a binary outcome.
  • The logit link transforms the outcome from its original (identity) scale to the log-odds scale.
  • The result is a function of the form ln(p / (1-p)) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + ...
  • Exponentiating a predictor's beta coefficient yields its odds ratio: ORx = e^𝛽.

Ordinary Least Squares

  • Ordinary least squares (OLS) is used to determine the line in linear regression.
  • OLS does not work for logistic regression.
  • Maximum likelihood estimation is used instead.
  • Like OLS, which finds the line that minimizes the sum of squared residuals, maximum likelihood optimizes an overall metric: it finds the coefficients that make the observed data most likely.
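As an illustration of maximum likelihood, this Python sketch fits a one-predictor logistic regression to hypothetical data with Newton-Raphson. For the logit link, these updates coincide with the iteratively reweighted least squares used by R's glm(); this is a teaching sketch, not a replacement for glm():

```python
import math

def log_likelihood(b0, b1, xs, ys):
    # Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def fit_logistic(xs, ys, iterations=25):
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score (gradient) g and information matrix H = [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data (e.g., hours studied vs pass/fail); classes overlap, so the MLE exists
xs = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b0, 3), round(b1, 3), round(log_likelihood(b0, b1, xs, ys), 3))
```

At the maximum, the gradient of the log-likelihood is zero; OLS plays the same role for linear regression by minimizing squared residuals instead.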

The Good News

  • When to use a logistic regression, and how to run it in R, is now largely understood.
  • The assumptions of logistic regression are more relaxed than those of linear regression:
    • A binary outcome
    • Independent observations
    • No multicollinearity among predictors
    • No extreme outliers
    • A linear relationship between the predictors and logit(Y)
    • Rule of 10 (at least 10 observations of the least frequent outcome category for each predictor)

Logistic Regression in R: Example

  • The example uses R's built-in state.x77 dataset.
  • The question is how state-level illiteracy rate, murder rate, and high school graduation rate relate to state life expectancy.
  • The outcome was dichotomized: 0 = life expectancy < 71 years, 1 = life expectancy ≥ 71 years.
  • The dichotomized outcome is named "seventyone".

Running Logistic Regression in R Mechanics

  • The glm() function is used, as it can be for linear regression.
  • The data and model formula are specified.
  • An additional argument, family, specifies the random component and the link function.
  • In family = binomial(link = "logit"), binomial describes the structure of the outcome data, and logit specifies that the outcome is transformed with the logit function.

Regression Output

  • The output looks similar to that of a linear regression.
  • The beta coefficients are in the "Estimate" column.
  • Standard errors, z-values, and p-values are also reported.
  • Each coefficient describes the relationship between an X variable and the log-odds of Y.

Running Logistic Regression in R: Outputting

  • The results can be extracted as follows ([-1] drops the intercept row):
    • ORs <- exp(coef(model))[-1]
    • ConfInt <- exp(confint(model))[-1, ]
    • P_Val <- coef(summary(model))[-1, 4]
  • The coefficient labels ("Illiteracy", "Murder Rate", and a high school graduation label) are stored in variable_names.
  • These are combined into a data.frame: results <- data.frame(variable_names, ORs, ConfInt, P_Val)

Writing Results to CSV

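  • The results table is written to "logRegression.csv" (e.g., write.csv(results, "logRegression.csv", row.names = FALSE)).
  • For every one-unit increase in the high school graduation rate, the odds of a life expectancy of at least 71 years increased by a factor of 1.29, a 29% increase.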

Adjusted Odds Ratio

  • "Adjusted" means multiple predictors were included in the model.
  • Each OR is adjusted for the other predictors; e.g., the high school graduation OR accounts for illiteracy and murder rate.

Modelling Count Data with a Poisson Regression

  • Next, the modelling covers count data.

Count Data Types

  • The outcome is a numeric count.
  • "Over a given period of time, how often does Y happen?"
  • Number of days an individual binge drinks
  • Number of deaths that occurred over a year
  • Number of cigarettes someone smokes

Key Points Around Count Data

  • Assuming normality and using linear regression is usually acceptable only when the counts are large (greater than about 10).
  • The assumption cannot be made when counts are below about 10.

Defining Poisson

  • A Poisson distribution is defined by a single value: λ.
  • Both its expected value (mean) and its variance equal λ.
  • It becomes similar to a normal distribution when λ is greater than or equal to about 10.
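A quick numerical check of the mean-equals-variance property, using the Poisson probability mass function P(Y = k) = λ^k e^(-λ) / k! with an illustrative λ = 3 (Python sketch; the infinite support is truncated at k = 59, beyond which the remaining probability is negligible for this λ):

```python
import math

def poisson_pmf(k, lam):
    # P(Y = k) = lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
ks = range(0, 60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean, 6), round(var, 6))  # both should be approximately lam
```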

Identifying a Poisson Distribution

  • A possibly Poisson-distributed variable can be identified with the hist() function.
  • Expect a high concentration of values close to 0 and a skew to the right.

Generalized Linear Model for Poisson

  • ln(μ): if we let μ be the expected number of events for a given observation, we can use the log link.
  • If a variable is right-skewed, taking its logarithm makes its distribution closer to normal; the log link also keeps the predicted counts μ positive.
  • The model has the form ln(μ) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + ...
  • The error structure is assumed to follow a Poisson distribution.

Overdose Death Example

  • The histogram shows total OD deaths at the county level for the year 2018.
  • Values are concentrated close to 0.
  • This raises questions like: "How are poverty rates and adult educational attainment related to the number of overdose deaths?"

Running a Poisson Regression

  • Running it is quite similar to running a logistic regression.
  • The glm() function is used.

Changing the Family

  • The family argument changes to poisson.
  • In the OD deaths example, the number of people in each county/region matters, so it must be accounted for.

Adjusting with an Offset

  • An offset is required so the model accounts for the population of each area: counts are modeled relative to population.
  • With data from every area, family = poisson is combined with the offset (typically the log of the population).
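A sketch of what the offset does, assuming the common choice of log(population) as the offset and hypothetical coefficients for a single predictor (poverty rate). Because the offset's coefficient is fixed at 1, a tenfold larger population with the same covariates gives a tenfold larger expected count:

```python
import math

def expected_deaths(population, poverty_rate, b0, b1):
    # ln(mu) = ln(population) + b0 + b1 * poverty_rate
    # The offset ln(population) enters with coefficient 1, so b0 + b1*X
    # models the log of the per-person rate.
    return math.exp(math.log(population) + b0 + b1 * poverty_rate)

b0, b1 = -9.0, 0.05  # hypothetical coefficients
small = expected_deaths(10_000, 15, b0, b1)
large = expected_deaths(100_000, 15, b0, b1)
print(large / small)  # approximately 10.0
```

In R this is usually written with offset(log(population)) in the glm() formula, where population stands for whatever column holds each area's population.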

Results

  • The model uses a log scale, so the coefficients must be transformed to be interpreted.
  • ln(μ) = 𝛽0 + 𝛽1X1 + 𝛽2X2 + ... is the form of the model being fit.
  • To understand each predictor's relationship with the outcome, its beta coefficient is exponentiated.
  • This is the same approach as in logistic regression.

Incidence Rate Ratio

  • Because the coefficients are on the log scale, exponentiating them gives incidence rate ratios (IRR = e^𝛽): the factor by which the expected rate changes for each one-unit increase in X.
  • In the OD deaths example, all of the predictors had significant relationships with the outcome, interpreted through their IRRs.
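Converting a Poisson coefficient to an IRR with a Wald 95% confidence interval, sketched in Python with a hypothetical coefficient and standard error. Note that confint() on a glm in R uses profile likelihood by default, so its intervals can differ slightly from this Wald version:

```python
import math

beta, se = 0.05, 0.01  # hypothetical coefficient and standard error
irr = math.exp(beta)
# Wald interval on the log scale, then exponentiated
lower = math.exp(beta - 1.96 * se)
upper = math.exp(beta + 1.96 * se)
pct_change = (irr - 1) * 100
print(f"IRR = {irr:.3f} (95% CI {lower:.3f} to {upper:.3f}); "
      f"{pct_change:.1f}% change in the rate per unit of X")
```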

Poisson Assumption

  • The main assumption is that the mean and variance of the outcome are equal.
  • Often the outcome is overdispersed, meaning its variance is larger than its mean.
  • In that case, a negative binomial model should be run instead.
  • To check, look at the null and residual deviance in the model output.
  • A residual deviance much larger than its degrees of freedom suggests overdispersion, so a negative binomial model is needed.
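A rough first look at overdispersion is the ratio of the sample variance to the sample mean of the raw counts; a ratio far above 1 is a warning sign. This Python sketch uses hypothetical counts and ignores predictors, so it complements rather than replaces the deviance check in the fitted model:

```python
from statistics import mean, variance

# Hypothetical counts (e.g., events per area)
counts = [0, 0, 1, 0, 2, 5, 0, 1, 12, 0, 3, 0, 25, 1, 0]
m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)
print(f"mean = {m:.2f}, variance = {v:.2f}, ratio = {v / m:.2f}")
```

Under a Poisson model the ratio would be near 1; here it is well above 1.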

Running a Negative Binomial Model

  • glm.nb() (from the MASS package) fits a negative binomial model, estimating an extra dispersion parameter.
  • It uses the same formula, data, and log link as the Poisson model.
  • The deviance should now be much closer to its degrees of freedom.
  • Results from the NB model are interpreted the same way as with the Poisson model (exponentiated coefficients are IRRs).
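Why the negative binomial handles overdispersion can be seen from its variance function. Assuming the NB2 parameterization used by MASS::glm.nb (Var(Y) = μ + μ²/θ, with θ the dispersion parameter it reports as Theta), this Python sketch compares it with the Poisson variance μ:

```python
def poisson_variance(mu):
    # Poisson: variance equals the mean
    return mu

def negbin_variance(mu, theta):
    # NB2: Var(Y) = mu + mu^2 / theta; smaller theta = more overdispersion
    return mu + mu ** 2 / theta

mu = 4.0
for theta in (0.5, 2.0, 100.0):
    print(theta, poisson_variance(mu), negbin_variance(mu, theta))
```

As θ grows, the extra term vanishes and the negative binomial approaches the Poisson model.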

Reporting the Results

  • The output is written to a CSV file, a commonly used format.
  • When rate ratios are presented, they are adjusted for the other predictors in the model, even when those are not shown.
  • The main points to take from the results concern high school graduation rates and overdose deaths.

Modified Poisson

  • It is used as an alternative to logistic regression for binary outcomes.

Zou's Modified Poisson

  • Odds ratios are complex to interpret.
  • This approach reports relative risks (RRs) instead.
  • RRs offer a different, more intuitive view of the results.
  • The RR is the more commonly used and understood measure.

Steps

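  • Fit a Poisson regression (log link) to the binary outcome; the exponentiated coefficients are RRs.
  • Use the "sandwich" (robust) estimator for the standard errors, since the Poisson variance assumption does not hold for a binary outcome.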
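For the special case of a single binary exposure, the modified Poisson estimate of the RR reduces to the crude RR from the 2x2 table, and the sandwich variance of ln(RR) reduces to 1/a - 1/(a + b) + 1/c - 1/(c + d). This Python sketch (hypothetical counts) computes that RR and its robust 95% CI; with covariates, you would fit the model in R and use the sandwich package instead:

```python
import math

def modified_poisson_rr(a, b, c, d, z=1.96):
    # a, b = exposed events/non-events; c, d = unexposed events/non-events
    n1, n0 = a + b, c + d
    # With one binary exposure, exp(beta1) from a log-link Poisson fit is the crude RR
    rr = (a / n1) / (c / n0)
    # Robust (sandwich) variance of ln(RR) in this special case
    var_log_rr = 1 / a - 1 / n1 + 1 / c - 1 / n0
    se = math.sqrt(var_log_rr)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

rr, lo, hi = modified_poisson_rr(400, 600, 200, 800)  # hypothetical counts
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Unlike the OR, this RR needs no rare-outcome assumption: here the outcome occurs in 20% to 40% of each group.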

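Implementation

  • The code is the same as for a Poisson regression, and easy to use.
  • A simple version of the code is preferred.
  • Robust standard errors come from sandwich estimator code (e.g., the sandwich package in R).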

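Zou's Modified Poisson

  • The RRs can be easily extracted and viewed, as with the earlier results.

Extracting Results

  • The results are extracted step by step.

Rare Outcomes

  • In logistic regression, the OR approximates the RR only when the outcome is rare; the modified Poisson RR does not require this.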
