Advanced Statistical Analysis: Event History Analysis 3
University of Groningen
Clara Mulder
Summary
These are lecture notes on advanced statistical analysis, specifically event history analysis, given by Clara Mulder at the University of Groningen's Faculty of Spatial Sciences. Key topics covered include preparations for analysis, defining events and populations at risk, determining time units and end of observation, choosing a method, and various extensions and complications encountered in the analysis. The lecture materials focus on practical considerations and data preparation.
Full Transcript
Faculty of Spatial Sciences
Advanced Statistical Analysis: Event history analysis 3
Clara Mulder

This lecture
› General feedback assignment Cox Regression
› Reminder assignments
› More about preparation of analysis
› An alternative method: discrete-time event history analysis using logistic regression of person-period data
› Extensions: time-varying covariates, non-proportional hazards, multiple risks
› Complications: left censoring, non-random right censoring, periods of missing information

Assignment Cox Regression
› A lot of events before age 30 (highest hazard)
› Hazards cross at age 30
  • So… proportional?
› Odd findings above age 40
  • makes little sense; better to truncate (at 40? 30?)
  • Ask for confidence intervals: sts graph, hazard by(sex) ci
› Respondents clustered in households
  • So… robust cluster(nohhold)

Assignment Cox Regression
› Predicted cohort survival using stcurve: forced into proportionality

Assignment Cox Regression
› Interpretation of the finding for level of education
  • ‘Education strived for’

Reminder assignments
› Minimum: 4 out of 5 from the 1st 5 weeks, 1 out of 2 from the last 2 weeks
› Double-check whether you attended a sufficient number of practicals and passed a sufficient number of assignments (list will be on Brightspace in the folder Assignment 6).
› If you need to take a resit assignment: upload it in the resit assignment folder by 29 March 17:00 h. We will not grade resit assignments uploaded in the wrong folder.

Preparation: Define event
› Sometimes obvious, sometimes not. E.g.
  • Legal divorce or separation?
  • Birth or conception?
  • All moves or just migration (long distance)?
  • Move into homeownership or also becoming homeowner by transfer of ownership of current home?
› How to choose?

Preparation: Determine population at risk
› Those who can experience the event. Exclude those who cannot
› For change in status: ‘who cannot’ includes those who have already experienced the event (example: homeowners, divorced)
› For event without change in status: everyone, or some logical selection. Do 1st, 2nd, … have meaning?

Preparation: Determining time 0 (start of clock; risk period)
› For death: Birth
› For marriage: Age from which marriage is legal
› For divorce: Date of marriage (for some countries: after waiting time)
› For fertility: Age 15? Marriage? Start of partnership?
› For unemployment: Start of work history? Start of job?
› … etcetera

Preparation: Determining time unit
› Years? Months? Days? Hours? Minutes?
› Depends on process under study: compare
  • Mortality
  • Infant mortality (in 1st year)
  • Neonatal mortality (in 1st 28 days)
  • Divorce
  • Finding a job from unemployment
  • Arrival of next bus
› Depends on…

Preparation: Determining end of observation
Truncate observation when:
› Occurrence of event is logically no longer possible
› Population at risk gets too small to get reliable estimates of the hazard
› N events gets too small to get reliable estimates
How?
› Cox regression: replace eventindicator = 0 if T > T_end (manual censoring)
› recode T (T_end / [highest] = T_end)
› Discrete-time logit: keep if durationvar <= T_end

Preparation: Describing hazard/survival
› Survival analysis using Kaplan-Meier
  • Depicting the survival function
  • Depicting the hazard function
  • Calculating survival statistics (e.g. median survival time)

[Figure: First-time home-ownership by age: survival functions, Netherlands. From: Mulder & Wagner, Urban Studies 1998]

Preparation: Choosing a method
› Possibly a parametric method, if the hazard has a clear functional form and other assumptions are justified
› Cox regression, if the hazards are (mostly) proportional
› Discrete-time model: e.g. logistic regression of person-years (or person-months or …)

Choices in Mulder & Wagner:
› Event: moving into first owner-occupied home (included buying, inheriting, buying with help; did not include becoming owner without moving)
› Population at risk: renters and those living in parental home – never owned before
› Time 0: age 15 (could have been 18 but there were a few aged 15-17)
› Time units: years (could have been months but…)
› Method: discrete-time logistic regression

The discrete-time logistic regression (= logit) model
› Units of analysis: time units (e.g. person-years)
› Dependent variable: Event (1) or no event (0)
› Time units after event occurrence: not in analysis (drop or make sure to use … if)
› Covariates: time-constant or time-varying
› Duration: expressed in number of time units without event
› Hazard: assumed (piecewise) constant; exploring form possible through using duration variable and person file!

Satisfactory approximation of continuous time?
› O.k. if % of time units with event (= with value 1 in dependent) < 10
› So choose your time unit small enough
› Then the answer is: yes, go ahead!

Violation of assumption of independence of observations (is this a multilevel problem)?
› Standard errors determined by category with fewest observations in dependent variable
› No matter how many 0s we have in the dependent variable, there is always just one 1 (maximum). And you have made sure the category with value 1 is much smaller than the category with value 0
› So the answer is: no, go ahead (unless multiple episodes per person; then robust standard errors)!

Data preparation
› Construction of person-period file, usually from person file
  • Stata: reshape long

Analysis
› Stata: logit

Event history analysis: four extensions
› Through time, values of covariates may change in the course of the duration (‘time-varying covariates’)
› Through time, effects of covariates may change (non-proportional hazards)
› Multiple spells (> 1 event per person)
› Multiple risks (e.g. unmarried cohabitation and marriage, rather than just marriage)

Extension 1: Time-varying covariates
› Solution: Choose method that allows for these
  • Cox model: episode splitting.
  • Discrete-time logistic regression model: update value of covariate each time unit

Extension 2: Non-proportional hazards (effects change through time)
› Another example from Mulder & Wagner, Urban Studies 1998

[Figure: Hazard of transition to home-ownership by age, West Germany and the Netherlands (selected cohorts born 1900-1960). From: Mulder & Wagner, Urban Studies 1998]

What is the effect of dummy variable ‘country’?
› When assuming proportionality: probably around zero or positive if the Netherlands = 1
› Clearly wrong

Handling non-proportional hazards
› If proportionality assumption is violated:
› Cox Regression: Split episodes; then:
  • run model separately for younger and older ages
  • OR include a dummy young/old and an interaction with the covariate
› Discrete-time logistic regression: Include interaction of covariate with duration variable: country * age or country * age group

Extension 3: multiple spells
› Re-appearance in population at risk
› More than one spell / event per person
› If this is not rare: correct standard errors for clustering of spells in persons (multi-level for beginners…)
  • Stata subcommand: robust cluster(id-var) or vce(cluster id-var)

Extension 4: multiple risks
› e.g. from rent to other rent or to own
› Cox model: not possible as far as I know
› Discrete-time multinomial logistic regression: dependent 0 (no event), 1, 2, …
  • Stata: mlogit

Complication 1: left censoring
› Continuous-time methods including Cox model:
  • Left-censored cases useless

Complication 1: left censoring
› Discrete-time logistic regression:
  • If no duration dependence (constant hazard): no problem
  • Try to find substitute for duration variable
  • Include a duration variable, with a category ‘unknown’? Impute?

Reminder: right censoring
› Truncation of observation because of:
  • Panel loss (panel data)
  • End of observation (panel data)
  • The interview (retrospective data)
  • Occurrence of competing event (e.g. transition rent-own: renter moves back to parents): competing risks

Complication 2: non-random right censoring
› e.g. panel attrition because of a move while moving is the dependent variable
› Often ignored or just mentioned as a problem
› Model panel attrition as extra risk?

Complication 3: periods of missing information
› Common problem in panel data
› Similar to missing values in cross-sections… but more complicated
› No standard solution
› Copy values from previous time unit? Impute?

Assignment
› Same data as assignment last week (leaving home), but transformed into person-year file and with extra variables
› Reproduce last week’s results using logistic regression of person-years
› [Extra: include time-varying covariates for whether respondent has finished education and has entered the labor market (= has had a first job)]
› Extra: include interaction between age and sex to allow for non-proportional hazards

Assignment
› Note: data in this format NOT suitable for preparatory survival analysis; use person/episode data
› Part 2: Try to explain the transition to first-time home-ownership using information available in the data
› Use Mulder & Wagner 1998 in Urban Studies as inspiration if you want to
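
The person-year setup used throughout this lecture can be sketched in plain Python: one record per person per year at risk, the event coded 1 only in the year it occurs, optional truncation at T_end, and the check that fewer than 10% of records carry an event. This is only an illustration under stated assumptions. The lecture itself works in Stata (reshape long, then logit), and the records, the field names (start_age, end_age) and the helper functions below are hypothetical, not taken from the course data.

```python
from collections import defaultdict

# Hypothetical person file (NOT the course data): one record per person.
# start_age is time 0, end_age is the age at the event or at censoring,
# event is 1 if the event occurred at end_age and 0 if the case is censored.
persons = [
    {"id": 1, "sex": "f", "start_age": 15, "end_age": 24, "event": 1},
    {"id": 2, "sex": "m", "start_age": 15, "end_age": 30, "event": 0},
    {"id": 3, "sex": "f", "start_age": 15, "end_age": 27, "event": 1},
    {"id": 4, "sex": "m", "start_age": 15, "end_age": 29, "event": 1},
]


def to_person_years(persons, t_end=None):
    """Expand person records into person-year records.

    Each person contributes one row per year of age from start_age through
    end_age. The event indicator is 1 only in the final row of a person who
    experienced the event. If t_end is given, observation is truncated there
    and later events are treated as censored (manual censoring).
    """
    rows = []
    for p in persons:
        last, event = p["end_age"], p["event"]
        if t_end is not None and last > t_end:
            last, event = t_end, 0
        for age in range(p["start_age"], last + 1):
            rows.append({
                "id": p["id"],
                "sex": p["sex"],
                "age": age,
                "event": 1 if (event == 1 and age == last) else 0,
            })
    return rows


def event_share(rows):
    """Percentage of person-years in which the event occurs."""
    return 100.0 * sum(r["event"] for r in rows) / len(rows)


def hazard_by_age(rows):
    """Raw discrete-time hazard: events divided by person-years at each age."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for r in rows:
        at_risk[r["age"]] += 1
        events[r["age"]] += r["event"]
    return {age: events[age] / at_risk[age] for age in sorted(at_risk)}


rows = to_person_years(persons)
print(len(rows), "person-years;", round(event_share(rows), 1), "% with an event")
print({age: round(h, 2) for age, h in hazard_by_age(rows).items() if h > 0})
```

A logistic regression of event on age and covariates such as sex would then be fitted on these rows; in Stata that is the logit step named in the slides. Calling to_person_years(persons, t_end=28) shows the truncation variant: rows after age 28 are dropped and the event at age 29 becomes a censored case.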