Repeated Measures ANOVA Lecture PDF
Summary
This document provides a lecture on repeated measures ANOVA, a statistical method for analyzing data in which the same subjects are measured multiple times. It covers the basic concepts of ANOVA, including assumptions, types of designs (e.g., between-subject, factorial), and the interpretation of results, including effect size (η²).
Full Transcript
Repeated measures

Week 1 – Review of ANOVA

Between-factor one-way ANOVA = comparison of group means (independent populations). Tests whether the group means are equal. The null hypothesis says all means are equal; the alternative says at least one group mean is different.
SS partition: SST = SSG (between) + SSE (within).
ANOVA with two groups -> t-test (only one independent factor).
ANOVA: fixed number of groups, variable number of possible outcomes.

Point of a test: find out whether a difference is due to chance or is a significant difference.
P-value: indicates the significance of a factor (what is the probability of obtaining these or more extreme sample means if the means were equal in the population?).
Effect size: indicates the size of the effect (in ANOVA: how large is the difference between the groups in the population?).
η² = SS_effect / SS_total: proportion of variance explained by the effect.
Partial η²: proportion of variance explained, after accounting for variance explained by possible other factors.

The alternative hypothesis is simply "not the null hypothesis". When the null hypothesis is rejected, further inspection is needed: the groups are not all the same, so we need to know which groups differ.
- Multiple comparisons
  o Planned -> contrasts
  o Post hoc comparisons -> unplanned: no specific expectation, data-driven approach. Do all the contrasts.
Post hoc testing costs power: with multiple tests at the same time you have to adjust your alpha and work with a lower alpha. Otherwise you are at risk of concluding there is an effect when in fact there isn't.

Assumptions ANOVA
1. Independent observations
2. Within each group the scores are normally distributed
   Check per group: QQ-plot or test on skewness and kurtosis.
3. The variances of the scores are equal across all groups.
   Check sample variances between groups: max/min < 2 is OK.
   Levene's test: a significance test used to confirm H0, so a non-significant result is needed. It tests whether the variances are the same, which is what we need.
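The sum-of-squares partition and η² from this review can be checked by hand on a small data set. The following is a minimal sketch in plain Python; the three groups of scores are hypothetical, and the function name `one_way_anova` is chosen for illustration only.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return the SS partition, F and eta squared for a list of groups."""
    all_scores = [y for g in groups for y in g]
    grand_mean = mean(all_scores)
    # Between-groups SS: group means around the grand mean, weighted by n.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) SS: scores around their own group mean.
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # Total SS: all scores around the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ssg / df_between) / (sse / df_within)
    return {"SST": sst, "SSG": ssg, "SSE": sse, "F": f_stat, "eta_sq": ssg / sst}

# Hypothetical scores for three independent groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
result = one_way_anova(groups)

# Rule of thumb from the notes: largest / smallest sample variance < 2.
variances = [variance(g) for g in groups]
variance_ratio = max(variances) / min(variances)

print(result, variance_ratio)
```

Here SST equals SSG + SSE, and η² is simply the between-groups share of SST. The p-value for F would still have to be looked up in an F distribution, which this sketch leaves out.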
Experimental designs

3 characteristics
1. Manipulation of treatment levels -> groups are created
2. Random assignment of cases to levels (groups)
3. Control of extraneous variables
   - Hold them constant -> can be done by randomization
   - Counterbalance their effects
   - Turn them into an extra factor
When all hold, differences in scores are attributed to differences in treatment levels.

With statistics alone you can never prove a causal relationship. You can show a relationship, but not that it is causal. For that you need theories and accumulated evidence that supports them. Replication helps.

Between-subject designs
Differences due to treatments are tested between groups of subjects: different cases in every level.
Designs:
- Experimental: cases are randomly assigned to treatment.
- Nonexperimental: no random assignment.

Factorial designs: treatment levels are determined by more than one factor. Main effects of each factor, and interaction(s).

Factorial ANOVA / two-way ANOVA
Usually more than one factor. Why:
- Statistical reason: reduction of error variance. The variable is not the primary focus, but it would be foolish not to include it (medication vs therapy).
- Substantive reason: study the interplay between variables.

Sources of variance. Identifying them:
1. List each factor as a source.
2. Examine each combination of factors: completely crossed -> include the interaction as a source.
3. When an effect is repeated, with different instances, at every level of another factor -> include the factor as a source.
S: subjects. Variance due to chance: people are different and get different scores.

Effects in two-way ANOVA
- Main effect: the average of the simple effects for a certain factor. Most appropriately interpreted when there is no interaction. It averages over simple effects, so it can be misleading for specific effects.

N per cell
An equal number of subjects per cell is preferred: SS of effects and interactions are then orthogonal, effects are completely separated, and tests are independent.
Unequal cell sizes:
1. Regression approach: adjust each effect for all other effects to obtain its unique contribution.
2. Experimental method: estimate the main effects ignoring the interaction, estimate the interaction adjusting for the main effects.
3. Hierarchical approach: use an order in decomposing the effects.

Factorial-blocks design
With a minimum of two factors, one factor serves as a so-called blocking factor. The blocking factor is intrinsic to the subject and related to the dependent variable.
Purpose of blocking: drawing conclusions about each block, and error reduction. The blocking factor is not the main priority of the research, but you can look at it: you turn something into an extra variable. Instead of assigning purely at random and testing afterwards, match on the variable (e.g., gender) -> blocking. Randomize within men (and within women), and make sure the numbers are the same.

Two types of factors
- Experimental
- Blocking

Factorial-blocks design
- Randomized-blocks design
- Post-hoc block design

Randomized-blocks design
Homogeneous blocks of subjects are formed beforehand to reduce within-group variability. Increases statistical power for factor A.
Advantages: reduction of error variance; increased comparability of groups by assuring the block sizes are equal; interaction between factors can be detected.

Post-hoc blocking is blocking after collection of the data; it was not initially planned in the design. Problems are often unequal sample sizes and data fishing. If the blocking variable is continuous, blocking causes a loss of information.
With blocking you reduce the variance within groups by matching on preexisting group differences, which reduces the error variance.

Types of designs:
- Between-subjects designs: one or more factors in the experimental design. Different cases in each cell of the experimental design.
- Repeated-measures designs: each case participates in two or more treatment levels and is measured once at each level.
Within-subject designs: differences due to treatments are tested within the same set of subjects.

Effect size, p-value and power
Type 1 error: rejecting H0 when you should not have (false positive).
Type 2 error: not rejecting H0 when you should have (false negative).

Improve power:
- Increase sample size
- Increase α: risks more type 1 errors.
- Increase the effect size
- Decrease error variance: by making groups more homogeneous (blocking), or by adding factors and/or covariates to the model.

Power analysis
A priori analysis: compute sample size n as a function of the required power level, the pre-specified significance level, and the population effect size to be detected.
Retrospective power: compute the power level of a statistical test already carried out, as a function of the sample size, the significance level applied, and the sample effect size. Gives no additional information for explaining nonsignificant results.

Week 2 – ANCOVA and regression

Experimental control: want to eliminate confounding and minimize variability. Everything is held constant except the variable of interest: blocking.
ANCOVA is a special kind of blocking: continuous IV, a block for every 'age'. It is a way to reduce within-group variance by making the groups more homogeneous. You actively model what the data tell you, even though you don't really care about it and it is not the research question. By factoring in an extra variable you get a clearer picture -> more power.
ANCOVA is also a way to eliminate systematic differences (bias) between the groups.
Why use covariates -> reduction of error variance, increase in power. Use covariates to adjust means for differences.
ANCOVA in natural groups -> need to think about confounding variables.
A covariate is always a continuous variable (discrete -> blocking). To have an effect, the covariate should be correlated with the DV. The relationship needs to be linear; otherwise a transformation is needed. Use this dependency to make better predictions of the group means: adjust the group means.

ANCOVA: combine regression and ANOVA.
1. Perform a regression analysis to predict the dependent variable using the covariates -> the residuals of this analysis are 'corrected' values of the dependent variable.
2. Perform an ANOVA on the corrected dependent variable (residuals) to examine groups.

With ANCOVA you try to reduce within-group variation by working with corrected group means.
- Compute a regression line to model the association between x and y. This line has the same slope for all a groups.
- Determine the within-groups variance with respect to the regression lines.
- To calculate the between-groups variance, use the differences between the corrected (adjusted) means.
- Calculate the corrected means from the regression lines by filling in the total mean of the covariate (grand mean).
Assumption: the lines need to be parallel. The relationship between DV and covariate needs to be the same in all groups.
Overall the within-group variances are smaller, due to the corrected means.
SS error: unexplained variation within groups, measured around the regression line.
Testing adjusted means: SSA = SSE(H0) – SSE(Ha). With separate regression lines for the groups (Ha), SSE will be smaller than with a single line (H0).

ANCOVA: do not add covariates without considerable thought. Important: covariates should not correlate with one another.

Assumptions ANOVA:
1. Independent observations: design, intraclass correlation
2. Normally distributed error: scores (DV) in each group normally distributed. Skewness, kurtosis, PP-plot, histogram, boxplot.
3. Homogeneity of variances: sample SDs, Levene's test, BF-test
Extra assumptions ANCOVA:
4. Linearity
   Relation between y and covariate is linear.
5. Homogeneity of regression slopes
   Regression lines are parallel, groups have equal slopes.
6. The covariate is measured without error
   Important with natural groups.
Correction of violations: transformations, nonlinear ANCOVA.

Heterogeneous regression slopes indicate interactions between factors and covariates.
- Test the significance of the interaction to investigate the assumption of homogeneous regression slopes.
- A significant interaction effect indicates unequal slopes.
Unequal slopes can be modeled by including the interaction in the model. However, only equal slopes ensure that differences in means are matched by differences in height of the regression lines, so that H0 has the same meaning as in an ordinary ANOVA.

ANCOVA in randomized designs
Random assignment of subjects to groups: systematic differences between subjects are more or less equally divided over groups. No systematic differences in covariate means. => Primary effect is error reduction.
When a pre-existing classification is used (groups are defined by a classification factor), systematic differences between the groups may arise; these are partially reflected in the covariate. Groups from a pre-existing classification are known as natural/intact groups.
Non-randomized design: systematic bias may exist between groups that is not due to manipulation of experimental factors. Then equating is nonsense, and the interpretation of an ANCOVA is flawed.

Week 3 – MANOVA

ANOVA with more than one dependent variable.
Multivariate designs: more than 1 dependent variable; model the association between dependent variables.
Two reasons:
1. Some treatments (factors) affect subjects in more than one way.
2. Several criterion measures will provide a more complete and detailed description of the phenomenon under investigation.

Null hypothesis
The combination of means in group 1 (all outcome means) is equal to all outcome means in group 2. Rejecting H0 means the groups differ on at least one linear combination of the means.

Statistical reasons for multivariate analysis
1. Multiple univariate tests inflate the overall Type 1 error rate: capitalization on chance.
2. Univariate analyses ignore important information: the association between dependent variables (correlation).
   - Separate tests re-analyze the same variance.
   - Individual variables may show no significant effect, while jointly the variables do have an effect.
   => Power: the multivariate test may have more statistical power.
3. The use of a total score (which consists of several subtest scores) may not reveal significant effects due to a canceling-out effect.

Reasons for not using MANOVA
1. The technique does not answer all questions; univariate tests are still needed to follow up a significant result.
2. Small or negligible differences on 'badly chosen' variables may obscure real differences on other (more important) variables.
3. Arbitrarily chosen variables (with even up to moderate correlations between DVs) can decrease the power of the multivariate test.
=> Carefully select the variables to include.

Assumptions
- Vectors Y1 and Y2 have a multivariate normal distribution with means μ1 and μ2 and covariance matrix Σ.
- The covariance matrix is constant across groups.
- k samples with sizes n1, …, nk.
- Linear relation between all DVs.

What does significant (overall effect) mean?
There is at least one linear combination of dependent variables for which at least some of the groups differ in population means.

Follow-up analyses after a significant MANOVA
Which variables and/or groups cause the effect?
- Variables: a univariate ANOVA for each separate variable
- Groups: post-hoc procedures or contrasts, visual inspection

k-group MANOVA
Univariate ANOVA: partitioning of total variance: SStotal = SSbg + SSwg
Multivariate ANOVA for k groups: partitioning of the total covariance (matrix): total SSCP = between SSCP + within SSCP => T = B + W

Wilks' lambda
Determinants of SSCP matrices are generalized variances. Wilks' lambda, Λ = |W| / |T|, gives the proportion of unexplained variance. It is a measure of badness of fit.
If B = 0, there is no treatment effect and Λ = 1.
If W = 0, there is no within-group dispersion and Λ = 0.
Lower lambda is better.
The distribution of lambda is complicated. Approximations:
- With the chi-square distribution, with p(k-1) df.
- With the F distribution, with df that may be non-integer.
- SPSS uses F, which is better for small n.
- F is exact for some values of p and k.
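To make the T = B + W partition and Wilks' lambda concrete, here is a minimal sketch in plain Python for the special case of two dependent variables, so the determinants are 2x2. The data and all function names are hypothetical, and it assumes the usual definition of lambda as det(W) / det(T).

```python
def col_means(rows):
    """Mean of each of the two dependent variables."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def sscp(rows, center):
    """2x2 sums of squares and cross products of rows around `center`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for a in range(2):
            for b in range(2):
                m[a][b] += d[a] * d[b]
    return m

def det2(m):
    """Determinant of a 2x2 matrix (the generalized variance)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Lambda = det(W) / det(T) for groups of (y1, y2) observations."""
    all_rows = [r for g in groups for r in g]
    total = sscp(all_rows, col_means(all_rows))   # T: around the grand means
    within = [[0.0, 0.0], [0.0, 0.0]]             # W: pooled over groups
    for g in groups:
        wg = sscp(g, col_means(g))
        for a in range(2):
            for b in range(2):
                within[a][b] += wg[a][b]
    return det2(within) / det2(total)

# Hypothetical data: two groups, two dependent variables per subject.
groups = [
    [(1, 2), (2, 1), (3, 3)],
    [(4, 5), (5, 4), (6, 6)],
]
lam = wilks_lambda(groups)
eta_sq = 1 - lam  # multivariate effect size as used in these notes
print(lam, eta_sq)
```

With identical groups the between-groups matrix B is zero and lambda is 1; the further the group mean vectors are apart relative to the within-group spread, the closer lambda gets to 0.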
Other statistics
- Roy's largest root
- Hotelling-Lawley trace
- Pillai-Bartlett trace
They tend to give similar answers. When they differ: Wilks, Pillai and Hotelling are about equally robust against violation of the assumption of homogeneity of covariance matrices, provided that group sizes are approximately equal. In some situations Roy has more power, but the differences in power between the four statistics are small.

The p-value reports significance, the effect size reports relevance.
Effect size η² = 1 – Λ -> interpretation similar to R² in regression: proportion of explained variance. However, the sum of η² over all effects might exceed one.
Partial η² = 1 – Λ^(1/s), with s = min(number of DVs, hypothesis df).

Roy-Bargmann stepdown analysis
Method for selection of important DVs. Conceptually similar to backward elimination in multiple regression.
Procedure:
1. Rank order the DVs, based on theoretical considerations or on effect sizes in separate ANOVAs.
2. Do a univariate ANOVA on the most important DV. Significant: select the DV and go to step 3. Else, stop.
3. Do an ANCOVA with the next most important DV as DV, and the selected DV as covariate. Significant: include the DV, repeat step 3.

Model assumptions in (M)ANOVA
1. Independent observations
   No relationship between cases.
   Effect of violation: estimators of standard errors are generally too small, thus the test is liberal.
   Detection of violation: design, intraclass correlation.
   Correction of violation: test at a more stringent level of significance (smaller α -> decreased power), or use multilevel modelling (MLM).
2. Distributional assumptions within each group
   ANOVA: observations follow a normal distribution.
   MANOVA: multivariate normal distribution of the dependent variables.
   Detection: check the marginal distributions of the individual variables (univariate normal distributions). Check the bivariate distributions of all pairs of variables (bivariate normal distributions). Check with graphs, statistics, and tests (Shapiro-Wilk). Check whether the scatterplot for each pair is elliptical; an ellipse means association.
   Correction: transformation of the DV, collect more data, data trimming, check for outliers, MLM.
3. (Co)variance assumptions within each group
   ANOVA: population variances are equal.
   MANOVA: within-group covariance matrices are equal.
   Effect: for equal group sizes the actual α levels are very close to the nominal α levels (robust). For unequal group sizes: F is liberal (large variances in small groups) or conservative (large variances in large groups).
   => Balanced designs are very important.
   Checking: visual (compare matrices), Box's M test.
   Correction: transformation of individual variables to stabilize the variances. MLM.
4. Linear relation between all DVs
   ANOVA: n.a.
   Checking: scatterplot of Yi vs Yj.
   Correction: transformations, nonparametric MANOVA; if only one out of many DVs misbehaves, remove it.

Post hoc comparisons
Univariate ANOVAs as follow-up of MANOVA.
When testing several hypotheses, adjust the critical values of the test -> Bonferroni procedure: α/m (m = number of tests).

Contrasts in MANOVA
Specify the comparison under contrasts or via the L-matrix in the GLM module, or make your own transformed variables.
Key idea: test a combination of effects. The fewer tests, the better. Prefer contrasts over multiple comparisons.

Week 4 – Difference scores vs ANOVA

Examples of repeated measures: performance across k conditions, with the simplest design k = 2: pre- and post-measurement of a one-group sample. Or performance across time with at least k = 3.

Possible analyses
- Separate ANOVAs for each time point. Insight into the between-factor effect at each time point separately. No insight into the time effect.
- Paired t-test for each pair of time points. k = 2: OK, or use ANCOVA. For k > 2 not optimal because of multiple hypothesis testing, reduced power, and disregarding the association between more than two measures.

k = 2: pretest-posttest designs
1 within-subject factor (time) with two levels: thus two repeated measures on the same subject.
Difference score: d = y1 – y0
One group, thus no between-factor: H0: μd = 0, the population mean of the difference scores is 0.
Equivalent to a matched t-test and to a within-subjects ANOVA with 1 within-subject effect.

ANCOVA with pretest as covariate
The post-measure is regressed on the pre-measure -> implies working with a corrected mean. The posttest mean μ1 is corrected for the pretest mean μ0 using linear regression: you try to predict the post-score with the pre-score.
ANCOVA: y* = y1 – b1·y0
Difference score: implicit assumption b1 = 1. This is usually not the case. In ANCOVA b1 is estimated optimally, which reduces the error variance and increases power.
But ANCOVA has to be a valid approach:
- Randomized designs: ANCOVA is valid.
  Random assignment of subjects to groups; at the population level there are no differences in pre-measures between groups.
  ANCOVA and gain score analysis test the same hypothesis and estimate the same group differences. ANCOVA provides more power and precision than ANOVA on gain scores, because the error variance is smaller, and is thus preferred.
- Quasi-experimental design with natural groups: ANCOVA possibly invalid.
  Mean pre-measures are not necessarily the same. ANCOVA assumes they are; difference score analysis compares the groups as they are. This may lead to different results. The range of scores may also differ between groups.
  => Lord's paradox

Lord's paradox is when different statistical approaches lead to seemingly contradictory conclusions about the effect of a treatment. The crucial question is whether group membership is unrelated to the pretest score. If it is related, ANOVA on gain scores may show no change while ANCOVA shows a difference between groups.

Mixed design, k = 2 time points
Within-subject factor with k = 2 levels, and 1 between-subject factor (group).
ANOVA on difference scores. Equivalent to within ANOVA testing, with 1 within effect and 1 between effect.

Mixed design, k > 2 time points
Within-subject factor with k > 2 levels, and between-subject factors.
Possible analyses
- Within-subjects ANOVA
  Consider RM as a block design. Blocking factor: subjects. Block on subjects to remove the variability due to subjects from the error variance.
- Profile analysis
- Multilevel analysis

A standard ANOVA does not take into account the dependency between observations within one subject that is due to the specific properties of that subject: the subject must be regarded as a factor. Consider subject as a separate factor: blocking.
The univariate approach splits the within-groups variability SSs(k) into two parts:
1. Interaction of individual differences with treatments, SSsk
2. Individual differences due to subjects, SSs
Blocking on subjects: remove the subject mean, taking individual variation out of the equation.
Subject SS: measures consistent differences between subjects that affect subject means.
Treatment SS: within-subject effect, and therefore requires a within-subject error term.
Interaction MS: reflects the extent to which subjects respond differently to treatments.

Assumptions RM-ANOVA
1. Independent observations
2. Univariate normality
   Per subject and time point: impossible to assess, since there is only one observation per subject and time point.
   Per time point across subjects: possible to assess.
3. For k > 2: sphericity. For all difference variables between all pairs of the k repeated measures, the variances are equal.
   Var(Drug1 – Drug2) = Var(Drug2 – Drug3) = Var(Drug3 – Drug4) = etc.
   Test with Mauchly's test W. Problem: sensitive to departures from normality, lack of sensitivity to small violations.
   If violated: F is too liberal (rejects falsely too often) -> epsilon correction.

Epsilon correction
Adjust the degrees of freedom with Greenhouse & Geisser (GG) or Huynh & Feldt (HF).
GG: quite conservative (df become too small).
HF: quite liberal.
For large n they are usually the same; for small n GG is safer.
Epsilon (ε) is the extent to which the covariance matrix deviates from sphericity. The df of F are multiplied by ε.
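As an illustration of the epsilon correction, the sketch below estimates the Greenhouse-Geisser epsilon from the sample covariance matrix of the k repeated measures and rescales the degrees of freedom. It uses the standard closed-form estimate based on the double-centered covariance matrix, which is an assumption here since the notes do not give a formula; the data and function names are hypothetical.

```python
def covariance_matrix(data):
    """Sample covariance matrix (k x k) of an n-subjects by k-measures table."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
            for b in range(k)
        ]
        for a in range(k)
    ]

def gg_epsilon(cov):
    """Greenhouse-Geisser estimate of epsilon; 1 means sphericity holds."""
    k = len(cov)
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / k**2
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = k**2 * (mean_diag - grand_mean) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(m * m for m in row_means) + k**2 * grand_mean**2
    )
    return numerator / denominator

# Hypothetical scores: 5 subjects measured under k = 3 conditions.
scores = [[1, 2, 4], [2, 3, 7], [3, 5, 9], [4, 4, 12], [5, 7, 15]]
n, k = len(scores), len(scores[0])
eps = gg_epsilon(covariance_matrix(scores))

# Corrected degrees of freedom for the within-subject F test.
df_effect = eps * (k - 1)
df_error = eps * (k - 1) * (n - 1)
print(eps, df_effect, df_error)
```

The estimate always lies between 1 / (k – 1) and 1. A covariance matrix with equal variances and equal covariances satisfies sphericity and gives exactly 1, so no correction is applied in that case.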
Week 5 – Profile Analysis (RM-MANOVA)

If sphericity is violated -> RM-ANOVA with epsilon correction, or RM-MANOVA.

Types of designs
1 within-subject factor
- Test of flatness (= main effect of time)
1 within-subject factor and 1 between-subject factor
- Test of flatness (= main effect of time)
- Test of parallelism (= interaction time x gender)
- Test of difference in levels (= main effect of gender)

1 within-subject factor
Profile analysis with 1 within-subject factor with k levels equals analyzing k – 1 transformed measurements with MANOVA.

Transformed measurements
MANOVA on the (k – 1) transformed variables
- Each is a particular linear combination of the k original measurements
  o H0: the means of the transformed variables are equal to 0
  o e.g., T1 = Y1 – Y2; T2 = Y3 – Y2; μ(T1) = μ(T2) = 0
- The k – 1 transformed variables may or may not be dependent; hence it is not assumed that the transformation takes care of all dependence between the measurements contributed by one subject.

Which k – 1 transformed variables have to be used?
- Many equivalent choices, not all orthonormal, some via dummy contrasts:
  o Polynomial transformations (contrasts), which are standard in SPSS
  o Difference scores of consecutive time points
- Invariance property: the multivariate test statistic is the same for equivalent choices.
Coefficients are orthonormal if the coefficients of each contrast sum to 0, the cross products between contrasts are 0, and the length of each coefficient vector is 1.
Profile analysis is about the trajectory over time.

Fixed and random effects
Fixed effect ANOVA:
- The only interest is in the effects of the groups measured in the study.
- Effects μj are estimated for all groups j.
- 1 source of random variation: eij.
Random effect ANOVA:
- Groups in the study are conceived as a random sample from the population of groups.
- Assumption:
- 2 sources of random variation: eij and aj.
Choice depends on
- Focus of statistical inference: research question
- The nature of the set of N groups: generalizability
And is limited by
- Sample sizes at all 'random effects levels'
  1. Number of groups
  2. Magnitudes of the group sample sizes nj
500 employees from 3 companies -> fixed effect ANOVA, or regression analysis with company as predictor using dummies.
500 employees from 40 companies -> random effect ANOVA or MLM.

Week 6 – Multilevel model

Different levels / a hierarchy of levels. Can be, for example, different skill levels and speed. The key distinction is the difference between fixed and random effects.
Multilevel is a special regression model: different regressions for different levels, slid into one another. Not necessarily the same number of observations per level 2 unit.
Samples are drawn at two levels. Both level 2 and level 1 units are conceived as random samples from their populations.
Level 1 observations are dependent within level 2 units: individuals within one class are more alike than individuals from different classes. The assumption of independent observations is therefore violated within level 2 units.
If level 2 units are groups:
- Model relations between groups
- Model relations within groups
- Both levels are modeled as random effects
- Represent within-group and between-group relations in a single model.
For repeated measures: level 2 units are individuals; in the previous points, replace groups by individuals.

Level 2 units as groups
Linear regression
Assumptions:
- Independent observations
- Linear relation between y and x
- Error term has constant variance σ².
- Error term e is normally distributed with mean 0 and variance σ², independent of x.
- Error term represents random differences between observations, summarized by the variance.
Notation of the multilevel model with 2 levels
- j = 1, …, N groups (level 2 units)
- i = 1, …, nj individuals in groups (level 1 units)
- dependent variable yi -> yij
- predictor at level 1 (individuals) xi -> xij
The basic idea of multilevel modeling is that the outcome variable yij has an individual as well as a group aspect.
Multilevel is the 'ideal' solution: not separate regressions per group j, but one model in which differences in coefficients (between groups) are modeled using random coefficients. Each group has its own intercept and slope.

Random intercept model
The slope is the same but the intercept differs: base rate differences in the dependent variable.
1. Differences between individuals
2. Differences between groups
- β0j -> intercept varies across groups.
- γ00 -> mean intercept in the population of groups.
- var(β0j) = var(u0j): the variance of the intercept reflects the range of differences in intercepts across groups. It is small when there are small differences between groups.
Why not use ANCOVA, with x as a covariate and group as a factor? In ANCOVA you treat the group effect as fixed. In MLM the effect is random -> you want to factor it in, but it is not the main focus of the research. A random effect ANCOVA is the same as the random intercept model.

Random slope model
Adds β1j = γ10 + u1j
- β0j and β1j -> intercepts and slopes vary over groups.
- γ00 and γ10 -> mean intercept and slope in the population of groups.
- var(β0j) = var(u0j) and var(β1j) = var(u1j) -> the variances of intercept and slope indicate the 'range' of plausible differences.
- Also cov(β0j, β1j) -> covariance between intercept and slope.
Assumptions
- (u0j, u1j) and eij are independent.
- γ00 and γ10 are fixed parameters. τ0², τ1², τ01 and σ² are the (co)variance parameters of the random part.
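Written out as equations, the two models above take the following form. This is a sketch in conventional multilevel notation, assuming β for the group-specific coefficients, γ for the fixed population means, u for the group-level deviations, and τ and σ² for the (co)variance parameters; the normality of the level 1 error is carried over from the linear regression assumptions, and the symbols on the original slides may differ.

```latex
\begin{align*}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij},
  & e_{ij} &\sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j},
  & \operatorname{var}(u_{0j}) &= \tau_0^2 \\
 & \beta_{1j} = \gamma_{10} + u_{1j},
  & \operatorname{var}(u_{1j}) &= \tau_1^2, \quad
    \operatorname{cov}(u_{0j}, u_{1j}) = \tau_{01} \\
\text{Combined:}\quad & y_{ij} = \gamma_{00} + \gamma_{10} x_{ij}
  + u_{0j} + u_{1j} x_{ij} + e_{ij}
\end{align*}
```

The random intercept model is the special case in which every group shares the same slope, i.e. u1j = 0 and β1j = γ10, so only τ0² and σ² remain in the random part.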
Statistical tests in the MLM
- Fixed regression coefficients -> usual t-test or likelihood ratio test
- Random coefficients -> likelihood ratio test, only for nested models

Level 2 units as individuals – repeated measures version of MLM
Two-level structure of measurements within individuals
- Level 1: measurements, at different time points or occasions.
  Explanatory variables: 1. (typically) time itself; and/or 2. time-dependent variables.
- Level 2: individuals
  Explanatory variables: individual characteristics
- Cross-level interactions
  Time by individual characteristic. Do males show a different growth pattern than females?
Research questions typically involve change/development
- Level 1 (intra- or within-individual): how does the outcome change/develop over time?
- Level 2 (inter- or between-individual): can differences in the changes be modeled or predicted?
Need at least 3 measurements to model linear change.

Time as explanatory variable at level 1 can be included via dummy variables or as a function of time.
Dummy variables
- γ00: intercept; expected mean of y at the age of 1
- γ10: slope for D1; difference in expected mean of y between the ages of 1 and 1.5
- γ20: slope for D2; difference in expected mean of y between the ages of 1 and 2
Linear function
- γ00: intercept; expected mean of y at the age of 1
- γ10: slope for (AGE – 1); difference in expected mean of y between ages 1 year apart
Often a more flexible model is needed, such as a random slope model.

Null hypotheses
On the fixed part of the model:
1. Between subjects: effect of PROGRAM at the age of 1
   - There is no program effect at age 1, H0: γ01 = 0
   - No difference in intercepts between groups
2. Within subjects: effect of time
   - There is no time effect for the no-program group, H0: γ10 = 0
   - No slope: a horizontal line over time
3. Within subjects: interaction effect
   - There is no interaction effect, H0: γ11 = 0
   - Effect of time is the same for both groups

Week 7 – Missing Data, RCTs and complex interventions

Missing data
Missing datum: no score when a score was planned to have been gathered.
- From the sampled participants: participant refuses to participate or does not show up.
- Participant cannot or refuses to 'deliver' a score.
- Loss of data, e.g., due to computer failure.
- Repeated measures design: drop-out of the study.
The reason why data are missing matters: if it is related to the study, it is more harmful. If you have a computer failure and lose 10 percent irrespective of what you are researching, that is not as harmful.

Consequences of missing outcomes
1. Effective sample size smaller
   Implies: statistical tests have lower power, and the statistics of interest have larger standard errors. This always occurs with missing data, no matter how the loss of data is related to the research.
2. Possibly biased result: the statistical tests do not relate to your population of interest, but to the population that your sample with observed data represents.
   Example: data lost due to computer failure, no bias. However, if 10% refuse to answer, the result is biased: people with higher income are more likely to refuse, so the mean income of the 90% who respond is lower than the actual mean income.
The consequence of missing data depends on the nature of the missing datum.

Little & Rubin's classification of missing data, from least severe to most severe
1. Missing completely at random (MCAR)
   Only the effective sample size is smaller, no bias. Truly unrelated to any aspect of your study.
2. Missing at random (MAR)
   Missingness may depend on observed variables, for example a different drop-out rate in the CBT and SSRI groups, but it is unrelated to the dependent variable (the outcome measure) itself.
   Effective sample size smaller; bias can be avoided with proper imputation or a model (using observed covariates).
3. Not missing at random (NMAR)
   Effective sample size smaller, and bias. For example, the more somber people drop out, or the people for whom the treatment didn't work anyway drop out.

How do you diagnose the problem?
Not always possible; you can't always know which kind of missingness occurred.

Practical implications
- Do all you can to prevent missing data.
- If missing data occur, keep note of the reason(s).
- In planning the sample size of your study (power analysis), take into account expected attrition; be conservative.

How to deal with it
Participants without any observed score:
- Exclude from the analysis, report them (number, causes of missing data), and reflect upon potential biases in the results that you did observe.
Participants with some observed scores:
- Omit from the analysis (complete case analysis): may yield bias. Only OK for very small numbers missing.
- Impute (make an educated guess of) their missing outcome data: if done correctly, may eliminate bias.
- Use an analysis that can deal with this (e.g., multilevel analysis for repeated measures, with some measurements of the outcome measure missing): if done correctly, may eliminate bias.

Impute missing outcome data
- Prior knowledge
- Impute unconditional means: filling in means
  o May introduce bias
  o Reduces variability
- Using conditional means: filling in model predictions (e.g., regression imputation). The mean is corrected for this participant, using the information available; e.g., if the participant's pre and post scores were higher than the mean, this is taken into account.
  o If the model is proper, then no bias
  o Reduces variability: you are still saying that the participant behaves in this predictable way.
- Using conditional distributions: filling in draws from the distributions of the model (e.g., filling in predictions plus random error).
  o If the model is proper, then no bias and no reduction of variability.

Single imputation: fill in the missing data with the imputed values. Do this once.
Because our guessed value of the missing data is based on the observed data, we now end up with an analysis that uses the observed data twice.
- Standard errors are underestimated, p-values incorrect.
- The larger the proportion of missing data, the larger this issue.
Multiple imputation: do this a number of times.
- Analyse the multiple completed data sets as planned
- Combine the results
- Expresses both the uncertainty due to sampling fluctuations (as always) and the uncertainty due to missingness

Use an analysis that can deal with missing data
Expectation maximization methods, e.g., multilevel analysis for repeated measures, with some measurements of the outcome measure missing. Estimates are made based on all available data. Estimates are biased when participants with data have a different pattern than participants without data. Does not work under NMAR.

Missing data handling in practice
1. Prevent missing data as much as possible.
2. If it occurs, diagnose the reason for missingness for each case.
3. Data analysis
- Very few missing: OK.
- Few missing: single imputation.
- More than a few: depending on nature and type. Complete cases missing: describe them separately. NMAR: multiple imputation.

Randomized controlled trials (RCT)
Principles
Clinical lingo for an experiment. It is an experiment because we sample from a population, randomly assign people to two conditions, assess the outcome and potentially do a follow-up assessment. An RCT is often control versus intervention.
Alternative trial designs:
- Factorial design: 2x2 ANOVA. Two trials for the price of one. More efficient: rather than doing two separate experiments, you do two at the same time. Disadvantage: not the cleanest assessment of whether drugs A and B work, or whether they interact.
- Within-group designs: repeated measures set-up. Disadvantage: influence of learning effects? Regression to the mean?
- Cross-over designs. Advantage: minimizes potential for confounding. Increases power: a smaller N is required.
Disadvantage: doubling of the duration, potential carryover effects. (Washout means waiting until the effect is over, or doing something in between so that the effects are over.)

Mechanisms of change
Predictor: variable that predicts the outcome of an RCT across all conditions.
Moderator: for whom and under what circumstances treatments have different effects. Kind of like an interaction. Influences the strength of the relationship. E.g., treatment is more effective in women than in men, but there is no gender effect in the control condition.
Mediator: how and why treatment takes effect. Pathways indicating causal relationships.

Follow-up and adherence to protocol
1. Choose participants who are likely to adhere to the protocol
2. Make the intervention simple
3. Make study visits convenient and enjoyable
4. Make study measurements painless, useful, and interesting
5. Encourage participants to continue in the trial
6. Find participants who are lost to follow-up

Monitoring clinical trials
1. Stopping for harm: ensuring that no harm occurs to participants. Make the intervention simple.
2. Stopping for benefit: stopping when clear benefit has been shown.
3. Stopping for futility: stopping when there is a very low chance of answering the question.

Non-adherence to protocol
Intention-to-treat versus per-protocol. You assign some people to control and some to intervention, but this can change during the trial. People may be part of the intervention group but not do the intervention. They are then effectively in the placebo group even though they were assigned to the intervention. Do you treat these people according to how they were assigned, or according to what they wound up doing? If you analyse per-protocol, the groups are no longer randomly assigned: participants decided themselves to be part of the placebo group by not doing the intervention.
Intention-to-treat: analyse the data according to how people were assigned, i.e., in the conditions as assigned.
Per-protocol: analysis based on what people have actually done.
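The difference between the two analyses can be sketched on a toy data set. Everything below is an illustrative assumption: the scores are invented, the names `participants` and `mean_difference` are hypothetical, and per-protocol is implemented here simply by dropping participants who did not follow their assigned protocol.

```python
from statistics import mean

# Hypothetical toy data: (assigned_arm, adhered_to_protocol, outcome_score).
participants = [
    ("intervention", True, 8),
    ("intervention", True, 9),
    ("intervention", True, 7),
    ("intervention", True, 8),
    ("intervention", False, 2),  # assigned to intervention, did not do it
    ("intervention", False, 3),  # assigned to intervention, did not do it
    ("control", True, 3),
    ("control", True, 2),
    ("control", True, 4),
    ("control", True, 3),
    ("control", True, 2),
    ("control", True, 4),
]


def mean_difference(rows):
    """Mean outcome of the intervention arm minus mean outcome of the control arm."""
    intervention = [score for arm, _, score in rows if arm == "intervention"]
    control = [score for arm, _, score in rows if arm == "control"]
    return mean(intervention) - mean(control)


# Intention-to-treat: everyone is analysed in the arm they were assigned to.
itt_effect = mean_difference(participants)

# Per-protocol (as assumed here): only participants who adhered are analysed.
pp_effect = mean_difference([row for row in participants if row[1]])

print(f"ITT estimate:          {itt_effect:.2f}")
print(f"Per-protocol estimate: {pp_effect:.2f}")
```

In this made-up data the non-adherers in the intervention arm have low scores, so keeping them in their assigned arm pulls the intention-to-treat estimate below the per-protocol estimate. The per-protocol comparison, in turn, is no longer between randomly assigned groups.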
Intention-to-treat likely provides a conservative estimate of treatment effects (tendency to underestimate the full effect of a treatment), and per-protocol a liberal estimate (tendency to overestimate). You can also apply and report both.

RCT: effective, compared to what?
Active treatment withheld
- No treatment: simple and cheap, controls for time, testing, regression to the mean. Minus: ethical issues, may lead to drop-out, may lead to independent treatment seeking.
- Waitlist: give one group the intervention now, and the other group in three months. Plus: guarantees treatment, some control for expectancy effects. Minus: ethical issues related to delaying treatment, not indicated for long-term follow-up.
- Placebo. Plus: good control for expectancy benefits, good control for non-specific treatment effects, allows assessment of adverse effects. Minus: ethical issues of irrelevant treatment, not double-blind in experience-based studies.
- Usual care: some kind of care you are providing to all patients anyway; half of the participants continue the help they are already getting, and the other half gets the new care. Plus: most acceptable to patients and treaters, flexible to replace or superimpose usual care. Minus: requires large N to achieve adequate power, usual care is highly variable within and between institutions.
- Devised usual care: try to match the intervention. Plus: guarantees minimum treatment, usually acceptable to participants. Minus: requires large N to achieve adequate power, may not be acceptable to participants if too minimal (reduced too much).
Active treatment comparisons
- Dose control. Plus: more ethical than no treatment, examination of dose-response relationship. Minus: dose variation not always possible, limited hypothesis testing, requires large N to achieve adequate power.
- Dismantling design: slim down the active treatment. Plus: acceptable to participants, precise examination of active ingredients, for treatments with theoretical justifications.
Minus: requires a priori knowledge of active ingredients, requires large N to achieve adequate power.
- Equivalence trial: find out if a treatment is comparable, not if something works better or worse. Plus: acceptable to participants. Minus: may compromise internal validity, requires large N to achieve adequate power, requires significant resources.

Complex interventions
An intervention is complex because of its properties, e.g.:
- Number of components involved
- Range of behaviors targeted
- Expertise and skills required by those delivering and receiving the intervention
- The number of groups, settings, or levels targeted
- The permitted level of flexibility of the intervention or its components.

External vs. internal validity
Efficacy: does the intervention produce the intended outcomes in experimental or ideal settings? Conducted under idealized conditions; maximizes internal validity to provide a precise, unbiased estimate of efficacy.
Effectiveness: does the intervention produce the intended outcomes in real-world settings? The intervention is often compared against treatment as usual. Results inform choices between an established and a novel approach to achieving the outcome.
Theory-based: what works in which circumstances, and how? Aims to understand how change is brought about, including the interplay of mechanism and context. Can lead to refinement of theory.
Systems: how do the system and the intervention adapt to one another? Treats the intervention as a disruption to a complex system.