Repeated Measures ANOVA Lecture PDF
Summary
This document provides a lecture on repeated measures ANOVA, a statistical method for analyzing data in which the same subjects are measured multiple times. It covers the basic concepts of ANOVA, including assumptions, types of designs (e.g., between-subject, factorial), and the interpretation of results, with particular attention to effect size (η²).
Full Transcript
Repeated measures

Week 1 – Review of ANOVA

Between-factor one-way ANOVA = comparison of group means (independent populations). Tests whether the means of the groups are equal. The null hypothesis says all means are equal; the alternative says at least one group differs.
SS partition: SST = SSG (between) + SSE (within).
ANOVA with two groups -> t-test (only one independent factor). ANOVA: fixed number of groups, variable number of possible outcomes. Point of a test: find out whether a difference is due to chance or is a significant difference.
P-value: indicates the significance of a factor (what is the probability of obtaining these or more extreme sample means if the means were equal in the population?).
Effect size: indicates the size of the effect (in ANOVA: how large is the difference between the groups in the population?). η² = SS_effect / SS_total: the proportion of variance explained by the effect. Partial η²: the proportion of variance explained after accounting for the variance explained by other factors.
The alternative hypothesis is simply the negation of the null hypothesis. When the null hypothesis is rejected, further inspection is needed: knowing only that the groups are not all the same, we still need to know which groups differ.
- Multiple comparisons
o Planned -> contrasts
o Post hoc -> unplanned; no specific expectation, a data-driven approach: do all the contrasts.
Post hoc comparisons cost power: with multiple tests at the same time you have to adjust your alpha and work with a lower alpha. Otherwise you risk concluding there is an effect when there in fact is none.

Assumptions of ANOVA
1. Independent observations.
2. Within each group the scores are normally distributed. Check per group with a QQ-plot or a test of skewness and kurtosis.
3. The variances of the scores are equal across all groups. Check the sample variances between groups: max/min < 2 is OK. Levene's test uses a significance test to confirm H0 (a non-significant result is needed): it tests whether the variances are equal, which is what we require.
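The partition SST = SSG + SSE and the effect size η² can be sketched numerically. A minimal sketch with invented scores (not from the lecture):

```python
# Sketch of the one-way ANOVA partition SST = SSG (between) + SSE (within)
# and eta squared, on made-up data for three groups.
import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([8.0, 9.0, 10.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

sst = ((all_scores - grand_mean) ** 2).sum()                      # total SS
ssg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups

assert np.isclose(sst, ssg + sse)   # the partition holds exactly

eta_squared = ssg / sst             # proportion of variance explained
print(eta_squared)                  # 24/30 = 0.8
```

With these scores most of the variance sits between the group means, so η² is large; real data would of course give a far smaller value.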
Experimental designs

Three characteristics:
1. Manipulation of treatment levels -> groups are created.
2. Random assignment of cases to levels (groups).
3. Control of extraneous variables: hold them constant (possible via randomization), counterbalance their effects, or turn them into an extra factor.
When all three hold, differences in scores can be attributed to differences in treatment levels. With statistics alone you can never prove a causal relationship: you can show a relationship, but not causality. Pure statistics cannot do it; you need theories and accumulated evidence supporting those theories. Replication helps.

Between-subject designs
Differences due to treatments are tested between groups of subjects: different cases in every level. Designs:
- Experimental: cases are randomly assigned to treatments.
- Nonexperimental: no random assignment.
Factorial designs: treatment levels are determined by more than one factor; there are main effects of each factor, and interaction(s).

Factorial ANOVA / two-way ANOVA
Usually more than one factor. Why: a statistical reason – reduction of error variance (not the primary focus, but it would be foolish not to include the variable, e.g., medication vs therapy) – or a substantive reason: studying the interplay between variables.
Identifying the sources of variance:
1. List each factor as a source.
2. Examine each combination of factors: if completely crossed, include the interaction as a source.
3. When an effect is repeated, with different instances, at every level of another factor, include that factor as a source.
S: subjects – variance due to chance; people are different and get different scores.

Effects in two-way ANOVA
- Main effect: the average of the simple effects for a certain factor. Most appropriately interpreted when there is no interaction; as an average over simple effects it can be misleading about specific simple effects.

N per cell
An equal number of subjects per cell is preferred: the SS of the effects and interactions are then orthogonal, the effects are completely separated, and the tests are independent. With unequal cell sizes there are three approaches:
1. Regression approach: adjust each effect for all other effects to obtain its unique contribution.
2. Experimental method: estimate the main effects ignoring the interaction; estimate the interaction adjusting for the main effects.
3. Hierarchical approach: use an order in decomposing the effects.

Factorial-blocks design
With a minimum of two factors, one factor serves as a so-called blocking factor. The blocking factor is intrinsic to the subject and related to the dependent variable. The purpose of blocking is error reduction, and drawing conclusions about each block (not the main priority of the research, but possible). The blocking variable is not assigned randomly: match on it (e.g., gender), and within each block randomize, making sure the numbers are equal.
Two types of factors:
- Experimental
- Blocking
Factorial-blocks designs:
- Randomized-blocks design
- Post-hoc block design

Randomized-blocks design
Homogeneous blocks of subjects are formed beforehand to reduce within-group variability. This increases statistical power for factor A. Advantages: reduction of error variance; increased comparability of the groups by ensuring equal block sizes; interaction between the factors can be detected.
Post-hoc blocking is blocking after collection of the data; it was not initially planned in the design. Problems are often unequal sample sizes and data fishing. If the blocking variable is continuous, blocking causes a loss of information. With blocking you reduce within-group variance by matching on preexisting group differences, which reduces the error variance.

Types of designs:
- Between-subjects designs: one or more factors in the experimental design; different cases in each cell of the design.
- Repeated-measures designs: each case participates in two or more treatment levels and is measured once at each level. Within-subject designs: differences due to treatments are tested within the same set of subjects.

Effect size, p-value and power
Type 1 error: rejecting H0 when you should not have – a false positive.
Type 2 error: not rejecting H0 when you should have – a false negative.
Ways to improve power:
- Increase the sample size.
- Increase α – at the risk of more Type 1 errors.
- Increase the effect size.
- Decrease the error variance – by making groups more homogeneous (blocking), or by adding factors and/or covariates to the model.

Power analysis
A priori analysis: compute the sample size n as a function of the required power level, the pre-specified significance level, and the population effect size to be detected.
Retrospective power: compute the power level of a statistical test already carried out, as a function of the sample size, the significance level applied, and the sample effect size. This gives no additional information for explaining nonsignificant results.

Week 2 – ANCOVA and regression

Experimental control: we want to eliminate confounding and minimize variability – everything constant except the variable of interest (blocking). ANCOVA is a special kind of blocking with a continuous IV: in effect a 'block' for every value of, say, age. It is a way to reduce within-group variance by making the groups more homogeneous: actively modelling what the data tell us, even though it is not the research question. By factoring in the extra variable you get a clearer picture -> more power. ANCOVA is also a way to eliminate systematic differences (bias) between the groups.
Why use covariates: to reduce error variance and increase power, and to adjust the means for differences. ANCOVA in natural groups -> need to think about confounding variables.
A covariate is always a continuous variable (discrete -> blocking). To have an effect, the covariate should be correlated with the DV, and the relationship needs to be linear – otherwise a transformation is needed. Use this dependency to make better predictions of the means in the groups: adjust the group means.
ANCOVA combines regression and ANOVA:
1. Perform a regression analysis to predict the dependent variable from the covariates; the residuals of this analysis are 'corrected' values of the dependent variable.
2. Perform an ANOVA on the corrected dependent variable (the residuals) to compare the groups.
With ANCOVA we try to reduce within-group variation by working with corrected group means. Compute a regression line to model the association between x and y; this line has the same slope for all groups. Determine the within-groups variance with respect to the regression lines. To calculate the between-groups variance, use the differences between the corrected (adjusted) means. Calculate the corrected means from the regression lines by filling in the total mean of the covariate (the grand mean).
The regression lines need to be parallel: the relationship between the DV and the covariate must be the same for the groups. Overall the within-group variances become smaller due to the corrected means. SS error: the unexplained variation within groups, based on the regression line.
Testing the adjusted means: SSA = SSE(H0) – SSE(HA). With the use of separate regression lines, SSE will be smaller.
ANCOVA: do not add covariates without considerable thought. Important covariates should not correlate with one another.
Assumptions of ANOVA:
1. Independent observations – check the design, intraclass correlation.
2. Normally distributed errors – the scores (DV) in each group are normally distributed. Check skewness, kurtosis, PP-plot, histogram, boxplot.
3. Homogeneity of variances – check the sample SDs, Levene's test, BF test.
Extra assumptions of ANCOVA:
4. Linearity: the relation between y and the covariate is linear.
5. Homogeneity of regression slopes: the regression lines are parallel; the groups have equal slopes.
6. The covariate is measured without error – important with natural groups.
Correction of violations: transformations, nonlinear ANCOVA.
Heterogeneous regression slopes indicate interactions between factors and covariates.
- Test the significance of the interaction to investigate the assumption of homogeneous regression slopes.
- A significant interaction effect indicates unequal slopes. Unequal slopes can be modeled by including the interaction in the model. However, only equal slopes ensure that differences in means are matched by differences in the height of the regression lines, so that H0 has the same meaning as in an ordinary ANOVA.

ANCOVA in randomized designs
With random assignment of subjects to groups, systematic differences between subjects are more or less equally divided over the groups: there are no systematic differences in covariate means. The primary effect of ANCOVA is then error reduction. When a pre-existing classification is used (groups are defined by a classification factor), systematic differences between the groups may arise; these are partially reflected in the covariate. Such pre-existing classifications are known as natural or intact groups. In a non-randomized design, systematic bias may exist between groups that is not due to the manipulation of experimental factors – then equating is nonsense, and the interpretation of an ANCOVA is flawed.

Week 3 – MANOVA

ANOVA with more than one dependent variable. Multivariate designs: more than one dependent variable; model the association between the dependent variables. Two reasons:
1. Some treatments (factors) affect subjects in more than one way.
2. Several criterion measures provide a more complete and detailed description of the phenomenon under investigation.

Null hypothesis
The combination of means in group 1 (all outcome means) is equal to the combination of outcome means in group 2. Rejecting H0 means the groups differ on at least one linear combination of the means.

Statistical reasons for multivariate analysis
1. Multiple univariate tests inflate the overall Type 1 error rate – capitalization on chance.
2. Univariate analyses ignore important information: the association (correlation) between the dependent variables.
- Separate tests re-analyze the same variance.
- Individual variables may show no significant effect, while jointly the variables do have an effect.
=> Power: the multivariate test may have more statistical power.
3. The use of a total score (consisting of several subtest scores) may not reveal significant effects, due to a canceling-out effect.

Reasons for not using MANOVA
1. The techniques do not answer all questions; univariate tests are still needed to follow up a significant result.
2. Small or negligible differences on 'badly chosen' variables may obscure real differences on other (more important) variables.
3. Arbitrarily chosen variables (with even up to moderate correlations between DVs) can decrease the power of the multivariate test.
=> Carefully select the variables to include.

Assumptions
- The vectors Y1 and Y2 have a multivariate normal distribution with means μ1 and μ2 and a common covariance matrix.
- The covariance matrix is constant across groups.
- k samples with sizes n1, …, nk.
- Linear relations between all DVs.

What does a significant overall effect mean? There is at least one linear combination of the dependent variables for which at least some of the groups differ in population means.
Follow-up analyses after a significant MANOVA – which variables and/or groups cause the effect?
- Variables: a univariate ANOVA for each separate variable.
- Groups: post-hoc procedures or contrasts, visual inspection.

k-group MANOVA
Univariate ANOVA: partitioning of the total variance – SStotal = SSbg + SSwg. Multivariate ANOVA for k groups: partitioning of the total covariance (SSCP matrix) – total SSCP = between SSCP + within SSCP => T = B + W.

Wilks' lambda
Determinants of SSCP matrices are generalized variances. Wilks' lambda gives the proportion of unexplained variance: a measure of badness of fit. If B = 0, there is no treatment effect and lambda = 1. If W = 0, there is no within-group dispersion and lambda = 0. Lower lambda is better. The distribution of lambda is complicated – approximations:
- A chi-square distribution, with p(k-1) df.
- An F distribution, with df that may be non-integer.
- SPSS uses F, which is better for small n.
- F is exact for some values of p and k.
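Because Λ = det(W) / det(T) with T = B + W, it can be computed directly from the SSCP matrices. A minimal numpy sketch for two groups and two DVs (function name and data invented; this is not SPSS's implementation):

```python
# Sketch of Wilks' lambda = det(W) / det(T) from the within (W) and
# total (T) SSCP matrices, for a two-group design with p = 2 DVs.
import numpy as np

def wilks_lambda(groups):
    """groups: list of (n_i x p) score matrices, one per group."""
    scores = np.vstack(groups)
    centered = scores - scores.mean(axis=0)
    T = centered.T @ centered                              # total SSCP
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0))  # within SSCP
            for g in groups)
    return np.linalg.det(W) / np.linalg.det(T)             # share unexplained

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, size=(20, 2))
g2 = rng.normal(1.0, 1.0, size=(20, 2))   # mean shift on both DVs
lam = wilks_lambda([g1, g2])
print(lam)  # below 1, since part of the generalized variance is between groups
```

Because T = W + B with B positive semi-definite, det(T) ≥ det(W), so Λ stays in [0, 1] as the notes state.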
Other statistics
- Roy's largest root
- Hotelling-Lawley trace
- Pillai-Bartlett trace
These tend to give similar answers. When they differ: Wilks, Pillai and Hotelling are about equally robust to violation of the assumption of homogeneity of covariance matrices, provided that the group sizes are approximately equal. In some situations Roy has more power, but the differences in power between the four statistics are small.
The p-value reports significance; the effect size reports relevance. Effect size η² = 1 – Λ -> interpretation similar to R² in regression: the percentage of explained variance. However, the sum of η² over all effects might exceed one. Partial η² = 1 – Λ^(1/s), with s = the minimum of the number of DVs and the hypothesis df.

Roy-Bargmann stepdown analysis
A method for selecting the important DVs, conceptually similar to backward elimination in multiple regression. Procedure:
1. Rank-order the DVs, based on theoretical considerations or on the effect sizes in separate ANOVAs.
2. Do a univariate ANOVA on the most important DV. If significant: select the DV and go to step 3. Else, stop.
3. Do an ANCOVA with the next-most important DV as the DV and the selected DV(s) as covariate(s). If significant: include the DV; repeat step 3.

Model assumptions in (M)ANOVA
1. Independent observations: no relationship between cases. Effect of violation: the estimated standard errors are generally too small, so tests are liberal. Detection: design, intraclass correlation. Correction: test at a more stringent level of significance (smaller α -> decreased power), or use MLM.
2. Distributional assumptions within each group. ANOVA: the observations follow a normal distribution. MANOVA: multivariate normal distribution of the dependent variables. Detection: check the marginal distributions of the individual variables (univariate normal) and the bivariate distributions of all pairs of variables (bivariate normal). Check with graphs, statistics, and tests such as Shapiro-Wilk. Check whether the scatterplot for each pair is elliptical – an ellipse indicates association.
Correction: transformation of the DV, collecting more data, data trimming, checking for outliers, MLM.
3. (Co)variance assumptions within each group. ANOVA: the population variances are equal. MANOVA: the within-group covariance matrices are equal. Effect of violation: for equal group sizes the actual α levels are very close to the nominal α levels – robust. For unequal group sizes F is liberal (large variances in small groups) or conservative (large variances in large groups).
=> Balanced designs are very important.
Checking: visual comparison of the matrices, Box's M test. Correction: transformation of individual variables to stabilize the variances; MLM.
4. Linear relations between all DVs. ANOVA: not applicable. Checking: scatterplots of Yi vs Yj. Correction: transformations, nonparametric MANOVA; if only one out of many DVs misbehaves, remove it.

Post hoc comparisons
Univariate ANOVAs as follow-up of a MANOVA. When testing several hypotheses, adjust the critical values of the tests -> Bonferroni procedure: α/m (m = number of tests).

Contrasts in MANOVA
Specify the comparison under contrasts or via the L-matrix in the GLM module, or make your own transformed variables. Key idea: test combinations of effects. The fewer tests, the better. Prefer contrasts over multiple comparisons.

Week 4 – Difference scores vs ANOVA

Examples of repeated measures: performance across k conditions, with the simplest design k = 2, a pre- and post-measurement of a one-group sample; or performance across time with at least k = 3.

Possible analyses
Separate ANOVAs for each time point: insight into the between-factor effect at each time point separately, but no insight into the time effect. A paired t-test for each pair of time points: OK for k = 2 (or ANCOVA), but not optimal for k > 2 because of multiple hypothesis testing, reduced power, and disregarding the association between more than two measures.

k = 2: pretest-posttest designs
One within-subject factor (time) with two levels: two repeated measures on the same subject. Difference score: d = y1 - y0. One group, thus no between-factor. H0: μd = 0 – the population mean of the difference scores is 0.
This is equivalent to the matched t-test and to a within-subjects ANOVA with one within-subject effect.

ANCOVA with the pretest as covariate
The post-measure is regressed on the pre-measure, which implies working with a corrected mean: the posttest mean μ1 is corrected for the pretest mean μ0 using linear regression – predicting post from the pre-score. ANCOVA: y* = y1 - b1·y0. The difference score makes the implicit assumption b1 = 1, which is usually not the case. In ANCOVA b1 is estimated optimally, which reduces the error variance and increases power. But ANCOVA must be a valid approach:
- In randomized designs ANCOVA is valid. Randomized designs: random assignment of subjects to groups, so at the population level there are no differences in pre-measures between the groups. ANCOVA and gain-score analysis then test the same hypothesis and estimate the same group differences. ANCOVA provides more power and precision than an ANOVA on gain scores, because the error variance is smaller, and is thus preferred.
- In quasi-experimental designs with natural groups ANCOVA is possibly invalid. The mean pre-measures are not necessarily the same; ANCOVA assumes they are, while a difference-score analysis compares the groups as they are. This may lead to different results => Lord's paradox.
Lord's paradox: different statistical approaches lead to seemingly contradictory conclusions about the effects of a treatment. The crucial question is whether group membership is related to the pretest score. If it is related, an ANOVA on gain scores may show no change while an ANCOVA shows a difference between the groups.

Mixed design, k = 2 time points
A within-subject factor with k = 2 levels and one between-subject factor (group). ANOVA on the difference scores: equivalent to a within-subjects ANOVA testing one within effect and one between effect.

Mixed design, k > 2 time points
A within-subject factor with k > 2 levels, and between-subject factors.
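Returning to the k = 2 pretest-posttest contrast above (the gain score implicitly fixes b1 = 1, while ANCOVA estimates b1), a toy sketch with invented data:

```python
# Sketch: gain score (b1 fixed at 1) vs ANCOVA-style correction
# (b1 estimated by least squares) for one group of 5 subjects.
import numpy as np

pre  = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
post = np.array([13.0, 15.0, 14.0, 16.0, 17.0])

# Gain score: d = post - pre, i.e. the implicit assumption b1 = 1.
gain = post - pre

# ANCOVA idea: estimate the slope of post on pre, then work with
# the residualized outcome y* = post - b1 * pre.
b1 = np.cov(pre, post, ddof=1)[0, 1] / np.var(pre, ddof=1)
residualized = post - b1 * pre

print(b1)  # 0.45 here, clearly not 1
# The optimally estimated slope leaves less unexplained variation:
assert np.var(residualized, ddof=1) < np.var(gain, ddof=1)
```

The smaller variance of the residualized outcome is exactly the error-variance reduction (and power gain) the notes attribute to ANCOVA in randomized designs.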
Possible analyses:
- Within-subjects ANOVA: consider the RM design as a block design with subjects as the blocking factor – block on subjects to remove consistent between-subject variability from the error variance.
- Profile analysis
- Multilevel analysis
A plain ANOVA does not take into account the dependency between observations within one subject, due to the specific properties of that subject: the subject must be regarded as a factor. Considering subjects as a separate factor is blocking. The univariate approach splits the within-groups variability SSs(k) into two parts:
1. The interaction of individual differences with the treatments, SSsk.
2. Individual differences due to subjects, SSs.
Blocking on subjects removes the subject means, taking individual variation out of the equation. The subject SS measures consistent differences between subjects that affect the subject means. The treatment SS is a within-subject effect and therefore requires a within-subject error term. The interaction MS reflects the extent to which subjects respond differently to the treatments.

Assumptions of RM-ANOVA
1. Independent observations.
2. Univariate normality. Per subject and time point: impossible to assess, since there is only one observation per subject and time point. Per time point across subjects: possible to assess.
3. For k > 2: sphericity: for all difference variables between all pairs of the k repeated measures, the variances are equal: Var(Drug1-Drug2) = Var(Drug2-Drug3) = Var(Drug3-Drug4) = etc. Test with Mauchly's W. Problems: it is sensitive to departures from normality, and lacks sensitivity to small violations. If sphericity is violated, F is too liberal (rejects falsely too often) -> epsilon correction: adjust the degrees of freedom with Greenhouse & Geisser or Huynh & Feldt. GG is quite conservative (the df become too small); HF is quite liberal. For large n they are usually the same; for small n GG is safer. Epsilon measures the extent to which the covariance matrix deviates from sphericity; the df of F are multiplied by ε.
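The sphericity condition above can be checked roughly by computing the variance of every pairwise difference variable; under sphericity these all estimate the same quantity. A simulated sketch (data invented; in practice you would use Mauchly's test, with the caveats the notes mention):

```python
# Rough numerical look at sphericity: variances of all pairwise
# difference scores for k = 4 simulated repeated measures.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, k = 30, 4
subject_effect = rng.normal(size=(n, 1))           # consistent per-subject shift
scores = subject_effect + rng.normal(size=(n, k))  # k repeated measures

# The subject effect cancels in every difference, so here all k*(k-1)/2
# difference variables estimate the same population variance (sphericity holds).
diff_vars = {(i, j): np.var(scores[:, i] - scores[:, j], ddof=1)
             for i, j in combinations(range(k), 2)}
ratio = max(diff_vars.values()) / min(diff_vars.values())
print(diff_vars, ratio)
```

A ratio far above 1 across the pairs would suggest a sphericity violation; with only sampling noise the ratio stays modest.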
Week 5 – Profile Analysis (RM-MANOVA)

If sphericity is violated -> RM-ANOVA with an epsilon correction, or RM-MANOVA.

Types of designs
1 within-subject factor:
- Test of flatness (= main effect of time)
1 within-subject factor and 1 between-subject factor:
- Test of flatness (= main effect of time)
- Test of parallelism (= interaction time x gender)
- Test of difference in levels (= main effect of gender)

1 within-subject factor
Profile analysis with 1 within-subject factor with k levels equals analyzing k - 1 transformed measurements with MANOVA.

Transformed measurements
MANOVA on the (k - 1) transformed variables:
- A particular linear combination of the k original measurements
o H0: the means of the transformed variables are equal to 0
o e.g., T1 = Y1 - Y2, T2 = Y3 - Y2; μ(T1) = μ(T2) = 0
- The k - 1 transformed variables may or may not be dependent; hence it is not assumed that the transformation takes care of all dependence between the measurements contributed by one subject.
Which k - 1 transformed variables have to be used?
- Many equivalent choices, not all orthonormal, some via dummy contrasts:
o Polynomial transformations (contrasts) are standard in SPSS.
o Difference scores of consecutive time points.
- Invariance property: the multivariate test statistic is the same for equivalent choices.
Coefficients are orthonormal if the contrasts multiplied and added up give 0 (cross products = 0) and the length of each coefficient vector = 1. A profile is the trajectory over time.

Fixed and random effects
Fixed-effect ANOVA:
- The only interest is the effects of the groups measured in the study.
- Effects μj are estimated for all groups j.
- One source of random variation: Eij.
Random-effect ANOVA:
- The groups in the study are conceived as a random sample from a population of groups.
- Two sources of random variation: Eij and Aj.

The choice depends on:
- The focus of statistical inference – the research question.
- The nature of the set of N groups – generalizability.
And is limited by the sample sizes at all 'random effects levels':
1. The number of groups.
2. The magnitudes of the group sample sizes nj.
500 employees from 3 companies -> fixed-effect ANOVA, or regression analysis with company as a dummy-coded predictor. 500 employees from 40 companies -> random-effect ANOVA or MLM.

Week 6 – Multilevel model

Different levels / a hierarchy of levels, e.g., skill level and speed. The key distinction is the difference between fixed and random effects. A multilevel model is a special regression model: different regressions for different levels, slid into one another. There is not necessarily the same number of observations per level-2 unit. Samples are drawn at two levels: both the level-2 and the level-1 units are conceived as random samples from their populations. Level-1 observations are dependent within level-2 units: individuals from one class are more alike than individuals from different classes. The assumption of independent observations is thus violated.
If the level-2 units are groups:
- Model relations between groups.
- Model relations within groups.
- Both levels are modeled with random effects.
- Represent within-group and between-group relations in a single model.
For repeated measures the level-2 units are individuals – replace 'groups' by 'individuals' in the previous points.

Level-2 units as groups
Linear regression assumptions:
- Independent observations.
- Linear relation between y and x.
- The error term has constant variance σ².
- The error term e is normally distributed with mean 0 and variance σ², independent of x.
- The error term represents random differences between observations, summarized by the variance.
Notation of the multilevel model with 2 levels:
- j = 1, …, N groups (level-2 units)
- i = 1, …, nj individuals in the groups (level-1 units)
- dependent variable: yi -> yij
- predictor at level 1 (individuals): x1 -> xij
The basic idea of multilevel modeling is that the outcome variable yij has an individual as well as a group aspect. Multilevel is the 'ideal' solution: not separate regressions per group j, but one model in which the differences in coefficients between groups are modeled using random coefficients – each group its own intercept and slope.

Random intercept model
The slope is the same, but the intercepts differ: base-rate differences in the dependent variable.
1. Differences between individuals
2. Differences between groups
- B0j -> the intercept varies across groups.
- γ00 -> the mean intercept in the population of groups.
- var(B0j) = var(U0j): the variance of the intercepts reflects the range of differences across groups. It is small when there are small differences between groups.
Why not use ANCOVA, with X as a covariate and group as a factor? In ANCOVA the group is treated as fixed; in MLM the effect is random – we want to factor it in, but it is not the main focus of the research. A random-effect ANCOVA is the same as the random intercept model.

Random slope model
Adds B1j = γ10 + U1j.
- B0j and B1j -> the intercepts and slopes vary over groups.
- γ00 and γ10 -> the mean intercept and slope in the population of groups.
- var(B0j) = var(U0j) and var(B1j) = var(U1j) -> the variances of intercept and slope indicate the 'range' of plausible differences.
- Also cov(B0j, B1j) -> the covariance between intercept and slope.
Assumptions:
- (U0j, U1j) and eij are independent.
- γ00 and γ10 are fixed parameters; τ²0, τ²1, τ01 and σ² are the variance parameters of the random part.
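The random intercept model above can be made concrete by simulation: yij = γ00 + U0j + eij. All parameter values below are invented for illustration:

```python
# Simulation sketch of the random intercept model y_ij = g00 + U_0j + e_ij.
import numpy as np

rng = np.random.default_rng(3)
n_groups, n_per_group = 200, 30
gamma_00 = 5.0   # mean intercept in the population of groups
tau = 2.0        # sd of the random intercepts U_0j
sigma = 1.0      # sd of the level-1 errors e_ij

u0 = rng.normal(0.0, tau, size=n_groups)                  # one draw per group
e = rng.normal(0.0, sigma, size=(n_groups, n_per_group))  # one per observation
y = gamma_00 + u0[:, None] + e

# Scores within a group share the same U_0j, so they are dependent:
# the intraclass correlation is tau^2 / (tau^2 + sigma^2) = 4/5 here.
group_means = y.mean(axis=1)
print(y.mean(), np.var(group_means, ddof=1))
```

The variance of the group means comes out near τ² + σ²/nj, reflecting how much of the total variation sits between rather than within the level-2 units.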
Statistical tests in the MLM:
- Fixed regression coefficients -> the usual t-test, or a likelihood-ratio test.
- Random coefficients -> a likelihood-ratio test, only for nested models.

Level-2 units as individuals – the repeated-measures version of the MLM
A two-level structure of measurements within individuals:
- Level 1: measurements, at different time points or occasions. Explanatory variables: 1. (typically) time itself; and/or 2. time-dependent variables.
- Level 2: individuals. Explanatory variables: individual characteristics.
- Cross-level interactions: time by individual characteristic. Do males show a different growth pattern than females?
Research questions typically involve change/development:
- Level 1 (intra- or within-individual): how does the outcome change/develop over time?
- Level 2 (inter- or between-individual): can differences in the changes be modeled or predicted?
At least 3 measurements are needed to model linear change. Time as an explanatory variable at level 1 can be entered via dummy variables, or as a function of time.
Dummy variables (example with measurements at ages 1, 1.5 and 2):
- γ00: intercept, the expected mean of y at the age of 1.
- γ10: slope for D1, the difference in the expected mean of y between the ages of 1 and 1.5.
- γ20: slope for D2, the difference in the expected mean of y between the ages of 1 and 2.
Linear function:
- γ00: intercept, the expected mean of y at the age of 1.
- γ10: slope for (AGE - 1), the difference in the expected mean of y between ages 1 year apart.
Often a more flexible model is needed, such as a random slope model.

Null hypotheses
On the fixed part of the model:
1. Between subjects: the effect of PROGRAM at the age of 1.
- There is no program effect at age 1, H0: γ01 = 0.
- No difference in intercepts between the groups.
2. Within subjects: the effect of time.
- There is no time effect for the no-program group, H0: γ10 = 0.
- No difference in slopes: horizontal lines over time.
3.
Within subjects: the interaction effect.
- There is no interaction effect, H0: γ11 = 0.
- The effect of time is the same for both groups.

Week 7 – Missing Data, RCTs and complex interventions

Missing data
A missing datum: no score where a score was planned to have been gathered.
- A sampled participant refuses to participate or does not show up.
- A participant cannot or refuses to 'deliver' a score.
- Loss of data, e.g., due to computer failure.
- In a repeated measures design: drop-out from the study.
The reason why data are missing matters: missingness related to the study is more harmful. Losing 10 percent to computer failure, unrelated to what you are researching, is not as harmful.

Consequences of missing outcomes
1. The effective sample size is smaller. This implies that statistical tests have lower power, and larger standard errors for the statistics of interest. This always occurs with missing data, regardless of how the loss of data was related to the research.
2. Possibly biased results: statistical tests no longer relate to your population of interest, but to the population that your sample with observed data represents. Example: data lost to computer failure give no bias. However, if 10% refuse to answer, there is bias: if people with higher incomes are more likely to refuse, the mean income of the 90% who did respond is lower than the actual mean income.
The consequence of missing data thus depends on the nature of the missing datum.

Little & Rubin's classification of missing data, from least to most severe:
1. Missing completely at random (MCAR). Only the effective sample size is smaller; no bias. Truly unrelated to any aspect of your study.
2. Missing at random (MAR). Data are missing in both groups (for example CBT and SSRIs) at different rates, but the missingness is unrelated to the outcome measure. The effective sample size is smaller; bias can be avoided with proper imputation, or with a model using observed covariates.
3. Not missing at random (NMAR). The effective sample size is smaller, and there is bias. For example, the more somber people drop out: the people for whom the treatment did not work anyway.

How do you diagnose the problem? It is not always possible; you cannot always know which kind of missingness occurred.
Practical implications:
- Do all you can to prevent missing data.
- If missing data occur, keep note of the reason(s).
- In planning the sample size of your study (power analysis), take expected attrition into account; be conservative.

How to deal with it
The crude option is to throw people with missing data out.
Participants without any observed score:
- Exclude them from the analysis, report them (number, causes of missing data), and reflect upon potential biases in the results that you did observe.
Participants with some observed scores:
- Omitting them from the analysis (complete-case analysis) may yield bias; only OK for very small numbers of missing values.
- Impute (make an educated guess for) their missing outcome data: if done correctly, this may eliminate bias.
- Use an analysis that can deal with missingness (e.g., multilevel analysis for repeated measures, with some measurements of the outcome missing): if done correctly, this may eliminate bias.

Imputing missing outcome data
- From prior knowledge.
- Impute unconditional means: filling in means.
o May introduce bias.
o Reduces variability.
- Using conditional means: filling in model predictions (e.g., regression imputation) – means corrected for this participant, using the information available (e.g., taking the pre- and post-scores into account).
o If the model is proper, no bias.
o Still reduces variability – it asserts that the participant behaves in this predictable way.
- Using conditional distributions: filling in draws from the distributions of the model (e.g., filling in predictions plus random error).
o If the model is proper, no bias and no reduction of variability.
Single imputation: fill in the missing data with the imputed values, once.
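The three imputation flavours listed above can be sketched on a toy data set with one missing outcome (all values invented):

```python
# Sketch: unconditional mean, conditional mean (regression prediction),
# and conditional draw (prediction plus random error) for one missing y.
import numpy as np

rng = np.random.default_rng(2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # observed covariate
y = np.array([2.1, 3.9, 6.2, 8.1, np.nan])    # last outcome is missing

obs = ~np.isnan(y)
b1 = np.cov(x[obs], y[obs], ddof=1)[0, 1] / np.var(x[obs], ddof=1)
b0 = y[obs].mean() - b1 * x[obs].mean()
resid_sd = np.std(y[obs] - (b0 + b1 * x[obs]), ddof=2)

mean_imp = y[obs].mean()                       # unconditional mean
reg_imp = b0 + b1 * x[~obs][0]                 # conditional mean
draw_imp = reg_imp + rng.normal(0, resid_sd)   # conditional draw

print(mean_imp, reg_imp, draw_imp)
```

The mean imputation ignores that the missing case has the largest x, the regression imputation uses it, and the random draw additionally preserves variability, which is why only the last avoids both bias and variance shrinkage when the model is proper.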
Because our guessed value of the missing data is based on the observed data, we end up with an analysis that uses the observed data twice. - Standard errors are underestimated, p-values incorrect. - The larger the proportion of missing data, the larger this issue. Multiple imputation – do this a number of times. - Analyse the multiple completed data sets as planned. - Combine the results. - Expresses both the uncertainty due to sampling fluctuations (as always) and the uncertainty due to missingness. Use an analysis that can deal with missing data Expectation maximization methods, e.g., multilevel analysis for repeated measures with some measurements of the outcome measure missing. Estimates are made based on all available data. Estimates are biased when participants with data show a different pattern than participants without data. Doesn't work for NMAR. Missing data handling in practice 1. Prevent missing data as much as possible. 2. If it occurs, diagnose the reason for missingness for each case. 3. Data analysis: very few missing – OK. Few missing – single imputation. More than a few – depending on the nature and type: for cases missing completely, describe them separately; for NMAR, multiple imputation. Randomized controlled trials (RCT) Principles Clinical lingo for an experiment. It is an experiment because we sample from a population, randomly assign people to two conditions, assess the outcome, and potentially do a follow-up assessment. An RCT is often control versus intervention. Alternative trial designs: - Factorial design – a 2x2 ANOVA. Two trials for the price of one: more efficient, because rather than doing two separate experiments you do two at the same time. Disadvantage: not the cleanest assessment of whether drug A and drug B work, or whether they interact. - Within-group designs – a repeated measures set-up. Disadvantage: influence of learning effects? Regression to the mean? - Cross-over designs. Advantage: minimizes the potential for confounding; increases power – a smaller N is required. 
Disadvantage: doubling of the duration, potential carryover effects. (Washout: waiting until the effect is over, or doing something in between so that the effects are over.) Mechanisms of change Predictor: a variable that predicts the outcome of an RCT across all conditions. Moderator: on whom and under what circumstances treatments have different effects. Comparable to an interaction; influences the strength of a relationship. E.g., treatment more effective in women than in men, but no gender effect in the control condition. Mediator: how and why a treatment takes effect. Pathways indicating causal relationships. Follow-up and adherence to protocol 1. Choose participants who are likely to adhere to the protocol 2. Make the intervention simple 3. Make study visits convenient and enjoyable 4. Make study measurements painless, useful, and interesting 5. Encourage participants to continue in the trial 6. Find participants who are lost to follow-up Monitoring clinical trials 1. Stopping for harm: ensuring that no harm occurs to participants. 2. Stopping for benefit: stopping when a clear benefit has been shown. 3. Stopping for futility: stopping when there is a very low chance of answering the question. Non-adherence to protocol Intention-to-treat versus per-protocol. You assign some people to control and some to the intervention, but this can change during the trial: people may be assigned to the intervention but not do the intervention, so they are effectively placebo even though they were assigned to the intervention. Do you treat these people according to how they were assigned, or according to what they wound up doing? If you do per-protocol, assignment is no longer random: participants decided themselves that they are part of the placebo group by not doing the intervention. Intention-to-treat: analyse the data according to how people were assigned to the conditions. Per-protocol: analyse based on what people have actually done. 
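The difference between the two analyses can be sketched in a small simulation. This is a hypothetical illustration under assumed conditions: the compliance model (sicker participants are less likely to comply), the sample size, and the effect size are all made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000  # participants per arm

# Baseline severity in each arm; higher severity -> worse outcome.
sev_t = rng.normal(0, 1, n)          # intervention arm
sev_c = rng.normal(0, 1, n)          # control arm

# Assumption: sicker participants are less likely to do the intervention.
comply = rng.random(n) < 1 / (1 + np.exp(sev_t))

true_effect = 1.0  # benefit for those who actually receive the treatment
out_t = -sev_t + true_effect * comply + rng.normal(0, 1, n)
out_c = -sev_c + rng.normal(0, 1, n)

# Intention-to-treat: analyse as assigned (non-compliers stay in their arm).
itt = out_t.mean() - out_c.mean()

# Per-protocol: analyse only those who actually did the intervention.
pp = out_t[comply].mean() - out_c.mean()

print(f"ITT estimate:          {itt:.2f}")  # diluted by non-compliers
print(f"Per-protocol estimate: {pp:.2f}")   # inflated: compliers were healthier
```

Under these assumptions the ITT estimate falls below the true effect (the non-compliers dilute it), while the per-protocol estimate exceeds it, because compliance is no longer random but tied to severity.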
Intention-to-treat likely provides a conservative estimation of treatment effects (a tendency to underestimate the full effect of a treatment), and per-protocol a liberal estimation (a tendency to overestimate). You can also apply and report both. RCT: effective, compared to what? Active treatment withheld: - No treatment: simple and cheap; controls for time, testing, and regression to the mean. Downsides: ethical issues; may lead to drop-out; may lead to independent treatment seeking. - Waitlist: give one group the intervention now, and the other group in three months. Plus: guarantees treatment; some control for expectancy effects. Minus: ethical issues related to delaying treatment; not indicated for long-term follow-up. - Placebo: Plus: good control for expectancy benefits; good control for non-specific treatment effects; allows assessment of adverse effects. Minus: ethical issues of an irrelevant treatment; not double-blind in experience-based studies. Usual care: - Usual care: some kind of care you are providing to all patients anyway; half of the participants resume the help they are getting, and the other half gets the new care. Plus: most acceptable to patients and treaters; flexible to replace or superimpose usual care. Minus: requires a large N to achieve adequate power; usual care is highly variable within and between institutions. - Devised usual care: try to match the intervention. Plus: guarantees a minimum of treatment; usually acceptable to participants. Minus: requires a large N to achieve adequate power; may not be acceptable to participants if too minimal (too much reduced). Active treatment comparisons: - Dose control: Plus: more ethical than no treatment; examination of the dose-response relationship. Minus: dose variation not always possible; limited hypothesis testing; requires a large N to achieve adequate power. - Dismantling design: slim down the active treatment. Plus: acceptable to participants; precise examination of active ingredients; for treatments with theoretical justifications. 
Minus: requires a priori knowledge of the active ingredients; requires a large N to achieve adequate power. - Equivalence trial: find out whether a treatment is comparable, not whether something works better or worse. Plus: acceptable to participants. Minus: may compromise internal validity; requires a large N to achieve adequate power; requires significant resources. Complex interventions An intervention is complex because of its properties, e.g.: - The number of components involved - The range of behaviors targeted - The expertise and skills required by those delivering and receiving the intervention - The number of groups, settings, or levels targeted - The permitted level of flexibility of the intervention or its components External vs. internal validity Efficacy: does the intervention produce the intended outcomes in experimental or ideal settings? Conducted under idealized conditions; maximizes internal validity to provide a precise, unbiased estimate of efficacy. Effectiveness: does the intervention produce the intended outcomes in real-world settings? The intervention is often compared against treatment as usual. Results inform choices between an established and a novel approach to achieving the outcome. Theory base: what works in which circumstances, and how? Aims to understand how change is brought about, including the interplay of mechanism and context. Can lead to refinement of theory. Systems: how do the system and the intervention adapt to one another? Treats the intervention as a disruption to a complex system.