Experimental and Quasi-Experimental Designs in Implementation Research

Summary

This article reviews experimental and quasi-experimental designs in implementation research, covering randomized controlled trials (including factorial and SMART designs) as well as quasi-experimental methods such as pre-post designs with non-equivalent control groups, interrupted time series, and stepped wedge designs. It discusses how these methods differ from traditional efficacy and effectiveness trials and provides practical guidance for selecting among them.

Full Transcript


Psychiatry Research 283 (2020) 112452

Experimental and quasi-experimental designs in implementation research

Christopher J. Miller a,b,⁎, Shawna N. Smith c,d, Marianne Pugatch a

a VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), United States Department of Veterans Affairs, Boston, MA, USA
b Department of Psychiatry, Harvard Medical School, Boston, MA, USA
c Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI, USA
d Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA

Keywords: Implementation; SMART design; Quasi-experimental; Pre-post with non-equivalent control group; Interrupted time series; Stepped wedge

ABSTRACT

Implementation science is focused on maximizing the adoption, appropriate use, and sustainability of effective clinical practices in real world clinical settings. Many implementation science questions can be feasibly answered by fully experimental designs, typically in the form of randomized controlled trials (RCTs). Implementation-focused RCTs, however, usually differ from traditional efficacy- or effectiveness-oriented RCTs on key parameters. Other implementation science questions are more suited to quasi-experimental designs, which are intended to estimate the effect of an intervention in the absence of randomization. These designs include pre-post designs with a non-equivalent control group, interrupted time series (ITS), and stepped wedges, the last of which require all participants to receive the intervention, but in a staggered fashion. In this article we review the use of experimental designs in implementation science, including recent methodological advances for implementation studies. We also review the use of quasi-experimental designs in implementation science, and discuss the strengths and weaknesses of these approaches. This article is therefore meant to be a practical guide for researchers who are interested in selecting the most appropriate study design to answer relevant implementation science questions, and thereby increase the rate at which effective clinical practices are adopted, spread, and sustained.

⁎ Corresponding author at: VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), United States Department of Veterans Affairs, 150 S. Huntington Ave. (152M), Boston, MA, 02130, USA. E-mail address: [email protected] (C.J. Miller).
https://doi.org/10.1016/j.psychres.2019.06.027
Received 28 March 2019; Received in revised form 18 June 2019; Accepted 19 June 2019; Available online 20 June 2019.
0165-1781/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

1. Background

The first documented clinical trial was conducted in 1747 by James Lind, a royal navy physician, who tested the hypothesis that citrus fruit could cure scurvy. Since then, based on foundational work by Fisher (1935), the randomized controlled trial (RCT) has emerged as the gold standard for testing the efficacy of treatment versus a control condition for individual patients. Randomization of patients is seen as crucial to reducing the impact of measured or unmeasured confounding variables, in turn allowing researchers to draw conclusions regarding causality in clinical trials.

As described elsewhere in this special issue, implementation science is ultimately focused on maximizing the adoption, appropriate use, and sustainability of effective clinical practices in real world clinical settings. As such, some implementation science questions may be addressed by experimental designs. For our purposes here, we use the term "experimental" to refer to designs that feature two essential ingredients: first, manipulation of an independent variable; and second, random assignment of subjects. This corresponds to the definition of randomized experiments originally championed by Fisher (1925). From this perspective, experimental designs usually take the form of RCTs—but implementation-oriented RCTs typically differ in important ways from traditional efficacy- or effectiveness-oriented RCTs. Other implementation science questions require different methodologies entirely: specifically, several forms of quasi-experimental designs may be used for implementation research in situations where an RCT would be inappropriate.
These designs are intended to estimate the effect of an intervention despite a lack of randomization. Quasi-experimental designs include pre-post designs with a non-equivalent control group, interrupted time series (ITS), and stepped wedge designs. Stepped wedges are studies in which all participants receive the intervention, but in a staggered fashion. It is important to note that quasi-experimental designs are not unique to implementation science. As we will discuss below, however, each of them has strengths that make them particularly useful in certain implementation science contexts.

Our goal for this manuscript is two-fold. First, we will summarize the use of experimental designs in implementation science. This will include discussion of ways that implementation-focused RCTs may differ from efficacy- or effectiveness-oriented RCTs. Second, we will summarize the use of quasi-experimental designs in implementation research. This will include discussion of the strengths and weaknesses of these types of approaches in answering implementation research questions. For both experimental and quasi-experimental designs, we will discuss a recent implementation study as an illustrative example of one approach.

2. Experimental designs in implementation science

RCTs in implementation science share the same basic structure as efficacy- or effectiveness-oriented RCTs, but typically feature important distinctions. In this section we will start by reviewing key factors that separate implementation RCTs from more traditional efficacy- or effectiveness-oriented RCTs. We will then discuss optimization trials, which are a type of experimental design that is especially useful for certain implementation science questions. We will then briefly turn our attention to single subject experimental designs (SSEDs) and on-off-on (ABA) designs.

The first common difference that sets apart implementation RCTs from more traditional clinical trials is the primary research question they aim to address. For most implementation trials, the primary research question is not the extent to which a particular treatment or evidence-based practice is more effective than a comparison condition, but instead the extent to which a given implementation strategy is more effective than a comparison condition. For more detail on this pivotal issue, see Drs. Bauer and Kirchner in this special issue.

Second, as a corollary of this point, implementation RCTs typically feature different outcome measures than efficacy or effectiveness RCTs, with an emphasis on the extent to which a health intervention was successfully implemented rather than an evaluation of the health effects of that intervention (Proctor et al., 2011). For example, typical implementation outcomes might include the number of patients who receive the intervention, or the number of providers who administer the intervention as intended. A variety of evaluation-oriented implementation frameworks may guide the choices of such measures (e.g., RE-AIM; Gaglio et al., 2013; Glasgow et al., 1999). Hybrid implementation-effectiveness studies attend to both effectiveness and implementation outcomes (Curran et al., 2012); these designs are also covered in more detail elsewhere in this issue (Landes, this issue).

Third, given their focus, implementation RCTs are frequently cluster-randomized (i.e. with sites or clinics as the unit of randomization, and patients nested within those sites or clinics). For example, consider a hypothetical RCT that aims to evaluate the implementation of a training program for cognitive behavioral therapy (CBT) in community clinics. Randomizing at the patient level for such a trial would be inappropriate due to the risk of contamination, as providers trained in CBT might reasonably be expected to incorporate CBT principles into their treatment even to patients assigned to the control condition. Randomizing at the provider level would also risk contamination, as providers trained in CBT might discuss this treatment approach with their colleagues. Thus, many implementation trials are cluster randomized at the site or clinic level. While such clustering minimizes the risk of contamination, it can unfortunately create commensurate problems with confounding, especially for trials with very few sites to randomize. Stratification may be used to at least partially address confounding issues in cluster-randomized and more traditional trials alike, by ensuring that intervention and control groups are broadly similar on certain key variables. Furthermore, such allocation schemes typically require analytic models that account for this clustering and the resulting correlations among error structures (e.g., generalized estimating equations [GEE] or mixed-effects models; Schildcrout et al., 2018).
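To make the clustering point concrete, the sketch below (not from the original article; all variable names, sample sizes, and effect sizes are invented) simulates a site-randomized trial with a binary implementation outcome and fits a GEE with an exchangeable working correlation using Python's statsmodels:

```python
# Hypothetical sketch: analyzing a cluster-randomized implementation trial
# in which sites (not patients) are the unit of randomization.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_sites, patients_per_site = 20, 50
sites = np.repeat(np.arange(n_sites), patients_per_site)
arm = (sites < n_sites // 2).astype(int)           # site-level randomization
site_effect = rng.normal(0, 0.5, n_sites)[sites]   # induces within-site correlation
# Implementation outcome: did each patient receive the new practice?
p = 1 / (1 + np.exp(-(-0.5 + 0.8 * arm + site_effect)))
df = pd.DataFrame({"site": sites, "arm": arm, "received": rng.binomial(1, p)})

# GEE with an exchangeable working correlation accounts for clustering;
# naive logistic regression would understate the standard error for `arm`.
gee = smf.gee("received ~ arm", groups="site", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary())
```

A site-level mixed-effects model (e.g., smf.mixedlm) would be an alternative way to absorb the same correlation structure.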
2.1. Optimization trials

Key research questions in implementation science often involve determining which implementation strategies to provide, to whom, and when, to achieve optimal implementation success. As such, trials designed to evaluate comparative effectiveness, or to optimize provision of different types or intensities of implementation strategies, may be more appealing than traditional effectiveness trials. The methods described in this section are not unique to implementation science, but their application in the context of implementation trials may be particularly useful for informing implementation strategies.

While two-arm RCTs can be used to evaluate comparative effectiveness, trials focused on optimizing implementation support may use alternative experimental designs (Collins et al., 2005, 2007). For example, in certain clinical contexts, multi-component "bundles" of implementation strategies may be warranted (e.g., a bundle consisting of clinician training, technical assistance, and audit/feedback to encourage clinicians to use a new evidence-based practice). In these situations, implementation researchers might consider using factorial or fractional-factorial designs. In the context of implementation science, these designs randomize participants (e.g., sites or providers) to different combinations of implementation strategies, and can be used to evaluate the effectiveness of each strategy individually to inform an optimal combination (e.g., Coulton et al., 2009; Pellegrini et al., 2014; Wyrick et al., 2014). Such designs can be particularly useful in informing multi-component implementation strategies that are not redundant or overly burdensome (Collins et al., 2014a, 2009, 2007).

Researchers interested in optimizing sequences of implementation strategies that adapt to ongoing needs over time may be interested in a variant of factorial designs known as the sequential, multiple-assignment randomized trial (SMART; Almirall et al., 2012; Collins et al., 2014b; Kilbourne et al., 2014b; Lei et al., 2012; Nahum-Shani et al., 2012; NeCamp et al., 2017). SMARTs are multistage randomized trials in which some or all participants are randomized more than once, often based on ongoing information (e.g., treatment response). In implementation research, SMARTs can inform optimal sequences of implementation strategies to maximize downstream clinical outcomes. Thus, such designs are well-suited to answering questions about what implementation strategies should be used, in what order, to achieve the best outcomes in a given context.
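To illustrate the factorial logic described above, here is a minimal simulated sketch (strategy names, site counts, and effects are all invented). Each site is randomized to one of the 2×2×2 strategy combinations, and every site contributes to the estimate of each strategy's main effect, which is the efficiency factorial designs offer over one-arm-per-bundle comparisons:

```python
# Hypothetical sketch: a 2x2x2 factorial implementation trial in which sites
# are randomized to every combination of three strategies (invented names).
from itertools import product
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for training, assistance, feedback in product([0, 1], repeat=3):
    for _ in range(6):                       # 6 sites per cell (invented)
        uptake = (0.20 + 0.10 * training + 0.05 * assistance
                  + 0.08 * feedback + rng.normal(0, 0.05))
        rows.append((training, assistance, feedback, uptake))
df = pd.DataFrame(rows, columns=["training", "assistance", "feedback", "uptake"])

# Each main effect is estimated using all 48 sites, not just one pair of arms.
model = smf.ols("uptake ~ training + assistance + feedback", data=df).fit()
print(model.params)
```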
One example of an implementation SMART is the Adaptive Implementation of Effective Program Trial (ADEPT; Kilbourne et al., 2014a). ADEPT was a clustered SMART (NeCamp et al., 2017) designed to inform an adaptive sequence of implementation strategies for implementing an evidence-based collaborative chronic care model, Life Goals (Kilbourne et al., 2014c, 2012a), into community-based practices. Life Goals, the clinical intervention being implemented, has proven effective at improving physical and mental health outcomes for patients with unipolar and bipolar depression by encouraging providers to instruct patients in self-management, and improving clinical information systems and care management across physical and mental health providers (Bauer et al., 2006; Kilbourne et al., 2012a, 2008; Simon et al., 2006). However, in spite of its established clinical effectiveness, community-based clinics experienced a number of barriers in trying to implement the Life Goals model, and there were questions about how best to efficiently and effectively augment implementation strategies for clinics that struggled with implementation.

The ADEPT study was thus designed to determine the best sequence of implementation strategies to offer sites interested in implementing Life Goals. The ADEPT study involved use of three different implementation strategies. First, all sites received implementation support based on Replicating Effective Programs (REP), which offered an implementation manual, brief training, and low-level technical support (Kilbourne et al., 2007, 2012b; Neumann and Sogolow, 2000). REP implementation support had been previously found to be low-cost and readily scalable, but also insufficient for uptake in many community-based settings (Kilbourne et al., 2015). For sites that failed to implement Life Goals under REP, two additional implementation strategies were considered as augmentations to REP: External Facilitation (EF; Kilbourne et al., 2014b; Stetler et al., 2006), consisting of phone-based mentoring in strategic skills from a study team member; and Internal Facilitation (IF; Kirchner et al., 2014), which supported protected time for a site employee to address barriers to program adoption.
Fig. 1. SMART design from ADEPT trial.

The ADEPT study was designed to evaluate the best way to augment support for these sites that were not able to implement Life Goals under REP, specifically querying whether it was better to augment REP with EF only or the more intensive EF/IF, and whether augmentations should be provided all at once, or staged. Intervention assignments are mapped in Fig. 1. Seventy-nine community-based clinics across Michigan and Colorado were provided with initial implementation support under REP. After six months, implementation of the clinical intervention, Life Goals, was evaluated at all sites. Sites that had failed to reach an adequate level of delivery (defined as those sites enrolling fewer than ten patients in Life Goals, or those at which fewer than 50% of enrolled patients had received at least three Life Goals sessions) were considered non-responsive to REP and randomized to receive additional support through either EF or combined EF/IF. After six further months, Life Goals implementation at these sites was again evaluated. Sites surpassing the implementation response benchmark had their EF or EF/IF support discontinued. EF/IF sites that remained non-responsive continued to receive EF/IF for an additional six months. EF sites that remained non-responsive were randomized a second time to either continue with EF or further augment with IF. This design thus allowed for comparison of three different adaptive implementation interventions, to determine the best adaptive sequence of implementation support for sites that were initially non-responsive under REP:

- Provide EF for 6 months; continue EF for a further six months for sites that remain non-responsive; discontinue EF for sites that are responsive;
- Provide EF/IF for 6 months; continue EF/IF for a further six months for sites that remain non-responsive; discontinue EF/IF for sites that are responsive; and
- Provide EF for 6 months; step up to EF/IF for a further six months for sites that remain non-responsive; discontinue EF for sites that are responsive.

While analyses of this study are still ongoing, including the comparison of these three adaptive sequences of implementation strategies, results have shown that patients at sites that were randomized to receive EF as the initial augmentation to REP saw more improvement in clinical outcomes (SF-12 mental health quality of life and PHQ-9 depression scores) after 12 months than patients at sites that were randomized to receive the more intensive EF/IF augmentation.
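The staged decision rules above can be expressed as simple program logic. The sketch below is our own schematic of the ADEPT-style flow, not study code: the response benchmark follows the definition given in the text, while the function and field names are invented for illustration.

```python
# Schematic of the staged (SMART) logic described above; not ADEPT's code.
import random

def responsive(site):
    """Site meets the Life Goals delivery benchmark described in the text."""
    if site["enrolled"] < 10:
        return False
    return site["three_plus_sessions"] / site["enrolled"] >= 0.5

def assign_stage2(site):
    # After 6 months of REP: non-responders are randomized to EF or EF/IF.
    return random.choice(["EF", "EF/IF"]) if not responsive(site) else "REP"

def assign_stage3(site, stage2):
    # After 6 more months: responders step down; EF non-responders are
    # re-randomized to continue EF or add IF; EF/IF non-responders continue.
    if responsive(site):
        return "discontinue augmentation"
    if stage2 == "EF":
        return random.choice(["continue EF", "EF/IF"])
    return "continue EF/IF"

site = {"enrolled": 7, "three_plus_sessions": 2}   # invented example site
stage2 = assign_stage2(site)
print(stage2, "->", assign_stage3(site, stage2))
```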
2.2. Single subject experimental designs and on-off-on (ABA) designs

We also note that there are a variety of Single Subject Experimental Designs (SSEDs; Byiers et al., 2012), including withdrawal designs and alternating treatment designs, that can be used in testing evidence-based practices. Similarly, an implementation strategy may be used to encourage the use of a specific treatment at a particular site, followed by that strategy's withdrawal and subsequent reinstatement, with data collection throughout the process (on-off-on or ABA design). A weakness of these approaches in the context of implementation science, however, is that they usually require reversibility of the intervention (i.e. that the withdrawal of implementation support truly allows the healthcare system to revert to its pre-implementation state). When this is not the case—for example, if a hypothetical study is focused on training to encourage use of an evidence-based psychotherapy—then these designs may be less useful.

3. Quasi-experimental designs in implementation science

In some implementation science contexts, policy-makers or administrators may not be willing to have a subset of participating patients or sites randomized to a control condition, especially for high-profile or high-urgency clinical issues. Quasi-experimental designs allow implementation scientists to conduct rigorous studies in these contexts, albeit with certain limitations. We briefly review the characteristics of these designs here; other recent review articles are available for the interested reader (e.g., Handley et al., 2018).

3.1. Pre-post with non-equivalent control group

The pre-post with non-equivalent control group uses a control group in the absence of randomization. Ideally, the control group is chosen to be as similar to the intervention group as possible (e.g., by matching on factors such as clinic type, patient population, geographic region, etc.). Theoretically, both groups are exposed to the same trends in the environment, making it plausible to decipher if the intervention had an effect. Measurement of both treatment and control conditions classically occurs pre- and post-intervention, with differential improvement between the groups attributed to the intervention. This design is popular due to its practicality, especially if data collection points can be kept to a minimum. It may be especially useful for capitalizing on naturally occurring experiments such as may occur in the context of certain policy initiatives or rollouts—specifically, rollouts in which it is plausible that a control group can be identified. For example, Kirchner et al. (2014) used this type of design to evaluate the integration of mental health services into primary care clinics at seven US Department of Veterans Affairs (VA) medical centers and seven matched controls.
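Analytically, the "differential improvement" comparison described above corresponds to a difference-in-differences estimate (cf. Dimick and Ryan, 2014). A minimal sketch, using invented numbers, shows how the group-by-period interaction term recovers the intervention effect over and above the secular trend shared by both groups:

```python
# Minimal difference-in-differences sketch for a pre-post design with a
# non-equivalent control group (all numbers are invented for illustration).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group": [1] * 40 + [0] * 40,        # 1 = intervention, 0 = control
    "post":  ([0] * 20 + [1] * 20) * 2,  # 0 = pre, 1 = post
})
# Outcome improves 5 points in controls (secular trend) and 12 points in
# intervention sites; the DiD estimate should recover the extra 7.
df["y"] = (50 + 3 * df["group"]          # pre-existing group difference
           + 5 * df["post"]              # shared secular trend
           + 7 * df["group"] * df["post"])  # true intervention effect

model = smf.ols("y ~ group * post", data=df).fit()
print(model.params["group:post"])        # DiD effect estimate: 7.0
```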
One overarching drawback of this design is that it is especially vulnerable to threats to internal validity (Shadish et al., 2002), because pre-existing differences between the treatment and control group could erroneously be attributed to the intervention. While unmeasured differences between treatment and control groups are always a possibility in healthcare research, such differences are especially likely to occur in the context of these designs due to the lack of randomization. Similarly, this design is particularly sensitive to secular trends that may differentially affect the treatment and control groups (Cousins et al., 2014; Pape et al., 2013), as well as regression to the mean confounding study results (Morton and Torgerson, 2003). For example, if a study site is selected for the experimental condition precisely because it is underperforming in some way, then regression to the mean would suggest that the site will show improvement regardless of any intervention; in the context of a pre-post with non-equivalent control group study, however, this improvement would erroneously be attributed to the intervention itself (Type I error).

There are, however, various ways that implementation scientists can mitigate these weaknesses. First, as mentioned briefly above, it is important to select a control group that is as similar as possible to the intervention site(s), which can include matching at both the health care network and clinic level (e.g., Kirchner et al., 2014). Second, propensity score weighting (e.g., Morgan, 2018) can statistically mitigate internal validity concerns, although this approach may be of limited utility when comparing secular trends between different study cohorts (Dimick and Ryan, 2014). More broadly, qualitative methods (e.g., periodic interviews with staff at intervention and control sites) can help uncover key contextual factors that may be affecting study results above and beyond the intervention itself.

3.2. Interrupted time series

Interrupted time series (ITS; Shadish et al., 2002; Taljaard et al., 2014; Wagner et al., 2002) designs represent one of the most robust categories of quasi-experimental designs. Rather than relying on a non-equivalent control group, ITS designs rely on repeated data collections from intervention sites to determine whether a particular intervention is associated with improvement on a given metric relative to the pre-intervention secular trend. They are particularly useful in cases where a comparable control group cannot be identified—for example, following widespread implementation of policy mandates, quality improvement initiatives, or dissemination campaigns (Eccles et al., 2003). In ITS designs, data are collected at multiple time points both before and after an intervention (e.g., policy change, implementation effort), and analyses explore whether the intervention was associated with the outcome beyond any pre-existing secular trend. More formally, ITS evaluations focus on identifying whether there is discontinuity in the trend (change in slope or level) after the intervention relative to before the intervention, using segmented regression to model pre- and post-intervention trends (Gebski et al., 2012; Penfold and Zhang, 2013; Taljaard et al., 2014; Wagner et al., 2002). A number of recent implementation studies have used ITS designs, including an evaluation of implementation of a comprehensive smoke-free policy in a large UK mental health organization to reduce physical assaults (Robson et al., 2017); the impact of a national policy limiting alcohol availability on suicide mortality in Slovenia (Pridemore and Snowden, 2009); and the effect of delivery of a tailored intervention for primary care providers to increase psychological referrals for women with mild to moderate postnatal depression (Hanbury et al., 2013).

ITS designs are appealing in implementation work for several reasons. Relative to uncontrolled pre-post analyses, ITS analyses reduce the chances that intervention effects are confounded by secular trends (Bernal et al., 2017; Eccles et al., 2003). Time-varying confounders, such as seasonality, can also be adjusted for, provided adequate data (Bernal et al., 2017). Indeed, recent work has confirmed that ITS designs can yield effect estimates similar to those derived from cluster-randomized RCTs (Fretheim et al., 2013, 2015). Relative to an RCT, ITS designs can also allow for a more comprehensive assessment of the longitudinal effects of an intervention (positive or negative), as effects can be traced over all included time points (Bernal et al., 2017; Penfold and Zhang, 2013).
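A minimal segmented-regression sketch (simulated data; all coefficients are invented) shows how the level-change and slope-change terms described above are typically coded, in the spirit of Wagner et al. (2002):

```python
# Segmented regression for an ITS design: level and slope changes at the
# interruption, estimated against the pre-intervention secular trend.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
months = np.arange(24)                     # >= 8 points pre and post, per text
post = (months >= 12).astype(int)          # intervention begins at month 12
since = np.where(post == 1, months - 12, 0)  # time elapsed since intervention
y = 100 + 0.5 * months - 8 * post - 1.0 * since + rng.normal(0, 1, 24)
df = pd.DataFrame({"y": y, "month": months, "post": post, "since": since})

# Coefficient on `post` = immediate level change at the interruption;
# coefficient on `since` = change in slope relative to the pre-period trend.
fit = smf.ols("y ~ month + post + since", data=df).fit()
print(fit.params)
```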
ITS designs also present a number of challenges. First, the segmented regression approach requires clear delineation between pre- and post-intervention periods; interventions with indeterminate implementation periods are likely not good candidates for ITS. While ITS study designs that include multiple 'interruptions' (e.g., introductions of new treatment components) are possible, they will require collection of enough time points between interruptions to ensure that each intervention's effects can be ascertained individually (Bernal et al., 2017). Second, collecting data from sufficient time points across all sites of interest, especially for the pre-intervention period, can be challenging (Eccles et al., 2003): a common recommendation is at least eight time points both pre- and post-intervention (Penfold and Zhang, 2013). This may be onerous, particularly if the data are not routinely collected by the health system(s) under study. Third, ITS cannot protect against confounding effects from other interventions that begin contemporaneously and may impact similar outcomes (Eccles et al., 2003).

3.3. Stepped wedge designs

Stepped wedge trials are another type of quasi-experimental design. In a stepped wedge, all participants receive the intervention, but are assigned to the timing of the intervention in a staggered fashion (Betran et al., 2018; Brown and Lilford, 2006; Hussey and Hughes, 2007), typically at the site or cluster level. Stepped wedge designs have their analytic roots in balanced incomplete block designs, in which all pairs of treatments occur an equal number of times within each block (Hanani, 1961). Traditionally, all sites in stepped wedge trials have outcome measures assessed at all time points, thus allowing sites that receive the intervention later in the trial to essentially serve as controls for early intervention sites. A recent special issue of the journal Trials includes more detail on these designs (Davey et al., 2015), which may be ideal for situations in which it is important for all participating patients or sites to receive the intervention during the trial. Stepped wedge trials may also be useful when resources are scarce enough that intervening at all sites at once (or even half of the sites as in a standard treatment-versus-control RCT) would not be feasible. If desired, the administration of the intervention to sites in waves allows for lessons learned in early sites to be applied to later sites (via formative evaluation; see Elwy et al., this issue).
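A common analytic model for stepped wedge data is the mixed model of Hussey and Hughes (2007), with fixed period effects, a random site effect, and an indicator for whether a site has crossed over to the intervention. The sketch below simulates a nine-site wedge and fits that model; the crossover schedule, effect sizes, and variable names are invented for illustration.

```python
# Sketch of a Hussey-and-Hughes-style mixed model for a stepped wedge trial.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_sites, n_periods = 9, 10
rows = []
for site in range(n_sites):
    start = 1 + site                   # staggered crossover times (invented)
    u = rng.normal(0, 1.0)             # random site effect
    for t in range(n_periods):
        treated = int(t >= start)      # has this site crossed over yet?
        y = 10 + 0.2 * t + 1.5 * treated + u + rng.normal(0, 0.5)
        rows.append((site, t, treated, y))
df = pd.DataFrame(rows, columns=["site", "period", "treated", "y"])

# Fixed period effects absorb secular trends; later-crossing sites serve as
# concurrent controls for earlier ones, as described in the text.
fit = smf.mixedlm("y ~ C(period) + treated", df, groups=df["site"]).fit()
print(fit.params["treated"])           # estimated intervention effect
```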
The Behavioral Health Interdisciplinary Program (BHIP) Enhancement Project is a recent example of a stepped-wedge implementation trial (Bauer et al., 2016, 2019). This study involved using blended facilitation (including internal and external facilitators; Kirchner et al., 2014) to implement care practices consistent with the collaborative chronic care model (CCM; Bodenheimer et al., 2002a, 2002b; Wagner et al., 1996) in nine outpatient mental health teams in VA medical centers. Fig. 2 illustrates the implementation and stepdown periods for that trial, with black dots representing primary data collection points.

Fig. 2. BHIP Enhancement Project stepped wedge (adapted from Bauer et al., 2019).

The BHIP Enhancement Project was conducted as a stepped wedge for several reasons. First, the stepped wedge design allowed the trial to reach nine sites despite limited implementation resources (i.e., intervening at all nine sites simultaneously would not have been feasible given study funding). Second, the stepped wedge design aided in recruitment and retention, as all participating sites were certain to receive implementation support during the trial: at worst, sites that were randomized to later-phase implementation had to endure waiting periods totaling about eight months before implementation began. This was seen as a major strength of the design by its operational partner, the VA Office of Mental Health and Suicide Prevention. To keep sites engaged during the waiting period, the BHIP Enhancement Project offered a guiding workbook and monthly technical support conference calls.

Three additional features of the BHIP Enhancement Project deserve special attention. First, data collection for late-implementing sites did not begin until immediately before the onset of implementation support (see Fig. 2). While this reduced statistical power, it also significantly reduced data collection burden on the study team. Second, onset of implementation support was staggered such that wave 2 began at the end of month 4 rather than month 6. This had two benefits: first, it compressed the overall amount of time required for implementation during the trial. Second, it meant that the study team only had to collect data from one site at a time, with data collection periods coming every 2–4 months. More traditional stepped wedge approaches typically have data collection across sites temporally aligned (e.g., Betran et al., 2018). Third, the BHIP Enhancement Project used a balancing algorithm (Lew et al., 2019) to assign sites to waves, retaining some of the benefits of randomization while ensuring balance on key site characteristics (e.g., size, geographic region).
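The sketch below illustrates constrained randomization in the spirit of such balancing algorithms. It is a schematic, not the Lew et al. (2019) method itself, and all site data and thresholds are invented: enumerate candidate wave assignments, discard those that are poorly balanced on a site characteristic, and randomly select among the remainder.

```python
# Schematic of constrained randomization for assigning sites to waves.
import random
from itertools import permutations

sizes = [120, 80, 200, 150, 90, 170]   # hypothetical site sizes
waves = [0, 0, 1, 1, 2, 2]             # 3 waves of 2 sites each

def imbalance(assign):
    """Spread of mean site size across waves for a candidate assignment."""
    means = [sum(s for s, w in zip(sizes, assign) if w == v) / 2
             for v in (0, 1, 2)]
    return max(means) - min(means)

# Deduplicate permutations of the wave labels, keep only balanced schemes,
# then randomize among them (preserving some benefit of randomization).
candidates = set(permutations(waves))
balanced = [a for a in candidates if imbalance(a) < 30]
random.seed(4)
print(random.choice(balanced))         # one acceptably balanced assignment
```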
Despite their utility, stepped wedges have some important limitations. First, because they feature delayed implementation at some sites, stepped wedges typically take longer than similarly-sized parallel group RCTs. This increases the chances that secular trends, policy changes, or other external forces impact study results. Second, as with RCTs, imbalanced site assignment can confound results. This may occur deliberately in some cases—for example, if sites that develop their implementation plans first are assigned to earlier waves. Even if sites are randomized, however, early and late wave sites may still differ on important characteristics such as size, rurality, and case mix. The resulting confounding between site assignment and time can threaten the internal validity of the study—although, as above, balancing algorithms can reduce this risk. Third, the use of formative evaluation (Elwy, this issue), while useful for maximizing the utility of implementation efforts in a stepped wedge, can mean that late-wave sites receive different implementation strategies than early-wave sites. Similarly, formative evaluation may inform midstream adaptations to the clinical innovation being implemented. In either case, these changes may again threaten internal validity. Overall, then, stepped wedges represent useful tools for evaluating the impact of health interventions that (as with all designs) are subject to certain weaknesses and limitations.
4. Conclusions and future directions

Implementation science is focused on maximizing the extent to which effective healthcare practices are adopted, used, and sustained by clinicians, hospitals, and systems. Answering questions in these domains frequently requires different research methods than those employed in traditional efficacy- or effectiveness-oriented randomized clinical trials (RCTs). Implementation-oriented RCTs typically feature cluster or site-level randomization, and emphasize implementation outcomes (e.g., the number of patients receiving the new treatment as intended) rather than traditional clinical outcomes. Hybrid implementation-effectiveness designs incorporate both types of outcomes; more details on these approaches can be found elsewhere in this special issue (Landes, this issue). Other methodological innovations, such as factorial designs or sequential, multiple-assignment randomized trials (SMARTs), can address questions about multi-component or adaptive interventions, still under the umbrella of experimental designs. These types of trials may be especially important for demystifying the "black box" of implementation—that is, determining what components of an implementation strategy are most strongly associated with implementation success. In contrast, pre-post designs with non-equivalent control groups, interrupted time series (ITS), and stepped wedge designs are all examples of quasi-experimental designs that may serve implementation researchers when experimental designs would be inappropriate. A major theme cutting across each of these designs is that there are relative strengths and weaknesses associated with any study design decision. Determining what design to use ultimately will need to be informed by the primary research question to be answered, while simultaneously balancing the need for internal validity, external validity, feasibility, and ethics.

New innovations in study design are constantly being developed and refined. Several such innovations are covered in other articles within this special issue (e.g., Kim et al., this issue). One future direction relevant to the study designs presented in this article is the potential for adaptive trial designs, which allow information gleaned during the trial to inform the adaptation of components like treatment allocation, sample size, or study recruitment in the later phases of the same trial (Pallmann et al., 2018). These designs are becoming increasingly popular in clinical treatment (Bhatt and Mehta, 2016) but could also hold promise for implementation scientists, especially as interest grows in rapid-cycle testing of implementation strategies or efforts. Adaptive designs could potentially be incorporated into both SMART designs and stepped wedge studies, as well as traditional RCTs, to further advance implementation science (Cheung et al., 2015). Ideally, these and other innovations will provide researchers with increasingly robust and useful methodologies for answering timely implementation science questions.

Funding

This work was supported by Department of Veterans Affairs grants QUE 15–289 (PI: Bauer) and CIN 13–403 and National Institutes of Health grant R01 MH 099898 (PI: Kilbourne).

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.psychres.2019.06.027.

References

Almirall, D., Compton, S.N., Gunlicks-Stoessel, M., Duan, N., Murphy, S.A., 2012. Designing a pilot sequential multiple assignment randomized trial for developing an adaptive treatment strategy. Stat. Med. 31 (17), 1887–1902.
Bauer, M.S., McBride, L., Williford, W.O., Glick, H., Kinosian, B., Altshuler, L., Beresford, T., Kilbourne, A.M., Sajatovic, M., Cooperative Studies Program 430 Study Team, 2006. Collaborative care for bipolar disorder: part I. Impact on clinical outcome, function, and costs. Psychiatr. Serv. 57 (7), 937–945.
Bauer, M.S., Miller, C., Kim, B., Lew, R., Weaver, K., Coldwell, C., Henderson, K., Holmes, S., Seibert, M.N., Stolzmann, K., Elwy, A.R., Kirchner, J., 2016. Partnering with health system operations leadership to develop a controlled implementation trial. Implement. Sci. 11 (1), 22.
Bauer, M.S., Miller, C.J., Kim, B., Lew, R., Stolzmann, K., Sullivan, J., Riendeau, R., Pitcock, J., Williamson, A., Connolly, S., Elwy, A.R., Weaver, K., 2019. Effectiveness of implementing a collaborative chronic care model for clinician teams on patient outcomes and health status in mental health: a randomized clinical trial. JAMA Netw. Open 2 (3), e190230.
Bernal, J.L., Cummins, S., Gasparrini, A., 2017. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int. J. Epidemiol. 46 (1), 348–355.
Betran, A.P., Bergel, E., Griffin, S., Melo, A., Nguyen, M.H., Carbonell, A., Mondlane, S., Merialdi, M., Temmerman, M., Gulmezoglu, A.M., 2018. Provision of medical supply kits to improve quality of antenatal care in Mozambique: a stepped-wedge cluster randomised trial. Lancet Glob. Health 6 (1), e57–e65.
Bhatt, D.L., Mehta, C., 2016. Adaptive designs for clinical trials. N. Engl. J. Med. 375 (1), 65–74.
Bodenheimer, T., Wagner, E.H., Grumbach, K., 2002a. Improving primary care for patients with chronic illness. JAMA 288 (14), 1775–1779.
Bodenheimer, T., Wagner, E.H., Grumbach, K., 2002b. Improving primary care for patients with chronic illness: the chronic care model, Part 2. JAMA 288 (15), 1909–1914.
Brown, C.A., Lilford, R.J., 2006. The stepped wedge trial design: a systematic review. BMC Med. Res. Methodol. 6 (1), 54.
Byiers, B.J., Reichle, J., Symons, F.J., 2012. Single-subject experimental design for evidence-based practice. Am. J. Speech Lang. Pathol. 21 (4), 397–414.
Cheung, Y.K., Chakraborty, B., Davidson, K.W., 2015. Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. Biometrics 71 (2), 450–459.
Collins, L.M., Dziak, J.J., Kugler, K.C., Trail, J.B., 2014a. Factorial experiments: efficient tools for evaluation of intervention components. Am. J. Prev. Med. 47 (4), 498–504.
Collins, L.M., Dziak, J.J., Li, R., 2009. Design of experiments with multiple independent variables: a resource management perspective on complete and reduced factorial designs. Psychol. Methods 14 (3), 202–224.
Collins, L.M., Murphy, S.A., Nair, V.N., Strecher, V.J., 2005. A strategy for optimizing and evaluating behavioral interventions. Ann. Behav. Med. 30 (1), 65–73.
Collins, L.M., Murphy, S.A., Strecher, V., 2007. The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): new methods for more potent eHealth interventions. Am. J. Prev. Med. 32 (5 Suppl), S112–S118.
Collins, L.M., Nahum-Shani, I., Almirall, D., 2014b. Optimization of behavioral dynamic treatment regimens based on the sequential, multiple assignment, randomized trial (SMART). Clin. Trials 11 (4), 426–434.
Coulton, S., Perryman, K., Bland, M., Cassidy, P., Crawford, M., Deluca, P., Drummond, C., Gilvarry, E., Godfrey, C., Heather, N., Kaner, E., Myles, J., Newbury-Birch, D., Oyefeso, A., Parrott, S., Phillips, T., Shenker, D., Shepherd, J., 2009. Screening and brief interventions for hazardous alcohol use in accident and emergency departments: a randomised controlled trial protocol. BMC Health Serv. Res. 9 (1), 114.
Cousins, K., Connor, J.L., Kypri, K., 2014. Effects of the campus watch intervention on alcohol consumption and related harm in a university population. Drug Alcohol Depend. 143, 120–126.
Curran, G.M., Bauer, M., Mittman, B., Pyne, J.M., Stetler, C., 2012. Effectiveness-implementation hybrid designs: combining elements of clinical effectiveness and implementation research to enhance public health impact. Med. Care 50 (3), 217–226.
Davey, C., Hargreaves, J., Thompson, J.A., Copas, A.J., Beard, E., Lewis, J.J., Fielding, K.L., 2015. Analysis and reporting of stepped wedge randomised controlled trials: synthesis and critical appraisal of published studies, 2010 to 2014. Trials 16 (1), 358.
Dimick, J.B., Ryan, A.M., 2014. Methods for evaluating changes in health care policy: the difference-in-differences approach. JAMA 312 (22), 2401–2402.
Eccles, M., Grimshaw, J., Campbell, M., Ramsay, C., 2003. Research designs for studies evaluating the effectiveness of change and improvement strategies. Qual. Saf. Health Care 12 (1), 47–52.
Fisher, R.A., 1925. Theory of statistical estimation. In: Mathematical Proceedings of the Cambridge Philosophical Society, vol. 22. Cambridge University Press, pp. 700–725.
Fisher, R.A., 1935. The Design of Experiments. Oliver and Boyd, Edinburgh.
Fretheim, A., Soumerai, S.B., Zhang, F., Oxman, A.D., Ross-Degnan, D., 2013. Interrupted time-series analysis yielded an effect estimate concordant with the cluster-randomized controlled trial result. J. Clin. Epidemiol. 66 (8), 883–887.
Fretheim, A., Zhang, F., Ross-Degnan, D., Oxman, A.D., Cheyne, H., Foy, R., Goodacre, S., Herrin, J., Kerse, N., McKinlay, R.J., Wright, A., Soumerai, S.B., 2015. A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. J. Clin. Epidemiol. 68 (3), 324–333.
Gaglio, B., Shoup, J.A., Glasgow, R.E., 2013. The RE-AIM framework: a systematic review of use over time. Am. J. Public Health 103 (6), e38–e46.
Gebski, V., Ellingson, K., Edwards, J., Jernigan, J., Kleinbaum, D., 2012. Modelling interrupted time series to evaluate prevention and control of infection in healthcare. Epidemiol. Infect. 140 (12), 2131–2141.
Glasgow, R.E., Vogt, T.M., Boles, S.M., 1999. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am. J. Public Health 89 (9), 1322–1327.
Hanani, H., 1961. The existence and construction of balanced incomplete block designs. Ann. Math. Stat. 32 (2), 361–386.
Hanbury, A., Farley, K., Thompson, C., Wilson, P.M., Chambers, D., Holmes, H., 2013. Immediate versus sustained effects: interrupted time series analysis of a tailored intervention. Implement. Sci. 8 (1), 130.
Handley, M.A., Lyles, C.R., McCulloch, C., Cattamanchi, A., 2018. Selecting and improving quasi-experimental designs in effectiveness and implementation research. Annu. Rev. Public Health 39 (1), 5–25.
Hussey, M.A., Hughes, J.P., 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp. Clin. Trials 28 (2), 182–191.
Kilbourne, A.M., Almirall, D., Eisenberg, D., Waxmonsky, J., Goodrich, D.E., Fortney, J.C., Kirchner, J.E., Solberg, L.I., Main, D., Bauer, M.S., Kyle, J., Murphy, S.A., Nord, K.M., Thomas, M.R., 2014a. Protocol: adaptive implementation of effective programs trial (ADEPT): cluster randomized SMART trial comparing a standard versus enhanced implementation strategy to improve outcomes of a mood disorders program. Implement. Sci. 9 (1), 132.
Kilbourne, A.M., Almirall, D., Goodrich, D.E., Lai, Z., Abraham, K.M., Nord, K.M., Bowersox, N.W., 2014b. Enhancing outreach for persons with serious mental illness: 12-month results from a cluster randomized trial of an adaptive implementation strategy. Implement. Sci. 9 (1), 163.
Kilbourne, A.M., Bramlet, M., Barbaresso, M.M., Nord, K.M., Goodrich, D.E., Lai, Z., Post, E.P., Almirall, D., Verchinina, L., Duffy, S.A., Bauer, M.S., 2014c. SMI life goals: description of a randomized trial of a collaborative care model to improve outcomes for persons with serious mental illness. Contemp. Clin. Trials 39 (1), 74–85.
Kilbourne, A.M., Goodrich, D.E., Lai, Z., Clogston, J., Waxmonsky, J., Bauer, M.S., 2012a. Life goals collaborative care for patients with bipolar disorder and cardiovascular disease risk. Psychiatr. Serv. 63 (12), 1234–1238.
Kilbourne, A.M., Goodrich, D.E., Nord, K.M., Van Poppelen, C., Kyle, J., Bauer, M.S., Waxmonsky, J.A., Lai, Z., Kim, H.M., Eisenberg, D., Thomas, M.R., 2015. Long-term clinical outcomes from a randomized controlled trial of two implementation strategies to promote collaborative care attendance in community practices. Adm. Policy Ment. Health 42 (5), 642–653.
Kilbourne, A.M., Neumann, M.S., Pincus, H.A., Bauer, M.S., Stall, R., 2007. Implementing evidence-based interventions in health care: application of the replicating effective programs framework. Implement. Sci. 2 (1), 42.
Kilbourne, A.M., Neumann, M.S., Waxmonsky, J., Bauer, M.S., Kim, H.M., Pincus, H.A., Thomas, M., 2012b. Public-academic partnerships: evidence-based implementation: the role of sustained community-based practice and research partnerships. Psychiatr. Serv. 63 (3), 205–207.
Kilbourne, A.M., Post, E.P., Nossek, A., Drill, L., Cooley, S., Bauer, M.S., 2008. Improving medical and psychiatric outcomes among individuals with bipolar disorder: a randomized controlled trial. Psychiatr. Serv. 59 (7), 760–768.
Kirchner, J.E., Ritchie, M.J., Pitcock, J.A., Parker, L.E., Curran, G.M., Fortney, J.C., 2014. Outcomes of a partnered facilitation strategy to implement primary care-mental health. J. Gen. Intern. Med. 29 (Suppl 4), 904–912.
Lei, H., Nahum-Shani, I., Lynch, K., Oslin, D., Murphy, S.A., 2012. A "SMART" design for building individualized treatment sequences. Annu. Rev. Clin. Psychol. 8, 21–48.
Lew, R.A., Miller, C.J., Kim, B., Wu, H., Stolzmann, K., Bauer, M.S., 2019. A robust method to reduce imbalance for site-level randomized controlled implementation trial designs. Implement. Sci. 14, 46.
Morgan, C.J., 2018. Reducing bias using propensity score matching. J. Nucl. Cardiol. 25 (2), 404–406.
Morton, V., Torgerson, D.J., 2003. Effect of regression to the mean on decision making in health care. BMJ 326 (7398), 1083–1084.
Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W.E., Gnagy, B., Fabiano, G.A., Waxmonsky, J.G., Yu, J., Murphy, S.A., 2012. Experimental design and primary data analysis methods for comparing adaptive interventions. Psychol. Methods 17 (4), 457–477.
NeCamp, T., Kilbourne, A., Almirall, D., 2017. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: regression estimation and sample size considerations. Stat. Methods Med. Res. 26 (4), 1572–1589.
Neumann, M.S., Sogolow, E.D., 2000. Replicating effective programs: HIV/AIDS prevention technology transfer. AIDS Educ. Prev. 12 (5 Suppl), 35–48.
Pallmann, P., Bedding, A.W., Choodari-Oskooei, B., Dimairo, M., Flight, L., Hampson, L.V., Holmes, J., Mander, A.P., Odondi, L.o., Sydes, M.R., Villar, S.S., Wason, J.M.S., Weir, C.J., Wheeler, G.M., Yap, C., Jaki, T., 2018. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med. 16 (1), 29.
Pape, U.J., Millett, C., Lee, J.T., Car, J., Majeed, A., 2013. Disentangling secular trends and policy impacts in health studies: use of interrupted time series analysis. J. R. Soc. Med. 106 (4), 124–129.
Pellegrini, C.A., Hoffman, S.A., Collins, L.M., Spring, B., 2014. Optimization of remotely delivered intensive lifestyle treatment for obesity using the multiphase optimization strategy: Opt-IN study protocol. Contemp. Clin. Trials 38 (2), 251–259.
Penfold, R.B., Zhang, F., 2013. Use of interrupted time series analysis in evaluating health care quality improvements. Acad. Pediatr. 13 (6, Suppl), S38–S44.
Pridemore, W.A., Snowden, A.J., 2009. Reduction in suicide mortality following a new national alcohol policy in Slovenia: an interrupted time-series analysis. Am. J. Public Health 99 (5), 915–920.
Proctor, E., Silmere, H., Raghavan, R., Hovmand, P., Aarons, G., Bunger, A., Griffey, R., Hensley, M., 2011. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Adm. Policy Ment. Health 38 (2), 65–76.
Robson, D., Spaducci, G., McNeill, A., Stewart, D., Craig, T.J.K., Yates, M., Szatkowski, L., 2017. Effect of implementation of a smoke-free policy on physical violence in a psychiatric inpatient setting: an interrupted time series analysis. Lancet Psychiatry 4 (7), 540–546.
Schildcrout, J.S., Schisterman, E.F., Mercaldo, N.D., Rathouz, P.J., Heagerty, P.J., 2018. Extending the case-control design to longitudinal data: stratified sampling based on repeated binary outcomes. Epidemiology 29 (1), 67–75.
Shadish, W.R., Cook, T.D., Campbell, D.T., 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin Company, Boston, MA.
Simon, G.E., Ludman, E.J., Bauer, M.S., Unutzer, J., Operskalski, B., 2006. Long-term effectiveness and cost of a systematic care program for bipolar disorder. Arch. Gen. Psychiatry 63 (5), 500–508.
Stetler, C.B., Legro, M.W., Rycroft-Malone, J., Bowman, C., Curran, G., Guihan, M., Hagedorn, H., Pineros, S., Wallace, C.M., 2006. Role of "external facilitation" in implementation of research findings: a qualitative evaluation of facilitation experiences in the veterans health administration. Implement. Sci. 1, 23.
Taljaard, M., McKenzie, J.E., Ramsay, C.R., Grimshaw, J.M., 2014. The use of segmented regression in analysing interrupted time series studies: an example in pre-hospital ambulance care. Implement. Sci. 9 (1), 77.
Wagner, A.K., Soumerai, S.B., Zhang, F., Ross-Degnan, D., 2002. Segmented regression analysis of interrupted time series studies in medication use research. J. Clin. Pharm. Ther. 27 (4), 299–309.
Wagner, E.H., Austin, B.T., Von Korff, M., 1996. Organizing care for patients with chronic illness. Milbank Q. 74 (4), 511–544.
Wyrick, D.L., Rulison, K.L., Fearnow-Kenney, M., Milroy, J.J., Collins, L.M., 2014. Moving beyond the treatment package approach to developing behavioral interventions: addressing questions that arose during an application of the multiphase optimization strategy (MOST). Transl. Behav. Med. 4 (3), 252–259.
