Summary

This UCL lecture gives an overview of power and sample size calculations for clinical trials. It explains why the sample size must be large enough to detect a clinically meaningful difference with a specified level of certainty, presents the sample size formulae for continuous and binary outcomes, and shows how the significance level, power, effect size and standard deviation each change the number of participants required. It closes by showing how to inflate the sample size to allow for loss to follow up and why post hoc power calculations should be avoided.

Full Transcript

Power and Sample Size
Core Principles of Mental Health Research

Clinical trial scenario

A randomised controlled trial is planned to investigate the effectiveness of cognitive behavioural therapy (CBT) for reducing depression in adults with cancer. The control group will receive treatment as usual (TAU), while the intervention group will receive 12 individual CBT sessions in addition to TAU (TAU+CBT). The primary outcome is depression at 12 weeks follow up as measured by the Beck Depression Inventory II (BDI‑II). The BDI‑II is a self-report measure which produces a score ranging from 0 to 63, with a score of 63 indicating severe depressive symptoms. The investigators considered that a difference between groups in BDI‑II score of 6 points or more would lead to recommendations to implement CBT for these patients in clinical practice. The standard deviation (SD) of scores on the BDI‑II in cancer patients is estimated to be 12 points.

[Figure: Sampling distribution for null hypothesis. x-axis: treatment effect, -9 to 18.]

[Figure: Alpha criterion and hypothesis test. Null and alternative hypothesis distributions with the "Do not reject H0" and "Reject H0" regions and the Type I error area marked. x-axis: treatment effect, -9 to 18.]

[Figure: Power.]

Factors which affect power
– Level of statistical significance (alpha)
– Effect size to detect (difference between groups)
– Standard deviation (SD) of the outcome
– Sample size

How could we increase power?
– Demo: https://rpsychologist.com/d3/nhst/

Factors which determine sample size

Clinically important effect
– Smallest difference between groups which it is necessary to detect
– Usually determined from prior studies or in consultation with clinicians or researchers

Standard deviation (SD) of the outcome
– Usually based on previous literature or a pilot study
– Often assumed to be the same in both groups

Power to detect the specified difference if a true difference exists
– Usually specified as 80% or 90%

Level of statistical significance
– Usually specified as 5%

Continuous outcomes – comparison of two means

The required sample size per group is

$$n = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \times \frac{2\sigma^2}{(\mu_2 - \mu_1)^2}$$

Where
– $\mu_2 - \mu_1$ is the true difference between the means in the two groups
– $\sigma$ is the standard deviation of the outcome
– $\alpha$ is the significance level
– $1-\beta$ is the statistical power
– $z$ is the Z score from a normal distribution

Binary outcomes – comparison of two proportions

The required sample size per group is

$$n = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \times \frac{p_1(1-p_1) + p_2(1-p_2)}{(p_2 - p_1)^2}$$

Where
– $p_1$ is the true proportion in the control group
– $p_2$ is the true proportion in the treatment group
– $\alpha$ is the significance level
– $1-\beta$ is the statistical power
– $z$ is the Z score from a normal distribution

Similarities between formulae

Sample size for a comparison of two means

$$n = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \times \frac{2\sigma^2}{(\mu_2 - \mu_1)^2}$$

Sample size for a comparison of two proportions

$$n = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \times \frac{p_1(1-p_1) + p_2(1-p_2)}{(p_2 - p_1)^2}$$

General principle

$$n \propto \frac{(\text{Standard deviation})^2}{(\text{Effect size})^2}$$

Sample size – Information required
– What is the primary outcome measure?
– How will the data be analysed?
– What results are expected in the control group?
– How small a treatment difference needs to be detected?
– With what degree of certainty?
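The two formulae above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lecture: the function names are hypothetical, the test is assumed to be two-sided, and the result is rounded up to the next whole participant. The first call uses the values from the clinical trial scenario (difference of 6 points, SD of 12 points) with the usual 5% significance level and 80% power; the proportions in the second call (0.5 and 0.3) are invented purely for illustration.

```python
import math
from statistics import NormalDist


def f_alpha_beta(alpha: float, power: float) -> float:
    """Return (z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    return (z(1 - alpha / 2) + z(power)) ** 2


def n_per_group_means(diff: float, sd: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group for comparing two means (rounded up)."""
    return math.ceil(f_alpha_beta(alpha, power) * 2 * sd ** 2 / diff ** 2)


def n_per_group_proportions(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group for comparing two proportions (rounded up)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(f_alpha_beta(alpha, power) * variance_sum / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Scenario values: difference of 6 BDI-II points, SD of 12 points.
    print(n_per_group_means(diff=6, sd=12))          # 63
    # Hypothetical proportions, chosen only to exercise the formula.
    print(n_per_group_proportions(p1=0.5, p2=0.3))   # 91
```

Both functions share the same multiplier and differ only in the variance term, which is the similarity between the formulae noted above.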
Clinical trial scenario

Recap: the planned randomised controlled trial compares TAU+CBT with TAU alone for depression in adults with cancer, with BDI‑II score at 12 weeks follow up as the primary outcome.

Sample size – Information required

What is the primary outcome measure?
– BDI-II score (range 0 to 63)
– A continuous measure

How will the data be analysed?
– T test comparing mean scores at 12 weeks follow up

What results are expected in the control group?
– Mean (SD) BDI-II score = 20 (12) points

How small a treatment difference needs to be detected?
– 6 points on the BDI-II scale

With what degree of certainty?
– 80% power

Sample size calculation – Example

– Difference to detect: $\mu_2 - \mu_1$ = 6 points
– Standard deviation: $\sigma$ = 12 points
– Statistical significance level: $\alpha$ = 5%
– Power: $1-\beta$ = 80%

The required sample size per group is

$$n = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \times \frac{2\sigma^2}{(\mu_2 - \mu_1)^2}$$

$$n = \left(z_{1-0.05/2} + z_{0.8}\right)^2 \times \frac{2 \times 12^2}{6^2}$$

Function of alpha and beta

Values for $f(\alpha, \beta)$, a function of alpha and beta:

$$f(\alpha, \beta) = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2$$

              Power (1-β)
Alpha (α)     50%     80%     85%     90%     95%
1%            6.63    11.68   13.05   14.88   17.81
5%            3.84    7.85    8.98    10.51   12.99
10%           2.71    6.18    7.19    8.56    10.82

Where
– $\alpha$ is the significance level
– $1-\beta$ is the statistical power

Sample size calculation – Example

Reading $f(\alpha, \beta)$ = 7.85 from the table for $\alpha$ = 5% and power = 80%:

$$n = 7.85 \times \frac{2 \times 12^2}{6^2} = 7.85 \times 8 = 62.8$$

Rounding up gives $n = 63$ per group, so the total sample size is $2n = 126$.

Impact of criteria on sample size

Difference    SD           Alpha    Power    2n
6 points      12 points    5%       80%      126
4 points      12 points    5%       80%      284
6 points      15 points    5%       80%      198
6 points      12 points    5%       90%      170
6 points      12 points    1%       80%      188

Impact of criteria on sample size

Smaller difference, larger 2n
– Larger sample sizes are required to detect smaller differences (means or proportions) between groups

Larger SD, larger 2n
– Larger sample sizes are required to detect differences between groups with greater variability in the outcome

Higher power, larger 2n
– Larger sample sizes are required to achieve greater power (for example, 170 rather than 126 participants for 90% rather than 80% power)

Smaller alpha, larger 2n
– Larger sample sizes are required to reduce the risk of a Type I error (smaller p criterion)

Range of sample sizes

Sample size required (2n) for a range of input values

                Detectable difference (BDI-II points)
Power (1-β)     4       5       6       7       8
80%             284     182     126     94      72
85%             324     208     144     106     82
90%             380     244     170     124     96

Where
– $\alpha$ = 5%
– $\sigma$ = 12 points on BDI-II

More complicated scenarios

Loss to follow up

Unequal treatment group sizes

Adjustment for baseline
– Very common with continuous outcomes
– Specify correlation between baseline and follow up
– Usually reduces the required sample size

Hierarchical data structures
– Examples: multicentre trials, patients within GP practices, patients treated by the same therapist
– Specify intraclass correlation (ICC) for patients within clusters
– Usually increases the required sample size

Time to event analysis

Loss to follow up

Example:
– In the CBT for depression in adults with cancer study, it is expected that 25% of patients will be lost to follow up
– The required sample size calculated for the study was 126
– If 25% drop out, that only leaves 75%, or 94 patients (0.75 x 126 = 94.5), whose data can be analysed
– The power to detect the specified clinically important difference will be less than the 80% specified

Problem: Loss to follow up reduces the power of a trial

Solution: Increase the initial number of patients recruited to allow sufficient power after loss to follow up

Adjustment for loss to follow up

If your sample size calculation says you need 100 participants, how many extra participants do you need to recruit to account for 20% loss to follow up?

Increase by 20% ???
– 120% of 100 (or 1.2 x 100) = 120
– How many of 120 participants will remain after 20% attrition?
– 100% – 20% = 80% (or 1 – 0.2 = 0.8)
– 80% of 120 (or 0.8 x 120) = ???

Increase by 25% ???
– 125% of 100 (or 1.25 x 100) = 125
– How many of 125 participants will remain after 20% attrition?
– 80% of 125 (or 0.80 x 125) = ???
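
The two candidate increases above can be checked with a few lines of arithmetic. This is a small sketch, not part of the lecture; the function name is hypothetical, and attrition is passed as a whole-number percentage so that the arithmetic stays exact.

```python
def remaining_after_attrition(recruited: int, loss_percent: int) -> float:
    """Number of participants left after losing loss_percent of those recruited."""
    return recruited * (100 - loss_percent) / 100


if __name__ == "__main__":
    required = 100  # participants needed according to the sample size calculation
    for recruited in (120, 125):
        left = remaining_after_attrition(recruited, loss_percent=20)
        verdict = "enough" if left >= required else "not enough"
        print(f"Recruit {recruited}: {left:.0f} remain after 20% attrition ({verdict})")
```

Recruiting 120 leaves 96 participants, which falls short of the 100 required, whereas recruiting 125 leaves exactly 100.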
Adjustment for loss to follow up

If the proportion lost is Q, multiply the required sample size (calculated using the relevant formula) by 1/(1-Q)

– Sample size = 100, attrition = 20%
  2n x 1/(1-Q) = 100 x 1/(1-0.2) = 100 / 0.80 = 100 / (4/5) = 100 x 5/4 = 100 x 1.25 = 125
– Sample size = 126, attrition = 25%
  2n x 1/(1-Q) = 126 x 1/(1-0.25) = 126 / 0.75 = 126 / (3/4) = 126 x 4/3 = 168 (approximately 126 x 1.33)

When p>0.05

There are a number of possible reasons, including
– True difference between groups is smaller than specified in the sample size calculation
– Standard deviation is greater than specified in the sample size calculation
– There is no true difference between groups in the population
– There is a true difference between groups, but this result is a Type II error, for example one of the 10% of studies expected to produce a false negative result when power is specified as 90%

It's not possible to distinguish between these reasons

Avoid post hoc power calculations

Some journals request post hoc sample size calculations

Probabilities are not meaningful after the event
– Before you flip a coin, the probability of getting "heads" is 0.5
– After you have flipped a coin the result is not in doubt: it is either 0 (if you got "tails") or 1 (if you got "heads")

Post hoc power calculations are not informative
– They cannot help distinguish between possible reasons for the expected result not being found
– Post hoc power for a trial with p>0.05 will always be low

AVOID post hoc power calculations
– A priori: perform and report power/sample size calculations
– Post hoc: use 95% confidence intervals to shed light on results

Summary

– Key concepts in power and sample size
– Importance of getting a large enough but not too large sample size
– Type I and Type II errors
– Sample size formulae for continuous and binary outcomes
– How different factors impact on the number of participants required
– Adjust sample sizes to account for loss to follow up

References

– Campbell, Julious & Altman (1995). Estimating sample sizes for binary,... and continuous outcomes in two group comparisons. British Medical Journal. Recommended: overview with worked examples aimed at clinicians.
– Schulz (2005). Sample size calculations in randomised trials: mandatory and mystical. The Lancet. Recommended: readable overview aimed at clinicians.
– Julious (2004). Sample sizes for clinical trials with Normal data. Statistics in Medicine. Useful but quite "statistical".

References

The following are part of the British Medical Journal's Endgames series written by Philip Sedgwick. They are mini quizzes on real life research scenarios with answers and explanations, and would be a useful way of testing and enhancing your understanding of this session:
– Pitfalls of statistical hypothesis testing: type I and type II errors (2014)
– Sample size and power (2011)
– The importance of statistical power (2013)
– Randomised controlled trials: Understanding power (2015)

Thank you!
