L7A & L7B PDF - Foundations of Biostatistics & Epidemiology
The University of Notre Dame Australia
Dr Rakhshanda Naheed
Summary
These lecture notes cover biostatistics and epidemiology, specifically focusing on one-sample t-tests and case-control studies. The document also includes examples and resources.
Full Transcript
Foundations of Biostatistics & Epidemiology (EPID1000)
NHST with One Sample t test
Dr Rakhshanda Naheed

Copyright
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been copied and communicated to you by or on behalf of Curtin University of Technology pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further copying or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.

Weekly Learning Outcomes
– Describe the purpose and assess the statistical significance of the one-sample t-test, including 95% confidence intervals.
– Assess ‘practical significance’ (effect size) for evidence-based practice.
– Describe the role of the case-control design in establishing associations between exposures and outcomes.
– Describe, calculate and interpret the odds ratio from case-control studies.
– Interpret the 95% CI for the odds ratio.

Statistical Inference: From Sample to the Population (very important slide)
Definition: Statistical inference means drawing conclusions about a population based on information from a sample.
Two ways of achieving statistical inference:
– Hypothesis testing: statistical tests are used to judge whether results are real or plausibly due to chance. Hypotheses can be about correlation, association, differences between groups, etc.
– Parameter estimation: a point estimate and an interval estimate (95% confidence interval).

5 Steps to Perform NHST (very important slide)
1. Set up (write) the null and alternative hypotheses.
2. Set the significance level (alpha): the probability value below which random chance alone is considered an unlikely explanation and the result is treated as real.
3. Choose a suitable test that provides the test statistic and its p value (the probability of a result at least this extreme if chance alone were operating).
4. Make the decision (reject the null or do not reject the null).
5. Assess practical significance.

One Sample t-test: Purpose
To test whether a sample mean is significantly different from a predetermined (constant) value, referred to as the Test Value in SPSS Statistics. The Test Value is often, though not always, a population mean derived from prior research.

Lab Example
A principal wants to compare his Year 7 students’ performance on the National Standardised Literacy Test with that of their peers from across the country. Last year, the national Year 7 average on this test was 84.6.

Step 1: Set up the null and alternative hypotheses
Null hypothesis (H0): There is no difference between the Year 7 students’ performance on the National Standardised Literacy Test and that of their peers from across the country (i.e. sample score = test value/population score).
H0: µ (literacy scores for Year 7 students at a particular school) = 84.6
Alternative hypothesis (HA): The Year 7 students’ performance on the National Standardised Literacy Test is different from that of their peers from across the country (i.e. sample score ≠ test value/population score).
HA: µ (literacy scores for Year 7 students at a particular school) ≠ 84.6

Step 2: Set the criterion for the decision (significance level or alpha)
– The level below which results are considered unlikely to be due to chance alone.
– Usually set at 5% (0.05).

Step 3: Choose a suitable test that provides the probability of random chance (the p value)
One-sample t test.
Some other questions that can be answered with this test:
– Do inner-city homeowners have higher mortgages than the national average?
– On average, do university lecturers spend more or less time at work than the 37.5 hours per week stipulated in their enterprise agreements?
– Is there a difference between perceived quality of life today and perceived quality of life in the 1950s?
Question: Can you work out the ‘Test Value’ for each example?

One Sample t test
Single sample compared to a Test Value.
– Sample: Year 7 average on a literacy test
– Test Value: 84.6 (national Year 7 average)
The Test Value is often a population mean but can be any value specified by the researcher.
Handy tip: Make sure you are clear about what the null hypothesis is when you use this test.

Assumptions (requirements for the test)
1. The test variable should be an interval or ratio variable. (This should be considered at the initial stage of planning/designing the study.)
2. The test variable should be roughly normally distributed in the population. Normality is assessed graphically and statistically. See sections 4.3.2.2/3 of your SPSS Manual, and review lecture week 3 and lab 3.

Normality
Data may be treated as normally distributed if both zs (the z score for skewness) and zk (the z score for kurtosis) are within ±1.96. Are they?
A standard error of 1.05 means that in the sampling distribution of means (i.e. if we took several samples of this size) we can expect the sample means to vary by about 1.05 on average.
Skewness and kurtosis for the literacy data show that zs and zk are both within ±1.96.

Normality: Shapiro-Wilk test
The Shapiro-Wilk test is non-significant (p > .05), therefore the assumption of normality is not violated. In other words, normality can be assumed.

Normality: What is the name of this graph?
Does this graph show that literacy scores have a normal distribution?

Normality: What is the name of this graph?
Frequency   Stem & Leaf
  1.00      Extremes
  3.00      6 .
  6.00      7 .
 12.00      7 .
  4.00      8 .
  1.00      8 .
  1.00      Extremes (>=88)
Stem width: 10
Each leaf: 1 case(s)
Does this graph show that literacy scores have a normal distribution?

Normality: What is the name of this graph?
The points cluster tightly around the diagonal line (or ‘hug’ the diagonal line). So does this mean literacy scores have a normal distribution?

Normality
There is a roughly even spread of points above and below the horizontal line, which is consistent with the sample having come from a normally distributed population.

Normality: This is a ____ plot.
What do the two little circles with 8 & 2 mean?
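The test statistic, confidence interval and decision for the literacy example can be sketched by hand from the summary figures quoted in these notes (test value 84.6, mean difference -9.064, standard error 1.05). This is a minimal illustration, not the SPSS procedure itself. Two inputs are assumptions rather than values stated in the notes: the sample size of 28 is taken from the sum of the stem-and-leaf frequencies, and 2.052 is the two-tailed critical t for df = 27 read from a standard t table. Cohen's d is used here as one common effect-size measure for practical significance, and because the standard error is rounded, the d value is approximate.

```python
import math

# Values reported in the notes for the literacy example.
TEST_VALUE = 84.6      # national Year 7 average (the value under H0)
MEAN_DIFF = -9.064     # sample mean minus test value
STD_ERROR = 1.05       # standard error of the mean (rounded)

# Assumptions, not stated explicitly in the notes:
N = 28                 # sample size, taken as the sum of the stem-and-leaf frequencies
T_CRIT = 2.052         # two-tailed critical t for alpha = .05, df = N - 1 = 27 (t table)


def one_sample_t_summary(mean_diff, std_error, n, t_crit):
    """Return t, the 95% CI of the mean difference, Cohen's d and the decision."""
    t_stat = mean_diff / std_error
    margin = t_crit * std_error
    ci = (mean_diff - margin, mean_diff + margin)
    sample_sd = std_error * math.sqrt(n)   # SE = SD / sqrt(n), so SD = SE * sqrt(n)
    cohens_d = mean_diff / sample_sd
    reject_null = abs(t_stat) > t_crit     # equivalent to p < .05 (two-tailed)
    return {"t": t_stat, "ci": ci, "d": cohens_d, "reject_null": reject_null}


result = one_sample_t_summary(MEAN_DIFF, STD_ERROR, N, T_CRIT)
sample_mean = TEST_VALUE + MEAN_DIFF

print(f"sample mean = {sample_mean:.3f}")
print(f"t({N - 1}) = {result['t']:.2f}")
print(f"95% CI of the difference = {result['ci'][0]:.2f} to {result['ci'][1]:.2f}")
print(f"Cohen's d (approximate) = {result['d']:.2f}")
print("Reject H0" if result["reject_null"] else "Do not reject H0")
```

Comparing |t| with the critical value gives the same decision as comparing the p value with alpha, which is why no p value calculation is needed for this sketch.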
Make the Decision
Interpretation: The t test is statistically significant at α = .05, which means the difference from the national average is unlikely to be due to random chance alone.
A negative mean difference indicates that the sample mean is lower than the test value (i.e. the national average of 84.6). In this instance the mean difference is -9.064.
95% CI: We are 95% confident that the true difference from the national average, for the wider population of Year 7 students represented by this sample, lies between -11.22 and -6.91.
See Chapter 4, sections 4.3.2.4/5 of the SPSS Manual.

Reported Results
In a paper, the Results section should provide:
– A description of each test used, and its purpose.
– Descriptive statistics for the sample and the test value.
– The outcome of each test, including whether or not it was statistically significant, with p values and 95% CIs.
– Measures of effect: the size and (if applicable) direction (i.e. higher or lower) of each observed effect.

For Assignment: Follow the advice given in the question. Relevant output is a must. No output = no mark.

Resources
– Ross, T. 2012. A Survival Guide for Health Research Methods. First edition. McGraw Hill.
– Allen, Bennett & Heritage. 2014. SPSS Statistics Version 22: A Practical Guide. Third edition. Cengage Learning.
– Carneiro, I & Howard, N. 2011. Introduction to Epidemiology. Second edition. McGraw Hill.
– Stewart, A. 2016. Basic Statistics and Epidemiology: A Practical Guide. CRC Press, Taylor & Francis Group.

Foundations of Biostatistics & Epidemiology (EPID1000)
Case Control Studies
Dr Rakhshanda Naheed

Study Designs (very important slide)
A: Descriptive study designs
– Simply ‘describe’ and do not look for any association.
– Examples: case study or case report, and case series.
B: Analytic study designs: look for associations. These are either ‘observational’ (simply observing, with no intervention) or ‘experimental/intervention’ (the researcher introduces an intervention to one group and not to the other, and measures the outcome).

Observational designs
1. Cross-sectional
2. Ecological/Correlational
3. Cohort
4. Case control

Intervention/Experimental designs
1. Randomised design (called a Randomised Controlled Trial or RCT)
2. Non-randomised design (quasi-experimental design)

What is the Case Control study design?
It begins with two groups: one with the disease of interest (cases) and the other without that disease (controls). The two groups are then compared to see whether their previous exposures differ.
This is the opposite approach to the cohort design, which selects people with and without the exposure, all of whom are disease free at the start.

Case Control design
– A number of cases are recruited, consisting of subjects who already have a known disease.
– A number of controls are recruited who do not have the disease but who are similar in other respects.
– Both groups are then asked questions to find out whether they were exposed to a particular risk factor.
– If more cases than controls are found to have been exposed, it might be possible to link the exposure with the disease (odds ratio: measure of effect and association).

Case control vs Cohort design (very important slide)

Subjects
Two types of subjects need to be recruited:
– Cases: explicit criteria are needed for who counts as a case, where cases are recruited from, and when they were diagnosed with the disease.
– Controls: should be similar to cases in every respect other than actually having the disease; otherwise systematic differences can cause biased results, e.g. hospital-based vs population-based.

Matching
– Comparison between cases and controls will be compromised if there are age, sex and other differences.
– Matching is done to make the groups more comparable/similar.
– One case may be matched to one or more controls.
– It is unwise to match on too many factors. Why?

Advantages of case control studies
– Suitable for rare or uncommon diseases.
– Cheap and quick (since no follow-up is required).
– Especially useful for diseases with a long ‘latent period’ (time from exposure to the onset of disease), e.g. mesothelioma.
– Allow investigation of multiple exposures that may be linked with the outcome.

Disadvantages of Case Control design
– Cannot measure incidence or prevalence.
– Data are retrospective and therefore prone to both selection bias and information bias.
– Difficult to establish the time sequence (temporal relationship) between exposure and disease.
– Cannot examine multiple outcomes.
– Recall bias: a difference in recall of past exposures between cases and controls.

Case Control design & Genetic Epidemiology
An important and common application of the case control design, in which the interaction between genetic and environmental factors is investigated. (Bio-med students should make a special note of this.)
– Disease susceptibility genes can be studied and compared between those with a certain disease and those without.
– The case control design enables investigation of several genetic markers (and potentially the whole genome) in the same study.
– Genetic mutations are rare and require clinical and laboratory tests that would be prohibitively expensive in a cohort study for this purpose.
– When looking for associations within families, cases and controls sometimes do not differ sufficiently, which may make it difficult to measure associations.

Sources of Data Collection
– Usually by structured interviews. Why?
– Medical records: data on risk factors are more likely to have been recorded for cases than for controls (e.g. alcohol consumption in patients with liver cirrhosis).
– From friends and family if any cases have died, with potential for bias.

Data Analysis
– Odds ratio: measure of effect and association (just like RR for the cohort design).
– Statistical significance: chi-square test and 95% CI.
– Practical significance: odds ratio with 95% CI.

The Odds Ratio
Please note: prevalence, incidence, risk, etc. cannot be estimated from a case control design; incidence and risk require a design such as a cohort study.
The odds ratio is the only measure of association in a case control study. It is ‘the odds of exposure in cases divided by the odds of exposure in controls’.

Illustrated Example
A study explored whether there was an association between exposure to asbestos and lung cancer. A total of 225 people with lung cancer (cases) and 833 people without lung cancer (controls) were selected. Study data showed that 176 cases and 85 controls reported being exposed to asbestos in the past. Calculate the odds ratio from this study.
Hint: Draw a 2x2 table first.

Odds Ratio
            Exposed   Not exposed   Total
Cases         176          49        225
Controls       85         748        833
The ‘Not exposed’ numbers have been worked out by subtraction from the group total:
(Cases) 225 – 176 = 49
(Controls) 833 – 85 = 748
Odds for cases = Exposed cases / Non-exposed cases = 176 / 49
Odds for controls = Exposed controls / Non-exposed controls = 85 / 748
Odds ratio = Odds for cases / Odds for controls
OR = (176/49) ÷ (85/748) = 31.61
Note: Please calculate the odds ratio as ONE step using brackets as shown here.

Interpretation of Odds Ratio
Odds ratio = Odds for cases / Odds for controls = (176/49) ÷ (85/748) = 31.61
Interpretation 1: Those who have lung cancer had 31.61 times the odds of having been exposed to asbestos compared with those without lung cancer.
Interpretation 2 (special note): Those who were exposed to asbestos had 31.61 times the odds of having lung cancer compared with those who were not exposed to asbestos.
The first interpretation is preferable, but both are correct.
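The one-step odds ratio calculation for the asbestos example can be checked with a short script. This is only an illustrative sketch: the function and variable names are made up for this example, and the counts are the ones given in the 2x2 table of the notes. It also shows the equivalent cross-product form, which gives the same value.

```python
import math

# 2x2 counts from the asbestos and lung cancer example.
exposed_cases, unexposed_cases = 176, 225 - 176        # 176 and 49
exposed_controls, unexposed_controls = 85, 833 - 85    # 85 and 748


def odds_ratio(a, b, c, d):
    """Odds of exposure in cases (a/b) divided by odds of exposure in controls (c/d)."""
    return (a / b) / (c / d)


or_value = odds_ratio(exposed_cases, unexposed_cases,
                      exposed_controls, unexposed_controls)

# Equivalent cross-product form: (a * d) / (b * c).
cross_product = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

print(f"OR = {or_value:.2f}")
print("Cross-product form agrees:", math.isclose(or_value, cross_product))
```

Keeping the whole expression in one step, as the notes advise, avoids rounding the two odds before dividing them.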
Odds for exposure = Exposed cases / Exposed controls = 176 / 85
Odds for non-exposure = Cases not exposed / Controls not exposed = 49 / 748
Odds ratio = Odds for exposure / Odds for non-exposure = (176/85) ÷ (49/748) = 31.61
Again, please do these calculations as one step using brackets.

Clustered Bar chart

Statistical Significance: Chi Square test
Asbestos exposure * lung cancer crosstabulation
                                 Lung cancer   No lung cancer    Total
Exposed       Count                  176             85            261
              Expected count         55.5          205.5          261.0
Non-exposed   Count                   49            748            797
              Expected count        169.5          627.5          797.0
Total         Count                  225            833           1058
              Expected count        225.0          833.0         1058.0

Statistical Significance: Chi Square test & p value
Chi-Square Tests
                               Value      df   Asymptotic sig. (2-sided)   Exact sig. (2-sided)   Exact sig. (1-sided)
Pearson Chi-Square             441.026a   1    .000
Continuity Correction(b)       437.373    1    .000
Likelihood Ratio               397.292    1    .000
Fisher's Exact Test                                                        .000                   .000
Linear-by-Linear Association   440.609    1    .000
N of Valid Cases               1058
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 55.51.
b. Computed only for a 2x2 table.

OR & 95% Confidence Interval
Risk Estimate
                                                            Value    95% CI lower   95% CI upper
Odds Ratio for Asbestos exposure (Exposed / Non-exposed)    31.608      21.443         46.592
For cohort lung cancer = Lung cancer                        10.968       8.256         14.571
For cohort lung cancer = No lung cancer                       .347        .291           .414
N of Valid Cases                                             1058
95% CI for OR includes 1 = Statistical significance yes/no?
95% CI for OR does not include 1 = Statistical significance yes/no?

Resources
– McKenzie, S. 2013. Vital Statistics. First edition. Churchill Livingstone Elsevier.
– Allen, Bennett & Heritage. 2014. SPSS Statistics Version 22: A Practical Guide. Third edition. Cengage Learning.
– Carneiro, I & Howard, N. 2011. Introduction to Epidemiology. Second edition. McGraw Hill.
– Stewart, A. 2016. Basic Statistics and Epidemiology: A Practical Guide. CRC Press, Taylor & Francis Group.
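Worked check for the asbestos example: the Pearson chi-square value and the 95% CI for the odds ratio shown in the SPSS output can be reproduced by hand. The sketch below is hedged in two ways. The confidence interval uses the common log-based (Woolf) approximation with z = 1.96, which is an assumption about the method rather than something stated in the notes, although it agrees with the reported interval. The critical value 3.841 is the standard chi-square cut-off for df = 1 at alpha = .05. Function names are illustrative only.

```python
import math

# Observed counts from the crosstabulation (rows: exposed, non-exposed).
a, b = 176, 85     # exposed: lung cancer, no lung cancer
c, d = 49, 748     # non-exposed: lung cancer, no lung cancer

CHI2_CRIT = 3.841  # chi-square critical value, df = 1, alpha = .05


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table: sum of (observed - expected)^2 / expected."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n   # row total * column total / N
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) 95% confidence interval."""
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


chi2 = pearson_chi_square(a, b, c, d)
or_value, lower, upper = odds_ratio_ci(a, b, c, d)
ci_excludes_one = lower > 1 or upper < 1

print(f"Pearson chi-square = {chi2:.3f} (significant: {chi2 > CHI2_CRIT})")
print(f"OR = {or_value:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
print("95% CI excludes 1:", ci_excludes_one)
```

A confidence interval for the odds ratio that excludes 1 leads to the same conclusion as a significant chi-square test, since an odds ratio of 1 corresponds to no association.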