A205 L2 Introduction to Biostatistics PDF
Republic Polytechnic
2024
Dr Antara Chakraborty
Summary
This document is a lecture or presentation on introduction to biostatistics and epidemiology. It covers the role of statistics in epidemiology and basic statistical concepts. The document was created on 16/10/2024 and is for an A205 course at the Republic Polytechnic in Singapore.
Full Transcript
OFFICIAL (CLOSED) \ NON-SENSITIVE
Lesson 2: Introduction to Biostatistics
A205 Epidemiology and Biostatistics
Interactive Seminar
Module Chair: Dr Antara Chakraborty
Updated on: 16/10/2024

Contents
Role of Statistics in Epidemiology
Basic concepts of statistics
Calculate mean, median, mode, ranges, variance, standard deviation, standard error of mean and confidence interval
Hypothesis testing

Copyright © 2024 Republic Polytechnic, Singapore

Role of Statistics in Epidemiology

What is Statistics?
Explanation of the key terminologies:
Statistics: The science of using statistical methods to collect data, summarize data, interpret results and draw conclusions.
Biostatistics: The science of statistics applied to the analysis of biological or medical data.
Biostatistics is essential for the scientific method of investigation and for the appraisal and critique of scientific literature.

What is the Role of Statistics in Epidemiology?
Can numbers be useful to understand, prevent and control diseases?
Power of numbers: COVID-19 has put an intense spotlight on statistics. The numbers of cases, deaths and infection rates reported by governments and media are used as a critical tool for supporting decision-making and providing public accountability.
Source: Steve MacFeely, Chief Statistician, UNCTAD

Team Discussion:
Visit the following link: https://www.moh.gov.sg/resources-statistics/infectious-disease-statistics/2022/weekly-infectious-diseases-bulletin
Select a specific disease that is of interest to you.
Find some statistical data relating to this disease and explain the impact of this data on public health.
Class Sharing: Each team will share the team's inputs in Padlet.

What is the Role of Statistics in Epidemiology?
4488 subjects (1790 PD, 2698 controls) with genetic data of at least one LRRK2 variant were included in this study. Risk-variant carriers who were non-caffeine-drinkers had increased PD odds compared to wildtype carriers who were caffeine-drinkers for the G2385R [OR 8.6], R1628P [OR 4.6] and S1647T [OR 4.0] variants.

"This research has important implications for the prevention of PD, especially in countries like Singapore where the Asian gene variants are common. Consuming caffeine within normal limits offers an easy, pleasant and sociable way for people to potentially reduce their risk of PD."

What is the Role of Statistics in Epidemiology?
Trends in Prevalence of Specific Developmental Disabilities in Children Aged 3 to 17 Years, NHIS, 2009 to 2017

Condition                              2009–2011, %   2012–2014, %   2015–2017, %
Any developmental disability           16.22          16.80          17.76
ADHD                                   8.47           9.10           9.54
ASD                                    1.12           1.60           2.49
Blind or unable to see at all          0.16           0.16           0.16
CP                                     0.31           0.34           0.28
Moderate to profound hearing loss      0.64           0.68           0.58
LD                                     7.86           7.51           7.86
ID                                     0.93           1.21           1.17
Seizures, past 12 mo                   0.83           0.70           0.78
Stuttered or stammered, past 12 mo     2.04           1.90           2.13
Other developmental delay              4.65           4.43           4.06

CONCLUSIONS: The prevalence of developmental disability among US children aged 3 to 17 years increased between 2009 and 2017. Changes by demographic and socioeconomic subgroups may be related to improvements in awareness and access to health care.
Source: Prevalence and Trends of Developmental Disabilities among Children in the United States: 2009-2017

Basic Concepts in Statistics

Basics of Statistics
Central tendencies: Mean, Median, Mode
Dispersions: Variance, Standard Deviation

Basics of Statistics
1) Central Tendencies
1. Arithmetic mean / average (x̄): The sum of all the measurement values divided by the total number of measurements (n).
2. Median: The middle value of the set of measurement values when arranged in ascending order from lowest to highest.
3. Mode: The value that occurs the greatest number of times.
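The three measures of central tendency defined above can be tried out on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the numbers in `values` are hypothetical and chosen only for illustration, not taken from the lesson.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
values = [4, 8, 6, 5, 3, 8, 9]

mean = statistics.mean(values)         # sum of the values divided by n
median = statistics.median(values)     # middle value after sorting
modes = statistics.multimode(values)   # list of the most frequent value(s)

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```

`multimode` is used rather than `mode` so that a data set with two equally frequent values reports both of them.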
Video: Mode, Median, Mean, Range, and Standard Deviation (1.3)
https://www.youtube.com/watch?v=mk8tOD0t8M0

Basics of Statistics
Using Excel, determine the mean, median and standard deviation for the following numbers:

Basics of Statistics
Normal distribution
The "Bell Curve" is a Normal Distribution:
Mean = median
Symmetry about the center
50% of values less than the mean and 50% greater than the mean

Basics of Statistics
2) Dispersion: Standard Deviation
A measure of spread tells you how much variation exists in the data.
Standard deviation: Measures how closely the measurement data are clustered about the mean. The smaller the standard deviation, the more closely the data are clustered about the mean and the higher the precision is.
Figure labels: Low variation, Moderate variation, High variation
Interactivate: Normal Distribution (shodor.org)

Basics of Statistics
2) Dispersion: Standard error of mean
Standard error of mean (SEM): the standard deviation of the sampling distribution of a statistic, most commonly of the mean. It indicates how different the population mean is likely to be from a sample mean. By calculating the standard error, we can estimate how representative our sample is of our population and make valid conclusions.
It tells us how much the sample mean would vary if we were to repeat a study using new samples from within a single population.
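The idea of repeating a study with new samples from one population can be simulated. The sketch below is only an illustration under stated assumptions: a hypothetical normally distributed population (mean 100, standard deviation 15), a sample size of 25 and 2000 repeated studies. None of these numbers come from the lesson. It compares the spread of the simulated sample means with the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters and study design (assumptions).
POP_MEAN, POP_SD = 100.0, 15.0
SAMPLE_SIZE, N_STUDIES = 25, 2000

# Repeat the "study" many times and record the mean of each sample.
sample_means = []
for _ in range(N_STUDIES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

# Spread of the sample means across the repeated studies.
empirical_sem = statistics.stdev(sample_means)
# Standard deviation divided by the square root of the sample size.
theoretical_sem = POP_SD / SAMPLE_SIZE ** 0.5

print(f"SD of sample means: {empirical_sem:.2f}")
print(f"SD / sqrt(n):       {theoretical_sem:.2f}")
```

The two printed values should be close to each other, which is why a single sample's standard deviation and size are enough to estimate how much the mean would vary between studies.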
The SEM of any particular sampling distribution of sample means is (s stands for standard deviation, n stands for sample size):
$\mathrm{SEM} = \dfrac{s}{\sqrt{n}}$
The SEM of a Yes/No test is given by (p is the proportion for Yes):
$\mathrm{SEM} = \sqrt{\dfrac{p(1-p)}{n}}$

Basics of Statistics
2) Dispersion
What is the difference between standard deviation and standard error of mean?
Standard error: Variability across samples
Standard deviation: Variability within a sample

Analysis and Summary of Data

Methods in Statistics
Descriptive Statistics: Data collection, Data summarization
Inferential Statistics: Statistical analysis

What are the Different Types of Data?
Quantitative data is expressed in numbers and graphs and is analyzed through statistical methods.
Qualitative data is expressed in words and is analyzed through interpretations and categorizations.
Data collection method: How to collect data? Experiments, interviews, surveys, secondary research, etc.
Source: https://www.scribbr.com/methodology/data-collection/

Data summarization (Representation of Data)
Bar graph
Pie chart
Line graph
Pictogram
Histogram
Frequency distribution

Statistical Analysis
Useful concepts for analysis of data
1) Probability
Step 1: Visit the following link: RANDOM.ORG - Dice Roller
Step 2: Using this link, roll 4 dice
Step 3: Record your team's observation using the table provided in the next slide
Step 4: Share your team's result using Padlet/white board as suggested by your lecturer

Student   Dice 1   Dice 2   Dice 3   Dice 4
1
2
3
4
5

Discuss…
Does your answer match with your team members?
Does your team's answer match with other teams? Why/Why not?

Statistical Analysis
Useful concepts for analysis of data
1) Probability
Probability is the likelihood or chance that something will happen.
Probability is an estimate of the relative average frequency with which an event occurs in repeated independent trials.
The relative frequency is always between 0% (the event never occurs) and 100% (the event always occurs).
Image courtesy: https://www.mathsisfun.com/probability_line.html

a) Independent Events
When two events are said to be independent of each other, the occurrence of one event in no way affects the probability of the other event occurring.
b) Dependent Events
When two events are said to be dependent, the probability of one event occurring influences the likelihood of the other event.

Statistical Analysis
Useful concepts for analysis of data
Probability value (p value)
A p-value is a statistical measurement used to validate a hypothesis against observed data. It is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
Image courtesy: https://www.simplypsychology.org/p-value.html

p-values and statistical significance
p-values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.
Statistical significance tells the researchers whether the p-value of a statistical test is small enough to reject the null hypothesis of the test.
The lower the p-value, the greater the statistical significance of the observed difference.
The most common threshold is p < 0.05, but the threshold depends on the field of study: some fields prefer thresholds of 0.01, or even 0.001.
Image courtesy: https://www.simplypsychology.org/p-value.html

In summary
The probability (p-value) is compared to the significance level (α).
A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.

Statistical Analysis
Useful concepts for analysis of data
2) Confidence interval of mean
The confidence level is another way to describe the probability.
A confidence interval is the mean of your estimate plus and minus the variation in that estimate.
Confidence level = 1 − α
So, if we use a significance level of α = 0.05 for statistical significance, then our confidence level would be 1 − 0.05 = 0.95, or 95%.
Often, researchers choose 90%, 95%, or 99% confidence levels, but any percentage can be used.

Statistical Analysis
Useful concepts for analysis of data
2) Confidence interval of mean
When the sample size is large, approximately 95% of the sample means will fall within ± 2 standard errors of the mean (SEM) of the population mean.
If we construct a confidence interval with a 95% confidence level, we are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the confidence interval.
Source: https://www.youtube.com/watch?v=tFWsuO9f74o

Statistical Analysis
Relationship among mean, standard deviation, standard error of mean, and confidence interval of mean

Standard Deviation (S)
Type: Descriptive
Description: Average difference between data points and their mean
Formula: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

Standard Error of Mean (SEM)
Type: Inferential
Description: A measure of how variable the mean will be when the study is repeated many times
Formula: $\mathrm{SEM} = \dfrac{s}{\sqrt{n}}$

95% Confidence Interval (95% CI)
Type: Inferential
Description: A range of values that we can be 95% confident contains the true mean
Formula: $\bar{x} \pm 2 \times \mathrm{SEM}$ (approximately)

As sample size increases, SEM decreases and the 95% CI becomes narrower, too.

Statistical Analysis
What is Hypothesis Testing?
For every objective in a study, we will have two opposing hypotheses:
1) Null Hypothesis
2) Alternative Hypothesis
FINAL AIM: To reach a conclusion to reject or fail to reject the Null Hypothesis.

Null Hypothesis
It is a statement assuming the default state of a situation.
Generally, this statement represents neutrality or a non-positive result.

Alternative Hypothesis
It is a statement that contrasts with the Null Hypothesis. Generally, this statement represents an expected result of the investigation.

Statistical Analysis
What is Hypothesis Testing?
Example: A social welfare organization wants to conduct a study to see if there is any gender bias in the salary structure among polytechnic graduates in country 'X'.
Null Hypothesis: ??
Alternative Hypothesis: ??

How to Perform Data Analysis?
Example
John is an engineering student. He scores 80% in a Mathematics test.
Mike is a science student. He scores 50% in a Mathematics test.
Analysis: Engineering students are better in Mathematics than Science students.

What is Hypothesis Testing?
1) Formulation of Null and Alternative Hypothesis
Null Hypothesis: Engineering students' performance = Science students' performance
Alternative Hypothesis (reverse of Null)
2) Collect data
3) Summarize data
4) Perform appropriate statistical test (e.g., t test, ANOVA, chi-square test, correlation analysis)
5) Interpretations

How to Perform Data Analysis?
Let's revisit our analysis regarding the performance of Science and Engineering students.
Null Hypothesis: Engineering students' performance = Science students' performance
Alternative Hypothesis (reverse of Null)
Perform an appropriate statistical test (e.g., t test) to determine the p value.
If the significance level, α, is 0.05 and the p value is < 0.05, we reject the Null Hypothesis, and vice versa.
Explanation: If the Null Hypothesis is rejected, we can conclude that engineering students perform better than science students, accepting a less than 5% chance of wrongly rejecting a true Null Hypothesis.
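The five steps above can be sketched end to end on a small example. The scores below are hypothetical and invented purely for illustration. The slides name a t test; this sketch swaps in an exact permutation test on the difference in means instead, because it needs only the Python standard library. The helper names `permutation_p_value` and `decide` are likewise illustrative assumptions.

```python
from itertools import combinations

# Hypothetical mathematics scores (%), invented for illustration only.
engineering = [80, 75, 68, 90, 72]
science = [50, 65, 70, 58, 61]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Under the null hypothesis the group labels are interchangeable, so the
    p-value is the share of all relabellings whose difference in means is
    at least as large as the one actually observed.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return extreme / total


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis if p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p_value = permutation_p_value(engineering, science)
decision = decide(p_value)
print(f"p = {p_value:.4f} -> {decision}")
```

With these made-up scores, 4 of the 252 possible relabellings are at least as extreme as the observed split, so p is about 0.016 and the null hypothesis of equal performance is rejected at α = 0.05. A different data set, or a t test on the same data, would give a different p value.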
State if you would reject/fail to reject the null hypothesis for the following scenarios (when the significance level is 0.05).
Scenario 1: p value is 0.005
Scenario 2: p value is 0.01
Scenario 3: p value is 0.21

Example: The height of students studying at a polytechnic follows a normal distribution with a mean of 1.62 m and a standard deviation of 0.15 m. If the sample size is 10,000, calculate the 95% CI.

The ability of a new test method to measure one of the key liver enzymes, alanine transaminase (ALT), was validated. The following table shows the data collected from the new method and the conventional method respectively. The measurements were taken 5 times for each method.

ALT (U/L)
Data   Conventional method   New method
1      45.2                  46.5
2      53.5                  54.0
3      48.9                  41.5
4      52.3                  65.3
5      48.9                  41.5

a) Calculate the following for the data obtained by the new method and the conventional method respectively:
i) Mean, median and mode
ii) Standard deviation
b) Based on the results in (a), determine which method gives measurements that are clustered more closely about the mean. Explain your answer.
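Both exercises above can be checked with a short script. This is a minimal sketch rather than a model answer. It assumes the multiplier 1.96 for the 95% CI (the slides round this to 2 standard errors) and the sample standard deviation with n − 1 in the denominator, which is what `statistics.stdev` computes. The function name `ci95_of_mean` is an illustrative choice.

```python
import math
import statistics


def ci95_of_mean(mean, sd, n, z=1.96):
    """Return (lower, upper) of the 95% CI: mean ± z × SEM, SEM = sd/√n."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


# Height example: mean 1.62 m, standard deviation 0.15 m, n = 10,000.
low, high = ci95_of_mean(1.62, 0.15, 10_000)
print(f"95% CI for mean height: {low:.4f} m to {high:.4f} m")

# ALT measurements (U/L) from the table in the exercise.
alt = {
    "Conventional method": [45.2, 53.5, 48.9, 52.3, 48.9],
    "New method": [46.5, 54.0, 41.5, 65.3, 41.5],
}

summary = {}
for name, data in alt.items():
    summary[name] = {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 denominator)
    }
    s = summary[name]
    print(f"{name}: mean={s['mean']:.2f}, median={s['median']}, "
          f"mode={s['mode']}, sd={s['sd']:.2f}")
```

Both methods have the same mean (49.76 U/L), so the standard deviation is what separates them: the smaller value indicates the data set clustered more tightly about its mean.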
What You Have Learnt
Explain the role of statistics in the study of epidemiology
Describe the basic concepts of statistics (chance, dependent and independent variables, sample size, 95% confidence interval, standard deviation, standard error of mean)
Calculate mean, median, mode, ranges, variance, standard deviation and confidence interval
Explain how the p value could be used to interpret the results from hypothesis testing

Biostatistics will be covered in L10, L11, L12 and L13

Pre-reading for Lesson 03
1. The Advanced Triangle of Epidemiology - https://www.youtube.com/watch?v=DVxAdrVqkUk
2. Chain of infection - https://www.youtube.com/watch?v=IBX3jj2uUjo
3. Infection control: break the chain - https://www.youtube.com/watch?v=_o9SxDFPUiA