Biostatistics Part 1 PDF 2024
2024
Dr. Ayman M. Abu Mustafa
Summary
These lecture notes provide a basic overview of biostatistics. They cover topics such as introduction, statistical description, normal distribution, and hypothesis testing.
Biostatistics
Dr. Ayman M. Abu Mustafa
PhD in Biochemistry
Arab Diploma in Medical Statistics
MSc & BSc in Medical Technology
Mobile: 0599577280, 0567577280

Course Outline
1. Introduction: Nature of statistics; Defining the variables and information; Data distribution
2. Statistical Description of Data: Mean, median and mode; Standard deviation, range; Graphical representations; Percentiles, deciles and quartiles
3. The normal distribution: Normal distribution; Areas under the normal curve; Applications of the normal distribution
5. Tests of Hypothesis: Statistical hypothesis; Type I and type II errors; One-tailed and two-tailed tests; Z test; t test; ANOVA
6. Estimation of Parameters: Statistical inference; Point estimation; Interval estimation; Estimating μ; Estimating p
7. Correlation and prediction: Linear correlation; Testing the significance of r; Prediction
8. Chi-square tests: Chi-square distribution; Test for independence

Variables / Data / Information

Statistics: numbers that give information about a certain situation.

Steps of research
STEP I: Proper data collection
STEP II: Proper data presentation
STEP III: Proper use of statistical data analysis
STEP IV: Findings interpretation and report writing

Questionnaire
Personal data: Tel. No.; Name; Age □ … □ >45; Education □ Secondary school □ Preparatory school □ Primary school □ illiterate □ University; Family history of diabetes □ Yes □ No; BMI: Weight ___ kg, Height ___ cm; Diet □ Yes □ No
Clinical data: Duration of diabetes (years); Drugs; Complications: Retinopathy □ Yes □ No; Cardiovascular diseases □ Yes □ No; Family history of HTN □ Yes □ No; Neuropathy □ Yes □ No; Smoking □ Yes □ No; Family history of diabetes □ Yes □ No; Recurrent infections □ Yes □ No
Agreement: I agree to complete this questionnaire concerning my health status. Signature……………………………..

The objective of the study is to identify the risk factors and self-reported complications in type 2 diabetic patients.
Personal data: Tel. No.; Name; Age (years) □ 19-30 □ 30-45 □ >45; Education □ University □ Secondary school □ Preparatory school □ Primary school □ illiterate; Family history of diabetes □ Yes □ No; Family history of HTN □ Yes □ No; Smoking □ Yes □ No; Family history of diabetes □ Yes □ No; BMI: Weight ___ kg, Height ___ cm; Diet □ Yes □ No
Clinical data: Duration of diabetes; Drugs; Complications: Retinopathy □ Yes □ No; Cardiovascular diseases □ Yes □ No; Neuropathy □ Yes □ No; Recurrent infections □ Yes □ No
Agreement: I agree to complete this questionnaire concerning my health status. Signature……………………………..
Thank you
Researcher: Ahmad M. Yassin

χ² test
General characteristics of the study population (n = 300)
General characteristics     Controls (n = 150)   Patients (n = 150)   χ²       P-value
of diabetic patients        n (%)                n (%)
Education                   69 (46.0)            47 (31.3)            20.448   […]

[…]

Exercise 3
5. Decision rule: If Z < -1.96 OR Z > 1.96 → H0 will be rejected; if -1.96 ≤ Z ≤ 1.96 → H0 will not be rejected.
OR
▪ P-value ≥ 0.05 → H0 can't be rejected.
▪ P-value < 0.05 → H0 is rejected.
6. Calculation of test statistic: Z = -0.704
7. Statistical decision: Since -0.704 > -1.96 (two-sided testing) → H0 can't be rejected. Equivalently, P-value ≥ 0.05 → H0 can't be rejected.
8. Clinical conclusion: Gender distribution is not significantly different.
Calculating the P-value from the Z value:
P-value (one-tailed) = 0.240
P-value (two-tailed) = 0.240 × 2 = 0.480

Two-sample Z test
$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} $$

Exercise 1
An epidemiologist needs to decide whether age differs between Cairo and Alexandria. He selected representative samples of 40 individuals from each of the 2 governorates. The mean age was 30 years in the Cairo sample and 32 years in the Alexandria sample. The variance of age is known to be 16 years² in Cairo and 25 years² in Alexandria. Answer the research question using the steps of hypothesis testing, setting α at 0.05.

Exercise 1: steps of hypothesis testing
1. Assumptions:
▪ Quantitative data
▪ Two-sample testing
▪ Large samples
▪ Representative samples
▪ Independent records
▪ Known SD (variance) of both populations
2. Hypotheses: H0: µC = µA; Ha: µC ≠ µA → two-tailed
3. Test statistic: Z test for two samples
4. Distribution of the test statistic: Standard normal curve
5. Decision rule: If Z < -1.96 OR Z > 1.96 → H0 will be rejected; if -1.96 ≤ Z ≤ 1.96 → H0 will not be rejected.
6. Calculation of test statistic:
Z = (32 - 30) ÷ √(16/40 + 25/40) = 2 ÷ √1.025 = 2 ÷ 1.012 ≈ 1.976
P-value from the Z value (standard normal table): one-tailed ≈ 0.024; two-tailed ≈ 0.024 × 2 ≈ 0.048
7. Statistical decision: Since Z > 1.96 (two-sided test) → H0 is rejected. OR: since P < 0.05 → H0 is rejected.
8. Clinical conclusion:
▪ The Alexandria and Cairo populations differ in age.
▪ Age in the Alexandria population is expected to be higher than in the Cairo population.
▪ Age in Alexandria is significantly higher than in Cairo.

The Z test: limitations
1. A known σ (population SD) is very difficult to obtain in most situations.
2. It works only with large samples or normally distributed data.
3. It compares only one or two samples, either against a population value or against each other. In practice it is mainly used on qualitative data (proportions).

Assignment
1. We want to compare the cholesterol level of obese people with that of the general population. A representative sample of 100 obese adults was selected, and the mean cholesterol level in this group was 250 mg/dl. The mean cholesterol level in the population is known to be 170 ± 10 mg/dl. Answer using the steps of hypothesis testing.
2. We want to compare working hours in Gaza and Khan Yunis. Representative samples of 200 individuals were selected from each governorate. Working hours were 6.5 ± 1.0 hours in Gaza and 8.0 ± 1.0 hours in Khan Yunis. Answer using the steps of hypothesis testing.

Statistical test
▪ To assess the relationship between obesity (obese & non-obese) and cholesterol level (low, normal, high) in the Gaza Strip
▪ To assess the relationship between obesity (kg/m²) and cholesterol level (mg/dl) in the Gaza Strip
▪ To assess the relationship between obesity (obese & non-obese) and cholesterol level (mg/dl) in the Gaza Strip
▪ To assess the relationship between obesity (kg/m²) and cholesterol level (low, normal, high) in the Gaza Strip
▪ To predict (estimate) cholesterol levels from the obesity scale in the Gaza Strip

Choosing the statistical test (L = categorical variable with levels; n = numerical variable)
Variable 1           Variable 2     Test
V1 L                 V2 L           χ²
V1 n                 V2 n           r
V1 L (2 levels)      V2 n           t
V1 L (> 2 levels)    V2 n           F
V1 L or n            ??             Regression
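The Z arithmetic in the two Z-test exercises above can be checked with a minimal Python sketch. These are the gender exercise, where only the calculated Z = -0.704 survives, and the Cairo vs Alexandria exercise. The sketch assumes only the standard library, so the normal CDF is built from `math.erfc`. The function names are illustrative and do not come from any statistics package. With the exact SE (√1.025 ≈ 1.0124), the sketch gives Z ≈ 1.975, a one-tailed P of about 0.024 and a two-tailed P of about 0.048.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, computed with math.erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def p_values_from_z(z):
    """Return (one-tailed, two-tailed) P-values for an observed Z."""
    one_tailed = 1 - normal_cdf(abs(z))
    return one_tailed, 2 * one_tailed

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Two-sample Z with known population variances: (mean2 - mean1) / SE of the difference."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean2 - mean1) / se

# Gender exercise: only the calculated Z = -0.704 survives in the notes
print("Gender exercise: one-/two-tailed P =", p_values_from_z(-0.704))

# Cairo (mean 30, variance 16) vs Alexandria (mean 32, variance 25), n = 40 each
z = two_sample_z(30, 16, 40, 32, 25, 40)
one_tailed, two_tailed = p_values_from_z(z)
print(f"Z = {z:.3f}, one-tailed P = {one_tailed:.4f}, two-tailed P = {two_tailed:.4f}")
print("Reject H0" if abs(z) > 1.96 else "H0 can't be rejected")  # two-sided, alpha = 0.05
```

Building the CDF from `math.erfc` avoids any dependency on an external statistics library. A statistics package should give the same values.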
[Figure: Study design. Descriptive studies (case reports, case series): definition, prevalence. Analytical (comparative) studies: observational, without intervention (cross-sectional, case-control, cohort study), or experimental, with intervention (non-RCT study, RCT). Purposes shown: prevention, treatment, etiology (risk factors), pathogenesis, pathology, complications, diagnosis and prognosis. Strength of evidence runs from weak to strong.]

t test
The idea: the t value is the ratio of the mean difference to its standard error (t = difference ÷ SE).
t test = difference between groups ÷ within-group variation.

[Figure: Sample vs population]

Example (1): One-sample t test
Researchers were interested in studying the delay in doing MRI (magnetic resonance imaging). In a sample of 17 patients from a governmental center, the delay was 13.29 ± 8.89 days (mean ± SD). Assuming normality, can the researchers conclude that the mean delay between injury and initial MRI differs from 15 days, the maximum delay that preserves prognosis?

Step of test hypothesis
1. Assumptions: Quantitative data, assumed to be normally distributed; one sample against a hypothetical value; independent records; population σ is unknown.
2. Hypotheses: H0: µ = 15; Ha: µ ≠ 15
3. Test statistic: According to the assumptions, the t test for one sample (Student's t test) will be used.
4. Distribution of test statistic: The t curve
5. Decision rule (critical t, df = 16): If the calculated t is < -2.12 OR > 2.12 → reject H0. If -2.12 ≤ t ≤ 2.12 → H0 can't be rejected. In one-sided testing the cutoff = 1.75.
6. Calculation of test statistic:
$$ t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} $$
where S = SD of the sample, μ0 = hypothesized value, n = sample size, df = n - 1 = 17 - 1 = 16 (t + df → P-value)
t = (13.29 - 15) ÷ (8.89/√17) = -1.71 ÷ 2.156 = -0.7931
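Here is a minimal sketch of the one-sample t statistic for the MRI example, computed from the summary data (mean, SD, n). It uses only the standard library, so it does not compute a P-value. The cutoff 2.12 is copied from the decision rule rather than derived. `one_sample_t` is an illustrative name.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """One-sample Student's t from summary data; returns (t, df)."""
    se = sd / math.sqrt(n)  # standard error of the sample mean
    return (mean - mu0) / se, n - 1

t, df = one_sample_t(mean=13.29, sd=8.89, n=17, mu0=15)
critical = 2.12  # two-sided 5% cutoff for df = 16, copied from the decision rule
print(f"t = {t:.4f}, df = {df}")
print("Reject H0" if abs(t) > critical else "H0 can't be rejected")
```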
7. Statistical decision: Since -2.12 < -0.7931 < 2.12 → P-value > 0.05 → not significant (NS); H0 can't be rejected.
The calculated one-tailed P-value = 0.2196 (NS).
The calculated two-tailed P-value = 0.4392 (NS).
8. Clinical conclusion: The delay in doing MRI is not significantly different from 15 days.

Two-sample testing
▪ Two samples, independent: equal variance or unequal variance
▪ Two samples, paired (matched)

Example: Independent samples
Researchers wish to know whether the data collected provide sufficient evidence of a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome.
▪ Data: serum uric acid in 12 individuals with Down's syndrome and 15 normal individuals. The mean ± SD was 4.5 ± 1.0 mg/dl in the Down's sample and 3.4 ± 1.5 mg/dl in the normal sample. Both samples are assumed to be normally distributed.

Repeat the Down's syndrome example, this time assuming unequal variance
1. Assumptions: Quantitative data, assumed to be normally distributed; two samples (unequal variance assumed); independent records; population σ is unknown.
2. Hypotheses: H0: µDown's = µNormal; Ha: µDown's ≠ µNormal (two-sided), or Ha: µDown's > µNormal (one-sided)
3. Test statistic: Independent t test
4. Distribution of test statistic: The t curve
5. Decision rule (critical t, df = 25): If the calculated t is < -2.06 OR > 2.06 → reject H0. If -2.06 ≤ t ≤ 2.06 → H0 can't be rejected.
6. Calculation of test statistic:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{SED}, \qquad SED = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} $$
where S1² = sample 1 variance, S2² = sample 2 variance, SED = SE of the difference, n1 = sample 1 size, n2 = sample 2 size
df = (n1 - 1) + (n2 - 1) = (12 - 1) + (15 - 1) = 11 + 14 = 25 (t + df → P-value)
SED = √(1.0²/12 + 1.5²/15) = √0.2333 = 0.483
t = (4.5 - 3.4) ÷ 0.483 = 2.2774
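This sketch reproduces the unequal-variance calculation for the Down's syndrome example. The notes use df = (n1 - 1) + (n2 - 1) = 25. As an aside, the sketch also computes the Welch-Satterthwaite df that statistical software typically reports for unequal variances. Here that df is about 24.3, and its two-sided 5% cutoff is also about 2.06, so the decision would not change. Function and variable names are illustrative.

```python
import math

def independent_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t with the unequal-variance standard error of the difference (SED)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared SE of each mean
    sed = math.sqrt(v1 + v2)
    t = (m1 - m2) / sed
    df_simple = (n1 - 1) + (n2 - 1)  # df used in the notes
    # Welch-Satterthwaite approximation, often used when variances are unequal
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return sed, t, df_simple, df_welch

sed, t, df_simple, df_welch = independent_t(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"SED = {sed:.3f}, t = {t:.4f}, df = {df_simple}, Welch df = {df_welch:.1f}")
```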
7. Statistical decision: Since 2.2774 > 2.060 (cutoff value) → H0 is rejected.
The calculated one-tailed P-value = 0.016 (S).
The calculated two-tailed P-value = 0.032 (S).
8. Clinical conclusion: Serum uric acid levels are significantly higher in the Down's syndrome population than in the normal population.

Paired (matched) samples testing
In a study of the efficacy of a certain drug in lowering blood pressure among hypertensive cases, systolic blood pressure (SBP) was measured in 5 patients before and after giving the drug:
Before   155   150   145   165   150
After    140   140   135   140   130
Can the researcher conclude that the drug caused a significant decrease in SBP? Assume normality and set α to 0.05.

Step of test hypothesis
1. Assumptions: Two paired samples; dependent records; population σ is unknown; data assumed to be normally distributed.
2. Hypotheses: H0: µbefore = µafter; Ha: µbefore ≠ µafter
3. Test statistic: According to the assumptions, the paired t test will be used.
4. Distribution of test statistic: The t curve
5. Decision rule (critical t): df = 5 - 1 = 4 → cutoff = 2.776. If the calculated t is < -2.776 OR > 2.776 → reject H0. If -2.776 ≤ t ≤ 2.776 → H0 can't be rejected.
6. Calculation of test statistic:
Before   After   Difference (After - Before)
155      140     -15
150      140     -10
145      135     -10
165      140     -25
150      130     -20
Mean difference = -16; SD of the differences = 6.52; SE = SD/√n = 2.92
$$ t = \frac{\bar{X}_D}{S_D/\sqrt{n}}, \qquad S_D = \sqrt{\frac{\sum (X_D - \bar{X}_D)^2}{n - 1}} $$
where X̄D = mean difference between pairs, SD = SD of the paired differences, n = number of pairs = 5
t = (-16) ÷ (2.915) = -5.488
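The paired calculation can be reproduced from the individual before/after values. This is why the paired test needs the raw data rather than summary statistics. The sketch below uses only the standard library. `statistics.stdev` uses the n - 1 denominator, which matches the SD formula above.

```python
import math
import statistics

before = [155, 150, 145, 165, 150]
after = [140, 140, 135, 140, 130]

diffs = [a - b for a, b in zip(after, before)]  # After - Before, as in the table
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
se_d = sd_d / math.sqrt(len(diffs))
t = mean_d / se_d
df = len(diffs) - 1
print(f"differences = {diffs}")
print(f"mean = {mean_d}, SD = {sd_d:.2f}, SE = {se_d:.3f}, t = {t:.3f}, df = {df}")
```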
7. Statistical decision: Since -5.488 < -2.776 → P-value < 0.05 → H0 is rejected.
The calculated one-tailed P-value ≈ 0.0027 (S).
The calculated two-tailed P-value ≈ 0.0054 (S).
8. Clinical conclusion: The drug significantly lowers systolic blood pressure in such hypertensive cases.

Paired t test
Note: All t tests can be done on summary data (mean, SD, n) except the paired test, which can be done only on individual data.

Limitations of the t test
1. Works ideally on normal data.
2. To work on non-normal data, large samples are needed.
3. Works only on 2 groups.

Assignment (1): In a study of the efficacy of a certain drug in lowering blood glucose among T2DM cases, fasting blood glucose was:
Before   142   165   200   182   144
After    105   121   135   125   100
Can the researcher conclude that the drug caused a significant decrease in blood glucose? Answer using the steps of hypothesis testing.

Assignment (2): Researchers were interested in studying the delay in performing fasting blood glucose tests in the laboratory. In a sample of 40 patients from a governmental center laboratory, the time was 20 ± 5 min (mean ± SD). The researchers wish to know whether fasting blood glucose testing is delayed, given that the normal time for the test is 15 min.
a. What is the statistical hypothesis?
b. What is the research hypothesis?
c. Can we conclude that fasting blood glucose testing is delayed? Answer using the steps of hypothesis testing.

Assignment (3): Data consisted of fasting blood glucose in 5 individuals with Down's syndrome and 5 normal individuals. Both samples are assumed to be normally distributed.
Fasting blood glucose was as follows:
Down's syndrome      90   87   85    80   85
Normal individuals   97   97   100   95   103
Can we conclude that mean fasting blood glucose differs between normal individuals and individuals with Down's syndrome? Answer using the steps of hypothesis testing.

One-Way Analysis of Variance (ANOVA)

One-Way ANOVA: Basics
Analysis of variance is used to test the claim that three or more population means are equal. It is an extension of the two-independent-samples t test.
Like any comparison, ANOVA is composed of:
▪ A grouping variable (independent variable) with > 2 categories, called the factor
▪ A tested variable (dependent variable)

The idea is to calculate the ratio of the "between-groups differences" to the "within-group differences".
If the ratio is large → the group means probably differ (H0 tends to be rejected).
If the ratio is small → the differences can be explained by chance (H0 can't be rejected).

EXAMPLE (1): You want to see whether the cholesterol level differs among three random samples, assuming normality and equal variance. Set α = 0.05.
Group 1   Group 2   Group 3
254       234       200
263       218       222
241       235       197
237       227       206
251       216       204

Step of test hypothesis
1. Assumptions: 1. Quantitative data 2. Populations are normally distributed with equal variance 3. Random samples 4. Independent records 5. One grouping variable
2. Hypotheses: H0: µ1 = µ2 = µ3; Ha: at least one mean differs from the others
3. Test statistic: F test (ANOVA)
4. Distribution of test statistic: F curve
5. Decision rule (critical F, df = 2 and 12): If F ≤ 3.89 → H0 can't be rejected. If F > 3.89 → H0 is rejected.
6. Calculation of test statistic: F = MSA ÷ MSW
1. Calculation of the mean of each group and the grand mean:
a) Mean of Group 1 = 249.2
b) Mean of Group 2 = 226
c) Mean of Group 3 = 205.8
d) Grand mean = sum of group means ÷ number of groups (valid here because the groups are equal in size)
Grand mean = (Mean Gp 1 + Mean Gp 2 + Mean Gp 3) ÷ 3 = (249.2 + 226 + 205.8) ÷ 3 = 227
2. Calculation of MSA (mean square among groups):
a) SSA (sum of squares between groups) = 5(249.2 - 227)² + 5(226 - 227)² + 5(205.8 - 227)²
SSA = 2464.2 + 5 + 2247.2 = 4716.4
b) df among groups = 3 - 1 = 2
c) MSA = SSA ÷ df among groups = 4716.4 ÷ 2 = 2358.2
3. Calculation of MSW (mean square within groups):
a) SSW (sum of squares within groups) = (254 - 249.2)² + (263 - 249.2)² + (241 - 249.2)² + (237 - 249.2)² + (251 - 249.2)² + (234 - 226)² + (218 - 226)² + (235 - 226)² + (227 - 226)² + (216 - 226)² + (200 - 205.8)² + (222 - 205.8)² + (197 - 205.8)² + (206 - 205.8)² + (204 - 205.8)² = 1119.6
b) df within groups = N - k = 15 - 3 = 12
c) MSW = SSW ÷ df within groups = 1119.6 ÷ 12 = 93.3
MSA: mean square among (between) groups = 2358.2
MSW: mean square within groups = 93.3
F = MSA ÷ MSW = 2358.2 ÷ 93.3 = 25.275

Results from SPSS software:
Variable              Group 1 (Mean ± SD)   Group 2 (Mean ± SD)   Group 3 (Mean ± SD)   F-test   P-value
Cholesterol (mg/dl)   249.2 ± 10.4          226.0 ± 8.8           205.8 ± 9.7           25.275   Overall < 0.001
                                                                                                 Group 1 vs Group 2 = 0.008
                                                                                                 Group 1 vs Group 3 < 0.001
                                                                                                 Group 2 vs Group 3 = 0.019
- There is a significant difference among the 3 groups.
- Each pair of groups differs significantly from the other.

7. Statistical decision:
- Since 25.275 > 3.89 → H0 is rejected.
- There is a statistically significant difference among the 3 groups.
- At least 2 groups differ significantly from each other.
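This minimal one-way ANOVA sketch follows the same steps for the cholesterol example: group means, grand mean, SSA, SSW, mean squares and F. One assumption is worth flagging. The sketch computes the grand mean as the mean of all observations, which weights each group by its size. With equal group sizes, as here, this equals the average of the group means; with unequal sizes the two differ. The sketch does not compute a P-value or post-hoc comparisons, and `one_way_anova` is an illustrative name.

```python
def one_way_anova(groups):
    """Return (SSA, SSW, df_among, df_within, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)  # weighted by group size
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_among = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ssa / df_among) / (ssw / df_within)  # MSA / MSW
    return ssa, ssw, df_among, df_within, f

cholesterol = [
    [254, 263, 241, 237, 251],  # Group 1
    [234, 218, 235, 227, 216],  # Group 2
    [200, 222, 197, 206, 204],  # Group 3
]
ssa, ssw, df_a, df_w, f = one_way_anova(cholesterol)
print(f"SSA = {ssa:.1f}, SSW = {ssw:.1f}, df = ({df_a}, {df_w}), F = {f:.3f}")
```

The seating-position scores of Example (2) have unequal group sizes. Applied to them, the same function should give SSA ≈ 1901.5, SSW ≈ 3386.3 and F ≈ 5.9.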
8. Clinical conclusion:
- Mean cholesterol level differs significantly among the 3 groups.

Example (2): An instructor noticed that the further the students sat from him, the worse they did on the quiz. To test this opinion, random samples were taken from the front, middle and back rows, and the students' quiz scores were recorded:
Front:  82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back:   38, 59, 55, 66, 45, 52, 52, 61
Can we conclude that the scores differ among front, middle and back students?

Step of test hypothesis
1. Assumptions: Quantitative data (quiz scores); random samples; independent records; one grouping variable; normal data???; equality of variance???
2. Hypotheses: H0: µFront = µMiddle = µBack; Ha: at least one mean differs from the others

Determination of normality:
1) Histogram
2) Kolmogorov-Smirnov test / Shapiro-Wilk test
3) Q-Q plots of the residuals
Determination of equality of variance:
1) If the groups are equal in size → no need
2) Levene's test

3. Test statistic: F test (ANOVA)
4. Distribution of test statistic: F curve
5. Decision rule (critical F, df = 2 and 21): If F ≤ 3.47 → H0 can't be rejected. If F > 3.47 → H0 is rejected.
6. Calculation of test statistic: F = MSA ÷ MSW
1. Calculation of the mean of each group and the grand mean:
a) Mean of front students = 75.7
b) Mean of middle students = 67.1
c) Mean of back students = 53.5
d) Because the groups differ in size, the grand mean is the mean of all 24 scores:
Grand mean = (530 + 604 + 428) ÷ 24 = 1562 ÷ 24 = 65.08
2. Calculation of MSA:
a) SSA = 1901.5
b) df among groups = 3 - 1 = 2
c) MSA = 950.76
3. Calculation of MSW:
a) SSW = 3386.3
b) df within groups = 24 - 3 = 21
c) MSW = 161.25
MSA: mean square among (between) groups = 950.76
MSW: mean square within groups = 161.25
F = MSA ÷ MSW = 950.76 ÷ 161.25 = 5.9
7. Statistical decision:
- Since 5.9 > 3.47 → H0 is rejected.
- Scores differ significantly among front, middle and back students.
- At least 2 groups differ significantly from each other.
8. Clinical conclusion:
- Quiz scores differ significantly among front, middle and back students.
- At least 2 groups differ significantly from each other.

Example (3): Hand et al. (1994) presented data for three groups treated with drug A, drug B, or no therapy at all (the "control group"). Four girls were randomly selected in each group, and each girl's weight was recorded before and after treatment. The authors wanted to test whether, in the population, the three treatments differ in their effect on weight gain. Set the significance level at 0.05.
Weight before and after treatment:
        Drug A           Drug B           No treatment
Girl    Before   After   Before   After   Before   After
1       45       47      55       54      49       51
2       40       43      60       58      47       45
3       51       50      47       51      60       63
4       60       59      51       50      58       55

Calculate the difference (After - Before):
#    Drug A   Drug B   Controls
1    2        -1       2
2    3        -2       -2
3    -1       4        3
4    -1       -1       -3

Step of test hypothesis
1. Assumptions: 1) Quantitative data 2) Random samples 3) Independent records 4) One grouping variable 5) Normal data??? 6) Equality of variance???
2. Hypotheses: H0: µdrug A = µdrug B = µcontrols; Ha: at least one mean differs from the others
3. Test statistic: F test (ANOVA), followed by post-hoc tests if H0 is rejected
4. Distribution of test statistic: F curve
5. Decision rule (critical F, df = 2 and 9): If F < 4.26 → H0 can't be rejected. If F ≥ 4.26 → H0 is rejected.
6. Calculation of test statistic: F = MSA ÷ MSW
1. Calculation of the mean of each group and the grand mean:
a) Mean of drug A = 0.75
b) Mean of drug B = 0
c) Mean of controls = 0
d) Grand mean = (Mean drug A + Mean drug B + Mean controls) ÷ 3 = (0.75 + 0 + 0) ÷ 3 = 0.25 (groups are equal in size)
2. Calculation of MSA:
a) SSA = 1.5
b) df among groups = 3 - 1 = 2
c) MSA = 0.75
3. Calculation of MSW:
a) SSW = 60.75
b) df within groups = 12 - 3 = 9
c) MSW = 6.75
MSA: mean square among (between) groups = 0.75
MSW: mean square within groups = 6.75
F = MSA ÷ MSW = 0.75 ÷ 6.75 = 0.11
7. Statistical decision:
- Since 0.11 < 4.26 → H0 can't be rejected.
- The groups are not significantly different from each other.
- No need for post-hoc tests.
8. Clinical conclusion:
- All treatments are similar in their effect on weight change.
- None of the studied treatments produced a clinically significant weight gain.
- None of the studied treatments is better at improving anorexia nervosa.

Chi-Square test (χ²)

Chi-square test
▪ The chi-square test is used to test whether a statistically significant relationship exists between two categorical (qualitative) variables, e.g. gender (male/female) and obesity (obese/non-obese).
▪ It accompanies a cross-tabulation of the two categorical variables:
Influenza   Vaccine   Placebo   Total
Yes         20        80        100
No          220       140       360
Total       240       220       460

Note: chi-square must be calculated on actual counts, not on percentages.

Example (1): For the data set in the table above, is there a relationship between vaccination (vaccine vs placebo) and influenza? Answer using the steps of hypothesis testing; set α = 0.05.

Step of test hypothesis
1. Assumptions:
a. Two qualitative (categorical) variables
b. Random samples
c. Independent records
d. Adequate sample and cell sizes: the overall total (460) is more than 40, regardless of the expected values.

Solution: calculate the proportions (%) for the independent variable (column percentages):
Influenza   Vaccine                  Placebo                  Total
Yes         20 (= 20/240 × 100)      80 (= 80/220 × 100)      100
No          220 (= 220/240 × 100)    140 (= 140/220 × 100)    360
Total       240 (= 240/240 × 100)    220 (= 220/220 × 100)    460

Proportions for the independent variable:
Influenza   Vaccine n (%)   Placebo n (%)   Total
Yes         20 (8.3)        80 (36.4)       100
No          220 (91.7)      140 (63.6)      360
Total       240 (100)       220 (100)       460

8.3% of the vaccinated group had influenza, compared with 36.4% of the placebo group. 91.7% of the vaccinated group did not have influenza, compared with 63.6% of the placebo group.
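This minimal sketch covers the chi-square calculation for the vaccine/influenza table: column percentages, expected counts (row total × column total ÷ overall total) and the χ² sum. It keeps the expected counts unrounded. As a result it gives χ² ≈ 53.01 rather than the 53.09 obtained by hand with expected counts rounded to one decimal. The P-value shortcut `erfc(sqrt(χ²/2))` is valid only for df = 1 (a 2 × 2 table). `chi_square` is an illustrative name, not a library function.

```python
import math

def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts; returns (chi2, df, expected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected = row total x column total / overall total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Rows: influenza yes / no; columns: vaccine / placebo
observed = [[20, 80], [220, 140]]
chi2, df, expected = chi_square(observed)

col_totals = [sum(col) for col in zip(*observed)]
col_pct = [[100 * o / col_totals[j] for j, o in enumerate(row)] for row in observed]

p_value = math.erfc(math.sqrt(chi2 / 2))  # upper-tail P-value; valid for df = 1 only
print("column % :", [[round(x, 1) for x in row] for row in col_pct])
print("expected :", [[round(e, 1) for e in row] for row in expected])
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.2e}")
```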
Step of test hypothesis
1. Assumptions: Two qualitative (categorical) variables; random samples; independent records; large sample.
2. Hypotheses:
H0: P vaccine = P placebo (P: proportion)
Ha: P vaccine ≠ P placebo (P: proportion)
3. Test statistic: According to the assumptions, the chi-square (χ²) test will be used.
4. Distribution of test statistic: Chi-square curve
5. Decision rule (critical χ²): df = (number of columns - 1) × (number of rows - 1) = (2 - 1) × (2 - 1) = 1
- If χ² ≥ 3.84 (critical/cutoff value) → H0 is rejected.
- If χ² < 3.84 → H0 can't be rejected.
6. Calculation of test statistic:
Calculating expected values from the observed values. Observed:
Influenza   Vaccine   Placebo   Total
Yes         20        80        100
No          220       140       360
Total       240       220       460
Formula for the expected value:
$$ \text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} $$
Expected (Yes, Vaccine) = 240 × 100 ÷ 460 = 52.2; Expected (Yes, Placebo) = 220 × 100 ÷ 460 = 47.8
Expected (No, Vaccine) = 240 × 360 ÷ 460 = 187.8; Expected (No, Placebo) = 220 × 360 ÷ 460 = 172.2
Expected:
Influenza   Vaccine   Placebo   Total
Yes         52.2      47.8      100
No          187.8     172.2     360
Total       240       220       460
$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(20 - 52.2)^2}{52.2} + \frac{(80 - 47.8)^2}{47.8} + \frac{(220 - 187.8)^2}{187.8} + \frac{(140 - 172.2)^2}{172.2} $$
$$ \chi^2 = \frac{(-32.2)^2}{52.2} + \frac{(32.2)^2}{47.8} + \frac{(32.2)^2}{187.8} + \frac{(-32.2)^2}{172.2} = 19.86 + 21.69 + 5.52 + 6.02 = 53.09 $$
df = (number of columns - 1) × (number of rows - 1) = (2 - 1) × (2 - 1) = 1
The chi-square distribution table gives the critical (cutoff) value and the P-value.
7. Statistical decision: Since χ² = 53.09 > 3.841 (cutoff value) → H0 is rejected → statistically significant.
8. Clinical conclusion: There was a statistically significant difference between those who took the vaccine and those who took the placebo regarding influenza, so the vaccine is effective.

Correlation test (r)

Correlation test
In two series of numerical data, the values of one variable may vary correspondingly with those of the other.

Correlation: +ve OR -ve
▪ When the two variables increase and decrease in parallel → same direction → positive correlation.
▪ When one goes up and the other goes down proportionally → opposite directions → negative correlation.

Importance of correlation
1. Facilitates difficult measures
2. Study of effectors: dependent variable (outcome); independent variable(s) (predictors or effectors)

Correlation between payment and working hours
Conclusion:
1. As working hours increase, payment increases → positive correlation (proportionate correlation).
2. The increase in payment is constant in relation to the increase in working hours → linear correlation.

The correlation coefficient
Examining plots is a good way to determine the nature and strength of the relationship between two variables. However, an objective measure is needed to replace subjective descriptions like "strong", "weak", "I can't make up my mind" and "none". Mathematically, correlation is represented by the correlation coefficient.

The Pearson correlation
▪ Ranges from 0 (no correlation) to 1 (perfect correlation); the sign shows the direction, not the value.
Interpretation of the correlation coefficient:
- From 0 to 0.25 (0 to -0.25) = little or no relationship
- From 0.25 to 0.50 (-0.25 to -0.50) = fair
- From 0.50 to 0.75 (-0.50 to -0.75) = moderate to good
- Greater than 0.75 (or less than -0.75) = very good to excellent
- A strong relationship may not be clinically important

The correlation coefficient:
- Does NOT tell us if Y is a function of X
- Does NOT tell us if X is a function of Y
- Does NOT tell us if X causes Y
- Does NOT tell us if Y causes X
- Does NOT tell us what the scatterplot looks like

[Figure: Correlation between age and height]

The Pearson correlation
- Is a measure of the strength of the linear correlation between two variables in one sample
- "r" indicates the strength of the relationship (strong, weak, or none), from 0 to 1, and its direction, either negative (-) or positive (+)
Assumptions:
1) Variables are quantitative or ordinal
2) Normally distributed variables
3) Linear relationship (monotonic + constant change)
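This sketch computes Pearson's r with the same computational formula used in the worked example later in these notes. It also includes a helper that maps r onto the verbal interpretation scale above. The slide's ranges overlap at their boundaries, so how a value such as exactly 0.25 is assigned is an assumption of this sketch. The three small data sets are illustrative only: a straight line (r = 1), an opposite-direction line (r = -1) and a perfect curve y = x² for which r = 0. The last case shows that r measures only linear association.

```python
import math

def pearson_r(x, y):
    """Pearson r: (n*SumXY - SumX*SumY) / sqrt((n*SumX2 - SumX^2) * (n*SumY2 - SumY^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def interpret(r):
    """Direction from the sign, strength from |r| (boundary values go to the lower band)."""
    a = abs(r)
    if a <= 0.25:
        strength = "little or no relationship"
    elif a <= 0.50:
        strength = "fair"
    elif a <= 0.75:
        strength = "moderate to good"
    else:
        strength = "very good to excellent"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return direction, strength

x = [-2, -1, 0, 1, 2]  # illustrative values only
for label, y in [("line", [2 * v + 1 for v in x]),
                 ("opposite", [-v for v in x]),
                 ("curve y = x^2", [v * v for v in x])]:
    r = pearson_r(x, y)
    print(label, round(r, 3), interpret(r))
```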
The Pearson "r"
The Pearson "r" is:
- Symmetric, since the correlation of x and y is the same as the correlation of y and x
- Unaffected by linear transformations, such as adding a constant to all numbers or dividing all numbers by a positive constant
- WARNING: Never compute correlation coefficients for nominal variables, even if they are nicely coded with numbers. A correlation between governorate and income is meaningless.
- "r" is a measure of LINEAR ASSOCIATION
- When "r" = ZERO → this means NO LINEAR CORRELATION; it does NOT mean there is NO CORRELATION

Limitations:
▪ Linearity: can't describe non-linear relationships (most biological relations)
▪ Truncation of range: underestimates the strength of the relationship if you can't see the full range of x values (the strength of the relationship can't be estimated for values that are not present)
▪ No proof of causation

Testing hypothesis
▪ The test used is the t test (revise the uses of the t test).
▪ Statistically significant doesn't mean clinically important or useful.
▪ If you are examining many correlation coefficients, you have to use the Bonferroni adjustment.

Coefficient of determination
▪ The square of the Pearson correlation coefficient
▪ The coefficient of determination, r², is the proportion of variation in the values of y that is explained by the regression model with x, i.e. the amount of variance in y accounted for by x
▪ The percentage increase in accuracy you gain by using the regression line to make predictions
▪ 0 ≤ r² ≤ 1 (100%)
▪ The larger r², the stronger the linear relationship; the closer r² is to 1, the more confident we are in our prediction

Example (2)
A sample of 6 children was selected, and their age in years and weight in kilograms (kg) were recorded as shown in the following table. Find the correlation between age and weight, and identify the dependent and independent variables.
Serial No.   Age (years)   Weight (Kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13

Please answer the following questions:
1. Calculate the sample size (n) for the overall data.
2. Is the sample size small or large for the test statistic? Why?
3. What are the dependent and independent variables?
4. What are the types and scales of the data (dependent and independent)?
5. Are the data normally distributed? Why?
6. Draw a scatterplot.
7. Calculate the correlation between age and weight. Answer using the steps of hypothesis testing. Set α = 0.05.

Solution
1. Overall sample size (n) = 6
2. The sample size is small because it is less than 30.
3. Independent variable: age; dependent variable: weight.
4. Type and scale of data: both the independent and the dependent variables are quantitative, on a ratio scale.
5. Yes, we can check the normality of the data because they are quantitative: the mean, median and mode are close to each other, and the SD is low (< 1/3 of the mean).

Central tendency and dispersion   Age (years)   Weight (Kg)
Mean                              6.8           11.0
Median                            6.5           11.5
Mode                              6.0           12.0
SD                                1.5           1.8

These 2 variables are quantitative: one variable (age) is called the independent variable and is denoted X, and the other (weight) is called the dependent variable and is denoted Y.

6. Drawing the scatterplot:
[Figure: Scatterplot of weight (Kg) (Y) against age (years) (X)]

Steps of hypothesis testing
1. Assumptions:
- The two variables are measured on a quantitative (numerical) scale
- The variables are normally distributed
- The relationship is linear
- There are no outlier values
- Small sample size
2. Statistical hypotheses (H0 and Ha):
H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (there is a correlation)
3. Test statistic:
According to the assumptions, I will use the Pearson correlation coefficient (r), tested with a t test with n - 2 degrees of freedom:
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
4. Distribution of the test statistic: the t curve
5. Decision rule:
A. Interpretation of the direction of the Pearson correlation coefficient:
If the correlation is (+) → same direction
If the correlation is (-) → opposite directions
B. Interpretation of the value of the Pearson correlation coefficient: use the interpretation ranges of the correlation coefficient.
C. Interpretation of the t value: df = n - 2 = 6 - 2 = 4
If -2.776 < t < 2.776 (cutoff value) → we cannot reject H0
6. Calculation of the test statistic:

n        Age (X)    Weight (Y)   XY          X²          Y²
1        7          12           84          49          144
2        6          8            48          36          64
3        8          12           96          64          144
4        5          10           50          25          100
5        6          11           66          36          121
6        9          13           117         81          169
n = 6    ∑X = 41    ∑Y = 66      ∑XY = 461   ∑X² = 291   ∑Y² = 742

$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$

n∑XY = 2766
∑X·∑Y = 2706
n∑XY - ∑X·∑Y = 60
n∑X² = 1746
(∑X)² = 1681
n∑X² - (∑X)² = 65
n∑Y² = 4452
(∑Y)² = 4356
n∑Y² - (∑Y)² = 96
√([n∑X² - (∑X)²] × [n∑Y² - (∑Y)²]) = √(65 × 96) = 78.99
r = 60 / 78.99 = 0.760 → strong positive correlation

df (correlation) = n - 2 = 6 - 2 = 4
t = 0.760 × √4 / √(1 - 0.760²) = 2.34

7. Statistical decision:
According to the sign → positive correlation
According to the value of r → strong correlation
According to the t value: 2.34 < 2.776 (cutoff value) → we cannot reject H0
8. Clinical conclusion:
There was a strong positive correlation between age and weight, but it was not statistically significant, probably because of the small sample size.

Regression analysis
Regression: putting the correlated variables into an equation.
Uses:
- Estimating the amount of effect of the independent variable(s) on the dependent variable
- Predicting one variable's value after knowing the other variable's value
- The stronger the correlation, the better the estimation and prediction
Simple regression = bivariate = one "x" and one "y"

Regression analysis
❖ Simple linear regression analysis: predicting the value of one numerical variable after knowing the value of another numerical variable
❖ Multiple linear regression analysis: predicting the value of one numerical variable after knowing the values of two or more other numerical variables
❖ Logistic regression analysis: predicting the value of one categorical variable after knowing the values of one or more other variables

Assumptions of linear regression
- The variable X is measured without error
- For each value of X there is a normally distributed subpopulation of Y values
- The variances of the subpopulations of Y are all equal
- The means of the subpopulations of Y all lie on the same straight line (linearity, μy|x = α + βx)
- The Y values are statistically independent

Steps of hypothesis testing
1. Assumptions (LINEAP):
1. Linear correlation
2. Independent
3. Normal
4. Equal variances
5. Accurate
6. Research objective is Prediction

Linear regression
[Figure: Regression line of height on age]
No extrapolation (do not predict outside the observed range of X).

Steps of simple regression
1. Determine the assumptions
2. Obtain the linear equation
3. Evaluate the equation regarding prediction
When we use the equation to estimate, we are estimating, on average, how much the mean of the subpopulation of Y values changes for a given change in X.
When we use the regression equation to predict, we are predicting the value of Y when X has a given value.

Notes
- If the variation due to regression is > the variation due to chance → significant regression
- If the variation due to chance is > the variation due to regression → non-significant (NS) regression
- Since there are many "Y" subpopulations associated with the many "X" values, hypothesis testing will be through

Example (2)
A sample of 5 persons (4 smokers and 1 nonsmoker) was selected.
It is required to find the correlation between the number of cigarettes and lung capacity, and to determine whether lung capacity can be predicted from the number of cigarettes.

N   Number of cigarettes   Lung capacity
1   0                      45
2   5                      42
3   10                     33
4   15                     31
5   20                     29

Steps of hypothesis testing
1. Assumptions (LINEAP):
1. Linear correlation
2. Independent
3. Normal
4. Equal variances
5. Accurate
6. Research objective is Prediction
2. Statistical hypotheses (H0 and Ha):
H0: β = 0
Ha: β ≠ 0
3. Test statistic: simple linear regression analysis and t test
4. Distribution of the test statistic: the t curve
5. Decision rule (critical t): df = n - 2 = 5 - 2 = 3 → cutoff value (critical value) = 3.182
If the calculated |t| ≥ 3.182 (critical value) → we can reject H0
If the calculated |t| < 3.182 (critical value) → we cannot reject H0
6. Calculation of the test statistic:

n       Cigs (x)   Lung cap (y)   xy           x²          (x - x̄)²          ŷ      (y - ŷ)²
1       0          45             0            0           100               44.6   0.16
2       5          42             210          25          25                40.3   2.89
3       10         33             330          100         0                 36.0   9.00
4       15         31             465          225         25                31.7   0.49
5       20         29             580          400         100               27.4   2.56
n = 5   ∑x = 50    ∑y = 180       ∑xy = 1585   ∑x² = 750   ∑(x - x̄)² = 250                ∑(y - ŷ)² = 15.1
x̄ = 50/5 = 10, ȳ = 180/5 = 36

n∑xy - ∑x·∑y = -1075
n∑x² = 3750
(∑x)² = 2500
b (β) = -1075 / (3750 - 2500) = -0.86
a (α) = ȳ - b·x̄ = 36 - (-0.86)(10) = 44.6
Regression equation: ŷ = 44.6 - 0.86x
SE = 0.142
t = -6.061

The test used for hypothesis testing in regression is the t test, where:
y: the value of the dependent variable for an observation
ŷ: the estimated value of the dependent variable for an observation
x: the observed value of the independent variable for an observation
x̄: the mean of the independent variable, and n is the number of observations

$SE_{b} = \frac{\sqrt{\sum (y-\hat{y})^2/(n-2)}}{\sqrt{\sum (x-\bar{x})^2}} = \frac{\sqrt{15.1/3}}{\sqrt{250}} \approx 0.142$
$t = \frac{b}{SE_{b}} = \frac{-0.86}{0.142} \approx -6.061$
β: the slope of the sample regression line
SE: the standard error of the slope

7. Statistical decision:
Since |t| = 6.061 > 3.182 (cutoff value) → P value < 0.05 → we can reject H0
8. Clinical decision or conclusion:
The number of cigarettes was a significant predictor of lung capacity.
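The hand calculation for the cigarettes example can be checked with a short script. This is a minimal Python sketch (standard library only) that follows the same formulas: least-squares slope and intercept, residual sum of squares, standard error of the slope, and t = b / SE. The function name `simple_linear_regression` and its returned dictionary are hypothetical choices for illustration, and the critical value 3.182 is hard-coded from the decision rule above rather than computed from the t distribution. It also reports r² = 1 - SSE/SST as the coefficient of determination.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit y_hat = a + b*x and the t statistic for H0: beta = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope b = [n*sum(xy) - sum(x)*sum(y)] / [n*sum(x^2) - sum(x)^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    x_bar, y_bar = sum_x / n, sum_y / n
    # Intercept a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    # Residual sum of squares: sum (y - y_hat)^2
    y_hat = [a + b * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    # Standard error of the slope: sqrt(SSE / (n - 2)) / sqrt(sum (x - x_bar)^2)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    t = b / se_b
    # Coefficient of determination: share of the variation in y explained by x
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return {"a": a, "b": b, "se_b": se_b, "t": t, "df": n - 2, "r2": r2}


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]
fit = simple_linear_regression(cigarettes, lung_capacity)

T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, df = 3 (value taken from the notes)
reject_h0 = abs(fit["t"]) >= T_CRITICAL
print(
    f"y_hat = {fit['a']:.1f} + ({fit['b']:.2f})x, "
    f"SE = {fit['se_b']:.3f}, t = {fit['t']:.3f}, df = {fit['df']}, "
    f"r2 = {fit['r2']:.4f}, reject H0: {reject_h0}"
)
```

Run on the five observations, it should reproduce b = -0.86, a = 44.6, SE ≈ 0.142 and t ≈ -6.061 from the table, with H0 rejected. For real analyses, a library routine that also returns the exact P value is usually preferable to a hard-coded cutoff.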