Introduction to Inferential Statistics
This document provides an introduction to inferential statistics. The content covers topics like probability theory, different types of correlation coefficients, hypothesis testing, and the steps involved in hypothesis testing.
Inferential Statistics

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.

Getting information about a population from a sample involves weighing:
- Cost benefit
- Crucial difference
- Client acceptability
- Public and political acceptability
- Ethical and legal concerns

Probability theory: probability refers to how often an experience, event, or outcome will happen in a population in the long run, over a large number of occurrences. Likelihood and chance are similar terms.

Sampling distributions are theoretical distributions developed by mathematicians to organize statistical outcomes from various sample sizes. They let us determine the probability that something happened by chance in the population from which the sample was drawn.

HYPOTHESIS TESTING

Types of Hypothesis
Alternative/Predictive Hypothesis: symbolized by H1. It is the opposite of the null hypothesis and specifies the existence of a difference or a relationship.
Null Hypothesis: symbolized by Ho. It states that there is no difference or relationship. It is phrased neutrally, reflecting the objectivity that must be present in any research undertaking.

Example #1
Title: The NMAT Scores and Academic Achievement of the Students in Private and Public Schools.
Alternative Hypothesis: There is a significant relationship between the NMAT performance and the academic achievement among the four learning areas of private schools, public schools, and the combination of private and public schools.
Null Hypothesis: There is no significant relationship between the NMAT performance and the academic achievement among the four learning areas of private schools, public schools, and the combination of private and public schools.

Example #2
Title: A Study on the Relationship of Smoking Habits to Hypertension Among the Employees of BTS Corporation.
Null Hypothesis?
Alternative Hypothesis?

3. Select a significance level and the critical region

One-tailed hypothesis: the outcome is expected in a single direction.
Example: administration of the experimental drug will result in a decrease in systolic BP.
Two-tailed hypothesis: the direction of the effect is unknown.
Example: the experimental therapy will result in a different response rate than that of the current standard of care.

Level of Significance
The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

If the output gives a two-tailed result and you want a one-tailed result, divide the two-tailed p value by 2. This is valid only when the observed effect is in the predicted direction.
Example: p = 0.08 (two-tailed), so p > 0.05; p = 0.04 (one-tailed), so p < 0.05.

CALCULATED/COMPUTED/CRITICAL VALUE
Computing the calculated value: use a statistical test to derive a calculated value (e.g., a t value or F value).
Obtaining the critical value: this is a criterion based on the degrees of freedom (df) and the alpha level (0.05 or 0.01). It is compared with the calculated value to determine whether the findings are significant and Ho should therefore be rejected.

Reject or Accept Ho
If critical value > calculated value: accept (more precisely, fail to reject) Ho.
If critical value < calculated value: reject Ho.
For a two-tailed t test, use the absolute value of the calculated t.
Rejecting Ho only supports H1; it does not prove H1.

Basic Steps in Hypothesis Testing
1. Make a prediction.
2. Decide which statistical test to use.
3. Select a significance level and the critical region.
4. Compute the test statistic.
5. Compare the test statistic to the sampling distribution (table).
6. Make a decision about the null hypothesis: reject it if the statistic falls in the region of rejection.
7. Consider the power of the test, which is its probability of detecting a significant difference. Parametric tests are more powerful when their assumptions are met.

Type I Error: rejecting the null hypothesis when it is true.
Type II Error: accepting (failing to reject) the null hypothesis when it should have been rejected.
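The decision rules above can be sketched in Python. This is a minimal illustration, not a prescribed procedure, and it rests on several assumptions:
- The function names `one_tailed_p`, `decide`, and `type_i_error_rate` are hypothetical.
- The p-value conversion assumes a symmetric test statistic such as t or z.
- The simulation assumes a two-tailed z test on samples drawn from a population where Ho is true (mean 0, known standard deviation 1). Its rejection rate should therefore land near α, the Type I error rate.

```python
import random
from statistics import NormalDist


def one_tailed_p(p_two_tailed: float, effect_in_predicted_direction: bool) -> float:
    """Convert a two-tailed p value to a one-tailed p value.

    Assumes a symmetric test statistic (e.g. t or z). Halving is valid only
    when the observed effect lies in the predicted direction; otherwise the
    one-tailed p value is 1 - p/2.
    """
    if effect_in_predicted_direction:
        return p_two_tailed / 2
    return 1 - p_two_tailed / 2


def decide(calculated: float, critical: float) -> str:
    """Compare |calculated value| with a tabled critical value."""
    if abs(calculated) > critical:
        return "reject Ho"
    return "fail to reject Ho"


def type_i_error_rate(alpha: float = 0.05, n: int = 30,
                      trials: int = 10_000, seed: int = 1) -> float:
    """Simulate a two-tailed z test when Ho (population mean = 0) is true.

    The fraction of (wrong) rejections estimates the Type I error rate,
    which should be close to alpha.
    """
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # sigma = 1, so z = mean / (1 / sqrt(n))
        if abs(z) > z_critical:
            rejections += 1
    return rejections / trials


print(one_tailed_p(0.08, True))   # 0.04
print(decide(2.5, 2.306))         # reject Ho
print(type_i_error_rate())        # close to 0.05
```

The value 2.306 is the tabled two-tailed critical t for df = 8 at α = 0.05. It is used here only as an example input.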
Statistical Tools
Types of inferential statistics:
- Correlation
- Chi-square
- ANOVA
- T-test
- Logistic Regression
- Linear Regression

Classification of Inferential Statistics
Parametric tests make various assumptions about the nature of the population from which the sample for study is drawn. They are capable of determining the actual difference or relationship in the study.
Examples: T-test, One-Way Analysis of Variance (1-Way ANOVA), Two-Way Analysis of Variance (2-Way ANOVA), Multivariate Analysis of Variance (MANOVA).
Non-parametric tests make few, if any, assumptions about the nature of the population. They are useful for measurements of nominal and ordinal data, but they are less able to determine the actual size of a difference or relationship in a study.
Examples: Chi-Square Test, Wilcoxon Signed-Rank Test.

Correlation
Correlation is used to measure the degree of linear relationship or association between two variables. Correlation coefficients are computed using the Pearson Product Moment Correlation Coefficient, or Pearson r:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

where x = the observed data for the independent variable, y = the observed data for the dependent variable, n = size of the sample, and r = the degree of relationship between x and y. The correlation coefficient may be positive or negative.

Interpretation
The degree of linear relationship can be interpreted using ranges of values for the Pearson Product Moment Correlation.

To know whether the obtained correlation coefficient is significant (that is, whether a real correlation exists rather than the obtained r being merely due to sampling variation), a t-test for the significance of r can be used:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2$$

where r = the obtained Pearson r value and n = sample size.

Types of Correlation
1. Pearson's r: used when both variables are quantitative.
Positive correlation is present when high values of one variable are associated with high values of the other variable, and low values with low values.
Negative correlation is present when high values of one variable are associated with low values of the other variable, and vice versa.
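The Pearson r formula and the t-test for the significance of r can be sketched directly in Python. This is a minimal sketch that assumes paired numeric lists of equal length. The helper names `pearson_r` and `t_for_r` and the sample data are hypothetical, chosen only for illustration.

```python
import math
from collections.abc import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation via the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


def t_for_r(r: float, n: int) -> tuple[float, int]:
    """t statistic for testing the significance of r, with df = n - 2."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


x = [1, 2, 3, 4, 5]   # hypothetical independent-variable data
y = [2, 4, 5, 4, 5]   # hypothetical dependent-variable data
r = pearson_r(x, y)
t, df = t_for_r(r, len(x))
print(round(r, 3))        # 0.775
print(round(t, 3), df)    # 2.121 3
```

With these toy numbers, t = 2.121 is below the tabled two-tailed critical value of 3.182 (df = 3, α = 0.05). This r would therefore not be judged significant, even though it looks fairly strong.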
Types of Correlation
2. Spearman Rank Order Correlation: the nonparametric version of the Pearson correlation coefficient. The data must be ordinal, interval, or ratio.

SPEARMAN'S RANK ORDER EXAMPLE:
The scores for nine students in physics and math are as follows:
Physics: 35, 23, 47, 17, 10, 43, 9, 6, 28
Mathematics: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the students' ranks in the two subjects and compute the Spearman rank correlation.

Step 1: Find the ranks for each individual subject. Order the scores from greatest to smallest, and assign rank 1 to the highest score.
Step 2: Add a third column, d, to your data. The d is the difference between ranks. For example, the first student's physics rank is 3 and math rank is 5, so the difference is 2 points. In a fourth column, square your d values.
Step 3: Sum (add up) all of your d-squared values: 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12. You'll need this for the formula (Σd² is just "the sum of d-squared values").
Step 4: Insert the values into the formula. These ranks are not tied, so use the formula for untied ranks:

$$\rho = 1 - \frac{6\sum d^{2}}{n(n^{2}-1)} = 1 - \frac{6 \times 12}{9(81-1)} = 1 - \frac{72}{720} = 0.9$$

The Spearman Rank Correlation for this set of data is 0.9.

Types of Correlation
3. Phi Coefficient: a measure of association between binary variables (e.g., living/dead, black/white, success/failure). It is also called the Yule phi or Mean Square Contingency Coefficient. It is used for contingency tables when:
- at least one variable is a nominal variable;
- both variables are dichotomous variables.

Formula
                Variable X
Variable Y      A         B         A+B
                C         D         C+D
                A+C       B+D

$$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$

Example: A researcher wishes to determine whether a significant relationship exists between the gender of the worker and whether they experience pain while performing an electronics assembly task.
Coding: Gender 1 = Male, 2 = Female; Pain Experience 1 = Feel Pain, 2 = Don't Feel Pain.

Person  Gender  Pain    Person  Gender  Pain
A       1       2       O       1       2
B       2       1       P       2       1
C       2       1       Q       2       1
D       1       2       R       2       1
E       1       2       S       2       1
F       2       1       T       1       1
G       1       2       U       1       2
H       2       2       V       1       2
I       1       2       W       2       1
J       1       2       X       2       1
K       1       2       Y       1       2
L       1       2       Z       2       1
M       2       2       a       1       2
N       2       1       b       2       1

Null and Alternative Hypothesis
Ho: There is no relationship between the gender of the worker and whether they feel pain while performing the task.
H1: There is a significant relationship between the gender of the worker and whether they feel pain while performing the task.

Solution
         MALE      FEMALE
YES      A = 1     B = 12    E = 13
NO       C = 13    D = 2     F = 15
         G = 14    H = 14

$$\phi = \frac{AD - BC}{\sqrt{EFGH}} = \frac{(1)(2) - (12)(13)}{\sqrt{(13)(15)(14)(14)}} = \frac{-154}{\sqrt{38220}} \approx -0.79$$

This falls in the "very strong" range of the table below. For two nominal variables, the sign only reflects how the table is arranged: here, female workers were far more likely to report pain.

Suggested Interpretation of Measures of Association:
Values            Appropriate Phrases
0.70 or higher    Very strong positive relationship
0.50 to 0.69      Substantial positive relationship
0.30 to 0.49      Moderate positive relationship
0.10 to 0.29      Low positive relationship
0.01 to 0.09      Negligible positive relationship
0.00              No relationship
-0.01 to -0.09    Negligible negative relationship
-0.10 to -0.29    Low negative relationship
-0.30 to -0.49    Moderate negative relationship
-0.50 to -0.69    Substantial negative relationship
-0.70 or lower    Very strong negative relationship

Types of Correlation
4. Kendall's Tau (τ): usually gives smaller values than Spearman's rho correlation. Calculations are based on concordant and discordant pairs. It is insensitive to error, and its p values are more accurate with smaller sample sizes.

$$\tau = \frac{C - D}{C + D}$$

where C = number of concordant pairs and D = number of discordant pairs.

Step 1: Make a table of rankings.
Step 2: Count the number of concordant pairs, using the second column.
Step 3: Count the number of discordant pairs and insert them into the next column.
Step 4: Sum the values in the two columns.
Step 5: Insert the totals into the formula:

$$\tau = \frac{61 - 5}{61 + 5} = \frac{56}{66} \approx 0.85$$
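The three worked correlation examples (Spearman, phi, Kendall) can be checked with a short Python sketch. This is a minimal illustration under stated assumptions:
- The helper names are hypothetical.
- Ranks assume no tied scores.
- `kendall_tau` uses the simple (C − D)/(C + D) formula from the text, which matches tau only when there are no ties.
- The table of rankings behind the 61/5 Kendall example is not reproduced in this document. As a separate, hypothetical illustration, the pair counting is therefore applied to the physics/math scores from the Spearman example.

```python
import math
from collections.abc import Sequence


def ranks_descending(scores: Sequence[float]) -> list[int]:
    """Rank scores with 1 = highest. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2 x 2 table laid out as [[A, B], [C, D]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def kendall_pairs(x: Sequence[float], y: Sequence[float]) -> tuple[int, int]:
    """Count concordant (C) and discordant (D) pairs; tied pairs are skipped."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return concordant, discordant


def kendall_tau(c: int, d: int) -> float:
    """Tau = (C - D) / (C + D), the formula given in the text (no ties)."""
    return (c - d) / (c + d)


physics = [35, 23, 47, 17, 10, 43, 9, 6, 28]
maths = [30, 33, 45, 23, 8, 49, 12, 4, 31]
print(round(spearman_rho(physics, maths), 2))     # 0.9
print(round(phi_coefficient(1, 12, 13, 2), 2))    # -0.79
c, d = kendall_pairs(physics, maths)
print(c, d, round(kendall_tau(c, d), 2))          # 31 5 0.72
print(round(kendall_tau(61, 5), 2))               # 0.85
```

On the same physics/math data, Kendall's tau (0.72) comes out smaller than Spearman's rho (0.9). This is consistent with the note above that tau usually gives smaller values.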
T-Test
Used to measure whether there is a statistically significant difference between the mean scores of two groups (either the same group of people before and after, or two different groups of people).

Paired t-test
Used when comparing the means of a continuous variable in two non-independent samples. It is a parametric (large-sample) approach used to compare the means of two paired groups.
Example: measurements on the same people before and after a treatment.

Independent Sample T-Test
Used to compare the means of a continuous variable in two independent samples (two different groups of people).
Example: diabetic vs. nondiabetic BP.

What does a t-test tell you?
Whether there is a statistically significant difference between the mean scores (or values) of 2 groups (either the same group of people before and after, or two different groups of people).

What do test results look like, and how do you interpret them?
Look at the corresponding p value. If p > 0.05, the means are not significantly different from each other.

Chi-Square Distribution
The chi-square test is commonly used for testing relationships between categorical variables. It is a non-parametric test for data presented in frequencies, or data that can be transformed into frequencies, and it requires only nominal data. It allows the researcher to determine whether the frequencies obtained in research differ from those that would have been expected, using a χ² sampling distribution.

2 x 2 Contingency Table
How do you interpret it? Look at the corresponding p value. If p > 0.05, the two variables are not significantly related.
Example: chi-square statistic χ² = 3.418, alpha level of significance = 0.05, degrees of freedom df = 1, p value = 0.065. Since p = 0.065 > 0.05, Ho is not rejected. Equivalently, 3.418 is below the critical value of 3.841 for df = 1.

A very small chi-square test statistic means that your observed data fits your expected data extremely well. A very large chi-square test statistic means that the data does not fit very well.
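The t-tests and the 2 x 2 chi-square test described above can be sketched with Python's standard library. This is a minimal sketch, not the only way to run these tests, and it rests on several assumptions:
- The function names are hypothetical.
- The independent-samples version is the classic pooled-variance (equal-variance) t test.
- The chi-square statistic is Pearson's, without Yates' continuity correction.
- The p value is computed only for df = 1, using the identity P(χ²₁ > x) = P(|Z| > √x).
- The t functions return the statistic and df for comparison with a t table, since the standard library has no t distribution.
- The sample data are hypothetical. The 2 x 2 counts reuse the gender/pain table from the phi example.

```python
import math
from collections.abc import Sequence
from statistics import mean, stdev, variance


def paired_t(before: Sequence[float], after: Sequence[float]) -> tuple[float, int]:
    """Paired t statistic on the differences (before - after); df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def independent_t(group1: Sequence[float], group2: Sequence[float]) -> tuple[float, int]:
    """Independent-samples t with pooled (equal) variances; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def chi_square_2x2(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi_square_p_df1(chi2: float) -> float:
    """Upper-tail p value for chi-square with df = 1: P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))


t, df = paired_t([120, 130, 125, 140, 135], [115, 128, 120, 132, 130])
print(round(t, 2), df)                                 # 5.27 4
print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # (-2.0, 8)
print(round(chi_square_2x2([[1, 12], [13, 2]]), 2))    # 17.37
print(chi_square_p_df1(3.418))  # close to the p = 0.065 reported in the text
```

In the independent-samples case, |t| = 2.0 is below the two-tailed critical value of 2.306 for df = 8 at α = 0.05, so Ho would not be rejected. The p value for χ² = 3.418 comes out near 0.064 to 0.065; small differences from the text's 0.065 reflect rounding.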