Inferential Statistics Lecture Notes PDF
Boğaziçi University
Saliha Erman, MA
Summary
These lecture notes cover inferential statistics, focusing on hypothesis testing and one-sample z-tests. They include examples and exercises.
Full Transcript
Inferential Statistics
A Lecture on Hypothesis Testing & One-Sample z-Test
Teaching Assistant: Saliha Erman, MA
Source: Adapted from Assoc. Prof. Güneş Ünal, Boğaziçi University, 2022

Z-scores (recap from week 5)

z Scores and Areas Under the Normal Curve
If we substitute z notation for the standard score, we obtain the following formulas.
For converting a score in a sample to a z score: z = (Y - Ȳ) / s
For converting a score in a population to a z score: z = (Y - µ) / σ

z (Score) Table
The z table (also known as the standard normal table) is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution. The values within the table are the probabilities.

Types of Table
Cumulative from mean: gives the probability that a statistic is between 0 (the mean) and z (column 5 in the textbook, Appendix B).
Cumulative: gives the probability that a statistic is less than z (column 3 in the textbook, Appendix B).
Complementary cumulative: gives the probability that a statistic is greater than z (also column 3 in the textbook, Appendix B).

z Scores and Areas Under the Normal Curve
Suppose Y represents the height of macaque monkeys and is normally distributed with µ = 60.0 cm and σ = 5.0 cm. What is the probability that an individual monkey chosen at random will have a height of 63 cm or more? In other words, what proportion of this population has a height at or greater than 63 cm?
Formal expression: P(Y >= 63) = ?
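The macaque question can also be checked numerically instead of with a printed table. The following is a minimal sketch, assuming Python's standard-library NormalDist stands in for the z table; the variable names are illustrative and not part of the lecture.

```python
from statistics import NormalDist

# Population parameters from the macaque example
mu, sigma = 60.0, 5.0
height = 63.0

# Convert the raw score to a z score: z = (Y - mu) / sigma
z = (height - mu) / sigma

# P(Y >= 63) is the area to the right of z under the standard normal curve
p_at_least = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(Y >= 63) = {p_at_least:.4f}")
```

The cumulative function plays the role of the "cumulative" table column, and subtracting it from 1 gives the complementary area.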
z = (Y - µ) / σ = (63 - 60) / 5 = 0.60
63 cm on the original scale corresponds to 0.6 standard deviations on the z-score scale.

z Scores and Areas Under the Normal Curve
Look at the relevant section of the z table:
P(Y >= 63 cm) = .2743 (= 1 - .7257)
The probability that a monkey chosen at random is 63 cm or taller is .27.
The proportion of the monkey population that is 63 cm or taller is .27.
27% of monkeys are 63 cm or taller.
(All these statements are true; they are different ways of saying the same thing.)

Exercise
1. A normally distributed population of scores has a standard deviation (σ) of 20. Within this population a score of Y = 80 corresponds to z = +0.50. What is the population mean?
a. 60
b. 70
c. 75
d. Cannot be determined from the information given
Answer: b. µ = Y - zσ = 80 - (0.50)(20) = 70.

3. With continuous probability distributions, the probability of observing a value ≤ Y is given by:
a. the area to the right of Y
b. the area to the left of Y
c. the height of the distribution at Y (the probability density)
d. none of the above
Answer: b. The probability of a value ≤ Y is the area to the left of Y.

Exercise
4. A population of test scores is normally distributed with a mean of 500 and a standard deviation of 100. What percentage of the population would be expected to have a score:
a. greater than 300? (use the graph) 100% - [(100% - 95.5%) / 2] = 97.75%, i.e., .9775
b. less than 500? .50
c. less than 450 or more than 550? (use the z-table) .6170 (= .3085 + .3085)
[The original slide links to more exercises with explanations.]

Estimation (recap from week 5 - guest lecture)

Estimators
One goal of data analysis is to estimate parameters (population values) from statistics (sample values). Estimators should be unbiased. Two statistics that we use frequently in statistics for psychology are the mean and the standard deviation (e.g., to calculate z-scores). Let's see whether they are biased or unbiased estimators of the population parameters:
1. We can estimate the population mean from the sample mean. The sample mean (Ȳ) is an unbiased estimate of the population mean (µ).
2. We can estimate the population variance (or SD) from the sample variance (or SD). The sample variance (SS divided by N) is a biased estimate of the population variance. The unbiased estimate of the population variance (σ²) is the estimated population variance (SS divided by N - 1).
(Student notes: You have found the mean and the variance, but are you computing them from the population or from the sample? Until now we have always talked about a single individual.)

What does an unbiased estimate mean?
Example: Let's assume a small population A (N = 3), take all possible random samples with a sample size of 2 (n = 2), and calculate the mean of the means of all those possible samples. It will be equal to the population mean, so the sample mean is unbiased.
Population A = {10, 3, 5}
Sample {10, 5}: Ȳ1 = 7.5
Sample {10, 3}: Ȳ2 = 6.5
Sample {5, 3}: Ȳ3 = 4
Mean of the sample means = (7.5 + 6.5 + 4) / 3 = 6
µ = (10 + 3 + 5) / 3 = 6
The two values are the same.
(Student note: If you find all the possible sample means and divide their sum by three, you again get exactly the mean of this population.)

Why is the sample variance/SD a biased estimate? [Video link in the original slides.]
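The unbiasedness claim can be verified by brute force on the three-score population from the slide. This is a minimal sketch: the mean check enumerates the same three samples of size 2 used above. The variance check is an added assumption: it enumerates all ordered samples drawn with replacement, because that is the setting in which SS / (n - 1) averages exactly to σ².

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

population = [10, 3, 5]           # population A from the slide
mu = mean(population)             # population mean, 6
sigma2 = pvariance(population)    # population variance, SS / N

# Mean: all samples of size 2 without replacement, as on the slide
sample_means = [mean(s) for s in combinations(population, 2)]
mean_of_means = mean(sample_means)

# Variance: all ordered samples of size 2 WITH replacement (assumption)
pairs = list(product(population, repeat=2))
avg_biased = mean(pvariance(p) for p in pairs)    # SS / n
avg_unbiased = mean(variance(p) for p in pairs)   # SS / (n - 1)

print(f"mu = {mu}, mean of sample means = {mean_of_means}")
print(f"sigma^2 = {sigma2:.3f}")
print(f"average of SS/n     = {avg_biased:.3f}")
print(f"average of SS/(n-1) = {avg_unbiased:.3f}")
```

On average, SS / n comes out at half of σ² for n = 2, while SS / (n - 1) matches σ², which is the systematic underestimation that the N - 1 correction removes.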
(Student note: You do calculations with these values, but the question is whether the means and SDs you found are correct.)

Estimating the Population Variance
The sample variance (s², SS divided by n) is a biased estimate of the population variance (σ²). The unbiased estimate of the population variance is the estimated population variance (ŝ², SS divided by n - 1).
s² = Σ(Y - Ȳ)² / n (biased)
σ² = Σ(Y - µ)² / N
ŝ² = Σ(Y - Ȳ)² / (n - 1) (unbiased; Bessel's correction)
n - 1 is known as the degrees of freedom. [Video link in the original slides.]

Estimating the Population Variance
Does using an unbiased estimate mean that it will be 100% correct? No. Remember bias vs. accuracy: an estimate can be unbiased but still have low accuracy/precision.
(Student note: There is always some error, and we accept that.)

Sampling Error
Sample values are likely to differ from population values because they are based on only a portion of the overall population. This does not imply that there have been mistakes in the data collection or the data analysis. Sampling error is natural and does not come from bias (as long as you select randomly).
The amount of sampling error can be represented as the difference between the value of a sample statistic and the value of the corresponding population parameter. You can also see how biased vs. unbiased estimates differ from the population parameter:
Ȳ - µ = 12.2 - 11.8 = 0.4
Ȳ - µ = 13.8 - 11.8 = 2.0
Sampling error is random; bias is systematic.

Sampling Distribution (recap from week 5 - guest lecture)

The Sampling Distribution of the Mean
Imagine drawing random samples of a given size (n) repeatedly from a population of scores. For each sample, we compute the mean (Ȳ).
[Figure: repeated random samples of size n = 3 drawn from a population of scores.]

The Sampling Distribution of the Mean
Each mean (Ȳ) can be thought of as one observation from a population of means: the means of all possible samples of size n that could be drawn from the parent population.
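The idea of a "population of means" can be made concrete by listing every possible sample. The sketch below assumes a hypothetical population of six scores (1 to 6) and samples of size 2 drawn with replacement; neither comes from the lecture.

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [1, 2, 3, 4, 5, 6]   # hypothetical parent population
n = 2                             # sample size

# Every possible ordered sample of size n, and the mean of each one
sample_means = [sum(s) / n for s in product(population, repeat=n)]

# Tally how often each mean occurs: this is the sampling distribution
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"{value:>4}: {'#' * distribution[value]}")

print("mean of all sample means:", mean(sample_means))
print("population mean:", mean(population))
```

Although every score in the parent population is equally frequent, the sample means pile up around the population mean, and extreme means are rare.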
Because of the Central Limit Theorem, the distribution of Ȳ values is well approximated by the normal distribution (especially when n >= 30). The distribution of Ȳ values is known as the sampling distribution of the mean.
The standard deviation of a sampling distribution of the mean is called the standard error of the mean (SEM). It represents an average deviation of the sample means from the population mean and reflects the accuracy with which sample means estimate a population mean.

Standard Error of the Mean
It is important to note that, for n > 1, the standard deviation of a sampling distribution is always smaller than the standard deviation of the population. We are not dealing with raw scores but with central tendencies (the sampling distribution of the mean), and the range of the sample means is smaller than the range of the population sampled. (Student note: this follows logically.)
For example, random samples of size 2 are selected from the finite population consisting of the numbers 3, 5, 7:
Sample 3-5, mean = 4
Sample 3-7, mean = 5
Sample 5-7, mean = 6
The range of the parent population = 7 - 3 = 4
The range of the sample means = 6 - 4 = 2

σȲ = σ / √n
σȲ = standard error of the mean
σ = standard deviation of scores in the population
n = sample size
If the standard error of the mean is small, any sample mean drawn from the population is a fairly accurate estimator of the population mean.
The size of the standard error is determined by just two factors:
1. The variability of the scores themselves: the larger σ (the noisier the data), the larger σȲ will be.
2. The size of the sample, n: the larger the sample size, the smaller σȲ will be.

We can use the sampling distribution (this lecture) to answer probability questions (last lecture), as in hypothesis testing (next lecture: inferential stats).
The best video you can watch to bring all of this together: https://www.youtube.com/watch?v=7S7j75d3GM4&t=52s (best stats video I have ever seen; pay attention to this)

Hypothesis Testing

Research Question Examples
Descriptive question: What percentage of 2021 Boğaziçi graduates experienced exclusion at the workplace?
Difference question: Is there a difference between the average math achievement scores of students having high versus low math grades?
Associational/relationship question: Is there an association between having a positive attitude towards the subject neuroscience and having high grades in a neuroscience class?
Question implying cause & effect: Would medical use of oral ketamine (0.4 mg/kg) lead to decreased depression compared to the placebo condition?

Theory
A theory is a logically organized set of proposals that defines, explains and interrelates our knowledge about related phenomena (e.g., behaviors).
Theories define hypotheses.
Scientific studies test hypotheses derived from theories (not the theories themselves).
[Image: HMS Beagle]

Hypothesis
A research hypothesis, developed from a theory, makes a general prediction at the beginning of a study.
A research hypothesis is vague until it is operationally defined to become an experimental hypothesis, which states the precise responses or behaviors that will be used to measure the variables under investigation.

Operational Definition
It aims to make abstract ideas concrete so that you can test them, and clearly states how you are going to measure the idea you are testing.
Operational definitions are critical in psychology, which is full of abstract concepts such as cognition, anxiety, personality, etc.
Example:
Conceptual definition of personality: an individual's unique set of consistent behavioral traits.
Operational definition of personality: introversion vs. extroversion levels (first specify the variables to be used as an indicator of the concept "personality") as measured by the Myers-Briggs Type Indicator (then specify how to measure these variables).

Research Hypothesis vs. Experimental Hypothesis
Example of a research hypothesis: Boğaziçi students are better with words than other university students.
Example of an (operationally defined) experimental hypothesis: Junior-year Boğaziçi students, proportionally selected from all departments, perform better in letter combination, as measured by the Scrabble game, than junior-year students randomly selected from other universities in Istanbul.

The Binomial Expression of the Hypothesis
Null Hypothesis (no difference): Phenomena are the same as each other, or the same as a theoretical expectation. E.g., Boğaziçi students do not perform differently in letter combination compared to students from other universities in Istanbul.
Alternative Hypothesis (there is a difference): Phenomena are different from each other, or different from a theoretical expectation. E.g., Boğaziçi students perform differently/better in letter combination compared to students from other universities in Istanbul.

Inferential Statistics

Inferential Statistics: Goal 1 (finding probability distributions)
Inferential statistics involves the probability distribution of a statistic (e.g., the mean).
Example: What is the probability of observing a mean IQ of 94 or less based on 42 randomly selected individuals from the general population? (µ = 100; σ = 15)
Standard error of the mean: σȲ = σ / √N
σ = standard deviation of raw scores in the population
N = sample size
What is the probability of randomly picking a mean greater than a value X? It depends on:
1) The variance of the raw scores
2) The sample size (N)
3) The difference between the population mean and the sample mean
For the example: σȲ = 15 / √42 = 2.315, so z = (94 - 100) / 2.315 = -2.59.
The area under the standard normal curve from negative infinity to -2.59 is .0048. Hence, the probability of observing a mean of 94 or less based on a random sample of 42 is .0048 (about 0.5%).
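The IQ example combines the standard error of the mean with a z-table lookup. A minimal sketch of that calculation, assuming the standard-library NormalDist replaces the printed table (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # population mean and SD of IQ scores
n = 42                    # sample size
sample_mean = 94.0        # mean we want the probability for

# Standard error of the mean: sigma / sqrt(n)
sem = sigma / sqrt(n)

# z score of the sample mean within the sampling distribution of the mean
z = (sample_mean - mu) / sem

# P(mean <= 94): area to the left of z under the standard normal curve
p = NormalDist().cdf(z)

print(f"SEM = {sem:.3f}, z = {z:.2f}, P(mean <= 94) = {p:.4f}")
```

Note that the denominator is the standard error, not σ itself, because the question is about a mean of 42 scores rather than a single score.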
[Figure: z-table excerpt.]

Inferential Statistics: Goal 2 (making inferences)
Inferential statistics involves making inferences using z scores and the alpha level (by using the standard normal curve and probability density, as we have seen).

One-Sample z Test
Example: The mean IQ of the 23 third graders in Ms. Smith's class is 108.6. Are the children in Ms. Smith's class representative of the general population with respect to IQ?
Null Hypothesis (H0): Ms. Smith's students are representative of the population (there is no difference).
Alternative Hypothesis (H1 or HA): The students are not representative of the general population (there is a difference).

Goal 1 vs. Goal 2
Goal 1 (finding probability distributions): What is the probability that the mean IQ of Ms. Smith's class of 23 students is 108.6, given the general population IQ distribution? No hypotheses, just a probability question.
Goal 2 (making inferences, one-sample z test): Are the children in Ms. Smith's class representative of the general population with respect to IQ? This question is answered by testing H0 against H1, as stated above.

Statistical Hypothesis Testing
Assuming the alternative hypothesis is true: We have numbers for X̄ (108.6) and for σX̄ (15 / √23 = 3.128), but we do not know the numerical value of µ. Hence, we cannot calculate any probabilities of sampling a mean of 108.6 according to this hypothesis.
Assuming the null hypothesis is true: We can actually calculate a z, because this hypothesis implies that the mean of 108.6 is sampled from a group of means whose overall mean is µ = 100.
If the probability of observing a mean of 108.6 is remote given µ = 100, then we can reject this hypothesis and conclude that the students are not representative.
(In statistical hypothesis testing, you test the null hypothesis in order to reject it or fail to reject it.)

Testing H0 by using z scores and the alpha (α) level
The percentage of most unlikely outcomes is defined by the alpha (α) level. By convention, α is set to .05, or 5%.
Two-tailed (two-sided) test: a test of a hypothesis that splits the α level in half (one half is used for the upper tail, the other for the lower tail of the sampling distribution). Such a hypothesis is also called a non-directional hypothesis.
One-tailed (one-sided) test: the hypothesis clearly pertains to only one side of the sampling distribution. Such a hypothesis is also called a directional hypothesis.
Critical z values define the endpoints of the range within which the sample mean is expected to fall if H0 is true. The critical z values when using a 95 percent confidence level are -1.96 and +1.96 standard deviations. The α level associated with a 95% confidence level is .05. (If it is a one-tailed test, the critical value is either +1.645 or -1.645.)
The rejection region is the set of all z scores more extreme than the critical values (i.e., less than the negative critical value or greater than the positive critical value).

Back to the Example
[Figure: a normal curve with a mean of 100 and an SD of 3.128; 2.5% in the lower tail and 2.5% in the upper tail; critical values at z = -1.96 and z = +1.96.]
z = (X̄ - µ) / σX̄ = (108.6 - 100) / 3.128 = 2.75
In our example, 2.75 is greater than 1.96 (the critical z), therefore we reject the null hypothesis. (Equivalently, you can find the p value that corresponds to z = 2.75 and compare it with .05. It will be lower than .05, because z exceeds the critical value.)

Back to the Example
Another way of looking at it is to find the actual values that correspond to the critical z values.
[Figure: a normal curve with a mean of 100 and an SD of 3.128; 2.5% in each tail.]
Lower cut-off: X = 100 - 1.96 × 3.128 = 93.87
Upper cut-off: X = 100 + 1.96 × 3.128 = 106.13
In our example, 108.6 is greater than 106.13. Hence, we reject the null hypothesis and conclude that the students are not representative of the general population.

Rejection regions for directional and non-directional tests
[Figure: Directional (one-tailed) test: reject H0 for z below -1.645, fail to reject H0 otherwise. Non-directional (two-tailed) test: reject H0 for z below -1.96 or above +1.96, fail to reject H0 in between.]
Note that a score that would fall in the 5% rejection region for a one-sided test may not fall in the rejection region for the corresponding two-sided test.

One- vs. Two-Sided Tests
The decision to run either a one- or a two-sided test should be taken before the data are collected.
Many researchers run a two-sided test even if they have a one-sided prediction (alternative hypothesis). This allows them to potentially reject the null hypothesis if the result comes out in the opposite direction to that predicted.
It is considered "cheating" to decide on a two-sided test (when you really do not have a prediction as to the direction of the result) and then change to a one-sided test after the two-sided test has been conducted. This is tempting if the observed value is not significant on the two-sided test but would have been significant on a one-sided test. Do NOT do this.

One-Sample z Test
The one-sample z test is unusual in that:
1) We know the values of the parameters for the null model (under the null hypothesis) and do not have to estimate them.
2) As the name implies, only one sample is required to perform the test.
When the variable under study is quantitative in nature and measured on a level that at least approximates interval characteristics, hypotheses about the value of a population mean can be tested using the one-sample z test.

The Three Approaches to Hypothesis Testing
Approach 1: Test statistic and p values (the first solution we used in the previous example).
From the z table, the probability of observing a z greater than 2.75 is .003. The probability of observing a z less than -2.75 is also .003. Hence the p value is .006. Because the p value is less than the α level of .05, we reject the null hypothesis.
IMPORTANT: Always compare a z score to a critical z score, and compare a p value to a critical p value (α). You have to make the comparison on the same level. That level can be either p values or z scores, and the two comparisons will always yield the same result.
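The z comparison and the p comparison can be put side by side in a few lines. The following is a minimal sketch of a two-sided one-sample z test; the function name one_sample_z_test and its return format are hypothetical, and NormalDist from the standard library replaces the z table.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z test; returns the key quantities as a dict."""
    std_normal = NormalDist()
    sem = sigma / sqrt(n)                      # standard error of the mean
    z = (sample_mean - mu0) / sem              # test statistic
    # Two-sided p value: area in both tails beyond |z|
    p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical z leaves alpha / 2 in each tail
    z_critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "z": z,
        "p": p,
        "z_critical": z_critical,
        "reject_by_z": abs(z) > z_critical,    # compare z with critical z
        "reject_by_p": p < alpha,              # compare p with alpha
    }

# Ms. Smith's class: mean 108.6, n = 23, population mu = 100, sigma = 15
result = one_sample_z_test(108.6, 100.0, 15.0, 23)
for name, value in result.items():
    print(name, value)
```

Both decision flags agree for any input, which is the point of comparing like with like: z against critical z, p against α.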
The Three Approaches to Hypothesis Testing
Approach 2: Test statistic z value and critical z (the second solution we used in the previous example).

The Three Approaches to Hypothesis Testing
Approach 3: Confidence Limits (Confidence Intervals)
This approach is nearly identical to the critical value approach, BUT with a crucial change in strategy: while making the inference, center the interval on the sample value instead of the population value. Taking the sampling error into account, specify a range of values that you are confident the population mean falls within, based on your sample mean.
Example: 23 people were sampled from the general population and took an IQ test. (The mean of the population is 100 and the SD is 15, the same as in the previous example.) Establish the 95% confidence limits.
The equivalent z values in the standard normal distribution are -1.96 and 1.96. The lower limit will be 100 - 1.96 × 3.128 = 93.87 and the upper limit will be 100 + 1.96 × 3.128 = 106.13. Hence, if we repeatedly sampled means based on an N of 23, then 95% of the time the mean should fall in the interval between 93.87 and 106.13.

Approach 3
But wait: these are the same as the critical values! In fact, the confidence interval is calculated around the observed statistic (this is the crucial change in strategy mentioned above).
Calculated with the negative critical z (-1.96), the lower confidence limit is 108.6 - 1.96 × 3.128 = 102.47.
Calculated with the positive critical z (+1.96), the upper confidence limit is 108.6 + 1.96 × 3.128 = 114.73.
So the confidence interval for the mean IQ in Ms. Smith's class is between 102.47 and 114.73.
Does the interval contain the population mean under the null hypothesis (100)? No. So we reject the null hypothesis that Ms. Smith's class is a representative sample of the general population.
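The confidence-limit arithmetic for Ms. Smith's class can be reproduced as follows. This is a minimal sketch; the helper name confidence_interval is hypothetical, and the critical z is taken from the standard-library NormalDist rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence limits for the mean, centered on the sample mean."""
    # Critical z that leaves (1 - level) / 2 in each tail
    z_critical = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    margin = z_critical * sigma / sqrt(n)   # critical z times the SEM
    return sample_mean - margin, sample_mean + margin

mu0 = 100.0   # population mean under the null hypothesis
lower, upper = confidence_interval(108.6, 15.0, 23)
contains_mu0 = lower <= mu0 <= upper

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print("contains mu0:", contains_mu0)   # False means: reject H0
```

The margin is the same 1.96 × SEM used for the critical cut-offs; only the center of the interval changes from µ to the sample mean.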
Approach 3
CONFIDENCE INTERVALS GIVE YOU A RANGE (AS OPPOSED TO A SINGLE POINT COMPARISON)

Three different types of questions that are solved using the same probability distribution based on the z table:

1. Probability question based on a single data point (without a sample): What is the probability of randomly selecting an individual with an IQ of 94 or less from the general population? (µ = 100; σ = 15)
z = (94 - 100) / 15 = -0.40
The area under the standard normal curve from negative infinity to -0.40 is .345. Hence, the probability of observing a score of 94 or less based on random selection is .345 (34.5%).

2. Probability question based on a sample mean: What is the probability of observing a mean IQ of 94 or less based on 42 randomly selected individuals from the general population? (µ = 100; σ = 15)
z = (94 - 100) / (15 / √42) = -2.59
The area under the standard normal curve from negative infinity to -2.59 is .0048. Hence, the probability of observing a mean of 94 or less based on a random sample of 42 is .0048 (about 0.5%).

3. Inference question based on a sample (one-sample z test): The mean IQ of the 42 students from a class is 94. Is this class representative of the general population? (µ = 100; σ = 15)
Null Hypothesis (H0): The class is representative of the population (there is no difference).
Alternative Hypothesis (H1 or Ha): The class is not representative of the general population (there is a difference).
z = -2.59
The area under the standard normal curve from negative infinity to -2.59 is .0048, and the area from +2.59 to positive infinity is also .0048. Since this is a two-sided test, we take the p value as their total, which is .0096. The critical z values for a two-sided test are -1.96 and +1.96. Since .0096 is below the alpha level of .05 (and the z score of -2.59 is lower than -1.96), we reject the null hypothesis. The class is not representative of the population.
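The only thing that changes between the first two questions is the denominator of z: σ for a single score, σ / √n for a mean. A minimal sketch that contrasts all three questions, assuming NormalDist in place of the z table (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu, sigma, n, value = 100.0, 15.0, 42, 94.0

# Question 1: a single individual, so divide by the population SD
z_individual = (value - mu) / sigma
p_individual = std_normal.cdf(z_individual)

# Question 2: a mean of n scores, so divide by the standard error
z_mean = (value - mu) / (sigma / sqrt(n))
p_mean = std_normal.cdf(z_mean)

# Question 3: two-sided test, so count both tails beyond |z_mean|
p_two_sided = 2.0 * std_normal.cdf(-abs(z_mean))

print(f"individual:  z = {z_individual:.2f}, p = {p_individual:.4f}")
print(f"sample mean: z = {z_mean:.2f}, p = {p_mean:.4f}")
print(f"two-sided p = {p_two_sided:.4f}, reject H0: {p_two_sided < 0.05}")
```

Because the code does not round z to -2.59 before the lookup, the last digits can differ slightly from the table-based values on the slide.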
Exercise for exam preparation
Amongst trained typists, the average typing speed on an electric typewriter is known to be 60 words per minute (wpm) with a standard deviation of 7 wpm (i.e., µ = 60; σ = 7). The typing speeds are normally distributed. An occupational psychologist is interested in testing whether typing speed differs from this average when using a PC keyboard. A random sample of 20 trained typists was tested on a PC keyboard and their speeds were recorded. These are presented below:
57 75 64 52 55 57 59 69 50 66 60 63 67 64 74 69 67 61 60 74
(Taking the alpha level as .05):
State the alternative hypothesis in words and statistically.
State the null hypothesis in words and statistically.
State the assumptions.
Do the statistical calculations, clearly indicating the formulas you use.
What is the statistical decision? Why?

Exercise for exam preparation
State the null hypothesis in words and statistically:
In words: Typing on a keyboard does not lead to a difference in typing speed relative to typing on an electric typewriter.
Statistically: H0: µ = 60
State the alternative hypothesis in words and statistically:
In words: Typing on a keyboard leads to a difference in typing speed relative to typing on an electric typewriter.
Statistically: H1: µ ≠ 60 (non-directional, two-sided)

Exercise for exam preparation
Assumptions:
- The sample is randomly selected from the population of interest, and the sampled individuals are independent.
- The variable is measured on at least an interval level.
- The underlying distribution is normal, or the Central Limit Theorem can be assumed to hold (N is large enough).
Do the statistical calculations, clearly indicating the formulas you use:
Ȳ = ΣY / n = 1263 / 20 = 63.15
σȲ = σ / √n = 7 / √20 = 1.565
z = (Ȳ - µ) / σȲ = (63.15 - 60) / 1.565 = 2.01

Exercise for exam preparation
The area under the curve for a z score of 2.01 in a two-sided test is .0444 (.0222 + .0222). The p value of .044 < .05 (critical p). And the z value of 2.01 > 1.96 (critical z). Therefore, we reject the null hypothesis.
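The exercise can be checked end to end from the raw typing speeds. A minimal sketch, assuming NormalDist in place of the z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Typing speeds (wpm) of the 20 typists tested on a PC keyboard
speeds = [57, 75, 64, 52, 55, 57, 59, 69, 50, 66,
          60, 63, 67, 64, 74, 69, 67, 61, 60, 74]

mu0, sigma = 60.0, 7.0   # known typewriter mean and SD
alpha = 0.05

n = len(speeds)
sample_mean = sum(speeds) / n          # sample mean
sem = sigma / sqrt(n)                  # standard error of the mean
z = (sample_mean - mu0) / sem          # test statistic

# Two-sided p value: both tails beyond |z|
p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"n = {n}, mean = {sample_mean:.2f}, SEM = {sem:.3f}")
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
```

The p value differs slightly in the last digits from a table lookup at z = 2.01, because the code keeps z unrounded.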
"From the application of a one-sample z test, there is significant evidence to suggest that typing on a keyboard increases typing speed relative to an electric typewriter, z = 2.01, p < 0.05.« If the z score was below 1.96 (for instance if it was 1.95) in a two sided test, then the area under the curve would be more than.050 (According to table, it would be.0512). And we would «fail to reject» the null hypothesis, concluding that the sample is representative of this population. If the z score was again 1.95, but if it was a one sided test, then the area under the curve would be less than.050 (According to table, it would be.0256) And we would again «reject» the null hypothesis, concluding that the sample is not representative of this population. (The z-critical for a one sided test is 1.645, and 1.95 exceeds that).