"LT2206 Notes" PDF
Summary
This document provides an overview of core statistical concepts: models, empirical and theoretical distributions, types of variables (qualitative, discrete, continuous), the normal distribution, descriptive statistics (mean, median, range, standard deviation, boxplots), and inferential statistics (standard errors, confidence intervals, null hypothesis significance testing, effect sizes, error types, multiple comparisons, and t-tests).
Full Transcript
1. Models: a simplified representation of a system, but not an exact replica.
2. Models allow summarizing a complex system with a handful of descriptive features that characterize key aspects of the system, e.g. maps, restaurant menus, agendas.
3. The most useful types of models take numerical form.
4. Means and standard deviations summarize a distribution by providing key details that describe features of the distribution.
5. Distributions: the position, arrangement, or frequency of occurrence over an area or throughout a space or unit of time.
6. Empirically observed distributions:
   - Based on actual observations
   - All counts can be added up to produce a frequency distribution
   - Frequency distributions can be represented in a histogram
   - Each outcome is associated with a particular frequency value
7. Theoretical distributions:
   - Based entirely on theoretical considerations rather than data
   - Typically represented by probabilities rather than frequencies
   - Have expected probabilities based on (hypothetically) infinite observations
8. Uniform distribution:
   - Discrete version: the probability is spread uniformly across all possible outcomes
9. Variables: a characteristic which changes from person to person in a study.
10. Qualitative vs. quantitative variables:
    - Qualitative: questions are answered with a descriptive word or phrase (e.g. marital or employment status)
    - Quantitative: questions are answered by giving a number
      - Can be discrete, i.e. values can only differ by a fixed amount (e.g. family size, houses owned)
      - Can be continuous, i.e. values can differ by arbitrarily small amounts (e.g. age, annual income)
11. Discrete variables: a fixed number of possible outcomes for the observations.
12. Continuous variables: the possible outcomes for the observations are infinite within the range of possibilities.
13. Theoretical distributions:
    - Tools for modelling empirically observed data
    - Assume a generating 'process' which follows a particular distribution
    - If the properties of the distribution can be discovered or estimated, future observations of that process can be predicted
    - Different distributions are more or less useful for modelling different types of underlying 'process'
14. Normal distribution (Gaussian distribution):
    - A theoretical distribution
    - Characteristic shape: bell curve
    - Continuous data
    - The mean indicates how large or small a set of numbers is overall
    - Data are centered symmetrically around the mean; the bulk of the data are close to the mean
    - The mean is roughly the 'location' of the distribution on the x axis
    - Parameters: properties of a distribution; changing the parameters changes the characteristics of the distribution, i.e. how the distribution looks (e.g. mean and s.d.)
      - Changing the mean moves the distribution along the x axis
      - Changing the s.d. stretches or squeezes the distribution
    - The mean and median are identical
    - Probability density:
      1. The area under the curve always adds up to 1
      2. (1) plus the symmetry of the bell curve means that regions defined by multiples of the s.d. capture a characteristic proportion of the total probability density (e.g. about 68% of the data fall within 1 s.d. of the mean)
    - These features allow predictions to be drawn from estimates of the parameters; this is useful for data generated by a process that follows a normal distribution, since there is no need to observe all instances of the process (see the sketch below)
15. Mean:
    - Represents a distribution
    - Serves to simplify data (compression)
    - Used to make predictions (not very accurate if used alone)
16. Descriptive statistics: statistics used to summarize data.
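The 68% rule mentioned under the normal distribution (item 14) can be checked numerically. Below is a minimal sketch, not part of the original notes, using Python with numpy; the mean of 170, the s.d. of 10, the sample size and the random seed are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a process that follows a normal distribution
# with mean 170 and standard deviation 10 (arbitrary parameters).
x = rng.normal(loc=170, scale=10, size=100_000)

# The mean and s.d. summarize (compress) the whole distribution.
mean = x.mean()
sd = x.std(ddof=1)

# Proportion of observations within 1 s.d. of the mean:
# for a normal distribution this should be roughly 68%.
within_1sd = np.mean(np.abs(x - mean) < sd)

print(f"mean = {mean:.2f}, sd = {sd:.2f}, within 1 sd = {within_1sd:.3f}")
```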
17. Inferential statistics: statistics that allow making inferences about populations of interest.
    - Goal: to make inferences about the population parameters based on some sample estimates
    - Take one or more samples from the population and, based on them, try to estimate the parameters that would allow a description of the population
    - Help to quantify uncertainty, i.e. one can estimate how confident one can be that the parameter estimates represent the population
18. Sample: a collection of observations that is designed to be representative of a larger population.
19. Population: all possible observations of a particular phenomenon of interest (one doesn't usually have direct access to it, as it is too large).
20. Samples are taken from the population to estimate the population parameters.
21. Median: the value for which half the data in the distribution fall above and half fall below.
    1. A different way of summarizing a data distribution
    2. If there is an even number of values in the data, the median is computed as the mean of the middle two values
    3. Advantage: less sensitive to extreme values than the mean
    4. Disadvantage: the mean incorporates more information into the summary than the median
22. Range: a summary statistic based on the difference between the minimum and maximum value.
    - Typically complemented by other summary statistics
    - Disadvantage: less useful as a measure of spread in the data because it is highly susceptible to changes in extreme values
    - Advantage: useful to get an idea of the smallest and largest numbers in a dataset
23. Standard deviation: a measure used to characterize the spread in a distribution.
    - Roughly the average distance from the mean
    - Larger s.d. -> flatter histograms
    - Calculated by:
      1. Calculating the mean
      2. Subtracting the mean from the value of each observation and squaring the result
      3. Summing the squared results and dividing by the number of observations minus 1
      4. Taking the square root
24. Boxplots:
    - The 'box' = 50% of the data; 25% above and 25% below the median
    - Q1: 25th percentile; Q2: median; Q3: 75th percentile
    - Interquartile range (IQR): Q3 - Q1, i.e. the length of the box
    - Whiskers: the largest and smallest values that fall within 1.5 times the IQR from the 3rd or 1st quartile, i.e. upper whisker = the largest value no more than 1.5 x IQR above Q3; lower whisker = the smallest value no more than 1.5 x IQR below Q1
    - Outliers (extreme values): dots falling outside the whiskers
25. Null Hypothesis Significance Testing (NHST, refer to no. 17, inferential statistics):
    - The most widely used (but controversial) approach to making inferences about the population based on a sample
    - A frequentist approach: designed with long-run, large-sample frequencies in mind
    - Goal: keep the rate of wrong claims about the population at a low level -> many samples provide more certainty
    - Keeps the error rate for a statistical test relatively low and well specified, i.e. one can be relatively confident that claims about the population are correct
26. Cohen's d:
    - The magnitude of the difference between the means of 2 samples, divided by the s.d. of all data points in both samples (see the sketch below)
    - Used to measure the strength of a difference between two means
    - A standardized effect size measure: a measure becomes standardized through division by some measure of variability in the sample -> captures both the raw strength of the effect and the variability
    - Takes into account the magnitude and the variability in the data
    - Does not incorporate sample size into the inferential statistics -> sample size can be incorporated by looking at the standard error (refer to no. 30)
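To make the standard deviation steps (item 23) and Cohen's d (item 26) concrete, here is a small sketch in Python/numpy, not from the notes. The function names sd and cohens_d and the two groups of numbers are invented for illustration; the combined s.d. over all data points in both samples follows the description in item 26.

```python
import numpy as np

def sd(values):
    """Standard deviation following the steps in item 23."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()                                # 1. calculate the mean
    squared_devs = (values - mean) ** 2                 # 2. subtract the mean, square
    variance = squared_devs.sum() / (len(values) - 1)   # 3. sum and divide by n - 1
    return np.sqrt(variance)                            # 4. take the square root

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the s.d. of all data points
    in both samples (the combined s.d. described in item 26)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    combined_sd = sd(np.concatenate([a, b]))
    return (a.mean() - b.mean()) / combined_sd

# Hypothetical reaction-time-like data for two groups (invented numbers).
group_a = [510, 530, 525, 498, 540, 515]
group_b = [480, 470, 495, 460, 475, 488]

print("sd(A) =", round(sd(group_a), 2))
print("Cohen's d =", round(cohens_d(group_a, group_b), 2))
```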
27. Three elements crucial for evaluating effect sizes:
    - Magnitude of a difference: the bigger the difference between groups (samples), the more likely it is that there is a difference in the population
    - Variability in the data: the less variability there is within a sample, the more certain one can be that the difference has been estimated accurately
    - Sample size: bigger samples allow differences to be measured more accurately
28. Strong effects can be measured either way:
    - A larger difference, i.e. a very strong signal (greater certainty that the sample approximates the population)
    - Less variability (a smaller combined s.d.), i.e. very weak noise (a small s.d.)
29. Effect size (Cohen's d) conventions:
    - Small: < 0.5
    - Medium: >= 0.5 but < 0.8
    - Large: >= 0.8
30. Standard error:
    - SE = s.d. / sqrt(sample size)
    - Combines the variability in the data and the sample size
    - The precision with which a quantity is measured
    - Smaller SEs measure the corresponding parameter more precisely; larger SEs indicate more uncertainty about the estimates
    - The mean can be measured more precisely if there is a lot of data (a large sample size), little variability in the data (a small sample s.d.), or both
    - Can be used to calculate 95% confidence intervals (CIs): if one computed such a CI each time for an infinite number of experiments, 95% of the time the CI would contain the true population parameter (i.e. the mean of the population)
      - This assigns an uncertainty to the estimate; it is still possible that the actual population parameter of interest will not be inside the CI
    - One is always working with hypotheticals (probabilities) to assign certainty
    - CIs allow one to be relatively confident in one's inferential statistics working out in the long run, but not for each individual dataset (see the sketch below)
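The standard error and 95% CI from item 30 amount to a short calculation. Below is a minimal sketch (Python/numpy, not from the notes); the sample is simulated, and the 1.96 multiplier is the usual large-sample normal approximation rather than an exact t-based interval.

```python
import numpy as np

# Invented sample of 40 observations (for illustration only).
rng = np.random.default_rng(1)
sample = rng.normal(loc=100, scale=15, size=40)

mean = sample.mean()
sd = sample.std(ddof=1)
n = len(sample)

# SE = s.d. / sqrt(sample size): combines variability and sample size.
se = sd / np.sqrt(n)

# Approximate 95% confidence interval: mean +/- 1.96 * SE.
# Over many repeated experiments, ~95% of such intervals would contain
# the true population mean (here, 100 by construction).
ci_lower, ci_upper = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = [{ci_lower:.2f}, {ci_upper:.2f}]")
```

For small samples, a t-based critical value (e.g. scipy.stats.t.ppf(0.975, n - 1)) would give a slightly wider interval than the 1.96 used here.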
31. Null hypothesis (H0):
    - NHST involves using the CIs to make binary decisions based on decision criteria
    - Has been criticized for leading to "mindless statistical rituals", "dichotomous thinking" and "lazy thinking"
    - The NH corresponds to the idea that an observed difference is due to chance
    - Used as a hypothetical case that we want to rule out
    - Should be stated in such a way that, if it is rejected in significance testing, one can be reasonably sure that the observed effect is very unlikely to have occurred by chance
    - NHST for comparing the means of two groups:
      - H0: there is no difference between the 2 groups
      - HA: there is a difference between the groups
    - Its pair, the alternative hypothesis (HA): corresponds to the idea that the observed difference is real; it is what one is actually interested in showing; less specific than H0; no probabilities are assigned to it
    - The NH is an assumption about the population: H0 is evaluated based on the sample from the study, after H0 and HA have been clearly stated
    - Under the NHST framework, what is measured is the incompatibility of the data with the NH; the NH itself is an imaginary construct that can never be measured directly
    - Primary goal: evaluate the NH based on the real sample of data -> show that the data are not compatible with the NH so that it can be rejected, and thus be reasonably confident about HA
    - A test statistic is used to measure the difference between the data and what would be expected based on the null hypothesis (to measure how incompatible the data are with the NH)
    - The t-test looks at the difference between group means but divides by the standard error, so it takes into account all 3 elements (refer to no. 27); see the sketch below
      - Numerator (difference between means): unstandardized effect size
      - Denominator: variability + sample size
      - Almost identical to Cohen's d, but the denominator contains the SE [s.d. / sqrt(sample size)] rather than just the s.d.
      - The t-test is sensitive to sample size while Cohen's d isn't
      - Produces t-values: actual estimates from real data; they provide a measure that can be used to argue the data are incompatible with the NH
    - T-distribution: gives the probability of particular t-values if the NH (equal means) were true
      - Bell-shaped, with heavier tails than the normal distribution
      - Very large or very small values are very improbable under the NH, i.e. extreme t-values are very rare in the case of random sampling
      - T-values closer to zero are more probable (small mean differences), i.e. if there are no group differences, random sampling is likely to give a small or zero t-value
    - Conditional probability: the probability of the measured t-value conditional on the NH, i.e. the probability of one event given (hypothetical) knowledge about another event
    - A p-value is the area under the curve for values more extreme than the measured t-value, e.g. 1.5 and above / -1.5 and below
      - Doesn't represent the probability of the NH being true (the NH is an assumption, i.e. whether or not it is true can't be known)
      - Doesn't represent the strength of an effect (refer to the test statistic and the 3 elements), i.e. one can never directly read off the effect size from such a test statistic; whether the effect is large or not will always be ambiguous (one can't determine which factor is causing the low p-value)
      - Always mention some effect size measure along with the corresponding significance testing output; the combination of the two allows one to better assess how important a particular result is
      - P-values tell about the incompatibility of the data with the NH, but do not allow direct conclusions about the HA
    - Two-tailed test: ignores the sign of the test statistic, i.e. makes no assumptions about the direction of the effect
    - The alpha level: the threshold below which p-values count as good-enough evidence against the NH; widely set at 0.05; if p < 0.05, the result is treated as statistically significant and the NH is rejected
    - If a relatively large t-value is obtained, a real group difference (beyond the sample) can be inferred
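The t-value, the t-distribution and the two-tailed p-value from item 31 can be sketched as follows (Python with numpy and scipy, not from the notes). This uses the classic Student's independent-samples formula with a pooled s.d.; the two groups of numbers are invented, and scipy.stats.ttest_ind is called only as a cross-check.

```python
import numpy as np
from scipy import stats

# Invented data for two independent groups.
a = np.array([510., 530., 525., 498., 540., 515.])
b = np.array([480., 470., 495., 460., 475., 488.])

na, nb = len(a), len(b)

# Pooled standard deviation (Student's t-test assumes equal variances).
sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))

# Numerator: unstandardized effect size (difference between means).
# Denominator: a standard error, so sample size enters the statistic.
se_diff = sp * np.sqrt(1 / na + 1 / nb)
t_value = (a.mean() - b.mean()) / se_diff
df = na + nb - 2

# Two-tailed p-value: area under the t-distribution more extreme than
# the observed t-value, in both tails (the sign of t is ignored).
p_value = 2 * stats.t.sf(abs(t_value), df)

print(f"t = {t_value:.3f}, df = {df}, p = {p_value:.4f}")

# Cross-check against scipy's implementation of the same test.
print(stats.ttest_ind(a, b, equal_var=True))
```

Passing equal_var=False to ttest_ind would instead give Welch's version mentioned in item 40, which drops the equal-variance assumption.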
Summary of the NHST procedure:
    1. Define the population(s) of interest
    2. Take a sample from them
    3. For comparing groups, state the NH that the means of the 2 groups are equal
    4. Compute a t-value based on the samples
    5. Check how improbable this t-value is under the NH
    6. If p < 0.05 (the alpha level), reject the NH
32. Type I error: obtaining a statistically significant result even though the NH is true.
    - A false positive; its rate is set by the alpha level
    - Setting alpha = 0.05 means accepting that a type I error may be expected 5% of the time in the long run
    - Gives certainty about the significance testing procedure working out in the long run
    - Gives control over how often one is willing to commit a type I error
33. Type II error: failing to obtain a statistically significant effect even though the NH is false.
    - A missed significant result
    - A false negative
    - Represented by beta (β): the probability of missing a real effect in the population based on the statistics carried out on the sample
    - The complement of β is statistical power (1 - β, or π): describes the ability of a testing procedure to detect a true effect
    - Affected by 3 ingredients:
      1. Effect magnitude
      2. Variability
      3. Sample size
    - Power can be increased by:
      1. Increasing the magnitude of the effect, i.e. making the experimental manipulations more extreme
      2. Decreasing the variability in the sample, i.e. making the sample more homogeneous
      3. Increasing the sample size, i.e. collecting more data (usually chosen because it is easiest to control)
    - A power of 80% means an 80% chance of obtaining a statistically significant result in the presence of a real effect
34. Type M error: an error in estimating the magnitude of an effect.
    - E.g. the sample suggests a much larger effect than is characteristic of the population
35. Type S error: failure to capture the correct sign of an effect.
    - The direction of the effect based on the sample is opposite to the actual effect present in the population
    - Increasing statistical power reduces the risk of type II, type M and type S errors, and makes interpreting null results more acceptable
    - Small sample sizes should thus be avoided whenever possible, as they increase the risk of type II, type M and type S errors, and make null results difficult to interpret: one cannot easily claim an effect is absent just because a statistically significant result was not found (absence of evidence is not evidence of absence)
36. Multiple testing (multiple comparisons problem): if the same statistical significance test is run multiple times, the type I error rates begin to add up.
    - Family-wise error rate (FWER): the probability of obtaining at least 1 type I error for a given number of tests
    - FWER = 1 - (1 - 0.05)^k, where k is the number of statistical tests conducted at the specified alpha level (0.05)
    - (1 - 0.05)^k gives the probability of not committing a type I error across k tests; it is 0.95 when k = 1, i.e. if the NH is true there is a very high chance of not obtaining a statistically significant result
    - As k increases, the chance of a false positive increases
    - If performing multiple tests is unavoidable, a multiple comparison correction can be performed to make the significance test more conservative depending on the number of tests conducted
37. Bonferroni correction: divide the alpha level by the number of statistical tests performed (see the sketch below).
    - E.g. after performing 4 tests, alpha = 0.05 / 4 = 0.0125, and a p-value would need to fall below this to reject the NH
    - Also possible to adjust the p-values rather than the alpha level (equivalent outcome)
    - E.g. multiply each p-value by the number of tests and compare to the original alpha level of 0.05
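The FWER formula from item 36 and the Bonferroni correction from item 37 are a small calculation; here is a sketch in Python (not from the notes), with alpha = 0.05 and invented p-values for illustration.

```python
# Family-wise error rate for k (independent) tests at alpha = 0.05:
# FWER = 1 - (1 - alpha) ** k
alpha = 0.05

for k in (1, 4, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    bonferroni_alpha = alpha / k   # Bonferroni-corrected threshold
    print(f"k = {k:2d}: FWER = {fwer:.3f}, corrected alpha = {bonferroni_alpha:.4f}")

# Equivalent form of the correction: multiply each p-value by k and
# compare it to the original alpha level of 0.05.
p_values = [0.03, 0.012, 0.20, 0.004]          # invented example p-values
adjusted = [min(p * len(p_values), 1.0) for p in p_values]
print("adjusted p-values:", adjusted)
```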
38. Why not decide on the sample size by running a statistical test each time a new data point is added to the dataset?
    - More data is generally better, but it is problematic to base the choice of sample size on having obtained a statistically significant result
    - If the NH is true, p-values are uniformly distributed between 0 and 1, i.e. any value in this range is equally probable for each test; repeatedly testing as data accumulate will therefore sooner or later yield p < 0.05 by chance alone, inflating the type I error rate
39. Stopping rule: determines when data collection is completed, i.e. the sample size should be decided in advance.
    - Justifying the sample size is becoming the norm; the justification can be based on a formal power calculation or on previous similar research
40. T-test varieties:
    - Independent samples t-test:
      - Student's t-test assumes equal variance in the two groups
      - Welch's t-test does not require this assumption and can deal with fractional degrees of freedom
    - Paired samples t-test (dependent samples t-test):
      - Used when observations are linked, e.g. the same participant is exposed to two conditions in an experiment
    - One-sample t-test (single-parameter / single-sample t-test):
      - Compares the mean of one sample to a known standard (or theoretical/hypothetical) mean, i.e. just one set of numbers is tested against some pre-established number (usually taken from what is known based on previous research)
41. T-test assumptions:
    - Normality: the data should be roughly normally distributed
      - For repeated measures, the distribution of differences between the observations in the pairs should be normally distributed
      - Important for smaller samples, as the accuracy of the parameter estimates depends more on normality
      - Violating normality can increase the type I error rate
    - Homoscedasticity (homogeneity of variances):
      - The variance (measured by the s.d.) should be roughly equivalent for the groups being compared
      - Violations of homoscedasticity can lead to increased type I error rates
    - Independence:
      - A dependence is any form of connection between data points
      - E.g. multiple data points from the same participant tend to be more similar than multiple data points from different participants: if A and B perform differently on some measure, then all data points from A will be more similar to each other than to those from B (acting as a group)
      - For an independent (unpaired samples) t-test, every data point should come from a different participant
      - For a dependent (paired samples, refer to no. 40) t-test, every participant can contribute at most 1 pair of data points
      - Violations of independence can lead to inflation of the type I error rate
      - Many linguistic datasets have such dependencies, e.g. psycholinguistics, phonetics, sociolinguistics, typological studies, corpus studies
42. Z-score (standardizing):
    - Linear transformation: a transformation that does not affect the relationship between the numbers it is applied to
      - E.g. addition, subtraction, division, multiplication (add 1 to a distribution of 2, 4, 6 and it becomes 3, 5, 7)
    - Centering: a linear transformation that expresses each data point in terms of how far away it is from the mean
      - Subtract the mean from each data point, expressing it as a mean deviation score
    - Standardizing: a linear transformation that expresses each value in a distribution in terms of how many standard deviations it is away from the mean
      - From a z-score one can read off directly how far a data point is from the mean in standard (deviation) units (z)
      - Gets rid of a variable's metric, which may help to compare different variables (see the sketch below)
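Centering and standardizing from item 42 are one-line transformations; below is a minimal sketch in Python/numpy (not from the notes), using an invented set of numbers.

```python
import numpy as np

# Invented distribution (e.g. a small set of scores).
x = np.array([2., 4., 6., 8., 10.])

# Centering: subtract the mean, giving mean deviation scores.
centered = x - x.mean()

# Standardizing: divide the centered values by the standard deviation,
# giving z-scores (distance from the mean in s.d. units).
z = centered / x.std(ddof=1)

print("centered:", centered)   # the centered values have mean 0
print("z-scores:", z)          # unit-free, so different variables can be compared
```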