Questions and Answers
What is the primary comparison made in the logic of Null Hypothesis Significance Testing (NHST)?
- Comparing Type I and Type II error rates.
- Comparing sample means to population parameters.
- Comparing alpha levels between different studies.
- Comparing an observed result to the distribution of values we would see if no intervention had occurred. (correct)
In the context of hypothesis testing, what does the null hypothesis typically represent?
- A hypothesis of 'no effect' or 'no difference'. (correct)
- A hypothesis based on previous research findings.
- The hypothesis the researcher is trying to prove.
- A hypothesis that is always rejected.
Why is it important for the sample space to be complete in the logic of NHST?
- To simplify the calculations of the test statistic.
- To avoid Type I errors.
- To accurately compute probabilities. (correct)
- To ensure that the sample size is large enough.
What is the purpose of randomization in hypothesis testing?
In the context of statistical hypothesis testing, what does 'failing to reject the null hypothesis' mean?
If a researcher chooses a smaller significance level (e.g., α = .01) compared to the traditional 5% level, what is the likely consequence?
What is a key difference between directional and non-directional hypothesis tests?
In the context of the 'Lady Tasting Tea' example, what hypothesis was Fisher testing?
What is the role of descriptive statistics in decision-making?
In the context of research, what is a confounding variable?
What is the purpose of 'blinding' participants in a study?
What is the key limitation of observational studies in determining causation?
In the context of hypothesis testing, what does statistical power refer to?
How did Karl Pearson contribute to the history of p-values?
In hypothesis testing, what does a Type I error represent?
Flashcards
NHST (Null Hypothesis Significance Testing)
A framework for making statistical decisions based on experimental results, often used in psychology.
Logic of NHST
Compares an observed result to the distribution of values expected without an intervention, to determine whether the intervention had an effect.
P-value
The probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct.
Explanatory/Independent Variable
Response/Dependent Variable
Confounding Variable
Placebo Effect
Observational Study
Null Hypothesis (H0)
Research Hypothesis (H1)
Nondirectional Tests
Directional Tests
Random Assignment
Random Sampling
Statistical Power
Study Notes
- Null Hypothesis Significance Testing (NHST) is a traditional statistical method for understanding and making decisions about experimental results
Decision-Making in Statistics
- Descriptive statistics, distributions, sampling, probability, and estimators are key components
Logic of NHST
- It hinges on the probability of obtaining an outcome by chance
- An experimental condition is believed to be the cause of an outcome if that outcome is unlikely to have occurred by chance
- An observed result will be compared to the distribution of values that would occur without intervention
History of P-Value Computation
- “P-value” calculations trace back to the 1700s with John Arbuthnot and Pierre-Simon Laplace
- Arbuthnot studied London birth records from 1629 to 1710, expecting the ratio of male to female births to be 50:50
- He found that more males than females were born in London, and suggested divine providence as the cause because the pattern was too unlikely to be due to chance
Karl Pearson
- Formally introduced the P-value in his chi-squared test, published in 1900
- The chi-squared test (χ²) is used for sets of categorical data, evaluating the likelihood of observed differences between sets arising by chance
P-Value Significance
- Reporting p-values is a tool to communicate certainty in science but is also highly misused
- Converging and independent evidence is a better method
Ronald Fisher
- Popularized the p-value and suggested a 1/20 chance (p = .05) of exceeding the critical value as a reasonable criterion
- Instead of calculating p-values for different values of χ² and N, he computed χ² values for specific p-values (for different Ns)
Lady Tasting Tea Example
- This is Fisher's archetypal example of p-values
- Dr. Muriel Bristol claimed she could tell whether the milk or the tea had been poured into the cup first
- Dr. Bristol tasted 8 cups in a randomized order and stated correctly how each was prepared
- The probability of a perfect classification by chance alone is 1/70 (p ≈ .014)
- The null hypothesis, that she had no special tasting ability, was rejected
- Fisher emphasized interpreting p as the proportion of values at least as extreme as the observed value, assuming chance alone
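The 1/70 figure can be checked by counting. The sketch below assumes the design usually described for this experiment, with 4 of the 8 cups prepared milk-first (an assumption consistent with 1/70, not stated in these notes). Under chance alone, every way of picking which 4 cups are milk-first is equally likely, and only one of them is fully correct.

```python
from math import comb

# Assumption: 8 cups, 4 of them prepared milk-first.
# Number of distinct ways to pick which 4 cups are "milk first".
n_arrangements = comb(8, 4)

# Only one arrangement classifies every cup correctly, so under the
# null hypothesis (pure guessing) a perfect result has this probability.
p_perfect = 1 / n_arrangements

print(n_arrangements, round(p_perfect, 3))  # 70 0.014
```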
Competing Claims in Hypothesis Testing
- Hypothesis of interest or alternative hypothesis (H1) is the hypothesis that "There is something going on!"
- For example, "A glass of red wine every day is good for cardiovascular health"
- Complementary prediction or null hypothesis (H0) is the hypothesis that "There is nothing going on!"
- For example, "A glass of red wine every day has either no effect or a negative effect on cardiovascular health"
- Data is collected and a determination is made to either reject or retain the null hypothesis
Hypothesis
- Example hypothesis: In Canada, being female causes an employee to earn less than being male.
- Generality or scope asks: what is the theoretical population to which the hypothesis applies?
- Causal mechanism asks: what is the chain of events that explains the observation?
- Correlation is not causation
Directional vs. Nondirectional Tests
- Nondirectional tests do not specify a direction of effect
- The treatment group differs from the control group. Research Hypothesis: μ_T ≠ μ_C, Null Hypothesis: μ_T = μ_C (T = treatment, C = control)
- Directional tests do specify a predicted direction for the effect
- The treatment group will perform better than the control group. Research Hypothesis: μ_T > μ_C, Null Hypothesis: μ_T ≤ μ_C
- The treatment group will perform worse than the control group. Research Hypothesis: μ_T < μ_C, Null Hypothesis: μ_T ≥ μ_C
- Sets of predicted events must be complementary!
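To make the difference concrete, the sketch below computes directional and nondirectional p-values from the same test statistic. It assumes a z statistic that follows a standard normal distribution under the null hypothesis; the function name `p_values` and the value z = 1.8 are illustrative only.

```python
from statistics import NormalDist

def p_values(z):
    """Return p-values for one z statistic under three research hypotheses.

    Assumes z is standard normal when the null hypothesis is true.
    """
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)       # directional: treatment > control
    lower = std_normal.cdf(z)           # directional: treatment < control
    two_sided = 2 * min(upper, lower)   # nondirectional: treatment != control
    return {"greater": upper, "less": lower, "two_sided": two_sided}

result = p_values(1.8)
for name, p in result.items():
    print(f"{name}: {p:.4f}")
```

With this illustrative statistic, the "greater" test falls below .05 while the nondirectional test does not, because the nondirectional test splits its rejection region across both tails.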
Explanatory and Response Variables
- Explanatory variable (Independent variable or Predictor variable) is the variable you suspect is causing the effect
- Response variable (Dependent variable or Criterion variable) is the variable you think is being influenced
- These labels do not guarantee causal association, even if a relationship is identified
Definitions Related to Experimental Design
- A confounding variable covaries with the independent variable and presents an alternate explanation
- A placebo is a "treatment" missing the active ingredient (e.g., sugar pill), used in control groups
- A placebo effect is when participants improve because they believe they are getting the active treatment
- Blinding is when participants are unaware if they are in control or treatment groups
- Double-blinding is when researchers and participants are unaware of who is in the treatment versus control groups
Testing Relationships
- Observational studies lack researcher interference or manipulation; the researcher collects data as a passive observer
- Observational studies can establish the presence of a relationship but cannot establish causation
How To Determine Causation
- Use experimental designs in which participants are randomly assigned to an experimental group or a control group
Random Assignment vs. Random Sampling
- Random sampling with random assignment: causal conclusion, generalized to the whole population
- Random sampling without random assignment: no causal conclusion; a correlation statement generalized to the whole population
- Random assignment without random sampling: causal conclusion for the sample only
Gender Discrimination Experiment
- 48 randomly selected male bank supervisors each received a personnel file and judged whether the person should be promoted to a “routine” branch manager job
- The files were identical except that half carried a female first name and half a male first name; files were randomly assigned to supervisors
- 35 of the 48 files were recommended for promotion. The question posed was: were the females unfairly discriminated against?
- Independent variable - female vs male name
- Dependent variable - promotion decisions
- Promotion decisions favored males over females by 29.2 percentage points
Claims Regarding the Gender Discrimination Experiment
- Null Hypothesis ("nothing is going on"): promotion and gender are independent; there is no gender discrimination; the observed proportions are due to chance; complementary to the Research Hypothesis
- Research Hypothesis ("something is going on"): promotion and gender are dependent; there is gender discrimination; the observed proportions are NOT due to chance
Metaphor - Trial as Hypothesis Test
- Hypothesis testing is like a trial in a court of law
- H0: defendant is innocent, H1: defendant is guilty
- Evidence is collected and presented, and must then be judged: "could the data plausibly have happened by chance if the null hypothesis were true?"
- If the data are very unlikely to have occurred by chance, the evidence is "beyond reasonable doubt"
- The decision depends on the threshold: how unlikely is unlikely enough?
- If the evidence is not strong enough to reject the presumption of innocence, the jury returns a verdict of "not guilty", which is not a declaration that the defendant is innocent
- In statistics, we 'fail to reject' the null hypothesis: it is not declared to be true (because we do not know), so we do not 'accept' the null hypothesis
Recap
- Hypothesis testing starts by proposing a null hypothesis (H0) that represents the 'status quo'
- The null hypothesis is tested against a research hypothesis. Testing is conducted under the assumption that the null hypothesis is true, using randomization or theoretical methods
- If the data do not provide convincing evidence for the research hypothesis, we retain the null hypothesis; otherwise, the null hypothesis is rejected in favour of the research hypothesis
Complementarity
- The research hypothesis suggests an outcome, such as "One glass of red wine a day improves cardiovascular health"
- What null hypothesis would be the complementary to this?
- Red wine does nothing for cardiovascular health
- Red wine makes cardiovascular health worse
- Both of these outcomes together
- For the logic of NHST to work, the sample space must be complete; without a complete sample space, we cannot accurately compute the probabilities
Randomization Methods
- Bootstrap confidence intervals can be calculated from resampled data
- A specific hypothesis can be tested with a randomization test, permutation test, Monte Carlo simulation, or bootstrap
- Randomization test: redo the random assignment performed in the experiment K times
- Permutation test: check all possible ways of random assignment
- Monte Carlo simulation: generate new data under a model for the null hypothesis
- Bootstrap: use the observed sample as a model for the effect in the population and draw K new samples of size N
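As an illustration of the bootstrap, the sketch below builds a percentile confidence interval for a difference in proportions. The data are an assumption: 21 of 24 male files and 14 of 24 female files promoted, which is the split implied by the gender discrimination example in these notes (35 of 48 promoted, 29.2-point difference) if the two groups are equal in size. The function name `bootstrap_diff_ci` and its parameters are hypothetical.

```python
import random

def bootstrap_diff_ci(group_a, group_b, k=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(k):
        # Resample each group with replacement, keeping its original size N.
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * k)
    hi_idx = int((1 + level) / 2 * k) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Assumed data: 1 = promoted, 0 = not promoted.
male = [1] * 21 + [0] * 3
female = [1] * 14 + [0] * 10

ci_low, ci_high = bootstrap_diff_ci(male, female)
print(f"95% bootstrap CI for the difference: ({ci_low:.3f}, {ci_high:.3f})")
```

The percentile method used here is the simplest bootstrap interval; other variants exist and may behave better with small samples.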
Randomization Test - Gender Discrimination Example
- Under H0, the gender on the CVs has no influence on the promotion decision.
- Assuming H0 is correct, a distribution of likely results can be found by simply redoing the random assignment and then recomputing the test statistic
- Under H0, the independent variable has no influence on the dependent variable, which makes the random assignments exchangeable!
- After each shuffle, the test statistic is recomputed; the p-value is P(statistic ≥ observed value | H0), the proportion of shuffled results at least as extreme as the observed one
- A p-value of 2/100 (p = .02) indicates that results at least as extreme as those observed would occur by chance with a probability of 2%, even if there is no discrimination against women
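The procedure above can be sketched as follows. The counts are an assumption inferred from the notes: with 24 male and 24 female files, 35 promotions in total and a 29.2-point difference imply 21 male and 14 female promotions. The function names are hypothetical. The second function enumerates the permutation distribution by counting, as a check on the simulation.

```python
import random
from math import comb

def randomization_test(promoted_male=21, promoted_female=14,
                       n_per_group=24, k=10_000, seed=0):
    """Shuffle promotion outcomes across files k times under H0."""
    rng = random.Random(seed)
    n_promoted = promoted_male + promoted_female
    outcomes = [1] * n_promoted + [0] * (2 * n_per_group - n_promoted)
    extreme = 0
    for _ in range(k):
        rng.shuffle(outcomes)
        # The first n_per_group shuffled outcomes play the role of "male" files.
        shuffled_male = sum(outcomes[:n_per_group])
        # More male promotions means a larger male-minus-female difference.
        if shuffled_male >= promoted_male:
            extreme += 1
    observed_diff = (promoted_male - promoted_female) / n_per_group
    return observed_diff, extreme / k

def exact_permutation_p(promoted_male=21, promoted_female=14, n_per_group=24):
    """Count all equally likely assignments at least as extreme as observed."""
    n_promoted = promoted_male + promoted_female
    n_not = 2 * n_per_group - n_promoted
    total = comb(2 * n_per_group, n_per_group)
    upper = min(n_per_group, n_promoted)
    favorable = sum(comb(n_promoted, m) * comb(n_not, n_per_group - m)
                    for m in range(promoted_male, upper + 1))
    return favorable / total

observed_diff, p_sim = randomization_test()
p_exact = exact_permutation_p()
print(f"observed difference: {observed_diff:.3f}")
print(f"simulated p: {p_sim:.4f}, exact p: {p_exact:.4f}")
```

The simulated value varies with the seed and the number of shuffles; a run with only 100 shuffles, as in the 2/100 figure, is coarser than the 10,000 used here.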
Hypothesis Testing Conclusions
- Like a trial, hypothesis testing ends in a judgment: reject the null hypothesis ("guilty") or fail to reject it ("not guilty")
- Whether the evidence is "beyond reasonable doubt" is weighed against the threshold for evidence in statistics
- 5% (p = .05) is considered sufficient evidence to reject the null hypothesis in psychology; this is arbitrary!
Decision Errors in Hypothesis Testing
- Trials are not flawless: innocents can be wrongly convicted and the guilty can be set free
- Similar errors occur in statistical hypothesis tests; the difference is that we have the tools to manage them
- Two competing hypotheses are made: H0 and H1
- Type I Error is rejecting the null hypothesis when H0 is true; its probability is P(outcome > threshold | H0)
- Type II Error is failing to reject the null hypothesis when H1 is true; the p-value says nothing about the likelihood of this error
- The threshold can be set according to how strong the evidence needs to be
Effect Sizes and Errors in Hypothesis Testing
- Effect size is the strength of the effect
- As the strength of an effect increases, the likelihood of making a Type II Error goes down (the Type I Error rate is fixed by the chosen threshold)
- As relationships between variables are strengthened, the statistical tests become more likely to reach the critical threshold
- Strong effect => less likelihood of a false negative
Statistical Power, Decisions and Significance
- Statistical “power” is the likelihood of finding an effect if one exists.
- The power of a study/experiment is its probability of rejecting H0 if H0 is false
- As power increases, the likelihood of making a Type II Error declines; lower power gives a greater likelihood of a false negative
- Studies are designed to balance the probability of a Type I error against the probability of a Type II error
- Missing a real effect has costs: for example, not approving an effective treatment, or claiming that the status quo is acceptable when in fact discrimination is occurring
- A significance level of 5% is a traditional and frequently used threshold
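The relationship between significance level, effect size, sample size, and power can be sketched with a closed-form calculation. This assumes a one-sided z-test with known standard deviation and a standardized effect size (true mean difference divided by the standard deviation); the function name `power_one_sided_z` and the example values are illustrative, not taken from the notes.

```python
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Probability of rejecting H0 in a one-sided z-test.

    effect_size: standardized true effect (0 means H0 is true).
    n: sample size; alpha: significance level (Type I error rate).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical threshold under H0
    # Under the true effect, the z statistic is shifted by effect_size * sqrt(n).
    return 1 - std_normal.cdf(z_crit - effect_size * n ** 0.5)

# When H0 is true, the rejection rate equals alpha (the Type I error rate).
print(f"no effect:       {power_one_sided_z(0.0, 25):.3f}")
# Larger effects are detected more often, so Type II errors become rarer.
print(f"d = 0.2, n = 25: {power_one_sided_z(0.2, 25):.3f}")
print(f"d = 0.5, n = 25: {power_one_sided_z(0.5, 25):.3f}")
# A stricter threshold lowers power for the same effect and sample size.
print(f"d = 0.5, n = 25, alpha = .01: {power_one_sided_z(0.5, 25, alpha=0.01):.3f}")
```

The Type II error rate in each case is one minus the printed power.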
Description
Explore Null Hypothesis Significance Testing (NHST), a statistical method for understanding experimental results. Learn about decision-making in statistics, including descriptive statistics, distributions, and probability. Also, trace the history of P-value calculations back to the 1700s with John Arbuthnot and Pierre-Simon Laplace.