Lecture 5

Questions and Answers

What is the primary comparison made in the logic of Null Hypothesis Significance Testing (NHST)?

  • Comparing Type I and Type II error rates.
  • Comparing sample means to population parameters.
  • Comparing alpha levels between different studies.
  • Comparing an observed result to the distribution of values we would see if no intervention had occurred. (correct)

In the context of hypothesis testing, what does the null hypothesis typically represent?

  • A hypothesis of 'no effect' or 'no difference'. (correct)
  • A hypothesis based on previous research findings.
  • The hypothesis the researcher is trying to prove.
  • A hypothesis that is always rejected.

Why is it important for the sample space to be complete in the logic of NHST?

  • To simplify the calculations of the test statistic.
  • To avoid Type I errors.
  • To accurately compute probabilities. (correct)
  • To ensure that the sample size is large enough.

What is the purpose of randomization in hypothesis testing?

  • To balance confounding variables across conditions, so that differences between groups can be attributed to the independent variable. (correct)

In the context of statistical hypothesis testing, what does 'failing to reject the null hypothesis' mean?

  • The evidence is not strong enough to reject the null hypothesis; the null hypothesis is not thereby shown to be true. (correct)

If a researcher chooses a smaller significance level (e.g., p = .01) compared to the traditional 5% level, what is the likely consequence?

  • Decreased chance of Type I error and, other things being equal, increased chance of Type II error. (correct)

What is a key difference between directional and non-directional hypothesis tests?

  • Directional tests specify the direction of the effect, while non-directional tests do not. (correct)

In the context of the 'Lady Tasting Tea' example, what hypothesis was Fisher testing?

  • Whether the taster could tell if the milk or the tea had been poured first; the null hypothesis was that she had no special tasting ability. (correct)

What is the role of descriptive statistics in decision-making?

  • To summarize and describe the data on which statistical decisions are based. (correct)

In the context of research, what is a confounding variable?

  • A variable that covaries with the independent variable and offers an alternate explanation for the effect. (correct)

What is the purpose of 'blinding' participants in a study?

  • To ensure that participants are unaware of whether they are in the treatment or the control group. (correct)

What is the key limitation of observational studies in determining causation?

  • They can establish that a relationship exists but cannot establish causation. (correct)

In the context of hypothesis testing, what does statistical power refer to?

  • The probability of rejecting the null hypothesis when it is false, i.e. of finding an effect if one exists. (correct)

How did Karl Pearson contribute to the history of p-values?

  • He formally introduced the p-value in his chi-squared test. (correct)

In hypothesis testing, what does a Type I error represent?

  • Rejecting the null hypothesis when it is actually true (a false positive). (correct)

Flashcards

NHST (Null Hypothesis Significance Testing)

A framework for making statistical decisions based on experimental results, often used in psychology.

Logic of NHST

Compares an observed result to the distribution of values expected without an intervention, to determine if the intervention had an effect.

P-value

The probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct.

Explanatory/Independent Variable

A variable that you suspect is causing an effect in an experiment.

Response/Dependent Variable

A variable you think is being influenced by the explanatory variable.

Confounding Variable

A variable that covaries with the independent variable, providing an alternate explanation for the experimental effect.

Placebo Effect

Participants show improvement because they believe they are getting the active treatment.

Observational Study

Researchers collect data without interfering or manipulating the environment.

Null Hypothesis (H0)

A statement of no effect or no difference.

Research Hypothesis (H1)

The hypothesis of interest, suggesting an effect or a difference.

Nondirectional Tests

Tests that do not specify a direction of effect; simply stating there is a difference.

Directional Tests

Tests that specify a predicted direction for the effect.

Random Assignment

Randomly assigning participants to different conditions in an experiment.

Random Sampling

Randomly selecting participants from a population.

Statistical Power

The likelihood of finding an effect if one exists.

Study Notes

  • Null Hypothesis Significance Testing (NHST) is a traditional statistical method for understanding and making decisions about experimental results

Decision-Making in Statistics

  • Descriptive statistics, distributions, sampling, probability, and estimators are key components

Logic of NHST

  • It hinges on the probability of obtaining an outcome by chance
  • An experimental condition is believed to be the cause if the observed result is unlikely to have occurred by chance
  • An observed result will be compared to the distribution of values that would occur without intervention

History of P-Value Computation

  • “P-value” calculations trace back to the 1700s with John Arbuthnot and Pierre-Simon Laplace
  • Arbuthnot studied London birth records from 1629 to 1710, expecting the ratio of male to female births to be 50:50
  • He found that more males than females were born in London, and suggested divine providence as the cause because the excess could not plausibly be due to chance

Karl Pearson

  • Formally introduced the P-value in his chi-squared test in 1914
  • The chi-squared test (χ²) is used for sets of categorical data, evaluating the likelihood of observed differences between sets arising by chance

P-Value Significance

  • Reporting p-values is a tool for communicating certainty in science, but it is also widely misused
  • Converging, independent evidence is a better basis for certainty than a single p-value

Ronald Fisher

  • Popularized the p-value and suggested a 1-in-20 chance (p = .05) of exceeding the critical value as a reasonable criterion
  • Instead of calculating p-values for different values of χ² and N, he computed χ² values for specific p-values (for different Ns)

Lady Tasting Tea Example

  • This is Fisher's archetypal example of p-values
  • Dr. Muriel Bristol claimed she could tell whether the milk or the tea had been poured into the cup first
  • Dr. Bristol tasted 8 cups in a randomized order and stated correctly how each was prepared
  • The probability of a perfect classification by chance alone is 1/70 (p = .014)
  • The null hypothesis, that she had no special tasting ability, was rejected
  • Fisher emphasized interpreting p as the proportion of values at least as extreme as the observed value, assuming chance alone
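
The 1/70 figure can be reproduced by counting. The sketch below assumes the usual design, which the notes do not spell out: 4 of the 8 cups were prepared milk-first and the taster has to pick out those 4. The function name `prob_correct` is illustrative only.

```python
from math import comb

# Assumed design: 8 cups in total, 4 of them prepared milk-first.
CUPS, MILK_FIRST = 8, 4

def prob_correct(k: int) -> float:
    """Chance of picking exactly k of the 4 milk-first cups by pure guessing."""
    ways_right = comb(MILK_FIRST, k)                      # correct picks
    ways_wrong = comb(CUPS - MILK_FIRST, MILK_FIRST - k)  # incorrect picks
    return ways_right * ways_wrong / comb(CUPS, MILK_FIRST)

# Perfect classification: only 1 of the 70 possible selections.
p_perfect = prob_correct(4)

# "At least as extreme" as 3 correct means 3 or 4 correct.
p_three_or_more = prob_correct(3) + prob_correct(4)

print(f"P(4 correct) = {p_perfect:.3f}")
print(f"P(3 or more correct) = {p_three_or_more:.3f}")
```

Under this design, only a perfect score falls below the 5% criterion; three correct out of four would not.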

Competing Claims in Hypothesis Testing

  • The hypothesis of interest, or alternate hypothesis (H1), is the hypothesis that "There is something going on!"
  • For example, "A glass of red wine every day is good for cardiovascular health"
  • The complementary prediction, or null hypothesis (H0), is the hypothesis that "There is nothing going on!"
  • For example, "A glass of red wine every day has either no effect or a negative effect on cardiovascular health"
  • Data are collected and a determination is made to either reject or retain the null hypothesis

Hypothesis

  • Example hypothesis: "In Canada, being female causes an employee to earn less than being male"
  • Generality or scope: what is the theoretical population to which the hypothesis applies?
  • Causal mechanism: what is the chain of events that explains the observation?
  • Correlation is not causation

Directional vs. Nondirectional Tests

  • Nondirectional tests do not specify a direction of effect
  • The treatment group differs from the control group. Research Hypothesis: μ_T ≠ μ_C, Null Hypothesis: μ_T = μ_C (T = treatment, C = control)
  • Directional tests do specify a predicted direction for the effect
  • The treatment group will perform better than the control group. Research Hypothesis: μ_T > μ_C, Null Hypothesis: μ_T ≤ μ_C
  • The treatment group will perform worse than the control group. Research Hypothesis: μ_T < μ_C, Null Hypothesis: μ_T ≥ μ_C
  • Sets of predicted events must be complementary!
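
The choice between a directional and a nondirectional test changes which part of the null distribution counts as "at least as extreme". A minimal sketch, using a made-up null distribution of treatment-minus-control differences (the values and the function name `p_values` are hypothetical) and assuming that distribution is symmetric around zero:

```python
def p_values(null_stats, observed):
    """Return (upper, lower, two_sided) p-values from a list of null statistics."""
    n = len(null_stats)
    # Directional: treatment predicted to be BETTER than control.
    upper = sum(s >= observed for s in null_stats) / n
    # Directional: treatment predicted to be WORSE than control.
    lower = sum(s <= observed for s in null_stats) / n
    # Nondirectional: a difference in either direction counts as extreme.
    two_sided = sum(abs(s) >= abs(observed) for s in null_stats) / n
    return upper, lower, two_sided

# Hypothetical null distribution: differences from -10 to 10, equally likely.
null_stats = list(range(-10, 11))
upper, lower, two_sided = p_values(null_stats, observed=8)

print(f"upper-tail p = {upper:.3f}")
print(f"lower-tail p = {lower:.3f}")
print(f"two-sided p  = {two_sided:.3f}")
```

For a symmetric null distribution the two-sided p-value is twice the smaller one-sided value, which is why a directional prediction made in advance is easier to confirm but cannot detect an effect in the unpredicted direction.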

Explanatory and Response Variables

  • Explanatory variable (Independent variable or Predictor variable) is the variable you suspect is causing the effect
  • Response variable (Dependent variable or Criterion variable) is the variable you think is being influenced
  • These labels do not guarantee causal association, even if a relationship is identified
  • A confounding variable covaries with the independent variable and presents an alternate explanation
  • A placebo is a "treatment" missing the active ingredient (e.g., sugar pill), used in control groups
  • A placebo effect is when participants improve because they believe they are getting the active treatment
  • Blinding is when participants are unaware if they are in control or treatment groups
  • Double-blinding is when researchers and participants are unaware of who is in the treatment versus control groups

Testing Relationships

  • Observational studies involve no researcher interference or manipulation; the researcher collects data as a passive observer
  • Observational studies can establish the presence of a relationship but cannot establish causation

How To Determine Causation

  • Use experimental designs that include treatment and control groups, with participants randomly assigned to conditions

Random Assignment vs. Random Sampling

  • Random sampling with random assignment: causal conclusion, generalized to the whole population
  • Random sampling without random assignment: no causal conclusion; correlation statement generalized to the whole population
  • Random assignment without random sampling: causal conclusion for the sample only
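
The two kinds of randomness happen at different steps of a study. The sketch below uses a hypothetical population of 1000 labelled people and a sample of 48 split into two groups of 24; all names and sizes are illustrative assumptions.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1000 people.
population = [f"person_{i}" for i in range(1000)]

# Random SAMPLING: who gets into the study (supports generalization).
sample = rng.sample(population, 48)

# Random ASSIGNMENT: which condition each sampled person receives
# (supports causal conclusions).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:24], shuffled[24:]

print(len(treatment), len(control))
```

Sampling decides how far a conclusion generalizes; assignment decides whether the conclusion can be causal.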

Gender Discrimination experiment

  • 48 randomly selected male bank supervisors each received a personnel file and judged whether the person should be promoted to a "routine" branch manager job
  • The files were identical except for the first name: half carried a female first name and the rest a male one, and the files were randomly assigned to supervisors
  • 35 of the 48 files were recommended for promotion; the question posed was: were the females unfairly discriminated against?
  • Independent variable - female vs male name
  • Dependent variable - promotion decisions
  • Promotion decisions favored males by 29.2 percentage points
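
The 29.2% figure is a difference between two promotion rates. The notes do not give the per-group counts; the sketch below assumes 24 files per group and the split of 21 male-named and 14 female-named files promoted, which is the only split consistent with 35 promotions and a 29.2-point gap.

```python
FILES_PER_GROUP = 24     # half of the 48 files carry each name
MALE_PROMOTED = 21       # assumed count, inferred from the totals
FEMALE_PROMOTED = 14     # assumed count, inferred from the totals

p_male = MALE_PROMOTED / FILES_PER_GROUP
p_female = FEMALE_PROMOTED / FILES_PER_GROUP
diff = p_male - p_female  # the test statistic used in the example

print(f"male-named files promoted:   {p_male:.1%}")
print(f"female-named files promoted: {p_female:.1%}")
print(f"difference:                  {diff:.1%}")
```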

Claims regarding Gender Discrimination experiment

  • Null Hypothesis ("nothing is going on"): promotion and gender are independent; there is no gender discrimination; the observed difference in proportions is due to chance. It is complementary to the Research Hypothesis
  • Research Hypothesis ("something is going on"): promotion and gender are dependent; there is gender discrimination; the observed difference in proportions is NOT due to chance

Metaphor - Trial as Hypothesis Test

  • Hypothesis testing is like a trial in a court of law
  • H0: defendant is innocent, H1: defendant is guilty
  • Evidence is collected and presented, and must then be judged: "could the data plausibly have happened by chance if the null hypothesis were true?"
  • If the data are very unlikely to have occurred by chance, the evidence is "beyond reasonable doubt"
  • The decision depends on the threshold: how unlikely is unlikely enough?
  • If the evidence is not strong enough to reject the presumption of innocence, the jury returns a verdict of "not guilty", although the defendant may still be guilty
  • In statistics, we "fail to reject" the null hypothesis: we do not "accept" it or declare it true, because we do not know whether it is

Recap

  • Hypothesis testing starts by proposing a null hypothesis (H0) that represents the "status quo"
  • The null hypothesis is tested against a research hypothesis (H1); the test is conducted under the assumption that the null hypothesis is true, either through randomization or via theoretical methods
  • If the data do not provide convincing evidence for the research hypothesis, we retain the null hypothesis; otherwise, we reject it in favour of the research hypothesis

Complementarity

  • The research hypothesis suggests an outcome, such as "One glass of red wine a day improves cardiovascular health"
  • What null hypothesis would be complementary to this?
    • Red wine does nothing for cardiovascular health
    • Red wine makes cardiovascular health worse
    • Both of these outcomes together
  • For the logic of NHST to work, the sample space must be complete; without a complete sample space, we cannot accurately compute the probabilities

Randomization Methods

  • Bootstrap confidence intervals have been calculated
  • Newer, computer-based methods can test a specific hypothesis directly: randomization test, permutation test, Monte Carlo simulation, or bootstrap
  • Randomization test: redo the random assignment performed in the experiment K times
  • Permutation test: check all possible ways of making the random assignment
  • Monte Carlo simulation: generate new data under a model for the null hypothesis
  • Bootstrap: use the observed sample as a model for the population and draw K new samples of size N from it, with replacement
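
As an illustration of the bootstrap idea, here is a minimal percentile confidence interval for a mean. The data values are made up, and the percentile method is only one of several ways to turn bootstrap resamples into an interval; `bootstrap_ci` is a hypothetical helper, not a library function.

```python
import random

def bootstrap_ci(data, k=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Draw K resamples of size N with replacement; keep each resample's mean.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(k))
    lo_idx = int((1 - level) / 2 * k)       # lower percentile position
    hi_idx = int((1 + level) / 2 * k) - 1   # upper percentile position
    return means[lo_idx], means[hi_idx]

# Hypothetical observed sample of N = 10 scores.
scores = [4, 7, 5, 9, 6, 8, 5, 7, 6, 10]
low, high = bootstrap_ci(scores)
print(f"sample mean = {sum(scores) / len(scores):.2f}")
print(f"95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

The other three methods differ in what is resampled: a randomization or permutation test reshuffles the group labels, while a Monte Carlo simulation draws fresh data from an explicit null model.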

Randomization Test - Gender Discrimination Example

  • Under H0, the gender on the personnel files has no influence on the promotion decision
  • Under H0, the independent variable has no influence on the dependent variable, which makes the group labels exchangeable
  • Assuming H0 is correct, a distribution of likely results can be found by redoing the random assignment and recomputing the test statistic each time
  • The observed result is then assessed against this distribution: the p-value is the proportion of reshuffled statistics at least as extreme as the observed one, P(statistic ≥ observed | H0)
  • A p-value of 2/100 (p = .02) indicates that a result at least as extreme as the one observed would occur by chance with a probability of 2% if there were no discrimination against women
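
The reshuffling procedure can be sketched as follows. The counts of 21 male-named and 14 female-named files promoted are an assumption inferred from the reported totals (35 of 48 promoted, 24 files per group, a 29.2-point difference); the function names, seed, and number of shuffles are illustrative. An exact permutation calculation is included for comparison.

```python
import random
from math import comb

N_FILES, PER_GROUP, PROMOTED = 48, 24, 35
OBSERVED_MALE_PROMOTED = 21   # assumed; implies 14 female-named files promoted

def randomization_p(k=10_000, seed=1):
    """Randomization test: reshuffle the decisions K times under H0."""
    rng = random.Random(seed)
    decisions = [1] * PROMOTED + [0] * (N_FILES - PROMOTED)
    hits = 0
    for _ in range(k):
        rng.shuffle(decisions)                 # redo the random assignment
        male_promoted = sum(decisions[:PER_GROUP])
        # With group sizes fixed, "difference at least as large as observed"
        # is the same as "at least 21 male-named files promoted".
        if male_promoted >= OBSERVED_MALE_PROMOTED:
            hits += 1
    return hits / k

def exact_p():
    """Permutation test: count every possible assignment instead of sampling."""
    favourable = sum(
        comb(PROMOTED, m) * comb(N_FILES - PROMOTED, PER_GROUP - m)
        for m in range(OBSERVED_MALE_PROMOTED, PER_GROUP + 1)
    )
    return favourable / comb(N_FILES, PER_GROUP)

print(f"randomization p = {randomization_p():.3f}")
print(f"exact p         = {exact_p():.3f}")
```

Both values come out at roughly 2%, in line with the 2/100 figure in the notes; the randomization estimate varies slightly with the seed and the number of shuffles.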

Hypothesis Testing Conclusions

  • Hypothesis testing ends in a judgment, much like a verdict of guilty or not guilty: we either reject or fail to reject the null hypothesis
  • Whether the evidence is "beyond reasonable doubt" is weighed against the threshold for evidence set in advance
  • In psychology, p < .05 (5%) is considered sufficient evidence to reject the null hypothesis; this threshold is arbitrary!

Decision Errors in Hypothesis Testing

  • Trials are not flawless: innocents can be wrongly convicted and the guilty can be set free
  • Similar errors occur in statistical hypothesis tests; the difference is that we have the tools to manage them
  • There are two competing hypotheses: H0 and H1
  • Type I Error is rejecting the null hypothesis when H0 is true; its probability is P(outcome > threshold | H0), which is the significance level α
  • Type II Error is failing to reject the null hypothesis when H1 is true; the p-value says nothing about its likelihood
  • The threshold can be set according to the strength of evidence required
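
A short simulation shows why the threshold controls the Type I Error rate. The setup is an assumption chosen for simplicity, not something from the lecture: a one-sided z-test on samples of 25 values drawn from a standard normal distribution, so that H0 (mean of 0) is true in every simulated study.

```python
import random
from statistics import NormalDist

def type_i_rate(alpha=0.05, n=25, trials=20_000, seed=2):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]   # H0 true: mean 0, sd 1
        z = (sum(sample) / n) * n ** 0.5               # z statistic, known sd
        if z > z_crit:
            rejections += 1                            # a false positive
    return rejections / trials

print(f"alpha = .05 -> Type I rate ~ {type_i_rate(0.05):.3f}")
print(f"alpha = .01 -> Type I rate ~ {type_i_rate(0.01):.3f}")
```

The false-positive rate tracks whatever threshold is chosen, which is the sense in which this error can be managed.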

Effect Sizes and Errors in Hypothesis Testing

  • Effect size is the strength of the effect
  • As the strength of an effect increases, the likelihood of making a Type II Error goes down; the Type I Error rate stays fixed at the chosen threshold
  • As relationships between variables strengthen, statistical tests become more likely to reach the critical threshold
  • Strong effect => lower likelihood of a false negative

Statistical Power, Decisions and Significance

  • Statistical “power” is the likelihood of finding an effect if one exists.
  • The power of a study/experiment is its probability of rejecting H0 if H0 is false
  • As power increases, the likelihood of making a Type II Error declines; lower power gives a greater likelihood of a false negative
  • Studies are designed to balance the probability of a Type I Error against the probability of a Type II Error
  • Type II Errors have real costs, such as not approving an effective treatment, or claiming that the status quo is acceptable when in fact discrimination is occurring
  • A significance level of 5% is a traditional and frequently used threshold
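
Power can be computed directly in simple cases. The sketch below assumes a one-sided, one-sample z-test with known standard deviation, which is a simplification and not a design taken from the lecture; `effect_size` is the true mean shift in standard deviation units, and the function name `power` is illustrative.

```python
from statistics import NormalDist

def power(effect_size, n, alpha=0.05):
    """Probability of rejecting H0 in a one-sided z-test when the effect is real."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)   # threshold set by the significance level
    # Under H1 the z statistic is centred on effect_size * sqrt(n).
    return 1 - z.cdf(z_crit - effect_size * n ** 0.5)

for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: power = {power(d, n=25):.2f}, "
          f"Type II rate = {1 - power(d, n=25):.2f}")

# A stricter threshold lowers Type I risk but costs power.
print(f"alpha = .05: power = {power(0.5, 25, alpha=0.05):.2f}")
print(f"alpha = .01: power = {power(0.5, 25, alpha=0.01):.2f}")
```

Larger effects and larger samples raise power, while a smaller significance level lowers it, which is the trade-off between the two error types.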

Description

Explore Null Hypothesis Significance Testing (NHST), a statistical method for understanding experimental results. Learn about decision-making in statistics, including descriptive statistics, distributions, and probability. Also, trace the history of P-value calculations back to the 1700s with John Arbuthnot and Pierre-Simon Laplace.
