Questions and Answers
Explain why the interpretation of a p-value as "the probability that the null hypothesis is true" is incorrect from a frequentist perspective.
Under the frequentist interpretation of probability, the null hypothesis is simply either true or false; it is not a repeatable event, so it cannot be assigned a probability of being true.
Even if a Bayesian approach is used (which does allow probabilities to be assigned to hypotheses), why doesn't the p-value represent the probability that the null hypothesis is true?
Because that interpretation is inconsistent with how the p-value is actually calculated: the p-value is the probability of observing data at least as extreme as the actual data, assuming the null hypothesis is true. That is a statement about the data given the null, not about the null given the data, and the two quantities are generally different.
In hypothesis testing, what is the primary piece of information that must always be reported, regardless of the specific test being conducted, and why is it so important?
The p-value, along with whether or not the outcome was significant, because it directly addresses the likelihood of observing the data (or more extreme data) if the null hypothesis were true.
Explain the convenience that p-values offer in hypothesis testing, allowing researchers to avoid pre-specifying an alpha level.
Because p-values can be interpreted directly, the researcher does not need to specify an alpha level before running the test; readers can compare the reported p-value against whatever error tolerance they consider appropriate.
Briefly explain the contention regarding whether to report the exact p-value obtained from a hypothesis test or to simply state whether p < α for a predetermined significance level. What are the benefits of reporting the exact p-value?
Some argue for simply reporting whether p falls below conventional thresholds (.05, .01, .001), since fixing these levels in advance keeps researchers honest. Reporting the exact p-value, however, lets readers apply their own standard of evidence and judge borderline results such as p = .06 for themselves.
Flashcards
P-value Misinterpretation
The incorrect interpretation of a p-value as the probability that the null hypothesis is true.
Frequentist Hypothesis Testing
An approach in which hypotheses are simply either true or false, so probabilities cannot be assigned to them.
Reporting Hypothesis Test Results
Always report the p-value and whether the outcome was significant.
Significance Level (alpha)
The Type I error rate the researcher is willing to tolerate; results with p < α are declared significant.
P-value Convenience
Because p-values can be interpreted directly, no alpha level needs to be specified before running the test.
Study Notes
- A common but incorrect interpretation of the p-value is "the probability that the null hypothesis is true."
- This is wrong for two reasons:
Frequentist Approach
- The frequentist approach doesn't allow assigning probabilities to the null hypothesis; it's either true or not.
Bayesian Approach
- Even in the Bayesian approach, which does allow probabilities to be assigned to hypotheses, the p-value does not correspond to the probability that the null is true; that reading is inconsistent with how the p-value is calculated.
- You should never interpret a p-value this way.
Reporting Hypothesis Results
- Several pieces of information usually need to be reported, and these vary from test to test.
- A particularly detailed example of reporting can be seen in Section 12.1.9.
- Regardless of the test, you must always report something about the p-value and whether or not the outcome was significant.
- Exactly how to do this is a matter of some disagreement.
The Issue of Exact vs. Inequality Reporting
- P-values can be interpreted directly, which means an alpha level does not need to be specified before running the test.
- This "softens" the decision-making process, which has advantages: it avoids treating p = .051 as fundamentally different from p = .049.
- The flexibility of p-values is both an advantage and a disadvantage, because it can give the researcher too much freedom.
- Researchers could change their mind about how much error they are willing to tolerate after seeing the data.
- That creates a temptation to manipulate the standard of evidence, which biases the reading of the data.
- Specifying the alpha level in advance keeps the researcher honest.
Two Possible Solutions
- It is rare for a researcher to specify a single alpha level ahead of time.
- Conventionally, researchers rely on three standard significance levels: .05, .01 and .001.
- Reporting which of these levels the result falls below (e.g., p < .01) indicates how strong the evidence against the null is.
- Since these levels are fixed in advance, researchers cannot pick an alpha level to suit the data.
- Others prefer reporting the exact p-value, letting readers decide for themselves how to interpret a borderline result such as p = .06.
- In practice, "p < .001" is common for very small p-values, since software often does not print exact values below that threshold (see the sketch below).
- The human mind also struggles to process numbers like .0000000001 meaningfully.
- At that point, the claim is in effect that the alternative hypothesis is a near certainty.
- But statistical tests rely on simplifications, approximations and assumptions, so extremely small p-values should not be taken literally.
- Given that, "p < .001" is usually the strongest claim a study can defensibly support.
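As a concrete illustration of the "p < .001" convention, base R's format.pval() truncates p-values below a chosen threshold when formatting them for reporting. This is a minimal sketch; the example p-values are made up.

```r
# Minimal sketch: formatting p-values for reporting with base R's
# format.pval(). Values below eps are printed as "<0.001".
# The example p-values are made up for illustration.
p <- c(0.049, 0.051, 0.00012, 1e-10)
format.pval(p, digits = 3, eps = 0.001)
# tiny values print as "<0.001" rather than a literal 1e-10
```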
Hypothesis Tests
- R provides the binomial test as the function binom.test().
- In practice, a single R command runs the test (see the sketch below).
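A minimal sketch of that command, using made-up data (62 successes in 100 trials) and the null value of .5:

```r
# Binomial test of H0: theta = 0.5 using base R's binom.test().
# The counts are hypothetical, chosen only for illustration.
result <- binom.test(x = 62, n = 100, p = 0.5)
print(result)     # full test output, including the exact p-value
result$p.value    # the p-value on its own
```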
The Power Function
- The major design principle in hypothesis testing is controlling the Type I error rate.
- Fixing α = .05 attempts to ensure that only 5% of true null hypotheses are incorrectly rejected.
- Type II errors should not be ignored either; the researcher also wants to minimize β, the Type II error rate.
- Equivalently, maximizing the power of the test, defined as 1 - β, is a secondary goal.
Defining the Error
- A Type II error means retaining (failing to reject) a false null hypothesis.
- A single value of β cannot generally be calculated, because the alternative hypothesis corresponds to many possible values of the true parameter θ.
- Rejecting the null becomes more probable the more wrong the null actually is.
- So the power of a test depends on the true value of θ (see the sketch below).
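To make this concrete, here is a minimal sketch that computes the power of a two-sided binomial test of $latex H_0: \theta = .5$ at several true values of θ. The sample size and alpha level are made-up choices.

```r
# Power of the binomial test of H0: theta = .5 as a function of the
# true theta. n = 100 and alpha = .05 are illustrative choices.
n     <- 100
alpha <- 0.05
x     <- 0:n

# Rejection region: outcomes whose p-value under H0 falls below alpha
pvals  <- sapply(x, function(k) binom.test(k, n, p = 0.5)$p.value)
reject <- x[pvals < alpha]

# Power at each true theta = probability of landing in that region
theta <- seq(0.5, 0.8, by = 0.05)
power <- sapply(theta, function(th) sum(dbinom(reject, n, th)))
round(data.frame(theta, power), 3)
```

At θ = .5 the "power" is just the Type I error rate; it climbs towards 1 as the true θ moves away from the null value.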
Effect Size and George Box
- If the true state of the world is very different from what the null hypothesis predicts, power is high; if it is similar, power is low.
- Quantifying this similarity means measuring the effect size, for which multiple definitions exist.
Cohen and Ellis
- The goal is to capture how big the difference is between the true population parameters and the values the null hypothesis assumes.
- If $latex \theta_0 = 0.5$ marks the null hypothesis value, a simple effect size is $latex \theta - \theta_0$.
- Reporting an effect size alongside the hypothesis test is standard practice.
- A hypothesis test tells you whether an observed result is "real"; the effect size tells you whether you should care (see the sketch below).
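A minimal sketch of both ideas for the binomial case, using a made-up estimate of θ. Cohen's h is shown as one common standardized effect size for proportions.

```r
# Raw effect size (theta - theta0) and Cohen's h for a proportion.
# theta_hat is a hypothetical observed proportion.
theta0    <- 0.5
theta_hat <- 0.62

raw_effect <- theta_hat - theta0
cohens_h   <- 2 * asin(sqrt(theta_hat)) - 2 * asin(sqrt(theta0))
c(raw = raw_effect, cohens_h = cohens_h)
```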
Maximizing Power
- Scientists want their experiments to work, so they aim to maximize the power of their studies.
- Power can be increased through clever experimental design or, more commonly, by increasing the sample size.
- Before running a study, it is useful to know how much power you are likely to have, so you don't run an underpowered experiment.
Power Analysis
- Power analysis involves estimating the sample size an experiment needs in order to have adequate power.
- This is helpful: it tells you in advance whether the experiment is likely to succeed.
- Some argue that a power analysis should be a required component of experimental design.
- In practice, though, power analyses are sometimes carried out mainly to satisfy a grant application, serving no purpose beyond the application itself.
- The catch is that a power analysis requires a guess at the effect size, which can rarely be calculated for any particular setting in advance (see the sketch below).
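As a sketch of what a power analysis looks like in practice, base R's power.prop.test() solves for the sample size needed in a two-sample comparison of proportions. The effect size (p1 versus p2) is necessarily a guess.

```r
# Minimal power-analysis sketch with base R's power.prop.test().
# p1 and p2 encode a guessed effect size; leaving n unspecified
# makes the function solve for the required sample size per group.
power.prop.test(p1 = 0.50, p2 = 0.65,
                sig.level = 0.05,  # alpha level
                power     = 0.80)  # desired power (1 - beta)
```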
Issues in NHST
- The framework described here is orthodox null hypothesis significance testing (NHST).
- It has dominated inferential statistics since the early 20th century, so it is applied very consistently across fields.
- It is essential to understand, though it has genuine flaws.
Mashup History
- NHST is a mashup of Fisher's and Neyman's approaches to hypothesis testing.
- Fisher: determine whether the null hypothesis is inconsistent with the data, in which case it can safely be rejected; no alternative hypothesis is required.
- Neyman: hypothesis testing is a guide to action, and it requires a specified alternative so that the test's power can be assessed.
- One symptom of the mashup is the p-value itself, defined in terms of data at least as extreme as what was observed.
- The approach remains controversial as a result.
Why Treating the p-value as the Probability of $latex H_0$ is Terrible
- The p-value should not be read as the probability that $latex H_0$ is true; that is not what it measures.
- Likewise, retaining the null should not be confused with showing that no effect exists.
Traps During Implementation
- The orthodox NHST approach has real drawbacks and can be misleading if applied mechanically.
- The issue is not researcher stupidity or an inability to work with statistics; it is that tests get run without thinking about what $latex H_0$ actually states.
- A null hypothesis should not be stated without checking that it is a sensible claim about the data at hand.
Example Problems
- For example, when comparing females and males, the analysis must test the correct $latex H_0$ for the question being asked.
- If the analysis does not actually test that question, the answer it produces has no value.
- It is essential to know whether each test's answer is possible and makes sense in the real world.
Quick Notes
- A recap of the chapter:
- Research hypotheses and statistical hypotheses; null and alternative hypotheses (Section 11.1).
- Type I and Type II errors (Section 11.2).
- Test statistics and sampling distributions (Section 11.3).
- Hypothesis testing as a decision-making process (Section 11.4).
- p-values as "soft" decisions (Section 11.5).
- Reporting the results of a hypothesis test (Section 11.6).
- Effect size and power (Section 11.8).
- A few issues to consider regarding hypothesis testing (Section 11.9).
- Chapter 17 returns to the theory of statistical tests with further $latex H_0$ examples.
- The aim throughout is to explain why something is being tested and what the data are meant to show.
Description
P-values are often misinterpreted as the probability the null hypothesis is true. From a frequentist view, hypotheses are fixed, and p-values reflect the data's compatibility with the null. Even Bayesians don't see p-values as P(null true). Reporting the p-value is crucial as it conveys the evidence against the null hypothesis.