Regression Discontinuity Design

Questions and Answers

In a Sharp Regression Discontinuity (RD) design, why is the lack of common support a critical feature?

It necessitates extrapolation towards the cutoff point to compare control and treatment units, fundamentally shaping the RD analysis. (correct)
It ensures that the potential outcomes are identical for all units, simplifying the analysis.
It allows for direct comparison of control and treatment units across the entire range of the running variable.
It guarantees that units in the control and treatment groups have the same value of the running variable, ensuring a balanced comparison.

In Regression Discontinuity designs, if individuals cannot perfectly sort themselves around the cutoff, what does the discontinuous change in the probability of treatment allow us to estimate?

The local causal effect of the treatment around the cutoff. (correct)
The global causal effect of the treatment.
The average treatment effect across the entire population.
The administrative burden of treatment assignment.

What is the fundamental challenge in estimating the average treatment effect at a specific value of the score, E[Yi(1)|Xi =x] − E[Yi(0)|Xi =x], in a Regression Discontinuity design?

The regression curves are always parallel, making it impossible to determine any difference.
The location of the cutoff is unknown.
The value of x is always zero.
Both potential outcomes (treated and untreated) are never observed for the same individual at the specific value of x. (correct)

In Regression Discontinuity, how are units with scores just below the cutoff utilized in estimating the local causal effect?

They serve as a control group for units with scores just above the cutoff. (D)

In the context of Regression Discontinuity, the potential outcomes for individuals with high and low scores relative to the cutoff are likely to differ based on what pre-treatment characteristics?

Ability and motivation. (B)

In regression discontinuity design (RDD), what is the primary trade-off when choosing a bandwidth around the cutoff point?

Narrower bandwidths reduce bias but increase variance; wider bandwidths reduce variance but may increase bias. (A)

What is the key difference between sharp and fuzzy regression discontinuity designs (RDD)?

In sharp RDD, the probability of treatment changes discontinuously from 0 to 1 at the cutoff, while in fuzzy RDD, the probability of treatment changes discontinuously by an amount less than 1. (A)

In the context of fuzzy Regression Discontinuity Design (RDD), what role does being above the cutoff serve?

It serves as an instrument for receiving the treatment because it increases the probability of treatment, though not necessarily to 1. (D)

Why is it important to check the robustness of results to different modeling and data choices in a Regression Discontinuity Design (RDD)?

To address the limitations of RDD and ensure that results are not driven by specific, arbitrary choices. (A)

Consider a scenario where admission to a specialized school is determined by a score on an entrance exam. School A admits students scoring above 90, School B admits students above 85. What RDD challenge does this situation exemplify?

There are multiple cutoffs, leading to potential complications in identifying treatment effects. (A)

In the context of regression discontinuity design (RDD), what does the term 'bandwidth' refer to?

The range of values around the cutoff point of the running variable that are included in the local regression. (A)

In a fuzzy RDD, which of the following statistical techniques is commonly employed to estimate the treatment effect, acknowledging the imperfect compliance?

Instrumental Variables (IV) regression, where the assignment based on the cutoff is used as an instrument for actual treatment. (C)

Suppose a researcher is using RDD to study the effect of a scholarship on college enrollment. The cutoff is a minimum GPA of 3.5. The researcher finds that students with a GPA of 3.49 have a 60% enrollment rate, while those with a GPA of 3.51 have an 80% enrollment rate. What concern is MOST relevant?

Students may be manipulating their GPA to be above the cutoff, which could bias the results. (C)

A researcher wants to study the effect of a new job training program on employment rates. Eligibility for the program is determined by an index score; individuals scoring above 50 are eligible. However, not everyone eligible enrolls in the program. Assuming the researcher uses a Regression Discontinuity Design (RDD), which of the following is the most critical assumption for valid causal inference?

Individuals near the cutoff are similar in all respects except for their eligibility status and subsequent program enrollment. (C)

In Regression Discontinuity Design (RDD), what key assumption must hold true to ensure the validity of causal inferences?

Units do not have the ability to precisely manipulate their own value of the running variable. (A)

When testing the assumptions of Regression Discontinuity Design (RDD), what does plotting the histogram of the running variable help to identify?

Whether observations are evenly distributed around the threshold. (A)

In the context of Regression Discontinuity Design (RDD), what is the purpose of checking if observations just above and just below the threshold are similar with respect to other observables?

To validate the assumption that assignment to treatment is essentially random at the cutoff. (A)

In Regression Discontinuity Design, the treatment effect at the cutoff is estimated as $E[Y_i|X_i \geq c] - E[Y_i|X_i < c]$, where $c$ is the cutoff. What does this expression represent?

The difference in expected outcomes for units just above and just below the cutoff. (A)

What additional data would be most useful to strengthen the conclusion that a jump in death rates after age 21 in the US is due to increased alcohol access and consumption?

Data on alcohol consumption by age and causes of death by age. (D)

When using binned scatter plots in RDD, what is the primary reason for fitting regression lines separately on each side of the cutoff?

To allow for different relationships between the running variable and the outcome on either side of the threshold. (C)

What is the potential consequence of units being able to precisely manipulate their value of the running variable in Regression Discontinuity Design (RDD)?

It biases the estimation of the treatment effect at the cutoff. (B)

In a Regression Discontinuity Design (RDD) studying the effect of a policy change at age 21 on health outcomes, what is the most plausible threat to the validity of the design?

Individuals accurately sorting themselves around the age 21 threshold to gain the policy benefit. (C)

How does Regression Discontinuity Design (RDD) address the challenge of confounding variables when estimating treatment effects?

By assuming that treatment assignment is essentially random in a narrow window around the cutoff. (C)

In Regression Discontinuity Design (RDD), what critical assumption ensures the validity of causal inference around the cutoff?

Units on either side of the cutoff are comparable in all relevant aspects, except for their treatment status. (C)

Within the local randomization framework of Regression Discontinuity Design (RDD), how are units near the cutoff treated?

They are considered to be randomly assigned to treatment or control groups. (D)

What does the continuity-based framework in Regression Discontinuity Design (RDD) assume regarding potential outcomes near the cutoff?

Potential outcomes are continuous, meaning average potential outcomes change smoothly near the cutoff. (A)

Considering the example of the minimum legal drinking age in the US, what serves as the running variable in a Regression Discontinuity Design (RDD) examining the effect of alcohol access on mortality?

Age. (A)

In the context of Regression Discontinuity Design (RDD), what characterizes the relationship between the running variable ('a') and the treatment status ('Da') in the legal drinking age example?

Da is a deterministic and discontinuous function of 'a', changing abruptly at the cutoff. (C)

How does Regression Discontinuity Design (RDD) address the issue of treatment assignment not being random?

By focusing on the discontinuity in treatment assignment at the cutoff point. (A)

In Regression Discontinuity Design (RDD), what is the significance of observing a discontinuity in the outcome variable at the cutoff point?

It provides evidence of a causal effect of the treatment on the outcome. (B)

Flashcards

E[Y(1)|X]

The average potential outcome when treated, observed for those with high scores.

E[Y(0)|X]

The average potential outcome when NOT treated, observed only for those with scores below the cutoff.

Local Causal Effect

The effect of a treatment at the cutoff point in a Regression Discontinuity design. Control units with scores just below the cutoff are compared to treated units with scores just above it.

Lack of Common Support

In RD designs, units in control and treatment groups cannot have the same value of the running variable.

Extrapolation in RD

Estimating treatment effects at the cutoff by extending the observed relationships. We can't observe both treated and untreated states for the same value of X.

RDD Key Assumption

Units near the cutoff are similar in all aspects except treatment status.

RDD as Local Random Experiment

Units near cutoff are as if randomly assigned to treatment or control groups. (Local Randomization)

Continuity of Potential Outcomes Framework

Average potential outcomes change smoothly near the cutoff. (Continuity-Based Framework)

Minimum Legal Drinking Age Example

Legal drinking age at 21 in the US

Treatment (T) in Drinking Age Example

Legal access to alcohol.

Outcome (Y) in Drinking Age Example

Likelihood of death (or specific cause of death).

Discontinuity of Treatment Status

Treatment status abruptly changes at a specific point based on a running variable.

Sharp Cutoff

Treatment status switches completely at a specific cutoff point of the running variable, so any treatment effect shows up as a sharp change in the outcome variable at that point.

Treatment Effect at Cutoff

The difference in the outcome variable at the cutoff point, revealing the immediate impact of the treatment.

Treatment Effect Formula

Calculated as the difference between the value of the regression line for the treated group (α1) and the value of the regression line for the control group (α0), both evaluated at the cutoff point.

Binned Scatter Plots

Papers often aggregate individual level micro data into smaller bins to show relationships more clearly.

RDD Mechanisms

Data is often needed regarding alcohol consumption and causes of death by age to verify that the increase of death rates is actually caused by increased alcohol consumption.

RDD Assumption

Individuals should not be able to precisely control or manipulate their value of the running variable around the cutoff.

Testing for Sorting

Checking whether the observations are distributed evenly around the cutoff.

Checking Observables

Verifying if observable characteristics of individuals just above and below the threshold are similar.

Histogram of Running Variable

A graph displaying how observations are grouped across the range of the running variable.

Impact of Limited Data in RDD

Using fewer data points results in a larger variance or standard error in the estimate, indicating more noise.

Bandwidth in RDD

The segment of observations around the cutoff used for estimating the local linear regression.

Local Linear Regression (Below Cutoff)

Estimates the expected outcome (e.g., death rate) given age, specifically for those slightly younger than the cutoff (e.g., between 19 and 21). It models the relationship as a linear function.

Local Linear Regression (Above Cutoff)

Estimates the expected outcome (e.g., death rate) given age, specifically for those slightly older than the cutoff (e.g., between 21 and 23). It models the relationship as a linear function.

Robustness in RDD

Verifying that the RDD results are consistent and reliable under different analysis choices.

Bandwidth Variation

Varying the range of data points considered around the cutoff to ensure the results are not overly sensitive to the specific bandwidth chosen.

Specification Variation

Testing different equations or models that describe the relationship between the running variable and the outcome to ensure the findings are not model-dependent.

Fuzzy Regression Discontinuity (Fuzzy RD)

An RDD where the cutoff leads to a change in the probability or intensity of treatment, but not a complete switch.

Discontinuous Treatment Probability

The probability of receiving treatment changes discontinuously at the cutoff.

Study Notes

Regression discontinuity design (RDD) is an approach to estimating the causal effect of a treatment.

Observational Alternatives to Experiments

Selection on observables: treatment and control groups differ in observable characteristics
Selection on unobservables: treatment and control groups differ in unobservable characteristics
Instrumental variables: exogenous variables induce variation in treatment
Regression discontinuity designs: the selection mechanism is known
Difference-in-differences: treatment and control groups are observed before and after treatment

RDD Basics

RDD was introduced by Thistlethwaite and Campbell in 1960
It was used to study the impact of merit awards, which were given if a test score exceeded a cutoff
RDD reappeared and was formalized in economics in the late 1990s
It has proven to be a powerful causal tool in empirical economics and other disciplines
These disciplines include political science, education, epidemiology, and criminology
RDD has strong internal validity but is very data intensive
It needs many observations near the cutoff

Causality and RDD

RDD isolates the causal effect of treatment where individuals become treated after crossing an arbitrary cutoff
RDD has three fundamental components: running variable, cutoff, and treatment
Individuals become treated after crossing a cutoff in the running variable
Sharp RDD involves treatment received with probability zero below the cutoff and one above it
Fuzzy RDD involves the probability of receiving treatment increasing discontinuously at the threshold
RDD assumes potential outcomes evolve smoothly across the cutoff
If there is no manipulation of the running variable, observations just below the threshold are similar to those just above, forming a control group

Running Variables and Cutoffs

Examples of running variables and cutoffs: entry to high school/university depending on test scores/GPA
Eligibility to vote or buy alcohol after a certain age
Access to services based on residential location and catchment areas
Candidate vote share determining election status (treatment)
Speeding fines in Finland based on exceeding the speed limit by more than 20 km/h

Sharp RDD

Test scores determine admission to a course or school as an example
The running variable is the test score
A threshold score is required for passing the exam, like 50/100, which is the cutoff
Treatment involves attending the course or school

Causal Inference Problem in RDD

The fundamental problem of causal inference appears here because the outcome under control is observed only for units below the cutoff
Likewise, the outcome under treatment is observed only for units above the cutoff
Example: test scores determine entry to education
Two people with the same score cannot have one accepted and the other rejected, so there is no common support in scores between the accepted and rejected groups. Potential outcomes also differ between high and low scorers because of pre-treatment characteristics such as ability and motivation

Local Causal Effect

If units cannot perfectly "sort" around the cutoff, the discontinuous change in treatment probability can still be used to understand the local causal effect
Units with scores barely below the cutoff can be used as a control group
Units with scores barely above the cutoff form the treatment group

Key Point and Assumption of RDD

Units with similar scores on opposite sides of the cutoff are comparable in all respects except treatment status
Local randomization framework: in a small neighborhood around the cutoff, conditions mimic a randomized experiment
Continuity-based framework: average potential outcomes are continuous near the cutoff
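The continuity-based framework can be written compactly. As a sketch in the notation of the questions above (outcome $Y_i$, running variable $X_i$, cutoff $c$), and assuming treatment is assigned at or above the cutoff, the sharp RD estimand is:

$$\tau_{SRD} = E[Y_i(1) - Y_i(0) \mid X_i = c] = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$

The symbol $\tau_{SRD}$ is introduced here for illustration only. The second equality relies on $E[Y_i(1) \mid X_i = x]$ and $E[Y_i(0) \mid X_i = x]$ being continuous in $x$ at $c$, which is exactly the continuity assumption: each one-sided limit of the observed outcome then equals the corresponding average potential outcome at the cutoff.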

Minimum Legal Drinking Age Example

The minimum legal drinking age in the U.S. is 21
T is legal access to alcohol and Y is the likelihood of dying (overall or from a specific cause)

Alcohol and Deaths

Treatment status is a deterministic function of age: age alone determines status
Treatment status is a discontinuous function of age: it switches at the cutoff and remains unchanged thereafter
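In the notation of the drinking-age question above (age $a$, treatment status $D_a$), one way to write this deterministic, discontinuous rule is as an indicator function, assuming legal access begins exactly at age 21:

$$D_a = \mathbf{1}[a \geq 21] = \begin{cases} 1 & \text{if } a \geq 21 \\ 0 & \text{if } a < 21 \end{cases}$$

Nothing besides age enters the rule, which is what makes this a sharp design.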

Testing RDD Assumptions

RDD assumes units cannot precisely manipulate their value of the running variable
This can be tested by checking for sorting and by checking whether observations just above and just below the threshold are similar in their observables
Placebo tests can also support the assumption, using cutoffs where treatment does not change (for example, other ages in the drinking-age example)

Sorting or Manipulation of the Running Variables

RDD assumes units cannot precisely manipulate their value of the running variable
If treatment is beneficial, units would want to receive it and would sort onto the treated side of the cutoff
With no manipulation, the number of treated observations just above the cutoff should be similar to the number of control observations just below it
Test: use a histogram of the running variable to check that the number of observations is similar on both sides of the cutoff
A formal statistical density test can also be used (McCrary test)
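The counting logic behind the histogram check can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the McCrary test: it counts observations in a narrow window on each side of the cutoff and applies an exact binomial test of a 50/50 split. The data, the cutoff of 50, the window width, and the 60% manipulation rate are all hypothetical.

```python
import math
import random

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    # Binomial(n, 0.5) is symmetric, so double the smaller tail and cap at 1.
    tail = min(k, n - k)
    p_tail = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)

def sorting_check(x, cutoff, window):
    """Count observations just below and just above the cutoff and test for imbalance."""
    below = sum(1 for xi in x if cutoff - window <= xi < cutoff)
    above = sum(1 for xi in x if cutoff <= xi < cutoff + window)
    return below, above, binomial_two_sided_p(above, below + above)

rng = random.Random(1)
scores = [rng.uniform(0, 100) for _ in range(10000)]

# Hypothetical manipulation: 60% of units within 1 point below the cutoff
# push their score up by 1 point and land just above it.
manipulated = [
    xi + 1.0 if 49 <= xi < 50 and rng.random() < 0.6 else xi
    for xi in scores
]

below_c, above_c, p_clean = sorting_check(scores, cutoff=50, window=1)
below_m, above_m, p_manip = sorting_check(manipulated, cutoff=50, window=1)
print(f"no manipulation: below={below_c} above={above_c} p={p_clean:.3f}")
print(f"manipulation:    below={below_m} above={above_m} p={p_manip:.2e}")
```

A very small p-value in the manipulated sample signals bunching above the cutoff. A real application would use a proper density test, since this sketch ignores any slope in the density of the running variable.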

Test of Observable Variables

One of the most important RDD falsification tests examines if treated units are similar to control units near the cutoff in terms of observable characteristics
If units lack the ability to alter the running variable, there should be no systematic differences
Each predetermined variable should be analyzed with the same RDD procedure, treating it as if it were the outcome of interest

Placebo Tests

Placebo test 1 replaces the true cutoff with a fake cutoff in the running variable, a value at which treatment status does not actually change, and performs estimation and inference at this fake cutoff. A significant treatment effect should appear only at the true cutoff and not elsewhere
Placebo test 2 keeps the true cutoff but replaces the outcome Y with other outcomes that should not be affected by the treatment
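A fake-cutoff placebo (placebo test 1) can be sketched on simulated data. Everything here is an assumption for illustration: the data-generating process, the true jump of 2.0 at a cutoff of 50, the fake cutoff of 30, and the helper name `local_linear_jump`. The placebo window is kept entirely below the true cutoff so the real jump cannot contaminate it.

```python
import random

def local_linear_jump(x, y, cutoff, bandwidth):
    """Fit a separate OLS line on each side of `cutoff` within `bandwidth`
    and return the gap between the two fitted lines at the cutoff."""
    def fit_at_cutoff(points):
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        sxx = sum((p[0] - mx) ** 2 for p in points)
        slope = sum((p[0] - mx) * (p[1] - my) for p in points) / sxx
        # x is centered at the cutoff, so the intercept is the fit at the cutoff.
        return my - slope * mx

    below = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff - bandwidth <= xi < cutoff]
    above = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    return fit_at_cutoff(above) - fit_at_cutoff(below)

# Simulated data: linear trend in the running variable plus a jump of 2.0 at 50.
rng = random.Random(2)
x = [rng.uniform(0, 100) for _ in range(20000)]
y = [0.05 * xi + (2.0 if xi >= 50 else 0.0) + rng.gauss(0, 1) for xi in x]

true_jump = local_linear_jump(x, y, cutoff=50, bandwidth=10)
# Fake cutoff at 30: the window [20, 40] lies entirely below the true cutoff.
placebo_jump = local_linear_jump(x, y, cutoff=30, bandwidth=10)
print(f"estimate at true cutoff: {true_jump:.2f}")
print(f"estimate at fake cutoff: {placebo_jump:.2f}")
```

The estimate at the fake cutoff should be close to zero, while the estimate at the true cutoff should be close to the simulated jump. The sketch reports point estimates only; a real placebo test would also carry out inference.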

Technical Issues

In essence, RDD compares mean outcomes for units just above the cutoff with those for units just below it
Often there is not enough data to estimate the treatment effect by simply comparing means at the cutoff
This requires choosing a bandwidth, which balances accurate estimation (low bias) against having enough data points (low variance)
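The trade-off can be seen in a small simulation. The sketch below uses hypothetical data with a true jump of 2.0 at a cutoff of 50 and an outcome that trends upward in the running variable, then compares raw means within bandwidths of different widths. The function name `diff_in_means` and all parameter values are assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_in_means(x, y, cutoff, bandwidth):
    """Difference in mean outcomes above vs. below the cutoff within the
    bandwidth, together with the number of observations used."""
    below = [yi for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
    above = [yi for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
    return mean(above) - mean(below), len(below) + len(above)

# Simulated data: the outcome rises with the running variable (slope 0.05)
# and jumps by 2.0 at the cutoff.
rng = random.Random(0)
x = [rng.uniform(0, 100) for _ in range(20000)]
y = [0.05 * xi + (2.0 if xi >= 50 else 0.0) + rng.gauss(0, 1) for xi in x]

results = {h: diff_in_means(x, y, cutoff=50, bandwidth=h) for h in (2, 5, 20)}
for h, (estimate, n_obs) in results.items():
    print(f"bandwidth={h:>2}  n={n_obs:>5}  estimate={estimate:.2f}")
```

Wider bandwidths use more observations, so the estimate is less noisy, but they also fold more of the underlying trend into the comparison, pulling the estimate away from the true jump. Fitting a line on each side of the cutoff is one common way to reduce that bias.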

How to Address Limitations - Robustness

An RDD paper must show robustness to different modelling and data choices, and demonstrate that results are similar with different bandwidths around the cutoff
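A bandwidth robustness check can be sketched as a loop that re-estimates the jump with local linear regressions at several bandwidths. The simulated data (true jump 2.0 at cutoff 50), the bandwidth grid, and the helper names are hypothetical; the sketch uses plain least squares with a uniform kernel and reports point estimates only.

```python
import random

def ols_intercept(xs, ys):
    """Intercept of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    return my - (sxy / sxx) * mx

def rd_local_linear(x, y, cutoff, bandwidth):
    """Sharp RD estimate: gap at the cutoff between lines fitted on each side."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    lo = [(xi - cutoff, yi) for xi, yi in zip(x, y)
          if cutoff - bandwidth <= xi < cutoff]
    hi = [(xi - cutoff, yi) for xi, yi in zip(x, y)
          if cutoff <= xi <= cutoff + bandwidth]
    a0 = ols_intercept([p[0] for p in lo], [p[1] for p in lo])
    a1 = ols_intercept([p[0] for p in hi], [p[1] for p in hi])
    return a1 - a0

rng = random.Random(3)
x = [rng.uniform(0, 100) for _ in range(20000)]
y = [0.05 * xi + (2.0 if xi >= 50 else 0.0) + rng.gauss(0, 1) for xi in x]

estimates = {h: rd_local_linear(x, y, cutoff=50, bandwidth=h) for h in (5, 10, 20)}
for h, estimate in estimates.items():
    print(f"bandwidth={h:>2}  estimate={estimate:.2f}")
```

If the estimates stay close to each other across bandwidths, the result is less likely to be an artifact of one particular choice. Because the simulated relationship is exactly linear, this example is kinder to wide bandwidths than real data usually are.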

Fuzzy RD (Regression Discontinuity)

Fuzzy RD arises when passing the cutoff creates a jump in treatment probability or treatment intensity rather than switching treatment on or off completely
Example: different schools have different cutoffs. Suppose there are 3 schools, and school 1 has the highest cutoff for intake
Examples include, but are not limited to, exam schools and charter schools
Scoring above the cutoff for school 1 increases the probability of attending an exam school, but not to p = 1
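Because crossing the cutoff acts as an instrument for treatment, the fuzzy RD estimand can be written as a ratio of two jumps. As a sketch, with $D_i$ denoting the treatment actually received (a symbol introduced here for illustration) and $c$ the cutoff:

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]}$$

The numerator is the jump in the outcome at the cutoff and the denominator is the jump in the probability of treatment. In a sharp design the denominator equals 1, so the expression reduces to the jump in the outcome alone.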

Sarvimäki, Uusitalo & Jäntti (2021)

After World War II, approximately 11% of the Finnish population was moved from areas ceded to the Soviet Union
They were re-homed in the remaining parts of Finland
Farmers made up approximately 50% of those displaced. They were given land and assistance to establish new farms in areas with soil and climate similar to those of their origin regions
Former neighbours were resettled close to each other to preserve social networks
From 1948 onward, the displaced no longer received subsidies and were instead free to sell their land and move away from the area

Meyersson (2014)

Uses a regression discontinuity design to compare municipalities where the Islamic party barely won or barely lost elections
Despite negative raw correlations, the RD results reveal that, over six years, Islamic rule increased female secular high school education

Speeding Tickets

In Finland, speeding tickets become income-dependent if the driver's speed exceeds the speed limit by more than 20 km/h
This leads to a substantial jump in the size of the fine
There is no bunching below the threshold; the speed distribution is smooth around it
The discontinuity can be used to estimate the effect of the size of the fine on the likelihood of re-offending
The comparison is between similar individuals who drove 19 and 21 km/h too fast

RDD Recap

If a rule assigns treatment based on a hard-to-predict cutoff, the rule can be used to estimate a causal effect without an RCT
Necessary ingredients: a running variable, a cutoff, and a treatment whose assignment changes discontinuously as a function of the running variable at the cutoff
Units just below and just above the cutoff are very similar and comparable
Validity tests for the design include density tests, covariate balance tests, and placebo tests
Challenges: RDD requires many observations close to the cutoff, and because the estimated effect is local, results cannot be extrapolated to units far from the cutoff
