Lecture 6

Questions and Answers

Within the context of non-parametric Regression Discontinuity (RD) designs, consider the estimand represented by equation (11): $\lim_{\delta \rightarrow 0} [E(y_i | x_0 < x_i < x_0 + \delta) - E(y_i | x_0 - \delta < x_i < x_0)] = E(y_{1i} - y_{0i} | x_i = x_0)$. If empirical analysis reveals a statistically significant difference in means across the discontinuity, but the underlying distributions of covariates exhibit marked dissimilarity even within infinitesimally small bandwidths around $x_0$, what is the most critical epistemological challenge this poses for causal inference?

  • Compromised internal validity stemming from potential selection bias at the threshold, suggesting that the observed discontinuity in outcomes may be attributable to pre-existing differences rather than the treatment. (correct)
  • Reduced statistical power due to the sparsity of data points in the immediate vicinity of $x_0$, implying that while the causal interpretation remains valid, the precision of the estimate is fundamentally limited.
  • Violation of the conditional independence assumption, rendering the RD estimate solely correlational and devoid of causal interpretation irrespective of bandwidth refinement.
  • Inflated Type I error rates due to heteroskedasticity induced by the discontinuous treatment assignment, necessitating robust variance estimation but not invalidating the causal interpretation.
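
A minimal sketch of the difference-in-means comparison inside equation (11), evaluated at a fixed small bandwidth rather than in the limit. The function name `rd_difference_in_means` and the simulated data (a true jump of 2.0 at $x_0 = 0$) are assumptions made for illustration, not part of the lecture.

```python
import numpy as np

def rd_difference_in_means(x, y, x0, delta):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    above = (x > x0) & (x < x0 + delta)   # x0 < x_i < x0 + delta
    below = (x > x0 - delta) & (x < x0)   # x0 - delta < x_i < x0
    if not above.any() or not below.any():
        raise ValueError("no observations on one side of the cutoff within delta")
    return y[above].mean() - y[below].mean()

# Hypothetical simulated data: smooth trend in x plus a jump of 2.0 at x0 = 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20_000)
d = (x > 0).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0.0, 0.1, size=x.size)

estimate = rd_difference_in_means(x, y, x0=0.0, delta=0.05)
print(f"estimated jump at the cutoff: {estimate:.3f}")
```

The estimate is only interpretable as a causal effect if units on the two sides of the window are otherwise comparable, which is exactly what the covariate imbalance in the question calls into doubt.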

In the context of Lee's (2008) regression discontinuity design examining electoral advantage, assume a scenario where the probability of re-election exhibits a statistically significant discontinuity at the threshold of prior electoral victory. However, further investigation using high-resolution precinct-level data reveals systematic strategic manipulation of vote counts in close races, specifically around the win/loss threshold. How does this finding most critically challenge the validity of the non-parametric RD estimate?

  • It primarily introduces measurement error in the running variable (vote margin), attenuating the estimated treatment effect towards zero and reducing statistical power without fundamentally biasing the causal inference.
  • It generates a violation of the 'as-if random' assignment assumption at the threshold, as the assignment mechanism is no longer exogenous but influenced by strategic behavior correlated with potential outcomes. (correct)
  • It primarily affects the external validity of the RD estimate, limiting the generalizability of findings to electoral contexts characterized by the absence of strategic manipulation but not compromising internal validity within the studied context.
  • It necessitates the implementation of higher-order polynomial controls in a parametric RD framework to adequately model the non-linear relationship between vote margin and re-election probability, thereby restoring the validity of the causal estimate.

Consider the inherent trade-off in non-parametric Regression Discontinuity (RD) estimation between bias and variance as the bandwidth ($\delta$) is manipulated. If a researcher opts to substantially increase $\delta$ to mitigate concerns regarding data sparsity and enhance statistical power, what is the most pertinent econometric consequence concerning the validity of the RD estimate?

  • Enhanced robustness of the RD estimator to manipulation of the running variable, as a wider bandwidth dilutes the impact of localized strategic behavior around the threshold and strengthens the 'as-if random' assignment assumption.
  • Reduced precision of the RD estimator due to the inclusion of more heterogeneous observations, which can be effectively addressed by incorporating flexible non-parametric regression techniques within the expanded bandwidth.
  • Introduction of bias into the RD estimator as the linearity assumption within the bandwidth becomes increasingly untenable, and comparability of observations across the threshold diminishes with expanding bandwidth. (correct)
  • Increased variance of the RD estimator due to the inclusion of observations further from the threshold, necessitating the use of more robust standard error estimators but not introducing systematic bias.

In transitioning from non-parametric to parametric Regression Discontinuity (RD) approaches to address data limitations, particularly when employing polynomial regression, what fundamental assumption regarding the underlying conditional expectation function must be critically evaluated to ensure the validity of the parametric RD estimate?

The conditional expectation function must be sufficiently smooth and well-approximated by a low-order polynomial in the vicinity of the cutoff, such that the parametric specification adequately captures the local relationship between the running variable and the outcome. (C)

Consider the Dutch tax incentive example, where employers receive deductions for training workers above age 40. If a researcher employs a Regression Discontinuity Design to assess the causal impact of this incentive on training incidence, and discovers a significant discontinuity precisely at age 40, yet also finds compelling evidence of strategic hiring practices where firms preferentially hire slightly younger workers to circumvent the incentive eligibility threshold, what is the most critical consequence for the RD analysis?

The RD estimate will be biased upwards, overestimating the true effect of the incentive, as the strategic hiring behavior artificially inflates the apparent impact of the incentive by concentrating training within the eligible age range. (A)

Consider a scenario where a governmental policy mandates a significant subsidy for renewable energy projects if and only if a project proposal scores above a pre-determined environmental impact threshold on a standardized assessment. In the context of Regression Discontinuity Design, what critical assumption must be stringently verified to ensure the internal validity of causal inference regarding the subsidy's effect on project viability?

Potential confounding variables, such as technological readiness and access to capital, must be smoothly continuous functions of the environmental impact score at the threshold. (D)

In a sharp Regression Discontinuity (RD) design examining the effect of a scholarship award based on a standardized test score threshold, a researcher observes a statistically significant discontinuity in college enrollment rates at the threshold. However, prior to the test, students were offered intensive, but differentially effective, preparatory tutoring based on their predicted scores. Which econometric challenge poses the most substantial threat to the causal interpretation of the observed discontinuity?

Endogenous manipulation of the forcing variable, specifically strategic effort exertion to surpass the score threshold due to the anticipation of tutoring effectiveness. (B)

Consider a Fuzzy Regression Discontinuity (FRD) design where treatment assignment is imperfectly predicted by crossing a threshold on a continuous forcing variable. Identification in FRD relies on which core econometric principle to estimate the Local Average Treatment Effect (LATE)?

Instrumental Variables (IV) identification, where threshold crossing serves as an instrument for actual treatment receipt. (B)

In the context of non-parametric Regression Discontinuity, what is the primary rationale for employing local linear regression as opposed to global polynomial regression when estimating treatment effects at the threshold?

Local linear regression provides consistent estimates of the conditional mean function specifically at the discontinuity point, minimizing bias from functional form misspecification away from the threshold. (A)

When assessing the validity of a Regression Discontinuity design, researchers often conduct 'placebo' or 'falsification' tests. Which of the following best exemplifies a robust placebo test in a sharp RD context?

Examining for discontinuities in pre-treatment outcome variables at the treatment threshold, under the premise that no discontinuity should exist if the RD assumptions hold. (A)

In a scenario employing Regression Discontinuity to evaluate the impact of unionization on firm productivity, the threshold for mandatory employer recognition of a union is achieving 50% + 1 of worker votes in a secret ballot election. However, firms anticipating a close election outcome may engage in preemptive actions that affect worker voting behavior. Which potential bias is LEAST likely to be mitigated by a well-executed Regression Discontinuity design in this context?

Manipulation of the forcing variable (vote share) by firms strategically influencing worker turnout or voting intentions around the threshold. (D)

When graphically representing Regression Discontinuity results, researchers often employ scatter plots with superimposed regression lines on either side of the threshold. Beyond visually inspecting for a discontinuity, what is the most critical diagnostic information conveyed by these graphical representations for assessing RD validity?

The visual confirmation of similar functional forms and trends in the outcome variable on both sides of the threshold, except for the discrete jump at the threshold itself. (C)

In the context of Regression Discontinuity Design (RDD), what is the most critical methodological justification for attributing causality when analyzing the effect of a treatment $D_i$ based on a forcing variable $x_i$ and threshold $x_0$?

The assertion that individuals or units near the threshold $x_0$, despite differing in treatment status $D_i$, are statistically identical in expectation in the absence of treatment, effectively mimicking random assignment. (D)

Consider a scenario where a municipality implements a stringent environmental regulation ($D_i=1$) if the particulate matter concentration ($x_i$) exceeds 75 micrograms per cubic meter ($x_0 = 75$). If standard regression analysis demonstrates a spurious negative correlation between regulation and industrial output, what fundamental econometric problem does the Regression Discontinuity Design (RDD) primarily aim to mitigate in this context?

Omitted variable bias arising from unobserved confounders that simultaneously influence both particulate matter concentration and industrial output. (C)

In a sharp Regression Discontinuity Design (RDD), the treatment assignment rule is precisely defined as $D_i = 1[x_i > x_0]$. Which of the following best characterizes the nature of the discontinuity exploited for causal inference in this design?

A non-analytic point in the conditional expectation function $E[Y_i | x_i]$ at $x_0$, reflecting an abrupt change in the expected outcome due to the treatment. (A)

Lee (2008) employs Regression Discontinuity to assess incumbency advantage in US House elections, using the Democratic party's vote share in the previous election as the forcing variable. Assuming a sharp RDD and focusing solely on elections extremely close to the 50% vote share threshold, what specific causal estimand is Lee's design primarily intended to identify?

The local average treatment effect (LATE) of incumbency for Democratic candidates whose prior election vote share was infinitesimally above or below the 50% threshold. (D)

In the context of the air pollution and housing value Regression Discontinuity example, consider a county that is just barely classified as 'non-attainment' under the Clean Air Act due to its particulate matter concentration being infinitesimally above the 75 micrograms/cubic meter threshold. What is the most pertinent counterfactual scenario implicitly invoked by the RDD approach to estimate the causal effect of 'non-attainment' status on home values?

The hypothetical home values in the 'non-attainment' county if its particulate matter concentration had been infinitesimally below the 75 micrograms/cubic meter threshold, holding all else constant. (A)

Why does standard regression analysis demonstrably fail to establish a causal link in scenarios where Regression Discontinuity Design (RDD) is applicable, such as the relationship between air pollution and home values or incumbency and electoral success?

The primary deficiency lies in the endogeneity of the treatment variable; factors influencing treatment status (e.g., pollution levels, candidate quality) are often correlated with the outcome variable (e.g., home values, electoral success), violating the exogeneity assumption. (A)

The validity of Regression Discontinuity Design (RDD) crucially depends on the assumption of 'local continuity' or 'smoothness'. In the context of the incumbency advantage study by Lee (2008), what specific condition must be met for this assumption to hold, thereby ensuring the internal validity of the RDD?

The potential outcomes (election results in the absence of incumbency advantage) must be continuous functions of the prior Democratic vote share around the 50% threshold, except for the discontinuous effect of incumbency itself. (C)

Consider a hypothetical 'fuzzy' Regression Discontinuity Design where the assignment to environmental regulation ($D_i$) is probabilistically, but not deterministically, determined by exceeding the particulate matter threshold ($x_0 = 75$). Specifically, the probability of regulation increases sharply but not discretely at $x_0$. What is the most appropriate econometric approach to estimate the causal effect in this fuzzy RDD framework?

Two-stage least squares (2SLS) instrumental variable estimation, using the indicator function $1[x_i > x_0]$ as an instrument for the actual regulatory status $D_i$. (C)
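
A rough sketch of the IV idea in the answer above. With a single binary instrument $1[x_i > x_0]$ and no covariates, 2SLS reduces to the Wald ratio: the jump in the outcome divided by the jump in the treatment probability. The sketch applies that ratio inside a narrow window around the cutoff. The function name `fuzzy_rd_wald`, the window width, and the simulated data (treatment probability rising from 0.2 to 0.8 at $x_0 = 75$, true effect 3.0) are all hypothetical.

```python
import numpy as np

def fuzzy_rd_wald(x, y, d, x0, delta):
    """Wald (simple IV) estimate using threshold crossing as the instrument."""
    x, y, d = (np.asarray(a, dtype=float) for a in (x, y, d))
    window = np.abs(x - x0) < delta        # keep observations near the cutoff
    z = (x > x0)[window]                   # instrument: 1[x_i > x0]
    yw, dw = y[window], d[window]
    jump_y = yw[z].mean() - yw[~z].mean()  # reduced form: jump in the outcome
    jump_d = dw[z].mean() - dw[~z].mean()  # first stage: jump in treatment rate
    if jump_d == 0:
        raise ValueError("no first-stage jump in treatment probability")
    return jump_y / jump_d

# Hypothetical simulated data for illustration only.
rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(50.0, 100.0, size=n)
p_treat = np.where(x > 75.0, 0.8, 0.2)     # probability jumps, but not 0 -> 1
d = (rng.random(n) < p_treat).astype(float)
y = 1.0 + 0.01 * x + 3.0 * d + rng.normal(0.0, 0.5, size=n)

late = fuzzy_rd_wald(x, y, d, x0=75.0, delta=2.0)
print(f"fuzzy RD estimate: {late:.3f}")
```

In practice one would typically also control for a flexible function of $x_i$ in both stages; the ratio here is the simplest version of the idea.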

A researcher aims to study the impact of receiving a prestigious research grant ($D_i$) on the publication output of academics. Grant eligibility is determined by a percentile rank ($x_i$) in a national assessment, with a threshold at the 90th percentile ($x_0 = 90$). However, concerns arise that academics close to the 90th percentile may strategically 'game' the system, for instance, by artificially inflating their assessment scores. How might such strategic behavior potentially invalidate the Regression Discontinuity Design (RDD) assumptions and compromise causal inference in this scenario?

Gaming the system induces a violation of the local randomization assumption, as units just above and below the threshold are no longer comparable due to endogenous sorting around $x_0$. (D)

In the context of Lee's (2008) regression discontinuity design examining electoral success, what econometric challenge does the deterministic relationship between the forcing variable ($x_i$) and the treatment indicator ($D_i$) primarily address, and how does it refine causal inference compared to standard regression models?

It leverages the discontinuity at the threshold ($x_0$) to isolate the causal effect of prior electoral victory, thus circumventing endogeneity issues that arise from assuming $D_i$ is independent of unobservables. (A)

Considering the 'placebo'-type test in Lee (2008), what specific threat to the validity of the regression discontinuity design is this test intended to address, and how would a failure of the test undermine the causal interpretation of the results?

It aims to uncover potential manipulation of the forcing variable around the threshold, which, if systematic, would suggest that the observed discontinuity is not a genuine causal effect. (D)

In the context of estimating equation (3), $y_i = \alpha + f(x_i) + \rho D_i + \eta_i$, what are the potential consequences of misspecifying the functional form of $f(x_i)$ in a sharp regression discontinuity design, especially if the true relationship is highly nonlinear?

The estimator of $\rho$ will be biased, as the misspecified $f(x_i)$ may not adequately capture the true relationship between $x_i$ and $y_i$, confounding the estimate of the treatment effect. (A)
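
The bias described above can be illustrated with a small sketch that estimates equation (3) by OLS, taking $f(x_i)$ to be a polynomial in the centered forcing variable. The function name `sharp_rd_polynomial` and the noise-free cubic example are assumptions chosen for illustration; the true treatment effect in the example is zero by construction.

```python
import numpy as np

def sharp_rd_polynomial(x, y, x0, order):
    """OLS estimate of rho in y = alpha + f(x) + rho*D + eta, f a polynomial."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x0                              # center the forcing variable
    d = (x > x0).astype(float)               # sharp rule: D_i = 1[x_i > x0]
    columns = [np.ones_like(xc)]
    columns += [xc ** k for k in range(1, order + 1)]
    columns.append(d)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[-1]                          # coefficient on D_i

# Hypothetical data: a smooth cubic relationship and NO jump at the cutoff.
x = np.linspace(-1.0, 1.0, 2001)
y = x ** 3

rho_linear = sharp_rd_polynomial(x, y, x0=0.0, order=1)  # misspecified f
rho_cubic = sharp_rd_polynomial(x, y, x0=0.0, order=3)   # correctly specified f
print(f"linear f: rho = {rho_linear:.3f}, cubic f: rho = {rho_cubic:.3f}")
```

With a linear $f$, the unmodeled curvature is absorbed by $D_i$ and the estimated $\rho$ is clearly nonzero even though there is no treatment effect; with the cubic specification the estimate is essentially zero.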

Suppose you suspect the presence of unobserved heterogeneity that moderates the effect of $D_i$ on $y_i$ within Lee's (2008) framework. How could you extend the basic RD model to account for this, and what specific statistical technique would you employ to test for the presence of such heterogeneity?

Include interaction terms between $D_i$ and pre-treatment covariates known to be correlated with the potential moderators, and use a Wald test to assess the significance of these interaction effects. (A)

In a scenario where the forcing variable ($x_i$) is discrete rather than continuous, what modifications to the standard sharp regression discontinuity design would be necessary to ensure valid causal inference, and how might this impact the interpretation of the estimated treatment effect?

Apply a fuzzy regression discontinuity design, using the discrete forcing variable as an instrument for the treatment, and interpret the estimated treatment effect as a local average treatment effect (LATE). (D)

Assume that there are non-linear effects operating on both sides of the cutoff. What estimation strategy would allow you to estimate equation (3) if you want to allow for different functional forms on either side of the cutoff point?

Use separate regressions on either side of the cutoff, where each regression uses flexible functional forms such as splines or local polynomials. (B)

Instead of just using Democrat wins, suppose we were to add Republican wins into the regression. How would this change equation (3)?

The model would have to be augmented to account for the probability of being a Republican, with that term also interacted with the original parameters to see whether the effects differ. (A)

Suppose that the outcome of the next election depends on the outcome of the previous election and some observed factors. What would the key identification assumption have to be?

The key identification assumption is that there are no unobserved factors that affect both current and future elections. (A)

What strategy can you use if you suspect a high degree of manipulation of the forcing variable?

Use a fuzzy regression discontinuity design. (A)

In Lee's (2008) framework, what is the counterfactual in the regression discontinuity design?

The counterfactual is the potential outcome for units on one side of the threshold had they been on the other side. (D)

In the context of Regression Discontinuity (RD) designs, what is the primary risk associated with employing excessively flexible, high-order polynomials to model the relationship between the forcing variable and the outcome?

Amplification of minor data perturbations near the cutoff, potentially fabricating a spurious discontinuity where none truly exists. (C)

Consider a sharp Regression Discontinuity design. The model is given by: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \rho D_i + \eta_i$, where $x_i$ is the forcing variable and $D_i$ is the treatment indicator. If the true relationship between $x_i$ and $y_i$ is highly nonlinear and unsmooth, which of the following issues is most likely to arise despite the inclusion of higher-order polynomial terms?

The model will fail to capture the full extent of the discontinuity, leading to a biased estimate of the treatment effect $\rho$. (B)

In a sharp Regression Discontinuity (RD) design, a researcher estimates the treatment effect using a pooled regression with interaction terms, as represented by the equation (8). Under what condition would constraining the forcing variable functions to be the same on both sides of the cutoff be most likely to yield a substantially different result compared to allowing them to differ?

When there is a significant and qualitative difference in the relationship between the forcing variable and the outcome on either side of the cutoff, beyond just the treatment effect. (C)
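
Equation (8) is not reproduced in these notes. As an assumption for illustration, a common textbook form of the pooled regression with interaction terms, written with a $p$th-order polynomial in the centered forcing variable $\tilde{x}_i = x_i - x_0$, is:

```latex
\begin{aligned}
y_i ={}& \alpha + \beta_{01}\tilde{x}_i + \beta_{02}\tilde{x}_i^{2} + \dots + \beta_{0p}\tilde{x}_i^{p} \\
       & + \rho D_i + \beta^{*}_{1} D_i \tilde{x}_i + \beta^{*}_{2} D_i \tilde{x}_i^{2} + \dots + \beta^{*}_{p} D_i \tilde{x}_i^{p} + \eta_i,
\qquad \tilde{x}_i = x_i - x_0 .
\end{aligned}
```

Under this assumed form, constraining the forcing variable functions to be the same on both sides corresponds to setting $\beta^{*}_{1} = \dots = \beta^{*}_{p} = 0$, which brings the model back to the form of equation (3) with a polynomial $f(x_i)$. Because $\tilde{x}_i = 0$ at the cutoff, $\rho$ remains the jump at $x_0$ in both versions; the two estimates differ most when the interaction coefficients are far from zero.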

Consider a scenario where a researcher applies a sharp Regression Discontinuity (RD) design to evaluate the impact of a scholarship program on student academic performance. The forcing variable is a standardized test score, with a cutoff determining eligibility. The researcher observes a statistically significant discontinuity at the cutoff using a high-order polynomial regression. However, upon closer examination, they find evidence that students near the cutoff are strategically altering their test-taking behavior to just qualify or just miss the scholarship. What is the most critical threat to the validity of the RD estimate in this context?

Violation of the ignorability assumption, as strategic manipulation introduces endogeneity that correlates the forcing variable with unobserved determinants of academic performance. (B)

In the context of Regression Discontinuity designs, what is the fundamental trade-off between using a very narrow bandwidth around the cutoff point and using a wider bandwidth?

Narrower bandwidths reduce bias but increase variance, while wider bandwidths reduce variance but increase bias. (C)
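
The trade-off above can be made concrete with a small simulation. This is a sketch under stated assumptions: the data-generating process (slope 2.0 in the forcing variable, true jump 1.0, unit-variance noise), the bandwidths, and the helper name `window_estimate` are all hypothetical, and the estimator is a plain difference in means with no control for the slope.

```python
import numpy as np

def window_estimate(x, y, x0, delta):
    """Difference in means within +/- delta of x0, with its standard error."""
    above = (x > x0) & (x < x0 + delta)
    below = (x > x0 - delta) & (x < x0)
    ya, yb = y[above], y[below]
    estimate = ya.mean() - yb.mean()
    std_error = np.sqrt(ya.var(ddof=1) / ya.size + yb.var(ddof=1) / yb.size)
    return estimate, std_error, ya.size + yb.size

# Hypothetical simulated data: true jump of 1.0 at x0 = 0, slope 2.0 in x.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=20_000)
y = 2.0 * x + 1.0 * (x > 0) + rng.normal(0.0, 1.0, size=x.size)

results = {delta: window_estimate(x, y, 0.0, delta) for delta in (0.02, 0.2, 1.0)}
for delta, (est, se, n_obs) in results.items():
    print(f"delta={delta:<4} n={n_obs:<6} estimate={est:.3f} se={se:.3f}")
```

As the bandwidth widens, more observations enter and the standard error shrinks, but the uncontrolled slope in $x_i$ pulls the estimate further away from the true jump of 1.0.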

In a fuzzy Regression Discontinuity design, the treatment assignment is imperfectly determined by the forcing variable crossing the threshold. Which of the following econometric techniques is most suitable for consistently estimating the local average treatment effect (LATE) in this scenario?

Instrumental Variables (IV) regression, where the forcing variable crossing the threshold serves as the instrument for the actual treatment received. (B)

A researcher is employing a Regression Discontinuity design to estimate the effect of a new job training program on employment rates. The eligibility for the program is determined by an individual's score on an entrance exam. The researcher notices that individuals with scores just above the eligibility threshold have significantly different baseline characteristics (e.g., education level, prior work experience) compared to those just below the threshold. What is the most appropriate course of action to address this issue and ensure the validity of the RD estimates?

Use a non-parametric RD approach with a very narrow bandwidth around the threshold to minimize the impact of differing baseline characteristics. (A)

Consider a scenario where a researcher is using a sharp Regression Discontinuity design to examine the effect of a policy intervention at a specific threshold. However, the density of observations is not continuous around the threshold, exhibiting a noticeable 'jump' or discontinuity. What is the most likely explanation for this phenomenon, and what does it imply for the validity of the RD design?

This indicates a violation of the continuity assumption and suggests that individuals are manipulating the forcing variable to be on a specific side of the threshold, thus threatening the validity of the RD design. (D)

In the context of Regression Discontinuity (RD) designs, under what specific circumstances would a local linear regression approach around the cutoff be preferred over a global polynomial regression of higher order?

When the functional form is misspecified and high-order polynomials might fit the data too closely, potentially creating a spurious discontinuity. (D)

A researcher aims to leverage a Regression Discontinuity design to ascertain the impact of a specialized educational program on students' standardized test outcomes. Program admission hinges on an entrance exam score, marking the forcing variable. The analyst posits that the causal parameter estimation is susceptible to bias stemming from the non-random sorting around the cutoff during exams. Which of the following methodologies aims to deal with individuals manipulating their actual score to be either above or below a certain threshold?

Conducting manipulation testing through density discontinuity analysis. (C)
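
A crude sketch of the idea behind density discontinuity analysis: if the density of the forcing variable is smooth at the cutoff, observations in a narrow window should fall on either side with roughly equal probability. The code compares counts just below and just above the cutoff with a normal approximation to the binomial. This is a simplified stand-in chosen for illustration, not a formal local-linear density test; the function name `density_jump_test`, the window width, and the simulated score manipulation are hypothetical.

```python
import math
import numpy as np

def density_jump_test(x, x0, h):
    """Compare counts in [x0 - h, x0) and [x0, x0 + h) with a two-sided z-test."""
    x = np.asarray(x, dtype=float)
    n_below = int(np.sum((x >= x0 - h) & (x < x0)))
    n_above = int(np.sum((x >= x0) & (x < x0 + h)))
    n = n_below + n_above
    if n == 0:
        raise ValueError("no observations within h of the cutoff")
    # Under equal probabilities, (n_above - n_below) has mean 0 and variance n.
    z = (n_above - n_below) / math.sqrt(n)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return n_below, n_above, z, p_value

# Hypothetical exam scores with a cutoff at 60.
rng = np.random.default_rng(1)
scores = rng.uniform(0.0, 100.0, size=20_000)
clean = density_jump_test(scores, x0=60.0, h=2.0)

# Simulated manipulation: about half of the near-misses end up just above 60.
bumped = scores.copy()
near_miss = (bumped >= 58.0) & (bumped < 60.0) & (rng.random(bumped.size) < 0.5)
bumped[near_miss] = 60.0 + rng.uniform(0.0, 2.0, size=int(near_miss.sum()))
manipulated = density_jump_test(bumped, x0=60.0, h=2.0)

print("clean:      ", clean)
print("manipulated:", manipulated)
```

A large excess of observations just above the cutoff, as in the manipulated sample, is the kind of pattern that signals sorting around the threshold.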

Flashcards

Regression Discontinuity (RD)

A design that exploits knowledge of rules determining treatment, creating experiment-like conditions.

Basic RD Idea

Considering a threshold where similar individuals have different outcomes.

RD Analogy

Treating above-threshold and below-threshold groups like treatment and control groups.

RD Example: College Financial Aid

Top test-takers get scholarships, causing a jump in scholarship amount.

RD Example: School Class Size

Limiting class size to 40, leading to differences in class sizes.

RD Example: Union Elections

Crossing 50% in union elections triggers mandatory good faith bargaining.

Sharp Regression Discontinuity

A type of RD design where the treatment is perfectly determined by the assignment variable.

Non-Attainment Classification

Counties exceeding 75 micrograms per cubic meter of pollutant particulates face stricter regulations.

Problem with Standard Regression

Regression analysis may suffer from omitted variable bias.

Sharp Regression Discontinuity (RD)

Treatment occurs when a continuous variable crosses a specific threshold.

Treatment Rule in Sharp RD

Treatment Di equals 1 if xi is greater than x0, and 0 otherwise.

Forcing Variable (in RD)

Variable determining treatment assignment in RD.

Threshold (x0) in RD

The known cutoff point determining treatment status.

Incumbency Advantage

Winning a House seat last time gives a Democratic candidate an advantage.

Incumbency Resources

Representatives using office privileges to increase their chance of reelection.

Causal Effect of Incumbency

RD can estimate the causal effect of incumbency on reelection probability.

Polynomial Modeling in RD

Modeling $f(x_i)$ with a $p$th-order polynomial to represent the relationship between the forcing variable and the outcome.

Variables in Lee (2008) Model

Difference in vote share (forcing variable), and treatment (getting elected).

Differing Functions in Sharp RD

Allowing the forcing variable functions to differ on each side of the cutoff point.

Pooled Regression in RD

Running a regression including interaction terms on both sides of the cutoff.

Problem with Flexible Functions

A functional form that is too flexible has the potential to "create" a discontinuity where none exists.

Importance of Functional Form in RD

The choice of functional form for the forcing variable function is crucial.

Nonlinearity Problem in RD

What looks like a jump due to treatment may simply be some unaccounted-for nonlinearity in the forcing variable function.

Non Parametric RD

Looking at data only very close to the discontinuity.

Intuition of RD-design

People just above or below the threshold will be similar in both observed and unobserved ways.

Correct specification RD

The validity of the RD estimates depends on the correct specification of the forcing variable function $f(x_i)$.

Lee (2008) RD Study

Examines election probability vs. vote shares.

Election Winner Indicator (Di)

Binary variable; 1 if vote share margin is positive, 0 otherwise.

Vote Share Margin (xi)

The difference between Democratic and Republican vote share.

Treatment Variable (Di)

The treatment variable in the regression discontinuity design.

Forcing Variable (xi)

A variable that determines assignment to treatment based on a specific threshold.

RD Plot Interpretation

Probability increases with vote share difference, jumps at zero.

RD Placebo Test

A win in the current election should predict wins in later elections, but not outcomes of previous elections.

Di as deterministic function

Treatment is a deterministic function of the forcing variable.

RD core idea

Captures causal effect via discontinuous function.

Flexible Function f(xi)

Accounts for non-linear relationships with a smooth function.

RD estimation

Compares average outcomes in a small neighborhood around the threshold.

Limit in RD Estimation

Estimates the treatment effect independently of a specific model as the interval gets very small.

RD Example: Close Elections

Comparing candidates who barely won or lost an election by a small margin.

Data Requirements in RD

The number of observations available close to the threshold might be limited.

Study Notes

Regression Discontinuity (RD) Overview

  • Regression discontinuity designs are considered by some to be the closest one can get to randomized experiments in social sciences.
  • The RD design has gained traction in varied fields like labor economics, crime studies, education, environmental science, and health economics.
  • RD relies upon explicit understanding of the rules that dictate treatment.
  • The basis of RD lies in the concept that certain rules in a rule-based system are somewhat arbitrary, which presents valuable experimental opportunities.
  • RD can be implemented in two ways: "sharp" and "fuzzy."

RD Introduction

  • The principle behind RD is straightforward.
  • RD considers a threshold where marginal individuals exhibit very different outcomes.
  • A regression is run in a setting where outcomes appear to jump discontinuously at the threshold.
  • Above-threshold and below-threshold cases should be treated as treatment and control groups respectively, akin to a standard experiment.

RD Roadmap

  • Encompasses examples of RD.
  • Addresses "sharp" regression discontinuity.
  • Discusses parametric and non-parametric RD methods.
  • Examines threats to the RD design.
  • Mentions "fuzzy" RD.
  • Usage of graphs in RD is covered.

RD Examples

  • Examples of discontinuity exist throughout society.
  • College financial aid in the US offers a clear instance: scholarships are awarded on the basis of PSAT/NMSQT scores, and generally the top 16,000 scorers secure one.
  • Even a slight difference in test scores can lead to a significant difference in scholarship awards.
  • One can study the causal effect of scholarship on college enrollment.
  • Maimonides' Rule in Israel, which stipulates a class cannot have more than 40 students, presents another instance.
  • A school with 40 students will have one class, but 41 students will be split into two, with approximately 20-21 students each.
  • The causal effect of class size on study results is also a subject of interest here.
  • In union elections, the NLRB (US National Labor Relations Board) arranges an election if workers wish to unionize.
  • Vote outcomes determine whether an employer has to acknowledge the union (a majority, i.e. more than 50%) or not (50% or less).
  • You can study questions related to the influence of unionization on business metrics like survival, employment figures, output measurements, and wages.
  • Air quality regulations in the US, guided by the Clean Air Act's National Ambient Air Quality Standards, classify counties as "non-attainment" if the mean concentration of 5 specific pollutants goes over 75 micrograms per cubic meter.
  • It is common to study the link between pollution and house prices.
  • Applying standard regression analysis might pose challenges.
  • The general problem is that test scores aren't random, and neither is class size or air pollution.
  • A child at the 94.9th percentile of test scores generally isn't much different from one at the 95th percentile.
  • Is there a real difference between a school with 40 kids and one with 41 kids?
  • Randomness is likely to manifest around the threshold.
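
The class-size example above can be made concrete with a minimal sketch. It assumes the simplest reading of Maimonides' Rule as described here (a cap of 40 students per class, with pupils split evenly across the fewest classes allowed). The function names `classes_needed` and `average_class_size` are illustrative and do not come from the lecture.

```python
def classes_needed(enrollment, cap=40):
    """Fewest classes such that no class exceeds the cap (ceiling division)."""
    return (enrollment + cap - 1) // cap


def average_class_size(enrollment, cap=40):
    """Average class size when pupils are split evenly across classes."""
    return enrollment / classes_needed(enrollment, cap)


# Enrollment just below and just above each multiple of the cap.
for enrollment in (40, 41, 80, 81):
    print(enrollment, classes_needed(enrollment), average_class_size(enrollment))
```

Moving enrollment from 40 to 41 changes average class size from 40 to 20.5, although the two schools are otherwise nearly identical. That jump is the variation an RD design exploits.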

Sharp RD (formally)

  • Treatment $D_i$ is viewed as a discontinuous function of a continuous variable $x_i$.
  • Given a rule, treatment $D_i$ switches on once $x_i$ reaches the threshold $x_0$.
  • The treatment rule is $D_i = D_i(x_i) = 1[x_i \geq x_0]$.
  • $1[x_i \geq x_0]$ is an indicator function: it equals one if the condition within the brackets is met and zero otherwise.
  • If the value of $x_i$ (the forcing variable) is greater than or equal to the threshold $x_0$, then $D_i = 1$.
  • Similarly, if the value of $x_i$ is less than $x_0$, then $D_i = 0$.
  • The forcing variable $x_i$ is also called the "assignment" or "running" variable.
  • The researcher knows the threshold value $x_0$.
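
As a minimal sketch, the sharp assignment rule can be written as a one-line indicator. It uses the "greater than or equal to" convention stated in the list above. The scores and the cutoff of 95 are made-up values for illustration only.

```python
def sharp_treatment(x, x0):
    """Sharp RD rule: D_i = 1 if x_i >= x0, else 0."""
    return 1 if x >= x0 else 0


scores = [93.0, 94.9, 95.0, 97.2]  # hypothetical forcing-variable values
threshold = 95.0                   # hypothetical cutoff x0
treatment = [sharp_treatment(x, threshold) for x in scores]
print(treatment)
```

Treatment status is fully determined by the forcing variable: nothing other than `x` and `x0` enters the rule.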

Regression Discontinuity Example (Lee 2008)

  • Lee (2008) asked whether a Democratic candidate for a seat in the US House of Representatives has an advantage when the party won the seat in the previous election.
  • Incumbents may benefit from factors like heightened voter satisfaction or enhanced get-out-the-vote efforts.
  • They may also leverage the resources of their office to gain an advantage.
  • The quantity of interest is the causal effect of incumbency on retaining a seat in Congress.
  • Lee (2008) studied how the likelihood of winning an election relates to the vote share in the prior election.
  • The rule for winning an election is $D_i = 1$ if $x_i \geq 0$ and $D_i = 0$ otherwise, where $x_i$ is the vote share margin (the difference between the Democratic and Republican vote shares).
  • Here $x_i$ is the forcing variable and $D_i$ is the treatment variable.
  • The analysis plots the probability of a Democratic win against the vote share margin in the prior election.
  • The probability of winning shows a sharp jump where the prior vote share margin crosses 0.

RD Formalized

  • The model used is $y_i = \alpha + \beta x_i + \rho D_i + \eta_i$.
  • The difference between this model and prior ones is that $D_i$ is a deterministic function of the covariate $x_i$.
  • To get the causal effect, RD distinguishes the nonlinear and discontinuous function $1[x_i \geq x_0]$ from the smooth function of $x_i$.
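
The linear model above can be estimated by ordinary least squares. The sketch below is one way to do that with only the standard library. The data are simulated, and the parameter values, the cutoff at 0, and the alternating "noise" are all assumptions made for illustration. `ols` is a small helper written for this example, not a library function.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


n = 200
x = [(i - 99.5) / 100 for i in range(n)]     # forcing variable, cutoff x0 = 0
d = [1.0 if xi >= 0 else 0.0 for xi in x]    # sharp treatment rule
noise = [0.1 * (-1) ** i for i in range(n)]  # deterministic stand-in for eta_i
alpha, beta, rho = 1.0, 0.5, 2.0             # assumed "true" parameters
y = [alpha + beta * xi + rho * di + e for xi, di, e in zip(x, d, noise)]

X = [[1.0, xi, di] for xi, di in zip(x, d)]  # columns: constant, x_i, D_i
alpha_hat, beta_hat, rho_hat = ols(X, y)
print(round(rho_hat, 3))
```

The coefficient on `D_i` recovers the assumed jump of 2 closely because the simulated relationship really is linear in `x_i`. With real data that functional form is an assumption.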

Sharp RD with Flexible Functions

  • It is important to use a flexible, smooth function when estimating the RD model: $y_i = \alpha + f(x_i) + \rho D_i + \eta_i$.
  • As long as $f(x_i)$ is continuous in a neighborhood of $x_0$, the model can still be estimated, even with a flexible functional form.
  • One option is to model $f(x_i)$ with a $p$th-order polynomial.
  • The model can then be written as: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p + \rho D_i + \eta_i$
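
A minimal sketch of how the regressors for the polynomial specification could be assembled for one observation. The helper name `design_row` is hypothetical.

```python
def design_row(x, d, p):
    """Regressors [1, x, x^2, ..., x^p, D] for a single observation."""
    return [x ** k for k in range(p + 1)] + [d]


# Example: x_i = 2.0, treated (D_i = 1), cubic polynomial (p = 3).
print(design_row(2.0, 1, 3))
```

Stacking one such row per observation gives the design matrix for a least-squares fit, in which the coefficient on the last column plays the role of $\rho$.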

RD: Separate functions on each side of the cutoff

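  • The practice is to allow the function of the forcing variable to differ on each side of the cutoff point: $f_0(x_i)$ and $f_1(x_i)$.
  • $E[y_{0i} | x_i] = \alpha + \beta_{01} \tilde{x}_i + \beta_{02} \tilde{x}_i^2 + \cdots + \beta_{0p} \tilde{x}_i^p$
  • $E[y_{1i} | x_i] = \alpha + \rho + \beta_{11} \tilde{x}_i + \beta_{12} \tilde{x}_i^2 + \cdots + \beta_{1p} \tilde{x}_i^p$, where $\tilde{x}_i = x_i - x_0$ is the forcing variable centered at the cutoff.
  • Running the regression on both sides of the cutoff then estimates the treatment effect $\rho$ as the difference in intercepts at $x_0$.
  • This can be written as: $y_i = \alpha + \beta_{01} \tilde{x}_i + \beta_{02} \tilde{x}_i^2 + \cdots + \beta_{0p} \tilde{x}_i^p + \rho D_i + \beta_1^* D_i \tilde{x}_i + \beta_2^* D_i \tilde{x}_i^2 + \cdots + \beta_p^* D_i \tilde{x}_i^p + \eta_i$, where $\beta_j^* = \beta_{1j} - \beta_{0j}$.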
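
The idea of separate functions on each side can be sketched for the simplest case, $p = 1$: fit one straight line on each side of the cutoff using the centered forcing variable, then take the difference in intercepts. The data below are noiseless and invented for illustration (cutoff at 40, slopes 0.5 and 1.5, jump of 2). `fit_line` is a helper defined here, not a library call.

```python
def fit_line(xs, ys):
    """Simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


x0 = 40.0                                    # hypothetical cutoff
x = [30 + 0.5 * i for i in range(41)]        # forcing variable from 30 to 50
xc = [xi - x0 for xi in x]                   # centered: x_i - x0
# Assumed data: different slopes on each side and a jump of 2 at the cutoff.
y = [1.0 + 0.5 * c if c < 0 else 3.0 + 1.5 * c for c in xc]

left = [(c, yi) for c, yi in zip(xc, y) if c < 0]
right = [(c, yi) for c, yi in zip(xc, y) if c >= 0]
a0, b0 = fit_line([c for c, _ in left], [v for _, v in left])
a1, b1 = fit_line([c for c, _ in right], [v for _, v in right])

rho_hat = a1 - a0        # jump at the cutoff
slope_change = b1 - b0   # corresponds to the interaction coefficient
print(rho_hat, slope_change)
```

Centering matters: because the forcing variable is measured relative to the cutoff, each intercept is the fitted value exactly at the threshold, so their difference is the estimated jump.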

Limitations of Sharp RD

  • Too much flexibility can cause problems: an overly flexible fit may itself produce an apparent discontinuity.
  • Imbens & Gelman advise against using high-order polynomials in regression discontinuity designs.
  • When implementing the RD design, use linear, quadratic, or local fits instead.
  • Parametric RD is a case where the choice of functional form is critical.
  • A jump that is attributed to treatment may in fact be unaccounted-for nonlinearity.
  • Because the functional form of the forcing variable must be specified accurately, an alternative is to look only at data very close to the discontinuity.
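
The point about unaccounted-for nonlinearity can be illustrated with a small simulation. In this sketch the outcome is a smooth cubic in the forcing variable with no treatment effect at all, yet a global linear specification reports a jump at the cutoff. Restricting the same linear fit to a narrow window shrinks the spurious jump. The cubic shape, the window width of 0.1, and the helper `ols` are assumptions chosen for the example.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


x = [(i - 99.5) / 100 for i in range(200)]  # forcing variable, cutoff at 0
d = [1.0 if xi >= 0 else 0.0 for xi in x]
y = [xi ** 3 for xi in x]                   # smooth outcome, true jump is zero

# Global linear fit: y = alpha + beta * x + rho * D.
rho_global = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)[-1]

# Same specification, restricted to observations with |x| < 0.1.
window = [(xi, di, yi) for xi, di, yi in zip(x, d, y) if abs(xi) < 0.1]
rho_local = ols([[1.0, xi, di] for xi, di, _ in window],
                [yi for _, _, yi in window])[-1]

print(round(rho_global, 3), round(rho_local, 5))
```

The global fit attributes part of the curvature to the treatment dummy, while the local fit leaves almost nothing to attribute.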

Non-parametric RD

  • Using only data close to the discontinuity follows the intuition of the RD design.
  • People just above and just below a threshold are alike, except that those above are "treated" because they crossed the threshold.
  • One looks at data in a window around the threshold, $[x_0 - \delta, x_0 + \delta]$, for a small $\delta$.
  • On the left of the threshold: $E[y_i | x_0 - \delta < x_i < x_0] \approx E[y_{0i} | x_i = x_0]$
  • On the right of the threshold: $E[y_i | x_0 < x_i < x_0 + \delta] \approx E[y_{1i} | x_i = x_0]$
  • Taking the difference as the window shrinks: $\lim_{\delta \rightarrow 0} \left( E[y_i | x_0 < x_i < x_0 + \delta] - E[y_i | x_0 - \delta < x_i < x_0] \right) = E[y_{1i} - y_{0i} | x_i = x_0]$
  • The comparison is between average outcomes in the immediate neighborhoods to the left and right of the threshold.
  • This frees you from reliance on a model specification.
  • It boils down to comparing individuals on either side of the threshold.
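
A minimal sketch of the comparison of means described above, for a fixed small $\delta$. The data are simulated without noise (true jump of 2, slope of 0.5, cutoff at 0), and `rd_local_means` is a name chosen for this example.

```python
def rd_local_means(x, y, x0, delta):
    """Mean outcome just right of x0 minus mean outcome just left of x0."""
    right = [yi for xi, yi in zip(x, y) if x0 < xi < x0 + delta]
    left = [yi for xi, yi in zip(x, y) if x0 - delta < xi < x0]
    return sum(right) / len(right) - sum(left) / len(left)


x = [(i - 99.5) / 100 for i in range(200)]            # forcing variable
y = [1.0 + 0.5 * xi + (2.0 if xi >= 0 else 0.0) for xi in x]

estimate = rd_local_means(x, y, x0=0.0, delta=0.1)
print(round(estimate, 3))
```

For a finite $\delta$ the estimate differs slightly from the true jump because the outcome still trends with $x_i$ inside the window. That gap vanishes as $\delta \rightarrow 0$, which is what the limit expresses.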

Non-parametric RD: Lee(2008) Example

  • The test compares the probability of winning the next election between winners and losers of the previous one.
  • The comparison is restricted to candidates who either barely won or barely lost.
  • Such candidates are likely to be very similar.

Non-parametric RD: Issues

  • One problem is the data requirement.
  • Few observations may lie close to the threshold, yet those are the only ones you want!
  • The bandwidth $\delta$ can be increased, but that eventually brings in observations further from the threshold, which are less comparable.
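
The trade-off can be sketched by repeating a local comparison of means for several bandwidths and recording how many observations fall in each window. The simulated data (true jump of 2, slope of 0.5, cutoff at 0) and the three bandwidth values are assumptions for illustration.

```python
def window_estimate(x, y, x0, delta):
    """Return (observations in window, difference in mean outcomes)."""
    right = [yi for xi, yi in zip(x, y) if x0 < xi < x0 + delta]
    left = [yi for xi, yi in zip(x, y) if x0 - delta < xi < x0]
    diff = sum(right) / len(right) - sum(left) / len(left)
    return len(left) + len(right), diff


x = [(i - 99.5) / 100 for i in range(200)]
y = [1.0 + 0.5 * xi + (2.0 if xi >= 0 else 0.0) for xi in x]

results = {delta: window_estimate(x, y, 0.0, delta) for delta in (0.05, 0.2, 0.8)}
for delta, (n_obs, diff) in results.items():
    print(delta, n_obs, round(diff, 3))
```

A wider window uses more observations but drifts further from the true jump, because units far from the threshold differ in their value of the forcing variable.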

Parametric / Non-Parametric RD?

  • Because of these data issues, a parametric RD may be an option.
  • Focusing on the cutoff is still important, however.
  • As the selected window gets smaller and fewer observations remain, the complexity of the model needed to describe $f(x_i)$ should also go down.
  • As the observations zero in on $x_0$, the estimated effect of $D_i$ should remain stable.

Threats to RD Design, "Manipulation"

  • Employers in the Netherlands receive a tax deduction if a worker aged 40 or over takes on-the-job training or schooling.
  • A study examines whether this increases the amount of training workers receive.
  • Eligibility depends only on age, which gives a sharp regression discontinuity.
  • $D_i = 1$ if $S_i \geq S$ and $D_i = 0$ if $S_i < S$,
  • where $S_i$ is the age of individual $i$
  • and $S$ is the threshold age of 40.
  • Employers may react to the rule, for example by delaying training until a worker turns 40, which would show up as a drop in training at age 39.
  • The RD design may be invalid if individuals can act on the assignment variable.
  • Individuals may manipulate their treatment status.
  • For example, if individuals can influence their test score $x_i$, those who pass $x_0$ (and receive the merit award) may differ from those who did not.
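
One crude way to look for manipulation of the kind described above is to count observations in equal-width bins just below and just above the cutoff. This is only a rough sketch, not a formal density test. The scores, the cutoff of 50, and the "nudging" of 48s and 49s are invented for illustration.

```python
def counts_around_cutoff(values, cutoff, width):
    """Count observations in [cutoff - width, cutoff) and [cutoff, cutoff + width)."""
    below = sum(1 for v in values if cutoff - width <= v < cutoff)
    above = sum(1 for v in values if cutoff <= v < cutoff + width)
    return below, above


# Hypothetical scores: five people at each integer score from 40 to 59.
honest = [s for s in range(40, 60) for _ in range(5)]
# Hypothetical manipulation: everyone scoring 48 or 49 is nudged up to 50.
manipulated = [50 if s in (48, 49) else s for s in honest]

honest_counts = counts_around_cutoff(honest, cutoff=50, width=5)
manipulated_counts = counts_around_cutoff(manipulated, cutoff=50, width=5)
print(honest_counts, manipulated_counts)
```

A marked imbalance in the counts right at the threshold suggests that units sorted themselves across it, in which case those just above and just below are no longer comparable.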

Notes On Sharp RD:

  • There is no need to control for covariates, because covariates should not change discontinuously at the cutoff $x_0$.
  • If covariates are very different on the two sides of the cutoff, the design is wrong!
  • To check for a wrong design, replace $y_i$ with a covariate as the outcome.
  • If an "effect" on the covariate is observed near the threshold, there is some mistake in the design.
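
The covariate check described above can be sketched by running a local comparison of means with a covariate in place of the outcome. Both covariates below are simulated: one varies smoothly through the cutoff, the other jumps by 5 at the cutoff. The names, values, and the bandwidth of 0.1 are assumptions for illustration.

```python
def local_jump(x, values, x0, delta):
    """Difference in mean of `values` just right versus just left of x0."""
    right = [v for xi, v in zip(x, values) if x0 < xi < x0 + delta]
    left = [v for xi, v in zip(x, values) if x0 - delta < xi < x0]
    return sum(right) / len(right) - sum(left) / len(left)


x = [(i - 99.5) / 100 for i in range(200)]                  # cutoff at 0
smooth_cov = [30.0 + 0.5 * xi for xi in x]                  # no jump at cutoff
jumpy_cov = [30.0 + 0.5 * xi + (5.0 if xi >= 0 else 0.0) for xi in x]

jump_smooth = local_jump(x, smooth_cov, 0.0, 0.1)
jump_jumpy = local_jump(x, jumpy_cov, 0.0, 0.1)
print(round(jump_smooth, 3), round(jump_jumpy, 3))
```

A near-zero "effect" on a pre-determined covariate is consistent with a valid design. A large one signals that units on the two sides of the threshold differ for reasons other than treatment.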
