Lecture 6

Questions and Answers

Within the context of non-parametric Regression Discontinuity (RD) designs, consider the estimand represented by equation (11): $\lim_{\delta \rightarrow 0} [E(y_i | x_0 < x_i < x_0 + \delta) - E(y_i | x_0 - \delta < x_i < x_0)] = E(y_{1i} - y_{0i} | x_i = x_0)$. If empirical analysis reveals a statistically significant difference in means across the discontinuity, but the underlying distributions of covariates exhibit marked dissimilarity even within infinitesimally small bandwidths around $x_0$, what is the most critical epistemological challenge this poses for causal inference?

Compromised internal validity stemming from potential selection bias at the threshold, suggesting that the observed discontinuity in outcomes may be attributable to pre-existing differences rather than the treatment. (correct)
Reduced statistical power due to the sparsity of data points in the immediate vicinity of $x_0$, implying that while the causal interpretation remains valid, the precision of the estimate is fundamentally limited.
Violation of the conditional independence assumption, rendering the RD estimate solely correlational and devoid of causal interpretation irrespective of bandwidth refinement.
Inflated Type I error rates due to heteroskedasticity induced by the discontinuous treatment assignment, necessitating robust variance estimation but not invalidating the causal interpretation.
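
As a rough illustration of the estimand in equation (11), the sketch below computes the difference in mean outcomes within a small window on either side of the cutoff, then applies the same comparison to a pre-treatment covariate as a balance check. The function name `rd_difference_in_means`, the simulated data, and the covariate `w` are all hypothetical and not taken from the lecture.

```python
import numpy as np

def rd_difference_in_means(x, y, cutoff, delta):
    """Mean of y for cutoff < x < cutoff + delta minus mean of y for cutoff - delta < x < cutoff."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    above = (x > cutoff) & (x < cutoff + delta)
    below = (x > cutoff - delta) & (x < cutoff)
    if not above.any() or not below.any():
        raise ValueError("no observations on one side of the cutoff within delta")
    return y[above].mean() - y[below].mean()

# Hypothetical simulated data: a true jump of 2.0 at the cutoff x0 = 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20_000)
y = 1.0 + 0.5 * x + 2.0 * (x > 0) + rng.normal(0.0, 0.1, size=x.size)

# Hypothetical pre-treatment covariate that varies smoothly with x (no jump).
w = 0.3 * x + rng.normal(0.0, 0.1, size=x.size)

estimate = rd_difference_in_means(x, y, cutoff=0.0, delta=0.05)
balance = rd_difference_in_means(x, w, cutoff=0.0, delta=0.05)
print(f"outcome jump: {estimate:.3f}, covariate jump: {balance:.3f}")
```

In this simulation the covariate comparison should come out close to zero. A large covariate jump in real data would point to the selection problem described in the correct answer above.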

In the context of Lee's (2008) regression discontinuity design examining electoral advantage, assume a scenario where the probability of re-election exhibits a statistically significant discontinuity at the threshold of prior electoral victory. However, further investigation using high-resolution precinct-level data reveals systematic strategic manipulation of vote counts in close races, specifically around the win/loss threshold. How does this finding most critically challenge the validity of the non-parametric RD estimate?

It primarily introduces measurement error in the running variable (vote margin), attenuating the estimated treatment effect towards zero and reducing statistical power without fundamentally biasing the causal inference.
It generates a violation of the 'as-if random' assignment assumption at the threshold, as the assignment mechanism is no longer exogenous but influenced by strategic behavior correlated with potential outcomes. (correct)
It primarily affects the external validity of the RD estimate, limiting the generalizability of findings to electoral contexts characterized by the absence of strategic manipulation but not compromising internal validity within the studied context.
It necessitates the implementation of higher-order polynomial controls in a parametric RD framework to adequately model the non-linear relationship between vote margin and re-election probability, thereby restoring the validity of the causal estimate.

Consider the inherent trade-off in non-parametric Regression Discontinuity (RD) estimation between bias and variance as the bandwidth ($\delta$) is manipulated. If a researcher opts to substantially increase $\delta$ to mitigate concerns regarding data sparsity and enhance statistical power, what is the most pertinent econometric consequence concerning the validity of the RD estimate?

Enhanced robustness of the RD estimator to manipulation of the running variable, as a wider bandwidth dilutes the impact of localized strategic behavior around the threshold and strengthens the 'as-if random' assignment assumption.
Reduced precision of the RD estimator due to the inclusion of more heterogeneous observations, which can be effectively addressed by incorporating flexible non-parametric regression techniques within the expanded bandwidth.
Introduction of bias into the RD estimator as the linearity assumption within the bandwidth becomes increasingly untenable, and comparability of observations across the threshold diminishes with expanding bandwidth. (correct)
Increased variance of the RD estimator due to the inclusion of observations further from the threshold, necessitating the use of more robust standard error estimators but not introducing systematic bias.

In transitioning from non-parametric to parametric Regression Discontinuity (RD) approaches to address data limitations, particularly when employing polynomial regression, what fundamental assumption regarding the underlying conditional expectation function must be critically evaluated to ensure the validity of the parametric RD estimate?

The conditional expectation function must be sufficiently smooth and well-approximated by a low-order polynomial in the vicinity of the cutoff, such that the parametric specification adequately captures the local relationship between the running variable and the outcome. (C)

Consider the Dutch tax incentive example, where employers receive deductions for training workers above age 40. If a researcher employs a Regression Discontinuity Design to assess the causal impact of this incentive on training incidence, and discovers a significant discontinuity precisely at age 40, yet also finds compelling evidence of strategic hiring practices where firms preferentially hire slightly younger workers to circumvent the incentive eligibility threshold, what is the most critical consequence for the RD analysis?

The RD estimate will be biased upwards, overestimating the true effect of the incentive, as the strategic hiring behavior artificially inflates the apparent impact of the incentive by concentrating training within the eligible age range. (A)

Consider a scenario where a governmental policy mandates a significant subsidy for renewable energy projects if and only if a project proposal scores above a pre-determined environmental impact threshold on a standardized assessment. In the context of Regression Discontinuity Design, what critical assumption must be stringently verified to ensure the internal validity of causal inference regarding the subsidy's effect on project viability?

Potential confounding variables, such as technological readiness and access to capital, must be smoothly continuous functions of the environmental impact score at the threshold. (D)

In a sharp Regression Discontinuity (RD) design examining the effect of a scholarship award based on a standardized test score threshold, a researcher observes a statistically significant discontinuity in college enrollment rates at the threshold. However, prior to the test, students were offered intensive, but differentially effective, preparatory tutoring based on their predicted scores. Which econometric challenge poses the most substantial threat to the causal interpretation of the observed discontinuity?

Endogenous manipulation of the forcing variable, specifically strategic effort exertion to surpass the score threshold due to the anticipation of tutoring effectiveness. (B)

Consider a Fuzzy Regression Discontinuity (FRD) design where treatment assignment is imperfectly predicted by crossing a threshold on a continuous forcing variable. Identification in FRD relies on which core econometric principle to estimate the Local Average Treatment Effect (LATE)?

Instrumental Variables (IV) identification, where threshold crossing serves as an instrument for actual treatment receipt. (B)

In the context of non-parametric Regression Discontinuity, what is the primary rationale for employing local linear regression as opposed to global polynomial regression when estimating treatment effects at the threshold?

Local linear regression provides consistent estimates of the conditional mean function specifically at the discontinuity point, minimizing bias from functional form misspecification away from the threshold. (A)

When assessing the validity of a Regression Discontinuity design, researchers often conduct 'placebo' or 'falsification' tests. Which of the following best exemplifies a robust placebo test in a sharp RD context?

Examining for discontinuities in pre-treatment outcome variables at the treatment threshold, under the premise that no discontinuity should exist if the RD assumptions hold. (A)

In a scenario employing Regression Discontinuity to evaluate the impact of unionization on firm productivity, the threshold for mandatory employer recognition of a union is achieving 50% + 1 of worker votes in a secret ballot election. However, firms anticipating a close election outcome may engage in preemptive actions that affect worker voting behavior. Which potential bias is LEAST likely to be mitigated by a well-executed Regression Discontinuity design in this context?

Manipulation of the forcing variable (vote share) by firms strategically influencing worker turnout or voting intentions around the threshold. (D)

When graphically representing Regression Discontinuity results, researchers often employ scatter plots with superimposed regression lines on either side of the threshold. Beyond visually inspecting for a discontinuity, what is the most critical diagnostic information conveyed by these graphical representations for assessing RD validity?

The visual confirmation of similar functional forms and trends in the outcome variable on both sides of the threshold, except for the discrete jump at the threshold itself. (C)

In the context of Regression Discontinuity Design (RDD), what is the most critical methodological justification for attributing causality when analyzing the effect of a treatment $D_i$ based on a forcing variable $x_i$ and threshold $x_0$?

The assertion that individuals or units near the threshold $x_0$, despite differing in treatment status $D_i$, are statistically identical in expectation in the absence of treatment, effectively mimicking random assignment. (D)

Consider a scenario where a municipality implements a stringent environmental regulation ($D_i=1$) if the particulate matter concentration ($x_i$) exceeds 75 micrograms per cubic meter ($x_0 = 75$). If standard regression analysis demonstrates a spurious negative correlation between regulation and industrial output, what fundamental econometric problem does the Regression Discontinuity Design (RDD) primarily aim to mitigate in this context?

Omitted variable bias arising from unobserved confounders that simultaneously influence both particulate matter concentration and industrial output. (C)

In a sharp Regression Discontinuity Design (RDD), the treatment assignment rule is precisely defined as $D_i = 1[x_i > x_0]$. Which of the following best characterizes the nature of the discontinuity exploited for causal inference in this design?

A non-analytic point in the conditional expectation function $E[Y_i | x_i]$ at $x_0$, reflecting an abrupt change in the expected outcome due to the treatment. (A)

Lee (2008) employs Regression Discontinuity to assess incumbency advantage in US House elections, using the Democratic party's vote share in the previous election as the forcing variable. Assuming a sharp RDD and focusing solely on elections extremely close to the 50% vote share threshold, what specific causal estimand is Lee's design primarily intended to identify?

The local average treatment effect (LATE) of incumbency for Democratic candidates whose prior election vote share was infinitesimally above or below the 50% threshold. (D)

In the context of the air pollution and housing value Regression Discontinuity example, consider a county that is just barely classified as 'non-attainment' under the Clean Air Act due to its particulate matter concentration being infinitesimally above the 75 micrograms/cubic meter threshold. What is the most pertinent counterfactual scenario implicitly invoked by the RDD approach to estimate the causal effect of 'non-attainment' status on home values?

The hypothetical home values in the 'non-attainment' county if its particulate matter concentration had been infinitesimally below the 75 micrograms/cubic meter threshold, holding all else constant. (A)

Why does standard regression analysis demonstrably fail to establish a causal link in scenarios where Regression Discontinuity Design (RDD) is applicable, such as the relationship between air pollution and home values or incumbency and electoral success?

The primary deficiency lies in the endogeneity of the treatment variable; factors influencing treatment status (e.g., pollution levels, candidate quality) are often correlated with the outcome variable (e.g., home values, electoral success), violating the exogeneity assumption. (A)

The validity of Regression Discontinuity Design (RDD) crucially depends on the assumption of 'local continuity' or 'smoothness'. In the context of the incumbency advantage study by Lee (2008), what specific condition must be met for this assumption to hold, thereby ensuring the internal validity of the RDD?

The potential outcomes (election results in the absence of incumbency advantage) must be continuous functions of the prior Democratic vote share around the 50% threshold, except for the discontinuous effect of incumbency itself. (C)

Consider a hypothetical 'fuzzy' Regression Discontinuity Design where the assignment to environmental regulation ($D_i$) is probabilistically, but not deterministically, determined by exceeding the particulate matter threshold ($x_0 = 75$). Specifically, the probability of regulation increases sharply but not discretely at $x_0$. What is the most appropriate econometric approach to estimate the causal effect in this fuzzy RDD framework?

Two-stage least squares (2SLS) instrumental variable estimation, using the indicator function $1[x_i > x_0]$ as an instrument for the actual regulatory status $D_i$. (C)
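
To make the IV logic concrete, here is a minimal sketch of a Wald-type fuzzy RD estimate. It assumes a single binary instrument $1[x_i > x_0]$ and no additional controls, in which case 2SLS reduces to the ratio of the jump in the outcome to the jump in treatment probability. The function `fuzzy_rd_wald`, the window width, and the simulated regulation data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def fuzzy_rd_wald(x, d, y, cutoff, delta):
    """Ratio of the outcome jump to the treatment-probability jump within |x - cutoff| < delta."""
    x, d, y = (np.asarray(a, dtype=float) for a in (x, d, y))
    window = np.abs(x - cutoff) < delta
    z = x[window] > cutoff            # instrument: threshold crossing
    d_w, y_w = d[window], y[window]
    jump_y = y_w[z].mean() - y_w[~z].mean()   # reduced form
    jump_d = d_w[z].mean() - d_w[~z].mean()   # first stage
    if jump_d == 0:
        raise ValueError("no first-stage jump in treatment probability")
    return jump_y / jump_d

# Hypothetical simulation: regulation probability rises from 0.2 to 0.8 at x0 = 75,
# and the true effect of regulation on the outcome is -3.0.
rng = np.random.default_rng(2)
n = 40_000
x = rng.uniform(65.0, 85.0, size=n)
p_treat = np.where(x > 75.0, 0.8, 0.2)
d = (rng.random(n) < p_treat).astype(float)
y = 10.0 - 3.0 * d + 0.01 * (x - 75.0) + rng.normal(0.0, 0.5, size=n)

late = fuzzy_rd_wald(x, d, y, cutoff=75.0, delta=2.0)
print(f"estimated LATE: {late:.2f}")
```

The sketch ignores the slope of the outcome in the forcing variable inside the window, so it carries a small bias that shrinks as `delta` narrows.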

A researcher aims to study the impact of receiving a prestigious research grant ($D_i$) on the publication output of academics. Grant eligibility is determined by a percentile rank ($x_i$) in a national assessment, with a threshold at the 90th percentile ($x_0 = 90$). However, concerns arise that academics close to the 90th percentile may strategically 'game' the system, for instance, by artificially inflating their assessment scores. How might such strategic behavior potentially invalidate the Regression Discontinuity Design (RDD) assumptions and compromise causal inference in this scenario?

Gaming the system induces a violation of the local randomization assumption, as units just above and below the threshold are no longer comparable due to endogenous sorting around $x_0$. (D)

In the context of Lee's (2008) regression discontinuity design examining electoral success, what econometric challenge does the deterministic relationship between the forcing variable ($x_i$) and the treatment indicator ($D_i$) primarily address, and how does it refine causal inference compared to standard regression models?

It leverages the discontinuity at the threshold ($x_0$) to isolate the causal effect of prior electoral victory, thus circumventing endogeneity issues that arise from assuming $D_i$ is independent of unobservables. (A)

Considering the 'placebo'-type test in Lee (2008), what specific threat to the validity of the regression discontinuity design is this test intended to address, and how would a failure of the test undermine the causal interpretation of the results?

It aims to uncover potential manipulation of the forcing variable around the threshold, which, if systematic, would suggest that the observed discontinuity is not a genuine causal effect. (D)

In the context of estimating equation (3), $y_i = \alpha + f(x_i) + \rho D_i + \eta_i$, what are the potential consequences of misspecifying the functional form of $f(x_i)$ in a sharp regression discontinuity design, especially if the true relationship is highly nonlinear?

The estimator of $\rho$ will be biased, as the misspecified $f(x_i)$ may not adequately capture the true relationship between $x_i$ and $y_i$, confounding the estimate of the treatment effect. (A)

Suppose you suspect the presence of unobserved heterogeneity that moderates the effect of $D_i$ on $y_i$ within Lee's (2008) framework. How could you extend the basic RD model to account for this, and what specific statistical technique would you employ to test for the presence of such heterogeneity?

Include interaction terms between $D_i$ and pre-treatment covariates known to be correlated with the potential moderators, and use a Wald test to assess the significance of these interaction effects. (A)

In a scenario where the forcing variable ($x_i$) is discrete rather than continuous, what modifications to the standard sharp regression discontinuity design would be necessary to ensure valid causal inference, and how might this impact the interpretation of the estimated treatment effect?

Apply a fuzzy regression discontinuity design, using the discrete forcing variable as an instrument for the treatment, and interpret the estimated treatment effect as a local average treatment effect (LATE). (D)

Assume that there are non-linear effects operating on both sides of the cutoff. What estimation strategy would allow you to estimate equation (3) if you want to allow for different functional forms on either side of the cutoff point?

Use separate regressions on either side of the cutoff, where each regression uses flexible functional forms such as splines or local polynomials. (B)
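
One way to picture the separate-regressions strategy is the sketch below, which fits an ordinary polynomial (standing in for the splines or local polynomials mentioned above) on each side of the cutoff and takes the difference of the two fitted values at the cutoff as the estimate of $\rho$. The function name `rd_separate_polynomials`, the polynomial degree, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def rd_separate_polynomials(x, y, cutoff, degree=2):
    """Fit one polynomial per side of the cutoff and return the gap between them at the cutoff."""
    xc = np.asarray(x, dtype=float) - cutoff   # center so the cutoff sits at 0
    y = np.asarray(y, dtype=float)
    right = xc > 0
    coef_right = np.polyfit(xc[right], y[right], degree)
    coef_left = np.polyfit(xc[~right], y[~right], degree)
    # Evaluating each fit at 0 gives the predicted outcome at the cutoff from that side.
    return np.polyval(coef_right, 0.0) - np.polyval(coef_left, 0.0)

# Hypothetical simulation: different curvature on each side, true jump of 1.5 at x0 = 0.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=5_000)
treated = x > 0
y = np.where(treated, 2.5 + 0.5 * x - x**2, 1.0 + x + 2.0 * x**2)
y = y + rng.normal(0.0, 0.1, size=x.size)

rho_hat = rd_separate_polynomials(x, y, cutoff=0.0, degree=2)
print(f"estimated jump: {rho_hat:.2f}")
```

Centering the forcing variable at the cutoff is a convenience: the intercept of each fit is then directly the predicted outcome at $x_0$.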

Instead of just using Democrat wins, suppose we were to add Republican wins into the regression. How would this change equation (3)?

The model would have to be augmented with a term for the probability of being a Republican, interacted with the original parameters, to see whether the effects differ. (A)

Suppose that the outcome of the next election depends on the outcome of the previous election and some observed factors. What key identification assumption would this require?

The key identification assumption is that there are no unobserved factors that affect both current and future elections. (A)

What strategy can you use if you feel that there is high degree of manipulation of the forcing variable?

Use a fuzzy regression discontinuity design. (A)

In Lee's (2008) framework, what is the counterfactual in the regression discontinuity design?

The counterfactual is the potential outcome for units on one side of the threshold had they been on the other side. (D)

In the context of Regression Discontinuity (RD) designs, what is the primary risk associated with employing excessively flexible, high-order polynomials to model the relationship between the forcing variable and the outcome?

Amplification of minor data perturbations near the cutoff, potentially fabricating a spurious discontinuity where none truly exists. (C)

Consider a sharp Regression Discontinuity design. The model is given by: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \rho D_i + \eta_i$, where $x_i$ is the forcing variable and $D_i$ is the treatment indicator. If the true relationship between $x_i$ and $y_i$ is highly nonlinear and unsmooth, which of the following issues is most likely to arise despite the inclusion of higher-order polynomial terms?

The model will fail to capture the full extent of the discontinuity, leading to a biased estimate of the treatment effect $\rho$. (B)

In a sharp Regression Discontinuity (RD) design, a researcher estimates the treatment effect using a pooled regression with interaction terms, as represented by the equation (8). Under what condition would constraining the forcing variable functions to be the same on both sides of the cutoff be most likely to yield a substantially different result compared to allowing them to differ?

When there is a significant and qualitative difference in the relationship between the forcing variable and the outcome on either side of the cutoff, beyond just the treatment effect. (C)

Consider a scenario where a researcher applies a sharp Regression Discontinuity (RD) design to evaluate the impact of a scholarship program on student academic performance. The forcing variable is a standardized test score, with a cutoff determining eligibility. The researcher observes a statistically significant discontinuity at the cutoff using a high-order polynomial regression. However, upon closer examination, they find evidence that students near the cutoff are strategically altering their test-taking behavior to just qualify or just miss the scholarship. What is the most critical threat to the validity of the RD estimate in this context?

Violation of the ignorability assumption, as strategic manipulation introduces endogeneity that correlates the forcing variable with unobserved determinants of academic performance. (B)

In the context of Regression Discontinuity designs, what is the fundamental trade-off between using a very narrow bandwidth around the cutoff point and using a wider bandwidth?

Narrower bandwidths reduce bias but increase variance, while wider bandwidths reduce variance but increase bias. (C)
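
The trade-off can be seen in a small simulation. The sketch below, built on hypothetical data with a smooth nonlinear conditional expectation and a true jump of 1.0, computes the difference-in-means estimate and its standard error at three bandwidths. The helper `diff_in_means_with_se` and all parameter values are assumptions for illustration only.

```python
import numpy as np

def diff_in_means_with_se(x, y, cutoff, delta):
    """Difference in means across the cutoff within +/- delta, with its standard error."""
    above = (x > cutoff) & (x < cutoff + delta)
    below = (x < cutoff) & (x > cutoff - delta)
    est = y[above].mean() - y[below].mean()
    se = np.sqrt(y[above].var(ddof=1) / above.sum() + y[below].var(ddof=1) / below.sum())
    return est, se

# Hypothetical data: the outcome follows sin(2x) plus a jump of 1.0 at x0 = 0.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50_000)
true_jump = 1.0
y = np.sin(2.0 * x) + true_jump * (x > 0) + rng.normal(0.0, 0.5, size=x.size)

results = {delta: diff_in_means_with_se(x, y, 0.0, delta) for delta in (0.02, 0.1, 0.5)}
for delta, (est, se) in results.items():
    print(f"delta={delta}: estimate={est:.3f}, bias={est - true_jump:+.3f}, se={se:.3f}")
```

As the window widens, more observations enter and the standard error falls, but the slope of the underlying function is increasingly absorbed into the estimated jump.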

In a fuzzy Regression Discontinuity design, the treatment assignment is imperfectly determined by the forcing variable crossing the threshold. Which of the following econometric techniques is most suitable for consistently estimating the local average treatment effect (LATE) in this scenario?

Instrumental Variables (IV) regression, where the forcing variable crossing the threshold serves as the instrument for the actual treatment received. (B)

A researcher is employing a Regression Discontinuity design to estimate the effect of a new job training program on employment rates. The eligibility for the program is determined by an individual's score on an entrance exam. The researcher notices that individuals with scores just above the eligibility threshold have significantly different baseline characteristics (e.g., education level, prior work experience) compared to those just below the threshold. What is the most appropriate course of action to address this issue and ensure the validity of the RD estimates?

Use a non-parametric RD approach with a very narrow bandwidth around the threshold to minimize the impact of differing baseline characteristics. (A)

Consider a scenario where a researcher is using a sharp Regression Discontinuity design to examine the effect of a policy intervention at a specific threshold. However, the density of observations is not continuous around the threshold, exhibiting a noticeable 'jump' or discontinuity. What is the most likely explanation for this phenomenon, and what does it imply for the validity of the RD design?

This indicates a violation of the continuity assumption and suggests that individuals are manipulating the forcing variable to be on a specific side of the threshold, thus threatening the validity of the RD design. (D)
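
A crude way to look for such a jump in the density is to compare the number of observations just below and just above the threshold. The sketch below is a simplified stand-in for a formal density discontinuity test: it assumes the density is roughly flat within the window, so that each observation in the window is equally likely to fall on either side when there is no sorting. The function `density_jump_zscore` and the simulated score data are hypothetical.

```python
import numpy as np

def density_jump_zscore(x, cutoff, delta):
    """z-score for unequal counts in [cutoff - delta, cutoff) versus [cutoff, cutoff + delta)."""
    x = np.asarray(x, dtype=float)
    n_above = int(((x >= cutoff) & (x < cutoff + delta)).sum())
    n_below = int(((x < cutoff) & (x >= cutoff - delta)).sum())
    total = n_above + n_below
    if total == 0:
        raise ValueError("no observations within delta of the cutoff")
    # With no sorting, n_above ~ Binomial(total, 0.5), so the count gap has variance total.
    return (n_above - n_below) / np.sqrt(total)

# Hypothetical scores with a cutoff at 50; half of the near-misses are bumped over the line.
rng = np.random.default_rng(4)
scores = rng.uniform(0.0, 100.0, size=20_000)
near_miss = (scores >= 48.0) & (scores < 50.0)
bump = near_miss & (rng.random(scores.size) < 0.5)
manipulated = np.where(bump, scores + 2.0, scores)

z_clean = density_jump_zscore(scores, cutoff=50.0, delta=2.0)
z_manipulated = density_jump_zscore(manipulated, cutoff=50.0, delta=2.0)
print(f"clean z = {z_clean:.2f}, manipulated z = {z_manipulated:.2f}")
```

A large positive z-score signals bunching just above the threshold, which is the pattern the answer above describes.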

In the context of Regression Discontinuity (RD) designs, under what specific circumstances would a local linear regression approach around the cutoff be preferred over a global polynomial regression of higher order?

When the functional form is misspecified and high-order polynomials might fit the data too closely, potentially creating a spurious discontinuity. (D)

A researcher aims to leverage a Regression Discontinuity design to ascertain the impact of a specialized educational program on students' standardized test outcomes. Program admission hinges on an entrance exam score, marking the forcing variable. The analyst posits that the causal parameter estimation is susceptible to bias stemming from the non-random sorting around the cutoff during exams. Which of the following methodologies aims to deal with individuals manipulating their actual score to be either above or below a certain threshold?

Conducting manipulation testing through density discontinuity analysis. (C)

Flashcards

Regression Discontinuity (RD)

A design that exploits knowledge of rules determining treatment, creating experiment-like conditions.

Basic RD Idea

Comparing outcomes for similar individuals who fall just on either side of a threshold that determines treatment.