Questions and Answers
Within the context of non-parametric Regression Discontinuity (RD) designs, consider the estimand represented by equation (11): $\lim_{\delta \rightarrow 0} [E(y_i | x_0 < x_i < x_0 + \delta) - E(y_i | x_0 - \delta < x_i < x_0)] = E(y_{1i} - y_{0i} | x_i = x_0)$. If empirical analysis reveals a statistically significant difference in means across the discontinuity, but the underlying distributions of covariates exhibit marked dissimilarity even within infinitesimally small bandwidths around $x_0$, what is the most critical epistemological challenge this poses for causal inference?
- Compromised internal validity stemming from potential selection bias at the threshold, suggesting that the observed discontinuity in outcomes may be attributable to pre-existing differences rather than the treatment. (correct)
- Reduced statistical power due to the sparsity of data points in the immediate vicinity of $x_0$, implying that while the causal interpretation remains valid, the precision of the estimate is fundamentally limited.
- Violation of the conditional independence assumption, rendering the RD estimate solely correlational and devoid of causal interpretation irrespective of bandwidth refinement.
- Inflated Type I error rates due to heteroskedasticity induced by the discontinuous treatment assignment, necessitating robust variance estimation but not invalidating the causal interpretation.
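The comparison in equation (11), and the covariate check the question describes, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full estimator: the function name `window_means_gap`, the toy data, and the fixed bandwidth `delta` are all hypothetical and chosen only for the example.

```python
from statistics import mean

def window_means_gap(x, v, x0, delta):
    """Mean of v just above x0 minus mean of v just below x0.

    Uses the windows (x0, x0 + delta) and (x0 - delta, x0) from equation (11).
    """
    above = [vi for xi, vi in zip(x, v) if x0 < xi < x0 + delta]
    below = [vi for xi, vi in zip(x, v) if x0 - delta < xi < x0]
    if not above or not below:
        raise ValueError("no observations on one side of the threshold")
    return mean(above) - mean(below)

# Hypothetical toy data: running variable, outcome, and a pre-treatment covariate.
x = [-0.3, -0.2, -0.1, 0.1, 0.2, 0.3]
y = [1.0, 1.1, 1.2, 2.2, 2.3, 2.4]
w = [10.0, 10.0, 10.0, 14.0, 14.0, 14.0]  # measured before treatment

outcome_gap = window_means_gap(x, y, x0=0.0, delta=0.25)
covariate_gap = window_means_gap(x, w, x0=0.0, delta=0.25)
print(outcome_gap, covariate_gap)
```

A large `covariate_gap` at the threshold is the warning sign: units on either side already differ before treatment, so the outcome gap cannot be attributed to the treatment alone.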
In the context of Lee's (2008) regression discontinuity design examining electoral advantage, assume a scenario where the probability of re-election exhibits a statistically significant discontinuity at the threshold of prior electoral victory. However, further investigation using high-resolution precinct-level data reveals systematic strategic manipulation of vote counts in close races, specifically around the win/loss threshold. How does this finding most critically challenge the validity of the non-parametric RD estimate?
- It primarily introduces measurement error in the running variable (vote margin), attenuating the estimated treatment effect towards zero and reducing statistical power without fundamentally biasing the causal inference.
- It generates a violation of the 'as-if random' assignment assumption at the threshold, as the assignment mechanism is no longer exogenous but influenced by strategic behavior correlated with potential outcomes. (correct)
- It primarily affects the external validity of the RD estimate, limiting the generalizability of findings to electoral contexts characterized by the absence of strategic manipulation but not compromising internal validity within the studied context.
- It necessitates the implementation of higher-order polynomial controls in a parametric RD framework to adequately model the non-linear relationship between vote margin and re-election probability, thereby restoring the validity of the causal estimate.
Consider the inherent trade-off in non-parametric Regression Discontinuity (RD) estimation between bias and variance as the bandwidth ($\delta$) is manipulated. If a researcher opts to substantially increase $\delta$ to mitigate concerns regarding data sparsity and enhance statistical power, what is the most pertinent econometric consequence concerning the validity of the RD estimate?
- Enhanced robustness of the RD estimator to manipulation of the running variable, as a wider bandwidth dilutes the impact of localized strategic behavior around the threshold and strengthens the 'as-if random' assignment assumption.
- Reduced precision of the RD estimator due to the inclusion of more heterogeneous observations, which can be effectively addressed by incorporating flexible non-parametric regression techniques within the expanded bandwidth.
- Introduction of bias into the RD estimator as the linearity assumption within the bandwidth becomes increasingly untenable, and comparability of observations across the threshold diminishes with expanding bandwidth. (correct)
- Increased variance of the RD estimator due to the inclusion of observations further from the threshold, necessitating the use of more robust standard error estimators but not introducing systematic bias.
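The bias side of this trade-off can be made concrete with noiseless data. The sketch below assumes a hypothetical outcome $y_i = 2x_i + D_i$ with $x_0 = 0$, so the true jump is 1; the function name `rd_difference_in_means` and the grid of points are illustrative assumptions. Because the outcome has a slope, a simple difference in means picks up that slope as bias, and the bias grows with $\delta$.

```python
from statistics import mean

def rd_difference_in_means(x, y, x0, delta):
    """Difference in mean outcomes between (x0, x0 + delta) and (x0 - delta, x0)."""
    above = [yi for xi, yi in zip(x, y) if x0 < xi < x0 + delta]
    below = [yi for xi, yi in zip(x, y) if x0 - delta < xi < x0]
    return mean(above) - mean(below)

# Hypothetical noiseless data: y = 2*x + D, with D = 1 when x > 0 (true jump = 1).
x = [k / 100 for k in range(-100, 101) if k != 0]
y = [2 * xi + (1.0 if xi > 0 else 0.0) for xi in x]

# Estimate the jump with progressively wider bandwidths.
estimates = {delta: rd_difference_in_means(x, y, 0.0, delta)
             for delta in (0.1, 0.5, 1.0)}
for delta, est in estimates.items():
    print(f"delta={delta}: estimate={est:.2f}, bias={est - 1.0:.2f}")
```

In this toy setup the bias equals $2\delta$, so widening the window trades lower variance (more observations) for an estimate that drifts away from the true jump.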
In transitioning from non-parametric to parametric Regression Discontinuity (RD) approaches to address data limitations, particularly when employing polynomial regression, what fundamental assumption regarding the underlying conditional expectation function must be critically evaluated to ensure the validity of the parametric RD estimate?
Consider the Dutch tax incentive example, where employers receive deductions for training workers above age 40. If a researcher employs a Regression Discontinuity Design to assess the causal impact of this incentive on training incidence, and discovers a significant discontinuity precisely at age 40, yet also finds compelling evidence of strategic hiring practices where firms preferentially hire slightly younger workers to circumvent the incentive eligibility threshold, what is the most critical consequence for the RD analysis?
Consider a scenario where a governmental policy mandates a significant subsidy for renewable energy projects if and only if a project proposal scores above a pre-determined environmental impact threshold on a standardized assessment. In the context of Regression Discontinuity Design, what critical assumption must be stringently verified to ensure the internal validity of causal inference regarding the subsidy's effect on project viability?
In a sharp Regression Discontinuity (RD) design examining the effect of a scholarship award based on a standardized test score threshold, a researcher observes a statistically significant discontinuity in college enrollment rates at the threshold. However, prior to the test, students were offered intensive, but differentially effective, preparatory tutoring based on their predicted scores. Which econometric challenge poses the most substantial threat to the causal interpretation of the observed discontinuity?
Consider a Fuzzy Regression Discontinuity (FRD) design where treatment assignment is imperfectly predicted by crossing a threshold on a continuous forcing variable. Identification in FRD relies on which core econometric principle to estimate the Local Average Treatment Effect (LATE)?
In the context of non-parametric Regression Discontinuity, what is the primary rationale for employing local linear regression as opposed to global polynomial regression when estimating treatment effects at the threshold?
When assessing the validity of a Regression Discontinuity design, researchers often conduct 'placebo' or 'falsification' tests. Which of the following best exemplifies a robust placebo test in a sharp RD context?
In a scenario employing Regression Discontinuity to evaluate the impact of unionization on firm productivity, the threshold for mandatory employer recognition of a union is achieving 50% + 1 of worker votes in a secret ballot election. However, firms anticipating a close election outcome may engage in preemptive actions that affect worker voting behavior. Which potential bias is LEAST likely to be mitigated by a well-executed Regression Discontinuity design in this context?
When graphically representing Regression Discontinuity results, researchers often employ scatter plots with superimposed regression lines on either side of the threshold. Beyond visually inspecting for a discontinuity, what is the most critical diagnostic information conveyed by these graphical representations for assessing RD validity?
In the context of Regression Discontinuity Design (RDD), what is the most critical methodological justification for attributing causality when analyzing the effect of a treatment $D_i$ based on a forcing variable $x_i$ and threshold $x_0$?
Consider a scenario where a municipality implements a stringent environmental regulation ($D_i=1$) if the particulate matter concentration ($x_i$) exceeds 75 micrograms per cubic meter ($x_0 = 75$). If standard regression analysis demonstrates a spurious negative correlation between regulation and industrial output, what fundamental econometric problem does the Regression Discontinuity Design (RDD) primarily aim to mitigate in this context?
In a sharp Regression Discontinuity Design (RDD), the treatment assignment rule is precisely defined as $D_i = 1[x_i > x_0]$. Which of the following best characterizes the nature of the discontinuity exploited for causal inference in this design?
Lee (2008) employs Regression Discontinuity to assess incumbency advantage in US House elections, using the Democratic party's vote share in the previous election as the forcing variable. Assuming a sharp RDD and focusing solely on elections extremely close to the 50% vote share threshold, what specific causal estimand is Lee's design primarily intended to identify?
In the context of the air pollution and housing value Regression Discontinuity example, consider a county that is just barely classified as 'non-attainment' under the Clean Air Act due to its particulate matter concentration being infinitesimally above the 75 micrograms/cubic meter threshold. What is the most pertinent counterfactual scenario implicitly invoked by the RDD approach to estimate the causal effect of 'non-attainment' status on home values?
Why does standard regression analysis demonstrably fail to establish a causal link in scenarios where Regression Discontinuity Design (RDD) is applicable, such as the relationship between air pollution and home values or incumbency and electoral success?
The validity of Regression Discontinuity Design (RDD) crucially depends on the assumption of 'local continuity' or 'smoothness'. In the context of the incumbency advantage study by Lee (2008), what specific condition must be met for this assumption to hold, thereby ensuring the internal validity of the RDD?
Consider a hypothetical 'fuzzy' Regression Discontinuity Design where the assignment to environmental regulation ($D_i$) is probabilistically, but not deterministically, determined by exceeding the particulate matter threshold ($x_0 = 75$). Specifically, the probability of regulation increases sharply but not discretely at $x_0$. What is the most appropriate econometric approach to estimate the causal effect in this fuzzy RDD framework?
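One way to picture estimation in the fuzzy case is the Wald ratio: the jump in the mean outcome at the threshold divided by the jump in the probability of treatment. The sketch below is an illustration under assumptions, not a definitive method: the names `jump` and `fuzzy_rd_wald`, the fixed window `delta`, and the toy data (built so that the effect of $D_i$ is $-4$) are hypothetical. Without covariates, this ratio coincides with two-stage least squares using the threshold indicator as the instrument inside the window.

```python
from statistics import mean

def jump(x, v, x0, delta):
    """Mean of v in (x0, x0 + delta) minus mean of v in (x0 - delta, x0)."""
    above = [vi for xi, vi in zip(x, v) if x0 < xi < x0 + delta]
    below = [vi for xi, vi in zip(x, v) if x0 - delta < xi < x0]
    return mean(above) - mean(below)

def fuzzy_rd_wald(x, y, d, x0, delta):
    """Wald estimate: outcome jump divided by treatment-probability jump."""
    first_stage = jump(x, d, x0, delta)
    if first_stage == 0:
        raise ValueError("treatment probability does not change at the threshold")
    return jump(x, y, x0, delta) / first_stage

# Hypothetical toy data around x0 = 75: regulation is more likely above the
# threshold (3 of 4 units) than below it (1 of 4 units), but not certain.
x = [73.0, 74.0, 74.5, 74.8, 75.2, 75.5, 76.0, 77.0]
d = [0, 0, 1, 0, 1, 1, 0, 1]
y = [10.0 - 4.0 * di for di in d]  # constructed so the effect of D is -4

late = fuzzy_rd_wald(x, y, d, x0=75.0, delta=3.0)
print(late)
```

Here the outcome jumps by $-2$ while the treatment probability jumps by $0.5$, so the ratio recovers the constructed effect of $-4$.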
A researcher aims to study the impact of receiving a prestigious research grant ($D_i$) on the publication output of academics. Grant eligibility is determined by a percentile rank ($x_i$) in a national assessment, with a threshold at the 90th percentile ($x_0 = 90$). However, concerns arise that academics close to the 90th percentile may strategically 'game' the system, for instance, by artificially inflating their assessment scores. How might such strategic behavior potentially invalidate the Regression Discontinuity Design (RDD) assumptions and compromise causal inference in this scenario?
In the context of Lee's (2008) regression discontinuity design examining electoral success, what econometric challenge does the deterministic relationship between the forcing variable ($x_i$) and the treatment indicator ($D_i$) primarily address, and how does it refine causal inference compared to standard regression models?
Considering the 'placebo'-type test in Lee (2008), what specific threat to the validity of the regression discontinuity design is this test intended to address, and how would a failure of the test undermine the causal interpretation of the results?
In the context of estimating equation (3), $y_i = \alpha + f(x_i) + \rho D_i + \eta_i$, what are the potential consequences of misspecifying the functional form of $f(x_i)$ in a sharp regression discontinuity design, especially if the true relationship is highly nonlinear?
Suppose you suspect the presence of unobserved heterogeneity that moderates the effect of $D_i$ on $y_i$ within Lee's (2008) framework. How could you extend the basic RD model to account for this, and what specific statistical technique would you employ to test for the presence of such heterogeneity?
In a scenario where the forcing variable ($x_i$) is discrete rather than continuous, what modifications to the standard sharp regression discontinuity design would be necessary to ensure valid causal inference, and how might this impact the interpretation of the estimated treatment effect?
Assume that there are non-linear effects operating on both sides of the cutoff. What estimation strategy would allow you to estimate equation (3) if you want to allow for different functional forms on either side of the cutoff point?
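A common way to let the functional form differ across the cutoff is to fit the forcing-variable function separately on each side and compare the two fits at $x_0$. The sketch below uses linear terms only for brevity; the names `fit_line` and `rd_separate_slopes` and the toy data are assumptions made for illustration. With linear terms, this is numerically equivalent to a pooled regression of $y_i$ on $D_i$, $(x_i - x_0)$, and $D_i (x_i - x_0)$, where the coefficient on $D_i$ is the jump; higher-order terms in $(x_i - x_0)$ interacted with $D_i$ would extend the same pattern.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - b * xb, b

def rd_separate_slopes(x, y, x0):
    """Jump at x0 from separate linear fits on each side of the cutoff.

    The forcing variable is centered at x0, so each intercept is the
    fitted value of y at the cutoff. Treatment is D = 1[x > x0].
    """
    left = [(xi - x0, yi) for xi, yi in zip(x, y) if xi <= x0]
    right = [(xi - x0, yi) for xi, yi in zip(x, y) if xi > x0]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Hypothetical toy data: y = 1 + 0.5*x below the cutoff, y = 3 + 2*x above it.
x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [-0.5, 0.0, 0.5, 5.0, 7.0, 9.0]

rho_hat = rd_separate_slopes(x, y, x0=0.0)
print(rho_hat)
```

Centering at $x_0$ is the design choice that matters: it makes the difference in intercepts equal to the estimated discontinuity at the cutoff rather than at $x = 0$.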
Instead of just using Democrat wins, suppose we were to add Republican wins into the regression. How would this change equation (3)?
Suppose that the outcome of the next election depends on the outcome of the previous election and some observed factors. What key identification assumption would this require?
What strategy can you use if you feel that there is a high degree of manipulation of the forcing variable?
In Lee's (2008) framework, what is the counterfactual in the regression discontinuity design?
In the context of Regression Discontinuity (RD) designs, what is the primary risk associated with employing excessively flexible, high-order polynomials to model the relationship between the forcing variable and the outcome?
Consider a sharp Regression Discontinuity design. The model is given by: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \rho D_i + \eta_i$, where $x_i$ is the forcing variable and $D_i$ is the treatment indicator. If the true relationship between $x_i$ and $y_i$ is highly nonlinear and unsmooth, which of the following issues is most likely to arise despite the inclusion of higher-order polynomial terms?
In a sharp Regression Discontinuity (RD) design, a researcher estimates the treatment effect using a pooled regression with interaction terms, as represented by the equation (8). Under what condition would constraining the forcing variable functions to be the same on both sides of the cutoff be most likely to yield a substantially different result compared to allowing them to differ?
Consider a scenario where a researcher applies a sharp Regression Discontinuity (RD) design to evaluate the impact of a scholarship program on student academic performance. The forcing variable is a standardized test score, with a cutoff determining eligibility. The researcher observes a statistically significant discontinuity at the cutoff using a high-order polynomial regression. However, upon closer examination, they find evidence that students near the cutoff are strategically altering their test-taking behavior to just qualify or just miss the scholarship. What is the most critical threat to the validity of the RD estimate in this context?
In the context of Regression Discontinuity designs, what is the fundamental trade-off between using a very narrow bandwidth around the cutoff point and using a wider bandwidth?
In a fuzzy Regression Discontinuity design, the treatment assignment is imperfectly determined by the forcing variable crossing the threshold. Which of the following econometric techniques is most suitable for consistently estimating the local average treatment effect (LATE) in this scenario?
A researcher is employing a Regression Discontinuity design to estimate the effect of a new job training program on employment rates. The eligibility for the program is determined by an individual's score on an entrance exam. The researcher notices that individuals with scores just above the eligibility threshold have significantly different baseline characteristics (e.g., education level, prior work experience) compared to those just below the threshold. What is the most appropriate course of action to address this issue and ensure the validity of the RD estimates?
Consider a scenario where a researcher is using a sharp Regression Discontinuity design to examine the effect of a policy intervention at a specific threshold. However, the density of observations is not continuous around the threshold, exhibiting a noticeable 'jump' or discontinuity. What is the most likely explanation for this phenomenon, and what does it imply for the validity of the RD design?
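A jump in the density of the forcing variable can be screened for with a very crude check: count observations in a narrow window on each side of the threshold and ask whether the split is consistent with a fair coin. The sketch below is an assumption-laden illustration, not a formal density test: the names `binomial_two_sided_p` and `density_check`, the window `delta`, and the toy scores around a hypothetical cutoff of 90 are all invented for the example.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-12)))

def density_check(x, x0, delta):
    """Counts just below and just above x0, plus the binomial p-value."""
    n_above = sum(1 for xi in x if x0 < xi < x0 + delta)
    n_below = sum(1 for xi in x if x0 - delta < xi < x0)
    p_value = binomial_two_sided_p(n_above, n_above + n_below)
    return n_below, n_above, p_value

# Hypothetical toy scores: far more units land just above the cutoff than below.
scores = [89.5] * 2 + [90.5] * 18

n_below, n_above, p_value = density_check(scores, x0=90.0, delta=1.0)
print(n_below, n_above, p_value)
```

A very small p-value suggests bunching on one side of the threshold, which is what sorting or manipulation of the forcing variable would produce; it does not by itself establish who manipulated or why.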
In the context of Regression Discontinuity (RD) designs, under what specific circumstances would a local linear regression approach around the cutoff be preferred over a global polynomial regression of higher order?
A researcher aims to leverage a Regression Discontinuity design to ascertain the impact of a specialized educational program on students' standardized test outcomes. Program admission hinges on an entrance exam score, marking the forcing variable. The analyst posits that the causal parameter estimation is susceptible to bias stemming from the non-random sorting around the cutoff during exams. Which of the following methodologies aims to deal with individuals manipulating their actual score to be either above or below a certain threshold?
Flashcards
Regression Discontinuity (RD)
A design that exploits knowledge of rules determining treatment, creating experiment-like conditions.
Basic RD Idea
Considering a threshold where similar individuals have different outcomes.
RD Analogy
Treating above-threshold and below-threshold groups like treatment and control groups.
RD Example: College Financial Aid
RD Example: School Class Size
RD Example: Union Elections
Sharp Regression Discontinuity
Sharp Regression Discontinuity
Signup and view all the flashcards
Non-Attainment Classification
Non-Attainment Classification
Signup and view all the flashcards
Problem with Standard Regression
Problem with Standard Regression
Signup and view all the flashcards
Sharp Regression Discontinuity (RD)
Sharp Regression Discontinuity (RD)
Signup and view all the flashcards
Treatment Rule in Sharp RD
Treatment Rule in Sharp RD
Signup and view all the flashcards
Forcing Variable (in RD)
Forcing Variable (in RD)
Signup and view all the flashcards
Threshold (x0) in RD
Threshold (x0) in RD
Signup and view all the flashcards
Incumbency Advantage
Incumbency Advantage
Signup and view all the flashcards
Incumbency Resources
Incumbency Resources
Signup and view all the flashcards
Causal Effect of Incumbency
Causal Effect of Incumbency
Signup and view all the flashcards
Polynomial Modeling in RD
Polynomial Modeling in RD
Signup and view all the flashcards
Variables in Lee (2008) Model
Variables in Lee (2008) Model
Signup and view all the flashcards
Differing Functions in Sharp RD
Differing Functions in Sharp RD
Signup and view all the flashcards
Pooled Regression in RD
Pooled Regression in RD
Signup and view all the flashcards
Problem with Flexible Functions
Problem with Flexible Functions
Signup and view all the flashcards
Importance Functional RD
Importance Functional RD
Signup and view all the flashcards
Nonlinearity Problem in RD
Nonlinearity Problem in RD
Signup and view all the flashcards
Non Parametric RD
Non Parametric RD
Signup and view all the flashcards
Intuition of RD-design
Intuition of RD-design
Signup and view all the flashcards
Correct specification RD
Correct specification RD
Signup and view all the flashcards
Lee (2008) RD Study
Lee (2008) RD Study
Signup and view all the flashcards
Election Winner Indicator (Di)
Election Winner Indicator (Di)
Signup and view all the flashcards
Vote Share Margin (xi)
Vote Share Margin (xi)
Signup and view all the flashcards
Treatment Variable (Di)
Treatment Variable (Di)
Signup and view all the flashcards
Forcing Variable (xi)
Forcing Variable (xi)
Signup and view all the flashcards
RD Plot Interpretation
RD Plot Interpretation
Signup and view all the flashcards
RD Placebo Test
RD Placebo Test
Signup and view all the flashcards
Di as deterministic function
Di as deterministic function
Signup and view all the flashcards
RD core idea
RD core idea
Signup and view all the flashcards
Flexible Function f(xi)
Flexible Function f(xi)
Signup and view all the flashcards
RD estimation
RD estimation
Signup and view all the flashcards
Limit in RD Estimation
Limit in RD Estimation
Signup and view all the flashcards
RD Example: Close Elections
RD Example: Close Elections
Signup and view all the flashcards
Data Requirements in RD
Data Requirements in RD
Signup and view all the flashcards
Study Notes
Regression Discontinuity (RD) Overview
- Regression discontinuity designs are considered by some to be the closest one can get to randomized experiments in social sciences.
- The RD design has gained traction in varied fields like labor economics, crime studies, education, environmental science, and health economics.
- RD relies upon explicit understanding of the rules that dictate treatment.
- The basis of RD lies in the concept that certain rules in a rule-based system are somewhat arbitrary, which presents valuable experimental opportunities.
- RD can be implemented in two ways: "sharp" and "fuzzy."
RD Introduction
- The principle behind RD is straightforward.
- RD considers a threshold where marginal individuals exhibit very different outcomes.
- Run a regression in a setting where outcomes appear to jump discontinuously at the threshold.
- Above-threshold and below-threshold cases should be treated as treatment and control groups respectively, akin to a standard experiment.
RD Roadmap
- Encompasses examples of RD.
- Addresses "sharp" regression discontinuity.
- Discusses parametric and non-parametric RD methods.
- Examines threats to the RD design.
- Mentions "fuzzy" RD.
- Usage of graphs in RD is covered.
RD Examples
- Examples of discontinuity exist throughout society.
- College financial aid in the US, specifically the PSAT/NMSQT, offers a clear instance, noting that generally the top 16,000 scorers secure a scholarship.
- Even a slight difference in test scores can lead to a significant difference in scholarship awards.
- One can study the causal effect of scholarship on college enrollment.
- Maimonides' Rule in Israel, which stipulates a class cannot have more than 40 students, presents another instance.
- A school with 40 students will have one class, but 41 students will be split into two, with approximately 20-21 students each.
- The causal effect of class size on study results is also a subject of interest here.
- In union elections, the NLRB (US National Labor Relations Board) arranges an election if workers wish to unionize.
- The vote outcome determines whether the employer has to recognize the union (more than 50% of the votes) or not (50% or less).
- You can study questions related to the influence of unionization on business metrics like survival, employment figures, output measurements, and wages.
- Air quality regulations in the US, guided by the Clean Air Act's National Ambient Air Quality Standards, classify counties as "non-attainment" if the mean concentration of 5 specific pollutants goes over 75 micrograms per cubic meter.
- It is common to study the link between pollution and house prices.
- Applying standard regression analysis might pose challenges.
- The general problem is that test scores aren't random, and neither is class size or air pollution.
- A child at the 94.9th percentile of test scores generally isn't much different from one at the 95th percentile.
- Is there a real difference between a school with 40 kids and one with 41 kids?
- Close to the threshold, which side a unit lands on is likely to be as good as random.
Sharp RD (formally)
- Treatment $D_i$ is a discontinuous function of a continuous variable $x_i$.
- Given the rule, treatment $D_i$ switches on once $x_i$ reaches the threshold $x_0$.
- The treatment rule is: $D_i = D_i(x_i) = 1[x_i \geq x_0]$.
- $1[x_i \geq x_0]$ is an indicator function: it equals one if the condition inside the brackets is met and zero otherwise.
- If the value of $x_i$ (the forcing variable) is greater than or equal to the threshold $x_0$, then $D_i = 1$.
- Similarly, if the value of $x_i$ is less than $x_0$, then $D_i = 0$.
- $x_i$ is called the "forcing" variable, also known as the "assignment" or "running" variable.
- The researcher knows the threshold value $x_0$.
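The treatment rule above can be sketched in a few lines of Python. The function name `treatment`, the example scores, and the cutoff of 50 are illustrative assumptions, not values from the notes.

```python
def treatment(x: float, cutoff: float) -> int:
    """Sharp RD assignment: 1 when the forcing variable reaches the cutoff, else 0."""
    return 1 if x >= cutoff else 0


# Hypothetical forcing-variable values with an assumed cutoff of 50.
scores = [42.0, 49.9, 50.0, 63.5]
assignments = [treatment(s, 50.0) for s in scores]
print(assignments)  # [0, 0, 1, 1]
```

Because the rule is deterministic, nothing other than the forcing variable and the cutoff is needed to recover treatment status.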
Regression Discontinuity Example (Lee 2008)
- Lee (2008) asked whether a Democratic candidate for a seat in the US House of Representatives has an advantage when the party won the seat in the previous election.
- Incumbents may benefit from factors such as greater voter satisfaction or stronger get-out-the-vote efforts.
- Representatives may also use the resources of their office to gain an advantage.
- The parameter of interest is the causal effect of incumbency on retaining a seat in Congress.
- Lee (2008) studied how the probability of winning an election relates to the vote share in the previous election.
- The winner of an election is determined by the rule $D_i = 1$ if $x_i \geq 0$ and $D_i = 0$ otherwise, where $x_i$ is the vote share margin (the difference between the Democratic and Republican vote shares).
- $x_i$ is the forcing variable and $D_i$ is the treatment variable.
- The RD plot shows the probability of a Democratic win against the vote share margin in the previous election.
- The probability of winning jumps sharply where the previous vote share margin crosses 0.
RD Formalized
- The model is: $y_i = \alpha + \beta x_i + \rho D_i + \eta_i$.
- The difference between this model and earlier regression models is that $D_i$ is a deterministic function of the covariate $x_i$.
- To get the causal effect, RD separates the nonlinear, discontinuous function $1[x_i \geq x_0]$ from the smooth function of $x_i$.
Sharp RD with Flexible Functions
- It is important to use a flexible, smooth function when estimating the RD model: $y_i = \alpha + f(x_i) + \rho D_i + \eta_i$.
- As long as $f(x_i)$ is continuous in a neighborhood of $x_0$, the model can still be estimated, even with a flexible functional form.
- One option is to model $f(x_i)$ with a $p$th-order polynomial.
- The model can then be written as: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p + \rho D_i + \eta_i$.
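As an illustration of the polynomial specification, the sketch below estimates the jump $\rho$ by ordinary least squares in plain Python. The helpers `solve_linear_system`, `ols`, and `polynomial_rd` are hypothetical names, and the data are synthetic and noise-free (quadratic trend, jump of 3 at an assumed cutoff of 0), so this is a minimal sketch rather than a recommended estimation routine.

```python
def solve_linear_system(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def polynomial_rd(x, y, cutoff, order):
    """Estimate rho in y = alpha + sum_k beta_k * x^k + rho * D + error."""
    rows = []
    for xi in x:
        d = 1.0 if xi >= cutoff else 0.0  # sharp RD treatment indicator
        rows.append([1.0] + [xi ** k for k in range(1, order + 1)] + [d])
    return ols(rows, y)[-1]  # the last coefficient belongs to D


# Synthetic, noise-free data (an assumption for illustration only).
x = [i / 10 for i in range(-20, 21)]
y = [1 + 2 * xi + 0.5 * xi ** 2 + (3 if xi >= 0 else 0) for xi in x]
rho_hat = polynomial_rd(x, y, cutoff=0.0, order=2)
print(rho_hat)  # close to 3, the jump built into the data
```

With real data the result depends on the polynomial order chosen, which is why the functional form matters so much in parametric RD.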
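RD: Separate functions on each side of the cutoff
- The function of the forcing variable is allowed to differ on each side of the cutoff point: $f_0(x_i)$ and $f_1(x_i)$.
- $E[y_{0i} | x_i] = f_0(x_i) = \alpha + \beta_{01}\tilde{x}_i + \beta_{02}\tilde{x}_i^2 + \cdots + \beta_{0p}\tilde{x}_i^p$
- $E[y_{1i} | x_i] = f_1(x_i) = \alpha + \rho + \beta_{11}\tilde{x}_i + \beta_{12}\tilde{x}_i^2 + \cdots + \beta_{1p}\tilde{x}_i^p$, where $\tilde{x}_i = x_i - x_0$.
- Because the forcing variable is centered at $x_0$, $\rho$ is the jump at the cutoff, and one regression on data from both sides estimates the treatment effect.
- This can be written as: $y_i = \alpha + \beta_{01}\tilde{x}_i + \beta_{02}\tilde{x}_i^2 + \cdots + \beta_{0p}\tilde{x}_i^p + \rho D_i + \beta_1^* D_i \tilde{x}_i + \beta_2^* D_i \tilde{x}_i^2 + \cdots + \beta_p^* D_i \tilde{x}_i^p + \eta_i$, where $\beta_j^* = \beta_{1j} - \beta_{0j}$.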
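For the linear case ($p = 1$), the fully interacted pooled regression gives the same estimate of the jump as fitting a separate line on each side of the cutoff and comparing the two intercepts at the centered forcing variable. The sketch below takes that route. The function names and the synthetic data (cutoff 50, jump 1.5, different slopes on each side) are assumptions for illustration.

```python
def fit_line(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_separate_lines(x, y, cutoff):
    """Jump at the cutoff from separate linear fits on each side."""
    centered = [xi - cutoff for xi in x]  # x-tilde = x - x0
    left = [(xc, yi) for xc, yi in zip(centered, y) if xc < 0]
    right = [(xc, yi) for xc, yi in zip(centered, y) if xc >= 0]
    alpha_left, _ = fit_line([p[0] for p in left], [p[1] for p in left])
    alpha_right, _ = fit_line([p[0] for p in right], [p[1] for p in right])
    # With a centered forcing variable, each intercept is the fitted value
    # at the cutoff, so their difference is the estimated jump.
    return alpha_right - alpha_left


# Synthetic, noise-free data: slope 0.3 below the cutoff, 0.8 above, jump 1.5.
x = [40 + 0.5 * i for i in range(41)]
y = [2 + 0.3 * (xi - 50) if xi < 50 else 3.5 + 0.8 * (xi - 50) for xi in x]
jump = rd_separate_lines(x, y, cutoff=50.0)
print(jump)  # close to 1.5
```

Centering the forcing variable at the cutoff is what lets the intercept difference be read directly as the treatment effect.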
Limitations of Sharp RD
- Too much flexibility can itself be a problem: a very flexible function may produce a discontinuity that is not really there.
- Gelman and Imbens advise against using high-order polynomials in regression discontinuity designs.
- When implementing the RD design, use linear, quadratic, or local fits instead.
- Parametric RD requires the relationship between the outcome and the forcing variable to be specified correctly.
- The choice of functional form is therefore critical.
- A jump that is attributed to the treatment may in fact be unaccounted-for nonlinearity.
- An alternative is to look only at data very close to the discontinuity.
Non-parametric RD
- Using only data close to the discontinuity stays closest to the intuition behind the RD design.
- People just above and just below the threshold are alike, except that those above it are "treated" because of where they fall relative to the threshold.
- One looks at data within a small window around the threshold, $[x_0 - \delta, x_0 + \delta]$, for a small value of $\delta$.
- One comparison is: $E[y_i | x_0 - \delta < x_i < x_0] \approx E[y_{0i} | x_i = x_0]$.
- The other is: $E[y_i | x_0 < x_i < x_0 + \delta] \approx E[y_{1i} | x_i = x_0]$.
- Together these give: $\lim_{\delta \rightarrow 0} \left( E[y_i | x_0 < x_i < x_0 + \delta] - E[y_i | x_0 - \delta < x_i < x_0] \right) = E[y_{1i} - y_{0i} | x_i = x_0]$.
- The comparison is between average outcomes in the immediate neighborhoods to the left and right of the threshold.
- This frees the estimate from reliance on a model specification.
- It boils down to comparing individuals on either side of the threshold.
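The comparison of neighborhood means can be sketched as follows. The function name `local_mean_difference` and the synthetic data (a slope of 1 in the forcing variable plus a jump of 2 at an assumed cutoff of 0) are illustrative assumptions. The example also shows that a wider window picks up part of the underlying trend, while a narrower one gets closer to the jump.

```python
def local_mean_difference(x, y, cutoff, delta):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [yi for xi, yi in zip(x, y) if cutoff < xi < cutoff + delta]
    below = [yi for xi, yi in zip(x, y) if cutoff - delta < xi < cutoff]
    if not above or not below:
        raise ValueError("no observations on one side of the cutoff within delta")
    return sum(above) / len(above) - sum(below) / len(below)


# Synthetic, noise-free data: y rises one-for-one with x and jumps by 2 at x0 = 0.
x = [i / 100 for i in range(-100, 101) if i != 0]
y = [xi + (2.0 if xi > 0 else 0.0) for xi in x]

wide = local_mean_difference(x, y, cutoff=0.0, delta=0.5)
narrow = local_mean_difference(x, y, cutoff=0.0, delta=0.05)
print(wide, narrow)  # roughly 2.5 and 2.05; the true jump is 2
```

The narrow window uses only four observations on each side here, which is the price paid for the smaller bias.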
Non-parametric RD: Lee (2008) Example
- The aim is to compare the probability of winning the next election between candidates who won and candidates who lost the previous one.
- The comparison is restricted to candidates who barely won or barely lost.
- Such candidates are likely to be very similar.
Non-parametric RD: Issues
- One problem is the data requirement.
- Getting enough observations may be hard when only those close to the threshold are used!
- The bandwidth $\delta$ can be increased, but then observations further from the threshold are compared, and these are less comparable.
Parametric / Non-Parametric RD?
- Because of these data issues, a parametric RD may be an option.
- Focusing on the area around the cutoff is still important, however.
- As the selected window gets smaller and fewer observations remain, the number of polynomial terms needed to describe $f(x_i)$ should also go down.
- As the observations zero in on $x_0$, the estimated effect of $D_i$ should remain stable.
Threats to RD Design: "Manipulation"
- Employers in the Netherlands receive a tax deduction if a worker aged 40 or older takes on-the-job training or schooling.
- A study examined whether this increases the amount of training workers receive.
- Eligibility depends only on age, which makes this a sharp regression discontinuity.
- $D_i = 1$ if $S_i \geq S$ and $D_i = 0$ if $S_i < S$.
- $S_i$ is the age of individual $i$.
- $S$ is the age threshold of 40.
- Employers may shift the timing of training: training can drop at age 39 because employers have less incentive to train workers just below the threshold!
- The RD design may be invalid if individuals can act on the assignment variable.
- Individuals may manipulate their treatment status.
- Individuals may manipulate their test score $x_i$, so those who pass $x_0$ (and receive the merit award) may differ from those who did not.
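One rough way to look for manipulation is to check whether observations bunch on one side of the cutoff. The sketch below counts observations in a narrow window on each side and computes an exact two-sided binomial p-value under the assumption that, without manipulation, an observation in the window is equally likely to fall on either side. This is a crude illustration, not a formal density test. The function names, the example scores, and the window width are hypothetical.

```python
from math import comb


def side_counts(x, cutoff, delta):
    """Count observations within delta below and within delta at or above the cutoff."""
    below = sum(1 for xi in x if cutoff - delta <= xi < cutoff)
    above = sum(1 for xi in x if cutoff <= xi < cutoff + delta)
    return below, above


def bunching_p_value(n_below, n_above):
    """Exact two-sided binomial p-value for H0: both sides are equally likely.

    Assumption: the density of the forcing variable is roughly flat in the window.
    """
    n = n_below + n_above
    probs = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[n_above]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


# Hypothetical exam scores with an assumed cutoff of 50.
scores = [47, 48, 48, 49, 50, 50, 50, 50, 51, 51, 51, 52, 52, 52]
below, above = side_counts(scores, cutoff=50, delta=3)
p_value = bunching_p_value(below, above)
print(below, above, p_value)
```

A small p-value would suggest that more observations sit on one side than chance alone would produce, which is a reason to look more closely at how the forcing variable is determined.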
Notes On Sharp RD:
- There is no need for control variables, because no important covariate should change at the cutoff $x_0$.
- If covariates are very different on the two sides of the cutoff, the design is invalid!
- To check the design, replace $y_i$ with a covariate as the outcome.
- If an effect on a covariate is observed near the threshold, there is a mistake in the design.
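The covariate check described above (a placebo test) can be sketched by running the same comparison with a pre-treatment covariate in place of the outcome. The function name `placebo_jump`, the window width, and both covariate series are made-up illustrations.

```python
def mean(values):
    return sum(values) / len(values)


def placebo_jump(x, covariate, cutoff, delta):
    """Difference in mean covariate values just above versus just below the cutoff."""
    above = [c for xi, c in zip(x, covariate) if cutoff <= xi < cutoff + delta]
    below = [c for xi, c in zip(x, covariate) if cutoff - delta <= xi < cutoff]
    return mean(above) - mean(below)


# Hypothetical forcing variable centered at an assumed cutoff of 0.
x = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
balanced_covariate = [10.0, 12.0, 11.0, 11.0, 11.0, 10.0, 12.0, 11.0]
sorted_covariate = [10.0, 12.0, 11.0, 11.0, 15.0, 16.0, 14.0, 15.0]

print(placebo_jump(x, balanced_covariate, cutoff=0.0, delta=0.5))  # 0.0
print(placebo_jump(x, sorted_covariate, cutoff=0.0, delta=0.5))    # 4.0
```

A jump near zero is what a valid design predicts for a covariate. A clear jump, as in the second series, points to sorting around the threshold.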