Questions and Answers
In the context of causal inference, what does the potential outcome $y_{1i}$ represent?
- The outcome for individual _i_ if they receive the treatment. (correct)
- The outcome for individual _i_ if they do not receive the treatment.
- The outcome for individual _i_ regardless of whether they receive the treatment.
- The average outcome for all individuals in the study.
In the Hanna et al. (2017) study, what was identified as the 'treatment' when examining Jakarta's high-occupancy vehicle (HOV) restrictions?
- The lifting of the HOV restriction. (correct)
- Increased enforcement of HOV restrictions.
- The average commute time of drivers in Jakarta.
- The number of vehicles on Jakarta's roads during peak hours.
In the study of Jakarta’s HOV restrictions, which of the following best exemplifies the 'population' of interest?
- All residents of Jakarta, including those who do not drive.
- Drivers on trips who might use the affected routes. (correct)
- Companies operating vehicle fleets in Jakarta.
- All drivers globally.
What serves as a counterfactual in the Hanna et al. (2017) study of Jakarta's HOV restrictions?
How is the outcome of interest defined in the study examining the impact of Jakarta's high-occupancy vehicle (HOV) restriction?
In the context of examining the relationship between age and income, why might simply relying on summary statistics like correlation ($\rho$) and OLS regression estimates ($\beta_0$, $\beta_1$) be insufficient?
When analyzing the relationship between two variables, such as age and income, what is the primary advantage of using nonparametric estimates for $E[Y|X=x]$ compared to relying solely on linear regression?
Suppose you are analyzing the relationship between education level (X) and annual salary (Y). An OLS regression yields the equation $\hat{Y} = 20000 + 5000X$. What does the coefficient 5000 represent?
What does a correlation coefficient ($\rho$) of 0.28 between two variables X and Y suggest?
Why might adding a small amount of random noise to a scatter plot of data points be useful?
Why is informal reasoning generally discouraged in serious economic research when constructing counterfactuals?
In the context of causal research, what is the primary purpose of constructing a counterfactual?
A researcher is studying the impact of a new job training program on employment rates. Which approach would involve comparing the employment outcomes of individuals who participated in the program with those of a similar group who did not?
What is the fundamental assumption when using a control group to establish causality?
Which of the following methods for constructing counterfactuals involves using a quantitative model to simulate alternative scenarios?
A researcher aims to evaluate the impact of a new agricultural technique on crop yield. To do this, they compare fields where the technique was applied with similar fields where it was not. What is a critical assumption the researcher must make to ensure the validity of their causal inference?
A city implements a new policy aimed at reducing traffic congestion during peak hours. To assess the policy's effectiveness, transportation officials compare traffic flow during peak hours after the policy was implemented with traffic flow during the same hours before the policy. What unaddressed factor could undermine the validity of the findings?
What is the primary challenge in causal inference when trying to determine the treatment effect for an individual?
What does the Average Treatment Effect (ATE) represent?
The Average Treatment Effect on the Treated (ATT) is defined as:
Why might the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT) differ?
What is the primary purpose of constructing counterfactuals in causal research?
A researcher is studying the effect of a new teaching method on student test scores. They find that students who were taught with the new method (the treated group) scored significantly higher than students taught with the traditional method. However, students in the treated group were also more motivated and had access to better resources. What validity issue does this study likely face?
A study finds a significant Average Treatment Effect (ATE) of a job training program in a specific city. However, when policymakers attempt to implement the same program in a rural area with a different demographic, they observe minimal impact. What type of validity is most likely compromised in this scenario?
In a study examining the impact of a new drug on blood pressure, researchers use a randomized controlled trial. After analyzing the data, they find a statistically significant reduction in blood pressure for the treatment group compared to the control group. However, some participants in the control group also started exercising regularly, which could also lower blood pressure. What is the most appropriate next step for the researchers?
Flashcards
Marginal Distribution
A distribution that examines the probabilities of a single variable, disregarding others.
Conditional Distribution
A distribution showing the probability of a variable given the value of another variable.
Conditional Expectation Function
Shows the expected value of a variable, given the value of another variable.
Covariance and Correlation
Ordinary Least Squares (OLS) Regression
Treatment (Hanna et al., 2017)
Counterfactuals (Hanna et al., 2017)
Population (Hanna et al., 2017)
Outcome (Hanna et al., 2017)
Treatment Status (Di)
Counterfactual
Informal Reasoning
Structural Models
Control Groups
Causal Questions
Constructing Counterfactuals
Treatment Effect
Treatment Effect (Individual)
Average Treatment Effect (ATE)
ATE for the Treated (ATT)
E[a|b]
Internal vs. External Validity (Treatment Effect)
ATE vs. ATT: Why the difference?
Study Notes
- The lecture focuses on causality, potential outcomes, and research design
- Today's learning objectives include understanding causality, counterfactuals, potential outcomes, treatment effects, and selection bias
- A key objective is understanding how randomization eliminates selection bias
Logistics
- Homework 1 deadline has been extended to Thursday, January 16 at 23:59
- Homework 2 is posted on MyCourses, due next Wednesday, January 22
- Stata code for previous examples is available on MyCourses under "More Materials"
Quick Recap
- Topics covered so far: joint distributions and associations between variables, including marginal and conditional distributions, the conditional expectation function, scatter plots, covariance/correlation, regression, and OLS
Association Between Age and Income
- How income varies with age can be visualized using a scatter plot
- Let's use measures of dependence
- The correlation between income and age is 0.28
- Estimating a regression of income (Y) on age (X) yields estimates for the intercept ($\beta_0$ = 10,654) and the age coefficient ($\beta_1$ = 297)
- The estimates are in euros, while the y-axis is in thousands of euros
- On their own, these summary statistics are not very informative: both describe only a linear relationship between age and income
Flexibility of Regression
- A multivariate regression model can provide a more flexible fit: $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
- The estimates that fit the data best are: $\beta_0$ = -37,549, $\beta_1$ = 2,857, $\beta_2$ = -31
- In general, looking at the data in several ways is good practice
- Correlation measures the linear dependence between two variables
- With multiple explanatory variables, "goodness of fit" is measured instead
Measuring How Well a Model Fits the Data
- The coefficient of determination ($R^2$) is typically used
- For a regression model $Y = f(X) + \varepsilon$, where $X$ is a vector of independent variables: $R^2 = \sum_i (f(X_i) - \bar{Y})^2 / \sum_i (Y_i - \bar{Y})^2$
- $R^2$ measures the share of the variability of the dependent variable that the model explains
- An $R^2$ of 1 means a perfect prediction
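As a rough illustration of the goodness-of-fit comparison, the sketch below fits a linear and a quadratic regression to simulated age and income data and computes $R^2$ for each. The data, the coefficients used to generate them, and the helper name `r_squared` are assumptions made for this example; they are not the lecture's data set or code.

```python
import numpy as np

def r_squared(y, fitted):
    """Explained variation divided by total variation around the mean of y."""
    y_bar = y.mean()
    return np.sum((fitted - y_bar) ** 2) / np.sum((y - y_bar) ** 2)

rng = np.random.default_rng(0)
age = rng.uniform(20, 65, size=500)
# Hypothetical hump-shaped income profile plus noise (not the lecture's data).
income = -37_000 + 2_900 * age - 31 * age**2 + rng.normal(0, 5_000, size=500)

# Least-squares fits: degree 1 is Y = b0 + b1*X, degree 2 adds the b2*X^2 term.
linear_fit = np.polyval(np.polyfit(age, income, deg=1), age)
quadratic_fit = np.polyval(np.polyfit(age, income, deg=2), age)

r2_linear = r_squared(income, linear_fit)
r2_quadratic = r_squared(income, quadratic_fit)
print(f"R^2 linear:    {r2_linear:.3f}")
print(f"R^2 quadratic: {r2_quadratic:.3f}")
```

Because the quadratic model nests the linear one, its $R^2$ can never be lower on the same sample, so a higher $R^2$ alone does not show that the extra term is meaningful.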
Causal Questions
- Prior lectures focused on descriptive questions such as "What is the joint distribution of X and Y?" to measure the actual state of the world
- Often there is a need to evaluate the impact of X on Y, for example: education on earnings, marketing on sales, a carbon tax on emissions, R&D on innovation, or fiscal stimulus on unemployment
- Causal questions are about comparing counterfactual states, like "how would Y change if we changed X?"
- Y is the outcome, X is the treatment
Counterfactual States
- Counterfactual states are almost impossible to observe for any single individual/entity
- Everything else remains the same except the treatment (ceteris paribus)
- Possible with lab experiments in natural sciences
- More challenging when studying people
- Counterfactuals can be found for the average person in a sample
Identifying Causal Relationships via Experiments
- The lecture focuses on answering causal questions using experimental designs
- It is helpful to design comparisons to test for causality
- It can be helpful to consider the ideal experiment
- The ideal experiment is a helpful benchmark for naturally occurring (quasi-) experiments
- Natural experiments involving randomization will be discussed next week
Elements of Causal Questions
- (1) Treatment: Impact of
- (2) Counterfactual: Impact in comparison to
- (3) Outcome: Impact on
- (4) Population: Impact for
- Worksheet (WS) 3.1: Think of a causal question and write it down
- Example causal question from Jakarta: what is the impact of the high-occupancy vehicle restriction, versus unrestricted road travel, on travel times?
The Causal Question
- What is the impact of Jakarta's high-occupancy vehicle restrictions on drivers' travel times, compared with unrestricted road travel?
- Treatment: lifting of the HOV restriction
- Counterfactuals: (a) the state of the world prior to the treatment; (b) Google's prediction under "typical traffic conditions"
- Population: drivers taking those routes
- Outcome: the delay per km travelled
Potential Outcomes
- The focus is on binary (0/1) treatments; $D_i$ denotes the treatment status of individual $i$
- $D_i = 1$ if she receives the treatment, $D_i = 0$ if she does not
- Outcomes are denoted by $y$
- Observed outcome: $y_i = y_{1i}$ if $D_i = 1$, and $y_i = y_{0i}$ if $D_i = 0$
- $y_{1i}$ is the outcome of individual $i$ if she is treated
- $y_{0i}$ is her outcome if she is not treated
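The case-by-case definition above is often written as a single switching equation. This is the standard textbook form rather than something stated in these notes; $y_i$ denotes the observed outcome:

$$
y_i = D_i \, y_{1i} + (1 - D_i) \, y_{0i}
$$

Setting $D_i = 1$ returns $y_{1i}$ and setting $D_i = 0$ returns $y_{0i}$, so only one of the two potential outcomes ever appears in the data for a given individual.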
Treatment Effect
- The treatment effect for individual $i$ is the difference between $y_{1i}$ and $y_{0i}$
- The fundamental problem of causal inference: $y_{1i}$ and $y_{0i}$ can never both be observed for the same unit
Average Treatment Effect
- The treatment effect for an individual cannot be identified
- But average treatment effects can be estimated
- ATE = $E[y_{1i} - y_{0i}]$
- ATT = $E[y_{1i} - y_{0i} \mid D_i = 1]$
- Why the distinction between ATE and ATT matters: the treatment effect may be different for those who get the treatment
- Internal validity: do we learn the true effect for the population that is being treated?
- External validity: can we extrapolate to other populations?
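A small simulation can make the gap between ATE and ATT concrete. The sketch below assumes heterogeneous individual effects and a made-up selection rule in which units with larger gains are more likely to be treated; all numbers and variable names are hypothetical, and both potential outcomes are visible only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical potential outcomes; in real data only one is observed per unit.
y0 = rng.normal(10, 2, size=n)
effect = rng.normal(1.0, 1.0, size=n)  # individual treatment effect y1 - y0
y1 = y0 + effect

# Assumed selection rule: units with larger gains are more likely to be treated.
d = (effect + rng.normal(0, 1, size=n) > 1.0).astype(int)

ate = np.mean(y1 - y0)            # average effect over everyone
att = np.mean((y1 - y0)[d == 1])  # average effect among the treated only
print(f"ATE: {ate:.2f}")
print(f"ATT: {att:.2f}")
```

Under this selection rule the ATT exceeds the ATE, because the treated are disproportionately those who benefit most. If treatment status were unrelated to the individual effect, the two would coincide.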
Approaches for Constructing Counterfactuals
- Answering a causal question requires knowing what would have happened to the treatment group without the treatment
- Researchers construct a counterfactual to determine causation
- Approaches for constructing the counterfactual include:
- Informal reasoning (guesses): not acceptable in economics!
- Structural models: use quantitative models to construct alternative states of the world
- Control groups: compare the treatment group with a similar control group
Research Designs and Control Groups
- To approximate what would have happened to the treated group without the treatment, we use a comparable control group
- In economics, this research-design (experimental) approach estimates the counterfactual $E[y_{0i} \mid D_i = 1]$
- Invalid control groups lead to selection bias
- The key question is whether the control group provides a valid counterfactual
How to Find Control Group
- Hanna et al. (2017) include a few different routes to consider
Regression Estimation
- Dependent (outcome) variable: travel delay on a segment, on date $d$ and departure hour $h$
- Independent (explanatory) variable: an indicator for whether date $d$ is after the lifting of the policy, denoted $Post_d$
- $Post_d = 0$: "control" group, observations before the policy was lifted
- $Post_d = 1$: "treatment" group, observations after the policy was lifted
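With a single 0/1 regressor, the OLS slope is just the difference between the after and before means. The sketch below shows this on simulated data; the delay values, the assumed jump of 0.5, and the variable names are hypothetical, and the regression is deliberately simpler than whatever specification the study itself estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Hypothetical data: post = 0 before the policy was lifted, 1 after.
post = rng.integers(0, 2, size=n)
# Assumed delay (minutes per km) with an assumed jump of 0.5 after the lifting.
delay = 2.0 + 0.5 * post + rng.normal(0, 0.3, size=n)

# OLS of delay on a constant and the Post indicator.
X = np.column_stack([np.ones(n), post])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, delay, rcond=None)

diff_in_means = delay[post == 1].mean() - delay[post == 0].mean()
print(f"OLS slope on Post:   {beta_hat:.3f}")
print(f"Difference in means: {diff_in_means:.3f}")
```

The intercept estimates the mean delay in the before period, and the slope estimates the change after the lifting. Whether that change is causal depends on how good the before period is as a counterfactual.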
How Good is the Counterfactual?
- What if the event was timed to coincide with other changes in outcomes, as opposed to the changes being caused by the treatment?
- What would outcomes have been in the absence of the policy?
- Can you average delay at a?
- Key assumption: without the treatment, the treated observations would resemble the control observations
- WS 3.3: to answer causal questions using data, what is a reasonable "control group" for the treatment group?
Selection Bias
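- As the sample size increases, sample averages approximate population averages
- $\text{Avg}[y_i \mid D_i = 1] - \text{Avg}[y_i \mid D_i = 0] \to E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0]$: the difference between the treatment-group and control-group means, which need not equal the ATE or the ATT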
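One way to see how the difference in group means relates to the ATT is to add and subtract $E[y_{0i} \mid D_i = 1]$. The decomposition below is the standard textbook one and uses only the notation defined in these notes:

$$
\begin{aligned}
E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0]
&= E[y_{1i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0] \\
&= \underbrace{E[y_{1i} - y_{0i} \mid D_i = 1]}_{\text{ATT}}
 + \underbrace{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
$$

The selection-bias term is zero when the treated and control groups would have had the same average outcome without treatment.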
Randomized Selection
- Randomly assign people to treatment and control, creating comparable, unbiased groups
- Potential outcomes then have the same expectation in both groups
- The control group tells us what would have happened to the treated group without treatment
- WS 3.4: Identify issues when there is selection for given controls
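The contrast between self-selection and random assignment can be sketched with a simulation. Everything here is hypothetical: a constant treatment effect of 2, an assumed selection rule in which units with a high untreated outcome are more likely to be treated, and a coin flip for the randomized case.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical potential outcomes with a constant treatment effect of 2.
y0 = rng.normal(10, 3, size=n)
y1 = y0 + 2.0

def difference_in_means(d):
    """Observed treated mean minus observed control mean for assignment d."""
    y = np.where(d == 1, y1, y0)  # only one potential outcome is observed per unit
    return y[d == 1].mean() - y[d == 0].mean()

# Self-selection (assumed rule): units with high y0 are more likely to be treated.
d_selected = (y0 + rng.normal(0, 3, size=n) > 10).astype(int)
# Randomization: a coin flip that ignores the potential outcomes.
d_random = rng.integers(0, 2, size=n)

naive_selected = difference_in_means(d_selected)
naive_random = difference_in_means(d_random)
selection_bias = y0[d_selected == 1].mean() - y0[d_selected == 0].mean()

print(f"Difference in means, self-selected: {naive_selected:.2f}")
print(f"Difference in means, randomized:    {naive_random:.2f}")
print(f"Selection bias under self-selection: {selection_bias:.2f}")
```

With self-selection the naive comparison overstates the effect by the gap in untreated outcomes between the two groups. With random assignment that gap vanishes in expectation, and the difference in means recovers the true effect.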
Summary
- Causality requires comparison of counterfactual states
- Only one is observed
- Control groups are used to infer what the treatment group's outcomes would have been in the absence of treatment
- Selection bias occurs when treatment and control groups are not comparable
- Randomization eliminates selection bias: in expectation, the only difference between the groups is that one receives the treatment
Upcoming
- Pre-class assignment 4, which involves reading and summarizing an article
- Homework 2 is due Jan 22 at 14:00
- Tips: don't wait until the last minute; building the skills to work with data sets takes time
- Session 2 is tomorrow
- Help for the course can be found on "Zulip"
- Participation is incentivized
Description
These questions cover potential outcomes in causal inference, with specific reference to the Hanna et al. (2017) study of Jakarta's high-occupancy vehicle (HOV) restrictions, counterfactuals, and the limitations of linear regression.