ECOM216 Applied Econometrics with R - 2024-25 PDF
Document Details
Queen Mary University of London
2024
QMUL
Dr. Pedro CL Souza
Summary
Welcome to the Applied Econometrics with R module (ECOM216), 2024-25, offered by Queen Mary University of London. The module provides a solid foundation in applied econometric methods, gives students hands-on experience in R, and covers general regression concepts, causal inference, and forecasting with time series models.
Full Transcript
1. Welcome! (and Introduction to R)
ECOM216 - Applied Econometrics with R
Dr. Pedro CL Souza
Queen Mary University of London, School of Economics and Finance
2024-25

The module: Welcome!
What is this module about?
- Solid foundation in applied econometric methods, equipping students with the essential skills to analyze data, make informed decisions, and prepare for more advanced studies in econometrics and finance.
- Hands-on experience using econometric tools in R, with a practical emphasis on solving real-world problems.

The module: Solid foundations in applied econometric methods
- Present key methods in applied econometrics (both micro and macro)
- Most useful, state-of-the-art methods for identifying the causal impacts of policies
- Several methods, such as randomised controlled trials, causal regressions, differences-in-differences
- Time series, model selection and prediction
- Strong emphasis on the interpretation of the econometric models
- Foundation for further study

The module: Hands-on experience in R
- Applied work is hard!
- To get answers to questions, we need not just to understand the methods, but also to know how to implement them
- Just like learning how to drive a car, a theoretical understanding of the methods does not immediately mean that we know how they work in practice
- In this sense, experimenting with methods and coding in R are integral parts of the module

The module: Hands-on experience in R
- Assumes no prior knowledge of R
- Learning a computer language requires an active effort by the learner
- All lectures will have associated code books, used in lectures, tutorials and for homework
- Code book: 01 - Welcome and Introduction to R.R, with fully reproducible examples and real data for real questions

The module
- The module's dual purpose (methods + coding) is exciting but challenging
- Lectures: focused mostly on the theory and methods, with a good pinch of live coding, real coding and examples
- During lectures, we will flick between slides and the code notebook as required
- Tutorials: focused mostly on the coding aspect.
- Clarify and expand on the code covered in lectures
- Introduce new coding tools and methods not covered in lectures
- Opportunity to interact and ask questions

The module: Part I - General concepts (2 lectures)
- Introduction to R, core R, tidyverse and main tools
- Loading and manipulating data
- Regression analysis with lm and fixest, and interpretation
- Review of some of the basic OLS properties
- Exogeneity and causality

The module: Part II - Causality (4 lectures)
- How to make regressions interpretable as a causal effect
- Rubin causal model, the randomisation ideal, implementation in regression format
- Regressions with controls and the conditional version of Rubin’s model
- From controls to fixed effects and matching on observables
- Differences-in-differences: identification, basics and upgraded regressions
- Staggered treatment, dynamics, and continuous treatment

The module: Part III - Forecasting & time series models (4 lectures)
- Introduction to time series and stationarity
- Autocorrelation function and partial autocorrelation function
- Autoregressive and moving-average processes
- Dickey-Fuller tests for stationarity
- ARMA models, seasonal terms and external variables
- Box-Jenkins methodology for model selection

The module: Assessment
- 80% exam
- 20% online quiz via QMplus, in either week 8 or week 9; specific date to be announced by the student office
- Is coding examinable? Yes, within reason
- Coding is an integral part of the module
- May ask you to read and interpret the purpose of the code, or outcomes such as tables and figures (e.g.
and determine next steps in the analysis)
- May ask you to suggest coding strategies to deal with a particular problem or question
- Will not ask you to write code live

The module: Feedback and support
- Office hours on Tuesdays at 4pm, additional times to be released if necessary
- Booking required and sessions held via Zoom: https://calendar.app.google/Wnc6SacM4KiUJTDZ9
- Think ahead around busy periods (e.g. quiz or examination times)
- Heavily reformulated module: feedback is incredibly useful
- Email: [email protected]

Introduction to R

What is R?
- “R is a free software environment for statistical computing and graphics”
- https://www.r-project.org/ and https://cran.r-project.org/
- Born circa 1997 in the Statistics Department of the University of Auckland, as a successor to S
- Open-source, free, user-supplied, with a strong and very active community
- Over 20,000 packages (!)
- Some packages create their own syntaxes or add important features
- If R is a language, then the packages can be dialects or accents we must be fluent in

What is R?
- R is a living language
- Pros: a user-contributed ecosystem and a strong community ensure a fast pace of adoption and up-to-date econometric and data science methods
- No company owns the code (free and open source!)
and you will be able to use it wherever you choose
- Yet packages must pass some quality checks to be included in CRAN and be widely available
- Cons: user-contributed also means that there are many moving parts
- Packages may develop their own syntaxes
- Package updates may change the functionality of the code

What is R?
We will mainly use two packages:
- tidyverse: collection of packages designed for data science, https://www.tidyverse.org/
  - Various functionalities on top of core R, such as plotting, data handling and tidying, data readers, text and data handling
  - It’s a-ma-zing!
- fixest: package for efficient estimation of a large number of models, including high-dimensional fixed effects, https://cran.r-project.org/web/packages/fixest/vignettes/fixest_walkthrough.html
  - Fantastic contribution by Laurent Berge and others
- We may use other packages as well

What is R?
To install...
1. Install base R for your operating system: https://www.r-project.org/
2. Install RStudio desktop: https://posit.co/download/rstudio-desktop/
3. Packages will be installed from within R itself with the install.packages() command
We suggest that you install the full R suite on your own devices as soon as possible.

What is R?
Go to the code book 01 - Welcome and Introduction to R.R, up to the message box “return to lecture slides.”

Understanding Society
- The Understanding Society dataset is a nationally representative longitudinal survey, collected from the same individuals over many years
- Demographics, household composition, wellbeing and life satisfaction, occupation, employment, politics, environment habits,...
- Currently in Wave 13
- We will use the teaching dataset with waves 1-9 (up to 2019)
  - Teaching dataset uses a subset of the data and variables
  - Harmonised definitions and questionnaire
  - Cleaned data
  - No sampling distortions due to COVID-19

Understanding Society
The following points are guidelines on how to access the data, but the process may vary slightly.
1. Go to https://ukdataservice.ac.uk, where you can search for “Understanding Society Teaching Dataset” (about 20 matches) or directly for the dataset that we are going to use, which has index code “SN 8715”.
2. Select “Understanding Society: Longitudinal Teaching Dataset, Waves 1-9, 2009-2018” with code SN 8715.
3. This is the page for the specific dataset. Click on “Access data”. Note that the data is “safeguarded”, which means that you must accept the terms and conditions to access it.
4. Add to your account to confirm eligibility.
5. Select Queen Mary University as your institution.
6. Assign the data to one of your projects.
7. You need to create a new project (“My Project”) and select “Non-commercial” as the project type.
8. And now, finally, you can download the data.
9. Select the TAB version (tab-separated data).

Possible interpretations of an econometric model

Interpretation: possible interpretations of an econometric model
The learning from statistical models can be broadly classified as:
- Predictive or correlational: use existing data to predict or forecast future outcomes, generally relying on the correlation between variables or time dependence (forecasting GDP, transaction fraud, crime hotspots)
- Causal: the effect, repercussion or consequence of some action or policy on the units (individuals, households,...) that were affected by those policies (the effect of interest rates on inflation and GDP, the effects of welfare programs, the effect of policing on crime)
- External validity: producing broader learnings that are valid in other settings, either because we learn about mechanisms and test theoretical hypotheses or because we want to extrapolate the findings directly (learning that raising interest rates decreases inflation is well grounded in theory and a mechanism that is likely widely applicable)

Interpretation
Income - life satisfaction example: predictive or correlational, causal, externally valid?
- We do not really think that an increase in life satisfaction would necessarily cause an increase in income.
- Maybe there is a direct effect (e.g.
happier people will work better or more efficiently), but the coefficient also conflates other channels:
  - There may be third factors that are associated with both life satisfaction and income, such as education, place of living, length of the commute, and the list goes on (omitted variable)
  - It’s possible that the relation is reversed and income increases life satisfaction (reverse causality)
- It is hard to know if the positive coefficient comes from the direct/causal effect, omitted variables, or reverse causality ⇒ so this is interpretable as a correlational exercise

Use Case: Predictive Models
- Forecasting GDP: use of satellite data and nightlights to augment official GDP statistics
- Henderson, Storeygard and Weil (AER 2012) show how nightlights can be used to predict economic activity
  [Figure: Korean peninsula between 1992 and 2008]
- Bluhm, Krause (JDE 2022) show how to predict growth at the city level
  [Figure: London]
- Active area of research, e.g. Martinez (2022) shows that autocracies on average overstate GDP growth by 35%

Use Case: Predictive Models
Other use cases:
- Forecasting inflation and various macroeconomic variables
- Forecasting asset returns
- Private sector uses, such as predicting consumer interests and behavior
- Any other?

Causal Interpretation
What’s different?
- Clear sense of consequence or repercussion of some action or treatment
- For example, a rise in interest rates caused inflation to go down
- Presence of the police causes crime to decrease
- A QMUL degree causes a substantial future increase in earnings
- The effect of unexpected news on financial markets

Use Case: Causal Models
Example: Effect of Liquidity Constraints: He and Le Maire (JF 2023)
- Key question in economics: effect of financial frictions on outcomes of interest, e.g.
labour markets
- The paper studies the 1992 mortgage reform in Denmark, which allowed homeowners to borrow against housing equity for non-housing purposes
- The study shows that the “option to borrow against housing equity enabled liquidity-constrained individuals to move to high-wage jobs and invest in valuable human and physical capital”
  [Figure: wages after the reform, by liquidity constraint]

We think counterfactual when we say causal
- What do we really think about when we think about a causal estimate?
- We usually think about a counterfactual outcome, a reference point where everything else is equal but one specific aspect of reality changed
  - What would have happened if those policies had not been implemented?
- Challenge: isolate policy effects from confounders, selection, unobserved determinants, and so on
  - Many aspects of reality are changing at the same time
  - Failure to do so will generate wrong or biased policy prescriptions
- This is by no means trivial, and is in fact quite tricky!

Counterfactual, identification, selection
Example: do hospitals make people healthier? (MHE, 2.1)
- Most people would think that, yes, hospitals are good for our health
- National Health Interview Survey, in the U.S.:
  - “During the past 12 months, was the respondent a patient in a hospital overnight?”
  - “Would you say your health in general is excellent, very good, good, fair or poor?”

Group         Sample Size   Mean Health Status   Std. Error
Hospital      7,774         3.21                 0.014
No hospital   90,049        3.93                 0.003

- Difference in means: 0.72, highly significant in favor of the non-hospitalized
- At face value, this seems to suggest that hospitals are bad for health!
- What is going on?

Counterfactual, identification, selection
Example: do hospitals make people healthier? (MHE, 2.1) (cont’d)
- It is possible that people who required hospital care are less healthy to begin with...
- Comparison of health across the two groups is likely not a good measure of the causal impact of hospitals on health
- Better comparison:
  - Condition on the “hospital” sample: what would have been their health had they not gone?
  - Condition on the “non-hospital” sample: what would have been their health had they gone?
  - Also harder, as it requires us to impute scenarios that never really happened, since we don’t get to observe individuals in the two states

Towards a causal model
- Dates back to Neyman (1923), Fisher (1925), then taken on by Rubin (1974, 1975, 1978), Holland (1986). See review in Imbens and Rubin (2015).
- Two potential outcomes
  - $Y_i(1)$ is the outcome of individual $i$ if they get the treatment
  - $Y_i(0)$ is the outcome of individual $i$ if they do not get the treatment
- We would ideally measure the treatment effect for individual $i$ as $Y_i(1) - Y_i(0)$, or alternatively as the ratio $Y_i(1)/Y_i(0)$.
- Define $T_i = 1$ for the individuals $i$ who got the treatment and $T_i = 0$ otherwise
- The fundamental problem is that we only observe either $Y_i(1)$ or $Y_i(0)$:
  $$Y_i^{obs} = Y_i(T_i) = \begin{cases} Y_i(0) & \text{if } T_i = 0 \\ Y_i(1) & \text{if } T_i = 1 \end{cases}$$
- We can’t observe both $Y_i(0)$ and $Y_i(1)$, so we cannot draw inference on the causal impacts without further assumptions

Towards a causal model
- (In most cases) we only observe one state of the world!
- The following equation describes what we can observe:
  $$Y_i^{obs} = Y_i(1) \cdot T_i + Y_i(0) \cdot (1 - T_i) = Y_i(0) + T_i \left( Y_i(1) - Y_i(0) \right)$$
  where $Y_i^{obs}$ and $T_i$ are observed, but only one of $Y_i(0)$ and $Y_i(1)$ is.
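The observed-outcome equation can be checked on simulated data, where (unlike in real data) both potential outcomes are known. The sketch below is only an illustration: the sample size, the distribution of the outcomes, the constant effect of 0.5 and all variable names are assumptions, not part of the module's code book.

```r
# Illustrative simulation; every number and name here is an assumption.
set.seed(216)
n     <- 1000
y0    <- rnorm(n, mean = 3.5, sd = 1)        # potential outcome without treatment, Y_i(0)
y1    <- y0 + 0.5                            # potential outcome with treatment, Y_i(1)
treat <- rbinom(n, size = 1, prob = 0.3)     # treatment indicator, T_i

# Observed outcome, written in the two equivalent forms from the slide
y_obs     <- y1 * treat + y0 * (1 - treat)
y_obs_alt <- y0 + treat * (y1 - y0)

# Both forms agree (up to floating-point rounding)
all.equal(y_obs, y_obs_alt)

# A real dataset would contain only these two columns;
# the unit-level effect y1 - y0 is never observed
head(data.frame(treat, y_obs))
```

In the simulation we can inspect `y1 - y0` directly, which is exactly what real data never allows.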
ATE, ATT, ATU and not too LATE
Quantities that we might be interested in:
- Average treatment effect on the treated, or ATT: $\tau_{att} \equiv E(Y_i(1) - Y_i(0) \mid T_i = 1)$
- Average treatment effect on the untreated, or ATU: $\tau_{atu} \equiv E(Y_i(1) - Y_i(0) \mid T_i = 0)$
- Average treatment effect, or ATE: $\tau_{ate} \equiv E(Y_i(1) - Y_i(0))$
- Local average treatment effect, or LATE: average treatment effect for those that were induced to take the treatment, but wouldn’t otherwise (precise definition to follow in the IV lecture)

ATE, ATT, ATU and not too LATE, conditional on x
Versions conditional on $x$:
- $\tau_{ate}(x_i) \equiv E(Y_i(1) - Y_i(0) \mid x_i)$
- $\tau_{att}(x_i) \equiv E(Y_i(1) - Y_i(0) \mid T_i = 1, x_i)$
- $\tau_{atu}(x_i) \equiv E(Y_i(1) - Y_i(0) \mid T_i = 0, x_i)$
For example:
- the effect of hospitals on the health of immunodeficient patients
- the effect of training on unemployed workers
All of this relies on a non-interference assumption, also known as SUTVA (stable unit treatment value assumption). Under it, each individual’s potential outcomes depend only on their own treatment allocation.
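To see how ATE, ATT and ATU can differ, here is a minimal simulated sketch with heterogeneous effects. The covariate `x`, the effect `0.5 + 0.3 * x` and the logistic selection rule are all hypothetical choices made for illustration. The last line checks a standard identity that follows from the law of total expectation: the ATE is the average of ATT and ATU, weighted by the treated share.

```r
# Illustrative simulation; every number and name here is an assumption.
set.seed(216)
n      <- 10000
x      <- rnorm(n)                     # hypothetical covariate
y0     <- 3 + x + rnorm(n)             # Y_i(0)
effect <- 0.5 + 0.3 * x                # unit-level effect, larger for high-x units
y1     <- y0 + effect                  # Y_i(1)
treat  <- rbinom(n, 1, plogis(x))      # high-x units are more likely to be treated

ate <- mean(y1 - y0)                   # average over everyone
att <- mean((y1 - y0)[treat == 1])     # average over the treated
atu <- mean((y1 - y0)[treat == 0])     # average over the untreated
p   <- mean(treat)                     # share of treated units

c(ATE = ate, ATT = att, ATU = atu)

# ATE is the treated-share-weighted average of ATT and ATU
all.equal(ate, p * att + (1 - p) * atu)
```

Because units with larger effects select into treatment more often in this setup, the ATT exceeds the ATU; with a constant effect the three quantities would coincide.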
Selection bias
- Consider the difference in the population between treatment and control,
  $$D = E(Y_i(1) \mid T_i = 1) - E(Y_i(0) \mid T_i = 0)$$
  where $D$ can be calculated from the data
- However (the term $E(Y_i(0) \mid T_i = 1)$, shown in red on the slide, cannot be observed from data),
  $$\begin{aligned}
  D &= E(Y_i(1) \mid T_i = 1) - E(Y_i(0) \mid T_i = 0) \\
    &= E(Y_i(1) \mid T_i = 1) - E(Y_i(0) \mid T_i = 0) + E(Y_i(0) \mid T_i = 1) - E(Y_i(0) \mid T_i = 1) \\
    &= \underbrace{E(Y_i(1) - Y_i(0) \mid T_i = 1)}_{\tau_{att}} + \underbrace{E(Y_i(0) \mid T_i = 1) - E(Y_i(0) \mid T_i = 0)}_{\text{Selection bias}}
  \end{aligned}$$
- Selection bias: outcomes of the untreated are not good counterfactuals for the potential outcome of the treated in the absence of treatment
  - Highlights the crucial problem in using $D$ for policy evaluations
  - $D$ uses the non-treated as the counterfactual outcome for the treated, in the absence of treatment

Selection bias
- Selection bias arises, for example, when individuals choose whether or not to take the treatment
- For example, suppose we would like to assess whether health insurance makes people healthier
  - Those who get insurance may be more conscious about their health, both having healthier habits and being more likely to choose health insurance in the first place
  - $E(Y_i(0) \mid T_i = 1)$: outcomes of those who get health insurance, had they not gotten insurance
  - $E(Y_i(0) \mid T_i = 0)$: outcomes of those who did not get insurance in the first place
  - Comparison between the health $Y_i$ of the insured and the non-insured is no longer apples-to-apples, that is, no longer ceteris paribus
  - Ideally, we would like to measure the effect conditional on individuals’ health preference

Conclusion
Today you learned
- Core R, tidyverse and basic manipulation tools
- Quarto files
- How to load data
- Running the first regressions in R and interpreting their outputs
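As a closing illustration of the selection-bias decomposition, the sketch below simulates the health insurance example. The unobserved health preference `pref`, the true effect of 0.2 and the selection rule are assumptions made for illustration, not estimates from any dataset. It compares the naive difference in means $D$ with the ATT and the selection bias, and shows that a simple regression of the observed outcome on the treatment dummy with `lm` recovers $D$, not the causal effect.

```r
# Illustrative simulation; every number and name here is an assumption.
set.seed(216)
n     <- 10000
pref  <- rnorm(n)                          # unobserved health preference
y0    <- 3 + 0.8 * pref + rnorm(n)         # health without insurance, Y_i(0)
y1    <- y0 + 0.2                          # health with insurance, Y_i(1); true effect 0.2
treat <- rbinom(n, 1, plogis(2 * pref))    # health-conscious people insure more often
y_obs <- ifelse(treat == 1, y1, y0)        # what the data would actually show

d        <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])   # naive comparison D
att      <- mean((y1 - y0)[treat == 1])                         # tau_att
sel_bias <- mean(y0[treat == 1]) - mean(y0[treat == 0])         # selection bias

c(D = d, ATT = att, selection_bias = sel_bias)

# D = ATT + selection bias holds in the simulated sample
all.equal(d, att + sel_bias)

# The slope of a regression on the treatment dummy equals D
coef(lm(y_obs ~ treat))["treat"]
```

Here the insured would have been healthier even without insurance, so the selection bias is positive and $D$ overstates the effect of insurance.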