Questions and Answers
What does the horizontal distance between the CDFs for different years in the graph represent?
- The income of individuals who moved between percentiles during those years.
- The dollar change in income for a specific percentile between those years. (correct)
- Changes in the Gini coefficient over the period represented.
- The percentage change in income for a specific percentile over time.
Based on the income distribution data provided, which of the following statements is most accurate?
- The income gap between the 90th and 10th percentile increased substantially between 1946 and 2014. (correct)
- Incomes for the bottom 10 percentiles have increased more than those in the top 10 percentiles since 1980.
- Incomes for all percentiles have remained stagnant since 1946.
- The income gap between the 90th and 10th percentile decreased significantly between 1946 and 2014.
What is the most likely reason the graph specifies 'bottom 95 percentiles'?
- Data collection is less reliable for the top 5 percentiles.
- The top 5 percentiles experienced no income growth during this period.
- The top 5 percentiles have a disproportionate impact on overall income distribution and might skew the scale. (correct)
- The data is normalized to exclude outliers within the bottom 95 percentiles.
Which data source is most suitable for analyzing long-term trends in income distribution, particularly focusing on top income earners?
Assuming the trend continues, what might one expect to observe in the CDF of income distribution for 2030, compared to 2014?
What could be a limitation of using tax data to study income distribution?
If the 10th percentile income remained constant from 1946 to 2014, while the 90th percentile income increased, what would necessarily be true?
According to the provided data about income distribution, which statement best describes the trend in average annual income growth rates between 1946–1980 and 1980–2018?
What does E[w | E = e, G = G] represent in the context of wage analysis, according to Autor (2014)?
What is the significance of analyzing 'top percent shares' in income distribution studies?
Based on the provided information, which period exhibited a more equitable distribution of income growth across all income percentiles?
According to the annual real pre-tax income growth per adult shown, which income group experienced the slowest growth rate during the 1980-2018 period?
What is a scatter plot primarily used for in the context of empirical analysis?
What does each 'dot' represent in the described scatter plot of individual income in 2009 and 2010?
What is the key function of conditional expectation in empirical analysis?
How did the income growth of the top 1 percent compare to the average income growth from 1946 to 1980?
What is one potential limitation of a counterfactual analysis that aims to address income inequality?
Consider a dataset where individual income in 2015 is plotted against individual income in 2016 using a scatter plot. If most of the points cluster closely around a line with a positive slope, what does this indicate?
What does the Pearson correlation coefficient represent?
What is the range of values for the Pearson correlation coefficient?
Given two variables, X and Y, with a positive covariance. What can be inferred about their correlation?
Consider two datasets with the same covariance but different standard deviations. How would this affect the Pearson correlation coefficient?
If the correlation between two variables X and Y is close to 1, what can you infer?
Two variables have a Pearson correlation coefficient of -1. Which statement accurately describes their relationship?
A researcher calculates a Pearson correlation coefficient of 1.2 between two variables. What does this indicate?
Flashcards
Group Averages
Analyzing averages within specific groups of a population, like income brackets.
Distributional Changes
Examining how income changes across the entire spectrum, not just averages.
Income Distribution Extras
Additional metrics like the share of income held by the top 1% or generational changes in income.
Income Data Sources
Variable Income Growth
90/10 Percentile Ratio
CDF of Income Distribution
Percentile Comparison Over Time
CDF Horizontal Distance
Cross-Sectional Comparison
Pearson Correlation Definition
Correlation Range
What Does Correlation Measure?
Zero Correlation Meaning
Correlation of 1
Correlation of 0.009
Pearson Correlation Coefficient
1946-1980 Income Growth
1980-2018 Income Growth
Scatter Plot
Scatter Plot Use
Conditional Expectation
Dot in Scatter Plot
Income Persistence
2009-2010 Scatter Plot
Study Notes
Course Outline
- Focus: Data, descriptive statistics, and causality
- Topics include:
- Introduction to data
- Samples and descriptive statistics
- Conditional descriptive statistics
- Causality and randomization
- Statistical inference
- Revealed preferences in observed data
- Quasi-experimental methods
Learning Objectives
- Characterize conditional distributions.
- Characterize linear relationships between variables.
- Apply this knowledge when interpreting data on income distributions.
Kernel Density Graphs
- The kernel density estimator is a weighted average computed at each value x.
- Bandwidth (h): determines how much of the data around x is used.
- Kernel function (K_h): determines how observations within the bandwidth are weighted, for example whether observations further from x get lower weight.
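The estimator described above can be sketched as follows. This is only an illustration: the notes do not say which kernel is used, so a Gaussian kernel is assumed here, and the function names, the bandwidth values, and the income figures are all made up.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: observations further from x get lower weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, h):
    """Kernel density estimate at x with bandwidth h (hypothetical helper)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    # Average of the kernel weights, scaled by 1/h so the estimate integrates to 1.
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Made-up incomes in thousands, for illustration only.
incomes = [12.0, 15.0, 18.0, 22.0, 25.0, 31.0, 40.0, 75.0]
narrow = kernel_density(20.0, incomes, h=2.0)   # uses little data around x
wide = kernel_density(20.0, incomes, h=10.0)    # uses more data around x
```

A small bandwidth follows the data closely and gives a bumpy curve; a large one smooths over more observations.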
Conditional Descriptive Statistics
- Conditional descriptives are statistics of a variable, conditional on other variables.
- Conditional expectation is the most important.
- Denoted as E[Y|X = x], representing the expectation of a random variable Y when another random variable X holds the value x.
- Conditional sample average is the empirical counterpart.
- Joint distribution: all conditional descriptive statistics derive from the joint distribution of two or more variables.
Joint Distribution (review)
- Cross tabulation: displays small data sets of two variables efficiently.
- There is one row for each value that Y can take.
- There is one column for each value that X can take.
- Cells report the number of observations with value (y, x).
- Alternatively, cells can report the share of observations with value (y, x).
- Joint density function: $f_{XY}(x, y) = P(X = x, Y = y)$. A cross tabulation of shares is its empirical counterpart.
Marginal Distribution
- The marginal distribution of Y is defined as $f_Y(y) = \sum_x f_{XY}(x, y)$.
- That is, the probability that Y takes value y, regardless of the value of X.
- The marginal distribution of X is $f_X(x) = \sum_y f_{XY}(x, y)$.
Conditional Distribution
- The conditional distribution of Y given X: $f_{Y|X}(y \mid x) = f_{XY}(x, y) / f_X(x)$.
- That is, the probability that Y takes value y conditional on X taking value x.
- One can estimate, for example, $\hat{P}(Y = b \mid X = w)$, where the "hats" indicate estimates.
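The three distributions above can be computed from a list of observations as in this sketch. The observations and category labels are invented, and `conditional_y_given_x` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (x, y) observations: x = education group, y = wage bracket.
observations = [
    ("basic", "low"), ("basic", "low"), ("basic", "high"),
    ("degree", "low"), ("degree", "high"), ("degree", "high"),
    ("degree", "high"), ("basic", "low"),
]
n = len(observations)

# Empirical joint distribution: share of observations in each (x, y) cell.
joint = {cell: count / n for cell, count in Counter(observations).items()}

# Marginal distributions: sum the joint shares over the other variable.
marginal_x = Counter()
marginal_y = Counter()
for (x, y), share in joint.items():
    marginal_x[x] += share
    marginal_y[y] += share

def conditional_y_given_x(y, x):
    """Estimated P(Y = y | X = x): joint share divided by the marginal share of x."""
    return joint.get((x, y), 0.0) / marginal_x[x]

p_high_given_degree = conditional_y_given_x("high", "degree")
```

Here `joint` plays the role of a cross tabulation of shares, and the conditional probability is one cell divided by its column (or row) total.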
Conditional Expectation
- The conditional expectation function (CEF), when Y is discrete, is $E[Y \mid X = x] = \sum_t t \, f_{Y|X}(t \mid x)$.
- It is the population average of Y holding X fixed.
- It is a weighted average of the values of Y.
- The weight on each value of Y is the share of the sub-population with X = x that has that value.
- X can be a vector.
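As a rough illustration of the weighted-average reading of the CEF, the sketch below estimates E[Y | X = x] from a sample. The data and the function name `conditional_expectation` are assumptions made for this example.

```python
from collections import Counter

def conditional_expectation(pairs, x):
    """Estimate E[Y | X = x] as a share-weighted average of the values of Y."""
    ys = [y for xi, y in pairs if xi == x]
    if not ys:
        raise ValueError("no observations with this value of X")
    counts = Counter(ys)
    # The weight on each value t is the share of the X = x sub-population with Y = t.
    return sum(t * count / len(ys) for t, count in counts.items())

# Hypothetical (years of education, hourly wage) observations.
pairs = [(9, 10.0), (9, 10.0), (9, 16.0), (12, 15.0), (12, 17.0)]
wage_given_9 = conditional_expectation(pairs, 9)    # 10 * 2/3 + 16 * 1/3
wage_given_12 = conditional_expectation(pairs, 12)  # 15 * 1/2 + 17 * 1/2
```

The result is the same as the plain sample average of Y within the group X = x, which is the conditional sample average mentioned earlier.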
Income Distribution in the U.S.
- Basic results of the income distribution literature:
- Group averages.
- Changes over the entire distribution.
- Extras: top percent shares and social mobility.
- Much of the research is based on tax data, which is available over long time periods in many countries.
- For earlier periods, tax data is limited to the top of the distribution.
- Tax records do not capture all income.
- Other research is based on surveys.
- Particularly the labor force survey.
- In 1946–1980, annual income growth was roughly 2% across the distribution among "the 99%".
- In 1980–2018, income growth was faster for higher-income groups even among "the 99%", and the very top was very different from the rest.
- CDFs for Income Distribution: The U.S. income distribution, 1962–2014, bottom 95 percentiles
- 90/10 percentile ratio.
- CDF comparisons over time: the horizontal distance between the CDFs is the dollar change for each percentile.
- The comparison is between percentiles, not between the same people.
- With very skewed distributions, CDFs are uninformative, so the graph is restricted to the bottom 95 percentiles to make the changes visible.
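The 90/10 ratio and the horizontal distance between two CDFs both come down to comparing percentiles, as in this sketch. The income figures are invented, and the nearest-rank percentile definition is one of several conventions, chosen here only for simplicity.

```python
import math

def percentile(values, p):
    """Nearest-rank empirical percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up incomes (in thousands) for two different years; not the same people.
incomes_early = [10, 12, 15, 18, 20, 25, 30, 40, 60, 100]
incomes_late = [10, 13, 16, 20, 24, 30, 38, 55, 90, 160]

# 90/10 percentile ratio in each year.
ratio_early = percentile(incomes_early, 90) / percentile(incomes_early, 10)
ratio_late = percentile(incomes_late, 90) / percentile(incomes_late, 10)

# Horizontal distance between the two CDFs at the 90th percentile:
# the dollar change in the income of that percentile.
change_p90 = percentile(incomes_late, 90) - percentile(incomes_early, 90)
```

In this toy data the 10th percentile does not move while the 90th rises, so the 90/10 ratio increases.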
Correlation
- Conditional expectation is one way to detect associations between variables.
- Scatter plots show the relationship between two variables, with one dot per observation.
- The persistence of income over time can be graphed in a scatter plot.
- Correlation indicates how closely the values of two variables are aligned.
- In the example, a correlation of 0.92 was observed.
Covariance
- Used to determine correlation.
- Formula: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
- Empirical counterpart: $\widehat{\mathrm{Cov}}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.
- For the example shown, the covariance is 256.6.
- Measurement unit is equal to the unit of X times the unit of Y.
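A minimal sketch of the empirical covariance with the 1/n convention used in these notes. The numbers are made up; the second call shows how the value depends on the measurement unit.

```python
def covariance(xs, ys):
    """Empirical covariance: average product of deviations from the means (1/n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov_original = covariance(xs, ys)

# Rescaling X (say, from thousands to units) rescales the covariance too.
cov_rescaled = covariance([1000.0 * x for x in xs], ys)
```

Because the covariance changes with the units, its magnitude alone is hard to interpret.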
Pearson Correlation Coefficient
- A scaled covariance with $-1 \le \mathrm{Cor}(X, Y) \le 1$, which makes the number easier to interpret.
- $\mathrm{Cor}(X, Y) = \rho_{X,Y} = \dfrac{\mathrm{Cov}(X, Y)}{SD(X)\,SD(Y)}$
- Additional examples shown in the presentation: correlation = 1, correlation = 0.009, and correlation = 0.
- Correlation measures a linear dependence.
- It's possible to have dependence and zero correlation.
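The sketch below computes the Pearson coefficient and illustrates the last two points with invented data: an exactly linear relationship gives a correlation of 1, while Y = X² on values of X symmetric around zero is fully dependent on X yet has zero correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Exactly linear relationship: correlation of 1 (up to rounding).
linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])

# Y is a deterministic but nonlinear function of X: correlation of 0.
xs = [-2, -1, 0, 1, 2]
nonlinear = pearson(xs, [x * x for x in xs])
```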
Regression
- A regression model is a closely related approach for assessing linear dependence.
- A bivariate regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
- Y = dependent variable
- X = independent variable
- $\varepsilon$ = residual or error term
- $\beta_0$ = constant parameter
- $\beta_1$ = regression coefficient parameter
- The regression parameters are chosen so that the model describes the data best.
- The ordinary least squares (OLS) estimates are $(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_i [Y_i - (\beta_0 + \beta_1 X_i)]^2$.
- That is, OLS minimizes the sum of squared differences between the observed data and the regression model's predictions.
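The minimization above has a well-known closed-form solution in the bivariate case, sketched here. The notes only state the minimization problem, so the closed form, the function names, and the data are additions for illustration.

```python
def ols(xs, ys):
    """Closed-form bivariate OLS; returns (beta0_hat, beta1_hat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def sum_squared_residuals(beta0, beta1, xs, ys):
    """The OLS objective: sum of squared gaps between data and prediction."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data that is roughly, but not exactly, linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols(xs, ys)
best = sum_squared_residuals(b0, b1, xs, ys)

# Any other line gives a larger sum of squared residuals.
worse = sum_squared_residuals(b0 + 0.1, b1 - 0.05, xs, ys)
```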
Regression vs Correlation
- Correlation and bivariate regression are related.
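- $\beta_1 = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$
- Pearson correlation coefficient: $\rho_{X,Y} = \mathrm{Cov}(X, Y) / \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$
- In the example, $\rho_{X,Y} \approx \beta_1$ because $\mathrm{Var}(X) \approx \mathrm{Var}(Y)$.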
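The link between the two formulas can be made explicit in one line, using only the definitions above and the fact that Var(X) = SD(X)²:

```latex
\beta_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
        = \frac{\mathrm{Cov}(X, Y)}{SD(X)\,SD(Y)} \cdot \frac{SD(Y)}{SD(X)}
        = \rho_{X,Y} \cdot \frac{SD(Y)}{SD(X)}
```

So the regression coefficient equals the correlation coefficient rescaled by the ratio of standard deviations, and the two coincide when X and Y have the same variance.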
Intergenerational Mobility
- A complementary way to think about inequality.
- Involves questions of equality of opportunity.
- Asks to what extent people inherit their position versus compete on a level playing field.
- A powerful but incomplete measure: $E[p_c \mid P_p = p_p]$, where $p_c$ is the child's position and $p_p$ is her parent's position.
Regression & Expectation
- When the conditional expectation function (CEF) is linear in X: $E[Y \mid X = x] = \beta_0 + \beta_1 x$.
- If the CEF isn't linear, regression provides an approximation.
- Regression is the linear approximation of the CEF with the minimum mean squared error.
- The approximation is often good enough, especially with flexible multivariate regression.
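One way to see the connection between regression and the CEF is the sketch below. It uses invented data with a nonlinear conditional mean and checks a standard property: fitting a line to the individual observations gives the same line as fitting it to the conditional averages, weighted by group size. The helper `ols` and its `weights` argument are assumptions of this example.

```python
from collections import defaultdict

def ols(xs, ys, weights=None):
    """Weighted bivariate least squares; returns (beta0, beta1)."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    beta1 = sxy / sxx
    return mean_y - beta1 * mean_x, beta1

# Hypothetical data whose conditional mean is not linear in x.
data = [(1, 1.0), (1, 3.0), (2, 3.0), (2, 5.0), (3, 10.0), (3, 12.0), (3, 14.0)]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)
cef = {x: sum(ys) / len(ys) for x, ys in groups.items()}  # conditional averages

# Regression on the individual observations.
micro = ols([x for x, _ in data], [y for _, y in data])

# Regression on the conditional averages, weighted by group size.
grouped = ols(list(cef), list(cef.values()), weights=[len(groups[x]) for x in cef])
```

The fitted line does not pass through every conditional average here, which is what "approximation" means when the CEF is not linear.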
Multivariate Regression Model
- The model is: $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$.
- In one example, the estimates that fit the data best are: $\beta_0$ = -37,549, $\beta_1$ = 2.857, $\beta_2$ = -31.
- This generally approximates $E[Y \mid X = x]$ well within a certain age range.
- The analysis found that multivariate regression approximates the figure quite well.
- It is less good for the incomes of newborns.
- Looking at the data through the lenses of several methods is a good idea.
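A regression with a squared term can be fitted with the same least squares idea. The sketch below solves the normal equations directly; it is an illustration on made-up, exactly quadratic data (ages in decades), not the lecture's income data, and both helper functions are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """OLS estimates (b0, b1, b2) for Y = b0 + b1*X + b2*X^2 + error."""
    rows = [[1.0, x, x * x] for x in xs]
    # Normal equations: (Z'Z) beta = Z'y, where Z holds the regressors 1, X, X^2.
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    zty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(ztz, zty)

# Made-up ages (in decades) and an exactly quadratic profile to check the fit.
ages = [2.0, 3.0, 4.0, 5.0, 6.0]
outcome = [5.0 + 20.0 * a - 2.0 * a * a for a in ages]
b0, b1, b2 = fit_quadratic(ages, outcome)
```

A negative coefficient on the squared term gives a profile that rises and then flattens or falls, and predictions far outside the range of the data should not be trusted.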
End of Lecture Summary
- Marginal and conditional distributions; the conditional expectation function; scatter plots; covariance and correlation; regression and ordinary least squares (OLS).
- Stata code for these topics is on MyCourses/More Materials/. Homework: In-class worksheet 2 is due on MyCourses before the next lecture.
- So far the aim has been to measure the actual state: "What is the joint distribution of X and Y?"
Up Next: Causal Questions
- We need to be able to evaluate the impact of X on Y.
- Examples: the impact of education on earnings; marketing campaigns on sales; a carbon tax on emissions; an R&D subsidy on innovation; fiscal stimulus on unemployment.
- These are CAUSAL questions.
- The aim is to compare states: "How would Y change if we changed X?"
Description
Explore conditional descriptive statistics, focusing on characterizing conditional distributions and linear relationships between variables. Learn about kernel density estimators, bandwidth, and kernel functions. Apply knowledge to interpret income distribution data and quasi-experimental methods.