Questions and Answers

What does the horizontal distance between the CDFs for different years in the graph represent?

  • The income of individuals who moved between percentiles during those years.
  • The dollar change in income for a specific percentile between those years. (correct)
  • Changes in the Gini coefficient over the period represented.
  • The percentage change in income for a specific percentile over time.

Based on the income distribution data provided, which of the following statements is most accurate?

  • The income gap between the 90th and 10th percentile increased substantially between 1946 and 2014. (correct)
  • Incomes for the bottom 10 percentiles have increased more than those in the top 10 percentiles since 1980.
  • Incomes for all percentiles have remained stagnant since 1946.
  • The income gap between the 90th and 10th percentile decreased significantly between 1946 and 2014.

What is the most likely reason the graph specifies 'bottom 95 percentiles'?

  • Data collection is less reliable for the top 5 percentiles.
  • The top 5 percentiles experienced no income growth during this period.
  • The top 5 percentiles have a disproportionate impact on overall income distribution and might skew the scale. (correct)
  • The data is normalized to exclude outliers within the bottom 95 percentiles.

Which data source is most suitable for analyzing long-term trends in income distribution, particularly focusing on top income earners?

  • Tax data, because it is consistently available over long time periods and captures top incomes. (correct)

Assuming the trend continues, what might one expect to observe in the CDF of income distribution for 2030, compared to 2014?

  • An increased horizontal distance between the 90th and 10th percentiles. (correct)

What could be a limitation of using tax data to study income distribution?

  • Tax records may not capture all forms of income, potentially leading to an incomplete picture. (correct)

If the 10th percentile income remained constant from 1946 to 2014, while the 90th percentile income increased, what would necessarily be true?

  • The 90/10 percentile ratio would increase. (correct)

According to the provided data about income distribution, which statement best describes the trend in average annual income growth rates between 1946–1980 and 1980–2018?

  • Income growth decreased between 1980 and 2018 compared to 1946 and 1980 across all income percentiles. (correct)

What does E[w | E = e, G = g] represent in the context of wage analysis, according to Autor (2014)?

  • The expected wage (w) given a specific education level (e) and gender (G). (correct)

What is the significance of analyzing 'top percent shares' in income distribution studies?

  • It provides insights into income inequality by showing the proportion of total income held by the highest earners. (correct)

Based on the provided information, which period exhibited a more equitable distribution of income growth across all income percentiles?

  • The period from 1946 to 1980, with growth evenly distributed across income groups. (correct)

According to the annual real pre-tax income growth per adult shown, which income group experienced the slowest growth rate during the 1980-2018 period?

  • The bottom income groups. (correct)

What is a scatter plot primarily used for in the context of empirical analysis?

  • Detecting relationships between two variables by plotting observations. (correct)

What does each 'dot' represent in the described scatter plot of individual income in 2009 and 2010?

  • A single individual's income in both 2009 and 2010. (correct)

What is the key function of conditional expectation in empirical analysis?

  • To detect how variables are associated with each other. (correct)

How did the income growth of the top 1 percent compare to the average income growth from 1946 to 1980?

  • The top 1 percent grew slower than the average. (correct)

What is one potential limitation of a counterfactual analysis that aims to address income inequality?

  • It may result in lower average growth. (correct)

Consider a dataset where individual income in 2015 is plotted against individual income in 2016 using a scatter plot. If most of the points cluster closely around a line with a positive slope, what does this indicate?

  • There is a strong positive correlation, suggesting income persistence. (correct)

What does the Pearson correlation coefficient represent?

  • A scaled covariance between two variables. (correct)

What is the range of values for the Pearson correlation coefficient?

  • -1 to 1 (correct)

Given two variables, X and Y, with a positive covariance. What can be inferred about their correlation?

  • They are positively correlated. (correct)

Consider two datasets with the same covariance but different standard deviations. How would this affect the Pearson correlation coefficient?

  • The dataset with smaller standard deviations will have a higher correlation coefficient. (correct)

If the correlation between two variables X and Y is close to 1, what can you infer?

  • Increases in X are associated with increases in Y. (correct)

Two variables have a Pearson correlation coefficient of -1. Which statement accurately describes their relationship?

  • There is a perfect negative linear relationship. (correct)

A researcher calculates a Pearson correlation coefficient of 1.2 between two variables. What does this indicate?

  • A calculation error. (correct)

Flashcards

Group Averages

Analyzing averages within specific groups of a population, like income brackets.

Distributional Changes

Examining how income changes across the entire spectrum, not just averages.

Income Distribution Extras

Additional metrics like the share of income held by the top 1% or generational changes in income.

Income Data Sources

Data sourced from tax filings and survey data.

Variable Income Growth

Income growth rates vary significantly across different income percentiles, with higher percentiles experiencing faster growth.

90/10 Percentile Ratio

The ratio of income at the 90th percentile to the income at the 10th percentile within a distribution.

CDF of Income Distribution

A graph showing the cumulative distribution of income, indicating the percentage of the population earning up to a certain income level.

Percentile Comparison Over Time

Comparing income levels across different percentiles at different points in time to observe income shifts.

CDF Horizontal Distance

The horizontal space between CDF curves represents the change in income at a particular percentile over the compared time period.

Cross-Sectional Comparison

The data compares income percentiles across different years, not the income changes of the same individuals over time.

Pearson Correlation Definition

Pearson correlation is covariance scaled by standard deviations.

Correlation Range

The range is from -1 to +1, inclusive.

What Does Correlation Measure?

Measures the linear relationship between two variables.

Zero Correlation Meaning

It indicates no linear relationship.

Correlation of 1

Shows perfect positive linear relationship.

Correlation of 0.009

Indicates a very weak positive correlation.

Pearson Correlation Coefficient

Pearson correlation coefficient, denoted as ρ(X, Y) or Cor(X, Y), measures the strength and direction of a linear relationship between two variables.

1946-1980 Income Growth

From 1946-1980, income growth was evenly distributed across all income groups.

1980-2018 Income Growth

From 1980-2018, income growth became uneven, favoring top income groups.

Scatter Plot

Visual representation of data points on a graph.

Scatter Plot Use

Graph used to represent the relationship between two variables.

Conditional Expectation

A method to understand variable relationships.

Dot in Scatter Plot

Each dot represents one individual's data for two different points in time.

Income Persistence

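The tendency of an individual's income in one year to be close to their income in another year; it is visible in a scatter plot of each individual's income in two different years.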

2009-2010 Scatter Plot

Shows individual income in 2009 and 2010, revealing income persistence.

Study Notes

Course Outline

  • Focus: Data, descriptive statistics, and causality
  • Topics include:
    • Introduction to data
    • Samples and descriptive statistics
    • Conditional descriptive statistics
    • Causality and randomization
    • Statistical inference
    • Revealed preferences in observed data
  • Methods: Quasi-experimental

Learning Objectives

  • Characterizing conditional distributions
  • Characterizing linear relationships between variables
  • Applying this knowledge when interpreting data on income distributions

Kernel Density Graphs

  • The kernel density estimator is a weighted average computed at each value x.
  • Bandwidth (h): determines how much data around x is used.
  • Kernel function (K_h): determines how observations within the bandwidth are weighted, for example whether observations further from x get lower weight.
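
The roles of the bandwidth and the kernel function can be illustrated with a minimal sketch. It assumes a Gaussian kernel, which the notes do not specify, and the income values are hypothetical. The function names `gaussian_kernel` and `kernel_density` are chosen here for illustration only.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: observations further from x get lower weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_density(x, data, h):
    """Kernel density estimate at x, using bandwidth h (assumed Gaussian kernel)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical incomes in thousands; not taken from the lecture.
incomes = [12, 18, 21, 25, 25, 30, 34, 41, 55, 90]

# A small bandwidth uses only observations very close to x;
# a large bandwidth averages over much more of the data.
for h in (2.0, 10.0):
    print(h, round(kernel_density(25, incomes, h), 4))
```

A larger h gives a smoother estimate at the cost of blurring local features; the choice of h usually matters more than the choice of kernel.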

Conditional Descriptive Statistics

  • Conditional descriptives are statistics of a variable, conditional on other variables.
  • Conditional expectation is the most important.
    • Denoted as E[Y|X = x], representing the expectation of a random variable Y when another random variable X holds the value x.
  • Conditional sample average is the empirical counterpart.
  • Joint distribution: all conditional descriptive statistics follow from the joint distribution of two or more variables.

Joint Distribution (review)

  • Cross tabulation: displays small data sets of two variables efficiently.
    • There is one row for each value that Y can take.
    • There is one column for each value that X can take.
    • Cells report the number of observations with value (y, x).
  • Alternatively, cross tabulation cells can report the share of observations with value (y, x).
  • Joint density function: $f_{XY}(x, y) = P(X = x, Y = y)$; the cross tabulation of shares is its empirical counterpart.
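
As a rough sketch of a cross tabulation, the snippet below counts hypothetical (y, x) observations and converts the counts to shares. The data and variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical observations (y, x): y = employed (0/1), x = education level.
observations = [(1, "high"), (1, "high"), (0, "high"), (1, "high"),
                (1, "low"), (0, "low"), (0, "low"), (0, "low")]

counts = Counter(observations)                      # cells as numbers of observations
n = len(observations)
joint = {cell: c / n for cell, c in counts.items()}  # cells as shares

y_values = sorted({y for y, _ in observations})
x_values = sorted({x for _, x in observations})

print("y \\ x", x_values)
for y in y_values:                                  # one row per value of Y
    print(y, [counts[(y, x)] for x in x_values])    # one column per value of X
```

The shares in `joint` sum to one, which is what makes them usable as an estimate of the joint distribution.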

Marginal Distribution

  • The marginal distribution of Y is defined as $f_Y(y) = \sum_x f_{XY}(x, y)$.
  • It gives the probability of each value of Y regardless of the value of X.
  • The marginal distribution of X is $f_X(x) = \sum_y f_{XY}(x, y)$.

Conditional Distribution

  • The conditional distribution of Y given X: $f_{Y|X}(y \mid x) = f_{XY}(x, y) / f_X(x)$
  • i.e., the probability that Y takes the value y, conditional on X taking the value x.
  • One can estimate $\hat{P}(Y = b \mid X = w)$, where the "hats" indicate estimates.
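
The step from a joint distribution to a conditional one can be sketched as follows. The joint shares are hypothetical, and `marginal_x` and `conditional_y_given_x` are illustrative helper names, not functions from the course materials.

```python
from collections import defaultdict

# Hypothetical joint distribution keyed by (x, y); the shares sum to 1.
joint = {("low", 0): 0.375, ("low", 1): 0.125,
         ("high", 0): 0.125, ("high", 1): 0.375}

def marginal_x(joint):
    """Marginal distribution of X: sum the joint shares over all values of y."""
    fx = defaultdict(float)
    for (x, _y), p in joint.items():
        fx[x] += p
    return dict(fx)

def conditional_y_given_x(joint, x):
    """Conditional distribution of Y given X = x: joint share divided by marginal."""
    fx = marginal_x(joint)[x]
    return {y: p / fx for (xv, y), p in joint.items() if xv == x}

print(marginal_x(joint))
print(conditional_y_given_x(joint, "high"))
```

Each conditional distribution sums to one because dividing by the marginal rescales the shares within the sub-population with X = x.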

Conditional Expectation

  • The conditional expectation function (CEF), when Y is discrete, is $E[Y \mid X = x] = \sum_t t \, f_{Y|X}(t \mid x)$.
  • It is the population average of Y holding X fixed.
  • It is a weighted average of the values of Y.
  • The weight for each value of Y is the share of the sub-population with X = x that has that value.
  • X can be a vector.
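
The weighted-average form of the conditional sample average can be sketched like this. The (education, wage) pairs are hypothetical and `conditional_expectation` is an illustrative name.

```python
from collections import Counter

def conditional_expectation(pairs, x):
    """Weighted average of Y among observations with X == x.

    The weight on each value t of Y is its share within the sub-sample.
    """
    ys = [y for xv, y in pairs if xv == x]
    shares = {t: c / len(ys) for t, c in Counter(ys).items()}
    return sum(t * share for t, share in shares.items())

# Hypothetical (education, wage) observations.
data = [("low", 20), ("low", 20), ("low", 26), ("high", 30), ("high", 38)]

for level in ("low", "high"):
    print(level, conditional_expectation(data, level))
```

This gives the same number as simply averaging Y within each group; writing it with shares makes the link to the formula explicit.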

Income Distribution in the U.S.

  • Basic results of the income distribution literature:
    • Group averages.
    • Changes over the entire distribution.
    • Extras: top percent shares and social mobility.
  • Research is based on tax data, which is available over long time periods in many countries.
    • Earlier periods are limited to the top of the distribution.
    • Tax records do not capture all income.
  • Research is also based on survey data.
    • Particularly the labor force survey.
  • From 1946 to 1980, annual income growth was roughly 2% across the distribution among "the 99%".
  • From 1980 to 2018, income growth was faster among the wealthier, even within "the 99%"; the very top was very different from the rest.
  • CDFs for income distribution: the U.S. income distribution, 1962–2014, bottom 95 percentiles.
    • 90/10 percentile ratio: income at the 90th percentile divided by income at the 10th percentile.
  • CDF comparisons over time: the horizontal distance between the CDFs is the dollar change in income for each percentile.
    • The comparison is between percentiles, not between the same people.
  • With very skewed distributions, CDFs over the full range are uninformative; restricting the graph to the bottom 95 percentiles makes the changes visible.
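
The 90/10 ratio and the horizontal distance between two CDFs can be sketched with two hypothetical cross-sections. The incomes below are invented, and the nearest-rank percentile rule is one of several common conventions, assumed here for simplicity.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceiling of p * n / 100
    return ordered[rank - 1]

# Hypothetical incomes (thousands) in two years; different people in each year.
incomes_early = [10, 14, 18, 22, 26, 30, 35, 40, 48, 60]
incomes_late = [11, 16, 21, 27, 33, 40, 48, 58, 72, 100]

# Horizontal distance between the CDFs at a percentile = change in income there.
changes = {p: percentile(incomes_late, p) - percentile(incomes_early, p)
           for p in (10, 50, 90)}

ratio_early = percentile(incomes_early, 90) / percentile(incomes_early, 10)
ratio_late = percentile(incomes_late, 90) / percentile(incomes_late, 10)

print(changes)
print(round(ratio_early, 2), round(ratio_late, 2))
```

In this made-up example the change at the 90th percentile is larger than at the 10th, so the 90/10 ratio rises between the two years.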

Correlation

  • Conditional expectation can detect associations between variables.
  • Scatter plots show the relationship between two variables by plotting the observations.
  • Persistence of income over time can be graphed in a scatter plot.
  • Correlation indicates how closely two variables' values align.
    • A correlation of 0.92 was observed.

Covariance

  • Used to determine correlation.
  • Formula: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
  • Empirical counterpart: $\widehat{\mathrm{Cov}}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
  • For the example shown, the covariance is 256.6.
  • Measurement unit is equal to the unit of X times the unit of Y.
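
A minimal sketch of the empirical covariance with the 1/n convention used in the notes. The numbers are hypothetical; the second call illustrates that covariance depends on the measurement units.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with the 1/n convention."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(covariance(x, y))                          # 4.0
# Measuring x in units 1000 times smaller multiplies the covariance by 1000.
print(covariance([1000 * v for v in x], y))      # 4000.0
```

Because the value changes with the units, covariance on its own is hard to interpret as a measure of strength.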

Pearson Correlation Coefficient

  • Scaled covariance that satisfies $-1 \le \mathrm{Cor}(X, Y) \le 1$, making the number easier to interpret.
  • $\mathrm{Cor}(X, Y) = \rho_{X,Y} = \mathrm{Cov}(X, Y) / (\mathrm{SD}(X)\,\mathrm{SD}(Y))$
  • Additional Examples
    • correlation = 1 observed in the presentation
    • correlation = 0.009 observed in the presentation
    • correlation = 0 observed in the presentation
  • Correlation measures a linear dependence.
    • It's possible to have dependence and zero correlation.
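
The point that dependence can coexist with zero correlation can be sketched with a small hypothetical example: Y = X² is fully determined by X, yet its linear correlation with a symmetric X is zero. The function name `pearson` is illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical values, symmetric around zero.
x = [-2, -1, 0, 1, 2]

linear = pearson(x, [2 * v + 1 for v in x])   # exact linear relationship
squared = pearson(x, [v * v for v in x])      # exact but non-linear relationship

print(round(linear, 6), round(squared, 6))
```

The first correlation is 1 up to floating-point rounding and the second is 0, even though in both cases Y is an exact function of X.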

Regression

  • A regression model is a closely associated approach for assessing linear dependence.
  • A bivariate regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
    • $Y$ = dependent variable
    • $X$ = independent variable
    • $\varepsilon$ = residual or error term
    • $\beta_0$ = constant parameter
    • $\beta_1$ = regression coefficient parameter
  • The regression line is the line that describes the data best.
  • Ordinary least squares (OLS) chooses the estimates $(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_i [Y_i - (\beta_0 + \beta_1 X_i)]^2$.
  • That is, it minimizes the sum of squared differences between the observed data and the regression model's predictions.
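
For the bivariate case the OLS minimization has a well-known closed-form solution, sketched below on hypothetical data. The helper names `ols` and `ssr` are illustrative.

```python
def ols(xs, ys):
    """Closed-form bivariate OLS: returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(xs, ys, b0, b1):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = ols(x, y)
print(round(b0, 3), round(b1, 3))
# Any other line gives a larger sum of squared residuals.
print(ssr(x, y, b0, b1) <= ssr(x, y, b0 + 0.1, b1))
```

The slope is the ratio of the sample covariance to the sample variance of X, so the 1/n factors cancel and are omitted in the code.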

Regression vs Correlation

  • Correlation and bivariate regression are related.
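  • $\beta_1 = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$
  • Pearson correlation coefficient formula: $\rho_{X,Y} = \mathrm{Cov}(X, Y) / \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$
  • $\rho_{X,Y} \approx \beta_1$ when $\mathrm{Var}(X) \approx \mathrm{Var}(Y)$.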
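
The link between the two formulas can be made explicit with a short derivation, written here as a sketch using only the definitions above:

```latex
\begin{aligned}
\beta_1 &= \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
         = \frac{\mathrm{Cov}(X, Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)} \cdot \frac{\mathrm{SD}(Y)}{\mathrm{SD}(X)}
         = \rho_{X,Y} \cdot \frac{\mathrm{SD}(Y)}{\mathrm{SD}(X)}
\end{aligned}
```

So the regression coefficient equals the correlation rescaled by the ratio of standard deviations, and the two coincide when X and Y have the same spread.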

Intergenerational Mobility

  • A complementary way to think about inequality.
  • Involves questions of equality of opportunity.
  • Asks to what extent people inherit their position versus compete on a level playing field.
  • A powerful but incomplete measure: $E[p_c \mid P_p = p_p]$, where $p_c$ is the child's position and $p_p$ is her parent's position.

Regression & Expectation

  • If the conditional expectation function is linear in X: $E[Y \mid X = x] = \beta_0 + \beta_1 x$.
  • If the CEF isn't linear, regression provides an approximation.
  • Regression is the minimum mean squared error linear approximation of the CEF.
  • The approximation is often good enough, especially with flexible multivariate regression.

Multivariate Regression Model

  • The formula is: $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$.
  • In one example, the estimates that fit the data best are: $\beta_0$ = -37,549, $\beta_1$ = 2.857, $\beta_2$ = -31
  • This generally approximates E[Y|X = x] well within a certain age range.
  • In this analysis, the multivariate regression approximated the figure quite well.
  • It is less good for newborn incomes.
  • Looking at the data through the lenses of several methods is a good idea.
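
A quadratic regression of this form can be sketched by solving the normal equations directly. This is an illustrative pure-Python approach using Gaussian elimination, not the Stata code from the course, and the age and income values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_quadratic(xs, ys):
    """OLS for Y = b0 + b1*X + b2*X^2 via the normal equations (X'X) b = X'y."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

# Hypothetical average incomes (thousands) by age.
ages = [20, 30, 40, 50, 60]
incomes = [18, 30, 36, 37, 31]

b0, b1, b2 = fit_quadratic(ages, incomes)
predict = lambda age: b0 + b1 * age + b2 * age * age
print(round(predict(40), 1))   # inside the observed age range
print(round(predict(0), 1))    # extrapolation far outside the observed range
```

Predictions inside the observed age range track the data, while extrapolating to age 0 relies entirely on the functional form and can give implausible values.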

End of Lecture Summary

  • Topics covered: marginal and conditional distributions; the conditional expectation function; scatter plots; covariance and correlation; regression and ordinary least squares (OLS).
  • Stata code for these topics is on MyCourses/More Materials/. Homework: In-class worksheet 2 is due on MyCourses before the next lecture.
  • The aim so far has been to measure the actual state: "What is the joint distribution of X and Y?"

Up Next: Causal Questions

  • We need to be able to evaluate the impact of X on Y.
    • Impacts: of education on earnings; marketing campaigns on sales; carbon tax on emissions; R&D subsidy on innovation; fiscal stimulus on unemployment.
  • These are CAUSAL questions.
  • The aim is to compare states: "How would Y change if we changed X?"

Description

Explore conditional descriptive statistics, focusing on characterizing conditional distributions and linear relationships between variables. Learn about kernel density estimators, bandwidth, and kernel functions. Apply knowledge to interpret income distribution data and quasi-experimental methods.
