PSYU2248 Statistics II: Correlation and Scatterplots

Questions and Answers

What is the standardized regression coefficient formula, according to the content?

standardized beta = b * (s_x / s_y)

What is true about the relationship between the standardized and unstandardized regression coefficients?

They have the same signs but differ in value (correct)
They have the same value and sign
They are not related
They have different signs and values

In simple linear regression, the standardized coefficient IS the ____________.

correlation coefficient

What is the purpose of the regression equation in predicting scores?

To predict scores on the dependent variable (Y) based on the independent variable (X)

Higher perceived fairness of wealth statistically significantly predicted more support for redistribution of wealth. (True/False)

False

What is the purpose of running a statistical analysis within a research context?

To answer research questions/hypotheses and relate the findings back to the research context

Correlation analysis always follows experimental design.

False

What are the criteria for a cause-and-effect (causal) relationship?

1. Covariance rule: there must be a relationship
2. Temporal precedence: the cause must precede the effect
3. Internal validity: excluding other potential causes of the effect

____ measures co-variation and is just covariance, standardized.

Correlation

Match the correlation coefficient with its description:

Pearson’s product-moment correlation (r) = Normal correlation coefficient
Spearman’s correlation (rs) = Correlation on ranked data, non-parametric correlation
Point-biserial correlation (rpb) = Use for one numeric and one dichotomous variable
Phi (φ) = Use for two dichotomous variables

What does R-squared value of 0.75 indicate?

An R-squared value of 0.75 indicates that 75% of the variation in the dependent variable can be explained by the independent variable(s).

What does the coefficient of 0.5 for x, with t = 4.58 and a p-value of 0.003, signify?

The coefficient of 0.5 for x indicates that for every one-unit increase in x, the predicted value of the dependent variable y increases by 0.5. The t-statistic of 4.58 and the low p-value of 0.003 indicate that this relationship is statistically significant.

What does an unstandardized beta represent in regression analysis?

The size of the effect of the independent variable on the dependent variable

What are the assumptions required for a correlation to be appropriate?

Linear and monotonic relationship; no gaps or problematic outliers

Outliers are only problematic if they distort the results.

True

What are the key assumptions of correlation?

Numeric data, independence of observations, monotonicity, linearity, no major gaps or outliers

The correlation formula can be expressed in multiple ways: r = [cov(X,Y)] / [s_x * s_y]. Covariance is the __________ correlation.

unstandardized

What does a confidence interval for a correlation estimate?

It provides a range within which the true population correlation coefficient is likely to fall.

Study Notes

Research Process and Design

Statistical analysis exists within a research context, and it's essential to understand the research context to properly apply and interpret statistical analyses.
The research process involves:
- Making an observation
- Reviewing the literature and identifying the theory
- Generating aims, research questions, and hypotheses
- Designing the study
- Obtaining ethical approval
- Running the study and collecting data
- Analyzing the data
- Writing up and disseminating the findings
Design steps:
- Understanding research questions and hypotheses
- Identifying the sampling population
- Understanding how variables are measured
Statistics steps:
- Describing variables using univariate and bivariate summaries
- Fitting an appropriate statistical model
- Formally testing assumptions
- Interpreting results and drawing conclusions

Stata Recap

Stata can be downloaded from the MQ student website.
Stata opens .dta data files directly and can import data from Excel.
Useful Stata YouTube videos are available.

Things You Should Know How to Do in Stata

Open a data file
Look at the data file to identify variables, number of observations, etc.
Run descriptive statistics for various types of variables
Create a new variable and attach value labels to categorical variables
Run statistical analyses, such as one-sample t-tests, independent t-tests, paired t-tests, correlations, and chi-square tests
Run assumption checks, such as Shapiro-Wilk and Levene's tests

Revision (Plus) of Correlations and Scatterplots

Correlation analysis is used in non-experimental design, where the researcher doesn't intervene or manipulate the variables.
Correlation doesn't imply causation.
Criteria for a cause-and-effect relationship:
- Covariance rule: there must be a relationship
- Temporal precedence: the cause must precede the effect
- Internal validity: excluding other potential causes of the effect
Correlations can be true or spurious.
Example of a spurious correlation: infant mortality rate and number of doctors in a population.

Correlation Coefficient

The correlation coefficient measures the strength and direction of a linear relationship between two variables.
Pearson's product-moment correlation (r) is used for numeric variables.
The correlation coefficient ranges from -1 to 1.
Strength of correlation:
- 0 to 0.10: no real relationship
- 0.10 to 0.30: weak relationship
- 0.30 to 0.50: moderate relationship
- 0.50 to 1: strong relationship
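
The bands above can be applied mechanically once r is known. The following Python sketch is illustrative only: the function name classify_strength is hypothetical, it grades the absolute value of r (an assumption, so that negative correlations are judged by magnitude), and because the bands share their endpoints it assigns a boundary value such as 0.30 to the stronger band, which is also an assumption.

```python
def classify_strength(r):
    """Label a correlation coefficient using the bands in these notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # assumption: strength is judged on magnitude, ignoring sign
    if size < 0.10:
        return "no real relationship"
    if size < 0.30:
        return "weak relationship"
    if size < 0.50:
        return "moderate relationship"
    return "strong relationship"


# Hypothetical values of r, chosen to fall in each band
for r in (0.05, -0.25, 0.30, -0.80):
    print(r, classify_strength(r))
```

The label describes strength only; the sign of r still has to be read separately to get the direction of the association.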

Scatterplots

Scatterplots visualize the relationship between two variables.
When analyzing a scatterplot, consider:
- Monotonicity (does the trend keep in one direction?)
- Linearity (can it be summarized by a straight line?)
- Direction of association (positive or negative?)
- Effect of X on Y (how steep is the slope?)
- Correlation (how strong is the correlation?)
- Gaps (are there any gaps?)
- Outliers (are there any outliers?)
Scatterplots are essential for checking the assumptions of correlation.

Calculating Correlation and Covariance

Correlation formula: r = cov(x, y) / (s_x * s_y)
Covariance formula: cov(x, y) = Σ(x − x̄)(y − ȳ) / (n − 1)
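
The two formulas can be checked by hand on a small data set. The Python sketch below computes the sample covariance and then divides by the two sample standard deviations to get r. The data values and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(values):
    """Sample standard deviation: square root of a variable's covariance with itself."""
    return math.sqrt(covariance(values, values))


def pearson_r(xs, ys):
    """Correlation is the covariance divided by the product of the two SDs."""
    return covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


# Hypothetical data, not taken from the lesson
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y))           # 1.5
print(round(pearson_r(x, y), 4))  # 0.7746
```

Dividing by the standard deviations is what standardizes the covariance: rescaling x or y by a positive constant changes the covariance but leaves r the same.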

Confidence Intervals

Confidence intervals are interval estimates that provide a range of values within which the true population parameter is likely to lie.
Formula for a 95% CI: point estimate ± 1.96 × SE
SE (standard error) measures how much the point estimate varies from sample to sample.
Calculating a CI for a correlation involves first transforming the correlation coefficient to a z-score (Fisher's z transformation).
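
One common way to carry out this calculation is Fisher's z transformation: convert r to z with the inverse hyperbolic tangent, build the interval on the z scale using SE = 1 / sqrt(n − 3), then convert both limits back to the r scale. The Python sketch below assumes that approach; the function name and the example values (r = 0.50, n = 100) are hypothetical.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z transformation."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1 / math.sqrt(n - 3)      # standard error of z
    lower = math.tanh(z - z_crit * se)  # back-transform each limit to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Hypothetical example: r = 0.50 from a sample of 100
lower, upper = correlation_ci(r=0.50, n=100)
print(round(lower, 2), round(upper, 2))
```

Because the back-transformation is not linear, the resulting interval is not symmetric around r, and it can never extend beyond −1 or 1.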

Regression Output

Source table: Model, Residual, and Total
Number of observations: 9
F-statistic: 21.00, p = 0.0025
R-squared: 0.7500, Adj R-squared: 0.7143
Root MSE: 0.84515

Coefficients Table

x: Coefficient: 0.5, Std. Err: 0.1091089, t: 4.58, p: 0.003
_cons: Coefficient: 1, Std. Err: 0.5194625, t: 1.93, p: 0.096

Model as a Whole Effects

Model is statistically significant, F(1, 7) = 21.00, p = 0.003
R-squared: 0.75 (75%), a large amount of variance explained

Effect of X

The effect of X is statistically significant, t(7) = 4.58, p = 0.003
For every one-point increase in X, Y increases by 0.5 points (b = 0.5)
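
The figures in this output are linked arithmetically, which gives a quick consistency check. The Python sketch below recomputes F, adjusted R-squared, and t from the reported n, R-squared, b, and standard error. The variable names are illustrative, and the formulas are the standard ones for least-squares regression with k predictors rather than anything specific to this lesson.

```python
n = 9                 # number of observations
k = 1                 # number of predictors
r_squared = 0.75      # reported R-squared
b = 0.5               # reported coefficient for x
se_b = 0.1091089      # reported standard error for x

df_model = k
df_resid = n - k - 1  # 7, the degrees of freedom in t(7) and F(1, 7)

# F compares explained variance per model df with unexplained variance per residual df
f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid

# t is the coefficient divided by its standard error
t_stat = b / se_b

print(round(f_stat, 2), round(adj_r_squared, 4), round(t_stat, 2))
```

With a single predictor, the square of t for the slope equals F (up to rounding), which is why the model test and the test of X report the same p-value here.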

Intercept

The intercept (also known as the constant term) is the predicted score on Y when X = 0; here a = 1

Standardized Regression Coefficient

Standardized beta: 0.8232
Unstandardized b is not comparable across IVs measured on different scales; the standardized beta expresses the effect in standard-deviation units, so it is
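
A standardized beta can be obtained from the unstandardized slope as beta = b × (s_x / s_y), and in simple linear regression it equals the correlation coefficient. The Python sketch below illustrates both points on a small hypothetical data set. These are not the data behind the value reported above, and the function names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def slope(xs, ys):
    """Unstandardized least-squares slope b for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / sum((x - mx) ** 2 for x in xs)


# Hypothetical data, not taken from the lesson
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = slope(x, y)                            # change in y per one-unit change in x
beta = b * sample_sd(x) / sample_sd(y)     # change in y (in SDs) per one-SD change in x

# Pearson's r computed directly, for comparison with beta
mx, my = mean(x), mean(y)
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
r = cov_xy / (sample_sd(x) * sample_sd(y))

print(round(b, 4), round(beta, 4), round(r, 4))
```

Multiplying by s_x / s_y rescales the slope but cannot change its sign, so b and the standardized beta always agree in direction.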

Using the Regression Equation to Predict Scores

Regression line predicts a score of Y for any given value of X
We can substitute in values of X to find predicted scores for Y
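
As a worked illustration, the regression output above gives a = 1 and b = 0.5, so the fitted line is predicted Y = 1 + 0.5 × X. The Python sketch below substitutes a few values of X; the function name and the chosen X values are hypothetical.

```python
def predict_y(x, intercept=1.0, slope=0.5):
    """Predicted Y from the fitted line: intercept + slope * X."""
    return intercept + slope * x


# Hypothetical X values
for x in (0, 4, 10):
    print(x, predict_y(x))  # 0 1.0, then 4 3.0, then 10 6.0
```

Predictions are most trustworthy for X values inside the range that was actually observed; substituting values far outside it extrapolates beyond the data.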

Wealth Inequality Example

Example from Open Stats Lab, study by Dawtry et al. (2015)
Examined why people differ in their assessments of the increasing wealth inequality within developed nations

Study Methods + Hypotheses

Design: cross-sectional online survey study
Sample: 305 US adults recruited from an online survey pool (Amazon’s MTurk)
Participants reported their attitudes toward redistribution of wealth, measured using a four-item scale (redist1 – redist4), which was converted into a single variable called support_for_redistribution.
Participants also reported their political orientation on a scale from 1 (extremely liberal) to 9 (extremely conservative), measured by the variable political_preference.
Additionally, participants reported their perceived fairness of the distribution of household income across the US population, measured by the variable fairness.

Hypotheses

Hypothesis 1: Support for redistribution of wealth is predicted by perceived fairness of wealth distribution, with individuals who think the current system is fair having less support for redistribution.
Hypothesis 2: Support for redistribution of wealth is predicted by political orientation, with more liberal individuals being more likely to support redistribution.

Regression Analysis

Simple Linear Regression (SLR) was used to test the hypotheses.
For Hypothesis 1, the independent variable (IV) was fairness, and the dependent variable (DV) was support for redistribution.
For Hypothesis 2, the independent variable (IV) was political preference, and the dependent variable (DV) was support for redistribution of wealth.
A negative predictive relationship was hypothesized for both hypotheses.

Results

Higher perceived fairness of wealth statistically significantly predicted less support for redistribution of wealth (F(1, 303) = 234.57, p < 0.05).
The results supported Hypothesis 1, indicating that individuals who perceived the current system as fair were less likely to support redistribution of wealth.
