Statistics


Questions and Answers

What is the meaning of a z-score of 1?

The raw score is 1 standard deviation above the mean

What should be the shape of the distribution of z-scores?

The same shape as the distribution of the raw scores (normal, if the raw scores are normally distributed)

What is the main purpose of a z-table?

To find the percentage of scores falling below a particular score

What is the formula to transform raw scores into z-scores?

(Raw score - mean of distribution) / SD of distribution

What is a linear transformation?

When every score is changed by multiplying/dividing by a constant value, adding/subtracting a constant value, or a combination of both

What is a percentile?

The percentage of all scores that fall below an individual score

When is a z-table not used?

When the distribution of scores is not normal, because the percentages in a z-table assume a normal distribution

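How many ways can the standard normal distribution be used?

3 ways: to find the percentage of scores below a value, to compare scores on one variable with scores on another, and to find which score sits at a specific percentile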

Where is the value of Pearson's r displayed in the Correlations table?

In the Pearson Correlation row of each variable column

What are the two components of normality?

Skewness and kurtosis

What type of skewness is represented by a peak to the left?

Positively skewed (the peak is on the left and the long tail extends to the right)

What type of kurtosis is characterized by a too peaked distribution?

Leptokurtic

What can cause an underestimation of variance in data analysis?

Kurtosis

What type of relationship can lead to an underestimation of the correlation coefficient?

Inverted-U shaped

What can occur when correlating variables with restricted ranges?

The correlation can appear reduced or inflated

What can outliers do to the correlation coefficient?

They can strongly influence it, making the correlation appear stronger or weaker than it is for the rest of the data

What is the goal of the least squares criterion?

To minimize the sum of squared residuals

What is the residual in regression analysis?

The difference between the actual Y value and the Y value on the regression line

What is the regression coefficient also known as?

Slope or b

How is the model sum of squares (SSm) calculated?

By squaring the difference between the mean Y score and each predicted Y value on the regression line, then adding these squared differences together

How is the goodness of fit assessed for the regression line?

By dividing the variability explained by the regression line by the total variability

What does R² x 100 show?

The percentage of variability in Y explained by X

What is the standardised regression coefficient also known as?

Beta (β)

What is the main difference between correlational studies and experimental studies?

Correlational studies involve observation, while experimental studies involve manipulation

What is statistical control used for?

To control for external variables in correlational studies

What is the equation for multiple regression?

y = a + (b1)(x1) + (b2)(x2) + ...

When is hierarchical multiple regression used?

When the predictors are entered in steps

Where is the goodness-of-fit found in the SPSS output?

In the ANOVA table

What is the main advantage of transforming a value to a z-score?

To determine the percentile of the value

How can the standard normal distribution be used to compare scores on one variable with scores on another?

By comparing the percentiles of the scores using the z-table

What is the formula for transforming a z-score back into a raw score?

X = (z)(Sx) + X̄, where X̄ is the mean of the distribution

What is a positive association?

A type of correlation where if one variable increases or decreases, the other does the same

What is the purpose of a scatter plot in correlation analysis?

To assess the direction, strength, and linearity of the relationship between variables

How do scatter plots assess the strength of a relationship between variables?

By looking at how closely the points are clustered to the line of best fit

What is Pearson's r used for?

To assess the correlation between two interval variables

How is Pearson's r calculated?

By converting raw scores to z-scores, multiplying each participant's pair of z-scores, adding the products together, and dividing by the number of participants minus 1

What does a correlation coefficient of 0 indicate?

No linear relationship between the variables

How is correlation assessed using a correlation coefficient?

Using Pearson's r on interval data

What type of measurement scale has a true zero point and equal intervals between consecutive values?

Ratio

What is the main purpose of regression diagnostics?

To detect outliers and improve the accuracy of the model

What is the most appropriate action to take when an outlier is detected due to a data entry error?

Correct the data entry error

What is the purpose of Cook's distance in regression analysis?

To detect influential cases by measuring the change in predicted scores

What can be observed from a scatterplot of standardized residuals to check for normality, linearity, and homoscedasticity?

A roughly rectangular distribution with scores clustered towards the centre

What type of measurement scale has categories that are ranked or ordered in terms of magnitude or strength?

Ordinal

What is the main difference between interval and ratio measurement scales?

Ratio scales have a true zero point, while interval scales do not

What is another way to check for normality of residuals?

Creating a histogram of standardized residual scores

What is the purpose of the Durbin-Watson test?

To check for independence of error

What is the purpose of assumption checks in regression analysis?

To check whether the data meet the assumptions of the regression model (normality, linearity, homoscedasticity, and independence of error)

What should be done if an outlier is detected and it is not due to a data entry error?

Transform the data, or delete the case responsible for the outlier if it produces a large distortion

What happens when an assumption is severely violated in regression analysis?

The regression should be re-run using robust methods

What is multicollinearity in multiple regression?

When predictor variables are highly correlated

What is the purpose of standardised residuals in regression analysis?

To detect outliers by searching for large residuals

When does multicollinearity become a problem?

When the researcher is interested in the separate effects of different predictors on the outcome

What is the main limitation of interval measurement scales?

They do not have a true zero point

How is multicollinearity checked?

By inspecting the correlation matrix

What is mediation analysis?

A test of the theoretical assumption that the effect of the predictor on the outcome exists because of another variable

What are indirect effects in mediation analysis?

The effects of the predictor on the outcome through the mediator

What is path c in mediation analysis?

The effect of the predictor on the outcome

In moderation analysis, what does the interaction term (predictor*moderator) represent?

The moderation effect, indicating if the relationship between the predictor and outcome changes as a function of the moderator

What is a necessary condition for interpreting the interaction term in moderation analysis?

The model must include the predictor, moderator, and interaction term

What type of graph is typically used to display the results of moderation analysis?

Simple slope graph

What is a limitation of PROCESS in moderation analysis?

It requires complete data and a continuous outcome variable

What is the purpose of moderation analysis?

To determine if the relationship between the predictor and outcome changes as a function of a third variable

What type of data can be used for moderation analysis?

Either continuous or categorical variables for the moderator or predictor

What does the moderation effect indicate?

The change in the relationship between the predictor and outcome as a function of the moderator

Why are standardized regression coefficients not provided in PROCESS output for moderation analysis?

Because they are difficult to interpret in the presence of an interaction

What is the primary purpose of hierarchical multiple regression?

To examine the relationship between the predictor and outcome variables while controlling for the effects of external predictors

What is a categorical variable?

A variable with scores that can be placed into categories

What is the process of representing categorical variables using only 1 and 0 as values?

Dummy coding

When the predictor variable has more than two categories, how many dummy variables are created?

The total number of categories minus 1
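
A minimal Python sketch of dummy coding, assuming the categorical variable is stored as a plain list of strings and that one category is chosen as the baseline (reference) group. The function name `dummy_code` and the `is_<category>` column names are hypothetical, chosen only for illustration. With k categories it produces k - 1 dummy variables scored 1 or 0; the baseline is the category scored 0 on every dummy.

```python
def dummy_code(values, baseline):
    """Return k - 1 dummy variables (1/0) for a categorical variable with k categories.

    The baseline category is scored 0 on every dummy variable.
    """
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} is not one of the categories")
    return {
        f"is_{category}": [1 if value == category else 0 for value in values]
        for category in categories
        if category != baseline
    }


# Hypothetical data: 3 categories -> 3 - 1 = 2 dummy variables
transport = ["bus", "car", "walk", "car", "walk"]
print(dummy_code(transport, baseline="walk"))
# {'is_bus': [1, 0, 0, 0, 0], 'is_car': [0, 1, 0, 1, 0]}
```

A third dummy (`is_walk`) is not needed: a case scored 0 on both `is_bus` and `is_car` must be in the baseline group.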

What process should be used when there is one categorical predictor with three or more categories and one or more continuous predictors?

Hierarchical multiple regression

What is the primary purpose of using a measurement scale?

To determine the type of statistical procedure to use

What is the characteristic of a nominal measurement scale?

It consists of a set of categories with each having a specific quality or characteristic

What is the purpose of using dummy variables in regression analysis?

To examine the relationship between a categorical predictor and an outcome variable

When there are two or more categorical predictors, what process should be used?

Hierarchical multiple regression

What is the result of using hierarchical multiple regression when there is one categorical predictor and one or more continuous predictors?

The categorical predictor is entered in a separate block from the continuous predictors

Study Notes

Z-Scores and Standard Normal Distribution

Z-scores are a transformation of raw scores into units of standard deviation
A z-score of 1 means the raw score is 1 standard deviation above the mean, and a z-score of -1 means the raw score is 1 standard deviation below the mean
The distribution of z-scores has the same shape as the distribution of the raw scores, which is often a normal distribution
The formula for transforming raw scores into z-scores is: (Raw score - mean of distribution) / SD of distribution
The standard normal distribution describes the normal distribution of z-scores
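
The z-score formula above can be sketched in a few lines of Python. This assumes the SD is the sample SD (n - 1 denominator, as returned by the standard library's `statistics.stdev`); `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Transform raw scores into z-scores: (raw score - mean) / SD."""
    m = mean(raw_scores)
    sd = stdev(raw_scores)  # sample SD (n - 1 denominator), assumed here
    return [(x - m) / sd for x in raw_scores]


# Hypothetical scores with mean 20 and sample SD 10
print(z_scores([10, 20, 30]))  # [-1.0, 0.0, 1.0]
```

The score of 30 sits 1 SD above the mean, so its z-score is 1. Because this is a linear transformation, the z-scores keep the shape of the raw-score distribution.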

Using Z-Scores and Standard Normal Distribution

Z-scores can be used to know the percentage of scores below each value
Z-scores can be used to compare individual scores on one variable with scores on another variable
Z-scores can be used to know which score on a variable sits at a specific percentile
The formula for transforming a z-score back into a raw score is: X = (z)(Sx) + X̄, where X̄ is the mean of the distribution
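
The three uses listed above can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` gives the proportion of a normal distribution below a value, as a z-table does, and whose `inv_cdf` goes the other way. The helper names and the test means/SDs below are hypothetical, and the percentages are only meaningful if the scores are normally distributed.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percent_below(z):
    """Percentage of scores below z (what a z-table reports), assuming normality."""
    return STANDARD_NORMAL.cdf(z) * 100


def z_from_raw(x, mean, sd):
    return (x - mean) / sd


def raw_from_z(z, mean, sd):
    """X = (z)(Sx) + mean."""
    return z * sd + mean


def raw_at_percentile(percentile, mean, sd):
    """Raw score that sits at a given percentile of a normal distribution."""
    return raw_from_z(STANDARD_NORMAL.inv_cdf(percentile / 100), mean, sd)


# Use 1: percentage of scores below z = 1
print(round(percent_below(1.0), 2))  # 84.13

# Use 2: compare scores on two differently scaled (hypothetical) tests
maths = z_from_raw(70, mean=60, sd=5)      # z = 2.0
reading = z_from_raw(80, mean=75, sd=10)   # z = 0.5
print(maths > reading)  # True: the maths score is relatively higher

# Use 3: raw score at the 50th percentile of a scale with mean 100, SD 15
print(raw_at_percentile(50, mean=100, sd=15))  # 100.0
```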

Correlation

A positive association is a type of correlation where if one variable increases or decreases, the other does the same
A negative association is a type of correlation where if one variable increases or decreases, the other does the opposite
An undefined association occurs when the scores form a horizontal line on a scatter plot: one variable does not vary, so Pearson's r cannot be calculated
A perfect association is a type of correlation that can be either positive or negative, where the change in one variable is exactly proportional to the change in the other
Correlation can be assessed pictorially using a scatter plot or numerically using a correlation coefficient

Scatter Plots

A scatter plot shows where each participant lies on the scale of each variable, relative to other participants
Scatter plots assess the direction of the relationship between variables, the strength of the relationship, and whether it is a linear relationship
If the line of participant scores goes diagonally upwards from the bottom left to the top right, then it is a positive correlation. If the line goes diagonally downwards from the top left to the bottom right, then it is a negative correlation
The more points are clustered close to the line of best fit, the stronger the relationship is

Correlation Coefficient

Pearson's r is used on interval data and indicates the correlative relationship on a scale of -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 indicating no relationship
Pearson's r is calculated by converting raw scores to z-scores, multiplying the z-scores for each participant, adding the products together, and dividing by the number of participants minus 1
In SPSS output, the value of Pearson's r is displayed in the Correlations table, and the statistical significance of this value is displayed in the Sig. (2-tailed) row for each variable
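
The calculation described above (z-scores, multiply, add, divide by n - 1) can be sketched directly in Python. This is a hand-rolled illustration, not SPSS's implementation; `pearson_r` is a hypothetical name and the revision/exam data are made up.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's r: convert each variable to z-scores, multiply the paired
    z-scores, add the products together, and divide by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired scores from at least 2 participants")
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    # stdev() is the sample SD, which matches the n - 1 divisor below.
    # If either SD is 0, r is undefined and this raises ZeroDivisionError.
    products = [((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


# Hypothetical paired scores for 5 participants
hours_revised = [1, 2, 3, 4, 5]
exam_mark = [52, 55, 61, 60, 68]
print(round(pearson_r(hours_revised, exam_mark), 3))  # 0.953
```

Using the sample SD together with the n - 1 divisor keeps r within -1 to +1; mixing a population SD with n - 1 would not.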

Normality

Normality consists of two components: skewness and kurtosis
Skewness is the asymmetry of a distribution; a symmetrical bell curve has no skew
Kurtosis is how sharply peaked a sample's distribution is
Skewness and kurtosis can affect data analysis, but a larger sample size can reduce this risk

Factors Affecting Correlation Coefficient

Inverted-U shaped relationships, restricted ranges, outliers, and the shape of X and Y distributions can all cause problems in the calculation of Pearson's r
An inverted-U shaped relationship can underestimate the size of the possible correlation
A restricted range can make the correlation appear reduced or inflated
Outliers can strongly influence the Pearson's r calculations
The shape of X and Y distributions can also affect the correlation coefficient
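
Two of these problems are easy to reproduce with small made-up data sets. The sketch below computes r from deviation scores (algebraically the same as the z-score method) and shows that a perfect inverted-U relationship gives r = 0, and that a single extreme case can turn "no relationship" into a strong one. All data and the `pearson_r` helper are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r from deviation scores (equivalent to the z-score method)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Inverted-U: y rises then falls as x increases (y = 4 - x squared)
x_u = [-2, -1, 0, 1, 2]
y_u = [0, 3, 4, 3, 0]
print(pearson_r(x_u, y_u))  # 0.0, despite a perfect (non-linear) relationship

# Outlier: four cases with no linear relationship, then one extreme case added
x_flat = [1, 1, 3, 3]
y_flat = [1, 3, 1, 3]
print(pearson_r(x_flat, y_flat))                # 0.0
print(pearson_r(x_flat + [10], y_flat + [10]))  # about 0.93
```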

Regression Analysis

Regression analysis involves finding the best-fitting straight line through a scatter plot
The least squares criterion states that the line of best fit should be the line with the lowest possible sum of squared residuals
The regression coefficient (b) is the number of units that the line moves up the Y axis for each unit it moves along the X axis
The regression constant (a) is the value of Y where the regression line crosses the Y axis, i.e. the predicted Y when X = 0
The least squares criterion is satisfied when b equals the sample covariance of x and y divided by the sample variance of x
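
A minimal sketch of the least squares line and of the sums of squares used for goodness of fit, written in plain Python with made-up data. It uses the fact that the fitted line passes through the point (mean of x, mean of y), so a = mean(y) - b * mean(x); the function names are hypothetical.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # cov(x, y) / var(x); the n - 1 divisors cancel
    a = my - b * mx      # the line passes through (mean of x, mean of y)
    return a, b


def sums_of_squares(x, y, a, b):
    """Return SSt, SSm, SSr and R-squared (SSm / SSt) for a fitted line."""
    my = mean(y)
    predicted = [a + b * xi for xi in x]
    ss_t = sum((yi - my) ** 2 for yi in y)                      # total
    ss_m = sum((pi - my) ** 2 for pi in predicted)              # model
    ss_r = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual
    return ss_t, ss_m, ss_r, ss_m / ss_t


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
ss_t, ss_m, ss_r, r2 = sums_of_squares(x, y, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r2:.2f}")  # a = 2.20, b = 0.60, R^2 = 0.60
```

Here SSm (3.6) plus SSr (2.4) equals SSt (6), and R² x 100 = 60% of the variability in Y is explained by X.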

Multiple Regression

Multiple regression involves more than one predictor variable
The equation for multiple regression is: y = a + (b1)(x1) + (b2)(x2) + ...
Hierarchical multiple regression is used when the researcher believes that the effects of the predictor on the outcome are not fully explained without the inclusion of the external predictor
In SPSS output, the goodness-of-fit is found in the ANOVA table, the overall effects of the predictor variables are found in the model summary table, and the effects of each predictor variable (separately) are found in the coefficients table

Outliers and Residuals

If more than 5% of cases have a standardized residual below -2.0 or above +2.0, this is cause for concern.
Cook's distance measures how much the predicted scores for all other cases change when a specific case is excluded from the analysis.
Cook's distance values above 1 indicate cause for concern.
Cook's distance can be found in the Maximum column of the Residual Statistics table in SPSS output.
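
The 5% rule for standardized residuals can be sketched as follows. This assumes each residual is standardized by dividing it by the standard error of the estimate, sqrt(SSr / (n - k - 1)) with k predictors; that is our reading of how SPSS standardizes residuals and should be checked against its documentation. The helper names and the residuals are hypothetical.

```python
import math


def standardized_residuals(residuals, n_predictors=1):
    """Divide each residual by the standard error of the estimate,
    sqrt(SSr / (n - k - 1)), where k is the number of predictors (assumed definition)."""
    n = len(residuals)
    ss_r = sum(e ** 2 for e in residuals)
    se = math.sqrt(ss_r / (n - n_predictors - 1))
    return [e / se for e in residuals]


def residuals_cause_concern(residuals, n_predictors=1, cutoff=2.0, max_share=0.05):
    """True if more than 5% of cases have a standardized residual beyond +/-2.0."""
    std = standardized_residuals(residuals, n_predictors)
    share = sum(1 for z in std if abs(z) > cutoff) / len(std)
    return share > max_share


# Residuals from a hypothetical one-predictor model with 5 cases
resids = [-0.8, 0.6, 1.0, -0.6, -0.2]
print(residuals_cause_concern(resids))  # False: none fall beyond +/-2.0
```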

Addressing Outliers

First, ensure outliers are not caused by data entry errors.
Options for addressing outliers: transforming the data, or deleting the case responsible for the outlier (if it produces a large distortion).

Assumptions of Residuals in Regression Analysis

Normality: residual values should be normally distributed.
Linearity: residuals should have a straight line relationship with the predicted outcomes.
Homoscedasticity: variance of the residuals should be approximately equal for all predicted scores.
Independence of error: errors of prediction should be uncorrelated.

Checking Assumptions

Normality, linearity, and homoscedasticity can be checked by inspecting the scatterplot of standardized residuals.
A histogram can be used to check the normality of residuals.
Independence of error can be investigated using the Durbin-Watson test.
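
The Durbin-Watson statistic itself is a short calculation: the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting uncorrelated errors; a commonly quoted rule of thumb treats values below 1 or above 3 as cause for concern. The sketch below is a plain-Python illustration with made-up residuals, which must be in the order the cases were collected.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared differences between successive
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Errors that alternate in sign (negative autocorrelation)
print(durbin_watson([1, -1, 1, -1]))  # 3.0
# Runs of same-sign errors (positive autocorrelation)
print(round(durbin_watson([1, 1, 1, -1, -1, -1]), 2))  # 0.67
```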

Violating Assumptions

If an assumption is severely violated, the regression should be re-run using robust methods (e.g., bootstrapping).

Multicollinearity

Multicollinearity occurs when predictor variables in the model are highly correlated (>0.80).
Multicollinearity becomes a problem when the research aims to find the separate effects of different predictors on the outcome.
It can be checked by inspecting the correlation matrix.

Dealing with Multicollinearity

One or more variables can be deleted from the model if the correlation between two predictors is very high (>0.90).
Collinear variables can be combined into one composite variable if the correlation is around 0.80.
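
A minimal sketch of inspecting the correlation matrix for multicollinearity, using the thresholds given above. Two choices here are assumptions rather than part of the notes: correlations are compared by absolute value (so strong negative correlations are flagged too), and any pair between 0.80 and 0.90 is treated as "around 0.80". The predictor names and scores are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def collinearity_flags(predictors, high=0.80, very_high=0.90):
    """Scan every pair of predictors and flag pairs correlated beyond the thresholds."""
    flags = []
    for (name1, x1), (name2, x2) in combinations(predictors.items(), 2):
        r = pearson_r(x1, x2)
        if abs(r) > very_high:
            flags.append((name1, name2, r, "consider deleting one of the pair"))
        elif abs(r) > high:
            flags.append((name1, name2, r, "consider combining into a composite"))
    return flags


# Hypothetical predictor scores for 5 participants
predictors = {
    "anxiety": [1, 2, 3, 4, 5],
    "worry": [2, 4, 6, 8, 10],
    "sleep": [2, 5, 1, 4, 3],
}
for flag in collinearity_flags(predictors):
    print(flag)
# ('anxiety', 'worry', 1.0, 'consider deleting one of the pair')
```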

Mediation Analysis

Mediation analysis tests whether the effect of the predictor on the outcome exists because of another variable.
The mediator is considered relevant if including it in the model reduces the effect of the predictor on the outcome.

Mediation Analysis Components

Indirect effects: the effects of the predictor on the outcome through the mediator variable.
Path a: the effect of the predictor on the mediator.
Path b: the effect of the mediator on the outcome.
Path c: the effect of the predictor on the outcome.

Calculating Indirect Effects

The standardised coefficient for the indirect effect is calculated by multiplying the regression coefficients for path a and path b.
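
The paths above can be estimated with ordinary least squares and a single mediator, as in this plain-Python sketch. Path a is the slope of M on X; path b (and the direct effect c') come from regressing Y on X and M together, solved here with the closed-form two-predictor formulas. Converting all three variables to z-scores first gives the standardised indirect effect a x b. All function names and data are hypothetical, and this is an illustration rather than the PROCESS procedure.

```python
from statistics import mean, stdev


def _cross(u, v):
    """Sum of cross-products of deviation scores for two paired variables."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def slope(predictor, outcome):
    """Simple regression slope of outcome on predictor (used for path a and path c)."""
    return _cross(predictor, outcome) / _cross(predictor, predictor)


def paths_c_prime_and_b(x, m, y):
    """Slopes when Y is regressed on X and M together:
    c' is the effect of X controlling for M, b is the effect of M controlling for X."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2  # zero only if X and M are perfectly correlated
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_prime, b


def standardize(v):
    """Convert scores to z-scores so the paths become standardised coefficients."""
    mv, sv = mean(v), stdev(v)
    return [(vi - mv) / sv for vi in v]


def indirect_effect(x, m, y):
    """Indirect effect = path a * path b."""
    a = slope(x, m)
    _, b = paths_c_prime_and_b(x, m, y)
    return a * b


# Hypothetical scores: predictor X, mediator M, outcome Y
x = [1, 2, 3, 4, 5, 6]
m = [2, 1, 4, 3, 6, 5]
y = [1, 2, 3, 4, 6, 7]
zx, zm, zy = standardize(x), standardize(m), standardize(y)
print("standardised indirect effect:", round(indirect_effect(zx, zm, zy), 3))
c_prime, b = paths_c_prime_and_b(zx, zm, zy)
# With least squares, path c (total effect) = c' (direct) + a*b (indirect)
print("c =", round(slope(zx, zy), 3), "| c' + ab =", round(c_prime + slope(zx, zm) * b, 3))
```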

Moderation Analysis

Moderation analysis establishes whether the relationship between the predictor and the outcome changes as a function of the level of a third variable (the moderator).

Moderation Analysis Components

The outcome is predicted by the predictor variable, the proposed moderator, and the interaction of the two.
The interaction effect indicates whether moderation has occurred.

Displaying Moderation Analysis Results

A simple slope graph is used to illustrate the effect of the predictor, the effect of the moderator, and the interaction/moderation effect.
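
The numbers behind a simple slope graph follow from the moderation model Y = b0 + b1*X + b2*W + b3*(X*W): at any moderator value W, the slope of the predictor is b1 + b3*W. The sketch below uses hypothetical coefficients, assumes the predictor and moderator are mean-centred with a moderator SD of 1, and evaluates the slope at one SD below the mean, the mean, and one SD above (a common convention, assumed here). Each pair of predicted Y values is one line on the graph.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the predictor at one value of the moderator: b1 + b3 * W."""
    return b_predictor + b_interaction * moderator_value


def predict(b0, b1, b2, b3, x, w):
    """Predicted outcome from Y = b0 + b1*X + b2*W + b3*(X*W)."""
    return b0 + b1 * x + b2 * w + b3 * x * w


# Hypothetical coefficients; predictor and moderator mean-centred, moderator SD = 1
b0, b1, b2, b3 = 2.0, 0.5, 0.3, 0.25
for label, w in [("low (-1 SD)", -1.0), ("mean", 0.0), ("high (+1 SD)", 1.0)]:
    y_low_x = predict(b0, b1, b2, b3, -1.0, w)
    y_high_x = predict(b0, b1, b2, b3, 1.0, w)
    print(f"{label}: slope = {simple_slope(b1, b3, w):.2f}, "
          f"Y at X = -1 and X = +1: {y_low_x:.2f}, {y_high_x:.2f}")
```

A positive b3 means the lines get steeper as the moderator increases; if b3 were 0, all three lines would be parallel and there would be no moderation.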

PROCESS and Moderation Analysis

PROCESS outputs do not provide standardized regression coefficients when an interaction is present.
PROCESS requires complete data and a continuous outcome variable for moderation analysis.
