Summary

This document is a summary of key takeaways, steps, rules, and important figures for the MMSR (MAN-MMA032A) course. It covers topics such as multivariate methods, data analysis, and regression analysis.

Full Transcript

Total overview - MMSR (MAN-MMA032A): summary of key take-aways, steps, rules, and important figures. Written by vaymelis, www.stuvia.com.

Downloaded by: jorisvanoldenbeek | [email protected]
This document is protected by copyright; distributing it is a criminal offence.

MMSR

Contents
Overview of multivariate methods
Examining your data
   Rules
   Figures
Exploratory and confirmatory factor analysis
   Rules
   Figures
(M)AN(C)OVA
   Rules
   Figures
Multiple and logistic regression analysis
   Rules
   Figures
Partial Least Squares Structural Equation Modeling (PLS-SEM)
   Rules
   Figures

Overview of multivariate methods

Multivariate analysis = any simultaneous analysis of more than two variables.

Examining your data

Imputation method = process of estimating the missing data of an observation based on valid values of the other variables. Possibilities:
• Cold-deck imputation: from data outside your database
• EM: maximum likelihood (MAR)
• Hot-deck imputation: from an existing observation deemed similar
• Mean substitution: substituting by means
• Multiple imputation: MAR
• Regression imputation: calculating it based on regression models

In case you are not going to replace the missing data with values, two options are possible:
• Complete case approach: handling missing data based on complete cases only, i.e. cases with no missing data. Also known as the listwise deletion approach.
• All-available approach: handling missing data based on all available data. Also known as the pairwise approach.

Positive kurtosis means a more peaked distribution; negative kurtosis means a flatter one. Positively skewed means many observations on the left; negatively skewed means many observations on the right. The threshold values for kurtosis and skewness are -3 to 3 (without dividing them by the standard error!).

Always consider both the practical and substantive impact of your missing data.

Rules
• How much missing data is too much?
   o Over 10%
   o Under 10% is acceptable, but assess whether the data are MAR/MCAR
• When is a value an outlier?
   o For small samples: a standard score of 2.5
   o For large samples: a standard score of 4
   o If standard scores are not given, use the threshold values with standard deviations
• Normality
   o Above a sample size of 200, normality is often okay
   o Skewness and kurtosis values between -3 and 3
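The skewness, kurtosis, and outlier rules above can be checked with a few lines of code. The following is a minimal sketch in plain Python, assuming simple moment-based formulas for skewness and excess kurtosis (statistical packages may apply small-sample corrections and report slightly different values). The function names and the sample data are illustrative and not part of the course material.

```python
import statistics


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3


def within_thresholds(values, low=-3.0, high=3.0):
    """Apply the -3 to 3 rule to the raw statistics (not divided by the standard error)."""
    return (low <= skewness(values) <= high
            and low <= excess_kurtosis(values) <= high)


def find_outliers(values, z_threshold):
    """Return the values whose absolute standard score exceeds z_threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / sd) > z_threshold]


# Hypothetical data: 19 identical scores and one extreme score.
scores = [0] * 19 + [100]
print(find_outliers(scores, z_threshold=2.5))  # prints [100]
print(within_thresholds(scores))               # prints False
```

The threshold is passed in explicitly (2.5 for small samples, 4 for large samples) because the summary does not state where the boundary between a small and a large sample lies.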
Figures

Exploratory and confirmatory factor analysis

Types of validity:
• Content validity (= face validity)
• Construct validity
   o Convergent validity: degree to which the measures of one construct (which thus should be related) are actually related to each other.
   o Nomological validity: the scale is able to predict other concepts in a theoretical model.
   o Discriminant validity: degree to which constructs or measures that should be unrelated are actually unrelated.

Oblique rotation does not keep the factor axes at an angle of 90 degrees, whereas orthogonal rotation does.

Common factor analysis considers only the common (shared) variance and leaves out the unique and error variance. In the correlation matrix, you will find communalities on the diagonal. The goal here is to extract underlying dimensions and their common variance. Best method for theoretical applications.

Principal component analysis takes the total variance into consideration. In the correlation matrix, you will find unities (1s) on the diagonal. The goal here is to have a minimal number of factors with a maximum of variance explained. Best method for data reduction.

Factor analysis is a metric method, thus all variables should be of metric scale.

Q-factor analysis is based on grouping cases (clusters), whereas R-factor analysis is based on grouping variables.

Three ways of using factor outcomes are:
• Surrogate variable = the highest loading variable represents the factor on its own.
• Factor score = take together all loadings on a factor and create a score out of them (orthogonal oriented).
• Summated scale = select the variables that can be assigned to one factor and combine them into a new variable (non-orthogonal oriented).

Unidimensional = a variable loads on only one factor (in the ideal case fully, factor loading = 1).

χ² = goodness of fit
SRMR = badness of fit

Under-identified = more parameters to estimate than unique terms
Just-identified = the same number of parameters to estimate as unique terms (required)
Over-identified = fewer parameters to estimate than unique terms
→ Three-indicator rule: have at least three indicators per factor (to be at least just-identified)

Rules
• Required sample size:
   o 4 to 5 respondents per variable, or
   o At least 50 respondents in total, preferably more than 100
• Can I perform a factor analysis?
   o Bartlett's test of sphericity should be significant (p < 0.05)
   o KMO (MSA) should be at least 0.50. The closer to 1, the better (it indicates the correlations)
• Determining the number of factors:
   o Kaiser rule = eigenvalue (latent root) > 1
   o Percentage of variance = cumulative % (right-hand side of the table) > 60%
   o Scree plot (at or before the elbow)
   o A priori determinations
   o Split-half reliability
• Rotating factors:
   o Oblique rotation when there are correlations, i.e. at least one value in the correlation matrix is above |0.30|
   o Orthogonal rotation when there are no correlations, i.e. all values in the correlation matrix are below |0.30|
• Interpreting factors (loadings):
   o To speak of a significant factor loading, the loading should be at least 0.50.
   o Communalities should strictly be above 0.50, but last year we worked with 0.20.
• Double loaders:
   o Square the factor loadings and divide the highest squared loading by the next-highest one. Is the ratio above 2? Then it is not a double loader. Below 1.5 it is a double loader. Between 1.5 and 2 is a grey area.
• Scale reliability:
   o Cronbach's alpha should be > 0.70.
• For confirmatory factor analysis:
   o Convergent validity (AVE) > 0.50
   o Construct reliability > 0.70 (= Cronbach's alpha)

Figures

(M)AN(C)OVA

Univariate:
• AN(C)OVA
   o One-way ANOVA
   o N-way ANOVA
   o Repeated measures ANOVA
Multivariate:
• MAN(C)OVA

For both: the dependent variable is metric, the independent variable (factor/treatment) is non-metric. The categories of independent variables are also referred to as levels. To include a metric independent variable, we use covariates. A covariate is something that is not manipulated (like a treatment) but can only be measured (like prior knowledge). Covariates reduce error.

When covariates are involved in a MANOVA model, analyze the model both with and without the covariates.
If the covariates do not improve the statistical power or have no effect on the significance of the treatment effect, they can be dropped from the final analysis.

Assumptions:
1. Independence of observations (dependence is only allowed for repeated measures ANOVA)
2. Homogeneity of variance/covariance matrices for all treatment groups
   a. Levene's test for variances → do not reject H0
   b. Box's M test for covariance matrices → do not reject H0
3. Normality of the dependent variables
4. (Linearity of dependent variables → test with regression)
5. (Multicollinearity → test with Bartlett's test of sphericity)
6. (Sensitivity to outliers)

If an interaction effect is non-significant, use the main effects. If an interaction effect is significant, decide (based on a plot) whether it is an ordinal or a disordinal interaction. In case it is an ordinal interaction, you still need to describe the main effect for each level of the treatment (by means of a post-hoc analysis). In case it is a disordinal interaction, it interferes with the interpretation of the main effects (thus, do not interpret them). Within disordinal interactions, we distinguish between non-crossover and crossover interactions.

F statistic = between-group variance (MSx = SSx / df between) / within-group variance (MSerror = SSerror / df error)
SSx = Σ (deviations of the group means from the grand mean)², summed over all observations
SSerror = Σ (deviations from the group means)²
SSy (total variance in Y) = Σ (deviations from the grand mean)²

In case the independent variable has more than two groups, we compare multiple means to find out which exact means differ from each other (= contrasts), by means of:
• A priori / planned comparisons
   o Deviation (group mean vs. grand mean)
   o Simple (group mean 1 vs. group mean 2)
• Post hoc tests
   o Tukey → homogeneity (= equal variances) and equal group sizes (biggest group / smallest group must be smaller than 1.5 to count as equal)
   o Hochberg → homogeneity and unequal group sizes
   o Games-Howell → heterogeneity

Confounders impact both the treatment and the control variables, but are not included in the model (external influence).

Effect sizes are measured by:
• Eta² = strength of the effect of x on y
• Multiple eta²
• Partial eta²
• Omega²

APA notation: F(2, 15) = 16.88, p < .001. Between the parentheses after F, first report df between and then df within (= df error).

Rules
• Sample size:
   o Cells/groups are formed by the combination of independent variables. One variable with three categories and one with two categories thus results in 3 x 2 = 6 cells/groups.
   o The minimum size per group must be greater than the number of dependent variables
   o Recommended is 20 observations per cell (group)
• Levene's test:
   o Should be non-significant for equal variances
   o If it is significant, this is no problem as long as group sizes are equal. If group sizes are not equal, continue by using the Welch statistic instead of the F statistic.
• Normal distribution:
   o Skewness and kurtosis in the range of -3 to 3, or
   o No problem if there are at least 30 observations per group
• Effects:
   o The F statistic for between groups (main effects) or for the interaction effect needs to be significant (p < 0.05) for there to be an effect.
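The F statistic and eta² described in this section can be traced by hand for a one-way ANOVA. The sketch below is a minimal plain-Python illustration, assuming the usual textbook decomposition SSy = SSx + SSerror and F = (SSx / df between) / (SSerror / df error); the function name and the three small groups are made up for the example.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, F and eta squared.

    groups: list of groups, each a list of scores on the dependent variable.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group mean vs. grand mean, once per observation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares: each score vs. its own group mean.
    ss_error = sum((s - m) ** 2
                   for g, m in zip(groups, group_means) for s in g)

    df_between = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_error / df_error)
    eta_squared = ss_between / (ss_between + ss_error)
    return {
        "ss_between": ss_between,
        "ss_error": ss_error,
        "df_between": df_between,
        "df_error": df_error,
        "f": f_value,
        "eta_squared": eta_squared,
    }


# Hypothetical scores for three treatment levels.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({result['df_between']}, {result['df_error']}) = {result['f']:.2f}")
# prints F(2, 6) = 13.00
```

The printed line follows the APA order given above: df between first, then df error. The p value is left out because computing it requires the F distribution, which is not in the Python standard library.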
Figures

Multiple and logistic regression analysis

Dependent variable (DV) = Y = criterion variable
Independent variable (IV) = X = predictor variable

Methods for selecting the variables for inclusion in the regression model:
• All-possible-subsets regression: run all models with all separate IVs and all combinations of IVs.
• Sequential search methods:
   o Backward elimination: start with all IVs and eliminate the insignificant ones. (Data driven, thus not used much, as we want to work theory driven.)
   o Forward addition: add one IV at a time based on its significance. (Data driven, thus not used much, as we want to work theory driven.)
   o Stepwise estimation: start with the best predictor of the DV and add IVs based on their incremental explanatory power. Unlike forward addition and backward elimination, a variable that is already included can be removed again if it becomes non-significant after other IVs enter.
   o Hierarchical: adding sets of IVs one after another (for example, control variables first).

Assumptions:
1. Normality
2. Linearity
3. Constant variance of error terms (homoscedasticity)
4. Independence of error terms
5. Appropriate sample size (most important!)

Multiple regression analysis (MRA) can be used for both explanation and prediction. You always want the chosen variables to be theory driven. Theory is very important for MRA!

Standard regression formula (for estimation): Y = B0 + B1X1 + B2X2 + … + E
B0 = intercept. This is the constant term, so the value of Y when all IVs are zero.
Bn = regression coefficient (called beta coefficient when standardized):
• Beta coefficient = standardized; expresses the relative importance among IVs
• Regression coefficient = unstandardized; describes the absolute change in the DV for a one-unit increase in the IV
E = residual/error

Measures for the influence of a single observation:
• COVRATIO → on the entire set of estimated regression coefficients
• DFBETA → on the change in a regression coefficient
• DFFIT → on the overall model fit

Influential observation = an observation that has a disproportionate influence on one or more aspects of the regression estimates.

The number of dummy variables you need is the number of levels of the independent variable minus 1. If you code the dummy 0/1, you use indicator coding, which gives you the group differences in the dependent variable relative to the reference category. If you code the dummy -1/1, you use effects coding, which gives you the group differences in the dependent variable relative to the overall mean of the dependent variable.

Partial correlation coefficient = measures the strength of the relationship between the dependent variable and a single independent variable when the effects of the other IVs are held constant.
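To make the partial correlation coefficient concrete, the sketch below computes it for the simplest case: one DV (y), one IV of interest (x), and a single other IV (z) that is held constant. It assumes the standard first-order formula r_yx.z = (r_yx − r_yz · r_xz) / √((1 − r_yz²)(1 − r_xz²)); with more than one other IV a regression-based approach is needed instead. The function names and data are illustrative only.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(y, x, z):
    """Correlation between y and x with the effect of z held constant."""
    r_yx = pearson(y, x)
    r_yz = pearson(y, z)
    r_xz = pearson(x, z)
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))


# Hypothetical data in which y is fully determined by x and z together.
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 6]
y = [2 * xi + zi for xi, zi in zip(x, z)]

print(round(pearson(y, x), 3))                 # zero-order correlation
print(round(partial_correlation(y, x, z), 3))  # correlation with z held constant
```

In this constructed example the zero-order correlation between y and x is below 1, while the partial correlation is 1: once z is held constant, x explains all remaining variation in y.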
Partial F (or t) values = give the additional contribution of each variable above all others in the equation (if you add an IV last, R² would normally not go up much because of collinearity, but that does not mean the IV has a weak impact on Y).

Partial regression plot = graphical representation of the relationship between the DV and one IV.

Prediction error (residual) = difference between the actual and predicted values of the DV.

Standardization = process whereby the original variable is transformed into a new variable with a mean of 0 and a standard deviation of 1.

Assessing the significance of a polynomial or interaction term is done by evaluating the incremental R², not the significance of the individual coefficients, because of high multicollinearity. We assess the adjusted R², which also takes model complexity into account.

A moderator in MRA is included as a direct effect and as an interaction term with the original IV. If both are metric, mean-center them first before multiplying one with the other. If the moderator is dichotomous, just multiply them.

Logistic regression is used when the DV is binary: non-metric with two categories. See example in figures.

Rules
• Sample size
   o Simple regression (1 IV): 20
   o Multiple regression (more IVs): minimum 50, preferably 100
   o At least 5 observations per variable, but preferably 15 to 20
• Multicollinearity
   o Tolerance values may not be below 0.10, or
   o VIF should be below 10 (VIF = 1 / tolerance)
   o Bivariate correlations of 0.70 or higher indicate multicollinearity
• F change
   o Must be significant

Figures
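Returning to the multicollinearity rules: tolerance and VIF are two views of the same quantity. Tolerance is 1 − R², where R² comes from regressing one IV on all other IVs, and VIF is 1 / tolerance, so a tolerance of 0.10 corresponds to a VIF of 10. The sketch below is a minimal illustration of that relationship; the function names are made up and the R² values are hypothetical inputs rather than results from a real model.

```python
def tolerance_and_vif(r_squared):
    """Convert the R² of one IV regressed on the other IVs into tolerance and VIF."""
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return tolerance, vif


def flags_multicollinearity(r_squared, min_tolerance=0.10):
    """True when tolerance drops below the 0.10 rule of thumb (VIF above 10)."""
    tolerance, _ = tolerance_and_vif(r_squared)
    return tolerance < min_tolerance


# Hypothetical R² values for three IVs, each regressed on the remaining IVs.
for name, r2 in [("X1", 0.20), ("X2", 0.75), ("X3", 0.95)]:
    tol, vif = tolerance_and_vif(r2)
    print(f"{name}: tolerance = {tol:.2f}, VIF = {vif:.2f}, "
          f"problem = {flags_multicollinearity(r2)}")
```

Only the third hypothetical IV is flagged: 95% of its variance is explained by the other IVs, which leaves a tolerance of about 0.05.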
Partial Least Squares Structural Equation Modeling (PLS-SEM)

SEM estimates a series of dependence relationships simultaneously.

Within SEM we distinguish theoretical (structural) from observational (measurement) language. Correspondence rules define how the observable variables lead to the theoretical construct.

The measurement model can be either reflective (latent) or composite (formative/emergent). Within formative constructs, multicollinearity between indicators should be minimal (therefore, internal consistency measures like Cronbach's alpha do not have to be assessed!), whereas with reflective constructs it should be high. The following measures should be assessed (X = not applicable):

Measure | Reflective | Composite
Construct reliability | Cronbach's alpha / Dillon-Goldstein's rho / Dijkstra-Henseler's rho (0-1) | X
Indicator reliability | Indicator loading² (0-1) | X
Convergent validity | AVE (> .50) | X
Discriminant validity | HTMT […] .50? | […]
Indicator relevance | X | Based on theorization
Nomological validity | X | How well other constructs are measured based on this construct
External validity | X | Whether the transformation to a reflective construct would give similar outcomes

Convergent validity = within-construct collinearity
Discriminant validity = between-construct exclusiveness

Covariance-based SEM vs. variance-based SEM. Covariance-based SEM is a more confirmatory approach: you analyze the covariance matrix, and most of the time maximum likelihood (ML) is the algorithm used. Variance-based SEM is a more exploratory approach: you look at the correlation matrix, and most of the time partial least squares (PLS) is used.

The most used variance-based SEM is PLS-SEM, whose main goal is making predictions and which is especially useful for that purpose. In terms of assumptions: PLS-SEM can work with non-normal data and with heteroscedastic data. It can also contain metric and non-metric data (work with dummies). It works well with both reflective and formative measurement models. In conclusion, it has almost no requirements.

Saturated model fit tests whether the outcomes predicted by the model differ from the observed outcomes. You do not want a difference between these, therefore the test of model fit should be non-significant (p > .05).

R² = coefficient of determination. How much of the variance of the endogenous construct is explained by the model. The acceptable size depends on the context.

Q² = predictive power (assessed by blindfolding). Should be > 0.

f² = effect size of an exogenous construct on the endogenous construct: 0.02 (small), 0.15 (medium), 0.35 (large). Measured as: f² = (R² included − R² excluded) / (1 − R² included)

Within the structural model, we do not want the predictor constructs to have high multicollinearity.

Rules
• Sample size
   o Preferably over 100, but at least 10 times the maximum number of arrowheads pointing at one construct.
• Every latent variable needs at least one assigned indicator, and each indicator can be assigned only once, to one specific construct.
• Indicator loadings
   o Minimum of .708 (HBAT)
   o Squared loadings (also referred to as item reliabilities) should be at least 0.50.
• Validity
   o Convergent validity (= AVE) > 0.50 (= unidimensionality). Below that, the collinearity within the construct would be too low.
   o Discriminant validity (= HTMT) […] .70 but < .90/.95. For exploratory research > .60. (Same for other measures of construct reliability)
• Indicator reliability
   o Should be above .50?
• Model fit
   o We do not want our estimation to differ from the empirical observation. Therefore, we do not want this effect to be significant (p > 0.05)
   o SRMR (approximate model fit) < 0.08

Figures
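The f² effect size from this section is easy to compute once the model has been estimated twice, with and without the exogenous construct in question. The sketch below assumes Cohen's definition, f² = (R² included − R² excluded) / (1 − R² included), and the 0.02 / 0.15 / 0.35 thresholds listed above. The function names, the "negligible" label for values below 0.02, and the R² values are assumptions made for illustration.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size of one exogenous construct on an endogenous construct.

    r2_included: R² of the endogenous construct with the predictor in the model.
    r2_excluded: R² of the endogenous construct with the predictor left out.
    """
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def effect_size_label(f2):
    """Map f² onto the small / medium / large thresholds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"  # label assumed here; the summary gives no name for f² < 0.02


# Hypothetical R² values from two model runs.
f2 = f_squared(r2_included=0.50, r2_excluded=0.40)
print(f"f² = {f2:.2f} ({effect_size_label(f2)})")  # prints f² = 0.20 (medium)
```

Dropping the construct lowers R² from 0.50 to 0.40 in this hypothetical case, which gives f² = 0.10 / 0.50 = 0.20, a medium effect.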
