Correlation & Regression Analysis
Summary
This document provides an overview of correlation and regression analysis, including definitions, concepts, and examples. It covers simple and multiple linear regression, as well as techniques for analyzing categorical variables.
Full Transcript
Correlation Analysis
Correlation doesn't imply causation.
Correlation analysis: measures the strength of the linear relationship between 2 variables.
Pearson Correlation Coefficient (r)
○ Measures the degree of linear association between two continuous variables (x and y)
○ The population correlation (ρ) and the sample correlation (r) lie between -1 and 1
○ Negative values (left side of the scale): when one variable increases, the other decreases
○ Positive values (right side of the scale): the variables go up together or down together
○ If the sample correlation r (or the population correlation ρ) equals 0, there is no linear association
○ The relationship is visualized with scatter plots (a short code sketch of r follows at the end of this section)
Scatter Plots
What is considered a weak or strong correlation effect size:
○ Weak correlation effect: ±0.1
○ Moderate correlation effect: ±0.3
○ Strong correlation effect: ±0.5
Correlation Analysis Example
- We cannot state that one variable is independent and another dependent
- We cannot state a direction from one variable to another, only a correlation
- We cannot study directionality, meaning we do not hypothesize a cause-effect direction
Regression Analysis
Even a regression analysis can measure only the nature and degree of association (or covariation) between variables.
To establish causation, along with mathematical models (regression), one needs underlying knowledge, theories, and accounting for confounders.
Regression analysis: a statistical technique used to relate two or more variables
○ The goal is to create a regression model that relates one or more independent variables (IVs) to the effect on, or change in, the dependent variable (DV)
○ The independent variables can be either continuous or categorical, but the DV can only be a continuous variable
○ We use regression analysis to explain and predict the variable of interest, the dependent variable
Applications of Regression Analysis
○ How does word of mouth impact sales of boxed wine?
○ Whether and how do cross-promotions and pricing impact the use of Amazon's video streaming service?
○ Is the effect of advertising on brand equity for Costco different for traditional vs. digital media?
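The Pearson coefficient described in the correlation section above can be computed directly. Below is a minimal sketch assuming Python with NumPy and SciPy; the satisfaction/recommendation values are made up purely for illustration and are not from the lecture.

```python
# Minimal sketch of Pearson's r, using hypothetical data.
import numpy as np
from scipy import stats

satisfaction = np.array([2.1, 3.4, 3.0, 4.2, 4.8, 5.0, 3.7, 2.9])
recommendation = np.array([1.8, 3.1, 3.3, 4.0, 4.9, 4.7, 3.5, 2.6])

# pearsonr returns the sample correlation r and its two-sided p-value.
r, p_value = stats.pearsonr(satisfaction, recommendation)
print(f"r = {r:.2f}, p = {p_value:.4f}")

# Rough effect-size guide from the notes: |r| around 0.1 is weak, 0.3 moderate,
# 0.5 strong. Inspect a scatter plot as well, since r only captures linear
# association.
```

As the notes stress, a large r says nothing about which variable drives the other; it only quantifies the strength of the linear association.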
Regression Equations
Simple linear regression: only one independent variable (one X variable) in the model
- Y = β0 + β1·X + error
- One X represents the IV, so there is only one slope coefficient, β1
- A coefficient is the estimated parameter that goes together with its variable
Multiple linear regression: more than one independent variable (more than one X variable) in the model
- Y = β0 + β1·X1 + β2·X2 + ... + βk·Xk + error
- Multiple Xs mean multiple IVs
Interpreting the Regression Parameters
Testing Significance of Regression Parameters
★ If a beta coefficient is 0, X has no effect on Y
★ If a beta coefficient is significantly different from 0, then X has an effect on Y
Simple Linear Regression Model
- If beta is not equal to zero, then X (the IV) impacts Y in the simple linear regression model
Multiple Linear Regression Model
- Instead of only one X, we have multiple IVs
- Therefore there is one coefficient (beta) for each independent variable, and we need to test each of them
- Null hypothesis: all betas are equal to 0
- Alternative hypothesis: not all betas are equal to zero
➔ There is at least one beta coefficient not equal to 0
Remember: we always want lower p-values
Coding Categorical Variables
A categorical variable with only 2 categories: one dummy variable coded 0/1 is enough
A categorical variable with multiple categories:
- With 3 categories, we need to create dummy variables
- We code n-1 categories
➔ 3 - 1 = 2, so we create only 2 dummy variables, and 1 category is chosen as the baseline
➔ Heavy user will be the baseline, because its row is all 0s
➔ Light user and moderate user each get their own dummy (a 1 and a 0), not a double zero like the heavy-user baseline
(A dummy-coding and regression code sketch follows at the end of this section.)
Simple Regression Analysis Example
○ Regression equation explanation: Recommendation = Beta zero (the intercept) + Beta one times Satisfaction + error
Multiple Regression Analysis Example
- When all the IVs equal 0, the mean/average level of satisfaction is 2.5 (when nothing is influencing the dependent variable)
- Beta 5, regarding service: customer satisfaction increases by 2.5 units when "service" increases by 1 unit
- This variable was not measured as continuous; it was measured as categorical
Interpreting a Regression Output
Regression Statistics (overall summary)
○ How well the regression model applies to the data we have at hand
○ R Square: measure of model fit, i.e., how well the model fits the data
- R Square ranges between 0 and 1; the closer to 1, the better the model fits
- Based on the table, .94 means that 94% of the variance in the dependent variable (Y) is explained by the independent variables (Xs)
○ Adjusted R Square: the better metric, because it accounts for shared variance
○ Adjusted R Square and R Square should be close to each other
ANOVA
○ F: the F statistic tests whether having the regression model (HA, the alternative hypothesis) is better than having a null model (H0)
○ Significance F (the p-value): the p-value associated with the F test ...
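The regression, dummy-coding, and output-interpretation steps above can be illustrated in one short script. Below is a minimal sketch assuming Python with pandas and statsmodels; the data and the variable names (satisfaction, price, usage) are hypothetical and only stand in for the lecture's example.

```python
# Minimal sketch: multiple linear regression with a dummy-coded categorical IV.
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: a continuous IV (price), a 3-level categorical IV (usage),
# and a continuous DV (satisfaction).
df = pd.DataFrame({
    "satisfaction": [2.5, 3.1, 4.0, 3.6, 4.4, 2.9, 3.8, 4.6],
    "price":        [10, 12, 15, 14, 18, 11, 16, 19],
    "usage":        ["heavy", "light", "moderate", "light",
                     "moderate", "heavy", "light", "moderate"],
})

# Code the 3-level categorical variable as n-1 = 2 dummies; drop_first makes
# the dropped level ("heavy") the baseline, i.e. the all-zeros row.
X = pd.get_dummies(df[["price", "usage"]], columns=["usage"],
                   drop_first=True, dtype=float)
X = sm.add_constant(X)   # adds the intercept (beta zero)
y = df["satisfaction"]

model = sm.OLS(y, X).fit()
print(model.summary())                 # coefficients, R Square, ANOVA F table

# The pieces discussed in "Interpreting a Regression Output":
print(model.rsquared)                  # R Square: variance in Y explained by the Xs
print(model.rsquared_adj)              # Adjusted R Square: penalizes extra IVs
print(model.fvalue, model.f_pvalue)    # F test: regression model vs. null model
print(model.pvalues)                   # per-coefficient tests of beta = 0
```

The per-coefficient p-values correspond to the null hypothesis that a given beta is 0, while the F test's Significance F corresponds to the joint test that not all betas are zero.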