Document Details

Uploaded by ExquisiteOnyx982

Partido State University

Rina Abner-Puerta

Tags

correlation analysis statistical analysis software application learning module

Summary

This learning module provides information about statistical correlation analysis with software application. It covers definitions, assumptions, and interpretation of correlation coefficients. The learning module is focused on correlation analysis using Gretl software.

Full Transcript


STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION
CORRELATION with software application
LEARNING MODULE
Rina Abner-Puerta, DBA, CPA

This learning material has adopted various resources, offline and online. Some discussions were lifted verbatim. The sources are properly recognized in the references section. This material does not intend to infringe any copyrights and is for educational purposes alone.

AE9: Statistical Analysis with Software Application
College of Business and Management
Chapter 5 Correlation Analysis

Name of Student: ______________________
Week Number: 10-11
Course Code: AE19
Name of Faculty: Rina A. Abner-Puerta
Course Title: Statistical Analysis with Software Application

I. OBJECTIVES

This learning material has the following objectives:
1. Define correlation.
2. Enumerate the assumptions in doing a correlation analysis.
3. Interpret correlation coefficients.
4. Explain the test of significance.
5. Differentiate correlation from causation.
6. Perform correlation analysis using the Gretl software.

II. LESSON

Introduction

Many disciplines use correlation analysis in interpreting the results of different studies, and our field, social science, is no exception. In your future work, you may be required to conduct research studies or even simple endeavors that involve the analysis of relationships or associations. As such, it is important that you understand how correlation works.

In your previous statistics class, you may have already met some of the terms that are included in this module. We will review some of the basic concepts and add some more that may not have been covered in your previous classes. Further, we will be using the Gretl software to aid us in drawing inferences using correlation analysis.

In the previous modules, you were asked to gather some data about the Philippine PLCs.
For instance, the activities for Module 3 required you to get data about the PLCs’ market capitalization, basic EPS, total number of board directors, and the proportion of independent directors to the total number of directors. As an example, we could ask whether there is a significant association between the total number of board directors and the company’s EPS. We may also test whether there is a significant relationship between the proportion of independent directors and the company’s EPS. Using correlation analysis, we may be able to answer those questions.

However, it is important to emphasize as early as this stage that correlation does not imply causality. That is, if we find a positive and significant relationship between the proportion of independent directors and the company’s EPS, it does not tell us that the increase in EPS is caused by having more independent directors. Correlation does not do that. We will be talking about causality in the next module, which is about regression analysis.

Definition and Assumptions of Correlation, and the Correlation Coefficient

Correlation is “a term used to denote the association or relationship between two (or more) quantitative variables” (Gogtay & Thatte, 2017, p. 78). When we perform correlation analysis, three important measures are involved: strength, extent, and direction of relationship or association. The foundation of correlation analysis has been described by Akoglu (2018) as “the most basic form of mathematically connecting the dots between the known and unknown forms” (p. 91).

When we talk about correlation analysis, it is essential that we fully understand the correlation coefficient, whose value ranges from -1 to +1. The correlation coefficient tells us the strength and direction of the relationship found: its sign gives the direction, and its absolute value gives the strength.
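
As a rough illustration of how a coefficient is read, the following Python sketch separates the sign (direction) from the absolute value (strength). The function name `describe_coefficient` and the cutoffs 0.4 and 0.7 are assumptions made up for this sketch; they are not taken from any of the interpretation schemes cited in this module, which differ from one another.

```python
def describe_coefficient(r, weak_below=0.4, strong_from=0.7):
    """Return a short label such as 'weak positive' for a coefficient r.

    weak_below and strong_from are illustrative cutoffs (assumptions),
    not thresholds taken from the module's sources.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    # The sign carries the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    # The absolute value carries the strength.
    size = abs(r)
    if size < weak_below:
        strength = "weak"
    elif size < strong_from:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction}"


for value in (0.31, -0.85, 0.55):
    print(value, "->", describe_coefficient(value))
```

Changing the two cutoff arguments is all that is needed to apply a different interpretation scheme.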
Several interpretations of the correlation coefficient have been proposed. Figure 1 below was provided by Gogtay and Thatte (2017) in their article.

Figure 1. The spectrum of correlation coefficient (adopted from Gogtay and Thatte, 2017).

Akoglu (2018), meanwhile, summarized the interpretations proposed by Dancey and Reidy, Quinnipiac University, and Chan YH. Table 1 shows these interpretations.

Correlation analysis typically starts with the construction of a scatterplot or scatter diagram. Hopefully, you can still recall what it looks like. If not, it typically looks like Figure 2.

Figure 2. Scatterplot of resiliency and economic dynamism (vertical axis: Resiliency; horizontal axis: Economic Dynamism; fitted line: Y = 12.2 + 0.722X).

If you recall, in Module 4 you were asked to run some tests related to resiliency and economic dynamism. Figure 2 shows the scatterplot of resiliency and economic dynamism generated through the Gretl software. Can you see the relationship? The closer each point in the plot is to the straight line, the stronger the linear relationship between resiliency and economic dynamism. The line is called the regression line or least-squares line. By the way, a scatterplot like that can also be done in MS Excel. You may try it if you want.

When we construct the scatterplot, we get a visual presentation and can easily see whether there are outliers. Take note that outliers can significantly affect the correlation results. In the scatterplot, we can see each point, and that’s one good thing about it. However, it does not give us a single value that we can use as a basis for interpreting the results of the correlation.
As such, it is necessary to compute the correlation coefficient, which is the “single value or number which establishes a relationship between the two variables being studied” (Gogtay & Thatte, 2017, p. 80). Based on the proposed interpretations of the correlation coefficient (Table 1 and Figure 1 above), we can give meaning to the figure.

Pearson, Spearman, and Kendall Correlation

The most common formula used to derive the correlation coefficient is Karl Pearson’s product-moment correlation coefficient, or simply Pearson’s correlation, denoted by r. Although we are not going to apply the formula manually, let’s still recall it:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \]

where \bar{x} is the mean of the variable x values, and \bar{y} is the mean of the variable y values.

Gone are the days when you had to compute the value of r manually. We can easily do it in Gretl. Look at the following figures:

(Figures: Step 1, Step 2, Step 3)

As you can see in the figure for Step 3, the correlation coefficient is 0.31310442. Its sign and size give the direction and strength of the relationship. It’s positive, meaning that as the economic dynamism of the locality goes up, its resiliency also goes up. If we use the interpretations provided earlier, we can say that we found a weak positive correlation between economic dynamism and resiliency.

At this point, let’s point out that the Pearson correlation is applicable if the following assumptions are met by the sample:
1. The relationship between the variables is linear.
2. The observations (pairs of values) are independent of one another.
3. The variables are normally distributed.

The Pearson correlation is a parametric test, which means that there are statistical assumptions that must be satisfied.
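
To see what the software does behind the menu clicks, here is a minimal Python sketch of Pearson’s r. It divides the sum of the products of each pair’s deviations from the two means by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the five data points are made up for illustration; they are not the module’s resiliency and economic dynamism data, and this is not a description of how Gretl implements the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Numerator: sum of the products of paired deviations.
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Denominator: square root of the product of the sums of squared deviations.
    denominator = sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Hypothetical scores for five localities (made up for illustration).
economic_dynamism = [1, 2, 3, 4, 5]
resiliency = [2, 4, 5, 4, 5]
print(round(pearson_r(economic_dynamism, resiliency), 4))  # 0.7746
```

For these made-up values the numerator is 6 and the denominator is the square root of 60, so r is about 0.77, a positive relationship.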
If the sample data do not meet the above assumptions, it would be better to use other measures of correlation, such as Spearman’s rank correlation (Spearman’s rho, denoted by ρ) and Kendall’s tau, another rank-based measure. The following is the formula for Spearman’s rho:

\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]

Where:
ρ = Spearman rank correlation coefficient
d_i = the difference between the ranks of corresponding values X_i and Y_i
n = number of values in each data set

The following is the formula for Kendall’s tau:

\[ \tau = \frac{n_c - n_d}{\tfrac{1}{2} n(n - 1)} \]

Where:
τ = Kendall rank correlation coefficient
n_c = number of concordant pairs (ordered in the same way)
n_d = number of discordant pairs (ordered differently)

At this point, you don’t need to worry about those formulas. They are easy to apply in Gretl. Before we finally click Spearman’s rho and Kendall’s tau, let’s test our data for normality. A commonly used test of normality is the Kolmogorov-Smirnov test. Look at the following:

(Figures: Step 1, Step 2)

In Gretl, the measures of normality that we have are the Doornik-Hansen test, Shapiro-Wilk W, Lilliefors test, and Jarque-Bera test. The Lilliefors test is a variant of the Kolmogorov-Smirnov test for the case where the mean and variance are estimated from the data. The normality test is done against the null hypothesis that the data are normally distributed. As such, if the p-value is lower than 0.05, we reject the null hypothesis and conclude that the data are not normally distributed. In our case, all p-values are less than 0.05. Thus, even if assumptions 1 and 2 are satisfied, we still cannot use Pearson’s correlation. Consequently, we proceed with running the correlation test using the Spearman and Kendall correlation.
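
Before turning to the software, the two rank-based measures can be sketched in a few lines of Python. Spearman’s rho is computed here as 1 minus 6 times the sum of squared rank differences, divided by n(n² - 1); Kendall’s tau is computed as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. The function names and the data are hypothetical, and the sketch assumes there are no tied values. Statistical packages apply corrections for ties, so their results can differ from these simplified formulas on real data.

```python
from itertools import combinations


def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(xs, ys):
    """tau = (n_c - n_d) / (n * (n - 1) / 2), counted over all pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical values for five localities (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman_rho(x, y), 4))  # 0.8
print(round(kendall_tau(x, y), 4))   # 0.6
```

In this made-up sample, 8 of the 10 pairs are concordant and 2 are discordant, which gives tau = 0.6, while the squared rank differences sum to 4, which gives rho = 0.8. Kendall’s tau is often smaller in magnitude than Spearman’s rho on the same data, which is the same pattern seen in this module’s Gretl results.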
The steps in Gretl are shown in the following figures:

(Figures: Step 1, Step 2, Step 3)

Using Spearman’s rho, we can see from the figure for Step 2 that the correlation coefficient is 0.36816679, a weak positive correlation. Next, let’s try Kendall’s tau.

(Figures: Step 1, Step 2, Step 3)

The correlation coefficient generated by Kendall’s tau formula through Gretl is 0.25459692, which can still be considered a weak positive correlation. Because the sample that we used is not small, we may rely more on the correlation coefficient that was generated using Spearman’s rho.

Testing for Significance

As you may have observed, all of the figures generated by Gretl for the Pearson, Spearman, and Kendall correlations also contain p-values. Hopefully, you can still recall how we interpret a p-value. Refer to the previous figures, and you will see that the p-value is reported as 0.0000 for all the correlation tests that we ran, that is, too small to show at four decimal places.

At this point, it is important to note that it is not just the correlation coefficient that we have to interpret but also the p-value, which is what we call testing for significance. It answers the question of how reliable the correlation analysis is (Gogtay & Thatte, 2017). Simply put, it tells us how likely the pattern of our data is due to chance. Zaid (2015) explained that in statistics language, “significant” means probably true and not due to chance. It is also to be emphasized that relationships may be true but not significant. The most common confidence level (which we have already learned in the previous modules) is 95%, which implies that we are 95% confident that the result of our test is true and not due to chance.
It also implies that our significance level, or alpha level, is 5%: we accept a 5% risk of calling a relationship significant when the pattern is actually due to chance. Statistical software commonly reports three alpha levels: 1%, 5%, and 10%. If we go back to our results, we can conclude that our finding is significant even at the 1% alpha level, which implies that we have very strong evidence that economic dynamism and resiliency are correlated (related or associated), although the association is weak.

Correlation versus Causation

Researchers sometimes confuse the concepts of correlation and causation. Correlation tells us that the variables are related or associated, but it does not tell us that one variable causes the change in the other (causation). Let’s adopt the example of Akoglu (2018): “As the ice-cream sales increase, the rate of deaths from drownings, and the frequency of forest fires increase as well. These facts happen at the same period, don't cause one another” (p. 91). We may find a positive association between ice cream sales and the number of deaths from drowning, but we cannot conclude that the increase in ice cream sales causes an increase in deaths from drowning. There could be another variable that drives both; for example, hot weather could raise ice cream sales and also the number of people who go swimming.

This is one major limitation of correlation analysis: it does not tell us about causality. As such, if we are conducting a study, it would be shallow to perform only correlation analysis, especially if it is causation that we want to find out. This leads us to the next topic, which is regression.

III. ACTIVITIES

Hands-on activity

IV. ASSESSMENT

A quiz will be given.

V. REFERENCES

Akoglu, H. (2018). User’s guide to correlation coefficients. Turkish Journal of Emergency Medicine, 18(August), 91–93. https://doi.org/10.1016/j.tjem.2018.08.001

Gogtay, N. J., & Thatte, U. M. (2017). Principles of correlation analysis. Journal of The Association of Physicians of India, 65(March), 78–81. Retrieved from https://www.kem.edu/wp-content/uploads/2012/06/9-Principles_of_correlation-1.pdf

Zaid, M. A. (2015). Correlation and Regression Analysis. The Statistical, Economic and Social Research and Training Centre for Islamic Countries.
