Summary

This document provides an introduction to correlation analysis in statistics. It explains what correlation is, how it can be analyzed, and the different types of correlation that exist. The text includes examples of different types of correlations to further clarify the concepts.

Full Transcript

# Chapter 7: Correlation

## Meaning of Correlation

Correlation is the relationship that exists between two or more variables. If two variables are related in such a way that a change in one creates a corresponding change in the other, the variables are said to be correlated. Some examples of such relationships are:

- Relationship between heights and weights
- Relationship between the quantum of rainfall and the yield of wheat
- Relationship between the price of a commodity and the demand for it
- Relationship between the age of husband and the age of wife
- Relationship between the dose of insulin and blood sugar
- Relationship between the age of individuals and their blood pressure
- Relationship between advertising expenditure and sales

Correlation is sometimes termed "covariation." The measure of correlation is called the coefficient of correlation.

## Meaning of Correlation Analysis

Correlation analysis is a statistical technique used to measure the degree and direction of the relationship between variables.

## Uses of Correlation

The uses of correlation are as follows:

1. Economic theory and business studies show relationships between variables such as price and quantity demanded, or advertising expenditure and sales promotion measures.
2. Correlation analysis helps in deriving precisely the degree and direction of such relationships.
3. Correlation reduces the range of uncertainty of our predictions; a prediction based on correlation analysis is more reliable and closer to reality.
4. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and suggests the paths through which stabilizing forces may become effective.
5. The coefficient of correlation is a relative measure of change.
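The "degree and direction" idea can be sketched in a few lines of Python: the sign of the covariance between two paired series shows the direction of their relationship. The figures below are invented purely for illustration.

```python
# Illustrative sketch: the sign of the covariance between two paired
# series indicates the direction of their relationship.
# All figures are made up for illustration.

def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cov(X, Y) = sum((X - mean_X) * (Y - mean_Y)) / N
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Advertising expenditure vs. sales: expected to move together.
advertising = [10, 20, 30, 40, 50]
sales       = [25, 40, 55, 65, 90]

# Price vs. quantity demanded: expected to move oppositely.
price    = [5, 6, 7, 8, 9]
demanded = [90, 75, 60, 50, 30]

print(covariance(advertising, sales) > 0)   # True -> positive correlation
print(covariance(price, demanded) < 0)      # True -> negative correlation
```

The chapter develops this idea formally below: covariance gives the direction, and Karl Pearson's coefficient rescales it to measure the degree as well.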
## Positive and Negative Correlation

Depending upon the direction of change of the variables, correlation may be positive or negative.

### 1. Positive Correlation

If both variables vary in the same direction, the correlation is said to be positive. In other words, if one variable increases the other also increases, or if one variable decreases the other also decreases.

### 2. Negative Correlation

If the variables vary in opposite directions, the correlation is said to be negative. In other words, if one variable increases the other decreases, or if one variable decreases the other increases.

## Simple and Multiple Correlation

Depending upon the number of variables studied, correlation may be simple or multiple.

### 1. Simple Correlation

When only two variables are studied, it is a case of simple correlation. For example, studying the relationship between the yield of wheat per acre and the amount of rainfall is a problem of simple correlation.

### 2. Multiple Correlation

When three or more variables are studied, it is a case of multiple correlation. For example, studying the relationship between the yield of wheat per acre, the amount of rainfall, and the amount of fertilizer used is a problem of multiple correlation.

## Partial and Total Multiple Correlation

Multiple correlation may be either partial or total.

### 1. Partial Multiple Correlation

In partial multiple correlation, one studies three or more variables but considers only two of them to be influencing each other, the effect of the other influencing variables being held constant. Its order depends on the number of variables held constant; e.g., if one variable is kept constant, it is called first-order partial correlation.

### 2. Total Multiple Correlation

In total multiple correlation, one studies three or more variables without holding the effect of any variable constant.

## Linear and Non-Linear (Curvilinear) Correlation

Depending upon the constancy of the ratio of change between the variables, correlation may be linear or non-linear:

### 1. Linear Correlation

If the amount of change in one variable bears a constant ratio to the amount of change in the other, the correlation is said to be linear. If such variables are plotted on graph paper, all the plotted points fall on a straight line.

### 2. Non-Linear (Curvilinear) Correlation

If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, the correlation is said to be non-linear. If such variables are plotted on a graph, the points fall on a curve rather than on a straight line. For example, if we double the advertising expenditure, sales will not necessarily double.

## Scatter Diagram Method

A scatter diagram is a diagrammatic representation of bivariate data used to ascertain the correlation between two variables.

**Practical Steps Involved in the Preparation of a Scatter Diagram**

1. Show one of the variables, say X, along the horizontal axis OX and the other variable Y along the vertical axis OY.
2. Plot a dot for each pair of X and Y values on graph paper.
3. Observe the scatter of the plotted points and form an idea of the degree and direction of correlation.

**Interpretation**

- **Perfect Positive Correlation**: If all the plotted points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, the correlation is perfectly positive (i.e. r = +1).
- **Perfect Negative Correlation**: If all the plotted points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner, the correlation is perfectly negative (i.e. r = -1).
- **High Degree of Positive Correlation**: If the plotted points fall in a narrow band and show a rising tendency from the lower left-hand corner to the upper right-hand corner, a high degree of positive correlation exists.
- **High Degree of Negative Correlation**: If the plotted points fall in a narrow band and show a declining tendency from the upper left-hand corner to the lower right-hand corner, a high degree of negative correlation exists.
- **Low Degree of Positive Correlation**: If the plotted points are widely scattered over the diagram but show a rising tendency from the lower left-hand corner to the upper right-hand corner, a low degree of positive correlation exists.
- **Low Degree of Negative Correlation**: If the plotted points are widely scattered over the diagram but show a declining tendency from the upper left-hand side to the lower right-hand side, a low degree of negative correlation exists.
- **No Correlation**: If the plotted points lie on a straight line parallel to the x-axis or y-axis, or are scattered haphazardly, there is no relationship between the variables (i.e. r = 0).

## Merits of Scatter Diagram

- It is a simple, non-mathematical method of studying correlation between variables.
- It is not influenced by the size of extreme items.
- It is the first step in investigating the relationship between two variables.
- It gives an indication of the degree of linear correlation between the variables.

## Limitations of Scatter Diagram

A scatter diagram is only a very rough measure of correlation, since the exact magnitude of the correlation cannot be known from it. To remedy this limitation there are more precise methods, such as covariance and rank methods.

## Graphic Method

The graphic method is a diagrammatic representation of bivariate data used to ascertain the correlation between two variables.

**Practical Steps Involved in the Preparation of a Graph**

1. Show the time horizon along the horizontal axis OX and the variables X and Y along the vertical axis OY.
2. Plot a dot for each of the individual values of the X variable and join these dots to obtain a curve.
3. Plot a dot for each of the individual values of the Y variable and join these dots to obtain a curve.
4. Observe both curves and form an idea of the direction of correlation.

**Interpretation**

- If both curves move in the same direction (either upward or downward), the correlation is positive; if they move in opposite directions, the correlation is negative.

## Covariance

**Definition**

Given a set of N pairs of observations (X₁, Y₁), (X₂, Y₂), ..., (X_N, Y_N) relating to two variables X and Y, the covariance of X and Y, usually denoted Cov(X, Y), is defined as:

- $Cov(X, Y) = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum xy}{N}$

**Where**:

- $x = X - \bar{X}$ and $y = Y - \bar{Y}$ (deviations from the respective means)

**Properties of Covariance**

- **Independent of choice of origin**: Covariance is independent of the choice of origin. The value of covariance is not affected if each of the individual values of X or Y is increased or decreased by some constant.
- **Not independent of choice of scale**: Covariance is affected by the choice of scale. If each of the individual values of X (or Y) is multiplied by a constant k, the covariance is also multiplied by k.
- **Covariance can vary from -∞ (negative infinity) to +∞ (positive infinity)**: covariance may be positive, negative, or zero.

## Limitation

Covariance is a direct measure of the association between two variables, but it cannot be used as a meaningful measure of the strength of the relationship, because it can take any value from an infinitely large negative to an infinitely large positive value.
In the language of mathematics, it can vary from -∞ (negative infinity) to +∞ (positive infinity).

## Difference Between Variance and Covariance

- Variance is always non-negative, whereas covariance may be positive, negative, or zero.

## Karl Pearson's Coefficient of Correlation

**Definition**

Given a set of N pairs of observations (X₁, Y₁), (X₂, Y₂), ..., (X_N, Y_N) relating to two variables X and Y, the coefficient of correlation between X and Y, denoted by the symbol r, is defined as:

- $r = \frac{Cov(X, Y)}{\sigma_x \sigma_y}$

**Where**:

- $Cov(X, Y)$ = covariance of X and Y
- $\sigma_x$ = standard deviation of variable X
- $\sigma_y$ = standard deviation of variable Y

This expression is known as Pearson's product-moment formula and is used as a measure of linear correlation between X and Y.

**Expanded forms of the above formula**

Expanding the formula of Cov(X, Y):

- $r = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{N\sigma_x\sigma_y}$  [Note: $Cov(X, Y) = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{N}$]
- $r = \frac{\sum xy}{N\sigma_x\sigma_y}$  [Note: $Cov(X, Y) = \frac{\sum xy}{N}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$]

Expanding the formula of standard deviation:

- $r = \frac{\sum xy}{N\sqrt{\frac{\sum x^2}{N} \times \frac{\sum y^2}{N}}}$  [Note: $\sigma_x = \sqrt{\frac{\sum x^2}{N}}$, $\sigma_y = \sqrt{\frac{\sum y^2}{N}}$]
- $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$

**Properties of Coefficient of Correlation (r)**

1. **Independent of choice of origin**: The value of r is not affected if each of the individual values of X or Y is increased or decreased by some constant.
2. **Independent of choice of scale**: The value of r is not affected if each of the individual values of X or Y is multiplied or divided by some positive constant.
3. **Independent of units of measurement**: r is a pure number, independent of the units of measurement. If, for example, X represents height in inches and Y weight in lbs, the correlation coefficient between X and Y is neither in inches nor in lbs, but is simply a number.
4. **r lies between -1 and +1.**
5. **r is the geometric mean of the two regression coefficients**: $r = \sqrt{b_{xy} \, b_{yx}}$

## Interpretation

The value of r lies between -1 and +1.

| **Value of r** | **Interpretation** |
|---|---|
| r = +1 | Perfect positive correlation between the variables. |
| r = -1 | Perfect negative correlation between the variables. |
| r = 0 | No linear relationship between the variables. |
| +0.75 ≤ r < +1 | High positive correlation. |
| -1 < r ≤ -0.75 | High negative correlation. |
| +0.50 ≤ r < +0.75 | Moderate positive correlation. |
| -0.75 < r ≤ -0.50 | Moderate negative correlation. |
| 0 < r < +0.50 | Low positive correlation. |
| -0.50 < r < 0 | Low negative correlation. |

## Assumptions of Karl Pearson's Coefficient of Correlation

1. **Linear relationship between variables**: There is a linear relationship between the variables; if paired items of both variables are plotted on a scatter diagram, the plotted points will form a straight line.
2. **Cause-and-effect relationship**: There is a cause-and-effect relationship between the forces affecting the distribution of the items in the two series. Correlation is meaningless if no such relationship exists; for example, there is no relationship between the number of crimes and the number of political leaders.
3. **Normality**: The two variables are affected by a large number of independent causes, so that they form a normal distribution. Variables such as demand, supply, price, height, and weight are affected by such forces, so a normal distribution is formed.

## Merits of Karl Pearson's Coefficient of Correlation

1. The correlation coefficient gives the direction as well as the degree of relationship between the variables.
2. The correlation coefficient, along with other information, helps in estimating the value of the dependent variable from the known value of the independent variable.

## Limitations of Karl Pearson's Coefficient of Correlation

1. **Assumption of linear relationship**: The assumption of a linear relationship between the variables may not always hold true.
2. **Time-consuming**: Its computation is time-consuming compared with other methods.
3. **Affected by extreme values**: It is affected by the values of extreme items.
4. **Requires careful interpretation**: It must be interpreted after taking other factors into consideration. The investigator should reach a conclusion based on logical reasoning and intelligent investigation of significantly related matters.

## Practical Steps Involved in the Calculation of Karl Pearson's Coefficient of Correlation When Deviations are Taken From the Actual Mean

1. Calculate the deviations from the actual mean of the X series and denote these deviations by x.
2. Square these deviations and obtain the total, i.e. Σx².
3. Calculate the deviations from the actual mean of the Y series and denote these deviations by y.
4. Square these deviations and obtain the total, i.e. Σy².
5. Multiply each deviation of the X series by the corresponding deviation of the Y series and obtain the total, i.e. Σxy.
6. Calculate the coefficient of correlation as follows:
   - $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$
   - or, equivalently, $r = \frac{\sum xy / N}{\sqrt{\frac{\sum x^2}{N} \times \frac{\sum y^2}{N}}}$  [Note: this form, with $\sigma_x$ and $\sigma_y$ written out, is the one used in the book]
7. Interpret the coefficient of correlation.

## Practical Steps Involved in the Calculation of the Correlation Coefficient When Deviations are Taken From the Assumed Mean

1. Take deviations of the X series from an assumed mean $A_x$, denote them by $d_x$, and obtain the total Σd_x.
2. Square these deviations and obtain the total, i.e. Σd_x².
3. Take deviations of the Y series from an assumed mean $A_y$, denote them by $d_y$, and obtain the total Σd_y.
4. Square these deviations and obtain the total, i.e. Σd_y².
5. Multiply each $d_x$ by the corresponding $d_y$ and obtain the total, i.e. Σd_x d_y.
6. Calculate the coefficient of correlation as follows:
   - $r = \frac{\sum d_x d_y - \frac{\sum d_x \cdot \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} \cdot \sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}}$  [Note: this formula has been used in the book]
   - or $r = \frac{N\sum d_x d_y - \sum d_x \cdot \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2} \cdot \sqrt{N\sum d_y^2 - (\sum d_y)^2}}$
7. Interpret the coefficient of correlation.

The same formulas apply when the values of X and Y are used directly in place of $d_x$ and $d_y$, since r is independent of the choice of origin.

## Practical Steps Involved in the Calculation of the Correlation Coefficient for a Bivariate Frequency Distribution [Grouped Data]

1. Prepare a frequency distribution table (if not given).
2. List the class intervals for the Y series in the column headings and those for the X series in the row headings. [Note: their order can also be reversed.]
3. Calculate the mid-point of each class interval of the X series and the Y series.
4. Calculate the step deviations of variable X and denote them by $d_x$.
5. Multiply the frequencies of variable X by the deviations of X and obtain the total Σfd_x.
6. Square the deviations of variable X, multiply them by the respective frequencies, and obtain the total Σfd_x².
7. Calculate the step deviations of variable Y and denote them by $d_y$.
8. Multiply the frequencies of variable Y by the deviations of Y and obtain the total Σfd_y.
9. Square the deviations of variable Y, multiply them by the respective frequencies, and obtain the total Σfd_y².
10. Multiply $d_x$, $d_y$ and the frequency of each cell, and write the figure obtained in the upper right-hand corner of the cell.
11. Add together all the corner values from Step 10 and obtain the total Σfd_x d_y.
12. Calculate the coefficient of correlation as follows:
    - $r = \frac{\sum f d_x d_y - \frac{(\sum f d_x)(\sum f d_y)}{N}}{\sqrt{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}} \cdot \sqrt{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}}}$
13. Interpret the coefficient of correlation.

## Practical Steps Involved in the Calculation of Rank Correlation When Actual Ranks are Given and No Equal Ranks Have Been Assigned

1. Calculate the differences between the two ranks, i.e. (R₁ - R₂), and denote these differences by D.
2. Square these differences and obtain the total, i.e. ΣD².
3. Calculate the rank correlation as follows:
   - $R = 1 - \frac{6\sum D^2}{N^3 - N}$
4. Interpret the rank correlation.
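The rank-correlation steps above can be sketched as a short Python function; the two rank series below (say, rankings by two judges) are hypothetical.

```python
# Spearman's rank correlation for untied ranks:
# R = 1 - 6 * sum(D^2) / (N^3 - N), where D = R1 - R2.

def rank_correlation(r1, r2):
    n = len(r1)
    # Step 1-2: differences between paired ranks, squared and totalled.
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    # Step 3: apply the formula.
    return 1 - (6 * d_squared) / (n ** 3 - n)

# Hypothetical ranks given by two judges to five candidates (no ties).
judge1 = [1, 2, 3, 4, 5]
judge2 = [2, 1, 4, 3, 5]

print(rank_correlation(judge1, judge2))  # about 0.8 -> high positive correlation
```

Here ΣD² = 4 and N = 5, so R = 1 - 24/120 = 0.8, which would be interpreted like Karl Pearson's r.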
## Practical Steps Involved in the Calculation of Rank Correlation When Equal Ranks Have Been Assigned to Some Entries

1. Calculate the differences between the two ranks, i.e. (R₁ - R₂), and denote these differences by D.
2. Square these differences and obtain the total, i.e. ΣD².
3. Calculate the rank correlation as follows:
   - $R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N}$
   - where m = number of items whose ranks are common, with one correction term for each group of tied items
4. Interpret the rank correlation.

## Concurrent Deviation Method

**Meaning of Concurrent Deviation Method**

The concurrent deviation method is based on the direction of change in the two paired variables. The correlation coefficient between two series of directions of change is called the coefficient of concurrent deviation. It is given by the formula:

- $r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$
- Where:
  - $r_c$ = coefficient of concurrent deviation
  - c = number of positive signs obtained after multiplying the direction of change of the X series by that of the Y series
  - n = number of pairs of observations compared

**Significance of the ± sign in the above formula**

If $\frac{2c - n}{n}$ is negative, it is multiplied by a minus sign before taking the square root (and the minus sign is also applied outside the root), since the square root of a negative quantity cannot be taken.

**Merits of Concurrent Deviation Method**

1. **Simple to understand and easy to apply**: It is the simplest to understand and the easiest to apply of all the methods.
2. **Suitable for large N**: When the number of items is very large, this method can be used to form a quick idea of the degree of relationship before applying more complicated methods.

**Limitations of Concurrent Deviation Method**

1. **It does not differentiate between small and big changes**: For example, if X increases from 20 to 21 the sign is plus, and if Y increases from 20 to 2,000 the sign is also plus; both get equal weight because they vary in the same direction.
2. **Approximate**: The results obtained by this method are only an approximate indicator of the presence or absence of correlation.

**Practical Steps Involved in the Concurrent Deviation Method**

1. Find the direction of change of the X variable by comparing each value with the previous one, and put a '+' sign (if increasing) or a '-' sign (if decreasing) in the column denoted by $D_x$.
2. Find the direction of change of the Y variable in the same way and record the signs in the column denoted by $D_y$.
3. Multiply the signs of column $D_x$ by the signs of column $D_y$, count the number of positive products, and denote this number by c.
4. Calculate the coefficient of concurrent deviation as follows:
   - $r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$

## Lag and Lead in Correlation

The time gap before a cause-and-effect relationship is established is called lag. Where the variables do not show simultaneous changes, this time gap must be considered while computing correlation, otherwise fallacious conclusions may be drawn. For example, advertisement expenditure may have an impact on sales after two months; hence the advertisement expenditure of April should be paired with the sales for June.

## Practical Steps Involved in the Calculation of the Correlation Coefficient with a Time Lag

1. Adjust the pairs of items according to the time lag. For example, if advertisement expenditure has its impact on sales after two months, the pairs of items will be adjusted as follows:

   | Month | Advertisement Expenditure (Rs.) | Sales (Rs.) |
   |---|---|---|
   | April | 1,00,000 | 50,000 |
   | May | 1,20,000 | 50,000 |
   | June | 1,40,000 | 60,000 |
   | July | 1,60,000 | 80,000 |
   | Aug. | 1,80,000 | 1,00,000 |
   | Sept. | 2,00,000 | 1,20,000 |

2. Calculate the coefficient of correlation for the adjusted pairs of values by any of the methods discussed earlier.

## Correlation in Time Series

To study correlation in a time series (i.e. a set of observations in relation to time), correlation should be found separately for the long-term changes and the short-term changes of both series, since there may be positive correlation between the long-term changes and negative correlation between the short-term changes, or vice versa.

## Practical Steps Involved in the Calculation of Correlation of Long-Term Changes

1. Calculate trend values for both series by the moving average method or the method of least squares.
2. Calculate the coefficient of correlation of the trend values of the two series by any of the methods discussed earlier (e.g. Karl Pearson's coefficient of correlation).

## Practical Steps Involved in the Calculation of Correlation for Short-Term Changes

1. Calculate the trend values by the moving average method.
2. Calculate the short-term fluctuations by deducting the trend values (from Step 1) from the actual values; denote these fluctuations by x for the X series and y for the Y series.
3. Square the short-term fluctuations of the X series and obtain the total, i.e. Σx².
4. Square the short-term fluctuations of the Y series and obtain the total, i.e. Σy².
5. Multiply x by y for each pair of values and obtain the total, i.e. Σxy.
6. Calculate the correlation coefficient for the short-term fluctuations:
   - $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$
   - where x = deviation of the X series from its moving average (not from the arithmetic mean), and y = deviation of the Y series from its moving average.

**Tutorial Note**: The only difference from the correlation explained earlier is that here the deviations are taken from the trend values instead of from the arithmetic mean.
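A minimal Python sketch of the short-term-change steps above, assuming a 3-period centred moving average as the trend and made-up series for X and Y:

```python
# Correlation of short-term fluctuations: deviations are taken from
# moving-average trend values, not from the arithmetic mean.
# The series below are invented; a 3-period centred moving average is assumed.

from math import sqrt

def moving_average(series, k=3):
    # k-period moving averages (trend values), one per full window.
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

def short_term_r(xs, ys, k=3):
    trend_x = moving_average(xs, k)
    trend_y = moving_average(ys, k)
    offset = (k - 1) // 2  # centre each average on its middle period
    # Step 2: deviations of actual values from their trend values.
    dx = [xs[i + offset] - t for i, t in enumerate(trend_x)]
    dy = [ys[i + offset] - t for i, t in enumerate(trend_y)]
    # Steps 3-6: r = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

xs = [12, 15, 11, 18, 14, 20, 16]
ys = [30, 36, 28, 42, 33, 45, 38]

print(round(short_term_r(xs, ys), 3))
```

The two invented series zig-zag together around their trends, so the short-term fluctuations come out strongly positively correlated even though the computation never touches the arithmetic means.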
## Standard Error and Probable Error of the Coefficient of Correlation

- **Standard Error**: The standard error of the coefficient of correlation is calculated as follows:
  - Standard Error = $\frac{1 - r^2}{\sqrt{N}}$
  - where r = coefficient of correlation and N = number of pairs of observations.
- **Probable Error**: The probable error of the coefficient of correlation is an amount which, if added to and subtracted from the value of r, gives the upper and lower limits within which the coefficient of correlation in the population can be expected to lie. It is 0.6745 times the standard error of r.
- **Computation**: The probable error of r is calculated as follows:
  - Probable Error (P.E.) = 0.6745 × Standard Error = $0.6745 \times \frac{1 - r^2}{\sqrt{N}}$
  - where r = coefficient of correlation and N = number of pairs of observations.

**Uses**

(i) P.E. is used for determining the reliability of the value of r, in so far as it depends on the conditions of random sampling. Interpretation is done as follows:

| Case | Interpretation |
|---|---|
| \|r\| < 6 P.E. | The value of r is not significant; there is no evidence of correlation. |
| \|r\| > 6 P.E. | The value of r is significant; the existence of correlation is practically certain. |

(ii) P.E. is used for estimating the correlation in the population:

- Correlation in the population (ρ) = r ± P.E.

**Conditions for the Use of Probable Error**

According to Riggleman and Frisbee, the measure of probable error can be properly used only when the following three conditions exist:

- The data approximate a normal frequency curve (bell-shaped curve).
- The statistical measure for which the P.E. is computed has been calculated from a sample.
- The sample has been selected in an unbiased manner and the individual items are independent.
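The probable-error test described above can be sketched in a few lines; the values of r and N below are assumed sample figures, not data from the book.

```python
# Probable error of r and the 6-P.E. significance rule.
# r = 0.8 and N = 25 are assumed figures for illustration.

from math import sqrt

def probable_error(r, n):
    # P.E. = 0.6745 * (1 - r^2) / sqrt(N)
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r, n = 0.8, 25
pe = probable_error(r, n)

print(round(pe, 4))     # 0.6745 * 0.36 / 5 = 0.0486
print(abs(r) > 6 * pe)  # True -> existence of correlation practically certain
```

With these figures 6 P.E. ≈ 0.29, well below |r| = 0.8, so under the rule above the correlation would be treated as practically certain, and the population value would be estimated as 0.8 ± 0.0486.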
## Coefficient of Determination

- **Meaning**: The coefficient of determination is defined as the ratio of the explained variance to the total variance:
  - Coefficient of Determination = $\frac{\text{Explained Variance}}{\text{Total Variance}}$
- **Calculation**: The coefficient of determination is obtained by squaring the coefficient of correlation: Coefficient of Determination = r².
- **Merit**: The coefficient of determination indicates the degree of dependence of the dependent variable; it expresses the percentage of the total variance that is explained by the given independent variable.
- **Demerit**: The coefficient of determination shows only the degree of dependence and not the direction of dependence, since r² is always positive; it does not indicate whether the relationship between the two variables is positive or negative.
- **Maximum value of r²**: The maximum value of r² is unity, because it is possible to explain all of the variation in the dependent variable but not more than all of it.

**Relationship between r and r²**

1. The value of r² decreases much more rapidly than the value of r, as shown below:

   | r | r² |
   |---|---|
   | 0.90 | 0.81 |
   | 0.80 | 0.64 |
   | 0.70 | 0.49 |
   | 0.60 | 0.36 |
   | 0.50 | 0.25 |
   | 0.40 | 0.16 |
   | 0.30 | 0.09 |
   | 0.20 | 0.04 |

2. For 0 < |r| < 1, |r| is always larger than r².
3. The value of r equals the value of r² when r = 1 or 0.
4. The value of r can be positive or negative, but the value of r² is always non-negative.

## Coefficient of Non-Determination

- **Meaning**: The coefficient of non-determination is defined as the ratio of the unexplained variance to the total variance:
  - Coefficient of Non-Determination = $\frac{\text{Unexplained Variance}}{\text{Total Variance}}$
- **Calculation**: The coefficient of non-determination, denoted K², is obtained by subtracting the coefficient of determination from 1:
  - K² = 1 - r²
- **Merit**: K² indicates the lack of dependence of the dependent variable on the given independent variable; it expresses the percentage of the total variance that is not explained by the given independent variable.

## Spearman's Rank Correlation

**Meaning of Spearman's Rank Correlation**

Spearman's rank correlation uses ranks rather than actual observations and makes no assumptions about the population from which the observations are drawn. The correlation coefficient between two series of ranks is called the rank correlation coefficient. It is given by the formula:

- $R = 1 - \frac{6\sum D^2}{N^3 - N}$
- Where:
  - R = rank correlation coefficient
  - D = difference between the ranks of paired items in the two series
  - N = number of pairs of ranks

**In case of tied ranks**

If more than one item in a series has the same value, the average rank is usually allotted to each of these items, and the factor $\frac{m^3 - m}{12}$ is added to ΣD² for each such group of tied items. The modified formula for the rank correlation coefficient becomes:

- $R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N}$
- Where m = number of items whose ranks are common

**Features of Spearman's Correlation Coefficient**

1. It is based on ranks rather than on actual observations.
2. It is distribution-free (non-parametric), because no strict assumptions are made about the form of the population from which the observations are drawn.
3. The sum of the differences of ranks between the two variables is zero (i.e. ΣD = 0).
4. It can be interpreted like Karl Pearson's correlation coefficient.
5. It lies between -1 and +1: -1 ≤ R ≤ +1.

**Merits of Spearman's Rank Correlation Coefficient**

1. **Simple to understand and easy to apply**: The rank method is simpler to understand and easier to apply than Karl Pearson's method.
2. **Suitable for qualitative data**: The rank method can conveniently be used to measure the degree of association between variables which cannot be quantified but can be ranked in some order. For example, two judges may be able to rank 10 girls by preference in terms of beauty, whereas it would be difficult to give them numerical grades for beauty.
3. **Suitable for abnormal data**: The rank method can conveniently be used when the data are abnormal, because the rank correlation coefficient is not based on the assumption of normality, unlike Karl Pearson's coefficient.
4. **Only method for ranks**: The rank method is the only method available when only ranks are given and not the actual data.
5. **Applicable even to actual data**: The rank method can be applied even where actual data are given.

**Limitations of Spearman's Rank Correlation Coefficient**

1. **Unsuitable for grouped data**: The rank method cannot be applied to a grouped frequency distribution.
2. **Tedious calculations**: Calculations become quite tedious when N exceeds 30.
3. **Approximation**: The result is only an approximation, since the actual data are not taken into account.

**When to use the Rank Method**

- The number of pairs of observations is fairly small (say, not more than 30).
- The original data are in the form of ranks.

**When is the Rank Correlation Coefficient preferred to Karl Pearson's Coefficient of Correlation?**

The rank correlation coefficient is preferred when:

- the distribution is not normal, or
- the behaviour of the distribution is not known, or
- only qualitative data (i.e. ranks) are given and not the actual data.
