# Curve Fitting and Correlation Ratio
Dr. B.R. Ambedkar Open University
## Summary
This document explains curve fitting methods, focusing on the principle of least squares for finding the line or curve of best fit. It also introduces the correlation ratio, a measure of association used when the relationship between two variables is curvilinear. Formulas and derivations for these statistical concepts are provided.
# Correlation Ratio

If there is a curvilinear relationship between two variables X and Y, we measure the correlation between them using the correlation ratio, denoted by η (eta). It measures the concentration of the points about the fitted curve. If η = r, the relationship is linear; otherwise η ≠ r (in general, $\eta_{yx}^2 \ge r^2$).

## Formula

$$\eta_{yx}^2 = \frac{\sum_{i} \frac{T_i^2}{n_i} - \frac{T^2}{N}}{\sum_{i} \sum_{j} f_{ij} y_{ij}^2 - \frac{T^2}{N}}$$

where $T_i = \sum_j f_{ij} y_{ij}$ is the total of the y-values in the i-th array, $n_i = \sum_j f_{ij}$ is its frequency, $T = \sum_i T_i$ and $N = \sum_i n_i$.

## Properties of Correlation Ratio

1. The correlation ratio is independent of change of origin and scale: with $u = \frac{x - a}{h}$ and $v = \frac{y - b}{k}$, we have $\eta_{vu}^2 = \eta_{yx}^2$.
2. The limits of the correlation ratio are 0 and 1:
$$0 \le \eta_{yx} \le 1$$
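To make the computation concrete, here is a minimal Python sketch (an added illustration, not part of the original notes); the frequency table `f` and the class mid-values `y` are made-up example data.

```python
import numpy as np

# Illustrative two-way frequency table: rows are x-arrays, columns are y-classes.
# (The numbers below are made-up example data, not from the notes.)
f = np.array([[2, 1, 0],
              [1, 3, 2],
              [0, 2, 4]], dtype=float)   # f[i, j] = frequency in cell (i, j)
y = np.array([10.0, 20.0, 30.0])         # mid-values of the y-classes

T_i = f @ y          # total of y in the i-th array: T_i = sum_j f_ij * y_j
n_i = f.sum(axis=1)  # frequency of the i-th array: n_i = sum_j f_ij
T = T_i.sum()        # grand total T
N = n_i.sum()        # total frequency N

numerator = (T_i**2 / n_i).sum() - T**2 / N   # between-array sum of squares
denominator = (f * y**2).sum() - T**2 / N     # total sum of squares
eta_yx = np.sqrt(numerator / denominator)
print(f"eta_yx = {eta_yx:.4f}")
```

Because the between-array sum of squares can never exceed the total sum of squares, the computed value always satisfies 0 ≤ η_yx ≤ 1, consistent with property 2 above.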
# Curve Fitting

In a bivariate distribution (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) of n pairs of observations, x is the independent variable and y is the dependent variable. Finding a suitable functional relationship y = f(x) between the variables of the given data is called curve fitting.

- The relationship may be polynomial, exponential, logarithmic, etc.
- It is useful in correlation and regression analysis.
- Another important use is to estimate the value of one variable corresponding to a specified value of the other variable.

# Principle of Least Squares

Let (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) be a set of n pairs of observations, and consider the relationship between x and y as y = f(x). The values yᵢ are the observed values, while the values f(xᵢ) obtained from the functional relationship are the estimated values of y. The difference between an observed value and its estimated value, yᵢ − f(xᵢ), is called a residual of y. Denote by E the sum of squares of the residuals:

$$E = \sum_{i=1}^n [y_i - f(x_i)]^2$$

The principle of least squares consists in minimizing the sum of squares of the deviations of the observed values of y from their estimated values, that is, the residuals.

# Fitting of a Straight Line

This principle was discovered by Legendre. E should be minimised to obtain the best fit to the given data. Let y = a + bx be the equation of the straight line for the given n pairs of observations (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ). The residual sum of squares is

$$E = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n [y_i - (a + bx_i)]^2$$

We have to minimize E to find the best possible values of a and b. By the principle of maxima and minima, we equate the partial derivatives of E with respect to a and b to zero:

$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0$$

From the first condition:

$$\frac{\partial E}{\partial a} = 2 \sum_{i=1}^n [y_i - (a + bx_i)](-1) = 0$$
$$0 = \sum_{i=1}^n (y_i - a - bx_i) = \sum_{i=1}^n y_i - na - b \sum_{i=1}^n x_i$$
$$\sum_{i=1}^n y_i = na + b \sum_{i=1}^n x_i \quad — ①$$

From the second condition:

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [y_i - (a + bx_i)](-x_i) = 0$$
$$0 = \sum_{i=1}^n (x_i y_i - a x_i - b x_i^2) = \sum_{i=1}^n x_i y_i - a \sum_{i=1}^n x_i - b \sum_{i=1}^n x_i^2$$
$$\sum_{i=1}^n x_i y_i = a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 \quad — ②$$

Equations ① and ② are the normal equations; solving them gives the values of a and b for the line of best fit.

# Fitting of a 2nd Degree Parabola

Let the 2nd degree parabola be y = a + bx + cx². The problem is to find the best fit of the 2nd degree parabola using the principle of least squares. The residual sum of squares is

$$E = \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)]^2$$

We have to minimize E to find the best possible values of a, b and c. By the principle of maxima and minima, equating the partial derivatives of E with respect to a, b and c to zero, we get:

$$\frac{\partial E}{\partial a} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-1) = 0$$
$$0 = \sum_{i=1}^n y_i - na - b \sum_{i=1}^n x_i - c \sum_{i=1}^n x_i^2$$
$$\sum_{i=1}^n y_i = na + b \sum_{i=1}^n x_i + c \sum_{i=1}^n x_i^2 \quad — ①$$

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-x_i) = 0$$
$$0 = \sum_{i=1}^n x_i y_i - a \sum_{i=1}^n x_i - b \sum_{i=1}^n x_i^2 - c \sum_{i=1}^n x_i^3$$
$$\sum_{i=1}^n x_i y_i = a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 + c \sum_{i=1}^n x_i^3 \quad — ②$$

$$\frac{\partial E}{\partial c} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-x_i^2) = 0$$
$$0 = \sum_{i=1}^n x_i^2 y_i - a \sum_{i=1}^n x_i^2 - b \sum_{i=1}^n x_i^3 - c \sum_{i=1}^n x_i^4$$
$$\sum_{i=1}^n x_i^2 y_i = a \sum_{i=1}^n x_i^2 + b \sum_{i=1}^n x_i^3 + c \sum_{i=1}^n x_i^4 \quad — ③$$

Equations ①, ② and ③ are called the normal equations and are solved to estimate the values of a, b and c. With these values of a, b and c, the equation y = a + bx + cx² is the best fit of a 2nd degree parabola to the given data.
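To show how these normal equations are solved in practice, here is a short Python sketch (an added illustration, not from the original notes); the helper names `fit_line` and `fit_parabola` and the example data are assumptions made for this sketch.

```python
import numpy as np

def fit_line(x, y):
    """Solve the normal equations (1) and (2) for y = a + b*x."""
    n = len(x)
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x**2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(A, rhs)
    return a, b

def fit_parabola(x, y):
    """Solve the normal equations (1)-(3) for y = a + b*x + c*x^2."""
    n = len(x)
    s1, s2, s3, s4 = x.sum(), (x**2).sum(), (x**3).sum(), (x**4).sum()
    A = np.array([[n,  s1, s2],
                  [s1, s2, s3],
                  [s2, s3, s4]])
    rhs = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    a, b, c = np.linalg.solve(A, rhs)
    return a, b, c

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
print(fit_line(x, y))       # a, b for the best-fit line
print(fit_parabola(x, y))   # a, b, c for the best-fit parabola
```

Note that both systems are the same pattern: the coefficient matrix collects the power sums Σxᵢᵏ and the right-hand side collects Σxᵢᵏyᵢ, so the approach extends directly to higher-degree polynomials.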
# Fitting of a Power Curve

Let the power curve be

$$y = ax^b \quad — ①$$

The problem is to find the best fit of the power curve using the principle of least squares. Taking logarithms on both sides of the equation:

$$\log y = \log(ax^b) = \log a + b \log x$$

Putting $Y = \log y$, $A = \log a$ and $X = \log x$, this becomes

$$Y = A + bX \quad — ②$$

The residual sum of squares is

$$E = \sum_{i=1}^n [Y_i - (A + bX_i)]^2$$

Equating the partial derivatives of E with respect to A and b to zero:

$$\frac{\partial E}{\partial A} = 2 \sum_{i=1}^n [Y_i - A - bX_i](-1) = 0$$
$$0 = \sum_{i=1}^n Y_i - nA - b \sum_{i=1}^n X_i$$
$$\sum_{i=1}^n Y_i = nA + b \sum_{i=1}^n X_i \quad — ③$$

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [Y_i - A - bX_i](-X_i) = 0$$
$$0 = \sum_{i=1}^n X_i Y_i - A \sum_{i=1}^n X_i - b \sum_{i=1}^n X_i^2$$
$$\sum_{i=1}^n X_i Y_i = A \sum_{i=1}^n X_i + b \sum_{i=1}^n X_i^2 \quad — ④$$

By solving equations ③ and ④ we get the values of A and b, and then a = antilog(A). With these values of a and b, equation ① is the best fit of the power curve for the given data.

# Fitting of Exponential Curves

There are two types of exponential curves:

1. _y = abˣ_
2. _y = ae^(bx)_

## Type-I: y = abˣ

Let the curve be $y = ab^x$ — ①. Taking logarithms on both sides:

$$\log y = \log a + x \log b$$

Putting $Y = \log y$, $A = \log a$ and $B = \log b$, this becomes

$$Y = A + Bx$$

By the principle of least squares,

$$E = \sum_{i=1}^n [Y_i - (A + Bx_i)]^2$$

Partially differentiating with respect to A and B and equating to zero, we get:

$$\frac{\partial E}{\partial A} = 2 \sum_{i=1}^n [Y_i - (A + Bx_i)](-1) = 0$$
$$\sum_{i=1}^n Y_i = nA + B \sum_{i=1}^n x_i \quad — ②$$

$$\frac{\partial E}{\partial B} = 2 \sum_{i=1}^n [Y_i - (A + Bx_i)](-x_i) = 0$$
$$\sum_{i=1}^n x_i Y_i = A \sum_{i=1}^n x_i + B \sum_{i=1}^n x_i^2 \quad — ③$$

By solving equations ② and ③ we get the values of A and B. Then we obtain

$$a = \text{antilog}(A), \qquad b = \text{antilog}(B)$$

With these values of a and b, equation ① is the best fit of the exponential curve for the given data.

## Type-II: y = ae^(bx)

Let the curve be $y = ae^{bx}$ — ①. Taking logarithms on both sides:

$$\log y = \log a + bx \log e$$

Putting $Y = \log y$, $A = \log a$ and $B = b \log e$, this becomes

$$Y = A + Bx$$

By the principle of least squares,

$$E = \sum_{i=1}^n [Y_i - (A + Bx_i)]^2$$

Partially differentiating with respect to A and B and equating to zero gives the same normal equations as in Type-I:

$$\sum_{i=1}^n Y_i = nA + B \sum_{i=1}^n x_i \quad — ②$$
$$\sum_{i=1}^n x_i Y_i = A \sum_{i=1}^n x_i + B \sum_{i=1}^n x_i^2 \quad — ③$$

By solving equations ② and ③ we get the values of A and B. Then we obtain

$$a = \text{antilog}(A), \qquad b = \frac{B}{\log e}$$

(since $B = b \log e$; with common logarithms, $\log e \approx 0.4343$). With these values of a and b, equation ① is the best fit of the exponential curve for the given data.
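To close, here is a short Python sketch (an added illustration, not part of the original notes) of all three log-transform fits described above. It uses `numpy.polyfit`, which computes the same least-squares straight line as the normal equations ② and ③; the example data are made up.

```python
import numpy as np

def fit_power(x, y):
    """Fit y = a * x**b by least squares on log y = log a + b log x."""
    X, Y = np.log10(x), np.log10(y)
    B, A = np.polyfit(X, Y, 1)        # Y = A + B*X (polyfit returns slope first)
    return 10**A, B                   # a = antilog(A), b = B

def fit_exponential_type1(x, y):
    """Fit y = a * b**x via log y = log a + x log b."""
    Y = np.log10(y)
    B, A = np.polyfit(x, Y, 1)
    return 10**A, 10**B               # a = antilog(A), b = antilog(B)

def fit_exponential_type2(x, y):
    """Fit y = a * exp(b*x) via log y = log a + (b log e) x."""
    Y = np.log10(y)
    B, A = np.polyfit(x, Y, 1)
    return 10**A, B / np.log10(np.e)  # a = antilog(A), b = B / log e

# Made-up example data, roughly y = e**x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.7, 7.4, 20.1, 54.6])
print(fit_exponential_type2(x, y))    # expect a close to 1 and b close to 1
```

A point worth noting about this design: the transformed fit minimizes the residuals of log y, not of y itself, so the result can differ slightly from a direct nonlinear least-squares fit of the original curve.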