Curve Fitting and Correlation Ratio PDF
Summary
This document explains curve fitting methods, focusing on the principle of least squares for finding the best fit line or curve. It also introduces the correlation ratio, a measure used in situations with curvilinear relationships between variables. The document provides formulas and equations relevant to these statistical concepts.
Full Transcript
# Correlation Ratio

When there is a curvilinear relationship between two variables _X_ and _Y_, the correlation between them is measured by the correlation ratio, denoted by η (eta). It measures the concentration of the points about the curve of regression. If η = r, the regression is linear; otherwise η ≠ r.

## Formula

For y-values grouped into arrays by the value of x, let $f_{ij}$ be the frequency of $y_{ij}$ in the $i$-th array, $n_i = \sum_j f_{ij}$, $T_i = \sum_j f_{ij}\, y_{ij}$, $T = \sum_i T_i$ and $N = \sum_i n_i$. Then

$$\eta_{yx}^2 = \frac{\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}}{\sum_i \sum_j f_{ij}\, y_{ij}^2 - \frac{T^2}{N}}$$

## Properties of the Correlation Ratio

1. The correlation ratio is independent of change of origin and scale: with $U = \frac{x - a}{h}$ and $V = \frac{y - b}{k}$, we have $\eta_{yx}^2 = \eta_{VU}^2$.
2. The limits of the correlation ratio are 0 and 1:
$$0 \le \eta_{yx} \le 1$$
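To make the formula concrete, here is a minimal Python sketch (not from the source; the function name `corr_ratio` and the sample arrays are illustrative, and unit frequencies $f_{ij} = 1$ are assumed, so each array is simply the list of y-values observed at one x-class):

```python
import numpy as np

def corr_ratio(arrays):
    """Correlation ratio eta_yx for y-values grouped into arrays,
    one array per class of x (unit frequencies f_ij = 1 assumed)."""
    T_i = np.array([a.sum() for a in arrays])             # array totals T_i
    n_i = np.array([len(a) for a in arrays])              # array frequencies n_i
    T, N = T_i.sum(), n_i.sum()                           # grand total, grand frequency
    numerator = (T_i**2 / n_i).sum() - T**2 / N           # sum(T_i^2 / n_i) - T^2/N
    denominator = sum((a**2).sum() for a in arrays) - T**2 / N
    return np.sqrt(numerator / denominator)

# illustrative data: y-values observed at three x-classes
ys = [np.array([2.0, 3.0, 4.0]), np.array([5.0, 6.0]), np.array([4.0, 5.0, 7.0])]
print(corr_ratio(ys))   # a value between 0 and 1
```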
# Curve Fitting

In a bivariate distribution $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of n pairs of observations, x is the independent variable and y is the dependent variable. A suitable functional relationship between the variables of the given data takes the form _y = f(x)_; the method of finding this relationship is called curve fitting.

- The relationship may be polynomial, exponential, logarithmic, etc.
- It is useful in correlation and regression analysis.

Another important use is to estimate the value of one variable corresponding to a specified value of the other variable.

# Principle of Least Squares

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of n pairs of observations, and consider the relationship between x and y as _y = f(x)_. In each pair, the value of y is the observed value; the value $f(x_i)$ obtained from the functional relationship is called the estimated value of y. The error (difference) between the observed value of y and its estimated value, $y_i - f(x_i)$, is called the residual of y. Denoting by E the sum of squares of the residuals,

$$E = \sum_{i=1}^n [y_i - f(x_i)]^2$$

The principle of least squares consists in minimizing the sum of squares of the deviations of the observed values of y from their estimated values, that is, the residuals.
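As a small numerical illustration of this criterion (the data and the two candidate lines below are invented for the example), E can be computed for any candidate f and compared across fits:

```python
import numpy as np

# illustrative data: y roughly follows 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

def residual_sum_of_squares(f, x, y):
    """E = sum of squared residuals y_i - f(x_i)."""
    return np.sum((y - f(x)) ** 2)

# two candidate straight lines y = a + b x
E_good = residual_sum_of_squares(lambda t: 1.0 + 2.0 * t, x, y)
E_bad  = residual_sum_of_squares(lambda t: 0.0 + 3.0 * t, x, y)
print(E_good, E_bad)  # the least-squares principle picks the f with the smaller E
```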
# Fitting of a straight line

This principle was discovered by Legendre. E should be minimized to obtain the best fit of the given data. Let $y = a + bx$ be the equation of the straight line for the given data of n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We find the best-fitting straight line by the principle of least squares. The residual sum of squares is

$$E = \sum_{i=1}^n [y_i - (a + bx_i)]^2$$

We have to minimize E for the best possible values of a and b. By the principle of maxima and minima, equating the partial derivatives of E with respect to a and b to zero:

$$\frac{\partial E}{\partial a} = 2 \sum_{i=1}^n [y_i - (a + bx_i)](-1) = 0$$
$$0 = \sum_{i=1}^n y_i - na - b \sum_{i=1}^n x_i$$
$$\sum_{i=1}^n y_i = na + b \sum_{i=1}^n x_i \quad — ①$$

also

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [y_i - (a + bx_i)](-x_i) = 0$$
$$0 = \sum_{i=1}^n x_i y_i - a \sum_{i=1}^n x_i - b \sum_{i=1}^n x_i^2$$
$$\sum_{i=1}^n x_i y_i = a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 \quad — ②$$

Equations ① and ② are the normal equations; solving them gives the values of a and b, and with these values $y = a + bx$ is the best fit of the straight line for the given data.

# Fitting of a 2nd degree Parabola

Let the 2nd degree parabola be $y = a + bx + cx^2$. The problem is to find the best fit of the 2nd degree parabola using the principle of least squares. The residual sum of squares is

$$E = \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)]^2$$

We have to minimize E for the best possible values of a, b and c. By the principle of maxima and minima, equating the partial derivatives of E with respect to a, b and c to zero:

$$\frac{\partial E}{\partial a} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-1) = 0$$
$$0 = \sum_{i=1}^n y_i - na - b \sum_{i=1}^n x_i - c \sum_{i=1}^n x_i^2$$
$$\sum_{i=1}^n y_i = na + b \sum_{i=1}^n x_i + c \sum_{i=1}^n x_i^2 \quad — ①$$

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-x_i) = 0$$
$$0 = \sum_{i=1}^n x_i y_i - a \sum_{i=1}^n x_i - b \sum_{i=1}^n x_i^2 - c \sum_{i=1}^n x_i^3$$
$$\sum_{i=1}^n x_i y_i = a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 + c \sum_{i=1}^n x_i^3 \quad — ②$$

$$\frac{\partial E}{\partial c} = 2 \sum_{i=1}^n [y_i - (a + bx_i + cx_i^2)](-x_i^2) = 0$$
$$0 = \sum_{i=1}^n x_i^2 y_i - a \sum_{i=1}^n x_i^2 - b \sum_{i=1}^n x_i^3 - c \sum_{i=1}^n x_i^4$$
$$\sum_{i=1}^n x_i^2 y_i = a \sum_{i=1}^n x_i^2 + b \sum_{i=1}^n x_i^3 + c \sum_{i=1}^n x_i^4 \quad — ③$$

Equations ①, ② and ③ are called the normal equations; they are solved to estimate the values of a, b and c. With these values of a, b and c, the equation $y = a + bx + cx^2$ is the best fit of the 2nd degree parabola for the given data.
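To see the straight-line normal equations ① and ② in action, here is a minimal NumPy sketch (the arrays `x` and `y` are made-up sample data, not from the source):

```python
import numpy as np

# illustrative data: y roughly follows 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
n = len(x)

# Normal equations for y = a + b x:
#   sum(y)   = n a + b sum(x)          ... (1)
#   sum(x y) = a sum(x) + b sum(x^2)   ... (2)
M = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(M, rhs)
print(f"best fit: y = {a:.4f} + {b:.4f} x")
```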
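Likewise, the parabola's normal equations ①, ② and ③ form a 3×3 linear system. A sketch under the same assumptions, with illustrative data generated near $y = 1 + 2x + 0.5x^2$:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.6, 6.9, 11.4, 17.1])   # roughly 1 + 2x + 0.5 x^2
n = len(x)

# power sums appearing in the normal equations (1)-(3)
S1, S2, S3, S4 = x.sum(), (x**2).sum(), (x**3).sum(), (x**4).sum()

M = np.array([[n,  S1, S2],
              [S1, S2, S3],
              [S2, S3, S4]])
rhs = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
a, b, c = np.linalg.solve(M, rhs)
print(f"best fit: y = {a:.3f} + {b:.3f} x + {c:.3f} x^2")
```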
# Fitting of a Power Curve

Let the power curve be $y = ax^b$ — ①. The problem is to find the best fit of the power curve by using the principle of least squares. Taking logarithms on both sides of the equation:

$$\log y = \log(ax^b)$$
$$\log y = \log a + \log x^b$$
$$\log y = \log a + b \log x$$

Putting $Y_i = \log y_i$, $A = \log a$ and $X_i = \log x_i$, this becomes

$$Y_i = A + bX_i \quad — ②$$

The residual sum of squares is

$$E = \sum_{i=1}^n [Y_i - (A + bX_i)]^2$$

Equating the partial derivatives of E with respect to A and b to zero:

$$\frac{\partial E}{\partial A} = 2 \sum_{i=1}^n [Y_i - A - bX_i](-1) = 0$$
$$0 = \sum_{i=1}^n Y_i - nA - b \sum_{i=1}^n X_i$$
$$\sum_{i=1}^n Y_i = nA + b \sum_{i=1}^n X_i \quad — ③$$

$$\frac{\partial E}{\partial b} = 2 \sum_{i=1}^n [Y_i - A - bX_i](-X_i) = 0$$
$$0 = \sum_{i=1}^n X_i Y_i - A \sum_{i=1}^n X_i - b \sum_{i=1}^n X_i^2$$
$$\sum_{i=1}^n X_i Y_i = A \sum_{i=1}^n X_i + b \sum_{i=1}^n X_i^2 \quad — ④$$

By solving equations ③ and ④ we get the values of A and b, and then

$$a = \operatorname{antilog}(A)$$

With these values of a and b, equation ① is the best fit of the power curve for the given data.

# Fitting of Exponential Curves

There are two types of exponential curves:

1. $y = ab^x$
2. $y = ae^{bx}$

## Type I: y = ab^x

Let the curve be $y = ab^x$ — ①. Taking logarithms on both sides:

$$\log y = \log a + \log b^x$$
$$\log y = \log a + x \log b$$

Putting $Y = \log y$, $A = \log a$ and $B = \log b$, this becomes $Y = A + Bx$. By the principle of least squares,

$$E = \sum_{i=1}^n [Y_i - (A + Bx_i)]^2$$

Partially differentiating with respect to A and B and equating to zero:

$$\frac{\partial E}{\partial A} = 2 \sum_{i=1}^n [Y_i - (A + Bx_i)](-1) = 0$$
$$0 = \sum_{i=1}^n Y_i - nA - B \sum_{i=1}^n x_i$$
$$\sum_{i=1}^n Y_i = nA + B \sum_{i=1}^n x_i \quad — ②$$

$$\frac{\partial E}{\partial B} = 2 \sum_{i=1}^n [Y_i - (A + Bx_i)](-x_i) = 0$$
$$0 = \sum_{i=1}^n x_i Y_i - A \sum_{i=1}^n x_i - B \sum_{i=1}^n x_i^2$$
$$\sum_{i=1}^n x_i Y_i = A \sum_{i=1}^n x_i + B \sum_{i=1}^n x_i^2 \quad — ③$$

By solving equations ② and ③ we get the values of A and B. Then we obtain a and b:

$$a = \operatorname{antilog}(A)$$
$$b = \operatorname{antilog}(B)$$

With these values of a and b, equation ① is the best fit of the exponential curve for the given data.

## Type II: y = ae^(bx)

Let the curve be $y = ae^{bx}$ — ①. Taking logarithms on both sides:

$$\log y = \log a + \log e^{bx}$$
$$\log y = \log a + bx \log e$$

Putting $Y = \log y$, $A = \log a$ and $B = b \log e$, this becomes $Y = A + Bx$. By the principle of least squares,

$$E = \sum_{i=1}^n [Y_i - (A + Bx_i)]^2$$

Partially differentiating with respect to A and B and equating to zero gives the same normal equations as in Type I:

$$\sum_{i=1}^n Y_i = nA + B \sum_{i=1}^n x_i \quad — ②$$
$$\sum_{i=1}^n x_i Y_i = A \sum_{i=1}^n x_i + B \sum_{i=1}^n x_i^2 \quad — ③$$

By solving equations ② and ③ we get the values of A and B. Then we obtain

$$a = \operatorname{antilog}(A)$$
$$b = \frac{B}{\log e}$$

With these values of a and b, equation ① is the best fit of the exponential curve for the given data.
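As an illustration of the power-curve procedure above, here is a minimal sketch assuming base-10 logarithms as in the text; the data are invented around $y = 2x^{1.5}$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 5.5, 10.6, 15.8, 22.7])   # roughly 2 * x**1.5

# linearize: log y = log a + b log x
X, Y = np.log10(x), np.log10(y)
n = len(x)

# normal equations (3)-(4) for Y = A + b X
M = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
A, b = np.linalg.solve(M, np.array([Y.sum(), (X * Y).sum()]))
a = 10.0 ** A                                 # a = antilog(A)
print(f"best fit: y = {a:.3f} * x^{b:.3f}")
```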
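For the two exponential types, a corresponding sketch; here `np.polyfit` is used to solve the same straight-line normal equations ② and ③, `fit_type1` and `fit_type2` are hypothetical helper names, and the data are generated for the example around $y = 3 \cdot 1.4^x$ and $y = 3e^{0.3x}$:

```python
import numpy as np

def fit_type1(x, y):
    """Fit y = a * b**x via Y = A + B x with Y = log10(y)."""
    Y = np.log10(y)
    B, A = np.polyfit(x, Y, 1)        # least-squares line: slope B, intercept A
    return 10.0**A, 10.0**B           # a = antilog(A), b = antilog(B)

def fit_type2(x, y):
    """Fit y = a * exp(b x): same line fit, but b = B / log10(e)."""
    Y = np.log10(y)
    B, A = np.polyfit(x, Y, 1)
    return 10.0**A, B / np.log10(np.e)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
a1, b1 = fit_type1(x, 3.0 * 1.4**x)           # recovers a = 3, b = 1.4
a2, b2 = fit_type2(x, 3.0 * np.exp(0.3 * x))  # recovers a = 3, b = 0.3
print(f"type I : y = {a1:.3f} * {b1:.3f}**x")
print(f"type II: y = {a2:.3f} * e^({b2:.3f} x)")
```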