Summary

This document explains the least squares method for solving overdetermined linear systems. It covers the theory and its MATLAB and Python implementations in detail, with worked examples in data fitting, regularized least squares, denoising, nonlinear least squares, and circle fitting.

Full Transcript


Chapter 3: Least Squares

3.1 "Solution" of Overdetermined Systems

Suppose that we are given a linear system of the form Ax = b, where A ∈ R^(m×n) and b ∈ R^m. Assume that the system is overdetermined, meaning that m > n, and that A has full column rank, that is, rank(A) = n. In this setting the system is usually inconsistent (has no solution), and a common approach for finding an approximate solution is to pick the vector that minimizes the squared l2-norm of the residual r = Ax − b:

(LS)    min over x ∈ R^n of ∥Ax − b∥².

Problem (LS) is a problem of minimizing a quadratic function over the entire space. The quadratic objective function is given by

f(x) = xᵀAᵀAx − 2bᵀAx + ∥b∥².

Since A has full column rank, for any x ∈ R^n it holds that ∇²f(x) = 2AᵀA ≻ 0. (If A does not have full column rank, only positive semidefiniteness can be guaranteed.) Hence, by Lemma 2.41 the unique stationary point

x_LS = (AᵀA)⁻¹Aᵀb    (3.1)

is the optimal solution of problem (LS). The vector x_LS is called the least squares solution or the least squares estimate of the system Ax = b. It is quite common not to write the explicit expression for x_LS but instead to write the associated system of equations that defines it:

(AᵀA)x_LS = Aᵀb.

This system is called the normal system. We can actually omit the assumption that the system is overdetermined and keep only the assumption that A has full column rank. Under this assumption m ≥ n, and when m = n the matrix A is nonsingular and the least squares solution coincides with the solution of the linear system, namely A⁻¹b.

Example 3.1. Consider the inconsistent linear system

x1 + 2x2 = 0,
2x1 + x2 = 1,
3x1 + 2x2 = 1.

We denote by A and b the coefficient matrix and right-hand-side vector of the system. The least squares problem can be written explicitly as

min over x1, x2 of (x1 + 2x2)² + (2x1 + x2 − 1)² + (3x1 + 2x2 − 1)².

Essentially, the solution to this problem is the vector that yields the minimal sum of squares of the errors corresponding to the three equations. To find the least squares solution, we solve the normal equations (AᵀA)x = Aᵀb, which here read

[14 10] [x1]   [5]
[10  9] [x2] = [3].

The solution of this system is the least squares estimate

x_LS = (15/26, −8/26)ᵀ = (0.5769, −0.3077)ᵀ.

Note that

A x_LS = (−0.038, 0.846, 1.115)ᵀ,

so the residual vector containing the errors in each of the equations is

A x_LS − b = (−0.038, −0.154, 0.115)ᵀ.

The total sum of squares of the errors, which is the optimal value of the least squares problem, is

(−0.038)² + (−0.154)² + 0.115² = 0.038.

MATLAB Implementation

To find the least squares solution of an overdetermined linear system Ax = b in MATLAB, the backslash operator \ should be used. Therefore, Example 3.1 can be solved by the following commands:

A=[1,2;2,1;3,2];
b=[0;1;1];
A\b

ans =
    0.5769
   -0.3077

Python Implementation

There are several ways to find the least squares solution of a linear system in Python.
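One direct way, shown here as a sketch of ours rather than code from the book, is to form the normal system explicitly and hand it to a general-purpose linear solver, assuming only NumPy:

import numpy as np

# Data of Example 3.1.
A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 2.0]])
b = np.array([0.0, 1.0, 1.0])

# Solve the normal system (A^T A) x = A^T b directly.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)
print(x_ls)  # expected: [ 0.57692308 -0.30769231]

Note, however, that forming AᵀA explicitly squares the condition number of the problem, which is why factorization-based routines such as MATLAB's backslash and NumPy's lstsq are generally preferred in practice.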
We will use the function lstsq, which is part of the celebrated NumPy module. The least squares problem in Example 3.1 can be solved via the commands

import numpy as np
A = np.array([[1, 2], [2, 1], [3, 2]])
b = np.array([0, 1, 1])
# lstsq returns a tuple; its first element is the solution.
xls = np.linalg.lstsq(A, b, rcond=None)[0]
print(xls)

[ 0.57692308 -0.30769231]

3.2 Data Fitting

One area in which least squares is frequently used is data fitting. We begin by describing the problem of linear fitting. Suppose that we are given a set of data points (sᵢ, tᵢ), i = 1, 2, ..., m, where sᵢ ∈ R^n and tᵢ ∈ R, and assume that a linear relation of the form

tᵢ = sᵢᵀx, i = 1, 2, ..., m,

approximately holds. In the least squares approach the objective is to find the parameter vector x ∈ R^n that solves the problem

min over x ∈ R^n of Σ_{i=1}^m (sᵢᵀx − tᵢ)².

We can alternatively write the problem as

min over x ∈ R^n of ∥Sx − t∥²,

where

    [s1ᵀ]        [t1]
S = [s2ᵀ],   t = [t2]
    [ ⋮ ]        [ ⋮]
    [smᵀ]        [tm]

Example 3.2. Consider the 30 points in R² described in the left image of Figure 3.1. The 30 x-coordinates are xᵢ = (i − 1)/29, i = 1, 2, ..., 30, and the corresponding y-coordinates are defined by yᵢ = 2xᵢ + 1 + εᵢ, where for every i, εᵢ = 0.15 sin(10i³). We note that here, and throughout the book, we refrain from using the available random generators of MATLAB and Python, since they generate different sequences of random numbers. The commands that generated the points and plotted them are

MATLAB:

x=linspace(0,1,30)';
eps=sin(10*[1:30].^3)';
y=2*x+1+0.15*eps;
plot(x,y,'*')

Python:

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1, num=30)
eps = np.sin(10 * np.arange(1, 31)**3)
y = 2 * x + 1 + 0.15 * eps
plt.plot(x, y, '*')
plt.show()

Given the 30 points, the objective is to find a line of the form y = ax + b that best fits them. The corresponding linear system that needs to be "solved" is

[x1   1]         [y1]
[x2   1] [a]     [y2]
[ ⋮   ⋮] [b]  ≈  [ ⋮]
[x30  1]         [y30]

with coefficient matrix denoted X and right-hand side y. The least squares solution of this system is (XᵀX)⁻¹Xᵀy. The parameters a and b can be extracted via the commands

MATLAB:

u=[x,ones(30,1)]\y;
a=u(1),b=u(2)
plot(x,y,'*')
hold on
plot(x,a*x+b,'--','LineWidth',2);

a =
    2.0805
b =
    0.9535

Python:

A = np.block([x.reshape((30, 1)), np.ones((30, 1))])
u = np.linalg.lstsq(A, y, rcond=None)[0]
a, b = u
print(a, b)
plt.plot(x, y, '*', x, a * x + b, '--')
plt.show()

2.080547663698539 0.9534536399905529

Note that the obtained estimates of a and b are close to the "true" values a = 2 and b = 1 that were used to generate the data. The least squares line as well as the 30 points are shown in the right image of Figure 3.1.

[Figure 3.1. Left image: 30 points in the plane. Right image: the points and the corresponding least squares line.]
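As a quick cross-check (an addition of ours, not part of the book's code), NumPy's built-in polynomial fitting routine solves the same least squares problem for a degree-1 polynomial and should return the same coefficients:

import numpy as np

x = np.linspace(0, 1, num=30)
y = 2 * x + 1 + 0.15 * np.sin(10 * np.arange(1, 31)**3)

# polyfit returns coefficients from highest degree to lowest: [a, b].
a, b = np.polyfit(x, y, deg=1)
print(a, b)  # expected: approximately 2.0805 and 0.9535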
The least squares approach can also be used in nonlinear fitting. Suppose, for example, that we are given a set of points in R²: (uᵢ, yᵢ), i = 1, 2, ..., m, and that we know a priori that these points are approximately related via a polynomial of degree at most d; i.e., there exist a0, ..., ad such that

Σ_{j=0}^d a_j uᵢ^j ≈ yᵢ, i = 1, ..., m.

The least squares approach to this problem seeks a0, a1, ..., ad that form the least squares solution to the linear system

[1  u1  u1²  ⋯  u1^d] [a0]   [y1]
[1  u2  u2²  ⋯  u2^d] [a1]   [y2]
[⋮   ⋮   ⋮        ⋮ ] [ ⋮] ≈ [ ⋮]
[1  um  um²  ⋯  um^d] [ad]   [ym]

with coefficient matrix denoted U. The least squares solution is of course well defined if the m × (d + 1) matrix U has full column rank; in particular, this requires m ≥ d + 1. The matrix U consists of the first d + 1 columns of the so-called Vandermonde matrix

[1  u1  u1²  ⋯  u1^(m−1)]
[1  u2  u2²  ⋯  u2^(m−1)]
[⋮   ⋮   ⋮           ⋮  ]
[1  um  um²  ⋯  um^(m−1)]

which is known to be invertible when all the uᵢ's are distinct. Thus, when m ≥ d + 1 and all the uᵢ's are distinct, the matrix U has full column rank.
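A short sketch of a degree-d fit along these lines (our illustration, not the book's code; np.vander with increasing=True builds exactly the matrix U above):

import numpy as np

# Toy data: points approximately on y = 1 - 2u + u^2 (an assumed example),
# with a small deterministic perturbation instead of random noise.
u = np.linspace(-1, 1, 20)
y = 1 - 2 * u + u**2 + 0.01 * np.sin(50 * u)

d = 2
U = np.vander(u, d + 1, increasing=True)  # columns: 1, u, u^2
a = np.linalg.lstsq(U, y, rcond=None)[0]
print(a)  # expected: approximately [1, -2, 1]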
3.3 Regularized Least Squares

There are several situations in which the least squares solution does not give rise to a good estimate of the "true" vector x. For example, when A is underdetermined, that is, when there are fewer equations than variables, there are several optimal solutions to the least squares problem, and it is unclear which of these optimal solutions should be considered. In these cases, some type of prior information on x should be incorporated into the optimization model. One way to do this is to consider a penalized problem in which a regularization function R(·) is added to the objective function. The regularized least squares (RLS) problem has the form

(RLS)    min over x of ∥Ax − b∥² + λR(x).    (3.2)

The positive constant λ is the regularization parameter. As λ gets larger, more weight is given to the regularization function. In many cases, the regularization is taken to be quadratic: R(x) = ∥Dx∥², where D ∈ R^(p×n) is a given matrix. The quadratic regularization function aims to control the norm of Dx, and the problem is then

min over x of ∥Ax − b∥² + λ∥Dx∥².

To find the optimal solution of this problem, note that it can be equivalently written as

min over x of { f_RLS(x) ≡ xᵀ(AᵀA + λDᵀD)x − 2bᵀAx + ∥b∥² }.

Since the Hessian of the objective function is ∇²f_RLS(x) = 2(AᵀA + λDᵀD) ⪰ 0, it follows by Lemma 2.41 that any stationary point is a global minimum point. The stationary points are those satisfying ∇f_RLS(x) = 0, that is,

(AᵀA + λDᵀD)x = Aᵀb.

Therefore, if AᵀA + λDᵀD ≻ 0, then the RLS solution is given by

x_RLS = (AᵀA + λDᵀD)⁻¹Aᵀb.    (3.3)

Example 3.3. Let A ∈ R^(3×3) be given by

    [2 + 10⁻³   3          4        ]
A = [3          5 + 10⁻³   7        ]
    [4          7          10 + 10⁻³]

The matrix can be constructed via the code

MATLAB:

B=[1,1,1;1,2,3];
A=B'*B+0.001*eye(3);

Python:

import numpy as np
B = np.array([[1, 1, 1], [1, 2, 3]])
A = B.T @ B + 0.001 * np.eye(3)

The "true" vector was chosen to be x_true = (1, 2, 3)ᵀ, and b is a corrupted measurement of A x_true:

MATLAB:

x_true=[1;2;3];
b=A*x_true+[0.001;0.002;0.02]

b =
   20.0020
   34.0040
   48.0230

Python:

x_true = np.array([1, 2, 3])
b = A @ x_true + np.array([0.001, 0.002, 0.02])
print(b)

[20.002 34.004 48.023]

The matrix A is in fact of full column rank, since it is invertible, and the least squares solution is given by x_LS, whose value can be computed by

MATLAB:

A\b

ans =
    3.8135
   -3.6704
    5.8456

Python:

print(np.linalg.lstsq(A, b, rcond=None)[0])

[ 3.81352803 -3.6704325   5.84560697]

x_LS is rather far from the true vector x_true. One difference between the solutions is that the squared norm ∥x_LS∥² = 62.1862 is much larger than the correct squared norm ∥x_true∥² = 14. In order to control the norm of the solution, we add the quadratic regularization function ∥x∥². The regularized solution thus has the form (see (3.3))

x_RLS = (AᵀA + λI)⁻¹Aᵀb.

Picking the regularization parameter as λ = 1, the RLS solution becomes

MATLAB:

x_rls=(A'*A+eye(3))\(A'*b)

x_rls =
    1.1762
    2.0319
    2.8875

Python:

x_rls = np.linalg.solve(A.T @ A + np.eye(3), A.T @ b)
print(x_rls)

[1.17622884 2.03187035 2.88752887]

which is a much better estimate of x_true than x_LS.

3.4 Denoising

One application area in which regularization is commonly used is denoising. Suppose that a noisy measurement of a signal x ∈ R^n is given:

b = x + w.

Here x is an unknown signal, w is an unknown noise vector, and b is the known measurement vector. The denoising problem is the following: given b, find a "good" estimate of x. The least squares problem associated with the approximate equations x ≈ b is

min ∥x − b∥².

However, the optimal solution of this problem is obviously x = b, which is meaningless. This is a case in which the least squares solution is not informative, even though the associated matrix (the identity matrix) has full column rank. To obtain a more relevant problem, we add a regularization term. For that, we need to exploit some a priori information on the signal. For example, we might know in advance that the signal is smooth in some sense. In that case it is very natural to add a quadratic penalty that is the sum of squares of the differences of consecutive components of the vector; that is, the regularization function is

R(x) = Σ_{i=1}^{n−1} (xᵢ − xᵢ₊₁)².

This quadratic function can also be written as R(x) = ∥Lx∥², where L ∈ R^((n−1)×n) is given by

    [1 −1  0  0 ⋯ 0  0]
    [0  1 −1  0 ⋯ 0  0]
L = [0  0  1 −1 ⋯ 0  0]
    [⋮  ⋮  ⋮  ⋮    ⋮  ⋮]
    [0  0  0  0 ⋯ 1 −1]

The resulting RLS problem is (with λ a given regularization parameter)

min over x of ∥x − b∥² + λ∥Lx∥²,

and its optimal solution is given by

x_RLS(λ) = (I + λLᵀL)⁻¹b.    (3.4)

Example 3.4. Consider the signal x ∈ R³⁰⁰ constructed by the following commands:

MATLAB:

t=linspace(0,4,300)';
x=sin(t)+t.*(cos(t).^2);

Python:

import numpy as np
import numpy.linalg as la
import matplotlib.pyplot as plt
t = np.linspace(0, 4, num=300)
x = np.sin(t) + t * (np.cos(t)**2)

This is the signal given by xᵢ = sin(4(i−1)/299) + (4(i−1)/299)·cos²(4(i−1)/299), i = 1, 2, ..., 300. A noise component is added to each of the components:

MATLAB:

w=1:300;
b=x+0.1*sin(10*w.^3)';

Python:

w = np.arange(1, 301)
b = x + 0.1 * np.sin(10 * w**3)

The true and noisy signals are shown in Figure 3.2, which was constructed by the commands³

MATLAB:

figure(1)
subplot(1,2,1);
plot(w,x,'LineWidth',2);
subplot(1,2,2);
plot(w,b,'LineWidth',2);

Python:

fig, ax = plt.subplots(1, 2)
ax[0].plot(w, x)
ax[1].plot(w, b)
plt.show()

[Figure 3.2. A signal (left image) and its noisy version (right image).]
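Before reproducing the book's loop-based construction of L below, note that the difference matrix can also be built in a single expression; a sketch of ours, assuming NumPy:

import numpy as np

n = 300
# L has 1 on the main diagonal and -1 on the superdiagonal of an (n-1) x n matrix.
L = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)
print(L[:3, :4])  # rows: [1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]

For large n, a sparse representation (e.g., via scipy.sparse.diags) avoids storing the mostly zero matrix.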
In order to denoise the signal b, we look at the optimal solution of the RLS problem given by (3.4) for four different values of the regularization parameter: λ = 1, 10, 100, 1000. The original true signal is denoted by a dotted line. As can be seen in Figure 3.3, as λ gets larger, the RLS solution becomes smoother. For λ = 1 the RLS solution x_RLS(1) is not smooth enough and is very close to the noisy signal b. For λ = 10 the RLS solution is a rather good estimate of the original vector x. For λ = 100 we get a smoother RLS signal, but evidently it is less accurate than x_RLS(10), especially near the boundaries. The RLS solution for λ = 1000 is very smooth, but it is a rather poor estimate of the original signal. In any case, it is evident that the parameter λ is chosen via a trade-off between data fidelity (closeness of x to b) and smoothness (size of Lx). The four plots were produced by the commands

MATLAB:

L=zeros(299,300);
for i=1:299
    L(i,i)=1;
    L(i,i+1)=-1;
end
I=eye(300);
LTL=L'*L;
x_rls0=(I+1*LTL)\b;
x_rls1=(I+10*LTL)\b;
x_rls2=(I+100*LTL)\b;
x_rls3=(I+1000*LTL)\b;
figure(2)
subplot(2,2,1);
plot(w,x_rls0,'LineWidth',2);
hold on
plot(w,x,':r','LineWidth',2);
hold off
title('\lambda=1');
subplot(2,2,2);
plot(w,x_rls1,'LineWidth',2);
hold on
plot(w,x,':r','LineWidth',2);
hold off
title('\lambda=10');
subplot(2,2,3);
plot(w,x_rls2,'LineWidth',2);
hold on
plot(w,x,':r','LineWidth',2);
hold off
title('\lambda=100');
subplot(2,2,4);
plot(w,x_rls3,'LineWidth',2);
hold on
plot(w,x,':r','LineWidth',2);
hold off
title('\lambda=1000');

Python:

L = np.zeros((299, 300))
for i in range(299):
    L[i, i] = 1
    L[i, i+1] = -1
I = np.eye(300)
LTL = L.T @ L
x_rls0 = la.solve(I + LTL, b)
x_rls1 = la.solve(I + 10 * LTL, b)
x_rls2 = la.solve(I + 100 * LTL, b)
x_rls3 = la.solve(I + 1000 * LTL, b)
fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(w, x_rls0)
ax[0, 0].plot(w, x, 'r:')
ax[0, 0].set_title(r'$\lambda$=1')
ax[0, 1].plot(w, x_rls1)
ax[0, 1].plot(w, x, 'r:')
ax[0, 1].set_title(r'$\lambda$=10')
ax[1, 0].plot(w, x_rls2)
ax[1, 0].plot(w, x, 'r:')
ax[1, 0].set_title(r'$\lambda$=100')
ax[1, 1].plot(w, x_rls3)
ax[1, 1].plot(w, x, 'r:')
ax[1, 1].set_title(r'$\lambda$=1000')
plt.show()

³ Here and throughout the book we provide the figures produced by MATLAB and not Python.

[Figure 3.3. Four reconstructions of a noisy signal by RLS solutions.]

3.5 Nonlinear Least Squares

The least squares problem considered so far is also referred to as "linear least squares," since it is a method for finding a solution to a set of approximate linear equalities. There are of course situations in which we are given a system of nonlinear equations

fᵢ(x) ≈ cᵢ, i = 1, 2, ..., m.

In this case, the appropriate optimization problem is the nonlinear least squares (NLS) problem, which is formulated as

min Σ_{i=1}^m (fᵢ(x) − cᵢ)².    (3.5)

As opposed to linear least squares, there is no easy way to solve NLS problems. In Section 4.5 we will describe the Gauss–Newton method, which is specifically devised to solve NLS problems of the form (3.5), although it is guaranteed to converge only to a stationary point of (3.5), not necessarily to a global optimal solution.
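In practice, NLS problems of the form (3.5) are typically handed to iterative solvers. The following sketch (our illustration using SciPy's least_squares routine, which the book does not use) fits an assumed model fᵢ(x) = x1·exp(x2·uᵢ) to data by minimizing the sum of squared residuals:

import numpy as np
from scipy.optimize import least_squares

# Toy data approximately following c = 2 * exp(0.5 * u) (an assumed example),
# perturbed deterministically rather than with a random generator.
u = np.linspace(0, 2, 15)
c = 2 * np.exp(0.5 * u) + 0.01 * np.sin(40 * u)

def residuals(x):
    # Residual vector f_i(x) - c_i; least_squares minimizes its squared norm.
    return x[0] * np.exp(x[1] * u) - c

sol = least_squares(residuals, x0=np.array([1.0, 0.0]))
print(sol.x)  # expected: approximately [2, 0.5]

Like Gauss–Newton, such solvers guarantee convergence only to a stationary point, so the choice of the initial guess x0 matters.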
3.6 Circle Fitting

Suppose that we are given m points a1, a2, ..., am ∈ R^n. The circle fitting problem seeks to find a circle

C(x, r) = {y ∈ R^n : ∥y − x∥ = r}

that best fits the m points. Note that we use the term "circle" although this terminology is usually reserved for the plane (n = 2), while here we consider the general n-dimensional space R^n. Additionally, note that C(x, r) is the boundary set of the corresponding ball B(x, r). An illustration of such a fit is given in Figure 3.4. The circle fitting problem has applications in many areas, such as archeology, computer graphics, coordinate metrology, petroleum engineering, statistics, and more.

[Figure 3.4. The best circle fit (the optimal solution of problem (3.6)) of 10 points denoted by asterisks.]

The nonlinear (approximate) equations associated with the problem are

∥x − aᵢ∥ ≈ r, i = 1, 2, ..., m.

Since we wish to deal with differentiable functions, and the norm function is not differentiable, we consider the squared version:

∥x − aᵢ∥² ≈ r², i = 1, 2, ..., m.

The NLS problem associated with these equations is

min over x ∈ R^n, r ∈ R₊ of Σ_{i=1}^m (∥x − aᵢ∥² − r²)².    (3.6)

At first glance, problem (3.6) seems to be a standard NLS problem, but in this case we can show that it is in fact equivalent to a linear least squares problem, and therefore the global optimal solution can be easily obtained. We begin by noting that problem (3.6) is the same as

min over x, r of { Σ_{i=1}^m (−2aᵢᵀx + ∥x∥² − r² + ∥aᵢ∥²)² : x ∈ R^n, r ∈ R }.    (3.7)

Making the change of variables R = ∥x∥² − r², the above problem reduces to

min over x, R of Σ_{i=1}^m (−2aᵢᵀx + R + ∥aᵢ∥²)²,

which is a linear least squares problem in the variables (x, R).
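A sketch of the resulting computation, based on the reduction above (the helper fit_circle is our hypothetical name, not code from the book): stack the rows [2aᵢᵀ, −1], solve the linear least squares problem for (x, R), and recover the radius from r² = ∥x∥² − R.

import numpy as np

def fit_circle(points):
    # points: (m, n) array of the points a_1, ..., a_m.
    m, n = points.shape
    # Each equation 2 a_i^T x - R = ||a_i||^2 gives one row [2 a_i^T, -1].
    M = np.hstack([2 * points, -np.ones((m, 1))])
    rhs = np.sum(points**2, axis=1)
    z = np.linalg.lstsq(M, rhs, rcond=None)[0]
    x, R = z[:n], z[n]
    r = np.sqrt(np.dot(x, x) - R)  # valid when ||x||^2 - R >= 0
    return x, r

# Example: points near a circle with center (1, 2) and radius 3,
# perturbed deterministically rather than with a random generator.
theta = np.linspace(0, 2 * np.pi, 10, endpoint=False)
pts = np.stack([1 + 3 * np.cos(theta), 2 + 3 * np.sin(theta)], axis=1)
center, radius = fit_circle(pts + 0.01 * np.sin(7 * theta)[:, None])
print(center, radius)  # expected: approximately [1, 2] and 3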
