6.4 Best Approximation; Least Squares
Given a linear system Ax = b of m equations in n unknowns, find a vector x that minimizes ‖b − Ax‖ with respect to the
Euclidean inner product on Rᵐ. We call such an x a least squares solution of the system, we call b − Ax the least squares
error vector, and we call ‖b − Ax‖ the least squares error.
The term “least squares solution” results from the fact that minimizing ‖b − Ax‖ also minimizes
‖b − Ax‖² = e₁² + e₂² + ⋯ + eₘ²
where e = (e₁, e₂, …, eₘ) = b − Ax is the least squares error vector.
Best Approximation
Suppose that b is a fixed vector in Rⁿ that we would like to approximate by a vector w that is required to lie in some subspace W
of Rⁿ. Unless b happens to be in W, any such approximation will result in an “error vector” b − w that cannot be made equal
to 0 no matter how w is chosen (Figure 6.4.1a). However, by choosing w = proj_W b, we can make the length ‖b − w‖ of the
error vector as small as possible (Figure 6.4.1b).
Figure 6.4.1
These geometric ideas suggest the following general theorem.
THEOREM 6.4.1 Best Approximation Theorem
If W is a finite-dimensional subspace of an inner product space V, and if b is a vector in V, then proj_W b is the best
approximation to b from W in the sense that
‖b − proj_W b‖ < ‖b − w‖
for every vector w in W that is different from proj_W b.
Proof For every vector w in W we can write
b − w = (b − proj_W b) + (proj_W b − w)    (1)
But proj_W b − w, being a difference of vectors in W, is itself in W; and since b − proj_W b is orthogonal to W, the two terms on the
right side of 1 are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that
‖b − w‖² = ‖b − proj_W b‖² + ‖proj_W b − w‖²
Since w ≠ proj_W b, it follows that the second term in this sum is positive, and hence that
‖b − proj_W b‖² < ‖b − w‖²    (2)
Since norms are nonnegative, this implies that ‖b − proj_W b‖ < ‖b − w‖, which completes the proof.

Least Squares Solutions of Linear Systems
It follows from Theorem 6.4.1 that if A is an m × n matrix and W is the column space of A, then ‖b − Ax‖ is as small as
possible when Ax is the orthogonal projection of b on W; that is, the least squares solutions of Ax = b are the exact solutions of
Ax = proj_W b    (3)
One way to find a least squares solution would be to compute proj_W b and then solve 3, but there is a better approach. If x
satisfies 3, then b − Ax = b − proj_W b. Since b − proj_W b is the component of b that is orthogonal to the column space of A, it
follows from Theorem 4.8.9b that this vector lies in the null space of Aᵀ, and hence that
Aᵀ(b − Ax) = 0
Thus, 3 simplifies to the equation
AᵀAx = Aᵀb    (4)
This is called the normal equation or the normal system associated with Ax = b. When viewed as a linear system, the individual
equations are called the normal equations associated with Ax = b.
THEOREM 6.4.2
For every linear system Ax = b, the associated normal system
AᵀAx = Aᵀb    (5)
is consistent, and all solutions of 5 are least squares solutions of Ax = b. Moreover, if W is the column space of A, and x is
any least squares solution of Ax = b, then the orthogonal projection of b on W is
proj_W b = Ax    (6)
Solution
(a) It will be convenient to express the system in the matrix form , where
It follows that
Find the orthogonal projection of the vector on the subspace of spanned by the vectors
Solution We could solve this problem by first using the Gram–Schmidt process to convert the given spanning vectors into an
orthonormal basis for W and then applying the method used in Example 6 of Section 6.3. However, the following method
is more efficient.
Thus, if u is expressed as a column vector, and if A is the matrix having the given spanning vectors as its columns, then we can
find the orthogonal projection of u on W by finding a least squares solution of the system Ax = u and then calculating
proj_W u = Ax from the least squares solution. The computations are as follows: solve the normal system AᵀAx = Aᵀu for x,
and then compute proj_W u = Ax.
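For readers who want to experiment numerically, here is a minimal Python sketch of the procedure just described. The vectors below are placeholders (the example's actual data is not reproduced above); the point is only to show the normal-equation computation of proj_W u.

```python
import numpy as np

# Hypothetical spanning vectors for W and a vector u to project (placeholders,
# not the data from the worked example above).
v1 = np.array([1.0, 0.0, 1.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 0.0, 2.0])
u  = np.array([2.0, 1.0, 3.0, 4.0])

A = np.column_stack([v1, v2, v3])       # columns of A span W
x = np.linalg.solve(A.T @ A, A.T @ u)   # least squares solution of A x = u
proj_W_u = A @ x                        # orthogonal projection of u on W

print(proj_W_u)
print(A.T @ (u - proj_W_u))             # error vector is orthogonal to col(A): ~ zero
```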
THEOREM 6.4.3
If A is an m × n matrix, then the following are equivalent.
(a) A has linearly independent column vectors.
(b) AᵀA is invertible.
Proof We will prove that (a) ⇒ (b) and leave the proof that (b) ⇒ (a) as an exercise.
Assume that A has linearly independent column vectors. The matrix AᵀA has size n × n, so we can prove that this
matrix is invertible by showing that the linear system AᵀAx = 0 has only the trivial solution. But if x is any solution of this
system, then Ax is in the null space of Aᵀ and also in the column space of A. By Theorem 4.8.9b these spaces are orthogonal
complements, so part (b) of Theorem 6.2.4 implies that Ax = 0. But A is assumed to have linearly independent column vectors, so
x = 0 by Theorem 1.3.1.
The next theorem, which follows directly from Theorem 6.4.2 and Theorem 6.4.3, gives an explicit formula for the least squares
solution of a linear system in which the coefficient matrix has linearly independent column vectors.
THEOREM 6.4.4
If A is an m × n matrix with linearly independent column vectors, then for every m × 1 matrix b, the linear system Ax = b
has a unique least squares solution. This solution is given by
x = (AᵀA)⁻¹Aᵀb    (7)
Moreover, if W is the column space of A, then the orthogonal projection of b on W is
proj_W b = Ax = A(AᵀA)⁻¹Aᵀb    (8)
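Formulas 7 and 8 are straightforward to verify numerically. The following sketch (with an arbitrary small system, not one of the book's examples) computes the unique least squares solution two ways and checks that the error vector is orthogonal to the column space:

```python
import numpy as np

# Arbitrary 3x2 system A x = b with linearly independent columns (illustrative data).
A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A.T @ A, A.T @ b)    # Formula 7: x = (A^T A)^(-1) A^T b
proj_W_b = A @ x                         # Formula 8: proj_W b = A (A^T A)^(-1) A^T b

x_check, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least squares solver
print(np.allclose(x, x_check))           # True
print(A.T @ (b - proj_W_b))              # ~ zero: b - Ax is orthogonal to col(A)
```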
OPTIONAL
The Role of QR-Decomposition in Least Squares Problems
Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of
Ax = b are typically found by using some variation of Gaussian elimination to solve the normal equations or by using
QR-decomposition and the following theorem.
THEOREM 6.4.5
If A is an m × n matrix with linearly independent column vectors, and if A = QR is a QR-decomposition of A (see Theorem
6.3.7), then for each b in Rᵐ the system Ax = b has a unique least squares solution given by
x = R⁻¹Qᵀb    (9)
A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However,
you can obtain Formula 9 by making the substitution A = QR in 7 and using the fact that QᵀQ = I to obtain
x = ((QR)ᵀQR)⁻¹(QR)ᵀb = (RᵀQᵀQR)⁻¹RᵀQᵀb = (RᵀR)⁻¹RᵀQᵀb = R⁻¹(Rᵀ)⁻¹RᵀQᵀb = R⁻¹Qᵀb
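As a numerical illustration of Theorem 6.4.5 (with arbitrary data, not an example from the text), the sketch below computes the least squares solution from a QR-decomposition and checks it against the normal-equation formula. Note that numpy's reduced QR-decomposition may return an R with negative diagonal entries, which differs from the positive-diagonal convention of Theorem 6.3.7, but the computed least squares solution is the same.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [1.0, 3.0]])                 # linearly independent columns (arbitrary data)
b = np.array([1.0, 0.0, 2.0, 5.0])

Q, R = np.linalg.qr(A)                     # reduced QR-decomposition, A = QR
x_qr = np.linalg.solve(R, Q.T @ b)         # Formula 9: x = R^(-1) Q^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # Formula 7 for comparison

print(np.allclose(x_qr, x_ne))             # True: both give the least squares solution
```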
DEFINITION 1
If W is a subspace of Rᵐ, then the linear transformation P: Rᵐ → Rᵐ that maps each vector x in Rᵐ into its orthogonal
projection proj_W x in W is called the orthogonal projection of Rᵐ on W.
It follows from Formula 7 that the standard matrix for the orthogonal projection P is
[P] = A(AᵀA)⁻¹Aᵀ    (10)
where A is any matrix whose column vectors form a basis for W.
In Section 4.9 we showed that
[P] = | cos²θ        sin θ cos θ |
      | sin θ cos θ  sin²θ       |
is the standard matrix for the orthogonal projection on the line W through the origin of R² that makes an angle θ with
the positive x-axis. Derive this result using Formula 10.
Solution The column vectors of A can be formed from any basis for W. Since W is one-dimensional, we can take the unit vector
w = (cos θ, sin θ) as the basis vector (Figure 6.4.2), so
A = | cos θ |
    | sin θ |
We leave it for you to show that AᵀA is the 1 × 1 identity matrix. Thus, Formula 10 simplifies to
[P] = AAᵀ = | cos²θ        sin θ cos θ |
            | sin θ cos θ  sin²θ       |
Figure 6.4.2
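The derivation above is easy to check numerically. A small sketch (θ = π/6 is an arbitrary test value) builds [P] from Formula 10 and compares it with the entries cos²θ, sin θ cos θ, sin²θ:

```python
import numpy as np

theta = np.pi / 6                             # arbitrary test angle
A = np.array([[np.cos(theta)],
              [np.sin(theta)]])               # basis vector of W as the single column of A

P = A @ np.linalg.inv(A.T @ A) @ A.T          # Formula 10 (here A^T A is the 1x1 identity)

expected = np.array([[np.cos(theta)**2,              np.sin(theta) * np.cos(theta)],
                     [np.sin(theta) * np.cos(theta), np.sin(theta)**2]])
print(np.allclose(P, expected))               # True
```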
For any vector x in Rⁿ and any vector b in Rᵐ we can write
x = proj_row(A) x + proj_null(A) x   and   b = proj_null(Aᵀ) b + proj_col(A) b
where proj_row(A) x and proj_null(A) x are the orthogonal projections of x on the row space of A and the null space of A, and the vectors
proj_null(Aᵀ) b and proj_col(A) b are the orthogonal projections of b on the null space of Aᵀ and the column space of A.
In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in and on which we indicated the
orthogonal projections of x and b. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.)
The figure shows proj_col(A) b as a point in the column space of A and conveys that proj_col(A) b is the point in col(A) that is closest to b. This
illustrates that the least squares solutions of Ax = b are the exact solutions of the equation Ax = proj_col(A) b.
Figure 6.4.3
The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.
OPTIONAL
We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where V is the vector space Rⁿ.
Proof of Theorem 6.3.3 We will leave the case where W = {0} as an exercise, so assume that W ≠ {0}. Choose any basis for W,
say one consisting of k vectors, and form the n × k matrix M that has these basis vectors as successive columns. This makes
W the column space of M and hence, by Theorem 4.8.9b, makes W⊥ the null space of Mᵀ. We will complete the proof by showing that every vector u in
Rⁿ can be written in exactly one way as
u = w₁ + w₂
where w₁ is in the column space of M and w₂ is in W⊥. However, to say that w₁ is in the column space of M is equivalent to saying
w₁ = Mx
for some vector x in Rᵏ, and to say that w₂ = u − Mx is in W⊥ is equivalent to saying that Mᵀ(u − Mx) = 0. Thus, if we can
show that the equation
Mᵀ(u − Mx) = 0    (11)
has a unique solution for x, then w₁ = Mx and w₂ = u − Mx will be uniquely determined vectors with the required properties. To
do this, let us rewrite 11 as
MᵀMx = Mᵀu
Since the matrix M has linearly independent column vectors, the matrix MᵀM is invertible by Theorem 6.4.6 and hence the
equation MᵀMx = Mᵀu has a unique solution, as required to complete the proof.
Concept Review
• Least squares problem
• Least squares solution
• Least squares error vector
• Least squares error
• Best approximation
• Normal equation
• Orthogonal projection
Skills
• Find the least squares solution of a linear system.
• Find the error and error vector associated with a least squares solution to a linear system.
• Use the techniques developed in this section to compute orthogonal projections.
• Find the standard matrix of an orthogonal projection.
Answer:
(a)
(b)
In Exercises 2–4, find the least squares solution of the linear equation Ax = b.
2. (a)
;
(b)
;
3. (a)
(b)
Answer:
(a)
(b)
4. (a)
(b)
In Exercises 5–6, find the least squares error vector b − Ax resulting from the least squares solution x and verify that it is
orthogonal to the column space of A.
Answer:
(a)
(b)
7. Find all least squares solutions of Ax = b, and confirm that all of the solutions have the same error vector. Compute the least
squares error.
(a)
;
(b)
;
(c)
;
Answer:
8. Find the orthogonal projection of u on the subspace of spanned by the vectors and .
(a)
(b)
9. Find the orthogonal projection of u on the subspace of spanned by the vectors , , and .
(a) ; , ,
(b) ; , ,
Answer:
(a) (7, 2, 9, 5)
(b)
10. Find the orthogonal projection of on the solution space of the homogeneous linear system
11. In each part, find AᵀA, and apply Theorem 6.4.3 to determine whether A has linearly independent column vectors.
(a)
(b)
Answer:
12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection onto
(a) the x-axis.
(b) the y-axis.
[Note: Compare your results to Table 3 of Section 4.9.]
13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection onto
(a) the xz-plane.
(b) the yz-plane.
[Note: Compare your results to Table 4 of Section 4.9.]
Answer:
(a)
(b)
14. Show that if w = (a, b) is a nonzero vector, then the standard matrix for the orthogonal projection of R² on the line
W = span{w} is
[P] = 1/(a² + b²) | a²  ab |
                  | ab  b² |
Answer:
(a)
(b)
(c)
(d)
Let P be a point on l, and let Q be a point on m. Find the values of t and s that minimize the distance between the lines by
minimizing the squared distance ‖P − Q‖².
Answer:
18. Prove: If A has linearly independent column vectors, and if Ax = b is consistent, then the least squares solution of Ax = b and
the exact solution of Ax = b are the same.
19. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares
solution of Ax = b is x = 0.
20. Let P: Rⁿ → Rⁿ be the orthogonal projection of Rⁿ onto a subspace W, and let [P] be its standard matrix.
(a) Prove that [P][P] = [P].
(b) What does the result in part (a) imply about the composition P ∘ P?
(c) Show that [P] is symmetric.
21. Let A be an m × n matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of
Rⁿ onto the row space of A. [Hint: Start with Formula 10.]
Answer:
True-False Exercises
In parts (a)–(h) determine whether the statement is true or false, and justify your answer.
Answer:
True
(b) If AᵀA is invertible, then A is invertible.
Answer:
False
(c) If A is invertible, then AᵀA is invertible.
Answer:
True
(d) If Ax = b is a consistent linear system, then AᵀAx = Aᵀb is also consistent.
Answer:
True
(e) If Ax = b is an inconsistent linear system, then AᵀAx = Aᵀb is also inconsistent.
Answer:
False
(f) Every linear system has a least squares solution.
Answer:
True
(g) Every linear system has a unique least squares solution.
Answer:
False
(h) If A is an m × n matrix with linearly independent columns and b is in Rᵐ, then Ax = b has a unique least squares solution.
Answer:
True
6.5 Least Squares Fitting to Data
In this section we will use results about orthogonal projections in inner product spaces to obtain a technique
for fitting a line or other polynomial curve to a set of experimentally determined points in the plane.
On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter
decides on the general form of the curve to be fitted. Some possibilities are (Figure 6.5.1)
(a) A straight line: y = a + bx
(b) A quadratic polynomial: y = a + bx + cx²
Because the points are obtained experimentally, there is often some measurement “error” in the data, making
it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose
the curve (by determining its coefficients) that “best” fits the data. We begin with the simplest and most
common case: fitting a straight line to data points.
Figure 6.5.1
Suppose that we want to fit a straight line y = a + bx to n experimentally determined points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ).
If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and
b would satisfy the equations
y₁ = a + bx₁
y₂ = a + bx₂
⋮
yₙ = a + bxₙ
We can write this system in matrix form as
| 1  x₁ |         | y₁ |
| 1  x₂ | | a |   | y₂ |
| ⋮  ⋮  | | b | = | ⋮  |
| 1  xₙ |         | yₙ |
or more compactly as
Mv = y    (1)
where
M = | 1  x₁ |,   v = | a |,   y = | y₁ |
    | 1  x₂ |        | b |        | y₂ |
    | ⋮  ⋮  |                     | ⋮  |
    | 1  xₙ |                     | yₙ |    (2)
If the data points are not collinear, then it is impossible to find coefficients a and b that satisfy system 1
exactly; that is, the system is inconsistent. In this case we will look for a least squares solution
v* = (a*, b*)
We call a line y = a* + b*x whose coefficients come from a least squares solution a regression line or a
least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of 1
minimizes
‖y − Mv‖    (3)
which, written in terms of components, is the square root of
‖y − Mv‖² = (y₁ − a − bx₁)² + (y₂ − a − bx₂)² + ⋯ + (yₙ − a − bxₙ)²    (4)
If we now let
d₁ = |y₁ − a − bx₁|,  d₂ = |y₂ − a − bx₂|, …,  dₙ = |yₙ − a − bxₙ|
then 4 can be written as
‖y − Mv‖² = d₁² + d₂² + ⋯ + dₙ²    (5)
As illustrated in Figure 6.5.2, the number dᵢ can be interpreted as the vertical distance between the line
y = a + bx and the data point (xᵢ, yᵢ). This distance is a measure of the “error” at the point (xᵢ, yᵢ)
resulting from the inexact fit of y = a + bx to the data points, the assumption being that the xᵢ are known
exactly and that all the error is in the measurement of the yᵢ. Since 3 and 5 are minimized by the same vector
v*, the least squares straight line fit minimizes the sum of the squares of the estimated errors dᵢ, hence the
name least squares straight line fit.
Figure 6.5.2 dᵢ measures the vertical error in the least squares straight line.
Normal Equations
Recall from Theorem 6.4.2 that the least squares solutions of 1 can be obtained by solving the associated
normal system
MᵀMv = Mᵀy
In the exercises it will be shown that the column vectors of M are linearly independent if and only if the n data
points do not lie on a vertical line in the xy-plane. In this case it follows from Theorem 6.4.4 that the least
squares solution is unique and is given by
v* = (MᵀM)⁻¹Mᵀy
This yields the following result.
Let (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) be a set of two or more data points, not all lying on a vertical
line, and let
M = | 1  x₁ |      y = | y₁ |
    | 1  x₂ |          | y₂ |
    | ⋮  ⋮  |          | ⋮  |
    | 1  xₙ |          | yₙ |
Then there is a unique least squares straight line fit
y = a* + b*x
to the data points. Moreover, the coefficient vector v* = (a*, b*) is given by
v* = (MᵀM)⁻¹Mᵀy    (6)
which expresses the fact that v* is the unique solution of the normal equations
MᵀMv* = Mᵀy    (7)
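In code, Formula 6 amounts to building M from the x-coordinates and solving the normal equations 7. A minimal Python sketch with made-up data points (placeholders for whatever data is to be fitted):

```python
import numpy as np

# Hypothetical data points (x_i, y_i); replace with the data to be fitted.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 4.2, 4.8])

M = np.column_stack([np.ones_like(x), x])            # i-th row of M is (1, x_i)
a_star, b_star = np.linalg.solve(M.T @ M, M.T @ y)   # v* = (M^T M)^(-1) M^T y
print(f"least squares line: y = {a_star:.3f} + {b_star:.3f}x")
```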
Find the least squares straight line fit to the four points , , , and . (See
Figure 6.5.3.)
Figure 6.5.3
Solution We have
Hooke's law in physics states that the length x of a uniform spring is a linear function of the
force y applied to it. If we express this relationship as y = a + bx, then the coefficient b is
called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1
inches (i.e., when ). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied
to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4
inches (see Figure 6.5.4). Find the spring constant.
Figure 6.5.4
Solution The measurements give the data points (x, y) = (6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6), so
M = | 1   6.1 |      y = | 0 |
    | 1   7.6 |          | 2 |
    | 1   8.7 |          | 4 |
    | 1  10.4 |          | 6 |
and
v* = (MᵀM)⁻¹Mᵀy ≈ | −8.6 |
                  |  1.4 |
where the numerical values have been rounded to one decimal place. Thus, the estimated value
of the spring constant is b* ≈ 1.4 pounds/inch.
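A quick numerical check of this example (a sketch only; the data are the four (length, force) pairs listed above):

```python
import numpy as np

x = np.array([6.1, 7.6, 8.7, 10.4])   # spring lengths in inches
y = np.array([0.0, 2.0, 4.0, 6.0])    # applied forces in pounds

M = np.column_stack([np.ones_like(x), x])
a_star, b_star = np.linalg.solve(M.T @ M, M.T @ y)
print(round(b_star, 1))               # spring constant, approximately 1.4 pounds/inch
```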
Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and
transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal
was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly
suggested a linear relationship, so a least squares straight line fit was used on the linear part of the
data to obtain the equation
(8)
Least Squares Fit of a Polynomial
The technique described above for fitting a straight line generalizes easily to fitting a polynomial
y = a₀ + a₁x + ⋯ + aₘxᵐ    (9)
of any specified degree m to n points
(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
Substituting these values of x and y into 9 yields a linear system that can be written as Mv = y, where
M = | 1  x₁  x₁² ⋯ x₁ᵐ |      v = | a₀ |      y = | y₁ |
    | 1  x₂  x₂² ⋯ x₂ᵐ |          | a₁ |          | y₂ |
    | ⋮  ⋮   ⋮      ⋮ |          | ⋮  |          | ⋮  |
    | 1  xₙ  xₙ² ⋯ xₙᵐ |          | aₘ |          | yₙ |    (10)
As before, the least squares solutions of this system are the solutions of the normal system MᵀMv = Mᵀy.
Conditions that guarantee the invertibility of MᵀM are discussed in the exercises (Exercise 7). If MᵀM is
invertible, then the normal equations have a unique solution v*, which is given by
v* = (MᵀM)⁻¹Mᵀy    (11)
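Formula 11 is the same normal-equation computation with M replaced by the n × (m + 1) matrix of powers of the xᵢ. A short sketch for a quadratic fit (m = 2) to made-up data:

```python
import numpy as np

# Hypothetical data points (placeholders for whatever data is to be fitted).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.8, 9.2, 16.5])

m = 2                                        # degree of the fitting polynomial
M = np.vander(x, m + 1, increasing=True)     # i-th row of M is (1, x_i, x_i^2, ..., x_i^m)
v = np.linalg.solve(M.T @ M, M.T @ y)        # v* = (M^T M)^(-1) M^T y, Formula 11
print(v)                                     # fitted coefficients a_0, a_1, ..., a_m
```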
According to Newton's second law of motion, a body near the Earth's surface falls vertically
downward according to the equation
s = s₀ + v₀t + ½gt²    (12)
where
s = vertical displacement downward relative to some fixed point
s₀ = initial displacement at time t = 0
v₀ = initial velocity at time t = 0
g = acceleration of gravity at the Earth's surface
The constants s₀, v₀, and g can be estimated from Equation 12 by releasing a weight with unknown initial displacement and velocity and
measuring the distance it has fallen at certain times relative to a fixed reference point. Suppose
that a laboratory experiment is performed to evaluate g. Suppose it is found that at times
, and .5 seconds the weight has fallen , and 3.73
feet, respectively, from the reference point. Find an approximate value of g using these data.
(13)
If desired, we can also estimate the initial displacement and initial velocity of the weight:
In Figure 6.5.5 we have plotted the five data points and the approximating polynomial.
Figure 6.5.5
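Because the measured times and distances of this experiment are not reproduced above, the following sketch uses synthetic measurements (generated from Equation 12 with g = 32.2 ft/s² plus a little noise) purely to illustrate how g is recovered from the quadratic coefficient of the fit:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([0.1, 0.2, 0.3, 0.4, 0.5])          # hypothetical measurement times (seconds)
s = 0.2 + 1.0 * t + 0.5 * 32.2 * t**2            # synthetic data: s0 = 0.2 ft, v0 = 1 ft/s
s = s + rng.normal(scale=0.01, size=t.size)      # small measurement "error"

M = np.column_stack([np.ones_like(t), t, t**2])  # model: s = s0 + v0*t + (g/2)*t^2
s0, v0, half_g = np.linalg.solve(M.T @ M, M.T @ s)
print(f"estimated g = {2 * half_g:.1f} ft/s^2")  # close to 32.2
```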
Concept Review
• Least squares straight line fit
• Regression line
• Least squares polynomial fit
Skills
• Find the least squares straight line fit to a set of data points.
• Find the least squares polynomial fit to a set of data points.
• Use the techniques of this section to solve applied problems.
Answer:
2. Find the least squares straight line fit to the four points , , , and .
3. Find the quadratic polynomial that best fits the four points , , , and .
Answer:
4. Find the cubic polynomial that best fits the five points , , , , and
.
5. Show that the matrix M in Equation 2 has linearly independent columns if and only if at least two of the
numbers x₁, x₂, …, xₙ are distinct.
6. Show that the columns of the matrix M in Equation 10 are linearly independent if n > m and
at least m + 1 of the numbers x₁, x₂, …, xₙ are distinct. [Hint: A nonzero polynomial of degree m has at
most m distinct roots.]
7. Let M be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix
MᵀM to be invertible is that n > m and that at least m + 1 of the numbers x₁, x₂, …, xₙ are distinct.
8. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in
thousands) are , and $8.0. The owner plots these figures on a graph and conjectures
that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least
squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of
the year.
9. A corporation obtains the following data relating the number of sales representatives on its staff to annual
sales:
Explain how you might use least squares methods to estimate the annual sales with 45 representatives, and
discuss the assumptions that you are making. (You need not perform the actual computations.)
10. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series
of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-
altitude flight. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude H
and temperature T. Show that a linear model is reasonable by plotting the data, and then find the least
squares line of best fit.
Table Ex-10
11. Find a curve of the form that best fits the data points , , by making the
substitution . Draw the curve and plot the data points in the same coordinate system.
Answer:
True-False Exercises
In parts (a)–(d) determine whether the statement is true or false, and justify your answer.
(a) Every set of data points has a unique least squares straight line fit.
Answer:
False
(b) If the data points are not collinear, then 1 is an inconsistent system.
Answer:
True
(c) If y = a* + b*x is the least squares straight line fit to the data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), then
|yᵢ − (a* + b*xᵢ)| is minimal for every i = 1, 2, …, n.
Answer:
False
(d) If y = a* + b*x is the least squares straight line fit to the data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), then
(y₁ − a* − b*x₁)² + (y₂ − a* − b*x₂)² + ⋯ + (yₙ − a* − b*xₙ)² is minimal.
Answer:
True
6.6 Function Approximation; Fourier Series
In this section we will show how orthogonal projections can be used to approximate certain types of functions by
simpler functions that are easier to work with. The ideas explained here have important applications in
engineering and science. Calculus is required.
Best Approximations
All of the problems that we will study in this section will be special cases of the following general problem.
APPROXIMATION PROBLEM
Given a function f that is continuous on an interval [a, b], find the “best possible approximation” to f
using only functions from a specified subspace W of C[a, b].
Measurements of Error
To solve approximation problems of the preceding types, we first need to make the phrase “best
approximation over [a, b]” mathematically precise. To do this we will need some way of quantifying the
error that results when one continuous function is approximated by another over an interval [a, b]. If we
were to approximate f(x) by g(x), and if we were concerned only with the error in that approximation at a
single point x₀, then it would be natural to define the error to be
error = |f(x₀) − g(x₀)|
sometimes called the deviation between f and g at x₀ (Figure 6.6.1). However, we are not concerned simply
with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The
problem is that an approximation may have small deviations in one part of the interval and large deviations in
another. One possible way of accounting for this is to integrate the deviation |f(x) − g(x)| over the interval
and define the error over the interval to be
error = ∫ₐᵇ |f(x) − g(x)| dx    (1)
Geometrically, 1 is the area between the graphs of f(x) and g(x) over the interval [a, b] (Figure 6.6.2); the
greater the area, the greater the overall error.
Figure 6.6.2 The area between the graphs of f and g over [a, b] measures the error in approximating f
by g over [a, b]
Although 1 is natural and appealing geometrically, most mathematicians and scientists generally favor the
following alternative measure of error, called the mean square error:
mean square error = ∫ₐᵇ [f(x) − g(x)]² dx
Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage
that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous
function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose
that C[a, b] is given the inner product
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
It follows that
‖f − g‖² = ⟨f − g, f − g⟩ = ∫ₐᵇ [f(x) − g(x)]² dx = mean square error
so minimizing the mean square error is the same as minimizing ‖f − g‖². Thus the approximation problem
posed informally at the beginning of this section can be restated more precisely as follows.
LEAST SQUARES APPROXIMATION PROBLEM
Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes the mean square error
‖f − g‖² = ∫ₐᵇ [f(x) − g(x)]² dx
Since ‖f − g‖ and ‖f − g‖² are minimized by the same function g, this problem is equivalent to looking for a
function g in W that is closest to f. But we know from Theorem 6.4.1 that g = proj_W f is such a function
(Figure 6.6.3).
Figure 6.6.3
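For a concrete feel for the mean square error, the sketch below approximates ∫ₐᵇ [f(x) − g(x)]² dx numerically on [0, 2π]; both f and g are arbitrary illustrative choices, not functions taken from the text.

```python
import numpy as np

a, b = 0.0, 2.0 * np.pi
f = lambda x: x                           # function to approximate (illustrative choice)
g = lambda x: np.pi - 2.0 * np.sin(x)     # a candidate approximation (illustrative choice)

x = np.linspace(a, b, 20001)
mse = np.trapz((f(x) - g(x))**2, x)       # mean square error = integral of [f(x) - g(x)]^2
print(mse)
```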
THEOREM 6.6.1
If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that
minimizes the mean square error
∫ₐᵇ [f(x) − g(x)]² dx
is g = proj_W f, where the orthogonal projection is relative to the inner product
⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx
The function g = proj_W f is called the least squares approximation to f from W.

Fourier Series
A function of the form
c₀ + c₁ cos x + c₂ cos 2x + ⋯ + cₙ cos nx + d₁ sin x + d₂ sin 2x + ⋯ + dₙ sin nx    (2)
is called a trigonometric polynomial; if cₙ and dₙ are not both zero, then the polynomial is said to have order n. For
example,
2 + cos x − 3 cos 2x + 7 sin 4x
is a trigonometric polynomial of order 4, since c₄ and d₄ are not both zero.
It is evident from 2 that the trigonometric polynomials of order n or less are the various possible linear
combinations of
1, cos x, cos 2x, …, cos nx, sin x, sin 2x, …, sin nx    (3)
It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a
(2n + 1)-dimensional subspace of C[0, 2π].
Let us now consider the problem of finding the least squares approximation of a continuous function f(x)
over the interval [0, 2π] by a trigonometric polynomial of order n or less, that is, by a function from the
subspace W of C[0, 2π] spanned by the functions in 3. As noted above, the least squares
approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must
find an orthonormal basis g₀, g₁, …, g₂ₙ for W, after which we can compute the orthogonal projection on W
from the formula
proj_W f = ⟨f, g₀⟩g₀ + ⟨f, g₁⟩g₁ + ⋯ + ⟨f, g₂ₙ⟩g₂ₙ    (4)
(see Theorem 6.3.4b). An orthonormal basis for W can be obtained by applying the Gram–Schmidt process to
the basis vectors in 3 using the inner product
⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx
This yields the orthonormal basis
g₀ = 1/√(2π),  g₁ = (1/√π) cos x, …, gₙ = (1/√π) cos nx,  gₙ₊₁ = (1/√π) sin x, …, g₂ₙ = (1/√π) sin nx    (5)
If we introduce the notation
a₀ = (2/√(2π))⟨f, g₀⟩,  a₁ = (1/√π)⟨f, g₁⟩, …, aₙ = (1/√π)⟨f, gₙ⟩,  b₁ = (1/√π)⟨f, gₙ₊₁⟩, …, bₙ = (1/√π)⟨f, g₂ₙ⟩    (6)
then on substituting 5 into 4, we obtain
proj_W f = a₀/2 + [a₁ cos x + ⋯ + aₙ cos nx] + [b₁ sin x + ⋯ + bₙ sin nx]    (7)
where
a₀ = (2/√(2π))⟨f, g₀⟩ = (1/π)∫₀^{2π} f(x) dx
aₖ = (1/√π)⟨f, gₖ⟩ = (1/π)∫₀^{2π} f(x) cos kx dx,  k = 1, 2, …, n
bₖ = (1/√π)⟨f, gₙ₊ₖ⟩ = (1/π)∫₀^{2π} f(x) sin kx dx,  k = 1, 2, …, n
In short,
a₀ = (1/π)∫₀^{2π} f(x) dx,  aₖ = (1/π)∫₀^{2π} f(x) cos kx dx,  bₖ = (1/π)∫₀^{2π} f(x) sin kx dx    (8)
The numbers a₀, a₁, …, aₙ, b₁, …, bₙ are called the Fourier coefficients of f.
Solution
(a)
(9a)
(9c)
The graphs of and some of these approximations are shown in Figure 6.6.4.
Figure 6.6.4
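The coefficient formulas in 8 can be evaluated numerically for any continuous f. As an illustration (the choice f(x) = x is an assumption made here purely for demonstration), the sketch below approximates the Fourier coefficients by the trapezoid rule; for f(x) = x on [0, 2π] the exact values are a₀ = 2π, aₖ = 0, and bₖ = −2/k.

```python
import numpy as np

def fourier_coefficients(f, n, num=20001):
    """Approximate a_0, a_k, b_k of Formula 8 on [0, 2*pi] with the trapezoid rule."""
    x = np.linspace(0.0, 2.0 * np.pi, num)
    a0 = np.trapz(f(x), x) / np.pi
    a = [np.trapz(f(x) * np.cos(k * x), x) / np.pi for k in range(1, n + 1)]
    b = [np.trapz(f(x) * np.sin(k * x), x) / np.pi for k in range(1, n + 1)]
    return a0, a, b

f = lambda x: x                        # illustrative choice of f
a0, a, b = fourier_coefficients(f, n=3)
print(round(a0, 4))                    # ~ 6.2832 (that is, 2*pi)
print([round(v, 4) for v in a])        # ~ [0, 0, 0]
print([round(v, 4) for v in b])        # ~ [-2, -1, -0.6667]
```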
It is natural to expect that the mean square error will diminish as the number of terms in the
least squares approximation
proj_W f = a₀/2 + Σₖ₌₁ⁿ (aₖ cos kx + bₖ sin kx)
increases. It can be proved that for functions f in C[0, 2π], the mean square error
approaches zero as n → +∞; this is denoted by writing
f(x) = a₀/2 + Σₖ₌₁^∞ (aₖ cos kx + bₖ sin kx)
The right side of this equation is called the Fourier series for f over the interval [0, 2π].
Such series are of major importance in engineering, science, and mathematics.
Historical Note Fourier was a French mathematician and physicist who discovered
the Fourier series and related ideas while working on problems of heat diffusion. This
discovery was one of the most influential in the history of mathematics; it is the
cornerstone of many fields of mathematical research and a basic tool in many branches
of engineering. Fourier, a political activist during the French revolution, spent time in
jail for his defense of many victims during the Terror. He later became a favorite of
Napoleon and was named a baron.
[Image: The Granger Collection, New York]
Concept Review
• Approximation of functions
• Mean square error
• Least squares approximation
• Trigonometric polynomial
• Fourier coefficients
• Fourier series
Skills
• Find the least squares approximation of a function.
• Find the mean square error of the least squares approximation of a function.
• Compute the Fourier series of a function.
Exercise Set 6.6
1. Find the least squares approximation of over the interval by
(a) a trigonometric polynomial of order 2 or less.
(b) a trigonometric polynomial of order n or less.
Answer:
(a)
(b)
3. (a) Find the least squares approximation of x over the interval by a function of the form .
(b) Find the mean square error of the approximation.
Answer:
(a)
(b)
4. (a) Find the least squares approximation of over the interval by a polynomial of the form
.
(b) Find the mean square error of the approximation.
5. (a) Find the least squares approximation of over the interval [−1, 1] by a polynomial of the form
.
(b) Find the mean square error of the approximation.
Answer:
(a)
(b)
6. Use the Gram–Schmidt process to obtain the orthonormal basis 5 from the basis 3.
7. Carry out the integrations indicated in Formulas 9a, 9b, and 9c.
8. Find the Fourier series of over the interval .
9. Find the Fourier series of and , over the interval .
Answer:
True-False Exercises
In parts (a)–(e) determine whether the statement is true or false, and justify your answer.
(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the
area between the graphs of f(x) and g(x) over the interval [a, b].
Answer:
False
(b) Given a finite-dimensional subspace W of C[a, b], the function g = proj_W f minimizes the mean square
error.
Answer:
True
(c) {1, cos x, sin x, cos 2x, sin 2x, …, cos nx, sin nx} is an orthogonal subset of the vector space C[0, 2π] with respect to the
inner product ⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx.
Answer:
True
(d) {1, cos x, sin x, cos 2x, sin 2x, …, cos nx, sin nx} is an orthonormal subset of the vector space C[0, 2π] with respect to the
inner product ⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx.
Answer:
False
(e) {1, cos x, sin x, cos 2x, sin 2x, …, cos nx, sin nx} is a linearly independent subset of C[0, 2π].
Answer:
True
Chapter 6 Supplementary Exercises
1. Let have the Euclidean inner product.
(a) Find a vector in that is orthogonal to and and makes equal
angles with and .
(b) Find a vector of length 1 that is orthogonal to and above and such that the
cosine of the angle between x and is twice the cosine of the angle between x and .
Answer:
(a) with
(b)
Answer:
(a) The subspace of all matrices in with only zeros on the diagonal.
(b) The subspace of all skew-symmetric matrices in .
x is a solution of this system if and only if the vector x is orthogonal to every row vector
of A with respect to the Euclidean inner product on Rⁿ.
5. Use the Cauchy–Schwarz inequality to show that if a₁, a₂, …, aₙ are positive real numbers, then
(a₁ + a₂ + ⋯ + aₙ)(1/a₁ + 1/a₂ + ⋯ + 1/aₙ) ≥ n²
6. Show that if x and y are vectors in an inner product space and c is any scalar, then
7. Let have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of
the vectors , , and .
Answer:
Answer:
No
10. If u and v are vectors in an inner product space V, then u, v, and u − v can be regarded as sides of a
“triangle” in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle;
that is,
‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖‖v‖ cos θ
where θ is the angle between u and v.
Figure Ex-10
11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), and (0, 0, k) form the edges of a cube in
R³ with diagonal (k, k, k). Similarly, the vectors
(k, 0, 0, …, 0), (0, k, 0, …, 0), …, (0, 0, 0, …, k)
can be regarded as edges of a “cube” in Rⁿ with diagonal (k, k, k, …, k). Show that each of the above
edges makes an angle of θ with the diagonal, where cos θ = 1/√n.
(b) (Calculus required) What happens to the angle θ in part (a) as the dimension of Rⁿ approaches ∞?
Answer:
(b) θ approaches π/2
13. Let u be a vector in an inner product space V, and let {v₁, v₂, …, vₙ} be an orthonormal basis for V.
Show that if αᵢ is the angle between u and vᵢ, then
cos²α₁ + cos²α₂ + ⋯ + cos²αₙ = 1
14. Prove: If ⟨u, v⟩₁ and ⟨u, v⟩₂ are two inner products on a vector space V, then the quantity
⟨u, v⟩ = ⟨u, v⟩₁ + ⟨u, v⟩₂ is also an inner product.
15. Prove Theorem 6.2.5.
16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then
the least squares solution of Ax = b is x = 0.
17. Is there any value of s for which x₁ = 1 and x₂ = 2 is the least squares solution of the following linear
system?
Answer:
No
18. Show that if p and q are distinct positive integers, then the functions f(x) = cos px and g(x) = cos qx are
orthogonal with respect to the inner product
⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx
19. Show that if p and q are positive integers, then the functions f(x) = cos px and g(x) = sin qx are
orthogonal with respect to the inner product
⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx