Objectives
- Learn examples of best-fit problems.
- Learn to turn a best-fit problem into a least-squares problem.
- Recipe: find a least-squares solution.
- Picture: geometry of a least-squares solution.
- Vocabulary words: least-squares solution.
In this section, we answer the following important question:
Suppose that does not have a solution. What is the best approximate solution?
For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.
We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation
Let be an matrix and let be a vector in A least-squares solution of the matrix equation is a vector in such that
for all other vectors in
Recall that is the distance between the vectors and The term “least squares” comes from the fact that is the square root of the sum of the squares of the entries of the vector So a least-squares solution minimizes the sum of the squares of the differences between the entries of and In other words, a least-squares solution solves the equation as closely as possible, in the sense that the sum of the squares of the difference is minimized.
Suppose that the equation is inconsistent. Recall from this note in Section 2.3 that the column space of is the set of all other vectors such that is consistent. In other words, is the set of all vectors of the form Hence, the closest vector of the form to is the orthogonal projection of onto This is denoted following this notation in Section 6.3.
A least-squares solution of is a solution of the consistent equation
If is consistent, then so that a least-squares solution is the same as a usual solution.
Where is in this picture? If are the columns of then
Hence the entries of are the “coordinates” of with respect to the spanning set of
If is consistent, then so that a least-squares solution is the same as a usual solution.
We learned to solve this kind of orthogonal projection problem in Section 6.3.
Let be an matrix and let be a vector in The least-squares solutions of are the solutions of the matrix equation
In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:
Let be an matrix and let be a vector in Here is a method for computing a least-squares solution of
To reiterate: once you have found a least-squares solution of then is equal to
The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of are linearly dependent, then has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of this corollary in Section 6.3.
Let be an matrix and let be a vector in The following are equivalent:
In this case, the least-squares solution is
In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.
Suppose that we have measured three data points
and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?
The general equation for a (non-vertical) line is
If our three data points were to lie on this line, then the following equations would be satisfied:
In order to find the best-fit line, we try to solve the above equations in the unknowns and As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.
Putting our linear equations into matrix form, we are trying to solve for
We solved this least-squares problem in this example: the only least-squares solution to is so the best-fit line is
What exactly is the line minimizing? The least-squares solution minimizes the sum of the squares of the entries of the vector The vector is the left-hand side of (6.5.1), and
In other words, is the vector whose entries are the -coordinates of the graph of the line at the values of we specified in our data points, and is the vector whose entries are the -coordinates of those data points. The difference is the vertical distance of the graph from the data points:
The best-fit line minimizes the sum of the squares of these vertical distances.
All of the above examples have the following form: some number of data points are specified, and we want to find a function
that best approximates these points, where are fixed functions of Indeed, in the best-fit line example we had and in the best-fit parabola example we had and and in the best-fit linear function example we had and (in this example we take to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns —once we evaluate the they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of to our original data points.
To emphasize that the nature of the functions really is irrelevant, consider the following example.
The next example has a somewhat different flavor from the previous ones.
Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.