Learn to turn a best-fit problem into a least-squares problem.

Recipe: find a least-squares solution (two ways).

Picture: geometry of a least-squares solution.

Vocabulary words:least-squares solution.

In this section, we answer the following important question:

Suppose that does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

Subsection6.5.1Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation

Definition

Let be an matrix and let be a vector in A least-squares solution of the matrix equation is a vector in such that

for all other vectors in

Recall that is the distance between the vectors and The term “least squares” comes from the fact that is the square root of the sum of the squares of the entries of the vector So a least-squares solution minimizes the sum of the squares of the differences between the entries of and In other words, a least-squares solution solves the equation as closely as possible, in the sense that the sum of the squares of the difference is minimized.

Least Squares: Picture

Suppose that the equation is inconsistent. Recall from this note in Section 2.3 that the column space of is the set of all other vectors such that is consistent. In other words, is the set of all vectors of the form Hence, the closest vector of the form to is the orthogonal projection of onto This is denoted following this notation in Section 6.3.

A least-squares solution of is a solution of the consistent equation

Note

If is consistent, then so that a least-squares solution is the same as a usual solution.

Where is in this picture? If are the columns of then

Hence the entries of are the “coordinates” of with respect to the spanning set of (They are honest -coordinates if the columns of are linearly independent.)

Note

If is consistent, then so that a least-squares solution is the same as a usual solution.

We learned to solve this kind of orthogonal projection problem in Section 6.3.

Theorem

Let be an matrix and let be a vector in The least-squares solutions of are the solutions of the matrix equation

By this theorem in Section 6.3, if is a solution of the matrix equation then is equal to We argued above that a least-squares solution of is a solution of

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

Recipe 1: Compute a least-squares solution

Let be an matrix and let be a vector in Here is a method for computing a least-squares solution of

Compute the matrix and the vector

Form the augmented matrix for the matrix equation and row reduce.

This equation is always consistent, and any solution is a least-squares solution.

To reiterate: once you have found a least-squares solution of then is equal to

The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of are linearly dependent, then has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of this corollary in Section 6.3.

Theorem

Let be an matrix and let be a vector in The following are equivalent:

The set of least-squares solutions of is the solution set of the consistent equation which is a translate of the solution set of the homogeneous equation Since is a square matrix, the equivalence of 1 and 3 follows from the invertible matrix theorem in Section 5.1. The set of least squares-solutions is also the solution set of the consistent equation which has a unique solution if and only if the columns of are linearly independent by this important note in Section 2.5.

As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if is an matrix with orthogonal columns then we can use the projection formula in Section 6.4 to write

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

Example(Best-fit line)

Suppose that we have measured three data points

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?

The general equation for a (non-vertical) line is

If our three data points were to lie on this line, then the following equations would be satisfied:

(6.5.1)

In order to find the best-fit line, we try to solve the above equations in the unknowns and As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve for

We solved this least-squares problem in this example: the only least-squares solution to is so the best-fit line is

What exactly is the line minimizing? The least-squares solution minimizes the sum of the squares of the entries of the vector The vector is the left-hand side of (6.5.1), and

In other words, is the vector whose entries are the -coordinates of the graph of the line at the values of we specified in our data points, and is the vector whose entries are the -coordinates of those data points. The difference is the vertical distance of the graph from the data points:

The best-fit line minimizes the sum of the squares of these vertical distances.

All of the above examples have the following form: some number of data points are specified, and we want to find a function

that best approximates these points, where are fixed functions of Indeed, in the best-fit line example we had and in the best-fit parabola example we had and and in the best-fit linear function example we had and (in this example we take to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns —once we evaluate the they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of to our original data points.

To emphasize that the nature of the functions really is irrelevant, consider the following example.

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.