
Section 6.5 The Method of Least Squares

Objectives
  1. Learn examples of best-fit problems.
  2. Learn to turn a best-fit problem into a least-squares problem.
  3. Recipe: find a least-squares solution.
  4. Picture: geometry of a least-squares solution.
  5. Vocabulary words: least-squares solution.

In this section, we answer the following important question:

Suppose that Ax = b does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

Subsection 6.5.1 Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation Ax = b .

Definition

Let A be an m × n matrix and let b be a vector in R^m. A least-squares solution of the matrix equation Ax = b is a vector x̂ in R^n such that

dist(b, Ax̂) ≤ dist(b, Ax)

for all other vectors x in R^n.

Recall that dist(v, w) = ‖v − w‖ is the distance between the vectors v and w. The term “least squares” comes from the fact that dist(b, Ax̂) = ‖b − Ax̂‖ is the square root of the sum of the squares of the entries of the vector b − Ax̂. So a least-squares solution minimizes the sum of the squares of the differences between the entries of Ax̂ and b. In other words, a least-squares solution solves the equation Ax = b as closely as possible, in the sense that the sum of the squares of the difference b − Ax is minimized.
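As a concrete illustration (a minimal NumPy sketch with made-up data, not part of the text), the following code computes the sum of the squares of the entries of b − Ax for two candidate vectors x; the smaller value corresponds to the better approximate solution in the least-squares sense.

```python
# Minimal sketch with made-up data: compare two candidate solutions x
# by the sum of the squares of the entries of b - Ax.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 3.0])

def sum_of_squares(x):
    """Sum of the squares of the entries of b - Ax."""
    residual = b - A @ x
    return float(np.sum(residual**2))

x1 = np.array([0.5, 1.0])
x2 = np.array([0.0, 2.0])
print(sum_of_squares(x1), sum_of_squares(x2))

# dist(b, Ax) is the square root of this quantity:
print(np.sqrt(sum_of_squares(x1)), np.linalg.norm(b - A @ x1))
```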

Least Squares: Picture

Suppose that the equation Ax = b is inconsistent. Recall from this note in Section 2.3 that the column space of A is the set of all vectors c such that Ax = c is consistent. In other words, Col(A) is the set of all vectors of the form Ax. Hence, the closest vector of the form Ax to b is the orthogonal projection of b onto Col(A). This is denoted b_Col(A), following this notation in Section 6.3.

(Figure: the vector b, its orthogonal projection Ax̂ = b_Col(A) onto Col(A), and the difference b − Ax̂; other vectors of the form Ax lie farther from b.)

A least-squares solution of Ax = b is a solution x̂ of the consistent equation Ax = b_Col(A).

Note

If Ax = b is consistent, then b_Col(A) = b, so that a least-squares solution is the same as a usual solution.

Where is x̂ in this picture? If v_1, v_2, ..., v_n are the columns of A, then

A\hat{x} = A\begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_n \end{pmatrix} = \hat{x}_1 v_1 + \hat{x}_2 v_2 + \cdots + \hat{x}_n v_n.

Hence the entries of x̂ are the “coordinates” of b_Col(A) with respect to the spanning set { v_1, v_2, ..., v_n } of Col(A).

Figure 4: The violet plane is Col(A). The closest that Ax can get to b is the closest vector on Col(A) to b, which is the orthogonal projection b_Col(A) (in blue). The vectors v_1, v_2 are the columns of A, and the coefficients of x̂ are the lengths of the green lines.
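To connect this picture to a computation (a NumPy sketch with made-up data, not part of the text): if x̂ is a least-squares solution, then Ax̂ is the projection b_Col(A), so the residual b − Ax̂ should be orthogonal to every column of A. The check below uses np.linalg.lstsq as a black box to produce x̂.

```python
# Sketch: the residual b - A @ xhat is orthogonal to the columns of A,
# so A @ xhat is the orthogonal projection of b onto Col(A).
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)  # a least-squares solution
residual = b - A @ xhat

# Each entry is the dot product of a column of A with the residual;
# all of them should be numerically zero.
print(A.T @ residual)
```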

We learned to solve this kind of orthogonal projection problem in Section 6.3.

Theorem

Let A be an m × n matrix and let b be a vector in R^m. The least-squares solutions of Ax = b are the solutions of the matrix equation

A^T Ax = A^T b.

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

Recipe: Compute a least-squares solution

Let A be an m × n matrix and let b be a vector in R^m. Here is a method for computing a least-squares solution of Ax = b:

  1. Compute the matrix A^T A and the vector A^T b.
  2. Form the augmented matrix for the matrix equation A^T Ax = A^T b, and row reduce.
  3. This equation is always consistent, and any solution x̂ is a least-squares solution.

To reiterate: once you have found a least-squares solution x̂ of Ax = b, then b_Col(A) is equal to Ax̂.
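As an illustration of this recipe (a NumPy sketch, not part of the original text), the code below forms A^T A and A^T b and solves the normal equations for the data of the best-fit line example later in this section.

```python
# Sketch of the recipe: compute A^T A and A^T b, then solve A^T A x = A^T b.
import numpy as np

# Data from the best-fit line example below: points (0,6), (1,0), (2,0).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

AtA = A.T @ A   # step 1: the matrix A^T A
Atb = A.T @ b   # step 1: the vector A^T b

# Steps 2-3: solve A^T A x = A^T b.  np.linalg.solve suffices when A^T A is
# invertible; in general one would row reduce the augmented matrix [A^T A | A^T b].
xhat = np.linalg.solve(AtA, Atb)
print(xhat)       # [-3.  5.]

# b_Col(A) is A @ xhat, the orthogonal projection of b onto Col(A).
print(A @ xhat)   # [ 5.  2. -1.]
```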

The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of A are linearly dependent, then Ax = b_Col(A) has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of this corollary in Section 6.3.

Theorem

Let A be an m × n matrix. The following are equivalent:

  1. The equation Ax = b has a unique least-squares solution for every b in R^m.
  2. The columns of A are linearly independent.
  3. The matrix A^T A is invertible.

In this case, the least-squares solution of Ax = b is given by x̂ = (A^T A)^{-1} A^T b.
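As an illustration (a NumPy sketch, not from the text): when the columns of A are linearly independent, A^T A is invertible and the formula above gives the unique least-squares solution; when the columns are dependent, there are infinitely many least-squares solutions, all with the same projection Ax̂.

```python
import numpy as np

# Independent columns: A^T A is invertible, so (A^T A)^{-1} A^T b is the
# unique least-squares solution.
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])
xhat = np.linalg.inv(A.T @ A) @ (A.T @ b)
print(xhat)   # [-3.  5.]

# Dependent columns (third column = first + second): A^T A is singular and
# there are infinitely many least-squares solutions.
A2 = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
x0, *_ = np.linalg.lstsq(A2, b, rcond=None)   # one least-squares solution
shift = np.array([1.0, 1.0, -1.0])            # A2 @ shift = 0
print(np.allclose(A2 @ x0, A2 @ (x0 + shift)))  # True: both are least-squares solutions
```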

Subsection 6.5.2 Best-Fit Problems

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

Example(Best-fit line)

Suppose that we have measured three data points

(0, 6), (1, 0), (2, 0),

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?


The general equation for a (non-vertical) line is

y = Mx + B .

If our three data points were to lie on this line, then the following equations would be satisfied:

6 = M · 0 + B
0 = M · 1 + B
0 = M · 2 + B.     (6.5.1)

In order to find the best-fit line, we try to solve the above equations in the unknowns M and B . As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve Ax = b for

A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{pmatrix}, \qquad x = \begin{pmatrix} M \\ B \end{pmatrix}, \qquad b = \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix}.

We solved this least-squares problem in this example: the only least-squares solution to Ax = b is x̂ = (M, B) = (−3, 5), so the best-fit line is

y = −3x + 5.
(Figure: the three data points and the best-fit line y = −3x + 5.)

What exactly is the line y = f(x) = −3x + 5 minimizing? The least-squares solution x̂ minimizes the sum of the squares of the entries of the vector b − Ax̂. The vector b is the left-hand side of (6.5.1), and

A\begin{pmatrix} -3 \\ 5 \end{pmatrix} = \begin{pmatrix} -3(0) + 5 \\ -3(1) + 5 \\ -3(2) + 5 \end{pmatrix} = \begin{pmatrix} f(0) \\ f(1) \\ f(2) \end{pmatrix}.

In other words, Ax̂ is the vector whose entries are the y-coordinates of the graph of the line at the values of x we specified in our data points, and b is the vector whose entries are the y-coordinates of those data points. The difference b − Ax̂ is the vertical distance of the graph from the data points:

b - A\hat{x} = \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix} - A\begin{pmatrix} -3 \\ 5 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}

(Figure: the line y = −3x + 5 and the vertical distances between the graph and the data points.)

The best-fit line minimizes the sum of the squares of these vertical distances.
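To make this concrete (a NumPy sketch, not part of the text), the code below computes the sum of squared vertical distances 1² + (−2)² + 1² = 6 for the best-fit line and checks that nearby lines do worse.

```python
import numpy as np

# Data points (0,6), (1,0), (2,0).
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([6.0, 0.0, 0.0])

def vertical_sse(M, B):
    """Sum of squared vertical distances from the line y = Mx + B to the data."""
    return float(np.sum((ys - (M * xs + B))**2))

print(vertical_sse(-3.0, 5.0))   # best-fit line: 1^2 + (-2)^2 + 1^2 = 6
print(vertical_sse(-3.0, 4.5))   # 6.75 -- a nearby line does worse
print(vertical_sse(-2.5, 5.0))   # 7.25 -- so does this one
```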

All of the above examples have the following form: some number of data points ( x , y ) are specified, and we want to find a function

y = B_1 g_1(x) + B_2 g_2(x) + ··· + B_m g_m(x)

that best approximates these points, where g_1, g_2, ..., g_m are fixed functions of x. Indeed, in the best-fit line example we had g_1(x) = x and g_2(x) = 1; in the best-fit parabola example we had g_1(x) = x^2, g_2(x) = x, and g_3(x) = 1; and in the best-fit linear function example we had g_1(x_1, x_2) = x_1, g_2(x_1, x_2) = x_2, and g_3(x_1, x_2) = 1 (in this example we take x to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns B_1, B_2, ..., B_m (once we evaluate the g_i, they just become numbers, so it does not matter what they are), and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of y = f(x) to our original data points.
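The following sketch (assuming NumPy; the helper name best_fit_coefficients is illustrative, not from the text) sets up this framework in general: each column of the matrix is one of the functions g_i evaluated at the data points, and a least-squares solution gives the coefficients B_i.

```python
import numpy as np

def best_fit_coefficients(gs, xs, ys):
    """Least-squares coefficients B_1, ..., B_m for y = B_1 g_1(x) + ... + B_m g_m(x).

    gs: list of the fixed functions g_i; xs, ys: the given data points (x, y).
    """
    # Column i of the matrix is g_i evaluated at every data point x.
    A = np.column_stack([np.array([g(x) for x in xs], dtype=float) for g in gs])
    b = np.asarray(ys, dtype=float)
    B, *_ = np.linalg.lstsq(A, b, rcond=None)
    return B

# Best-fit line y = Mx + B through (0,6), (1,0), (2,0): g_1(x) = x, g_2(x) = 1.
print(best_fit_coefficients([lambda x: x, lambda x: 1.0], [0, 1, 2], [6, 0, 0]))
# [-3.  5.]

# Best-fit parabola y = B_1 x^2 + B_2 x + B_3 through four sample points.
print(best_fit_coefficients([lambda x: x**2, lambda x: x, lambda x: 1.0],
                            [0, 1, 2, 3], [1, 0, 2, 9]))
```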

To emphasize that the nature of the functions g_i really is irrelevant, consider the following example.

The next example has a somewhat different flavor from the previous ones.

Note

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.