# Section 6.5 The Method of Least Squares

##### Objectives
1. Learn examples of best-fit problems.
2. Learn to turn a best-fit problem into a least-squares problem.
3. Recipe: find a least-squares solution (two ways).
4. Picture: geometry of a least-squares solution.
5. Vocabulary words: least-squares solution.

In this section, we answer the following important question:

Suppose that $Ax = b$ does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

# Subsection 6.5.1 Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation $Ax = b$.

##### Definition

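Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. A least-squares solution of the matrix equation $Ax = b$ is a vector $\hat{x}$ in $\mathbb{R}^n$ such that

$$\operatorname{dist}(b, A\hat{x}) \le \operatorname{dist}(b, Ax)$$

for all other vectors $x$ in $\mathbb{R}^n$.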

Recall that $\operatorname{dist}(v, w) = \|v - w\|$ is the distance between the vectors $v$ and $w$. The term “least squares” comes from the fact that $\operatorname{dist}(b, A\hat{x}) = \|b - A\hat{x}\|$ is the square root of the sum of the squares of the entries of the vector $b - A\hat{x}$. So a least-squares solution minimizes the sum of the squares of the differences between the entries of $A\hat{x}$ and $b$. In other words, a least-squares solution solves the equation $Ax = b$ as closely as possible, in the sense that the sum of the squares of the entries of the difference $b - Ax$ is minimized.
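The minimizing property in the definition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `A`, the vector `b`, and the helper `squared_error` are hypothetical choices made for illustration. It obtains a minimizer from `np.linalg.lstsq` and compares its squared error against that of many randomly chosen vectors $x$.

```python
import numpy as np

def squared_error(A, b, x):
    """Sum of the squares of the entries of b - Ax, i.e. ||b - Ax||^2."""
    r = b - A @ x
    return float(r @ r)

# Hypothetical inconsistent system: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# np.linalg.lstsq returns a vector minimizing ||b - Ax||.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
best = squared_error(A, b, x_hat)

# No randomly chosen competitor should do better than x_hat.
rng = np.random.default_rng(0)
trials = rng.normal(size=(1000, 2))
smallest_gap = min(squared_error(A, b, x) - best for x in trials)

print(x_hat, best, smallest_gap)
```

For these inputs the minimizer is $(1/3, 1/3)$ with squared error $4/3$, and every random competitor has a squared error at least as large.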

##### Least Squares: Picture

Suppose that the equation $Ax = b$ is inconsistent. Recall from a note in Section 2.3 that the column space of $A$ is the set of all vectors $c$ such that $Ax = c$ is consistent. In other words, $\operatorname{Col}(A)$ is the set of all vectors of the form $Ax$. Hence, the closest vector of the form $Ax$ to $b$ is the orthogonal projection of $b$ onto $\operatorname{Col}(A)$. This is denoted $b_{\operatorname{Col}(A)}$, following the notation of Section 6.3.

A least-squares solution of $Ax = b$ is a solution $\hat{x}$ of the consistent equation $Ax = b_{\operatorname{Col}(A)}$.

##### Note

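If $Ax = b$ is consistent, then $b_{\operatorname{Col}(A)} = b$, so that a least-squares solution is the same as a usual solution.

Where is $\hat{x}$ in this picture? If $v_1, v_2, \ldots, v_n$ are the columns of $A$, then

$$A\hat{x} = \hat{x}_1 v_1 + \hat{x}_2 v_2 + \cdots + \hat{x}_n v_n.$$

Hence the entries of $\hat{x}$ are the “coordinates” of $b_{\operatorname{Col}(A)}$ with respect to the spanning set $\{v_1, v_2, \ldots, v_n\}$ of $\operatorname{Col}(A)$. (They are honest $\mathcal{B}$-coordinates, for the basis $\mathcal{B} = \{v_1, v_2, \ldots, v_n\}$, if the columns of $A$ are linearly independent.)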

We learned to solve this kind of orthogonal projection problem in Section 6.3.

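##### Theorem

Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. The least-squares solutions of $Ax = b$ are the solutions of the matrix equation

$$A^TAx = A^Tb.$$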

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

##### Recipe 1: Compute a least-squares solution

Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. Here is a method for computing a least-squares solution of $Ax = b$:

1. Compute the matrix $A^TA$ and the vector $A^Tb$.
2. Form the augmented matrix for the matrix equation $A^TAx = A^Tb$, and row reduce.
3. This equation is always consistent, and any solution $\hat{x}$ is a least-squares solution.

To reiterate: once you have found a least-squares solution $\hat{x}$ of $Ax = b$, then $b_{\operatorname{Col}(A)}$ is equal to $A\hat{x}$.
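The recipe can be sketched in a few lines of Python, assuming NumPy is available. The function name `least_squares_normal_equations` and the sample data are hypothetical. Note one substitution: instead of row reducing an augmented matrix, the sketch calls `np.linalg.solve` on $A^TAx = A^Tb$, which works only when $A^TA$ is invertible.

```python
import numpy as np

def least_squares_normal_equations(A, b):
    """Solve A^T A x = A^T b.

    Assumes A^T A is invertible; np.linalg.solve raises LinAlgError otherwise.
    """
    AtA = A.T @ A          # step 1: the matrix A^T A
    Atb = A.T @ b          # step 1: the vector A^T b
    return np.linalg.solve(AtA, Atb)   # stands in for row reduction (step 2)

# Hypothetical inconsistent system.
A = np.array([[ 2.0, 0.0],
              [-1.0, 1.0],
              [ 0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

x_hat = least_squares_normal_equations(A, b)
projection = A @ x_hat           # the projection of b onto Col(A)
residual = b - projection        # orthogonal to every column of A

print(x_hat)             # approximately [ 0.3333, -0.3333]
print(A.T @ residual)    # approximately [0, 0]
```

Here $A^TA = \begin{pmatrix}5 & -1\\ -1 & 5\end{pmatrix}$ and $A^Tb = (2, -2)$, so $\hat{x} = (1/3, -1/3)$ and $A\hat{x} = (2/3, -2/3, -2/3)$.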

The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of $A$ are linearly dependent, then $Ax = b_{\operatorname{Col}(A)}$ has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of this corollary in Section 6.3.

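##### Theorem

Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. The following are equivalent:

1. $Ax = b$ has a unique least-squares solution.
2. The columns of $A$ are linearly independent.
3. $A^TA$ is invertible.

In this case, the least-squares solution is $\hat{x} = (A^TA)^{-1}A^Tb$.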
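The failure of uniqueness for linearly dependent columns can be seen in a small computation. This is a sketch under stated assumptions: NumPy is available, and the matrix `A` (whose second column is twice its first), the vector `b`, and the null-space vector `null_vec` are hypothetical. It checks that two different vectors both satisfy $A^TAx = A^Tb$ and give the same vector $Ax$.

```python
import numpy as np

# Hypothetical matrix with linearly dependent columns (column 2 = 2 * column 1).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 0.0])

# lstsq still returns one least-squares solution (the one of smallest norm)
# and reports the numerical rank of A.
x_hat, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# A @ null_vec is the zero vector, so adding multiples of it changes nothing.
null_vec = np.array([2.0, -1.0])
x_other = x_hat + 5.0 * null_vec

both_solve = (np.allclose(A.T @ A @ x_hat, A.T @ b)
              and np.allclose(A.T @ A @ x_other, A.T @ b))
same_projection = np.allclose(A @ x_hat, A @ x_other)

print(rank, both_solve, same_projection)
```

Any other multiple of `null_vec` would work equally well, which is why there are infinitely many least-squares solutions in this situation.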

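As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if $A$ is an $m \times n$ matrix with orthogonal columns $u_1, u_2, \ldots, u_n$, then we can use the projection formula in Section 6.4 to write

$$b_{\operatorname{Col}(A)} = \frac{b \cdot u_1}{u_1 \cdot u_1}\,u_1 + \frac{b \cdot u_2}{u_2 \cdot u_2}\,u_2 + \cdots + \frac{b \cdot u_n}{u_n \cdot u_n}\,u_n = A\begin{pmatrix} (b \cdot u_1)/(u_1 \cdot u_1) \\ (b \cdot u_2)/(u_2 \cdot u_2) \\ \vdots \\ (b \cdot u_n)/(u_n \cdot u_n) \end{pmatrix}.$$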

Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent.

##### Recipe 2: Compute a least-squares solution

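Let $A$ be an $m \times n$ matrix with orthogonal columns $u_1, u_2, \ldots, u_n$, and let $b$ be a vector in $\mathbb{R}^m$. Then the least-squares solution of $Ax = b$ is the vector

$$\hat{x} = \left(\frac{b \cdot u_1}{u_1 \cdot u_1},\; \frac{b \cdot u_2}{u_2 \cdot u_2},\; \ldots,\; \frac{b \cdot u_n}{u_n \cdot u_n}\right).$$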
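When the columns are orthogonal, no linear system needs to be solved: each entry of the solution is a ratio of two dot products. A minimal sketch follows, assuming NumPy is available and assuming the columns of `A` are nonzero and mutually orthogonal. The function name and the sample data are hypothetical, and the result is compared against `np.linalg.lstsq` as a sanity check.

```python
import numpy as np

def least_squares_orthogonal_columns(A, b):
    """Entry i of the result is (b . u_i) / (u_i . u_i), where u_i is column i of A.

    Assumes the columns of A are nonzero and mutually orthogonal;
    this is NOT checked here.
    """
    b_dot_u = A.T @ b                  # the dot products b . u_i
    u_dot_u = np.sum(A * A, axis=0)    # the dot products u_i . u_i
    return b_dot_u / u_dot_u

# Hypothetical matrix whose columns (1,1,0) and (1,-1,2) are orthogonal.
A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.0,  2.0]])
b = np.array([2.0, 0.0, 3.0])

x_hat = least_squares_orthogonal_columns(A, b)
x_check, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat)      # approximately [1.0, 1.3333]
print(x_check)    # agrees with x_hat up to rounding
```

Here $b \cdot u_1 = 2$, $u_1 \cdot u_1 = 2$, $b \cdot u_2 = 8$, and $u_2 \cdot u_2 = 6$, so $\hat{x} = (1, 4/3)$.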

This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.

# Subsection 6.5.2 Best-Fit Problems

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

##### Example (Best-fit line)

Suppose that we have measured three data points

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?

The general equation for a (non-vertical) line is $y = Mx + B$, where $M$ is the slope and $B$ is the $y$-intercept.

If our three data points were to lie on this line, then the following equations would be satisfied:

(6.5.1)

In order to find the best-fit line, we try to solve the above equations in the unknown slope $M$ and unknown intercept $B$. As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve for

We solved this least-squares problem in this example: the only least-squares solution to is so the best-fit line is

What exactly is the best-fit line minimizing? The least-squares solution $\hat{x}$ minimizes the sum of the squares of the entries of the vector $b - A\hat{x}$. The vector $b$ is the left-hand side of (6.5.1), and

In other words, $A\hat{x}$ is the vector whose entries are the $y$-coordinates of the graph of the line at the values of $x$ we specified in our data points, and $b$ is the vector whose entries are the $y$-coordinates of those data points. The difference $b - A\hat{x}$ is the vertical distance of the graph from the data points:

The best-fit line minimizes the sum of the squares of these vertical distances.
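The best-fit line computation can be sketched as follows, assuming NumPy is available. The three data points below are hypothetical values chosen for illustration, and the names `M` (slope) and `B` (intercept) are likewise assumptions of this sketch. Each data point contributes one row $(x_i, 1)$ to the matrix and one entry $y_i$ to the right-hand side.

```python
import numpy as np

# Hypothetical data points (x_i, y_i); they do not lie on a single line.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([6.0, 0.0, 0.0])

# Each equation M * x_i + B = y_i becomes one row [x_i, 1] of A.
A = np.column_stack([xs, np.ones_like(xs)])

coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
M, B = coeffs

# Vertical distances from the data points to the fitted line.
vertical = ys - (M * xs + B)
total = float(vertical @ vertical)

print(M, B)        # approximately -3.0 and 5.0
print(vertical)    # approximately [ 1., -2.,  1.]
print(total)       # approximately 6.0
```

For these points the fitted line is $y = -3x + 5$, and no other line gives a sum of squared vertical distances smaller than $6$.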

All of the above examples have the following form: some number of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are specified, and we want to find a function

$$y = B_1 g_1(x) + B_2 g_2(x) + \cdots + B_m g_m(x)$$

that best approximates these points, where $g_1, g_2, \ldots, g_m$ are fixed functions of $x$. Indeed, in the best-fit line example we had $g_1(x) = x$ and $g_2(x) = 1$; in the best-fit parabola example we had $g_1(x) = x^2$, $g_2(x) = x$, and $g_3(x) = 1$; and in the best-fit linear function example we had $g_1(x_1, x_2) = x_1$, $g_2(x_1, x_2) = x_2$, and $g_3(x_1, x_2) = 1$ (in this example we take $x$ to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns $B_1, B_2, \ldots, B_m$ (once we evaluate the $g_i$, they just become numbers, so it does not matter what they are), and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of $y = f(x)$ to our original data points.
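This general setup translates directly into code: evaluate each fixed function on the data to build one column of the matrix, then solve the least-squares problem for the coefficients. The sketch below assumes NumPy is available. The helper `best_fit` and all data are hypothetical, and the data are generated to lie exactly on a known curve so that the recovered coefficients can be checked.

```python
import numpy as np

def best_fit(basis, xs, ys):
    """Least-squares coefficients B_1, ..., B_m for y = sum_j B_j * g_j(x).

    `basis` is a list of functions g_j; column j of the matrix is g_j
    evaluated at the data x-values, so the g_j enter only as numbers.
    """
    A = np.column_stack([g(xs) for g in basis])
    coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coeffs

# Parabola: g_1(x) = x^2, g_2(x) = x, g_3(x) = 1.
parabola = [lambda x: x**2, lambda x: x, lambda x: np.ones_like(x)]
xs = np.array([-1.0, 0.0, 1.0, 2.0])
ys = 2 * xs**2 - xs + 3             # hypothetical data exactly on a parabola
parabola_coeffs = best_fit(parabola, xs, ys)

# The same helper works unchanged for non-polynomial functions.
trig = [np.cos, np.sin]
ts = np.linspace(0.0, 3.0, 7)
ws = 1.5 * np.cos(ts) - 0.5 * np.sin(ts)
trig_coeffs = best_fit(trig, ts, ws)

print(parabola_coeffs)   # approximately [ 2., -1.,  3.]
print(trig_coeffs)       # approximately [ 1.5, -0.5]
```

Because both data sets lie exactly on a curve of the assumed form, the systems are consistent and the least-squares solution coincides with the usual solution. With noisy data the same call would return the best approximation instead.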

To emphasize that the nature of the functions $g_i$ really is irrelevant, consider the following example.

The next example has a somewhat different flavor from the previous ones.

##### Note

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.