
Section 7.3 Orthogonal Projection

Objectives
  1. Understand the orthogonal decomposition of a vector with respect to a subspace.
  2. Understand the relationship between orthogonal decomposition and orthogonal projection.
  3. Understand the relationship between orthogonal decomposition and the closest vector on / distance to a subspace.
  4. Learn the basic properties of orthogonal projections as linear transformations and as matrix transformations.
  5. Recipes: orthogonal projection onto a line, orthogonal decomposition by solving a system of equations, orthogonal projection via a complicated matrix product.
  6. Pictures: orthogonal decomposition, orthogonal projection.
  7. Vocabulary words: orthogonal decomposition, orthogonal projection.

Let $W$ be a subspace of $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$. In this section, we will learn to compute the closest vector $x_W$ to $x$ in $W$. The vector $x_W$ is called the orthogonal projection of $x$ onto $W$.

Subsection 7.3.1 Orthogonal Decomposition

We begin by fixing some notation.

Notation

Let $W$ be a subspace of $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$. We denote the closest vector to $x$ on $W$ by $x_W$.

To say that $x_W$ is the closest vector to $x$ on $W$ means that the difference $x - x_W$ is orthogonal to the vectors in $W$:

[Figure: the vector $x$, its orthogonal projection $x_W$ on the subspace $W$, and the difference $x - x_W$ orthogonal to $W$.]

In other words, if $x_{W^\perp} = x - x_W$, then we have $x = x_W + x_{W^\perp}$, where $x_W$ is in $W$ and $x_{W^\perp}$ is in $W^\perp$. The first order of business is to prove that the closest vector always exists.

Definition

Let $W$ be a subspace of $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$. The expression

$x = x_W + x_{W^\perp}$

for $x_W$ in $W$ and $x_{W^\perp}$ in $W^\perp$, is called the orthogonal decomposition of $x$ with respect to $W$, and the closest vector $x_W$ is the orthogonal projection of $x$ onto $W$.

Since $x_W$ is the closest vector on $W$ to $x$, the distance from $x$ to the subspace $W$ is the length of the vector from $x_W$ to $x$, i.e., the length of $x_{W^\perp}$. To restate:

Closest vector and distance

Let $W$ be a subspace of $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$.

  • The orthogonal projection $x_W$ is the closest vector to $x$ in $W$.
  • The distance from $x$ to $W$ is $\|x_{W^\perp}\|$.

Now we turn to the problem of computing $x_W$ and $x_{W^\perp}$. Of course, since $x_{W^\perp} = x - x_W$, really all we need is to compute $x_W$. The following theorem gives a method for computing the orthogonal projection in terms of a spanning set.

Theorem

Let $W = \operatorname{Span}\{v_1, v_2, \ldots, v_m\}$ be a subspace of $\mathbb{R}^n$, and let $A$ be the matrix with columns $v_1, v_2, \ldots, v_m$. Then for any vector $x$ in $\mathbb{R}^n$, the matrix equation $A^TAc = A^Tx$ in the unknown vector $c$ is consistent, and $x_W = Ac$ for any solution $c$.

Proof

Let $x = x_W + x_{W^\perp}$ be the orthogonal decomposition with respect to $W$. We have $A^Tx_{W^\perp} = 0$ by this proposition in Section 7.2, so

$A^Tx = A^T(x_W + x_{W^\perp}) = A^Tx_W + A^Tx_{W^\perp} = A^Tx_W.$

Since $x_W$ is in $W$, we can write $x_W = c_1v_1 + c_2v_2 + \cdots + c_mv_m$ for some scalars $c_1, c_2, \ldots, c_m$. Let $c$ be the vector with entries $c_1, c_2, \ldots, c_m$. Then $Ac = x_W$, so

$A^Tx = A^Tx_W = A^TAc.$

This proves that the matrix equation $A^TAc = A^Tx$ is consistent, and that $x_W = Ac$ for a solution $c$.

Example (Orthogonal projection onto a line)

Let $L = \operatorname{Span}\{u\}$ be a line in $\mathbb{R}^n$ and let $x$ be a vector in $\mathbb{R}^n$. By the theorem, to find $x_L$ we must solve the matrix equation $u^Tu\,c = u^Tx$, where we regard $u$ as an $n \times 1$ matrix. But $u^Tu = u \cdot u$ and $u^Tx = u \cdot x$, so $c = (u \cdot x)/(u \cdot u)$ is a solution of $u^Tu\,c = u^Tx$, and hence $x_L = uc = \dfrac{u \cdot x}{u \cdot u}\,u$.

[Figure: the orthogonal projection $x_L = \dfrac{u \cdot x}{u \cdot u}\,u$ of $x$ onto the line $L = \operatorname{Span}\{u\}$.]

To reiterate:

Recipe: Orthogonal projection onto a line

If $L = \operatorname{Span}\{u\}$ is a line, then

$x_L = \dfrac{u \cdot x}{u \cdot u}\,u \qquad\text{and}\qquad x_{L^\perp} = x - x_L$

for any vector $x$.
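As a hedged numerical illustration of this recipe, here is a minimal NumPy sketch (the helper name project_onto_line and the sample vectors are ours, not from the text):

import numpy as np

def project_onto_line(u, x):
    # Returns (x_L, x_Lperp) for the line L = Span{u}.
    u = np.asarray(u, dtype=float)
    x = np.asarray(x, dtype=float)
    x_L = (u @ x) / (u @ u) * u    # x_L = (u.x)/(u.u) u
    return x_L, x - x_L            # x_Lperp = x - x_L

# Example: project x = (2, 3) onto the line spanned by u = (1, 1).
x_L, x_Lperp = project_onto_line([1, 1], [2, 3])
print(x_L)       # [2.5 2.5]
print(x_Lperp)   # [-0.5  0.5]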

When $W = \operatorname{Span}\{v_1, v_2, \ldots, v_m\}$ has dimension greater than one, computing the orthogonal projection of $x$ onto $W$ means solving the matrix equation $A^TAc = A^Tx$, where $A$ has columns $v_1, v_2, \ldots, v_m$. In other words, we can compute the closest vector by solving a system of linear equations. To be explicit, we state the theorem as a recipe:

Recipe: Compute an orthogonal decomposition

Let $W = \operatorname{Span}\{v_1, v_2, \ldots, v_m\}$ and let $A$ be the matrix with columns $v_1, v_2, \ldots, v_m$. Here is a method to compute the orthogonal decomposition of a vector $x$ with respect to $W$:

  1. Compute the matrix $A^TA$ and the vector $A^Tx$.
  2. Form the augmented matrix for the matrix equation $A^TAc = A^Tx$ in the unknown vector $c$, and row reduce.
  3. This equation is always consistent; choose one solution $c$. Then
    $x_W = Ac \qquad\text{and}\qquad x_{W^\perp} = x - x_W.$
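The recipe above can be carried out numerically. The following is a minimal sketch in Python with NumPy (the function name orthogonal_decomposition and the sample data are ours); it solves $A^TAc = A^Tx$ with a least-squares solver, which returns a solution even when $A^TA$ is not invertible:

import numpy as np

def orthogonal_decomposition(A, x):
    # Given A whose columns span W, return (x_W, x_Wperp) for the vector x.
    AtA = A.T @ A
    Atx = A.T @ x
    c, *_ = np.linalg.lstsq(AtA, Atx, rcond=None)  # a solution of A^T A c = A^T x
    x_W = A @ c
    return x_W, x - x_W

# Example: W = Span{(1,0,1), (1,1,0)} in R^3 and x = (1,2,3).
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
x_W, x_Wperp = orthogonal_decomposition(A, x)
print(x_W, x_Wperp)
print(A.T @ x_Wperp)   # approximately [0, 0]: x_Wperp is orthogonal to W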

In the context of the above theorem, if we start with a basis of $W$, then it turns out that the square matrix $A^TA$ is automatically invertible! (It is always the case that $A^TA$ is square and the equation $A^TAc = A^Tx$ is consistent, but $A^TA$ need not be invertible in general.)

Proof

We will show that $\operatorname{Nul}(A^TA) = \{0\}$, which implies invertibility by the invertible matrix theorem in Section 6.1. Suppose that $A^TAc = 0$. Then $A^TAc = A^T0$, so $0_W = Ac$ by the theorem. But $0_W = 0$ (the orthogonal decomposition of the zero vector is just $0 = 0 + 0$), so $Ac = 0$, and therefore $c$ is in $\operatorname{Nul}(A)$. Since the columns of $A$ are linearly independent, we have $c = 0$, so $\operatorname{Nul}(A^TA) = \{0\}$, as desired.

Let $x$ be a vector in $\mathbb{R}^n$ and let $c$ be a solution of $A^TAc = A^Tx$. Then $c = (A^TA)^{-1}A^Tx$, so $x_W = Ac = A(A^TA)^{-1}A^Tx$.
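Assuming the columns of $A$ are linearly independent, this solution can be computed with an ordinary linear solve. A minimal sketch (sample data ours):

import numpy as np

# Columns of A form a basis of W, so A^T A is invertible.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])

c = np.linalg.solve(A.T @ A, A.T @ x)   # the unique solution of A^T A c = A^T x
x_W = A @ c                             # equals A (A^T A)^{-1} A^T x
print(x_W)        # approximately [2.33, 0.67, 1.67]
print(x - x_W)    # the component of x in W-perp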

Subsection 7.3.2 Orthogonal Projection

In this subsection, we change perspective and think of the orthogonal projection $x_W$ as a function of $x$. This function turns out to be a linear transformation with many nice properties, and is a good example of a linear transformation which is not originally defined as a matrix transformation.

We compute the standard matrix of the orthogonal projection in the same way as for any other transformation: by evaluating on the standard coordinate vectors. In this case, this means projecting the standard coordinate vectors onto the subspace.
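As a sketch of this procedure (helper name and sample subspace ours), the columns of the standard matrix are the projections of the standard coordinate vectors $e_1, \ldots, e_n$:

import numpy as np

def project(A, x):
    # Orthogonal projection of x onto Col(A); columns of A assumed independent.
    c = np.linalg.solve(A.T @ A, A.T @ x)
    return A @ c

# Sample subspace W = Span{(1,0,1), (1,1,0)} of R^3.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
n = A.shape[0]

# Column i of the standard matrix is the projection of the i-th coordinate vector.
P = np.column_stack([project(A, e) for e in np.eye(n)])
print(P)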

In the previous example, we could have used the fact that

$\left\{\begin{pmatrix}1\\0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\\0\end{pmatrix}\right\}$

forms a basis for $W$, so that

$T(x) = x_W = \bigl(A(A^TA)^{-1}A^T\bigr)\,x \qquad\text{for}\qquad A = \begin{pmatrix}1&1\\0&1\\1&0\end{pmatrix}$

by the corollary. In this case, we have already expressed $T$ as a matrix transformation with matrix $A(A^TA)^{-1}A^T$. See this example.

Let $W$ be a subspace of $\mathbb{R}^n$ with basis $v_1, v_2, \ldots, v_m$, and let $A$ be the matrix with columns $v_1, v_2, \ldots, v_m$. Then the standard matrix for $T(x) = x_W$ is

$A(A^TA)^{-1}A^T.$
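A quick numerical check of this formula, using the same sample basis as in the sketches above:

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])   # columns form a basis of W

P = A @ np.linalg.inv(A.T @ A) @ A.T   # standard matrix of projection onto W
print(P)
print(np.allclose(P @ P, P))   # True: projecting twice is the same as projecting once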

We can translate the above properties of orthogonal projections into properties of the associated standard matrix.