Section 4.4 Matrix Multiplication

Objectives
  1. Understand compositions of transformations.
  2. Understand the relationship between matrix products and compositions of matrix transformations.
  3. Become comfortable doing basic algebra involving matrices.
  4. Recipe: matrix multiplication (two ways).
  5. Picture: composition of transformations.
  6. Vocabulary word: composition.

In this section, we study compositions of transformations: that is, chaining transformations together. The composition of matrix transformations corresponds to a notion of multiplying two matrices together. We also discuss addition and scalar multiplication of transformations and of matrices.

Subsection 4.4.1 Transformation algebra

In this subsection we describe three operations that one can perform on transformations: addition, scalar multiplication, and composition. In the next subsection, we will translate these operations into the language of matrices, for matrix transformations.

Definition

  • Let $T, U\colon \mathbf{R}^n \to \mathbf{R}^m$ be two transformations. Their sum is the transformation $T + U\colon \mathbf{R}^n \to \mathbf{R}^m$ defined by
    $$(T + U)(x) = T(x) + U(x).$$
    Note that addition of transformations is only defined when both transformations have the same domain and codomain.
  • Let $T\colon \mathbf{R}^n \to \mathbf{R}^m$ be a transformation, and let $c$ be a scalar. The scalar product of $c$ with $T$ is the transformation $cT\colon \mathbf{R}^n \to \mathbf{R}^m$ defined by
    $$(cT)(x) = c \cdot T(x).$$

The sum of two transformations $T, U\colon \mathbf{R}^n \to \mathbf{R}^m$ is another transformation called $T + U$; its value on an input vector $x$ is the sum of the outputs of $T$ and $U$ on $x$. Similarly, the product of $T$ with a scalar $c$ is another transformation called $cT$; its value on an input vector $x$ is the vector $c \cdot T(x)$.
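The two definitions above can be sketched in code. This is a minimal illustration, assuming vectors are represented as Python lists of numbers and transformations as ordinary functions; the names `add_transformations`, `scale_transformation`, `T`, and `U` are hypothetical and chosen only for this example.

```python
def add_transformations(T, U):
    """Return the sum T + U, defined by (T + U)(x) = T(x) + U(x)."""
    def total(x):
        tx, ux = T(x), U(x)
        if len(tx) != len(ux):
            raise ValueError("T and U must have the same codomain")
        return [a + b for a, b in zip(tx, ux)]
    return total


def scale_transformation(c, T):
    """Return the scalar product cT, defined by (cT)(x) = c * T(x)."""
    def scaled(x):
        return [c * a for a in T(x)]
    return scaled


# Two hypothetical transformations R^2 -> R^2, chosen only for illustration.
def T(x):
    return [x[0] + x[1], x[1]]   # a shear (linear)


def U(x):
    return [x[0] * x[0], 0]      # not linear; the definitions do not require linearity


x = [2, 3]
print(add_transformations(T, U)(x))   # T(x) = [5, 3] and U(x) = [4, 0], so this prints [9, 3]
print(scale_transformation(2, T)(x))  # prints [10, 6]
```

Note that the sum is computed output by output: nothing is assumed about $T$ or $U$ beyond their sharing a domain and a codomain.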

Properties of addition and scalar multiplication

Let $S, T, U\colon \mathbf{R}^n \to \mathbf{R}^m$ be transformations and let $c, d$ be scalars. The following properties are easily verified:

$$\begin{aligned}
T + U &= U + T \\
S + (T + U) &= (S + T) + U \\
c(T + U) &= cT + cU \\
(c + d)T &= cT + dT \\
c(dT) &= (cd)T \\
T + 0 &= T
\end{aligned}$$

In one of the above properties, we used $0$ to denote the transformation $\mathbf{R}^n \to \mathbf{R}^m$ that is zero on every input vector: $0(x) = 0$ for all $x$. This is called the zero transformation.

Definition

Let $T\colon \mathbf{R}^n \to \mathbf{R}^m$ and $U\colon \mathbf{R}^p \to \mathbf{R}^n$ be transformations. Their composition is the transformation $T \circ U\colon \mathbf{R}^p \to \mathbf{R}^m$ defined by

$$(T \circ U)(x) = T(U(x)).$$

Composing two transformations means chaining them together: $T \circ U$ is the transformation that first applies $U$, then applies $T$ (note the order of operations). More precisely, to evaluate $T \circ U$ on an input vector $x$, first you evaluate $U(x)$, then you take this output vector of $U$ and use it as an input vector of $T$: that is, $(T \circ U)(x) = T(U(x))$. Of course, this only makes sense when the outputs of $U$ are valid inputs of $T$.

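[Figure: $U$ sends $x$ in $\mathbf{R}^p$ to $U(x)$ in $\mathbf{R}^n$, and $T$ sends $U(x)$ to $T \circ U(x)$ in $\mathbf{R}^m$; the composite arrow from $\mathbf{R}^p$ to $\mathbf{R}^m$ is $T \circ U$.]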

Here is a picture of the composition $T \circ U$ as a “machine” that first runs $U$, then takes its output and feeds it into $T$; there is a similar picture in Section 4.1.

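[Figure: the composition $T \circ U$ drawn as a machine: an input $x$ in $\mathbf{R}^p$ enters $U$, the output $U(x)$ in $\mathbf{R}^n$ is fed into $T$, and $T \circ U(x)$ in $\mathbf{R}^m$ comes out.]

Domain and codomain of a composition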

  • In order for $T \circ U$ to be defined, the codomain of $U$ must equal the domain of $T$.
  • The domain of $T \circ U$ is the domain of $U$.
  • The codomain of $T \circ U$ is the codomain of $T$.
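As a rough sketch of how a composition chains two transformations, assume again that vectors are Python lists and transformations are functions. The helper `compose` and the sample maps `T` and `U` are hypothetical names introduced for this illustration.

```python
def compose(T, U):
    """Return the composition T ∘ U: first apply U, then apply T."""
    return lambda x: T(U(x))


# Hypothetical example: U goes from R^2 to R^3 and T goes from R^3 to R^1,
# so the codomain of U equals the domain of T and T ∘ U is defined.
def U(x):
    return [x[0], x[1], x[0] + x[1]]


def T(x):
    return [x[0] + x[1] + x[2]]


TU = compose(T, U)       # a transformation from R^2 (domain of U) to R^1 (codomain of T)
print(TU([1, 2]))        # U([1, 2]) = [1, 2, 3], then T gives [6]
```

In this example the composition in the other order, $U \circ T$, is not defined, because $T$ produces vectors in $\mathbf{R}^1$ while $U$ expects inputs in $\mathbf{R}^2$.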

Recall from Section 4.1 that the identity transformation is the transformation $\mathrm{Id}_{\mathbf{R}^n}\colon \mathbf{R}^n \to \mathbf{R}^n$ defined by $\mathrm{Id}_{\mathbf{R}^n}(x) = x$ for every vector $x$.

Properties of composition

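Let $S, T, U$ be transformations and let $c$ be a scalar. Suppose that $T\colon \mathbf{R}^n \to \mathbf{R}^m$, and that in each of the following identities, the domains and the codomains are compatible when necessary for the composition to be defined. The following properties are easily verified:

$$\begin{aligned}
S \circ (T + U) &= S \circ T + S \circ U \quad \text{if } S \text{ is linear} \\
(S + T) \circ U &= S \circ U + T \circ U \\
c(T \circ U) &= (cT) \circ U \\
c(T \circ U) &= T \circ (cU) \quad \text{if } T \text{ is linear} \\
T \circ \mathrm{Id}_{\mathbf{R}^n} &= T \\
\mathrm{Id}_{\mathbf{R}^m} \circ T &= T \\
S \circ (T \circ U) &= (S \circ T) \circ U
\end{aligned}$$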

The final property is called associativity; it simply says that

$$S \circ (T \circ U)(x) = S(T \circ U(x)) = S(T(U(x))) = S \circ T(U(x)) = (S \circ T) \circ U(x).$$

In other words, both $S \circ (T \circ U)$ and $(S \circ T) \circ U$ are the transformation defined by first applying $U$, then $T$, then $S$.

Composition of transformations is not commutative in general. That is, in general, $T \circ U \neq U \circ T$, even when both compositions are defined.
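A small numerical check makes the failure of commutativity concrete. This is only a sketch: the transformations `shear` and `rotate` of $\mathbf{R}^2$ and the helper `compose` are hypothetical choices for illustration, with vectors stored as Python lists.

```python
def compose(T, U):
    """Return T ∘ U: first apply U, then apply T."""
    return lambda x: T(U(x))


def shear(x):
    return [x[0] + x[1], x[1]]   # horizontal shear of R^2


def rotate(x):
    return [-x[1], x[0]]         # counterclockwise rotation of R^2 by 90 degrees


x = [1, 0]
print(compose(shear, rotate)(x))  # rotate first: [0, 1]; then shear: [1, 1]
print(compose(rotate, shear)(x))  # shear first: [1, 0]; then rotate: [0, 1]
```

Both compositions are defined here, since both maps go from $\mathbf{R}^2$ to $\mathbf{R}^2$, yet they disagree on the input $(1, 0)$.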

Subsection 4.4.2 Matrix algebra

In this subsection, we translate the algebra of linear transformations from the previous subsection into the language of matrices. First we need some terminology.

Notation

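Let $A$ be an $m \times n$ matrix. We will generally write $a_{ij}$ for the entry in the $i$th row and the $j$th column. It is called the $i, j$ entry of the matrix.

$$\begin{pmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}$$

Here the entry $a_{ij}$ lies in the $i$th row and the $j$th column.

Definition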

  • The sum of two $m \times n$ matrices $A$ and $B$ is the matrix obtained by summing the entries of $A$ and $B$ individually:
    $$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \end{pmatrix}$$
    In other words, the $i, j$ entry of $A + B$ is the sum of the $i, j$ entries of $A$ and $B$. Note that addition of matrices is only defined when both matrices have the same dimensions.
  • The scalar product of a scalar $c$ with a matrix $A$ is obtained by scaling all entries of $A$ by $c$:
    $$c \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} ca_{11} & ca_{12} & ca_{13} \\ ca_{21} & ca_{22} & ca_{23} \end{pmatrix}$$
    In other words, the $i, j$ entry of $cA$ is $c$ times the $i, j$ entry of $A$.
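The entrywise definitions above translate directly into code. The following is a minimal sketch, assuming a matrix is stored as a Python list of rows; the function names `mat_add` and `scalar_mul` and the sample matrices are illustrative assumptions, not notation from the text.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices with the same dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(c, A):
    """Scale every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[10, 20, 30],
     [40, 50, 60]]

print(mat_add(A, B))     # [[11, 22, 33], [44, 55, 66]]
print(scalar_mul(2, A))  # [[2, 4, 6], [8, 10, 12]]
```

The dimension check in `mat_add` mirrors the remark that addition is only defined for matrices of the same size.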

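If $T, U\colon \mathbf{R}^n \to \mathbf{R}^m$ are linear transformations with standard matrices $A$ and $B$, respectively, then the standard matrix of $T + U$ is $A + B$, and the standard matrix of $cT$ is $cA$. In view of this fact, the following properties are consequences of the corresponding properties of transformations. They are easily verified directly from the definitions as well.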

Properties of addition and scalar multiplication

Let A , B , C be m × n matrices and let c , d be scalars. Then:

$$\begin{aligned}
A + B &= B + A \\
C + (A + B) &= (C + A) + B \\
c(A + B) &= cA + cB \\
(c + d)A &= cA + dA \\
c(dA) &= (cd)A \\
A + 0 &= A
\end{aligned}$$

In one of the above properties, we used 0 to denote the m × n matrix whose entries are all zero. This is the standard matrix of the zero transformation, and is called the zero matrix.

Definition (Matrix multiplication)

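Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times p$ matrix. Denote the columns of $B$ by $v_1, v_2, \ldots, v_p$:

$$B = \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_p \\ | & | & & | \end{pmatrix}.$$

The product $AB$ is the $m \times p$ matrix with columns $Av_1, Av_2, \ldots, Av_p$:

$$AB = \begin{pmatrix} | & | & & | \\ Av_1 & Av_2 & \cdots & Av_p \\ | & | & & | \end{pmatrix}.$$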

In other words, matrix multiplication is defined column-by-column, or “distributes over the columns of $B$.”

In order for the vectors $Av_1, Av_2, \ldots, Av_p$ to be defined, the number of rows of $B$ has to equal the number of columns of $A$.

Dimensions of the matrix product

  • In order for $AB$ to be defined, the number of rows of $B$ has to equal the number of columns of $A$.
  • The product of an $m \times n$ matrix and an $n \times p$ matrix is an $m \times p$ matrix.
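The column-by-column definition can be sketched as follows, assuming a matrix is stored as a Python list of rows. The helper names `mat_vec`, `columns`, and `mat_mul_by_columns`, and the sample matrices, are hypothetical and introduced only for this illustration.

```python
def mat_vec(A, v):
    """Matrix-vector product Av, where A is a list of rows."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("number of columns of A must equal the length of v")
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def columns(B):
    """Return the columns v_1, ..., v_p of B as lists."""
    return [list(col) for col in zip(*B)]


def mat_mul_by_columns(A, B):
    """Product AB, built as the matrix with columns Av_1, ..., Av_p."""
    product_columns = [mat_vec(A, v) for v in columns(B)]
    # Reassemble the list of columns into a list of rows.
    return [list(row) for row in zip(*product_columns)]


A = [[1, 1, 0],
     [0, 1, 1]]      # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]         # 3 x 2

print(mat_mul_by_columns(A, B))  # columns are A(1,0,1) = (1,1) and A(0,1,1) = (1,2): [[1, 1], [1, 2]]
```

The result is a 2 x 2 matrix, in line with the dimension rule: a 2 x 3 matrix times a 3 x 2 matrix is 2 x 2. If the inner dimensions disagree, `mat_vec` raises an error, which corresponds to the product being undefined.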

If B has only one column, then AB also has one column. A matrix with one column is the same as a vector, so the definition of the matrix product generalizes the definition of the matrix-vector product.

If A is a square matrix, then we can multiply it by itself; we define its powers to be

$$A^2 = AA, \qquad A^3 = AAA, \qquad \text{etc.}$$

The row-column rule for matrix multiplication

Recall from Section 3.3 that the product of a row vector and a column vector is the scalar

$$\begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.$$

The following procedure for finding the matrix product is much better adapted to computations by hand; the previous definition is more suitable for proving the theorem below.

Recipe: The row-column rule for matrix multiplication

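Let $A$ be an $m \times n$ matrix, let $B$ be an $n \times p$ matrix, and let $C = AB$. Then the $ij$ entry of $C$ is the $i$th row of $A$ times the $j$th column of $B$:

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}.$$

Here is a diagram, in which the $i$th row of $A$ and the $j$th column of $B$ combine to give the $ij$ entry of $C$:

$$\begin{pmatrix}
a_{11} & \cdots & a_{1k} & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{i1} & \cdots & a_{ik} & \cdots & a_{in} \\
\vdots & & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mk} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix}
b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\
\vdots & & \vdots & & \vdots \\
b_{k1} & \cdots & b_{kj} & \cdots & b_{kp} \\
\vdots & & \vdots & & \vdots \\
b_{n1} & \cdots & b_{nj} & \cdots & b_{np}
\end{pmatrix}
=
\begin{pmatrix}
c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\
\vdots & & \vdots & & \vdots \\
c_{i1} & \cdots & c_{ij} & \cdots & c_{ip} \\
\vdots & & \vdots & & \vdots \\
c_{m1} & \cdots & c_{mj} & \cdots & c_{mp}
\end{pmatrix}$$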
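The row-column rule can be written as a short function. This is a sketch under the assumption that matrices are Python lists of rows with at least one row each; the name `mat_mul` and the sample matrices are illustrative.

```python
def mat_mul(A, B):
    """Product AB via the row-column rule: c_ij = sum over k of a_ik * b_kj."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

# Entry (1, 1): 1*5 + 2*7 = 19.  Entry (2, 2): 3*6 + 4*8 = 50.
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```

Each entry is computed independently of the others, which is what makes this rule convenient for hand computation.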

Subsection 4.4.3 Composition and Matrix Multiplication

The point of this subsection is to show that matrix multiplication corresponds to composition of transformations.

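Theorem

Let $T\colon \mathbf{R}^n \to \mathbf{R}^m$ and $U\colon \mathbf{R}^p \to \mathbf{R}^n$ be linear transformations, and let $A$ and $B$ be their standard matrices, respectively, so $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then $T \circ U\colon \mathbf{R}^p \to \mathbf{R}^m$ is a linear transformation, and its standard matrix is the product $AB$.

Proof

A composition of linear transformations is linear, since $T(U(x + y)) = T(U(x)) + T(U(y))$ and $T(U(cx)) = cT(U(x))$. The $j$th column of the standard matrix of $T \circ U$ is $(T \circ U)(e_j) = T(U(e_j)) = T(Be_j) = T(v_j) = Av_j$, where $e_j$ is the $j$th standard coordinate vector and $v_j$ is the $j$th column of $B$. By definition, $Av_j$ is the $j$th column of $AB$.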

The theorem justifies our choice of definition of the matrix product. This is the one and only reason that matrix products are defined in this way. To rephrase:

Products and compositions

The matrix of the composition of two linear transformations is the product of the matrices of the transformations.
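The correspondence between products and compositions can be checked numerically on an example. The sketch below assumes matrices are Python lists of rows; `mat_vec`, `mat_mul`, and `standard_matrix` are hypothetical helper names, and the matrices `A` and `B` are arbitrary sample data.

```python
def mat_vec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Matrix product AB via the row-column rule."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def standard_matrix(T, n):
    """Matrix whose columns are T(e_1), ..., T(e_n) for a linear T with domain R^n."""
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]


A = [[1, 1, 0],
     [0, 1, 1]]          # standard matrix of T: R^3 -> R^2
B = [[1, 0],
     [0, 1],
     [1, 1]]             # standard matrix of U: R^2 -> R^3


def T(x):
    return mat_vec(A, x)


def U(x):
    return mat_vec(B, x)


def TU(x):
    return T(U(x))       # the composition T ∘ U: R^2 -> R^2


print(standard_matrix(TU, 2))  # [[1, 1], [1, 2]]
print(mat_mul(A, B))           # [[1, 1], [1, 2]]
```

A single example is of course not a proof; it only illustrates that evaluating $T \circ U$ on the standard coordinate vectors recovers the columns of $AB$.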

Recall from Section 4.3 that the identity matrix is the $n \times n$ matrix $I_n$ whose columns are the standard coordinate vectors in $\mathbf{R}^n$. The identity matrix is the standard matrix of the identity transformation: that is, $x = \mathrm{Id}_{\mathbf{R}^n}(x) = I_n x$ for all vectors $x$ in $\mathbf{R}^n$.

In view of the above theorem, the following properties are consequences of the corresponding properties of transformations.

Properties of matrix multiplication

Let A , B , C be matrices and let c be a scalar. Suppose that A has dimensions m × n , and that in each of the following identities, the dimensions of B and C are compatible when necessary for the product to be defined. Then:

$$\begin{aligned}
C(A + B) &= CA + CB \\
(A + B)C &= AC + BC \\
c(AB) &= (cA)B \\
c(AB) &= A(cB) \\
AI_n &= A \\
I_m A &= A \\
(AB)C &= A(BC)
\end{aligned}$$

Most of the above properties are easily verified directly from the definitions. The associativity property ( AB ) C = A ( BC ) , however, is not (try it!). It is much easier to prove by relating matrix multiplication to composition of transformations, and using the obvious fact that composition of transformations is associative.
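For readers who want to see the identities in action before attempting a proof, here is a small numerical spot check. It is a sketch only: the helper names `mat_mul`, `mat_add`, and `identity` are assumptions, the matrices are arbitrary sample data, and checking one example does not prove the general statement.

```python
def mat_mul(A, B):
    """Matrix product AB via the row-column rule."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

# Associativity: (AB)C = A(BC).
print(mat_mul(mat_mul(A, B), C))  # [[5, 3], [11, 9]]
print(mat_mul(A, mat_mul(B, C)))  # [[5, 3], [11, 9]]

# Distributivity: C(A + B) = CA + CB.
print(mat_mul(C, mat_add(A, B)) == mat_add(mat_mul(C, A), mat_mul(C, B)))  # True

# Identity: A I_2 = A = I_2 A.
print(mat_mul(A, identity(2)) == A and mat_mul(identity(2), A) == A)       # True
```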

Although matrix multiplication satisfies many of the properties one would expect, one must be careful when doing matrix arithmetic, as there are several properties that are not satisfied in general.

Matrix multiplication caveats

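  • Matrix multiplication is not commutative: $AB$ is not usually equal to $BA$, even when both products are defined and have the same size. For example,
    $$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{but} \quad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$
  • Matrix multiplication does not satisfy the cancellation law: $AB = AC$ does not imply $B = C$, even when $A \neq 0$. For example,
    $$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 5 & 6 \end{pmatrix}.$$
  • It is possible for $AB = 0$, even when $A \neq 0$ and $B \neq 0$. For example,
    $$\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$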
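The caveats above can be confirmed with a few lines of code. This is a minimal sketch, assuming matrices are Python lists of rows; `mat_mul` is a hypothetical helper, and the pair `P`, `Q` used for the commutativity check is an illustrative choice.

```python
def mat_mul(A, B):
    """Matrix product AB via the row-column rule."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Not commutative: PQ and QP are both 2 x 2 but differ.
P = [[1, 1], [0, 1]]
Q = [[1, 0], [1, 1]]
print(mat_mul(P, Q))  # [[2, 1], [1, 1]]
print(mat_mul(Q, P))  # [[1, 1], [1, 2]]

# No cancellation law: AB = AC although B != C and A is not the zero matrix.
A = [[1, 0], [0, 0]]
B = [[1, 2], [3, 4]]
C = [[1, 2], [5, 6]]
print(mat_mul(A, B))  # [[1, 2], [0, 0]]
print(mat_mul(A, C))  # [[1, 2], [0, 0]]

# Zero product of nonzero matrices.
M = [[1, 0], [1, 0]]
N = [[0, 0], [1, 1]]
print(mat_mul(M, N))  # [[0, 0], [0, 0]]
```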