W W L CHEN

© W W L Chen, 1982, 2008.
This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.
It is available free to all individuals, on the understanding that it is not to be used for financial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 2
MATRICES
2.1. Introduction
A rectangular array of numbers of the form
$$\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \qquad (1)$$
is called an m×n matrix, with m rows and n columns. We count rows from the top and columns from the left. Hence
$$( a_{i1} \ \dots \ a_{in} ) \quad\text{and}\quad \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}$$
represent respectively the i-th row and the j-th column of the matrix (1), and a_{ij} represents the entry in the matrix (1) on the i-th row and j-th column.
Example 2.1.1. Consider the 3×4 matrix
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}.$$
Here
$$( 3 \ 1 \ 5 \ 2 ) \quad\text{and}\quad \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$$
represent respectively the 2-nd row and the 3-rd column of the matrix, and 5 represents the entry in the matrix on the 2-nd row and 3-rd column.
We now consider the question of arithmetic involving matrices. First of all, let us study the problem of addition. A reasonable theory can be derived from the following definition.
Definition. Suppose that the two matrices
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \dots & b_{mn} \end{pmatrix}$$
both have m rows and n columns. Then we write
$$A + B = \begin{pmatrix} a_{11}+b_{11} & \dots & a_{1n}+b_{1n} \\ \vdots & & \vdots \\ a_{m1}+b_{m1} & \dots & a_{mn}+b_{mn} \end{pmatrix}$$
and call this the sum of the two matrices A and B.
Example 2.1.2. Suppose that
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 2 & -2 & 7 \\ 0 & 2 & 4 & -1 \\ -2 & 1 & 3 & 3 \end{pmatrix}.$$
Then
$$A + B = \begin{pmatrix} 2+1 & 4+2 & 3-2 & -1+7 \\ 3+0 & 1+2 & 5+4 & 2-1 \\ -1-2 & 0+1 & 7+3 & 6+3 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 1 & 6 \\ 3 & 3 & 9 & 1 \\ -3 & 1 & 10 & 9 \end{pmatrix}.$$
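The entry-wise rule above is easy to express in code. The following is an illustrative Python sketch, not part of the original notes; the helper name `mat_add` and the list-of-rows representation are our own choices.

```python
# Sketch: entry-wise matrix addition; matrices are lists of rows.
# The helper name mat_add is ours, not from the text.
def mat_add(A, B):
    # A and B must have the same shape (same m and n).
    assert len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]
B = [[1, 2, -2, 7], [0, 2, 4, -1], [-2, 1, 3, 3]]
print(mat_add(A, B))  # [[3, 6, 1, 6], [3, 3, 9, 1], [-3, 1, 10, 9]]
```

The result agrees with the sum computed by hand in Example 2.1.2.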
Example 2.1.3. We do not have a definition for "adding" the matrices
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ -1 & 0 & 7 \end{pmatrix}.$$
PROPOSITION 2A. (MATRIX ADDITION) Suppose that A, B, C are m×n matrices. Suppose further that O represents the m×n matrix with all entries zero. Then
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A; and
(d) there is an m×n matrix A′ such that A + A′ = O.
Proof. Parts (a)–(c) are easy consequences of ordinary addition, as matrix addition is simply entry-wise addition. For part (d), we can consider the matrix A′ obtained from A by multiplying each entry of A by −1.
The theory of multiplication is rather more complicated, and includes multiplication of a matrix by a scalar as well as multiplication of two matrices.
Definition. Suppose that the matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$$
has m rows and n columns, and that c ∈ R. Then we write
$$cA = \begin{pmatrix} ca_{11} & \dots & ca_{1n} \\ \vdots & & \vdots \\ ca_{m1} & \dots & ca_{mn} \end{pmatrix}$$
and call this the product of the matrix A by the scalar c.
Example 2.1.4. Suppose that
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}.$$
Then
$$2A = \begin{pmatrix} 4 & 8 & 6 & -2 \\ 6 & 2 & 10 & 4 \\ -2 & 0 & 14 & 12 \end{pmatrix}.$$
PROPOSITION 2B. (MULTIPLICATION BY SCALAR) Suppose that A, B are m×n matrices, and that c, d ∈ R. Suppose further that O represents the m×n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) 0A = O; and
(d) c(dA) = (cd)A.
Proof. These are all easy consequences of ordinary multiplication, as multiplication by the scalar c is simply entry-wise multiplication by the number c.
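The entry-wise description also makes Proposition 2B easy to check numerically. This is an illustrative sketch, not part of the original notes; the helper name `scal_mul` is ours.

```python
# Sketch: scalar multiplication of a matrix, entry by entry.
# The helper name scal_mul is ours, not from the text.
def scal_mul(c, A):
    return [[c * a for a in row] for row in A]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]
print(scal_mul(2, A))  # [[4, 8, 6, -2], [6, 2, 10, 4], [-2, 0, 14, 12]]

# Part (d) of Proposition 2B: c(dA) = (cd)A, here with c = 3 and d = 5.
print(scal_mul(3, scal_mul(5, A)) == scal_mul(15, A))  # True
```

The first output reproduces 2A from Example 2.1.4.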
The question of multiplication of two matrices is rather more complicated. To motivate this, let us consider the representation of a system of linear equations
$$\begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1, \\ &\ \ \vdots \\ a_{m1}x_1 + \dots + a_{mn}x_n &= b_m, \end{aligned} \qquad (2)$$
in the form Ax = b, where
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \qquad (3)$$
represent the coefficients and
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \qquad (4)$$
represents the variables. This can be written in full matrix notation by
$$\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$
Can you work out the meaning of this representation?
Now let us define matrix multiplication more formally.
Definition. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & \dots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \dots & b_{np} \end{pmatrix}$$
are respectively an m×n matrix and an n×p matrix. Then the matrix product AB is given by the m×p matrix
$$AB = \begin{pmatrix} q_{11} & \dots & q_{1p} \\ \vdots & & \vdots \\ q_{m1} & \dots & q_{mp} \end{pmatrix},$$
where for every i = 1, ..., m and j = 1, ..., p, we have
$$q_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{i1}b_{1j} + \dots + a_{in}b_{nj}.$$

Remark. Note first of all that the number of columns of the first matrix must be equal to the number of rows of the second matrix. On the other hand, for a simple way to work out q_{ij}, the entry in the i-th row and j-th column of AB, we observe that the i-th row of A and the j-th column of B are respectively
$$( a_{i1} \ \dots \ a_{in} ) \quad\text{and}\quad \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix}.$$
We now multiply the corresponding entries – from a_{i1} with b_{1j}, and so on, until a_{in} with b_{nj} – and then add these products to obtain q_{ij}.
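The row-times-column rule translates directly into code. This is an illustrative Python sketch, not part of the original notes; the helper name `mat_mul` is ours, and it uses the matrices of Example 2.1.5 below as a check.

```python
# Sketch: matrix product via q_ij = sum_k a_ik * b_kj.
# The helper name mat_mul is ours, not from the text.
def mat_mul(A, B):
    n = len(B)                                # rows of B
    assert all(len(row) == n for row in A)    # columns of A must equal rows of B
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]   # 3x4
B = [[1, 4], [2, 3], [0, -2], [3, 1]]              # 4x2
print(mat_mul(A, B))  # [[7, 13], [11, 7], [17, -12]]
```

The 3×2 result matches the product AB worked out entry by entry in Example 2.1.5.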
Example 2.1.5. Consider the matrices
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}.$$
Note that A is a 3×4 matrix and B is a 4×2 matrix, so that the product AB is a 3×2 matrix. Let us calculate the product
$$AB = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{pmatrix}.$$
Consider first of all q_{11}. To calculate this, we need the 1-st row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} q_{11} & \times \\ \times & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{11} = 2 \cdot 1 + 4 \cdot 2 + 3 \cdot 0 + (-1) \cdot 3 = 2 + 8 + 0 - 3 = 7.$$
Consider next q_{12}. To calculate this, we need the 1-st row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & q_{12} \\ \times & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{12} = 2 \cdot 4 + 4 \cdot 3 + 3 \cdot (-2) + (-1) \cdot 1 = 8 + 12 - 6 - 1 = 13.$$
Consider next q_{21}. To calculate this, we need the 2-nd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ 3 & 1 & 5 & 2 \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} \times & \times \\ q_{21} & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{21} = 3 \cdot 1 + 1 \cdot 2 + 5 \cdot 0 + 2 \cdot 3 = 3 + 2 + 0 + 6 = 11.$$
Consider next q_{22}. To calculate this, we need the 2-nd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ 3 & 1 & 5 & 2 \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & q_{22} \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{22} = 3 \cdot 4 + 1 \cdot 3 + 5 \cdot (-2) + 2 \cdot 1 = 12 + 3 - 10 + 2 = 7.$$
Consider next q_{31}. To calculate this, we need the 3-rd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & \times \\ q_{31} & \times \end{pmatrix}.$$
From the definition, we have
$$q_{31} = (-1) \cdot 1 + 0 \cdot 2 + 7 \cdot 0 + 6 \cdot 3 = -1 + 0 + 0 + 18 = 17.$$
Consider finally q_{32}. To calculate this, we need the 3-rd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & \times \\ \times & q_{32} \end{pmatrix}.$$
From the definition, we have
$$q_{32} = (-1) \cdot 4 + 0 \cdot 3 + 7 \cdot (-2) + 6 \cdot 1 = -4 + 0 - 14 + 6 = -12.$$
We therefore conclude that
$$AB = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 7 & 13 \\ 11 & 7 \\ 17 & -12 \end{pmatrix}.$$
Example 2.1.6. Consider again the matrices
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}.$$
Note that B is a 4×2 matrix and A is a 3×4 matrix, so that we do not have a definition for the "product" BA.
We leave the proofs of the following results as exercises for the interested reader.
PROPOSITION 2C. (ASSOCIATIVE LAW) Suppose that A is an m×n matrix, B is an n×p matrix and C is a p×r matrix. Then A(BC) = (AB)C.

PROPOSITION 2D. (DISTRIBUTIVE LAWS)
(a) Suppose that A is an m×n matrix and B and C are n×p matrices. Then A(B + C) = AB + AC.
(b) Suppose that A and B are m×n matrices and C is an n×p matrix. Then (A + B)C = AC + BC.

PROPOSITION 2E. Suppose that A is an m×n matrix, B is an n×p matrix, and that c ∈ R. Then c(AB) = (cA)B = A(cB).
2.2. Systems of Linear Equations
Note that the system (2) of linear equations can be written in matrix form as
$$Ax = b,$$
where the matrices A, x and b are given by (3) and (4). In this section, we shall establish the following important result.

PROPOSITION 2F. The system (2) of linear equations has either no solution, exactly one solution, or infinitely many solutions.

Proof. Clearly the system (2) has either no solution, exactly one solution, or more than one solution. It remains to show that if the system (2) has two distinct solutions, then it must have infinitely many solutions. Suppose that x = u and x = v represent two distinct solutions. Then
$$Au = b \quad\text{and}\quad Av = b,$$
so that
$$A(u - v) = Au - Av = b - b = 0,$$
where 0 is the zero m×1 matrix. It now follows that for every c ∈ R, we have
$$A(u + c(u - v)) = Au + A(c(u - v)) = Au + c(A(u - v)) = b + c0 = b,$$
so that x = u + c(u − v) is a solution for every c ∈ R. Clearly we have infinitely many solutions.
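The argument above can be watched in action numerically. This is an illustrative sketch, not part of the original notes; the singular matrix, the two solutions u and v, and the helper name `mat_vec` are our own choices.

```python
# Sketch: if u and v both solve Ax = b, then so does u + c(u - v) for every c.
# The matrix A, vectors u, v, and helper name mat_vec are our own examples.
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1], [2, 2]]        # a singular 2x2 matrix, so solutions are not unique
b = [3, 6]
u, v = [1, 2], [2, 1]       # two distinct solutions of Ax = b
for c in [0, 1, -2, 10]:
    w = [ui + c * (ui - vi) for ui, vi in zip(u, v)]
    assert mat_vec(A, w) == b
print("u + c(u - v) solves Ax = b for every c tested")
```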
2.3. Inversion of Matrices
For the remainder of this chapter, we shall deal with square matrices, those where the number of rows equals the number of columns.
Definition. The n×n matrix
$$I_n = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
where
$$a_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$
is called the identity matrix of order n.

Remark. Note that
$$I_1 = ( 1 ) \quad\text{and}\quad I_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The following result is relatively easy to check. It shows that the identity matrix I_n acts as the identity for multiplication of n×n matrices.

PROPOSITION 2G. For every n×n matrix A, we have AI_n = I_nA = A.

This raises the following question: Given an n×n matrix A, is it possible to find another n×n matrix B such that AB = BA = I_n?

Definition. An n×n matrix A is said to be invertible if there exists an n×n matrix B such that AB = BA = I_n. In this case, we say that B is the inverse of A and write B = A^{-1}.
PROPOSITION 2H. Suppose that A is an invertible n×n matrix. Then its inverse A^{-1} is unique.

Proof. Suppose that B satisfies the requirements for being the inverse of A. Then AB = BA = I_n. It follows that
$$A^{-1} = A^{-1}I_n = A^{-1}(AB) = (A^{-1}A)B = I_nB = B.$$
Hence the inverse A^{-1} is unique.

PROPOSITION 2J. Suppose that A and B are invertible n×n matrices. Then (AB)^{-1} = B^{-1}A^{-1}.

Proof. In view of the uniqueness of the inverse, it is sufficient to show that B^{-1}A^{-1} satisfies the requirements for being the inverse of AB. Note that
$$(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(I_nA^{-1}) = AA^{-1} = I_n$$
and
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}(AB)) = B^{-1}((A^{-1}A)B) = B^{-1}(I_nB) = B^{-1}B = I_n$$
as required.

PROPOSITION 2K. Suppose that A is an invertible n×n matrix. Then (A^{-1})^{-1} = A.

Proof. Note that both (A^{-1})^{-1} and A satisfy the requirements for being the inverse of A^{-1}. Equality follows from the uniqueness of the inverse.
2.4. Application to Matrix Multiplication
In this section, we shall discuss an application of invertible matrices. Detailed discussion of the technique involved will be covered in Chapter 7.
Definition. An n×n matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
where a_{ij} = 0 whenever i ≠ j, is called a diagonal matrix of order n.

Example 2.4.1. The 3×3 matrices
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
are both diagonal.

Given an n×n matrix A, it is usually rather complicated to calculate
$$A^k = \underbrace{A \dots A}_{k}.$$

Example 2.4.2. Consider the 3×3 matrix
$$A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix}.$$
Suppose that we wish to calculate A^{98}. It can be checked that if we take
$$P = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix},$$
then
$$P^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}.$$
Furthermore, if we write
$$D = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$
then it can be checked that A = PDP^{-1}, so that
$$A^{98} = \underbrace{(PDP^{-1}) \dots (PDP^{-1})}_{98} = PD^{98}P^{-1} = P \begin{pmatrix} 3^{98} & 0 & 0 \\ 0 & 2^{98} & 0 \\ 0 & 0 & 2^{98} \end{pmatrix} P^{-1}.$$
This is much simpler than calculating A^{98} directly. Note that this example is only an illustration. We have not discussed here how the matrices P and D are found.
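We can verify the factorization A = PDP^{-1} of Example 2.4.2 and use it to compute high powers. This is an illustrative sketch, not part of the original notes; it uses exact rational arithmetic via `fractions.Fraction` so the entries of P^{-1} stay exact, and the helper name `mat_mul` is ours.

```python
# Sketch: computing A^k via A = P D P^{-1} (Example 2.4.2), with exact
# Fractions so the entries 4/3 and -5/3 of P^{-1} introduce no rounding.
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P    = [[1, 1, 2], [3, 0, 3], [-2, 3, 0]]
Pinv = [[-3, 2, 1], [-2, F(4, 3), 1], [3, F(-5, 3), -1]]
D    = [[-3, 0, 0], [0, 2, 0], [0, 0, 2]]

# Check the factorization reproduces A.
A = mat_mul(mat_mul(P, D), Pinv)
assert A == [[17, -10, -5], [45, -28, -15], [-30, 20, 12]]

# A^98 = P D^98 P^{-1}, where D^98 is diagonal with entries 3^98, 2^98, 2^98.
k = 98
Dk = [[(-3)**k, 0, 0], [0, 2**k, 0], [0, 0, 2**k]]
Ak = mat_mul(mat_mul(P, Dk), Pinv)
```

Computing Ak this way costs two 3×3 matrix products plus three integer powers, instead of 97 matrix products.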
2.5. Finding Inverses by Elementary Row Operations
In this section, we shall discuss a technique by which we can find the inverse of a square matrix, if the inverse exists. Before we discuss this technique, let us recall the three elementary row operations we discussed in the previous chapter. These are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.
Let us now consider the following example.
Example 2.5.1. Consider the matrices
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
• Let us interchange rows 1 and 2 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us interchange rows 2 and 3 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us add 3 times row 1 to row 2 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us add −2 times row 3 to row 1 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} -2a_{31}+a_{11} & -2a_{32}+a_{12} & -2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} -2a_{31}+a_{11} & -2a_{32}+a_{12} & -2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us multiply row 2 of A by 5 and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us multiply row 3 of A by −1 and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Let us now consider the problem in general.

Definition. By an elementary n×n matrix, we mean an n×n matrix obtained from I_n by an elementary row operation.

We state without proof the following important result. The interested reader may wish to construct a proof, taking into account the different types of elementary row operations.

PROPOSITION 2L. Suppose that A is an n×n matrix, and suppose that B is obtained from A by an elementary row operation. Suppose further that E is an elementary matrix obtained from I_n by the same elementary row operation. Then B = EA.
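Proposition 2L can be spot-checked numerically. This is an illustrative sketch, not part of the original notes; the test matrix A and the helper name `mat_mul` are our own choices, and we check one operation of each type.

```python
# Sketch of Proposition 2L (B = EA) for a concrete 3x3 matrix A.
# The matrix A and helper name mat_mul are our own examples.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# (1) Interchange rows 1 and 2: E is I_3 with rows 1 and 2 interchanged.
E1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
assert mat_mul(E1, A) == [A[1], A[0], A[2]]

# (2) Add 3 times row 1 to row 2: E is I_3 with the same operation applied.
E2 = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]
assert mat_mul(E2, A) == [A[0], [3 * a + b for a, b in zip(A[0], A[1])], A[2]]

# (3) Multiply row 3 by -1.
E3 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
assert mat_mul(E3, A) == [A[0], A[1], [-a for a in A[2]]]
print("B = EA holds for all three types of elementary row operation")
```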
We now adopt the following strategy. Consider an n×n matrix A. Suppose that it is possible to reduce the matrix A by a sequence α_1, α_2, ..., α_k of elementary row operations to the identity matrix I_n. If E_1, E_2, ..., E_k are respectively the elementary n×n matrices obtained from I_n by the same elementary row operations α_1, α_2, ..., α_k, then
$$I_n = E_k \dots E_2E_1A.$$
We therefore must have
$$A^{-1} = E_k \dots E_2E_1 = E_k \dots E_2E_1I_n.$$
It follows that the inverse A^{-1} can be obtained from I_n by performing the same elementary row operations α_1, α_2, ..., α_k. Since we are performing the same elementary row operations on A and I_n, it makes sense to put them side by side. The process can then be described pictorially by
$$(A \mid I_n) \xrightarrow{\alpha_1} (E_1A \mid E_1I_n) \xrightarrow{\alpha_2} (E_2E_1A \mid E_2E_1I_n) \xrightarrow{\alpha_3} \dots \xrightarrow{\alpha_k} (E_k \dots E_2E_1A \mid E_k \dots E_2E_1I_n) = (I_n \mid A^{-1}).$$
In other words, we consider an array with the matrix A on the left and the matrix I_n on the right. We now perform elementary row operations on the array and try to reduce the left hand half to the matrix I_n. If we succeed in doing so, then the right hand half of the array gives the inverse A^{-1}.
Example 2.5.2. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix}.$$
To find A^{-1}, we consider the array
$$(A \mid I_3) = \left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 3 & 0 & 3 & 0 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array}\right).$$
We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I_3. Note that if we succeed, then the final array is clearly in reduced row echelon form. We therefore follow the same procedure as reducing an array to reduced row echelon form. Adding −3 times row 1 to row 2, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array}\right).$$
Adding 2 times row 1 to row 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 5 & 4 & 2 & 0 & 1 \end{array}\right).$$
Multiplying row 3 by 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 15 & 12 & 6 & 0 & 3 \end{array}\right).$$
Adding 5 times row 2 to row 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 1 by 3, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 6 & 3 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding 2 times row 3 to row 1, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding −1 times row 3 to row 2, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding 1 times row 2 to row 1, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 0 & 0 & -9 & 6 & 3 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 1 by 1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 2 by −1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 3 by −1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & 1 & 3 & -5/3 & -1 \end{array}\right).$$
Note now that the array is in reduced row echelon form, and that the left hand half is the identity matrix I_3. It follows that the right hand half of the array represents the inverse A^{-1}. Hence
$$A^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}.$$
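The (A | I_n) procedure can be automated. This is an illustrative sketch, not part of the original notes; the helper name `invert` is ours, it picks the first non-zero pivot rather than following the exact operation order used by hand above, and it uses `fractions.Fraction` so that entries such as 4/3 stay exact.

```python
# Sketch: finding A^{-1} by row-reducing the array (A | I_n), with exact
# rational arithmetic. Raises ValueError when the left half cannot be
# reduced to I_n, i.e. when A is not invertible.
from fractions import Fraction as F

def invert(A):
    n = len(A)
    # Build the array (A | I_n).
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]          # interchange two rows
        M[col] = [x / M[col][col] for x in M[col]]   # scale the pivot row to 1
        for r in range(n):
            if r != col and M[r][col] != 0:          # clear the rest of the column
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # right half is A^{-1}

Ainv = invert([[1, 1, 2], [3, 0, 3], [-2, 3, 0]])
assert Ainv == [[-3, 2, 1], [-2, F(4, 3), 1], [3, F(-5, 3), -1]]
```

The assertion reproduces the inverse obtained by hand in Example 2.5.2.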
Example 2.5.3. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 4 & 5 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
To find A^{-1}, we consider the array
$$(A \mid I_4) = \left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 2 & 2 & 4 & 5 & 0 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right).$$
We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I_4. Adding −2 times row 1 to row 2, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right).$$
Adding 1 times row 2 to row 4, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Interchanging rows 2 and 3, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
At this point, we observe that it is impossible to reduce the left hand half of the array to I_4. For those who remain unconvinced, let us continue. Adding 3 times row 3 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −1 times row 4 to row 3, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Multiplying row 1 by 6 (here we want to avoid fractions in the next two steps), we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & -30 & 18 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −15 times row 4 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & 0 & 3 & 0 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −2 times row 2 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 0 & 12 & 0 & 0 & 3 & -2 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Multiplying row 1 by 1/6, multiplying row 2 by 1/3, multiplying row 3 by −1 and multiplying row 4 by −1/2, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 0 & 1/2 & -1/3 & -5/2 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1/2 & 0 & -1/2 \end{array}\right).$$
Note now that the array is in reduced row echelon form, and that the left hand half is not the identity matrix I_4. Our technique has failed. In fact, the matrix A is not invertible.
2.6. Criteria for Invertibility
Examples 2.5.2–2.5.3 raise the question of when a given matrix is invertible. In this section, we shall obtain some partial answers to this question. Our first step here is the following simple observation.
PROPOSITION 2M. Every elementary matrix is invertible.

Proof. Let us consider elementary row operations. Recall that these are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant. These elementary row operations can clearly be reversed by elementary row operations. For (1), we interchange the two rows again. For (2), if we have originally added c times row i to row j, then we can reverse this by adding −c times row i to row j. For (3), if we have multiplied any row by a non-zero constant c, we can reverse this by multiplying the same row by the constant 1/c. Note now that each elementary matrix is obtained from I_n by an elementary row operation. The inverse of this elementary matrix is clearly the elementary matrix obtained from I_n by the elementary row operation that reverses the original elementary row operation.

Suppose that an n×n matrix B can be obtained from an n×n matrix A by a finite sequence of elementary row operations. Then since these elementary row operations can be reversed, the matrix A can be obtained from the matrix B by a finite sequence of elementary row operations.

Definition. An n×n matrix A is said to be row equivalent to an n×n matrix B if there exist a finite number of elementary n×n matrices E_1, ..., E_k such that B = E_k...E_1A.

Remark. Note that B = E_k...E_1A implies that A = E_1^{-1}...E_k^{-1}B. It follows that if A is row equivalent to B, then B is row equivalent to A. We usually say that A and B are row equivalent.
The following result gives conditions equivalent to the invertibility of an n×n matrix A.

PROPOSITION 2N. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables.
(a) Suppose that the matrix A is invertible. Then the system Ax = 0 of linear equations has only the trivial solution.
(b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. Then the matrices A and I_n are row equivalent.
(c) Suppose that the matrices A and I_n are row equivalent. Then A is invertible.

Proof. (a) Suppose that x_0 is a solution of the system Ax = 0. Then since A is invertible, we have
$$x_0 = I_nx_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}0 = 0.$$
It follows that the trivial solution is the only solution.

(b) Note that if the system Ax = 0 of linear equations has only the trivial solution, then it can be reduced by elementary row operations to the system
$$x_1 = 0, \quad \dots, \quad x_n = 0.$$
This is equivalent to saying that the array
$$\left(\begin{array}{ccc|c} a_{11} & \dots & a_{1n} & 0 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \dots & a_{nn} & 0 \end{array}\right)$$
can be reduced by elementary row operations to the reduced row echelon form
$$\left(\begin{array}{ccc|c} 1 & \dots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \dots & 1 & 0 \end{array}\right).$$
Hence the matrices A and I_n are row equivalent.

(c) Suppose that the matrices A and I_n are row equivalent. Then there exist elementary n×n matrices E_1, ..., E_k such that I_n = E_k...E_1A. By Proposition 2M, the matrices E_1, ..., E_k are all invertible, so that
$$A = E_1^{-1} \dots E_k^{-1}I_n = E_1^{-1} \dots E_k^{-1}$$
is a product of invertible matrices, and is therefore itself invertible.
2.7. Consequences of Invertibility
Suppose that the matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}$$
is invertible. Consider the system Ax = b, where
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Since A is invertible, let us consider x = A^{-1}b. Clearly
$$Ax = A(A^{-1}b) = (AA^{-1})b = I_nb = b,$$
so that x = A^{-1}b is a solution of the system. On the other hand, let x_0 be any solution of the system. Then Ax_0 = b, so that
$$x_0 = I_nx_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}b.$$
It follows that the system has a unique solution. We have proved the following important result.
PROPOSITION 2P. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Suppose further that the matrix A is invertible. Then the system Ax = b of linear equations has the unique solution x = A^{-1}b.

We next attempt to study the question in the opposite direction.
PROPOSITION 2Q. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables. Suppose further that for every b_1, ..., b_n ∈ R, the system Ax = b of linear equations is soluble. Then the matrix A is invertible.

Proof. Suppose that
$$b_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \quad\dots,\quad b_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
In other words, for every j = 1, ..., n, b_j is an n×1 matrix with entry 1 on row j and entry 0 elsewhere. Now let
$$x_1 = \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad\dots,\quad x_n = \begin{pmatrix} x_{1n} \\ \vdots \\ x_{nn} \end{pmatrix}$$
denote respectively solutions of the systems of linear equations
$$Ax = b_1, \quad\dots,\quad Ax = b_n.$$
It is easy to check that
$$A(x_1 \ \dots \ x_n) = (b_1 \ \dots \ b_n);$$
in other words,
$$A \begin{pmatrix} x_{11} & \dots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{nn} \end{pmatrix} = I_n,$$
so that A is invertible.
We can now summarize Propositions 2N, 2P and 2Q as follows.

PROPOSITION 2R. In the notation of Proposition 2N, the following four statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and I_n are row equivalent.
(d) For every n×1 matrix b, the system Ax = b of linear equations is soluble.
2.8. Application to Economics
In this section, we describe briefly the Leontief input-output model, where an economy is divided into n sectors.

For every i = 1, ..., n, let x_i denote the monetary value of the total output of sector i over a fixed period, and let d_i denote the output of sector i needed to satisfy outside demand over the same fixed period. Collecting together x_i and d_i for i = 1, ..., n, we obtain the vectors
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n \quad\text{and}\quad d = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} \in \mathbb{R}^n,$$
known respectively as the production vector and demand vector of the economy.

On the other hand, each of the n sectors requires material from some or all of the sectors to produce its output. For i, j = 1, ..., n, let c_{ij} denote the monetary value of the output of sector i needed by sector j to produce one unit of monetary value of output. For every j = 1, ..., n, the vector
$$c_j = \begin{pmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{pmatrix} \in \mathbb{R}^n$$
is known as the unit consumption vector of sector j. Note that the column sum
$$c_{1j} + \dots + c_{nj} \le 1 \qquad (5)$$
in order to ensure that sector j does not make a loss. Collecting together the unit consumption vectors, we obtain the matrix
$$C = (c_1 \ \dots \ c_n) = \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix},$$
known as the consumption matrix of the economy.
Consider the matrix product
$$Cx = \begin{pmatrix} c_{11}x_1 + \dots + c_{1n}x_n \\ \vdots \\ c_{n1}x_1 + \dots + c_{nn}x_n \end{pmatrix}.$$
For every i = 1, ..., n, the entry c_{i1}x_1 + ... + c_{in}x_n represents the monetary value of the output of sector i needed by all the sectors to produce their output. This leads to the production equation
$$x = Cx + d. \qquad (6)$$
Here Cx represents the part of the total output that is required by the various sectors of the economy to produce the output in the first place, and d represents the part of the total output that is available to satisfy outside demand.

Clearly (I − C)x = d. If the matrix I − C is invertible, then
$$x = (I - C)^{-1}d.$$
PROPOSITION 2S. Suppose that the entries of the consumption matrix C and the demand vector d are non-negative. Suppose further that the inequality (5) holds for each column of C. Then the inverse matrix (I − C)^{-1} exists, and the production vector x = (I − C)^{-1}d has non-negative entries and is the unique solution of the production equation (6).

Let us indulge in some heuristics. Initially, we have demand d. To produce d, we need Cd as input. To produce this extra Cd, we need C(Cd) = C^2d as input. To produce this extra C^2d, we need C(C^2d) = C^3d as input. And so on. Hence we need to produce
$$d + Cd + C^2d + C^3d + \dots = (I + C + C^2 + C^3 + \dots)d$$
in total. Now it is not difficult to check that for every positive integer k, we have
$$(I - C)(I + C + C^2 + C^3 + \dots + C^k) = I - C^{k+1}.$$
If the entries of C^{k+1} are all very small, then
$$(I - C)(I + C + C^2 + C^3 + \dots + C^k) \approx I,$$
so that
$$(I - C)^{-1} \approx I + C + C^2 + C^3 + \dots + C^k.$$
This gives a practical way of approximating (I − C)^{-1}, and also suggests that
$$(I - C)^{-1} = I + C + C^2 + C^3 + \dots.$$
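The heuristic series can be run numerically by accumulating the terms d, Cd, C²d, ... directly. This is an illustrative sketch, not part of the original notes; it uses the consumption matrix and demand vector of Example 2.8.1 below, and the helper name `mat_vec` is ours.

```python
# Sketch: approximating x = (I - C)^{-1} d by the partial sums
# d + Cd + C^2 d + ... (data from Example 2.8.1).
def mat_vec(C, v):
    return [sum(c * vi for c, vi in zip(row, v)) for row in C]

C = [[0.3, 0.2, 0.1], [0.4, 0.5, 0.2], [0.1, 0.1, 0.3]]  # consumption matrix
d = [30.0, 50.0, 20.0]                                   # demand vector

total, term = d[:], d[:]
for _ in range(200):              # each pass adds the next term C^k d
    term = mat_vec(C, term)
    total = [t + s for t, s in zip(total, term)]

print([round(t) for t in total])  # [119, 226, 78]
```

Since every column sum of C is at most 0.8, the terms C^k d shrink geometrically and the partial sums converge quickly to the exact production vector.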
Example 2.8.1. An economy consists of three sectors. Their dependence on each other is summarized in the table below:

                                                  To produce one unit of monetary
                                                  value of output in sector
                                                      1      2      3
monetary value of output required from sector 1      0.3    0.2    0.1
monetary value of output required from sector 2      0.4    0.5    0.2
monetary value of output required from sector 3      0.1    0.1    0.3

Suppose that the final demand from sectors 1, 2 and 3 are respectively 30, 50 and 20. Then the production vector and demand vector are respectively
$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad\text{and}\quad d = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} 30 \\ 50 \\ 20 \end{pmatrix},$$
while the consumption matrix is given by
$$C = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.2 \\ 0.1 & 0.1 & 0.3 \end{pmatrix}, \quad\text{so that}\quad I - C = \begin{pmatrix} 0.7 & -0.2 & -0.1 \\ -0.4 & 0.5 & -0.2 \\ -0.1 & -0.1 & 0.7 \end{pmatrix}.$$
The production equation (I − C)x = d has augmented matrix
$$\left(\begin{array}{ccc|c} 0.7 & -0.2 & -0.1 & 30 \\ -0.4 & 0.5 & -0.2 & 50 \\ -0.1 & -0.1 & 0.7 & 20 \end{array}\right), \quad\text{equivalent to}\quad \left(\begin{array}{ccc|c} 7 & -2 & -1 & 300 \\ -4 & 5 & -2 & 500 \\ -1 & -1 & 7 & 200 \end{array}\right),$$
and which can be converted to reduced row echelon form
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 3200/27 \\ 0 & 1 & 0 & 6100/27 \\ 0 & 0 & 1 & 700/9 \end{array}\right).$$
This gives x_1 ≈ 119, x_2 ≈ 226 and x_3 ≈ 78, to the nearest integers.
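The reduced row echelon computation above can be checked exactly with rational arithmetic. This is an illustrative sketch, not part of the original notes; the helper name `solve` is ours, and it picks the first non-zero pivot in each column rather than following the by-hand operation order.

```python
# Sketch: solving (I - C)x = d exactly with Fractions (Example 2.8.1).
# The helper name solve is ours; it performs Gauss-Jordan elimination,
# choosing the first non-zero pivot in each column.
from fractions import Fraction as F

def solve(M, b):
    n = len(M)
    A = [[F(x) for x in row] + [F(y)] for row, y in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

IminusC = [[F(7, 10), F(-2, 10), F(-1, 10)],
           [F(-4, 10), F(5, 10), F(-2, 10)],
           [F(-1, 10), F(-1, 10), F(7, 10)]]
x = solve(IminusC, [30, 50, 20])
assert x == [F(3200, 27), F(6100, 27), F(700, 9)]
```

The exact answers 3200/27, 6100/27 and 700/9 round to 119, 226 and 78 as stated.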
2.9. Matrix Transformation on the Plane
Let A be a 2×2 matrix with real entries. A matrix transformation T : R2 → R2 can be defined as follows: For everyx= (x1, x2)∈R, we writeT(x) =y, wherey= (y1, y2)∈R2 satisfies
y1 y2 =A x1 x2 .
Such a transformation is linear, in the sense thatT(x′+x′′) =T(x′) +T(x′) for everyx′,x′′∈R2 and
T(cx) =cT(x) for everyx∈R2 and everyc∈R. To see this, simply observe that
A
x′
1+x′′1
x′
2+x′′2
=A x′ 1 x′ 2 +A x′′ 1 x′′ 2 and A cx1 cx2 =cA x1 x2 .
We shall study linear transformations in greater detail in Chapter 8. Here we confine ourselves to looking at a few simple matrix transformations on the plane.
Example 2.9.1. The matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_1-axis, whereas the matrix
$$A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_2-axis. On the other hand, the matrix
$$A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the origin, whereas the matrix
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the line x_1 = x_2. We give a summary in the table below:

Transformation                 | Equations                  | Matrix
Reflection across x_1-axis     | y_1 = x_1, y_2 = −x_2      | $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
Reflection across x_2-axis     | y_1 = −x_1, y_2 = x_2      | $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$
Reflection across origin       | y_1 = −x_1, y_2 = −x_2     | $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
Reflection across x_1 = x_2    | y_1 = x_2, y_2 = x_1       | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
Example 2.9.2. Let k be a fixed positive real number. The matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ kx_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a dilation if k > 1 and a contraction if 0 < k < 1. On the other hand, the matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_1-direction if k > 1 and a compression in the x_1-direction if 0 < k < 1, whereas the matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_2-direction if k > 1 and a compression in the x_2-direction if 0 < k < 1. We give a summary in the table below:

Transformation                                          | Equations               | Matrix
Dilation or contraction by factor k > 0                 | y_1 = kx_1, y_2 = kx_2  | $\begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}$
Expansion or compression in x_1-direction by factor k > 0 | y_1 = kx_1, y_2 = x_2 | $\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}$
Expansion or compression in x_2-direction by factor k > 0 | y_1 = x_1, y_2 = kx_2 | $\begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}$
Example 2.9.3. Let k be a fixed real number. The matrix
$$A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + kx_2 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_1-direction.

[Figure: a set of points and their images under the shear with k = 1.]

[Figure: a set of points and their images under the shear with k = −1.]

Similarly, the matrix
$$A = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_1 + x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_2-direction. We give a summary in the table below:

Transformation           | Equations                     | Matrix
Shear in x_1-direction   | y_1 = x_1 + kx_2, y_2 = x_2   | $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$
Shear in x_2-direction   | y_1 = x_1, y_2 = kx_1 + x_2   | $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$
Example 2.9.4. For anticlockwise rotation by an angle θ, we have T(x_1, x_2) = (y_1, y_2), where
$$y_1 + \mathrm{i}y_2 = (x_1 + \mathrm{i}x_2)(\cos\theta + \mathrm{i}\sin\theta),$$
and so
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
It follows that the matrix in question is given by
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
We give a summary in the table below:

Transformation                    | Equations                                            | Matrix
Anticlockwise rotation by angle θ | y_1 = x_1 cos θ − x_2 sin θ, y_2 = x_1 sin θ + x_2 cos θ | $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$
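The rotation matrix is easy to exercise in code. This is an illustrative sketch, not part of the original notes; the helper name `rotate` is ours.

```python
# Sketch: anticlockwise rotation by angle theta, applied to a point (x1, x2).
# The helper name rotate is ours, not from the text.
import math

def rotate(theta, point):
    x1, x2 = point
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))

# Rotating (1, 0) anticlockwise by 90 degrees should give (0, 1),
# up to floating-point rounding.
y = rotate(math.pi / 2, (1.0, 0.0))
assert abs(y[0] - 0.0) < 1e-12 and abs(y[1] - 1.0) < 1e-12
```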
We conclude this section by establishing the following result which reinforces the linearity of matrix transformations on the plane.
PROPOSITION 2T. Suppose that a matrix transformation T : R^2 → R^2 is given by an invertible matrix A. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.

Proof. Suppose that T(x_1, x_2) = (y_1, y_2). Since A is invertible, we have x = A^{-1}y, where
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
The equation of a straight line is given by αx_1 + βx_2 = γ or, in matrix form, by
$$(\alpha \ \beta) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (\gamma).$$
Hence
$$(\alpha \ \beta)A^{-1} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\gamma).$$
Let
$$(\alpha' \ \beta') = (\alpha \ \beta)A^{-1}.$$
Then
$$(\alpha' \ \beta') \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\gamma).$$
In other words, the image under T of the straight line αx_1 + βx_2 = γ is α′y_1 + β′y_2 = γ, clearly another straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same values of α and β.
2.10. Application to Computer Graphics
Example 2.10.1. Consider the letter M in the diagram below:

[Figure: the letter M.]

Following the boundary in the anticlockwise direction starting at the origin, the 12 vertices can be represented by the coordinates
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 7 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 8 \end{pmatrix}, \begin{pmatrix} 7 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 8 \end{pmatrix}, \begin{pmatrix} 0 \\ 8 \end{pmatrix}.$$
Let us apply a matrix transformation to these vertices, using the matrix
$$A = \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix},$$
representing a shear in the x_1-direction with factor 0.5, so that
$$A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + \frac{1}{2}x_2 \\ x_2 \end{pmatrix}.$$
Then the images of the 12 vertices are respectively
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 10 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 12 \\ 8 \end{pmatrix}, \begin{pmatrix} 11 \\ 8 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 8 \end{pmatrix},$$
noting that
$$\begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 & 4 & 7 & 7 & 8 & 8 & 7 & 4 & 1 & 0 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 4 & 4 & 10 & 7 & 8 & 12 & 11 & 5 & 5 & 4 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix}.$$
In view of Proposition 2T, the image of any line segment that joins two vertices is a line segment that joins the images of the two vertices. Hence the image of the letter M under the shear looks like the following:

[Figure: the image of the letter M under the shear.]
Next, we may wish to translate this image. However, a translation by a vector h = (h1, h2) ∈ R2 is of the form

    [ y1 ] = [ x1 ] + [ h1 ]    for every (x1, x2) ∈ R2,
    [ y2 ]   [ x2 ]   [ h2 ]
and this cannot be described by a matrix transformation on the plane. To overcome this deficiency, we introduce homogeneous coordinates. For every point (x1, x2) ∈ R2, we identify it with the point (x1, x2, 1) ∈ R3. Now we wish to translate a point (x1, x2) to (x1, x2) + (h1, h2) = (x1 + h1, x2 + h2), so we attempt to find a 3×3 matrix A* such that

    [ x1 + h1 ]        [ x1 ]
    [ x2 + h2 ] =  A*  [ x2 ]    for every (x1, x2) ∈ R2.
    [    1    ]        [ 1  ]
It is easy to check that

    [ x1 + h1 ]   [ 1  0  h1 ] [ x1 ]
    [ x2 + h2 ] = [ 0  1  h2 ] [ x2 ]    for every (x1, x2) ∈ R2.
    [    1    ]   [ 0  0  1  ] [ 1  ]

It follows that, using homogeneous coordinates, translation by a vector h = (h1, h2) ∈ R2 can be described by the matrix

    A* = [ 1  0  h1 ]
         [ 0  1  h2 ]
         [ 0  0  1  ].
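This translation matrix is easy to try out; a short NumPy sketch, where the vector (h1, h2) = (2, 3) and the sample point are illustrative choices:

```python
import numpy as np

def translation_matrix(h1, h2):
    """3x3 matrix mapping (x1, x2, 1) to (x1 + h1, x2 + h2, 1)."""
    return np.array([[1.0, 0.0, h1],
                     [0.0, 1.0, h2],
                     [0.0, 0.0, 1.0]])

A_star = translation_matrix(2.0, 3.0)
p = np.array([5.0, -1.0, 1.0])   # the point (5, -1) in homogeneous coordinates
q = A_star @ p                   # the point (7, 2) in homogeneous coordinates
```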
Remark. Consider a matrix transformation T : R2 → R2 on the plane given by a matrix

    A = [ a11  a12 ]
        [ a21  a22 ].

Suppose that T(x1, x2) = (y1, y2). Then

    [ y1 ] = A [ x1 ] = [ a11  a12 ] [ x1 ]
    [ y2 ]     [ x2 ]   [ a21  a22 ] [ x2 ].
Under homogeneous coordinates, the image of the point (x1, x2, 1) is now (y1, y2, 1). Note that

    [ y1 ]   [ a11  a12  0 ] [ x1 ]
    [ y2 ] = [ a21  a22  0 ] [ x2 ].
    [ 1  ]   [  0    0   1 ] [ 1  ]

It follows that homogeneous coordinates can also be used to study all the matrix transformations we have discussed in Section 2.9. By moving over to homogeneous coordinates, we simply replace the 2×2 matrix A by the 3×3 matrix

    A* = [ A  0 ]
         [ 0  1 ].
Example 2.10.2. Returning to Example 2.10.1 of the letter M, the 12 vertices are now represented by homogeneous coordinates, put in an array in the form

    [ 0 1 1 4 7 7 8 8 7 4 1 0 ]
    [ 0 0 6 0 6 0 0 8 8 2 8 8 ]
    [ 1 1 1 1 1 1 1 1 1 1 1 1 ].

Then the 2×2 matrix

    A = [ 1  1/2 ]
        [ 0   1  ]

is now replaced by the 3×3 matrix

    A* = [ 1  1/2  0 ]
         [ 0   1   0 ]
         [ 0   0   1 ].

Note that

       [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 1  1/2  0 ] [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 0 1 4 4 10 7 8 12 11 5 5 4 ]
    A* [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0   1   0 ] [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0 0 6 0  6 0 0  8  8 2 8 8 ]
       [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 0   0   1 ] [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 1 1 1 1  1 1 1  1  1 1 1 1 ].
Next, let us consider a translation by the vector (2, 3). The matrix under homogeneous coordinates for this translation is given by

    B* = [ 1  0  2 ]
         [ 0  1  3 ]
         [ 0  0  1 ].

Note that

         [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 1  0  2 ] [ 0 1 4 4 10 7 8 12 11 5 5 4 ]   [ 2 3 6 6 12 9 10 14 13 7  7  6 ]
    B*A* [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0  1  3 ] [ 0 0 6 0  6 0 0  8  8 2 8 8 ] = [ 3 3 9 3  9 3  3 11 11 5 11 11 ]
         [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 0  0  1 ] [ 1 1 1 1  1 1 1  1  1 1 1 1 ]   [ 1 1 1 1  1 1  1  1  1 1  1  1 ],

giving rise to coordinates in R2, displayed as an array

    [ 2 3 6 6 12 9 10 14 13 7  7  6 ]
    [ 3 3 9 3  9 3  3 11 11 5 11 11 ].

Hence the image of the letter M under the shear followed by translation looks like the following:
Example 2.10.3. Under homogeneous coordinates, the transformation representing a reflection across the x1-axis, followed by a shear by factor 2 in the x1-direction, followed by anticlockwise rotation by 90°, and followed by translation by vector (2, -1), has matrix

    [ 1  0   2 ] [ 0  -1  0 ] [ 1  2  0 ] [ 1   0  0 ]   [ 0   1   2 ]
    [ 0  1  -1 ] [ 1   0  0 ] [ 0  1  0 ] [ 0  -1  0 ] = [ 1  -2  -1 ]
    [ 0  0   1 ] [ 0   0  1 ] [ 0  0  1 ] [ 0   0  1 ]   [ 0   0   1 ].
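The composite can be checked by multiplying the four matrices; note the order: the matrix of the transformation applied first sits rightmost in the product. A minimal NumPy sketch:

```python
import numpy as np

reflect   = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]])  # reflection across the x1-axis
shear     = np.array([[1, 2, 0], [0, 1, 0], [0, 0, 1]])   # shear by factor 2 in the x1-direction
rotate    = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])  # anticlockwise rotation by 90 degrees
translate = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])  # translation by (2, -1)

# Reflection is applied first, translation last, so multiply in reverse order
M = translate @ rotate @ shear @ reflect
# M equals [[0, 1, 2], [1, -2, -1], [0, 0, 1]], as in the text
```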
2.11. Complexity of a Non-Homogeneous System
One way of solving the system Ax = b is to write down the augmented matrix

    [ a11 ... a1n | b1 ]
    [  .       .  |  . ]                                                   (7)
    [ an1 ... ann | bn ],

and then convert it to reduced row echelon form by elementary row operations.
The first step is to reduce it to row echelon form:
(I) First of all, we may need to interchange two rows in order to ensure that the top left entry in the array is non-zero. This requires n + 1 operations.
(II) Next, we need to multiply the new first row by a constant in order to make the top left pivot entry equal to 1. This requires n + 1 operations, and the array now looks like

    [ 1    a12 ... a1n | b1 ]
    [ a21  a22 ... a2n | b2 ]
    [  .    .       .  |  . ]
    [ an1  an2 ... ann | bn ].
Note that we are abusing notation somewhat, as the entry a12 here, for example, may well be different from the entry a12 in the augmented matrix (7).
(III) For each row i = 2, ..., n, we now multiply the first row by -ai1 and then add to row i. This requires 2(n - 1)(n + 1) operations, and the array now looks like

    [ 1  a12 ... a1n | b1 ]
    [ 0  a22 ... a2n | b2 ]
    [ .   .       .  |  . ]                                                (8)
    [ 0  an2 ... ann | bn ].
(IV) In summary, to proceed from the form (7) to the form (8), the number of operations required is at most 2(n + 1) + 2(n - 1)(n + 1) = 2n(n + 1).
(V) Our next task is to convert the smaller array

    [ a22 ... a2n | b2 ]
    [  .       .  |  . ]
    [ an2 ... ann | bn ]

to an array that looks like

    [ 1  a23 ... a2n | b2 ]
    [ 0  a33 ... a3n | b3 ]
    [ .   .       .  |  . ]
    [ 0  an3 ... ann | bn ].
These have one row and one column fewer than the arrays (7) and (8), and the number of operations required is at most 2m(m + 1), where m = n - 1. We continue in this way systematically to reach row echelon form, and conclude that the number of operations required to convert the augmented matrix (7) to row echelon form is at most

    sum_{m=1}^{n} 2m(m + 1) ≈ (2/3) n^3.

The next step is to convert the row echelon form to reduced row echelon form. This is simpler, as many entries are now zero. It can be shown that the number of operations required is bounded by something like 2n^2 -- indeed, by something like n^2 if one analyzes the problem more carefully. In any case, these estimates are insignificant compared to the estimate (2/3) n^3 earlier.
We therefore conclude that the number of operations required to solve the system Ax = b by reducing the augmented matrix to reduced row echelon form is bounded by something like (2/3) n^3 when n is large.
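The growth rate can be observed empirically by counting arithmetic operations while performing the elimination. A rough Python sketch, under the same accounting as steps (I)-(V) (one operation per scaling step, two per multiply-and-add; the diagonal boost just avoids the need for row interchanges):

```python
import random

def forward_elimination_ops(n):
    """Count operations reducing a random n x (n+1) augmented array to row
    echelon form, charging 1 operation per scaling and 2 per multiply-and-add."""
    a = [[random.random() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n          # make diagonally dominant: no interchanges needed
    ops = 0
    for k in range(n):
        pivot = a[k][k]
        for j in range(k, n + 1):          # scale row k so the pivot is 1
            a[k][j] /= pivot
            ops += 1
        for i in range(k + 1, n):          # eliminate entries below the pivot
            factor = a[i][k]
            for j in range(k, n + 1):
                a[i][j] -= factor * a[k][j]
                ops += 2
    return ops

for n in (10, 20, 40):
    print(n, forward_elimination_ops(n) / (2 * n**3 / 3))  # ratio tends to 1
```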
Another way of solving the system Ax = b is to first find the inverse matrix A^{-1}. This may involve converting the array

    [ a11 ... a1n | 1        ]
    [  .       .  |    .     ]
    [ an1 ... ann |        1 ]

to reduced row echelon form by elementary row operations. It can be shown that the number of operations required is something like 2n^3, so this is less efficient than our first method.
2.12. Matrix Factorization
In some situations, we may need to solve systems of linear equations of the form Ax = b, with the same coefficient matrix A but for many different vectors b. If A is an invertible square matrix, then we can find its inverse A^{-1} and then compute A^{-1}b for each vector b. However, the matrix A may not be a square matrix, and we may have to convert the augmented matrix to reduced row echelon form.
In this section, we describe a more efficient way of solving this problem. To describe it, we first need a definition.
Definition. A rectangular array of numbers is said to be in quasi row echelon form if the following
conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row is called a pivot entry. It is not necessary for its value to be equal to 1.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array.
In other words, the array looks like row echelon form in shape, except that the pivot entries do not have to be equal to 1.
We consider first of all a special case.
PROPOSITION 2U. Suppose that an m×n matrix A can be converted to quasi row echelon form by elementary row operations but without interchanging any two rows. Then A = LU, where L is an m×m lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A.
Sketch of Proof. Recall that applying an elementary row operation to an m×n matrix corresponds to multiplying it on the left by an elementary matrix. For the operation of adding a multiple of a row higher in the array to a row lower in the array, this elementary matrix is lower triangular with diagonal entries all equal to 1; we call such elementary matrices unit lower triangular. If an m×n matrix A can be reduced in this way to quasi row echelon form U, then

    U = Ek . . . E2E1A,

where the elementary matrices E1, E2, . . . , Ek are all unit lower triangular. Let L = (Ek . . . E2E1)^{-1}. Then A = LU. It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. Hence L is a unit lower triangular matrix as required.
If Ax = b and A = LU, then L(Ux) = b. Writing y = Ux, we have

    Ly = b    and    Ux = y.

It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b and then solving the system Ux = y. Both of these systems are easy to solve, since both L and U have many zero entries. It remains to find L and U.
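For a square invertible A, the two triangular solves can each be written in a few lines. A minimal Python sketch using plain lists (the 2×2 example data is an illustrative choice, not from the text):

```python
def solve_lower_unit(L, b):
    """Forward substitution for Ly = b, with L unit lower triangular."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def solve_upper(U, y):
    """Back substitution for Ux = y, with U upper triangular, nonzero diagonal."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Illustrative factors of A = LU = [[2, 1], [6, 7]]
L = [[1.0, 0.0], [3.0, 1.0]]
U = [[2.0, 1.0], [0.0, 4.0]]
b = [4.0, 16.0]
y = solve_lower_unit(L, b)   # y = [4.0, 4.0]
x = solve_upper(U, y)        # x = [1.5, 1.0], and indeed Ax = b
```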
If we reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array, then U can be taken as the quasi row echelon form resulting from this. It remains to find L. However, note that L = (Ek . . . E2E1)^{-1}, where U = Ek . . . E2E1A, and so

    I = Ek . . . E2E1L.

This means that the very elementary row operations that convert A to U will convert L to I. We therefore wish to create a matrix L such that this is satisfied. It is simplest to illustrate the technique by an example.
Example 2.12.1. Consider the matrix

    A = [ 2   -1    2   -2    3 ]
        [ 4    1    6   -5    8 ]
        [ 2  -10   -4    8   -5 ]
        [ 2  -13   -6   16   -5 ].

The entry 2 in row 1 and column 1 is a pivot entry, and column 1 is a pivot column. Adding -2 times row 1 to row 2, adding -1 times row 1 to row 3, and adding -1 times row 1 to row 4, we obtain

    [ 2   -1    2   -2    3 ]
    [ 0    3    2   -1    2 ]
    [ 0   -9   -6   10   -8 ]
    [ 0  -12   -8   18   -8 ].

Note that the same three elementary row operations convert

    [ 1  0  0  0 ]        [ 1  0  0  0 ]
    [ 2  1  0  0 ]   to   [ 0  1  0  0 ]
    [ 1  *  1  0 ]        [ 0  *  1  0 ]
    [ 1  *  *  1 ]        [ 0  *  *  1 ].
Next, the entry 3 in row 2 and column 2 is a pivot entry, and column 2 is a pivot column. Adding 3 times row 2 to row 3, and adding 4 times row 2 to row 4, we obtain

    [ 2  -1   2  -2   3 ]
    [ 0   3   2  -1   2 ]
    [ 0   0   0   7  -2 ]
    [ 0   0   0  14   0 ].

Note that the same two elementary row operations convert

    [ 1   0  0  0 ]        [ 1  0  0  0 ]
    [ 0   1  0  0 ]   to   [ 0  1  0  0 ]
    [ 0  -3  1  0 ]        [ 0  0  1  0 ]
    [ 0  -4  *  1 ]        [ 0  0  *  1 ].
Next, the entry 7 in row 3 and column 4 is a pivot entry, and column 4 is a pivot column. Adding -2 times row 3 to row 4, we obtain the quasi row echelon form

    U = [ 2  -1   2  -2   3 ]
        [ 0   3   2  -1   2 ]
        [ 0   0   0   7  -2 ]
        [ 0   0   0   0   4 ],

where the entry 4 in row 4 and column 5 is a pivot entry, and column 5 is a pivot column. Note that the same elementary row operation converts

    [ 1  0  0  0 ]        [ 1  0  0  0 ]
    [ 0  1  0  0 ]   to   [ 0  1  0  0 ]
    [ 0  0  1  0 ]        [ 0  0  1  0 ]
    [ 0  0  2  1 ]        [ 0  0  0  1 ].
Now observe that if we take

    L = [ 1   0  0  0 ]
        [ 2   1  0  0 ]
        [ 1  -3  1  0 ]
        [ 1  -4  2  1 ],

then L can be converted to I4 by the same elementary row operations that convert A to U.
The strategy is now clear. Every time we find a new pivot, we note its value and the entries below it. The lower triangular entries of L are formed by these columns, with each column divided by the value of the pivot entry in that column.
Example 2.12.2. Let us examine our last example again. The pivot columns at the time of establishing the pivot entries are respectively

    [ 2 ]    [   * ]    [  * ]    [ * ]
    [ 4 ]    [   3 ]    [  * ]    [ * ]
    [ 2 ],   [  -9 ],   [  7 ],   [ * ]
    [ 2 ]    [ -12 ]    [ 14 ]    [ 4 ].

Dividing them respectively by the pivot entries 2, 3, 7 and 4, we obtain respectively the columns

    [ 1 ]    [  * ]    [ * ]    [ * ]
    [ 2 ]    [  1 ]    [ * ]    [ * ]
    [ 1 ],   [ -3 ],   [ 1 ],   [ * ]
    [ 1 ]    [ -4 ]    [ 2 ]    [ 1 ].

Note that the lower triangular entries of the matrix

    L = [ 1   0  0  0 ]
        [ 2   1  0  0 ]
        [ 1  -3  1  0 ]
        [ 1  -4  2  1 ]

are precisely given by these columns.
LU FACTORIZATION ALGORITHM.
(1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. Let U be the quasi row echelon form obtained.
(2) Record any new pivot column at the time of its first recognition, and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry.
(3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modified in step (2).
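For a square matrix whose elimination needs no row interchanges and has a pivot in every column, the algorithm can be sketched in a few lines of Python. This is an illustrative implementation with plain lists and no pivoting, not a robust one; the example data is the top-left 2×2 block of the matrix in Example 2.12.1:

```python
def lu_factorize(A):
    """Return (L, U) with A = LU, L unit lower triangular, U upper triangular.
    Assumes a square matrix needing no row interchanges (no pivoting done)."""
    n = len(A)
    U = [row[:] for row in A]                      # working copy, becomes U
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]                  # entry below pivot / pivot value
            L[i][k] = m                            # records the modified pivot column
            for j in range(k, n):
                U[i][j] -= m * U[k][j]             # add -m times row k to row i
    return L, U

A = [[2.0, -1.0],
     [4.0,  1.0]]
L, U = lu_factorize(A)
# L = [[1.0, 0.0], [2.0, 1.0]] and U = [[2.0, -1.0], [0.0, 3.0]]
```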
Example 2.12.3. We wish to solve the system of linear equations Ax = b, where

    A = [  3  -1    2   -4   1 ]               [   1 ]
        [ -3   3   -5    5  -2 ]    and    b = [  -2 ]
        [  6  -4   11  -10   6 ]               [   9 ]
        [ -6   8  -21   13  -9 ]               [ -15 ].

Let us first apply LU factorization to the matrix A. The first pivot column is column 1, with modified version

    [  1 ]
    [ -1 ]
    [  2 ]
    [ -2 ].
Adding row 1 to row 2, adding -2 times row 1 to row 3, and adding 2 times row 1 to row 4, we obtain

    [ 3  -1    2  -4   1 ]
    [ 0   2   -3   1  -1 ]
    [ 0  -2    7  -2   4 ]
    [ 0   6  -17   5  -7 ].

The second pivot column is column 2, with modified version

    [  0 ]
    [  1 ]
    [ -1 ]
    [  3 ].
Adding row 2 to row 3, and adding -3 times row 2 to row 4, we obtain

    [ 3  -1   2  -4   1 ]
    [ 0   2  -3   1  -1 ]
    [ 0   0   4  -1   3 ]
    [ 0   0  -8   2  -4 ].

The third pivot column is column 3, with modified version

    [  0 ]
    [  0 ]
    [  1 ]
    [ -2 ].
Adding 2 times row 3 to row 4, we obtain the quasi row echelon form

    [ 3  -1   2  -4   1 ]
    [ 0   2  -3   1  -1 ]
    [ 0   0   4  -1   3 ]
    [ 0   0   0   0   2 ].

The last pivot column is column 5, with modified version

    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 1 ].

It follows that

    L = [  1   0   0  0 ]               [ 3  -1   2  -4   1 ]
        [ -1   1   0  0 ]    and    U = [ 0   2  -3   1  -1 ]
        [  2  -1   1  0 ]               [ 0   0   4  -1   3 ]
        [ -2   3  -2  1 ]               [ 0   0   0   0   2 ].
We now consider the system Ly = b, with augmented matrix

    [  1   0   0  0 |   1 ]
    [ -1   1   0  0 |  -2 ]
    [  2  -1   1  0 |   9 ]
    [ -2   3  -2  1 | -15 ].

Using row 1, we obtain y1 = 1. Using row 2, we obtain y2 - y1 = -2, so that y2 = -1. Using row 3, we obtain y3 + 2y1 - y2 = 9, so that y3 = 6. Using row 4, we obtain y4 - 2y1 + 3y2 - 2y3 = -15, so that y4 = 2. Hence

    y = [  1 ]
        [ -1 ]
        [  6 ]
        [  2 ].
We next consider the system Ux = y, with augmented matrix

    [ 3  -1   2  -4   1 |  1 ]
    [ 0   2  -3   1  -1 | -1 ]
    [ 0   0   4  -1   3 |  6 ]
    [ 0   0   0   0   2 |  2 ].

Here the free variable is x4. Let x4 = t. Using row 4, we obtain 2x5 = 2, so that x5 = 1. Using row 3, we obtain 4x3 = 6 + x4 - 3x5 = 3 + t, so that x3 = 3/4 + t/4. Using row 2, we obtain

    2x2 = -1 + 3x3 - x4 + x5 = 9/4 - t/4,

so that x2 = 9/8 - t/8. Using row 1, we obtain 3x1 = 1 + x2 - 2x3 + 4x4 - x5 = 27t/8 - 3/8, so that x1 = 9t/8 - 1/8. Hence

    (x1, x2, x3, x4, x5) = ( (9t-1)/8, (9-t)/8, (3+t)/4, t, 1 ),    where t ∈ R.
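One can spot-check the parametric solution by substituting a value of t back into Ax = b; a quick NumPy sketch (the choice t = 1 is arbitrary):

```python
import numpy as np

A = np.array([[ 3, -1,   2,  -4,  1],
              [-3,  3,  -5,   5, -2],
              [ 6, -4,  11, -10,  6],
              [-6,  8, -21,  13, -9]], dtype=float)
b = np.array([1, -2, 9, -15], dtype=float)

t = 1.0
x = np.array([(9*t - 1)/8, (9 - t)/8, (3 + t)/4, t, 1.0])
print(np.allclose(A @ x, b))  # True
```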
Remarks. (1) In practical situations, interchanging rows is usually necessary to convert a matrix A to quasi row echelon form. The technique here can be modified to produce a matrix L which is not unit lower triangular, but which can be made unit lower triangular by interchanging rows.
(2) Computing an LU factorization of an n×n matrix takes approximately (2/3) n^3 operations. Solving the systems Ly = b and Ux = y requires approximately 2n^2 operations.
2.13. Application to Games of Strategy
Consider a game with two players. Player R, usually known as the row player, has m possible moves, denoted by i = 1, 2, 3, . . . , m, while player C, usually known as the column player, has n possible moves, denoted by j = 1, 2, 3, . . . , n. For every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, let aij denote the payoff that player C has to make to player R if player R makes move i and player C makes move j. These numbers give rise to the payoff matrix

    A = [ a11 ... a1n ]
        [  .       .  ]
        [ am1 ... amn ].

The entries can be positive, negative or zero.
Suppose that for every i = 1, 2, 3, . . . , m, player R makes move i with probability pi, and that for every j = 1, 2, 3, . . . , n, player C makes move j with probability qj. Then

    p1 + . . . + pm = 1    and    q1 + . . . + qn = 1.

Assume that the players make moves independently of each other. Then for every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, the number piqj represents the probability that player R makes move i and player C makes move j. Then the double sum

    EA(p, q) = sum_{i=1}^{m} sum_{j=1}^{n} aij pi qj

represents the expected payoff that player C has to make to player R.
The matrices

    p = ( p1 . . . pm )    and    q = [ q1 ]
                                      [  . ]
                                      [ qn ]

are known as the strategies of player R and player C respectively. Clearly the expected payoff

    EA(p, q) = sum_{i=1}^{m} sum_{j=1}^{n} aij pi qj = ( p1 . . . pm ) [ a11 ... a1n ] [ q1 ]
                                                                      [  .       .  ] [  . ] = pAq.
                                                                      [ am1 ... amn ] [ qn ]

Here we have slightly abused notation: the right hand side is a 1×1 matrix!
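The identity EA(p, q) = pAq is one line of matrix arithmetic. A small NumPy sketch, in which the payoff matrix and both strategies are illustrative choices:

```python
import numpy as np

A = np.array([[ 2, -1],
              [-3,  4]], dtype=float)  # illustrative payoff matrix
p = np.array([0.5, 0.5])               # row player's strategy, entries sum to 1
q = np.array([0.25, 0.75])             # column player's strategy, entries sum to 1

# The double sum and the matrix product pAq agree
double_sum = sum(A[i, j] * p[i] * q[j] for i in range(2) for j in range(2))
print(double_sum, p @ A @ q)  # both equal 1.0 for this data
```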
We now consider the following problem: Suppose that A is fixed. Is it possible for player R to choose a strategy p to try to maximize the expected payoff EA(p, q)? Is it possible for player C to choose a strategy q to try to minimize the expected payoff EA(p, q)?
FUNDAMENTAL THEOREM OF ZERO SUM GAMES. There exist strategies p* and q* such that

    EA(p*, q) ≥ EA(p*, q*) ≥ EA(p, q*)

for every strategy p of player R and every strategy q of player C.
Remark. The strategy p* is known as an optimal strategy for player R, and the strategy q* is known as an optimal strategy for player C. The quantity EA(p*, q*) is known as the value of the game. Optimal strategies are not necessarily unique. However, if p** and q** are another pair of optimal strategies,