W W L CHEN

© W W L Chen, 1982, 2008.
This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.
It is available free to all individuals, on the understanding that it is not to be used for financial gain,
and may be downloaded and/or photocopied, with or without permission from the author.
However, this document may not be kept on any information storage and retrieval system without permission
from the author, unless such system is not accessible to any individuals other than its owners.
Chapter 2
MATRICES
2.1. Introduction
A rectangular array of numbers of the form
$$\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \qquad (1)$$
is called an m×n matrix, with m rows and n columns. We count rows from the top and columns from the left. Hence
$$( a_{i1} \ \dots \ a_{in} ) \quad\text{and}\quad \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}$$
represent respectively the i-th row and the j-th column of the matrix (1), and a_{ij} represents the entry in the matrix (1) on the i-th row and j-th column.
Example 2.1.1. Consider the 3×4 matrix
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}.$$
Here
$$( 3 \ 1 \ 5 \ 2 ) \quad\text{and}\quad \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$$
represent respectively the 2-nd row and the 3-rd column of the matrix, and 5 represents the entry in the matrix on the 2-nd row and 3-rd column.
We now consider the question of arithmetic involving matrices. First of all, let us study the problem of addition. A reasonable theory can be derived from the following definition.
Definition. Suppose that the two matrices
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \dots & b_{mn} \end{pmatrix}$$
both have m rows and n columns. Then we write
$$A + B = \begin{pmatrix} a_{11}+b_{11} & \dots & a_{1n}+b_{1n} \\ \vdots & & \vdots \\ a_{m1}+b_{m1} & \dots & a_{mn}+b_{mn} \end{pmatrix}$$
and call this the sum of the two matrices A and B.
Example 2.1.2. Suppose that
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 2 & -2 & 7 \\ 0 & 2 & 4 & -1 \\ -2 & 1 & 3 & 3 \end{pmatrix}.$$
Then
$$A + B = \begin{pmatrix} 2+1 & 4+2 & 3-2 & -1+7 \\ 3+0 & 1+2 & 5+4 & 2-1 \\ -1-2 & 0+1 & 7+3 & 6+3 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 1 & 6 \\ 3 & 3 & 9 & 1 \\ -3 & 1 & 10 & 9 \end{pmatrix}.$$
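The entry-wise rule above is easy to express in code. The following is an illustrative Python sketch, not part of the original notes; the helper name `mat_add` and the list-of-rows representation are our own choices.

```python
# Sketch: entry-wise matrix addition; matrices are lists of rows.
# The helper name mat_add is ours, not from the text.
def mat_add(A, B):
    # A and B must have the same shape (same m and n).
    assert len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]
B = [[1, 2, -2, 7], [0, 2, 4, -1], [-2, 1, 3, 3]]
print(mat_add(A, B))  # [[3, 6, 1, 6], [3, 3, 9, 1], [-3, 1, 10, 9]]
```

The result agrees with the sum computed by hand in Example 2.1.2.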
Example 2.1.3. We do not have a definition for "adding" the matrices
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ -1 & 0 & 7 \end{pmatrix}.$$
PROPOSITION 2A. (MATRIX ADDITION) Suppose that A, B, C are m×n matrices. Suppose further that O represents the m×n matrix with all entries zero. Then
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A; and
(d) there is an m×n matrix A′ such that A + A′ = O.
Proof. Parts (a)–(c) are easy consequences of ordinary addition, as matrix addition is simply entry-wise addition. For part (d), we can consider the matrix A′ obtained from A by multiplying each entry of A by −1.
The theory of multiplication is rather more complicated, and includes multiplication of a matrix by a scalar as well as multiplication of two matrices.
Definition. Suppose that the matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$$
has m rows and n columns, and that c ∈ R. Then we write
$$cA = \begin{pmatrix} ca_{11} & \dots & ca_{1n} \\ \vdots & & \vdots \\ ca_{m1} & \dots & ca_{mn} \end{pmatrix}$$
and call this the product of the matrix A by the scalar c.
Example 2.1.4. Suppose that
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix}.$$
Then
$$2A = \begin{pmatrix} 4 & 8 & 6 & -2 \\ 6 & 2 & 10 & 4 \\ -2 & 0 & 14 & 12 \end{pmatrix}.$$
PROPOSITION 2B. (MULTIPLICATION BY SCALAR) Suppose that A, B are m×n matrices, and that c, d ∈ R. Suppose further that O represents the m×n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) 0A = O; and
(d) c(dA) = (cd)A.
Proof. These are all easy consequences of ordinary multiplication, as multiplication by the scalar c is simply entry-wise multiplication by the number c.
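The entry-wise description also makes Proposition 2B easy to check numerically. This is an illustrative sketch, not part of the original notes; the helper name `scal_mul` is ours.

```python
# Sketch: scalar multiplication of a matrix, entry by entry.
# The helper name scal_mul is ours, not from the text.
def scal_mul(c, A):
    return [[c * a for a in row] for row in A]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]
print(scal_mul(2, A))  # [[4, 8, 6, -2], [6, 2, 10, 4], [-2, 0, 14, 12]]

# Part (d) of Proposition 2B: c(dA) = (cd)A, here with c = 3 and d = 5.
print(scal_mul(3, scal_mul(5, A)) == scal_mul(15, A))  # True
```

The first output reproduces 2A from Example 2.1.4.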
The question of multiplication of two matrices is rather more complicated. To motivate this, let us consider the representation of a system of linear equations
$$\begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1, \\ &\ \ \vdots \\ a_{m1}x_1 + \dots + a_{mn}x_n &= b_m, \end{aligned} \qquad (2)$$
in the form Ax = b, where
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \qquad (3)$$
represent the coefficients and
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \qquad (4)$$
represents the variables. This can be written in full matrix notation by
$$\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$
Can you work out the meaning of this representation?
Now let us define matrix multiplication more formally.
Definition. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & \dots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \dots & b_{np} \end{pmatrix}$$
are respectively an m×n matrix and an n×p matrix. Then the matrix product AB is given by the m×p matrix
$$AB = \begin{pmatrix} q_{11} & \dots & q_{1p} \\ \vdots & & \vdots \\ q_{m1} & \dots & q_{mp} \end{pmatrix},$$
where for every i = 1, ..., m and j = 1, ..., p, we have
$$q_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{i1}b_{1j} + \dots + a_{in}b_{nj}.$$

Remark. Note first of all that the number of columns of the first matrix must be equal to the number of rows of the second matrix. On the other hand, for a simple way to work out q_{ij}, the entry in the i-th row and j-th column of AB, we observe that the i-th row of A and the j-th column of B are respectively
$$( a_{i1} \ \dots \ a_{in} ) \quad\text{and}\quad \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix}.$$
We now multiply the corresponding entries – from a_{i1} with b_{1j}, and so on, until a_{in} with b_{nj} – and then add these products to obtain q_{ij}.
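The row-times-column rule translates directly into code. This is an illustrative Python sketch, not part of the original notes; the helper name `mat_mul` is ours, and it uses the matrices of Example 2.1.5 below as a check.

```python
# Sketch: matrix product via q_ij = sum_k a_ik * b_kj.
# The helper name mat_mul is ours, not from the text.
def mat_mul(A, B):
    n = len(B)                                # rows of B
    assert all(len(row) == n for row in A)    # columns of A must equal rows of B
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]   # 3x4
B = [[1, 4], [2, 3], [0, -2], [3, 1]]              # 4x2
print(mat_mul(A, B))  # [[7, 13], [11, 7], [17, -12]]
```

The 3×2 result matches the product AB worked out entry by entry in Example 2.1.5.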
Example 2.1.5. Consider the matrices
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}.$$
Note that A is a 3×4 matrix and B is a 4×2 matrix, so that the product AB is a 3×2 matrix. Let us calculate the product
$$AB = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{pmatrix}.$$
Consider first of all q_{11}. To calculate this, we need the 1-st row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} q_{11} & \times \\ \times & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{11} = 2 \cdot 1 + 4 \cdot 2 + 3 \cdot 0 + (-1) \cdot 3 = 2 + 8 + 0 - 3 = 7.$$
Consider next q_{12}. To calculate this, we need the 1-st row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} 2 & 4 & 3 & -1 \\ \times & \times & \times & \times \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & q_{12} \\ \times & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{12} = 2 \cdot 4 + 4 \cdot 3 + 3 \cdot (-2) + (-1) \cdot 1 = 8 + 12 - 6 - 1 = 13.$$
Consider next q_{21}. To calculate this, we need the 2-nd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ 3 & 1 & 5 & 2 \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} \times & \times \\ q_{21} & \times \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{21} = 3 \cdot 1 + 1 \cdot 2 + 5 \cdot 0 + 2 \cdot 3 = 3 + 2 + 0 + 6 = 11.$$
Consider next q_{22}. To calculate this, we need the 2-nd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ 3 & 1 & 5 & 2 \\ \times & \times & \times & \times \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & q_{22} \\ \times & \times \end{pmatrix}.$$
From the definition, we have
$$q_{22} = 3 \cdot 4 + 1 \cdot 3 + 5 \cdot (-2) + 2 \cdot 1 = 12 + 3 - 10 + 2 = 7.$$
Consider next q_{31}. To calculate this, we need the 3-rd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & \times \\ 2 & \times \\ 0 & \times \\ 3 & \times \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & \times \\ q_{31} & \times \end{pmatrix}.$$
From the definition, we have
$$q_{31} = (-1) \cdot 1 + 0 \cdot 2 + 7 \cdot 0 + 6 \cdot 3 = -1 + 0 + 0 + 18 = 17.$$
Consider finally q_{32}. To calculate this, we need the 3-rd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that
$$\begin{pmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} \times & 4 \\ \times & 3 \\ \times & -2 \\ \times & 1 \end{pmatrix} = \begin{pmatrix} \times & \times \\ \times & \times \\ \times & q_{32} \end{pmatrix}.$$
From the definition, we have
$$q_{32} = (-1) \cdot 4 + 0 \cdot 3 + 7 \cdot (-2) + 6 \cdot 1 = -4 + 0 - 14 + 6 = -12.$$
We therefore conclude that
$$AB = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 7 & 13 \\ 11 & 7 \\ 17 & -12 \end{pmatrix}.$$
Example 2.1.6. Consider again the matrices
$$A = \begin{pmatrix} 2 & 4 & 3 & -1 \\ 3 & 1 & 5 & 2 \\ -1 & 0 & 7 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 4 \\ 2 & 3 \\ 0 & -2 \\ 3 & 1 \end{pmatrix}.$$
Note that B is a 4×2 matrix and A is a 3×4 matrix, so that we do not have a definition for the "product" BA.
We leave the proofs of the following results as exercises for the interested reader.
PROPOSITION 2C. (ASSOCIATIVE LAW) Suppose that A is an m×n matrix, B is an n×p matrix and C is a p×r matrix. Then A(BC) = (AB)C.

PROPOSITION 2D. (DISTRIBUTIVE LAWS)
(a) Suppose that A is an m×n matrix and B and C are n×p matrices. Then A(B + C) = AB + AC.
(b) Suppose that A and B are m×n matrices and C is an n×p matrix. Then (A + B)C = AC + BC.

PROPOSITION 2E. Suppose that A is an m×n matrix, B is an n×p matrix, and that c ∈ R. Then c(AB) = (cA)B = A(cB).
2.2. Systems of Linear Equations
Note that the system (2) of linear equations can be written in matrix form as
$$Ax = b,$$
where the matrices A, x and b are given by (3) and (4). In this section, we shall establish the following important result.

PROPOSITION 2F. The system (2) of linear equations has either no solution, exactly one solution, or infinitely many solutions.

Proof. Clearly the system (2) has either no solution, exactly one solution, or more than one solution. It remains to show that if the system (2) has two distinct solutions, then it must have infinitely many solutions. Suppose that x = u and x = v represent two distinct solutions. Then
$$Au = b \quad\text{and}\quad Av = b,$$
so that
$$A(u - v) = Au - Av = b - b = 0,$$
where 0 is the zero m×1 matrix. It now follows that for every c ∈ R, we have
$$A(u + c(u - v)) = Au + A(c(u - v)) = Au + c(A(u - v)) = b + c0 = b,$$
so that x = u + c(u − v) is a solution for every c ∈ R. Clearly we have infinitely many solutions.
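The argument above can be watched in action numerically. This is an illustrative sketch, not part of the original notes; the singular matrix, the two solutions u and v, and the helper name `mat_vec` are our own choices.

```python
# Sketch: if u and v both solve Ax = b, then so does u + c(u - v) for every c.
# The matrix A, vectors u, v, and helper name mat_vec are our own examples.
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1], [2, 2]]        # a singular 2x2 matrix, so solutions are not unique
b = [3, 6]
u, v = [1, 2], [2, 1]       # two distinct solutions of Ax = b
for c in [0, 1, -2, 10]:
    w = [ui + c * (ui - vi) for ui, vi in zip(u, v)]
    assert mat_vec(A, w) == b
print("u + c(u - v) solves Ax = b for every c tested")
```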
2.3. Inversion of Matrices
For the remainder of this chapter, we shall deal with square matrices, those where the number of rows equals the number of columns.
Definition. The n×n matrix
$$I_n = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
where
$$a_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$
is called the identity matrix of order n.

Remark. Note that
$$I_1 = ( 1 ) \quad\text{and}\quad I_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The following result is relatively easy to check. It shows that the identity matrix I_n acts as the identity for multiplication of n×n matrices.

PROPOSITION 2G. For every n×n matrix A, we have AI_n = I_nA = A.

This raises the following question: Given an n×n matrix A, is it possible to find another n×n matrix B such that AB = BA = I_n?

Definition. An n×n matrix A is said to be invertible if there exists an n×n matrix B such that AB = BA = I_n. In this case, we say that B is the inverse of A and write B = A^{-1}.
PROPOSITION 2H. Suppose that A is an invertible n×n matrix. Then its inverse A^{-1} is unique.

Proof. Suppose that B satisfies the requirements for being the inverse of A. Then AB = BA = I_n. It follows that
$$A^{-1} = A^{-1}I_n = A^{-1}(AB) = (A^{-1}A)B = I_nB = B.$$
Hence the inverse A^{-1} is unique.

PROPOSITION 2J. Suppose that A and B are invertible n×n matrices. Then (AB)^{-1} = B^{-1}A^{-1}.

Proof. In view of the uniqueness of the inverse, it is sufficient to show that B^{-1}A^{-1} satisfies the requirements for being the inverse of AB. Note that
$$(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(I_nA^{-1}) = AA^{-1} = I_n$$
and
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}(AB)) = B^{-1}((A^{-1}A)B) = B^{-1}(I_nB) = B^{-1}B = I_n$$
as required.

PROPOSITION 2K. Suppose that A is an invertible n×n matrix. Then (A^{-1})^{-1} = A.

Proof. Note that both (A^{-1})^{-1} and A satisfy the requirements for being the inverse of A^{-1}. Equality follows from the uniqueness of the inverse.
2.4. Application to Matrix Multiplication
In this section, we shall discuss an application of invertible matrices. Detailed discussion of the technique involved will be covered in Chapter 7.
Definition. An n×n matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
where a_{ij} = 0 whenever i ≠ j, is called a diagonal matrix of order n.

Example 2.4.1. The 3×3 matrices
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
are both diagonal.

Given an n×n matrix A, it is usually rather complicated to calculate
$$A^k = \underbrace{A \dots A}_{k}.$$

Example 2.4.2. Consider the 3×3 matrix
$$A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix}.$$
Suppose that we wish to calculate A^{98}. It can be checked that if we take
$$P = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix},$$
then
$$P^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}.$$
Furthermore, if we write
$$D = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$
then it can be checked that A = PDP^{-1}, so that
$$A^{98} = \underbrace{(PDP^{-1}) \dots (PDP^{-1})}_{98} = PD^{98}P^{-1} = P \begin{pmatrix} 3^{98} & 0 & 0 \\ 0 & 2^{98} & 0 \\ 0 & 0 & 2^{98} \end{pmatrix} P^{-1}.$$
This is much simpler than calculating A^{98} directly. Note that this example is only an illustration. We have not discussed here how the matrices P and D are found.
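We can verify the factorization A = PDP^{-1} of Example 2.4.2 and use it to compute high powers. This is an illustrative sketch, not part of the original notes; it uses exact rational arithmetic via `fractions.Fraction` so the entries of P^{-1} stay exact, and the helper name `mat_mul` is ours.

```python
# Sketch: computing A^k via A = P D P^{-1} (Example 2.4.2), with exact
# Fractions so the entries 4/3 and -5/3 of P^{-1} introduce no rounding.
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P    = [[1, 1, 2], [3, 0, 3], [-2, 3, 0]]
Pinv = [[-3, 2, 1], [-2, F(4, 3), 1], [3, F(-5, 3), -1]]
D    = [[-3, 0, 0], [0, 2, 0], [0, 0, 2]]

# Check the factorization reproduces A.
A = mat_mul(mat_mul(P, D), Pinv)
assert A == [[17, -10, -5], [45, -28, -15], [-30, 20, 12]]

# A^98 = P D^98 P^{-1}, where D^98 is diagonal with entries 3^98, 2^98, 2^98.
k = 98
Dk = [[(-3)**k, 0, 0], [0, 2**k, 0], [0, 0, 2**k]]
Ak = mat_mul(mat_mul(P, Dk), Pinv)
```

Computing Ak this way costs two 3×3 matrix products plus three integer powers, instead of 97 matrix products.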
2.5. Finding Inverses by Elementary Row Operations
In this section, we shall discuss a technique by which we can find the inverse of a square matrix, if the inverse exists. Before we discuss this technique, let us recall the three elementary row operations we discussed in the previous chapter. These are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.
Let us now consider the following example.
Example 2.5.1. Consider the matrices
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
• Let us interchange rows 1 and 2 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us interchange rows 2 and 3 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us add 3 times row 1 to row 2 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3a_{11}+a_{21} & 3a_{12}+a_{22} & 3a_{13}+a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us add −2 times row 3 to row 1 of A and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} -2a_{31}+a_{11} & -2a_{32}+a_{12} & -2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} -2a_{31}+a_{11} & -2a_{32}+a_{12} & -2a_{33}+a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us multiply row 2 of A by 5 and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 5a_{21} & 5a_{22} & 5a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
• Let us multiply row 3 of A by −1 and do likewise for I_3. We obtain respectively
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
Note that
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ -a_{31} & -a_{32} & -a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Let us now consider the problem in general.

Definition. By an elementary n×n matrix, we mean an n×n matrix obtained from I_n by an elementary row operation.

We state without proof the following important result. The interested reader may wish to construct a proof, taking into account the different types of elementary row operations.

PROPOSITION 2L. Suppose that A is an n×n matrix, and suppose that B is obtained from A by an elementary row operation. Suppose further that E is an elementary matrix obtained from I_n by the same elementary row operation. Then B = EA.
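Proposition 2L can be spot-checked numerically. This is an illustrative sketch, not part of the original notes; the test matrix A and the helper name `mat_mul` are our own choices, and we check one operation of each type.

```python
# Sketch of Proposition 2L (B = EA) for a concrete 3x3 matrix A.
# The matrix A and helper name mat_mul are our own examples.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# (1) Interchange rows 1 and 2: E is I_3 with rows 1 and 2 interchanged.
E1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
assert mat_mul(E1, A) == [A[1], A[0], A[2]]

# (2) Add 3 times row 1 to row 2: E is I_3 with the same operation applied.
E2 = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]
assert mat_mul(E2, A) == [A[0], [3 * a + b for a, b in zip(A[0], A[1])], A[2]]

# (3) Multiply row 3 by -1.
E3 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
assert mat_mul(E3, A) == [A[0], A[1], [-a for a in A[2]]]
print("B = EA holds for all three types of elementary row operation")
```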
We now adopt the following strategy. Consider an n×n matrix A. Suppose that it is possible to reduce the matrix A by a sequence α_1, α_2, ..., α_k of elementary row operations to the identity matrix I_n. If E_1, E_2, ..., E_k are respectively the elementary n×n matrices obtained from I_n by the same elementary row operations α_1, α_2, ..., α_k, then
$$I_n = E_k \dots E_2E_1A.$$
We therefore must have
$$A^{-1} = E_k \dots E_2E_1 = E_k \dots E_2E_1I_n.$$
It follows that the inverse A^{-1} can be obtained from I_n by performing the same elementary row operations α_1, α_2, ..., α_k. Since we are performing the same elementary row operations on A and I_n, it makes sense to put them side by side. The process can then be described pictorially by
$$(A \mid I_n) \xrightarrow{\alpha_1} (E_1A \mid E_1I_n) \xrightarrow{\alpha_2} (E_2E_1A \mid E_2E_1I_n) \xrightarrow{\alpha_3} \dots \xrightarrow{\alpha_k} (E_k \dots E_2E_1A \mid E_k \dots E_2E_1I_n) = (I_n \mid A^{-1}).$$
In other words, we consider an array with the matrix A on the left and the matrix I_n on the right. We now perform elementary row operations on the array and try to reduce the left hand half to the matrix I_n. If we succeed in doing so, then the right hand half of the array gives the inverse A^{-1}.
Example 2.5.2. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix}.$$
To find A^{-1}, we consider the array
$$(A \mid I_3) = \left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 3 & 0 & 3 & 0 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array}\right).$$
We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I_3. Note that if we succeed, then the final array is clearly in reduced row echelon form. We therefore follow the same procedure as reducing an array to reduced row echelon form. Adding −3 times row 1 to row 2, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ -2 & 3 & 0 & 0 & 0 & 1 \end{array}\right).$$
Adding 2 times row 1 to row 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 5 & 4 & 2 & 0 & 1 \end{array}\right).$$
Multiplying row 3 by 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 15 & 12 & 6 & 0 & 3 \end{array}\right).$$
Adding 5 times row 2 to row 3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 1 by 3, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 6 & 3 & 0 & 0 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding 2 times row 3 to row 1, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & -3 & -3 & 1 & 0 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding −1 times row 3 to row 2, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 3 & 0 & -15 & 10 & 6 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Adding 1 times row 2 to row 1, we obtain
$$\left(\begin{array}{ccc|ccc} 3 & 0 & 0 & -9 & 6 & 3 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 1 by 1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & -3 & 0 & 6 & -4 & -3 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 2 by −1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & -3 & -9 & 5 & 3 \end{array}\right).$$
Multiplying row 3 by −1/3, we obtain
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3 & 2 & 1 \\ 0 & 1 & 0 & -2 & 4/3 & 1 \\ 0 & 0 & 1 & 3 & -5/3 & -1 \end{array}\right).$$
Note now that the array is in reduced row echelon form, and that the left hand half is the identity matrix I_3. It follows that the right hand half of the array represents the inverse A^{-1}. Hence
$$A^{-1} = \begin{pmatrix} -3 & 2 & 1 \\ -2 & 4/3 & 1 \\ 3 & -5/3 & -1 \end{pmatrix}.$$
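The (A | I_n) procedure can be automated. This is an illustrative sketch, not part of the original notes; the helper name `invert` is ours, it picks the first non-zero pivot rather than following the exact operation order used by hand above, and it uses `fractions.Fraction` so that entries such as 4/3 stay exact.

```python
# Sketch: finding A^{-1} by row-reducing the array (A | I_n), with exact
# rational arithmetic. Raises ValueError when the left half cannot be
# reduced to I_n, i.e. when A is not invertible.
from fractions import Fraction as F

def invert(A):
    n = len(A)
    # Build the array (A | I_n).
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]          # interchange two rows
        M[col] = [x / M[col][col] for x in M[col]]   # scale the pivot row to 1
        for r in range(n):
            if r != col and M[r][col] != 0:          # clear the rest of the column
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # right half is A^{-1}

Ainv = invert([[1, 1, 2], [3, 0, 3], [-2, 3, 0]])
assert Ainv == [[-3, 2, 1], [-2, F(4, 3), 1], [3, F(-5, 3), -1]]
```

The assertion reproduces the inverse obtained by hand in Example 2.5.2.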
Example 2.5.3. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 4 & 5 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
To find A^{-1}, we consider the array
$$(A \mid I_4) = \left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 2 & 2 & 4 & 5 & 0 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right).$$
We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I_4. Adding −2 times row 1 to row 2, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right).$$
Adding 1 times row 2 to row 4, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Interchanging rows 2 and 3, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 3 & 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
At this point, we observe that it is impossible to reduce the left hand half of the array to I_4. For those who remain unconvinced, let us continue. Adding 3 times row 3 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −1 times row 4 to row 3, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 1 & 2 & 0 & -5 & 3 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Multiplying row 1 by 6 (here we want to avoid fractions in the next two steps), we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & -30 & 18 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −15 times row 4 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 6 & 12 & 0 & 0 & 3 & 0 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Adding −2 times row 2 to row 1, we obtain
$$\left(\begin{array}{cccc|cccc} 6 & 0 & 12 & 0 & 0 & 3 & -2 & -15 \\ 0 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 1 & 0 & 1 \end{array}\right).$$
Multiplying row 1 by 1/6, multiplying row 2 by 1/3, multiplying row 3 by −1 and multiplying row 4 by −1/2, we obtain
$$\left(\begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 0 & 1/2 & -1/3 & -5/2 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1/2 & 0 & -1/2 \end{array}\right).$$
Note now that the array is in reduced row echelon form, and that the left hand half is not the identity matrix I_4. Our technique has failed. In fact, the matrix A is not invertible.
2.6. Criteria for Invertibility
Examples 2.5.2–2.5.3 raise the question of when a given matrix is invertible. In this section, we shall obtain some partial answers to this question. Our first step here is the following simple observation.
PROPOSITION 2M. Every elementary matrix is invertible.

Proof. Let us consider elementary row operations. Recall that these are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant. These elementary row operations can clearly be reversed by elementary row operations. For (1), we interchange the two rows again. For (2), if we have originally added c times row i to row j, then we can reverse this by adding −c times row i to row j. For (3), if we have multiplied any row by a non-zero constant c, we can reverse this by multiplying the same row by the constant 1/c. Note now that each elementary matrix is obtained from I_n by an elementary row operation. The inverse of this elementary matrix is clearly the elementary matrix obtained from I_n by the elementary row operation that reverses the original elementary row operation.

Suppose that an n×n matrix B can be obtained from an n×n matrix A by a finite sequence of elementary row operations. Then since these elementary row operations can be reversed, the matrix A can be obtained from the matrix B by a finite sequence of elementary row operations.

Definition. An n×n matrix A is said to be row equivalent to an n×n matrix B if there exist a finite number of elementary n×n matrices E_1, ..., E_k such that B = E_k...E_1A.

Remark. Note that B = E_k...E_1A implies that A = E_1^{-1}...E_k^{-1}B. It follows that if A is row equivalent to B, then B is row equivalent to A. We usually say that A and B are row equivalent.
The following result gives conditions equivalent to the invertibility of an n×n matrix A.

PROPOSITION 2N. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables.
(a) Suppose that the matrix A is invertible. Then the system Ax = 0 of linear equations has only the trivial solution.
(b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. Then the matrices A and I_n are row equivalent.
(c) Suppose that the matrices A and I_n are row equivalent. Then A is invertible.

Proof. (a) Suppose that x_0 is a solution of the system Ax = 0. Then since A is invertible, we have
$$x_0 = I_nx_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}0 = 0.$$
It follows that the trivial solution is the only solution.

(b) Note that if the system Ax = 0 of linear equations has only the trivial solution, then it can be reduced by elementary row operations to the system
$$x_1 = 0, \quad \dots, \quad x_n = 0.$$
This is equivalent to saying that the array
$$\left(\begin{array}{ccc|c} a_{11} & \dots & a_{1n} & 0 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \dots & a_{nn} & 0 \end{array}\right)$$
can be reduced by elementary row operations to the reduced row echelon form
$$\left(\begin{array}{ccc|c} 1 & \dots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \dots & 1 & 0 \end{array}\right).$$
Hence the matrices A and I_n are row equivalent.

(c) Suppose that the matrices A and I_n are row equivalent. Then there exist elementary n×n matrices E_1, ..., E_k such that I_n = E_k...E_1A. By Proposition 2M, the matrices E_1, ..., E_k are all invertible, so that
$$A = E_1^{-1} \dots E_k^{-1}I_n = E_1^{-1} \dots E_k^{-1}$$
is a product of invertible matrices, and is therefore itself invertible.
2.7. Consequences of Invertibility
Suppose that the matrix
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}$$
is invertible. Consider the system Ax = b, where
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Since A is invertible, let us consider x = A^{-1}b. Clearly
$$Ax = A(A^{-1}b) = (AA^{-1})b = I_nb = b,$$
so that x = A^{-1}b is a solution of the system. On the other hand, let x_0 be any solution of the system. Then Ax_0 = b, so that
$$x_0 = I_nx_0 = (A^{-1}A)x_0 = A^{-1}(Ax_0) = A^{-1}b.$$
It follows that the system has a unique solution. We have proved the following important result.
PROPOSITION 2P. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables and b_1, ..., b_n ∈ R are arbitrary. Suppose further that the matrix A is invertible. Then the system Ax = b of linear equations has the unique solution x = A^{-1}b.

We next attempt to study the question in the opposite direction.
PROPOSITION 2Q. Suppose that
$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix},$$
and that
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
are n×1 matrices, where x_1, ..., x_n are variables. Suppose further that for every b_1, ..., b_n ∈ R, the system Ax = b of linear equations is soluble. Then the matrix A is invertible.

Proof. Suppose that
$$b_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \quad\dots,\quad b_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
In other words, for every j = 1, ..., n, b_j is an n×1 matrix with entry 1 on row j and entry 0 elsewhere. Now let
$$x_1 = \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad\dots,\quad x_n = \begin{pmatrix} x_{1n} \\ \vdots \\ x_{nn} \end{pmatrix}$$
denote respectively solutions of the systems of linear equations
$$Ax = b_1, \quad\dots,\quad Ax = b_n.$$
It is easy to check that
$$A(x_1 \ \dots \ x_n) = (b_1 \ \dots \ b_n);$$
in other words,
$$A \begin{pmatrix} x_{11} & \dots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{nn} \end{pmatrix} = I_n,$$
so that A is invertible.
We can now summarize Propositions 2N, 2P and 2Q as follows.

PROPOSITION 2R. In the notation of Proposition 2N, the following four statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and I_n are row equivalent.
(d) For every n×1 matrix b, the system Ax = b of linear equations is soluble.
2.8. Application to Economics
In this section, we describe briefly the Leontief input-output model, where an economy is divided into n sectors.

For every i = 1, ..., n, let x_i denote the monetary value of the total output of sector i over a fixed period, and let d_i denote the output of sector i needed to satisfy outside demand over the same fixed period. Collecting together x_i and d_i for i = 1, ..., n, we obtain the vectors
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n \quad\text{and}\quad d = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} \in \mathbb{R}^n,$$
known respectively as the production vector and demand vector of the economy.

On the other hand, each of the n sectors requires material from some or all of the sectors to produce its output. For i, j = 1, ..., n, let c_{ij} denote the monetary value of the output of sector i needed by sector j to produce one unit of monetary value of output. For every j = 1, ..., n, the vector
$$c_j = \begin{pmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{pmatrix} \in \mathbb{R}^n$$
is known as the unit consumption vector of sector j. Note that the column sum
$$c_{1j} + \dots + c_{nj} \le 1 \qquad (5)$$
in order to ensure that sector j does not make a loss. Collecting together the unit consumption vectors, we obtain the matrix
$$C = (c_1 \ \dots \ c_n) = \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix},$$
known as the consumption matrix of the economy.
Consider the matrix product
$$Cx = \begin{pmatrix} c_{11}x_1 + \dots + c_{1n}x_n \\ \vdots \\ c_{n1}x_1 + \dots + c_{nn}x_n \end{pmatrix}.$$
For every i = 1, ..., n, the entry c_{i1}x_1 + ... + c_{in}x_n represents the monetary value of the output of sector i needed by all the sectors to produce their output. This leads to the production equation
$$x = Cx + d. \qquad (6)$$
Here Cx represents the part of the total output that is required by the various sectors of the economy to produce the output in the first place, and d represents the part of the total output that is available to satisfy outside demand.

Clearly (I − C)x = d. If the matrix I − C is invertible, then
$$x = (I - C)^{-1}d.$$
PROPOSITION 2S. Suppose that the entries of the consumption matrix C and the demand vector d are non-negative. Suppose further that the inequality (5) holds for each column of C. Then the inverse matrix (I − C)^{-1} exists, and the production vector x = (I − C)^{-1}d has non-negative entries and is the unique solution of the production equation (6).

Let us indulge in some heuristics. Initially, we have demand d. To produce d, we need Cd as input. To produce this extra Cd, we need C(Cd) = C^2d as input. To produce this extra C^2d, we need C(C^2d) = C^3d as input. And so on. Hence we need to produce
$$d + Cd + C^2d + C^3d + \dots = (I + C + C^2 + C^3 + \dots)d$$
in total. Now it is not difficult to check that for every positive integer k, we have
$$(I - C)(I + C + C^2 + C^3 + \dots + C^k) = I - C^{k+1}.$$
If the entries of C^{k+1} are all very small, then
$$(I - C)(I + C + C^2 + C^3 + \dots + C^k) \approx I,$$
so that
$$(I - C)^{-1} \approx I + C + C^2 + C^3 + \dots + C^k.$$
This gives a practical way of approximating (I − C)^{-1}, and also suggests that
$$(I - C)^{-1} = I + C + C^2 + C^3 + \dots.$$
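The heuristic series can be run numerically by accumulating the terms d, Cd, C²d, ... directly. This is an illustrative sketch, not part of the original notes; it uses the consumption matrix and demand vector of Example 2.8.1 below, and the helper name `mat_vec` is ours.

```python
# Sketch: approximating x = (I - C)^{-1} d by the partial sums
# d + Cd + C^2 d + ... (data from Example 2.8.1).
def mat_vec(C, v):
    return [sum(c * vi for c, vi in zip(row, v)) for row in C]

C = [[0.3, 0.2, 0.1], [0.4, 0.5, 0.2], [0.1, 0.1, 0.3]]  # consumption matrix
d = [30.0, 50.0, 20.0]                                   # demand vector

total, term = d[:], d[:]
for _ in range(200):              # each pass adds the next term C^k d
    term = mat_vec(C, term)
    total = [t + s for t, s in zip(total, term)]

print([round(t) for t in total])  # [119, 226, 78]
```

Since every column sum of C is at most 0.8, the terms C^k d shrink geometrically and the partial sums converge quickly to the exact production vector.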
Example 2.8.1. An economy consists of three sectors. Their dependence on each other is summarized in the table below:

                                                  To produce one unit of monetary
                                                  value of output in sector
                                                      1      2      3
monetary value of output required from sector 1      0.3    0.2    0.1
monetary value of output required from sector 2      0.4    0.5    0.2
monetary value of output required from sector 3      0.1    0.1    0.3

Suppose that the final demand from sectors 1, 2 and 3 are respectively 30, 50 and 20. Then the production vector and demand vector are respectively
$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad\text{and}\quad d = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} 30 \\ 50 \\ 20 \end{pmatrix},$$
while the consumption matrix is given by
$$C = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.2 \\ 0.1 & 0.1 & 0.3 \end{pmatrix}, \quad\text{so that}\quad I - C = \begin{pmatrix} 0.7 & -0.2 & -0.1 \\ -0.4 & 0.5 & -0.2 \\ -0.1 & -0.1 & 0.7 \end{pmatrix}.$$
The production equation (I − C)x = d has augmented matrix
$$\left(\begin{array}{ccc|c} 0.7 & -0.2 & -0.1 & 30 \\ -0.4 & 0.5 & -0.2 & 50 \\ -0.1 & -0.1 & 0.7 & 20 \end{array}\right), \quad\text{equivalent to}\quad \left(\begin{array}{ccc|c} 7 & -2 & -1 & 300 \\ -4 & 5 & -2 & 500 \\ -1 & -1 & 7 & 200 \end{array}\right),$$
and which can be converted to reduced row echelon form
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 3200/27 \\ 0 & 1 & 0 & 6100/27 \\ 0 & 0 & 1 & 700/9 \end{array}\right).$$
This gives x_1 ≈ 119, x_2 ≈ 226 and x_3 ≈ 78, to the nearest integers.
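The reduced row echelon computation above can be checked exactly with rational arithmetic. This is an illustrative sketch, not part of the original notes; the helper name `solve` is ours, and it picks the first non-zero pivot in each column rather than following the by-hand operation order.

```python
# Sketch: solving (I - C)x = d exactly with Fractions (Example 2.8.1).
# The helper name solve is ours; it performs Gauss-Jordan elimination,
# choosing the first non-zero pivot in each column.
from fractions import Fraction as F

def solve(M, b):
    n = len(M)
    A = [[F(x) for x in row] + [F(y)] for row, y in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n] for row in A]

IminusC = [[F(7, 10), F(-2, 10), F(-1, 10)],
           [F(-4, 10), F(5, 10), F(-2, 10)],
           [F(-1, 10), F(-1, 10), F(7, 10)]]
x = solve(IminusC, [30, 50, 20])
assert x == [F(3200, 27), F(6100, 27), F(700, 9)]
```

The exact answers 3200/27, 6100/27 and 700/9 round to 119, 226 and 78 as stated.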
2.9. Matrix Transformation on the Plane
Let A be a 2×2 matrix with real entries. A matrix transformation T : R2 → R2 can be defined as follows: For everyx= (x1, x2)∈R, we writeT(x) =y, wherey= (y1, y2)∈R2 satisfies
y1 y2 =A x1 x2 .
Such a transformation is linear, in the sense thatT(x′+x′′) =T(x′) +T(x′) for everyx′,x′′∈R2 and
T(cx) =cT(x) for everyx∈R2 and everyc∈R. To see this, simply observe that
A
x′
1+x′′1
x′
2+x′′2
=A x′ 1 x′ 2 +A x′′ 1 x′′ 2 and A cx1 cx2 =cA x1 x2 .
We shall study linear transformations in greater detail in Chapter 8. Here we confine ourselves to looking at a few simple matrix transformations on the plane.
Example 2.9.1. The matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_1-axis, whereas the matrix
$$A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the x_2-axis. On the other hand, the matrix
$$A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the origin, whereas the matrix
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents reflection across the line x_1 = x_2. We give a summary in the table below:

Transformation                 | Equations                  | Matrix
Reflection across x_1-axis     | y_1 = x_1, y_2 = −x_2      | $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
Reflection across x_2-axis     | y_1 = −x_1, y_2 = x_2      | $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$
Reflection across origin       | y_1 = −x_1, y_2 = −x_2     | $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
Reflection across x_1 = x_2    | y_1 = x_2, y_2 = x_1       | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
Example 2.9.2. Let k be a fixed positive real number. The matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ kx_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a dilation if k > 1 and a contraction if 0 < k < 1. On the other hand, the matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} kx_1 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_1-direction if k > 1 and a compression in the x_1-direction if 0 < k < 1, whereas the matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents an expansion in the x_2-direction if k > 1 and a compression in the x_2-direction if 0 < k < 1. We give a summary in the table below:

Transformation                                          | Equations               | Matrix
Dilation or contraction by factor k > 0                 | y_1 = kx_1, y_2 = kx_2  | $\begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}$
Expansion or compression in x_1-direction by factor k > 0 | y_1 = kx_1, y_2 = x_2 | $\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}$
Expansion or compression in x_2-direction by factor k > 0 | y_1 = x_1, y_2 = kx_2 | $\begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}$
Example 2.9.3. Let k be a fixed real number. The matrix
$$A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + kx_2 \\ x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_1-direction.

[Figure: a set of points and their images under the shear with k = 1.]

[Figure: a set of points and their images under the shear with k = −1.]

Similarly, the matrix
$$A = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix} \quad\text{satisfies}\quad A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ kx_1 + x_2 \end{pmatrix}$$
for every (x_1, x_2) ∈ R^2, and so represents a shear in the x_2-direction. We give a summary in the table below:

Transformation           | Equations                     | Matrix
Shear in x_1-direction   | y_1 = x_1 + kx_2, y_2 = x_2   | $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$
Shear in x_2-direction   | y_1 = x_1, y_2 = kx_1 + x_2   | $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$
Example 2.9.4. For anticlockwise rotation by an angle θ, we have T(x_1, x_2) = (y_1, y_2), where
$$y_1 + \mathrm{i}y_2 = (x_1 + \mathrm{i}x_2)(\cos\theta + \mathrm{i}\sin\theta),$$
and so
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
It follows that the matrix in question is given by
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
We give a summary in the table below:

Transformation                    | Equations                                            | Matrix
Anticlockwise rotation by angle θ | y_1 = x_1 cos θ − x_2 sin θ, y_2 = x_1 sin θ + x_2 cos θ | $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$
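The rotation matrix is easy to exercise in code. This is an illustrative sketch, not part of the original notes; the helper name `rotate` is ours.

```python
# Sketch: anticlockwise rotation by angle theta, applied to a point (x1, x2).
# The helper name rotate is ours, not from the text.
import math

def rotate(theta, point):
    x1, x2 = point
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))

# Rotating (1, 0) anticlockwise by 90 degrees should give (0, 1),
# up to floating-point rounding.
y = rotate(math.pi / 2, (1.0, 0.0))
assert abs(y[0] - 0.0) < 1e-12 and abs(y[1] - 1.0) < 1e-12
```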
We conclude this section by establishing the following result which reinforces the linearity of matrix transformations on the plane.
PROPOSITION 2T. Suppose that a matrix transformation T : R^2 → R^2 is given by an invertible matrix A. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.

Proof. Suppose that T(x_1, x_2) = (y_1, y_2). Since A is invertible, we have x = A^{-1}y, where
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
The equation of a straight line is given by αx_1 + βx_2 = γ or, in matrix form, by
$$(\alpha \ \beta) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (\gamma).$$
Hence
$$(\alpha \ \beta)A^{-1} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\gamma).$$
Let
$$(\alpha' \ \beta') = (\alpha \ \beta)A^{-1}.$$
Then
$$(\alpha' \ \beta') \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\gamma).$$
In other words, the image under T of the straight line αx_1 + βx_2 = γ is α′y_1 + β′y_2 = γ, clearly another straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same values of α and β.
2.10. Application to Computer Graphics
Example 2.10.1. Consider the letter M in the diagram below:

[Figure: the letter M.]

Following the boundary in the anticlockwise direction starting at the origin, the 12 vertices can be represented by the coordinates
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 7 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 8 \end{pmatrix}, \begin{pmatrix} 7 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 8 \end{pmatrix}, \begin{pmatrix} 0 \\ 8 \end{pmatrix}.$$
Let us apply a matrix transformation to these vertices, using the matrix
$$A = \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix},$$
representing a shear in the x_1-direction with factor 0.5, so that
$$A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + \frac{1}{2}x_2 \\ x_2 \end{pmatrix}.$$
Then the images of the 12 vertices are respectively
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 10 \\ 6 \end{pmatrix}, \begin{pmatrix} 7 \\ 0 \end{pmatrix}, \begin{pmatrix} 8 \\ 0 \end{pmatrix}, \begin{pmatrix} 12 \\ 8 \end{pmatrix}, \begin{pmatrix} 11 \\ 8 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 8 \end{pmatrix},$$
noting that
$$\begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 & 4 & 7 & 7 & 8 & 8 & 7 & 4 & 1 & 0 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 4 & 4 & 10 & 7 & 8 & 12 & 11 & 5 & 5 & 4 \\ 0 & 0 & 6 & 0 & 6 & 0 & 0 & 8 & 8 & 2 & 8 & 8 \end{pmatrix}.$$
In view of Proposition 2T, the image of any line segment that joins two vertices is a line segment that joins the images of the two vertices. Hence the image of the letter M under the shear looks like the following:

[Figure: the image of the letter M under the shear.]
Next, we may wish to translate this image. However, a translation by a vector h = (h1, h2) ∈ R2 is of the form

    [ y1 ] = [ x1 ] + [ h1 ]    for every (x1, x2) ∈ R2,
    [ y2 ]   [ x2 ]   [ h2 ]
and this cannot be described by a matrix transformation on the plane. To overcome this deficiency, we introduce homogeneous coordinates. For every point (x1, x2) ∈ R2, we identify it with the point (x1, x2, 1) ∈ R3. Now we wish to translate a point (x1, x2) to (x1, x2) + (h1, h2) = (x1 + h1, x2 + h2), so we attempt to find a 3×3 matrix A* such that

    [ x1 + h1 ]        [ x1 ]
    [ x2 + h2 ] =  A*  [ x2 ]    for every (x1, x2) ∈ R2.
    [    1    ]        [ 1  ]
It is easy to check that

    [ x1 + h1 ]   [ 1  0  h1 ] [ x1 ]
    [ x2 + h2 ] = [ 0  1  h2 ] [ x2 ]    for every (x1, x2) ∈ R2.
    [    1    ]   [ 0  0  1  ] [ 1  ]

It follows that, using homogeneous coordinates, translation by a vector h = (h1, h2) ∈ R2 can be described by the matrix

    A* = [ 1  0  h1 ]
         [ 0  1  h2 ]
         [ 0  0  1  ].
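This translation matrix is easy to try out; a short NumPy sketch, where the vector (h1, h2) = (2, 3) and the sample point are illustrative choices:

```python
import numpy as np

def translation_matrix(h1, h2):
    """3x3 matrix mapping (x1, x2, 1) to (x1 + h1, x2 + h2, 1)."""
    return np.array([[1.0, 0.0, h1],
                     [0.0, 1.0, h2],
                     [0.0, 0.0, 1.0]])

A_star = translation_matrix(2.0, 3.0)
p = np.array([5.0, -1.0, 1.0])   # the point (5, -1) in homogeneous coordinates
q = A_star @ p                   # the point (7, 2) in homogeneous coordinates
```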
Remark. Consider a matrix transformation T : R2 → R2 on the plane given by a matrix

    A = [ a11  a12 ]
        [ a21  a22 ].

Suppose that T(x1, x2) = (y1, y2). Then

    [ y1 ] = A [ x1 ] = [ a11  a12 ] [ x1 ]
    [ y2 ]     [ x2 ]   [ a21  a22 ] [ x2 ].
Under homogeneous coordinates, the image of the point (x1, x2, 1) is now (y1, y2, 1). Note that

    [ y1 ]   [ a11  a12  0 ] [ x1 ]
    [ y2 ] = [ a21  a22  0 ] [ x2 ].
    [ 1  ]   [  0    0   1 ] [ 1  ]

It follows that homogeneous coordinates can also be used to study all the matrix transformations we have discussed in Section 2.9. By moving over to homogeneous coordinates, we simply replace the 2×2 matrix A by the 3×3 matrix

    A* = [ A  0 ]
         [ 0  1 ].
Example 2.10.2. Returning to Example 2.10.1 of the letter M, the 12 vertices are now represented by homogeneous coordinates, put in an array in the form

    [ 0 1 1 4 7 7 8 8 7 4 1 0 ]
    [ 0 0 6 0 6 0 0 8 8 2 8 8 ]
    [ 1 1 1 1 1 1 1 1 1 1 1 1 ].

Then the 2×2 matrix

    A = [ 1  1/2 ]
        [ 0   1  ]

is now replaced by the 3×3 matrix

    A* = [ 1  1/2  0 ]
         [ 0   1   0 ]
         [ 0   0   1 ].

Note that

       [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 1  1/2  0 ] [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 0 1 4 4 10 7 8 12 11 5 5 4 ]
    A* [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0   1   0 ] [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0 0 6 0  6 0 0  8  8 2 8 8 ]
       [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 0   0   1 ] [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 1 1 1 1  1 1 1  1  1 1 1 1 ].
Next, let us consider a translation by the vector (2, 3). The matrix under homogeneous coordinates for this translation is given by

    B* = [ 1  0  2 ]
         [ 0  1  3 ]
         [ 0  0  1 ].

Note that

         [ 0 1 1 4 7 7 8 8 7 4 1 0 ]   [ 1  0  2 ] [ 0 1 4 4 10 7 8 12 11 5 5 4 ]   [ 2 3 6 6 12 9 10 14 13 7  7  6 ]
    B*A* [ 0 0 6 0 6 0 0 8 8 2 8 8 ] = [ 0  1  3 ] [ 0 0 6 0  6 0 0  8  8 2 8 8 ] = [ 3 3 9 3  9 3  3 11 11 5 11 11 ]
         [ 1 1 1 1 1 1 1 1 1 1 1 1 ]   [ 0  0  1 ] [ 1 1 1 1  1 1 1  1  1 1 1 1 ]   [ 1 1 1 1  1 1  1  1  1 1  1  1 ],

giving rise to coordinates in R2, displayed as an array

    [ 2 3 6 6 12 9 10 14 13 7  7  6 ]
    [ 3 3 9 3  9 3  3 11 11 5 11 11 ].

Hence the image of the letter M under the shear followed by translation looks like the following:
Example 2.10.3. Under homogeneous coordinates, the transformation representing a reflection across the x1-axis, followed by a shear by factor 2 in the x1-direction, followed by anticlockwise rotation by 90°, and followed by translation by vector (2, -1), has matrix

    [ 1  0   2 ] [ 0  -1  0 ] [ 1  2  0 ] [ 1   0  0 ]   [ 0   1   2 ]
    [ 0  1  -1 ] [ 1   0  0 ] [ 0  1  0 ] [ 0  -1  0 ] = [ 1  -2  -1 ]
    [ 0  0   1 ] [ 0   0  1 ] [ 0  0  1 ] [ 0   0  1 ]   [ 0   0   1 ].
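The composite can be checked by multiplying the four matrices; note the order: the matrix of the transformation applied first sits rightmost in the product. A minimal NumPy sketch:

```python
import numpy as np

reflect   = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]])  # reflection across the x1-axis
shear     = np.array([[1, 2, 0], [0, 1, 0], [0, 0, 1]])   # shear by factor 2 in the x1-direction
rotate    = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])  # anticlockwise rotation by 90 degrees
translate = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])  # translation by (2, -1)

# Reflection is applied first, translation last, so multiply in reverse order
M = translate @ rotate @ shear @ reflect
# M equals [[0, 1, 2], [1, -2, -1], [0, 0, 1]], as in the text
```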
2.11. Complexity of a Non-Homogeneous System
One way of solving the system Ax = b is to write down the augmented matrix

    [ a11 ... a1n | b1 ]
    [  .       .  |  . ]                                                   (7)
    [ an1 ... ann | bn ],

and then convert it to reduced row echelon form by elementary row operations.
The first step is to reduce it to row echelon form:
(I) First of all, we may need to interchange two rows in order to ensure that the top left entry in the array is non-zero. This requires n + 1 operations.
(II) Next, we need to multiply the new first row by a constant in order to make the top left pivot entry equal to 1. This requires n + 1 operations, and the array now looks like

    [ 1    a12 ... a1n | b1 ]
    [ a21  a22 ... a2n | b2 ]
    [  .    .       .  |  . ]
    [ an1  an2 ... ann | bn ].
Note that we are abusing notation somewhat, as the entry a12 here, for example, may well be different from the entry a12 in the augmented matrix (7).
(III) For each row i = 2, ..., n, we now multiply the first row by -ai1 and then add to row i. This requires 2(n - 1)(n + 1) operations, and the array now looks like

    [ 1  a12 ... a1n | b1 ]
    [ 0  a22 ... a2n | b2 ]
    [ .   .       .  |  . ]                                                (8)
    [ 0  an2 ... ann | bn ].
(IV) In summary, to proceed from the form (7) to the form (8), the number of operations required is at most 2(n + 1) + 2(n - 1)(n + 1) = 2n(n + 1).
(V) Our next task is to convert the smaller array

    [ a22 ... a2n | b2 ]
    [  .       .  |  . ]
    [ an2 ... ann | bn ]

to an array that looks like

    [ 1  a23 ... a2n | b2 ]
    [ 0  a33 ... a3n | b3 ]
    [ .   .       .  |  . ]
    [ 0  an3 ... ann | bn ].
These have one row and one column fewer than the arrays (7) and (8), and the number of operations required is at most 2m(m + 1), where m = n - 1. We continue in this way systematically to reach row echelon form, and conclude that the number of operations required to convert the augmented matrix (7) to row echelon form is at most

    sum_{m=1}^{n} 2m(m + 1) ≈ (2/3) n^3.

The next step is to convert the row echelon form to reduced row echelon form. This is simpler, as many entries are now zero. It can be shown that the number of operations required is bounded by something like 2n^2 -- indeed, by something like n^2 if one analyzes the problem more carefully. In any case, these estimates are insignificant compared to the estimate (2/3) n^3 earlier.
We therefore conclude that the number of operations required to solve the system Ax = b by reducing the augmented matrix to reduced row echelon form is bounded by something like (2/3) n^3 when n is large.
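The growth rate can be observed empirically by counting arithmetic operations while performing the elimination. A rough Python sketch, under the same accounting as steps (I)-(V) (one operation per scaling step, two per multiply-and-add; the diagonal boost just avoids the need for row interchanges):

```python
import random

def forward_elimination_ops(n):
    """Count operations reducing a random n x (n+1) augmented array to row
    echelon form, charging 1 operation per scaling and 2 per multiply-and-add."""
    a = [[random.random() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n          # make diagonally dominant: no interchanges needed
    ops = 0
    for k in range(n):
        pivot = a[k][k]
        for j in range(k, n + 1):          # scale row k so the pivot is 1
            a[k][j] /= pivot
            ops += 1
        for i in range(k + 1, n):          # eliminate entries below the pivot
            factor = a[i][k]
            for j in range(k, n + 1):
                a[i][j] -= factor * a[k][j]
                ops += 2
    return ops

for n in (10, 20, 40):
    print(n, forward_elimination_ops(n) / (2 * n**3 / 3))  # ratio tends to 1
```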
Another way of solving the system Ax = b is to first find the inverse matrix A^{-1}. This may involve converting the array

    [ a11 ... a1n | 1        ]
    [  .       .  |    .     ]
    [ an1 ... ann |        1 ]

to reduced row echelon form by elementary row operations. It can be shown that the number of operations required is something like 2n^3, so this is less efficient than our first method.
2.12. Matrix Factorization
In some situations, we may need to solve systems of linear equations of the form Ax = b, with the same coefficient matrix A but for many different vectors b. If A is an invertible square matrix, then we can find its inverse A^{-1} and then compute A^{-1}b for each vector b. However, the matrix A may not be a square matrix, and we may have to convert the augmented matrix to reduced row echelon form.
In this section, we describe a more efficient way of solving this problem. To describe it, we first need a definition.
Definition. A rectangular array of numbers is said to be in quasi row echelon form if the following
conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row is called a pivot entry. It is not necessary for its value to be equal to 1.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array.
In other words, the array looks like row echelon form in shape, except that the pivot entries do not have to be equal to 1.
We consider first of all a special case.
PROPOSITION 2U. Suppose that an m×n matrix A can be converted to quasi row echelon form by elementary row operations but without interchanging any two rows. Then A = LU, where L is an m×m lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A.
Sketch of Proof. Recall that applying an elementary row operation to an m×n matrix corresponds to multiplying it on the left by an elementary matrix. For the operation of adding a multiple of a row higher in the array to a row lower in the array, this elementary matrix is lower triangular with diagonal entries all equal to 1; we call such elementary matrices unit lower triangular. If an m×n matrix A can be reduced in this way to quasi row echelon form U, then

    U = Ek . . . E2E1A,

where the elementary matrices E1, E2, . . . , Ek are all unit lower triangular. Let L = (Ek . . . E2E1)^{-1}. Then A = LU. It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. Hence L is a unit lower triangular matrix as required.
If Ax = b and A = LU, then L(Ux) = b. Writing y = Ux, we have

    Ly = b    and    Ux = y.

It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b and then solving the system Ux = y. Both of these systems are easy to solve, since both L and U have many zero entries. It remains to find L and U.
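For a square invertible A, the two triangular solves can each be written in a few lines. A minimal Python sketch using plain lists (the 2×2 example data is an illustrative choice, not from the text):

```python
def solve_lower_unit(L, b):
    """Forward substitution for Ly = b, with L unit lower triangular."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def solve_upper(U, y):
    """Back substitution for Ux = y, with U upper triangular, nonzero diagonal."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Illustrative factors of A = LU = [[2, 1], [6, 7]]
L = [[1.0, 0.0], [3.0, 1.0]]
U = [[2.0, 1.0], [0.0, 4.0]]
b = [4.0, 16.0]
y = solve_lower_unit(L, b)   # y = [4.0, 4.0]
x = solve_upper(U, y)        # x = [1.5, 1.0], and indeed Ax = b
```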
If we reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array, then U can be taken as the quasi row echelon form resulting from this. It remains to find L. However, note that L = (Ek . . . E2E1)^{-1}, where U = Ek . . . E2E1A, and so

    I = Ek . . . E2E1L.

This means that the very elementary row operations that convert A to U will convert L to I. We therefore wish to create a matrix L such that this is satisfied. It is simplest to illustrate the technique by an example.
Example 2.12.1. Consider the matrix

    A = [ 2   -1    2   -2    3 ]
        [ 4    1    6   -5    8 ]
        [ 2  -10   -4    8   -5 ]
        [ 2  -13   -6   16   -5 ].

The entry 2 in row 1 and column 1 is a pivot entry, and column 1 is a pivot column. Adding -2 times row 1 to row 2, adding -1 times row 1 to row 3, and adding -1 times row 1 to row 4, we obtain

    [ 2   -1    2   -2    3 ]
    [ 0    3    2   -1    2 ]
    [ 0   -9   -6   10   -8 ]
    [ 0  -12   -8   18   -8 ].

Note that the same three elementary row operations convert

    [ 1  0  0  0 ]        [ 1  0  0  0 ]
    [ 2  1  0  0 ]   to   [ 0  1  0  0 ]
    [ 1  *  1  0 ]        [ 0  *  1  0 ]
    [ 1  *  *  1 ]        [ 0  *  *  1 ].
Next, the entry 3 in row 2 and column 2 is a pivot entry, and column 2 is a pivot column. Adding 3 times row 2 to row 3, and adding 4 times row 2 to row 4, we obtain

    [ 2  -1   2  -2   3 ]
    [ 0   3   2  -1   2 ]
    [ 0   0   0   7  -2 ]
    [ 0   0   0  14   0 ].

Note that the same two elementary row operations convert

    [ 1   0  0  0 ]        [ 1  0  0  0 ]
    [ 0   1  0  0 ]   to   [ 0  1  0  0 ]
    [ 0  -3  1  0 ]        [ 0  0  1  0 ]
    [ 0  -4  *  1 ]        [ 0  0  *  1 ].
Next, the entry 7 in row 3 and column 4 is a pivot entry, and column 4 is a pivot column. Adding -2 times row 3 to row 4, we obtain the quasi row echelon form

    U = [ 2  -1   2  -2   3 ]
        [ 0   3   2  -1   2 ]
        [ 0   0   0   7  -2 ]
        [ 0   0   0   0   4 ],

where the entry 4 in row 4 and column 5 is a pivot entry, and column 5 is a pivot column. Note that the same elementary row operation converts

    [ 1  0  0  0 ]        [ 1  0  0  0 ]
    [ 0  1  0  0 ]   to   [ 0  1  0  0 ]
    [ 0  0  1  0 ]        [ 0  0  1  0 ]
    [ 0  0  2  1 ]        [ 0  0  0  1 ].
Now observe that if we take

    L = [ 1   0  0  0 ]
        [ 2   1  0  0 ]
        [ 1  -3  1  0 ]
        [ 1  -4  2  1 ],

then L can be converted to I4 by the same elementary row operations that convert A to U.
The strategy is now clear. Every time we find a new pivot, we note its value and the entries below it. The lower triangular entries of L are formed by these columns, with each column divided by the value of the pivot entry in that column.
Example 2.12.2. Let us examine our last example again. The pivot columns at the time of establishing the pivot entries are respectively

    [ 2 ]    [   * ]    [  * ]    [ * ]
    [ 4 ]    [   3 ]    [  * ]    [ * ]
    [ 2 ],   [  -9 ],   [  7 ],   [ * ]
    [ 2 ]    [ -12 ]    [ 14 ]    [ 4 ].

Dividing them respectively by the pivot entries 2, 3, 7 and 4, we obtain respectively the columns

    [ 1 ]    [  * ]    [ * ]    [ * ]
    [ 2 ]    [  1 ]    [ * ]    [ * ]
    [ 1 ],   [ -3 ],   [ 1 ],   [ * ]
    [ 1 ]    [ -4 ]    [ 2 ]    [ 1 ].

Note that the lower triangular entries of the matrix

    L = [ 1   0  0  0 ]
        [ 2   1  0  0 ]
        [ 1  -3  1  0 ]
        [ 1  -4  2  1 ]

are precisely given by these columns.
LU FACTORIZATION ALGORITHM.
(1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. Let U be the quasi row echelon form obtained.
(2) Record any new pivot column at the time of its first recognition, and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry.
(3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modified in step (2).
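For a square matrix whose elimination needs no row interchanges and has a pivot in every column, the algorithm can be sketched in a few lines of Python. This is an illustrative implementation with plain lists and no pivoting, not a robust one; the example data is the top-left 2×2 block of the matrix in Example 2.12.1:

```python
def lu_factorize(A):
    """Return (L, U) with A = LU, L unit lower triangular, U upper triangular.
    Assumes a square matrix needing no row interchanges (no pivoting done)."""
    n = len(A)
    U = [row[:] for row in A]                      # working copy, becomes U
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]                  # entry below pivot / pivot value
            L[i][k] = m                            # records the modified pivot column
            for j in range(k, n):
                U[i][j] -= m * U[k][j]             # add -m times row k to row i
    return L, U

A = [[2.0, -1.0],
     [4.0,  1.0]]
L, U = lu_factorize(A)
# L = [[1.0, 0.0], [2.0, 1.0]] and U = [[2.0, -1.0], [0.0, 3.0]]
```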
Example 2.12.3. We wish to solve the system of linear equations Ax = b, where

    A = [  3  -1    2   -4   1 ]               [   1 ]
        [ -3   3   -5    5  -2 ]    and    b = [  -2 ]
        [  6  -4   11  -10   6 ]               [   9 ]
        [ -6   8  -21   13  -9 ]               [ -15 ].

Let us first apply LU factorization to the matrix A. The first pivot column is column 1, with modified version

    [  1 ]
    [ -1 ]
    [  2 ]
    [ -2 ].
Adding row 1 to row 2, adding -2 times row 1 to row 3, and adding 2 times row 1 to row 4, we obtain

    [ 3  -1    2  -4   1 ]
    [ 0   2   -3   1  -1 ]
    [ 0  -2    7  -2   4 ]
    [ 0   6  -17   5  -7 ].

The second pivot column is column 2, with modified version

    [  0 ]
    [  1 ]
    [ -1 ]
    [  3 ].
Adding row 2 to row 3, and adding -3 times row 2 to row 4, we obtain

    [ 3  -1   2  -4   1 ]
    [ 0   2  -3   1  -1 ]
    [ 0   0   4  -1   3 ]
    [ 0   0  -8   2  -4 ].

The third pivot column is column 3, with modified version

    [  0 ]
    [  0 ]
    [  1 ]
    [ -2 ].
Adding 2 times row 3 to row 4, we obtain the quasi row echelon form

    [ 3  -1   2  -4   1 ]
    [ 0   2  -3   1  -1 ]
    [ 0   0   4  -1   3 ]
    [ 0   0   0   0   2 ].

The last pivot column is column 5, with modified version

    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 1 ].

It follows that

    L = [  1   0   0  0 ]               [ 3  -1   2  -4   1 ]
        [ -1   1   0  0 ]    and    U = [ 0   2  -3   1  -1 ]
        [  2  -1   1  0 ]               [ 0   0   4  -1   3 ]
        [ -2   3  -2  1 ]               [ 0   0   0   0   2 ].
We now consider the system Ly = b, with augmented matrix

    [  1   0   0  0 |   1 ]
    [ -1   1   0  0 |  -2 ]
    [  2  -1   1  0 |   9 ]
    [ -2   3  -2  1 | -15 ].

Using row 1, we obtain y1 = 1. Using row 2, we obtain y2 - y1 = -2, so that y2 = -1. Using row 3, we obtain y3 + 2y1 - y2 = 9, so that y3 = 6. Using row 4, we obtain y4 - 2y1 + 3y2 - 2y3 = -15, so that y4 = 2. Hence

    y = [  1 ]
        [ -1 ]
        [  6 ]
        [  2 ].
We next consider the system Ux = y, with augmented matrix

    [ 3  -1   2  -4   1 |  1 ]
    [ 0   2  -3   1  -1 | -1 ]
    [ 0   0   4  -1   3 |  6 ]
    [ 0   0   0   0   2 |  2 ].

Here the free variable is x4. Let x4 = t. Using row 4, we obtain 2x5 = 2, so that x5 = 1. Using row 3, we obtain 4x3 = 6 + x4 - 3x5 = 3 + t, so that x3 = 3/4 + t/4. Using row 2, we obtain

    2x2 = -1 + 3x3 - x4 + x5 = 9/4 - t/4,

so that x2 = 9/8 - t/8. Using row 1, we obtain 3x1 = 1 + x2 - 2x3 + 4x4 - x5 = 27t/8 - 3/8, so that x1 = 9t/8 - 1/8. Hence

    (x1, x2, x3, x4, x5) = ( (9t-1)/8, (9-t)/8, (3+t)/4, t, 1 ),    where t ∈ R.
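One can spot-check the parametric solution by substituting a value of t back into Ax = b; a quick NumPy sketch (the choice t = 1 is arbitrary):

```python
import numpy as np

A = np.array([[ 3, -1,   2,  -4,  1],
              [-3,  3,  -5,   5, -2],
              [ 6, -4,  11, -10,  6],
              [-6,  8, -21,  13, -9]], dtype=float)
b = np.array([1, -2, 9, -15], dtype=float)

t = 1.0
x = np.array([(9*t - 1)/8, (9 - t)/8, (3 + t)/4, t, 1.0])
print(np.allclose(A @ x, b))  # True
```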
Remarks. (1) In practical situations, interchanging rows is usually necessary to convert a matrix A to quasi row echelon form. The technique here can be modified to produce a matrix L which is not unit lower triangular, but which can be made unit lower triangular by interchanging rows.
(2) Computing an LU factorization of an n×n matrix takes approximately (2/3) n^3 operations. Solving the systems Ly = b and Ux = y requires approximately 2n^2 operations.
2.13. Application to Games of Strategy
Consider a game with two players. Player R, usually known as the row player, has m possible moves, denoted by i = 1, 2, 3, . . . , m, while player C, usually known as the column player, has n possible moves, denoted by j = 1, 2, 3, . . . , n. For every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, let aij denote the payoff that player C has to make to player R if player R makes move i and player C makes move j. These numbers give rise to the payoff matrix

    A = [ a11 ... a1n ]
        [  .       .  ]
        [ am1 ... amn ].

The entries can be positive, negative or zero.
Suppose that for every i = 1, 2, 3, . . . , m, player R makes move i with probability pi, and that for every j = 1, 2, 3, . . . , n, player C makes move j with probability qj. Then

    p1 + . . . + pm = 1    and    q1 + . . . + qn = 1.

Assume that the players make moves independently of each other. Then for every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, the number piqj represents the probability that player R makes move i and player C makes move j. Then the double sum

    EA(p, q) = sum_{i=1}^{m} sum_{j=1}^{n} aij pi qj

represents the expected payoff that player C has to make to player R.
The matrices

    p = ( p1 . . . pm )    and    q = [ q1 ]
                                      [  . ]
                                      [ qn ]

are known as the strategies of player R and player C respectively. Clearly the expected payoff

    EA(p, q) = sum_{i=1}^{m} sum_{j=1}^{n} aij pi qj = ( p1 . . . pm ) [ a11 ... a1n ] [ q1 ]
                                                                      [  .       .  ] [  . ] = pAq.
                                                                      [ am1 ... amn ] [ qn ]

Here we have slightly abused notation: the right hand side is a 1×1 matrix!
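The identity EA(p, q) = pAq is one line of matrix arithmetic. A small NumPy sketch, in which the payoff matrix and both strategies are illustrative choices:

```python
import numpy as np

A = np.array([[ 2, -1],
              [-3,  4]], dtype=float)  # illustrative payoff matrix
p = np.array([0.5, 0.5])               # row player's strategy, entries sum to 1
q = np.array([0.25, 0.75])             # column player's strategy, entries sum to 1

# The double sum and the matrix product pAq agree
double_sum = sum(A[i, j] * p[i] * q[j] for i in range(2) for j in range(2))
print(double_sum, p @ A @ q)  # both equal 1.0 for this data
```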
We now consider the following problem: Suppose that A is fixed. Is it possible for player R to choose a strategy p to try to maximize the expected payoff EA(p, q)? Is it possible for player C to choose a strategy q to try to minimize the expected payoff EA(p, q)?
FUNDAMENTAL THEOREM OF ZERO SUM GAMES. There exist strategies p* and q* such that

    EA(p*, q) ≥ EA(p*, q*) ≥ EA(p, q*)

for every strategy p of player R and every strategy q of player C.
Remark. The strategy p* is known as an optimal strategy for player R, and the strategy q* is known as an optimal strategy for player C. The quantity EA(p*, q*) is known as the value of the game. Optimal strategies are not necessarily unique. However, if p** and q** are another pair of optimal strategies,