Improving the Analysis with Chebyshev Polynomials

Ford∈N, the degree-dChebyshev polynomial(of the first kind)T_d∈R[λ]_≤d is defined by

Td(λ) :=







1, ifd= 0,

λ, ifd= 1,

2λT_d−1(λ)−T_d−2(λ), ifd≥2.

Proposition 4.13. Ifθ∈[−π/2, π/2] andd∈N, thenTd(cosθ) = cos(dθ).

Proof. Our proof is by induction on d∈N. By definition, ifd= 0, thenT0(cosθ) = 1 = cos(0) = cos(0θ), and ifd= 1, thenT1(cosθ) = cos(θ).

Letd≥1. Let us prove thatTd+1(cosθ) = cos((d+ 1)θ). We have cos((d+ 1)θ) = cosθcos(dθ)−sin(dθ) sinθ, cos((d−1)θ) = cosθcos(dθ) + sin(dθ) sinθ.

Therefore,

Td+1(cosθ) = 2 cosθ Td(cosθ)−T_d−1(cosθ) = 2 cosθcos(dθ)−cos((d−1)θ)

= 2 cosθcos(dθ)−cosθcos(dθ)−sin(dθ) sin(θ)

= cosθcos(dθ)−sin(dθ) sin(θ) = cos((d+ 1)θ).

Lemma 4.14. For everyλ∈Rwith|λ| ≥1 and for everyd∈N, Td(λ) =1

λ+p

λ²−1^d +

λ−p

λ²−1^d .

Proof. Letλ∈Rbe such that |λ| ≥1 and letµ:=λ+√

λ²−1. Note that µ(λ−p

λ²−1) =λ²−λ²+ 1 = 1.

Therefore,µ⁻¹=λ−√

λ²−1. Hence, to prove the statement of the lemma is equivalent to prove that Td(λ) = ¹₂(µ^d+µ^−d) ∀d∈N.

Let us prove the above claim by induction ond. For the base cases whered∈ {0,1}, we have ¹₂(µ⁰+µ⁰) = 1 =T₀(λ) and ¹₂(µ+µ⁻¹) =λ=T₁(λ).

Letd >1. Note that

µ²=λ²+ 2λp

λ²−1 +λ²−1 = 2λµ−1.

Similarly, we haveµ⁻²= 2λµ⁻¹−1. Therefore, Td(λ) = 2λT_d−1(λ)−T_d−2(λ) = 2λ1

2(µ^d−1+µ^−(d−1))−1

2(µ^d−2+µ^−(d−2))

= 1 2

µ^d−2(2λµ−1) +y^−(d−2)(2λµ⁻¹−1)

2 µ^d+µ^−d .

Letd∈Nand letα, β∈R++ be such that α < β. Define the polynomial

Q_α,β,d(λ) :=

β+α−2λ β−α

_β+α

β−α

Note thatQα,β,d ∈ Qd. Since ^β+α_β−α >1, by Lemma 4.14 we have that Td

β+α β−α

>0, ∀k∈N. (4.16)

Lemma 4.15. Ifα, β∈R++ are such thatα < β, then for everyd∈N,

Qα,β,d(λ)≤2

pβ/α−1 pβ/α+ 1

∀λ∈[α, β].

In particular, letA∈Sⁿ+ be such that λ⁺_min:=λ^↓_rank(A)(A)< λ_max:=λ_max(A). Then, for everyd∈N,

Q_λ+

min,λ_max,d(λ)≤2

pκ⁺(A)−1 pκ⁺(A) + 1

!^d

∀λ∈[λ⁺_min, λ_max], where κ⁺(A) :=λ_max/λ⁺_min.

Proof. Note that for everyλ∈[α, β],

β+α−2λ

β−α ∈[−1,1].

Letd∈N. Thus, by Proposition 4.13, Tt

β+α−2λ β−α

∈[−1,1].

Letκ:=β/α. Therefore, using Lemma 4.14, for everyλ∈[α, β],

Qα,b,d(λ) = Td

β+α−2λ β−α

T_d

β+α β−α

(4.16)

≤ Td

β+α β−α

−1

=Td

κ+ 1 κ−1

−1

= 2 √

κ+ 1

√κ−1 ^d

+ √

κ−1

√κ+ 1 ^d!−1

≤2 √

κ+ 1

√κ−1 ^−d

= 2 √

κ−1

√κ+ 1 ^d

where in the last inequality we used the fact that^√_κ−1

√κ+1

≥0.

Corollary 4.16. LetA∈Sⁿ+ be such thatλ^↓_rank(A)(A)< λ_max(A), letb∈Im(A), letε >0, and letx^∗=A^†b.

Letf be defined as in (4.2) and define

xt:= arg min

x∈Kt(A,b)

f(x) (4.17)

for everyt∈N. Then

kxt−x^∗k²_A≤2εkx^∗k²_A for everyt≥

√

κ⁺(A)

2 ln 2/ε, whereκ⁺(A) :=λ_max(A)/λ^↓_rank(A)(A).

Proof. Let λ⁺_min :=λ^↓_rank(A)(A), let λ_max :=λ_max(A), and let κ:= κ⁺(A). Since Q_λ⁺

min,λ_max,t(x)∈ Qt, by Theorem 4.11 we have that

kx_t−x^∗k²_A≤ kx^∗k²_Amin

q∈Qt

max

λ∈[λ⁺_min,λ_max]

q(λ)²≤ kx^∗k²_A max

λ∈[λ⁺_min,λ_max]

Q_λ+

min,λ_max,t(λ)². Using Lemma 4.15 and Lemma 2.2, we get that

kxt−x^∗k²_A≤ kx^∗k²_A max

λ∈[λ⁺_min,λ_max]

Q_λ+

min,λ_max,t(λ)²

≤2kx^∗k²_A √

κ−1

√κ+ 1 ^2t

≤2kx^∗k²_A

1− 1

√κ ^2t

≤2kx^∗k²_Aexp

−2t

√κ

≤kx^∗k²_A.

Corollary 4.17. Let A∈Sⁿ+, letb∈Im(A), let ε >0, and letx^∗=A^†b. Let f be defined as in (4.2) and define

xt:= arg min

x∈Kt(A,b)

f(x) (4.18)

for everyt∈N. If there areα, β∈R++ withα < β such that all butc eigenvalues ofAare in [α, β]∪ {0}, and the remainingceigenvalues are all greater thanβ, then

kxt−x^∗k²_A≤εkx^∗k²_A for everyt≥c+

√

β/α 2 ln²_ε.

Proof. Letλ1, . . . , λc∈(β,+∞) be the eigenvalues ofAthat are not in [α, β]∪ {0}and letr∈N. Define the polynomial

qr(λ) =Qα,β,r(λ)

i=1

1− λ

λi

Notice thatqr(λ)∈ Qr+c. Moreover,

qr(λi) = 0, ∀i∈[c]. (4.19)

Letκ:=β/α. Note that

i=1

1− λ

λi

≤1, ∀λ∈[α, β]∪ {0}.

Hence, we have

qr(λ)≤Qα,β,r(λ), ∀λ∈[α, β]∪ {0}. (4.20)

Therefore, ifr:=

√κ

2 ln 2/ε, thenq_r∈ Q_r+c. Hence, by Theorem 4.11, for every t≥r+c we have kxt−x^∗k²_A≤ kx^∗k²_Amin

q∈Qt

max

i∈[rank(A)]q(λ^↓_i(A))²

(4.19)

≤ kx^∗k²_Amin

q∈Qt

max

λ∈[α,β]q(λ)²

≤ kx^∗k²_A max

λ∈[α,β]|qr(λ)|²

(4.20)

≤ kx^∗k²_A max

λ∈[α,β]Q_α,β,r(λ)²

Le. 4.15

≤ 2kx^∗k²_A √

κ−1

√κ+ 1 ^2r

≤2kx^∗k²_A

1− 1

√κ ^2r

Le. 2.2

≤ 2kx^∗k²_Aexp

−2r

√κ

≤kx^∗k²_A.

Theorem 4.18. Let A∈Sⁿ+ such thatλ^↓_rank(A)(A)≥1 and Tr(A)≤τ ∈R++. Letb∈Im(A), let ε >0, and letx^∗=A^†b. Letf be defined as in (4.2) and define

xt:= arg min

x∈Kt(A,b)

f(x) (4.21)

fort∈N. Then

kxt−x^∗k²_A≤εkx^∗k²_A for everyt≥τ^1/3(1 + ln 1/ε).

Proof. By Theorem 2.6, we know that Tr(A) is the sum of the eigenvalues of A. Therefore, it is easy to see that for anyβ ∈R++, the number of eigenvalues that are greater thanβ is at most Tr(A)/β. Hence, if we setβ :=τ^2/3, at most Tr(A)^1/3of the eigenvalues ofAwill be outside the range [1, τ^2/3]. Letcbe the number of eigenvalues ofAthat are not in the range [1, τ^2/3]. Hence, by Corollary 4.17, we havekxt−x^∗k²_A≤εkx^∗k²_A fort≥c+

√

τ^2/3ln 1/ε. Sincec≤Tr(A)^1/3≤τ^1/3, the result follows.

Chapter 5

Fast Laplacian Solvers

Many interesting algorithms for graph problems, including the algorithm for the maximum flow problem that we study in Chapter 6, use a solver for a Laplacian system as a subroutine. Spielman and Teng described in seminal work [11, 12, 13, 14] the first nearly-linear time solver for Laplacian systems. We state their result in the following theorem.

Theorem 5.1. There is an algorithm that takes as input

• a weighted connected graphG= (V, E, w);

• a vector b∈R^V such thatb⊥1;

• a valueε >0,

and computes as outputx∈R^V such that

x−L^†b

_L≤εkL^†bk_L,

where L:=LG(w). This algorithm runs in time ˜O(mlog(1/ε)), wherem:=|E|.

The algorithm constructed by Spielman and Teng is intricate, makes use of complex graph-theoretic structures, and the power of logn hidden by the soft-O notation is quite large, but their solver opened the floodgates. Following their work, many authors were able to simplify and improve the algorithm of Theorem 5.1 (see [4, 8, 9]), and research for simpler Laplacian solvers is still active. One recent development in the area is due to Kyng and Sachdeva [10], who constructed a simple nearly-linear time Laplacian solver based purely on random sampling, not depending on any graph-theoretic construction. Moreover, fast Laplacian solvers have been used in the development of very efficient algorithms for a host of combinatorial problems (see [15]).

Although we shall not prove Theorem 5.1, in this chapter we describe a ˜O(m^4/3log(1/ε)) Laplacian solver.

This solver already has quite a respectable running time, and its construction contains many ideas used in the solver of Spielman and Teng.

5.1 Preconditioning

Let A∈R^m×n and letb∈Im(A). Usually, the time it takes to solve the systemAx=bwith iterative methods depends on some properties of the matrixAsuch as its condition number. The idea of preconditioning is to build a matrix M ∈ R^m×m such that applying an iterative method to the system M Ax = M b is considerably faster than applying the same method to the original system. Moreover, constructing a solution to the systemAx=bfrom a solution to the system M Ax=M bshould be efficient. We call the matrixM a preconditioner of the systemAx=b. In this section, we will focus on preconditioners that decrease the condition number of the matrix.

Note that A^† is an excellent preconditioner forAx=b, but computing it exactly boils down to solving the system itself. Moreover, the main operation that depends onAin many iterative algorithms, such as

the Conjugate Gradient method or the Power Method, is left-multiplyingA by a vector in its image. Hence, a preconditioner M such that it is time-consuming to left-multiply it by a vector may make an iterative algorithm slower. Therefore, when choosing a preconditioner of a linear system, there are trade-offs involving the time it takes to compute it, the time it takes to apply it to a vector (by left-multiplying it), and the decrease it yields on the condition number of the system.

A problem that one may be concerned is that, if A∈Sⁿ, preconditioning the system Ax=bmay not preserve the symmetry of the matrixA. This is a problem in some cases, as in the Conjugate Gradient method, which depends in a fundamental way on the symmetry of the matrixA. In this case, we may choose a preconditionerM ∈Sⁿ+. This implies, by Theorem 2.13, that there existsE∈R^n×n such thatM =EE^T. In this case, we will precondition the systemAx=b by considering the systemEAE^T =b. Hence, we will first show that we can obtain an approximate solution to the systemAx=bfrom an approximate solution toEAE^T =bwhen some conditions are met. In the next section, we will show how this preconditioning affects the condition number of the matrix, specially when considering the case of preconditioning a Laplacian system.

Lemma 5.2. Let A, B ∈Sⁿ+ be such that Null(A) = Null(B), and letE ∈R^n×n be such thatB =EE^T. DefineW :=E^†AE^T†. Then Im(E) = Im(B), and Im(E^T) = Im(W).

Proof. First, let us prove that Im(E) = Im(B). Let x∈Rⁿ. Note that

x∈Null(B)^{Prop. 2.12}⇐⇒ x^TBx= 0 ⇐⇒ x^TEE^Tx= 0 ⇐⇒ kE^Txk²= 0 ⇐⇒ x∈Null(E^T).

Hence, Null(B) = Null(E^T), and by Theorem (2.1), we have Im(B) = Im(E). This yields

Im(E^T^†)^{Prop. 2.25}= Im(E) = Im(B) = Im(A). (5.1) Let us now prove that Null(W) = Null(E). Let x∈Rⁿ. SinceA0, we have W 0. Hence,

x∈Null(W)^{Prop. 2.12}⇐⇒ x^TW x= 0 ⇐⇒ x^TE^†AE^T^†x= 0^{Prop. 2.12}⇐⇒ E^T^†x∈Null(A).

By (5.1), we haveE^T^†x∈Im(A). Therefore,E^T†x∈Null(A)∩Im(A) ={0}. This is the case if and only ifx∈Null(E^T^†), and by Proposition 2.25 we havex∈Null(E). Hence, Null(W) = Null(E). Theorem 2.1 implies that Im(W) = Im(E^T).

Theorem 5.3. LetA, B∈Sⁿ+be such that Null(A) = Null(B). LetE∈R^n×n be such thatB =EE^T, and defineW :=E^†AE^T†. Defineφ:y∈Im(W)7→E^T^†y∈Im(A). Then

hx, yi_W =hφ(x), φ(y)i_A, ∀x, y∈Im(W).

In particular, letb∈Im(A) and let ε >0. If, for eachy∈Rⁿ, ky^∗−yk_W ≤εky^∗k_W, where y^∗:=W^†E^†b, then

kx^∗−φ(y)k_A≤kx^∗k_A, where x^∗:=A^†b.

Proof. By Theorem 2.1, since Null(A) = Null(B), we have Im(B) = Im(A). By Proposition 2.25, we have Im(E^T^†) = Im(E), and by Lemma 5.2 we have Im(E) = Im(B) = Im(A). Therefore,

Im(E) = Im(B) = Im(A). (5.2)

Letx, y∈Im(W). We have

hφ(x), φ(y)i_A=φ(x)^TAφ(y) =x^TE^†AE^T†y=x^TW y=hx, yi_W. In particular,

ky^∗−yk²_W ≤ky^∗k²_W ⇐⇒ kφ(y^∗)−φ(y)k²_A=kφ(y^∗−y)k²_A≤kφ(y^∗)k²_A.

Therefore, it only remains to show that φ(y^∗) =x^∗. By Proposition 2.23 and by (5.2),

AA^†=EE^†= Proj_Im(A). (5.3)

Moreover, by Propositions 2.23 and 2.25,

W^†W is an orthogonal projector onto Im(W^†) = Im(W) = Im(E^†). (5.4) Therefore,

φ(y^∗) =E^T†y^∗=E^T^†W^†E^†b

(5.3)

= E^T†W^†E^†AA^†b^(5.3)= E^T^†W^†E^†A(EE^†)^TA^†b

=E^T†W^†E^†AE^T^†E^TA^†b=E^T^†W^†W E^TA^†b

(5.4)

= E^T†E^TA^†b= (EE^†)^TA^†b

(5.3)

= A^†b=x^∗.

No documento Almost Linear Time Algorithms for Flows in Graphs (páginas 42-48)