# On the Linear Convergence of Forward–Backward Splitting Method: Part I—Convergence Analysis


https://doi.org/10.1007/s10957-020-01787-7


Yunier Bello-Cruz¹ · Guoyin Li² · Tran T. A. Nghia³

Received: 19 March 2019 / Accepted: 9 November 2020

Abstract

In this paper, we study the complexity of the forward–backward splitting method with Beck–Teboulle's line search for solving convex optimization problems, where the objective function can be split into the sum of a differentiable function and a nonsmooth function. We show that the method converges weakly to an optimal solution in Hilbert spaces, under mild standing assumptions without the global Lipschitz continuity of the gradient of the differentiable function involved. Our standing assumptions are weaker than the corresponding conditions in the paper of Salzo (SIAM J Optim 27:2153–2181, 2017). The conventional complexity of sublinear convergence for the function value is also obtained under the local Lipschitz continuity of the gradient of the differentiable function. Our main results concern the linear convergence of this method (in the quotient type), in terms of both the function value sequence and the iterative sequence, under only the quadratic growth condition. Our proof technique is direct, based on the quadratic growth condition and some properties of the forward–backward splitting method, without using error bounds or the Kurdyka–Łojasiewicz inequality as in other publications in this direction.

Keywords Nonsmooth and convex optimization problems · Forward–backward splitting method · Linear convergence · Quadratic growth condition

Mathematics Subject Classification 65K05 · 90C25 · 90C30

Correspondence: Guoyin Li, g.li@unsw.edu.au

Yunier Bello-Cruz, yunierbello@niu.edu · Tran T. A. Nghia, nttran@oakland.edu

1 Department of Mathematical Sciences, Northern Illinois University, DeKalb, IL 60115, USA
2 Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia
3 Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA


1 Introduction

In this paper, we mainly consider an optimization problem of minimizing the sum of two convex functions, one of which is differentiable and the other nondifferentiable. Many problems of this form appear in different fields of science and engineering, including machine learning, compressed sensing, and image processing.

A particular class of this problem is the so-called $\ell_1$-regularized problem, which usually provides sparse solutions and has many applications in signal processing and statistics [1–3].

Among the many methods for solving the aforementioned optimization problems, the forward–backward splitting method (FBS in brief, also known as the proximal gradient method) [2,4–9] is very popular due to its simplicity and efficiency. It is well-known that this method is globally convergent to an optimal solution with complexity $O(\frac{1}{k})$ on the iterative cost values under the assumption that the gradient of the differentiable function is globally Lipschitz continuous [6]. An important advance was due to [10], where the sublinear convergence was improved to $o(\frac{1}{k})$ in Hilbert spaces. Independently, by using some line searches motivated by the work of Tseng [11] and under a more restricted step size rule, Bello-Cruz and Nghia [12] achieved the same complexity under the milder assumption that the aforementioned gradient is only *locally* Lipschitz continuous in finite-dimensional spaces. The results in [12] were further extended in [13] to different line searches for the FBS method and to more general classes of optimization problems, even in infinite-dimensional spaces. A recent work of Bauschke, Bolte, and Teboulle [14] also tackles the absence of a Lipschitz continuous gradient by introducing the so-called NoLips algorithm, close to the FBS method but involving a Bregman distance. Their algorithm provides the sublinear complexity $O(\frac{1}{k})$ and guarantees global convergence under an additional hypothesis about the closedness of the domain of an auxiliary Legendre function defined there.

Unfortunately, the latter assumption, as well as those in [12], is not satisfied by the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16], one of the main applications in [14]. This situation was overcome in [13] by using the FBS method and will be revisited in our sequel [17] with further advances on linear convergence in the quotient type (often referred to as Q-linear convergence). Indeed, Salzo in [13, Section 4] proposed some new hypotheses to avoid the standard assumptions [6,10,12] on the domains of the cost functions. Although these hypotheses are valid in many natural optimization problems, some nontrivial parts of them can be further relaxed. In Sect. 3, by reanalyzing the theory of the FBS method in [12,13], we relax and weaken several conditions assumed in [13] to obtain the same global convergence and sublinear complexity in Hilbert spaces.¹ The proofs of many results in this section modify our corresponding ones in [12] under the simplified standing assumptions. This section is not a major part of the paper, but its results will be used intensively in our main Sect. 4.

The central part of our paper, Sect. 4, is devoted to the linear convergence of the FBS method, without the global Lipschitz continuity of the gradient mentioned above, in finite

1 We are indebted to some remarks from one referee that allowed us to extend the results from finite dimensions to Hilbert spaces in the current version.


dimensions. Despite the popularity of the FBS method, the linear convergence of this method has been established only recently, by using the Kurdyka–Łojasiewicz inequality [18–22] (even for nonconvex optimization problems) or some error bound conditions [23–27] with the base from [28]. Those conditions are somehow equivalent in convex settings; see, e.g., [18,19,25]. Linear convergence of the FBS method was also obtained in Hilbert spaces [5] for Lasso problems under some nontrivial conditions such as *finite basis injectivity* and *strict sparsity pattern*. These conditions were fully relaxed in the papers [19,20], which also work in infinite dimensions for more general optimization problems. Our results can also be extended to infinite dimensions by following some ideas of [19], but for simplicity we restrict ourselves to finite dimensions in this section. Our approach is close to the works of Drusvyatskiy and Lewis [25] and Garrigos, Rosasco, and Villa [19,20] in using the so-called *quadratic growth condition*, known also as the 2-conditioned property. However, our proof of linear convergence is more direct, relying only on the quadratic growth condition without using the KŁ inequality as in [18–21] or the error bound [28] as in [23,25], and it reveals the Q-linear convergence of the FBS sequence rather than the R-linear convergence obtained in all the aforementioned works. Some of our linear rates are sharper than those in [19,25]. The quadratic growth condition is indeed automatic in many classes of optimization problems, including the Poisson inverse regularized problem [14–16], least-squares $\ell_1$-regularized problems [3,5,6,19], group Lasso [23], or under mild second-order conditions on initial data [2,30–34]; see our second part [17] for further studies in this direction.

2 Preliminaries

Throughout the paper, $H$ is a Hilbert space, where $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the corresponding norm and inner product on $H$. We use $\Gamma_0(H)$ to denote the set of proper, lower semicontinuous, and convex functions on $H$. For $h \in \Gamma_0(H)$, we write $\operatorname{dom} h := \{x \in H : h(x) < +\infty\}$. The subdifferential of $h$ at $\bar{x} \in \operatorname{dom} h$ is defined by

$$\partial h(\bar{x}) := \{v \in H : \langle v, x - \bar{x}\rangle \le h(x) - h(\bar{x}),\ \forall x \in H\}. \tag{1}$$

We say $h$ satisfies the *quadratic growth condition* at $\bar{x}$ with modulus $\kappa > 0$ if there exists $\varepsilon > 0$ such that

$$h(x) \ge h(\bar{x}) + \frac{\kappa}{2}\, d^2\big(x;\ (\partial h)^{-1}(0)\big) \quad \text{for all } x \in B_\varepsilon(\bar{x}). \tag{2}$$

Moreover, if in addition $(\partial h)^{-1}(0) = \{\bar{x}\}$, then $h$ is said to satisfy the *strong* quadratic growth condition at $\bar{x}$ with modulus $\kappa$.

Some relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [18,29–31,34], even for nonconvex functions. The quadratic growth condition (2) is also called the *quadratic functional growth* property in [26] when $h$ is continuously differentiable over a closed convex set. In [19,20], $h$ is said to be 2-conditioned on $B_\varepsilon(\bar{x})$ if it satisfies the quadratic growth condition (2).
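As a quick numerical illustration (ours, not from the paper), the absolute value function $h(x) = |x|$ on $\mathbb{R}$ satisfies the quadratic growth condition (2) at $\bar{x} = 0$ with modulus $\kappa = 2$ and $\varepsilon = 1$, since $(\partial h)^{-1}(0) = \{0\}$ and $|x| \ge x^2$ whenever $|x| \le 1$. A minimal sketch checking this on a grid:

```python
# Numerical check (illustration only, not from the paper) of the quadratic
# growth condition (2) for h(x) = |x| at xbar = 0 in one dimension.
# Here (∂h)^{-1}(0) = {0}, so d(x; (∂h)^{-1}(0)) = |x|, and on B_eps(0)
# with eps = 1 and modulus kappa = 2:
#     |x| >= h(0) + (kappa/2) * |x|**2 = x**2   whenever |x| <= 1.

def h(x):
    return abs(x)

kappa, eps = 2.0, 1.0
grid = [i / 1000.0 for i in range(-1000, 1001)]   # a grid covering B_eps(0)
ok = all(h(x) >= h(0.0) + 0.5 * kappa * x * x - 1e-12 for x in grid)
print("quadratic growth (2) holds on B_eps(0):", ok)   # True
```

Note that the same $h$ does not satisfy (2) globally for this $\kappa$ (take $|x|$ large), which is why (2) is a local condition.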


The quadratic growth condition in [25] is slightly different, as follows:

$$h(x) \ge h(\bar{x}) + \frac{c}{2}\, d^2\big(x;\ (\partial h)^{-1}(0)\big) \quad \text{for all } x \in [h < h^* + \nu] \tag{3}$$

for some constants $c, \nu > 0$, where $[h < h^* + \nu] := \{x \in H : h(x) < h^* + \nu\}$ and $h^* := \inf_{x\in H} h(x) = h(\bar{x})$. It is easy to check that this growth condition implies (2). Indeed, suppose that (3) is satisfied for some $c, \nu > 0$. Define $\eta := \sqrt{\frac{2\nu}{c}}$ and note that $\bar{x} \in [h < h^* + \nu]$. Take any $x \in B_\eta(\bar{x})$. If $x \in [h < h^* + \nu]$, inequality (2) is trivial. If $x \notin [h < h^* + \nu]$, it follows that

$$h(x) \ge h(\bar{x}) + \nu = h(\bar{x}) + \frac{c}{2}\eta^2 \ge h(\bar{x}) + \frac{c}{2}\|x - \bar{x}\|^2 \ge h(\bar{x}) + \frac{c}{2}\, d^2\big(x;\ (\partial h)^{-1}(0)\big),$$

which clearly verifies (2). Thus, (2) is weaker than (3), but it is equivalent to the local version of (3):

$$h(x) \ge h(\bar{x}) + \frac{c}{2}\, d^2\big(x;\ (\partial h)^{-1}(0)\big) \quad \text{for all } x \in [h < h^* + \nu] \cap B_\varepsilon(\bar{x})$$

for some constants $c, \nu, \varepsilon > 0$. This property was shown recently in [18, Theorem 5] to be equivalent to the fact that $h$ satisfies the Kurdyka–Łojasiewicz inequality with order $\frac{1}{2}$.

To complete this section, we recall two important notions of linear convergence in our study. Let $\mu$ be a real number with $0 < \mu < 1$. A sequence $(x_k)_{k\in\mathbb{N}} \subset H$ is said to be *R-linearly convergent* to $x^*$ with rate $\mu$ if there exists $M > 0$ such that

$$\|x_k - x^*\| \le M\mu^k \quad \text{for all } k \in \mathbb{N}.$$

We say $(x_k)_{k\in\mathbb{N}}$ is *Q-linearly convergent* to $x^*$ with rate $\mu$ if there exists $K \in \mathbb{N}$ with

$$\|x_{k+1} - x^*\| \le \mu\,\|x_k - x^*\| \quad \text{for all } k \ge K.$$

From the definitions, it can be directly verified that Q-linear convergence implies R-linear convergence with the same rate. On the other hand, in general, R-linear convergence does not imply Q-linear convergence.
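To make the gap between the two notions concrete, here is a small illustration (ours, not from the paper): with $\mu = \frac{1}{2}$, the sequence $x_k = (2 + (-1)^k)\mu^k$ converges R-linearly to $0$ with rate $\mu$ (take $M = 3$), but its consecutive ratios alternate between $\mu/3$ and $3\mu = 1.5 > 1$, so it is not Q-linearly convergent with any rate:

```python
# Illustration (not from the paper): R-linear convergence does not imply
# Q-linear convergence. With mu = 1/2, the sequence
#     x_k = (2 + (-1)**k) * mu**k  ->  0
# satisfies x_k <= 3 * mu**k (R-linear with rate mu and M = 3), yet the
# ratios x_{k+1}/x_k alternate between mu/3 and 3*mu = 1.5 > 1, so no
# rate mu' < 1 works eventually: the convergence is not Q-linear.

mu = 0.5
x = [(2 + (-1) ** k) * mu ** k for k in range(30)]

r_linear = all(x[k] <= 3 * mu ** k + 1e-15 for k in range(30))
ratios = [x[k + 1] / x[k] for k in range(29)]
print("R-linear bound holds:", r_linear)   # True
print("largest ratio:", max(ratios))       # 1.5, exceeded infinitely often
```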

3 Global Convergence of Forward–Backward Splitting Methods in Hilbert Spaces

Throughout the paper, we consider the optimization problem

$$\min_{x\in H} F(x) := f(x) + g(x), \tag{4}$$

where $f, g \in \Gamma_0(H)$ and $f$ is differentiable on $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$; see our standing assumptions below. This section reanalyzes the theory of the forward–backward splitting (FBS) method

$$x_{k+1} = \operatorname{prox}_{\alpha_k g}\big(x_k - \alpha_k\nabla f(x_k)\big), \tag{5}$$

with the proximal operator defined later in (8) and stepsize $\alpha_k > 0$, when the gradient $\nabla f$ is not globally Lipschitz continuous [12,13]. The results here will be employed in our main topic, the linear convergence of the FBS method, in Sect. 4.

The convergence of the FBS method without Lipschitz continuity of $\nabla f$ was studied in [12] via some line searches under the standard assumption $\operatorname{dom} g \subset \operatorname{dom} f$, together with some assumptions on the continuity of $\nabla f$. In [13], Salzo simplified these assumptions and significantly extended them to more general frameworks via many line searches. It is worth noting that the standard assumption $\operatorname{dom} g \subset \operatorname{dom} f$ used in [12] is not satisfied by several important optimization problems, including the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16]. This situation was overcome in [13, Sections 4 and 5] with the following assumptions on $f$ and $g$:

H1. $f, g \in \Gamma_0(H)$ are bounded from below with $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g \ne \emptyset$.

H2. $f$ is differentiable on $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, $\nabla f$ is uniformly continuous on any weakly compact subset of $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and $\nabla f$ is bounded on any sublevel set of $F$.

H3. For $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, $\{F \le F(x)\} \subset \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and

$$d\big(\{F \le F(x)\};\ H\setminus\operatorname{int}(\operatorname{dom} f)\big) > 0. \tag{6}$$

These assumptions hold under some mild conditions; see [13, Proposition 4.2] for details. However, they are not trivial even in finite dimensions. Throughout the paper, we focus on the FBS method with the popular Beck–Teboulle line search [6], which was also studied in [13]. Next, following the techniques used in [12,13], we show that the global convergence of the FBS method holds true under the simplified standing assumptions below, which relax some conditions from H1–H3:

A1. $f, g \in \Gamma_0(H)$ and $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g \ne \emptyset$.

A2. $f$ is differentiable at any point of $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and $\nabla f$ is uniformly continuous on any weakly compact subset of $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$.

A3. For any $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, $\{F \le F(x)\} \subset \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and for any weakly compact subset $X$ of $\{F \le F(x)\}$ we have

$$d\big(X;\ H\setminus\operatorname{int}(\operatorname{dom} f)\big) > 0. \tag{7}$$

Remark 3.1 (The standing assumptions in finite dimensions) When $\dim H < \infty$, A2 means that $f$ is continuously differentiable on $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. Moreover, in A3 the distance gap requirement (7) is automatically true, as $X \cap \big(H\setminus\operatorname{int}(\operatorname{dom} f)\big) = \emptyset$ and $X$ is compact. On the other hand, condition (6) is still nontrivial; see the example below.

Clearly, some boundedness assumptions in H1–H2 are relaxed in our A1–A2. Moreover, the positive distance gap requirement (6) in H3 is slightly stronger than (7) in A3. But it should be noted that condition (6) can, in general, be easier to verify in infinite-dimensional spaces. Even in finite dimensions, A1–A3 are strictly weaker than H1–H3. To see this, consider $p \in \{1, 2\}$,

$$C_p := \{x = (x_1, x_2) \in \mathbb{R}^2 : x_1x_2 \ge p,\ x_1 > 0,\ x_2 > 0\},$$

and let $f, g : \mathbb{R}^2 \to \mathbb{R}\cup\{+\infty\}$ be defined by $f(x) = -\log x_1 - \log x_2$ if $x \in \mathbb{R}^2_{++}$ and $+\infty$ otherwise, and $g(x) = \delta_{C_1}(x)$, where $\delta_{C_1}$ is the indicator function of the set $C_1$ defined above. It can be directly verified that assumptions A1–A3 are satisfied. On the other hand, H1 fails because $f$ is not bounded from below on

$$\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g = \mathbb{R}^2_{++} \cap C_1 = C_1 = \{(x_1, x_2) \in \mathbb{R}^2 : x_1x_2 \ge 1,\ x_1 > 0,\ x_2 > 0\}.$$

Moreover, $(2,1) \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and since $F(2,1) = (f+g)(2,1) = -\log 2$, we get

$$\{x : F(x) \le -\log 2\} = \{(x_1, x_2) \in \mathbb{R}^2 : x_1x_2 \ge 2,\ x_1 > 0,\ x_2 > 0\} = C_2.$$

So, clearly, H2 also fails because $\nabla f(x_1, x_2) = \big({-\tfrac{1}{x_1}}, {-\tfrac{1}{x_2}}\big)$ is unbounded on the sublevel set of $F$ above. Finally, observing that $\big(\tfrac{2}{k}, k\big) \in C_2$ and $(0, k) \in \mathbb{R}^2\setminus\operatorname{int}(\operatorname{dom} f) = \mathbb{R}^2\setminus\mathbb{R}^2_{++}$, we see that

$$d\big(\{F \le -\log 2\};\ \mathbb{R}^2\setminus\operatorname{int}(\operatorname{dom} f)\big) = d\big(C_2;\ \mathbb{R}^2\setminus\mathbb{R}^2_{++}\big) \le \lim_{k\to\infty}\Big\|\Big(\frac{2}{k}, k\Big) - (0, k)\Big\| = \lim_{k\to\infty}\frac{2}{k} = 0.$$

This shows that H3 fails in this case.

Next, let us recall the proximal operator $\operatorname{prox}_g : H \to \operatorname{dom} g$ given by

$$\operatorname{prox}_g(z) := (\operatorname{Id} + \partial g)^{-1}(z) \quad \text{for all } z \in H, \tag{8}$$

which is well-known to be a single-valued mapping with full domain. With $\alpha > 0$, it is easy to check that

$$\frac{z - \operatorname{prox}_{\alpha g}(z)}{\alpha} \in \partial g\big(\operatorname{prox}_{\alpha g}(z)\big) \quad \text{for all } z \in H. \tag{9}$$

Let $S$ be the optimal solution set of problem (4) and let $x^*$ be in $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ due to our assumption A3. Then $x^* \in S$ iff

$$0 \in \partial(f + g)(x^*) = \nabla f(x^*) + \partial g(x^*). \tag{10}$$

The following lemma, a consequence of [4, Theorem 23.47], is helpful in our proof of the finite termination of Beck–Teboulle's line search under our standing assumptions.


Lemma 3.1 Let $g \in \Gamma_0(H)$ and let $\alpha > 0$. Then, for every $x \in \operatorname{dom} g$, we have

$$\operatorname{prox}_{\alpha g}(x) \to x \quad \text{as } \alpha \downarrow 0. \tag{11}$$
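For intuition, in one dimension with $g = \lambda|\cdot|$ the proximal operator (8) is the familiar soft-thresholding map, and both the inclusion (9) and the limit (11) can be checked numerically. A minimal sketch (an illustration of ours, not code from the paper):

```python
# Illustration (not from the paper): for g = lam*|.| in one dimension, the
# proximal operator (8) is soft-thresholding, which verifies both the
# subgradient inclusion (9) and the limit prox_{alpha g}(x) -> x of (11).

lam = 1.0

def prox(z, alpha):
    """prox_{alpha * lam * |.|}(z): soft-threshold z at level alpha*lam."""
    t = alpha * lam
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

z, alpha = 3.0, 0.5
p = prox(z, alpha)        # = 2.5
v = (z - p) / alpha       # (9): v must lie in ∂g(p)
# ∂g(p) = {lam * sign(p)} for p != 0, and [-lam, lam] for p = 0
assert p != 0 and abs(v - lam) < 1e-12

# (11): prox_{alpha g}(x) -> x as alpha -> 0 for x in dom g
x = 3.0
gaps = [abs(prox(x, a) - x) for a in (1.0, 0.1, 0.01, 0.001)]
print("|prox(x) - x| as alpha -> 0:", gaps)   # shrinks to 0
```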

Under our standing assumptions, we define the *proximal forward–backward operator* $J : \big(\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g\big)\times\mathbb{R}_{++} \to \operatorname{dom} g$ by

$$J(x, \alpha) := \operatorname{prox}_{\alpha g}\big(x - \alpha\nabla f(x)\big) \quad \text{for all } x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g,\ \alpha > 0. \tag{12}$$

The following result is essentially from [12, Lemma 2.4]. Even though the standing assumptions are different, the proof is quite similar. A more general variant of this result can be found in [35, Lemma 3].

Lemma 3.2 For any $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, we have

$$\frac{\alpha_2}{\alpha_1}\|x - J(x,\alpha_1)\| \ge \|x - J(x,\alpha_2)\| \ge \|x - J(x,\alpha_1)\| \quad \text{for all } \alpha_2 \ge \alpha_1 > 0. \tag{13}$$

Proof By using (9) and (12) with $z = x - \alpha\nabla f(x)$, we have

$$\frac{x - \alpha\nabla f(x) - J(x,\alpha)}{\alpha} \in \partial g\big(J(x,\alpha)\big) \tag{14}$$

for all $\alpha > 0$. For any $\alpha_2 \ge \alpha_1 > 0$, it follows from the monotonicity of $\partial g$ and (14) that

$$\begin{aligned}
0 &\le \Big\langle \frac{x - \alpha_2\nabla f(x) - J(x,\alpha_2)}{\alpha_2} - \frac{x - \alpha_1\nabla f(x) - J(x,\alpha_1)}{\alpha_1},\; J(x,\alpha_2) - J(x,\alpha_1) \Big\rangle \\
&= \Big\langle \frac{x - J(x,\alpha_2)}{\alpha_2} - \frac{x - J(x,\alpha_1)}{\alpha_1},\; \big(x - J(x,\alpha_1)\big) - \big(x - J(x,\alpha_2)\big) \Big\rangle \\
&= -\frac{\|x - J(x,\alpha_2)\|^2}{\alpha_2} - \frac{\|x - J(x,\alpha_1)\|^2}{\alpha_1} + \Big(\frac{1}{\alpha_2} + \frac{1}{\alpha_1}\Big)\big\langle x - J(x,\alpha_2),\ x - J(x,\alpha_1)\big\rangle \\
&\le -\frac{\|x - J(x,\alpha_2)\|^2}{\alpha_2} - \frac{\|x - J(x,\alpha_1)\|^2}{\alpha_1} + \Big(\frac{1}{\alpha_2} + \frac{1}{\alpha_1}\Big)\|x - J(x,\alpha_2)\|\cdot\|x - J(x,\alpha_1)\|.
\end{aligned}$$

This gives us that

$$\big(\|x - J(x,\alpha_2)\| - \|x - J(x,\alpha_1)\|\big)\cdot\Big(\|x - J(x,\alpha_2)\| - \frac{\alpha_2}{\alpha_1}\|x - J(x,\alpha_1)\|\Big) \le 0.$$

Since $\frac{\alpha_2}{\alpha_1} \ge 1$, we derive (13) and thus complete the proof of the lemma. □
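A quick numerical sanity check of inequality (13) (ours, not from the paper), with $f(x) = \frac{1}{2}(x - b)^2$ and $g = \lambda|\cdot|$ in one dimension, where $J(x,\alpha)$ from (12) is a gradient step followed by soft-thresholding:

```python
# Numerical sanity check (illustration only) of Lemma 3.2, inequality (13),
# for f(x) = 0.5*(x - b)**2 and g(x) = lam*|x| in one dimension, where
# J(x, alpha) = prox_{alpha g}(x - alpha*f'(x)) is soft-thresholding.

b, lam = 4.0, 1.0

def J(x, alpha):
    z = x - alpha * (x - b)          # forward (gradient) step
    t = alpha * lam                  # backward (proximal) step
    return z - t if z > t else (z + t if z < -t else 0.0)

x = 1.0
for a1, a2 in [(0.1, 0.2), (0.3, 0.9), (0.05, 1.0)]:
    d1, d2 = abs(x - J(x, a1)), abs(x - J(x, a2))
    # (13):  (a2/a1)*|x - J(x,a1)| >= |x - J(x,a2)| >= |x - J(x,a1)|
    assert (a2 / a1) * d1 + 1e-12 >= d2 >= d1 - 1e-12
print("inequality (13) verified on sample stepsizes")
```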

Next, let us present Beck–Teboulle's backtracking line search [6], which is specifically useful for forward–backward methods when the Lipschitz constant of $\nabla f$ is unknown or hard to estimate.

Linesearch BT (Beck–Teboulle's Line Search). Given $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, $\sigma > 0$, and $0 < \theta < 1$.

Input. Set $\alpha = \sigma$ and $J(x,\alpha) = \operatorname{prox}_{\alpha g}(x - \alpha\nabla f(x))$ with $x \in \operatorname{dom} g$.

While $f(J(x,\alpha)) > f(x) + \langle\nabla f(x),\, J(x,\alpha) - x\rangle + \dfrac{1}{2\alpha}\|x - J(x,\alpha)\|^2$, do $\alpha = \theta\alpha$.

End While.

Output. $\alpha$.
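The backtracking loop above can be sketched in a few lines; the following one-dimensional example (ours, not the authors' code) uses $f(x) = \frac{1}{2}(x - b)^2$ and $g = \lambda|\cdot|$, for which $\nabla f$ has Lipschitz constant $L = 1$, so the sufficient-decrease test passes exactly when $\alpha \le 1$ and backtracking from $\sigma = 10$ with $\theta = 0.5$ stops at $\alpha = 0.625$:

```python
# A minimal sketch (illustration, not the authors' code) of Linesearch BT in
# one dimension for f(x) = 0.5*(x - b)**2, g(x) = lam*|x|, where
# J(x, alpha) = prox_{alpha g}(x - alpha*f'(x)) is soft-thresholding.

b, lam = 4.0, 1.0
f = lambda x: 0.5 * (x - b) ** 2
fprime = lambda x: x - b

def J(x, alpha):
    z = x - alpha * fprime(x)
    t = alpha * lam
    return z - t if z > t else (z + t if z < -t else 0.0)

def linesearch_BT(x, sigma=10.0, theta=0.5):
    """Backtrack alpha until the sufficient-decrease test of Linesearch BT holds."""
    alpha = sigma
    while f(J(x, alpha)) > (f(x) + fprime(x) * (J(x, alpha) - x)
                            + (x - J(x, alpha)) ** 2 / (2 * alpha)):
        alpha = theta * alpha
    return alpha

x = 1.0
alpha_bar = linesearch_BT(x)
# Since f is quadratic with L = 1, the test holds iff alpha <= 1, so the
# trial steps 10, 5, 2.5, 1.25 are rejected and 0.625 is accepted.
print("accepted step:", alpha_bar)   # 0.625
```

For this quadratic $f$ the accepted step needs no further shrinking at later iterates, which is the warm-start behavior that the FBS method exploits by initializing each search at the previous stepsize.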

The output $\alpha$ of this line search will be denoted by $LS(x,\sigma,\theta)$. Let us show the well-definedness and finite termination of this line search under the standing assumptions A1–A3.

Proposition 3.1 (Finite Termination of Beck–Teboulle's Line Search) Suppose that assumptions A1–A2 hold. Then, for any $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, we have:

(i) The above line search terminates after finitely many iterations with positive output $\bar\alpha = LS(x,\sigma,\theta)$.

(ii) $\|x - u\|^2 - \|J(x,\bar\alpha) - u\|^2 \ge 2\bar\alpha\big[F(J(x,\bar\alpha)) - F(u)\big]$ for any $u \in H$.

(iii) $F(J(x,\bar\alpha)) - F(x) \le -\dfrac{1}{2\bar\alpha}\|J(x,\bar\alpha) - x\|^2 \le 0$. Consequently, by A3, $J(x,\bar\alpha) \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$.

Proof Take any $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. Let us justify (i) first. Note that $J(x,\alpha)$ is well-defined for any $\alpha > 0$ because $\nabla f(x)$ exists due to assumption A2. If $x \in S$, where $S$ is the optimal solution set of problem (4), then $x = J(x,\sigma)$ due to (10) and (9). Thus, the line search stops with zero steps and gives the output $\sigma$ with $x = J(x,\sigma) \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. If $x \notin S$, suppose by contradiction that the line search does not terminate after finitely many steps. Hence, for all $\alpha \in P := \{\sigma, \sigma\theta, \sigma\theta^2, \ldots\}$ it follows that

$$\langle\nabla f(x),\, J(x,\alpha) - x\rangle + \frac{1}{2\alpha}\|x - J(x,\alpha)\|^2 < f(J(x,\alpha)) - f(x). \tag{15}$$

Since $\operatorname{prox}_{\alpha g}$ is non-expansive, we have

$$\|J(x,\alpha) - x\| \le \|\operatorname{prox}_{\alpha g}(x - \alpha\nabla f(x)) - \operatorname{prox}_{\alpha g}(x)\| + \|\operatorname{prox}_{\alpha g}(x) - x\| \le \alpha\|\nabla f(x)\| + \|\operatorname{prox}_{\alpha g}(x) - x\|. \tag{16}$$

Lemma 3.1 tells us that

$$\|J(x,\alpha) - x\| \to 0 \quad \text{as } \alpha \downarrow 0. \tag{17}$$

Since $x \in \operatorname{int}(\operatorname{dom} f)$, there exists $\ell \in \mathbb{N}$ such that $J(x,\alpha) \in \operatorname{int}(\operatorname{dom} f)$ for all $\alpha \in P_\ell := \{\sigma\theta^\ell, \sigma\theta^{\ell+1}, \ldots\} \subseteq P$. Thanks to the convexity of $f$, we have

$$f(x) - f(J(x,\alpha)) \ge \langle\nabla f(J(x,\alpha)),\, x - J(x,\alpha)\rangle \quad \text{for } \alpha \in P_\ell. \tag{18}$$

This inequality together with (15) implies

$$\frac{1}{2\alpha}\|J(x,\alpha) - x\|^2 < f(J(x,\alpha)) - f(x) - \langle\nabla f(x),\, J(x,\alpha) - x\rangle \le \big\langle\nabla f(J(x,\alpha)) - \nabla f(x),\, J(x,\alpha) - x\big\rangle \le \|\nabla f(J(x,\alpha)) - \nabla f(x)\|\cdot\|J(x,\alpha) - x\|,$$

which yields $J(x,\alpha) \ne x$ and

$$0 < \frac{\|J(x,\alpha) - x\|}{\alpha} < 2\,\|\nabla f(J(x,\alpha)) - \nabla f(x)\| \quad \text{for all } \alpha \in P_\ell. \tag{19}$$

Since $\|x - J(x,\alpha)\| \to 0$ as $\alpha \downarrow 0$ by (17), $J(x,\alpha) \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ for all sufficiently small $\alpha > 0$. Due to the continuity of $\nabla f$ at $x \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ by assumption A2, we obtain from (19) that

$$\lim_{\alpha\to 0,\ \alpha\in P_\ell} \frac{\|x - J(x,\alpha)\|}{\alpha} = 0. \tag{20}$$

Applying (9) with $z = x - \alpha\nabla f(x)$ gives us $\frac{x - J(x,\alpha)}{\alpha} - \nabla f(x) \in \partial g(J(x,\alpha))$. Taking $\alpha \to 0$, we get $-\nabla f(x) \in \partial g(x)$ due to the demiclosedness of the subdifferential; see, e.g., [38, Theorem 4.7.1 and Proposition 4.2.1(i)]. It follows that $0 \in \nabla f(x) + \partial g(x)$, which contradicts the hypothesis $x \notin S$ by (10). Hence, the line search terminates after finitely many steps with output $\bar\alpha$.

To proceed to the proof of (ii), note that

$$f(J(x,\bar\alpha)) \le f(x) + \langle\nabla f(x),\, J(x,\bar\alpha) - x\rangle + \frac{1}{2\bar\alpha}\|x - J(x,\bar\alpha)\|^2. \tag{21}$$

Moreover, by (9), we have

$$\frac{x - J(x,\bar\alpha)}{\bar\alpha} - \nabla f(x) \in \partial g\big(J(x,\bar\alpha)\big).$$

Picking any $u \in H$, we get from the latter that

$$g(u) - g(J(x,\bar\alpha)) \ge \Big\langle \frac{x - J(x,\bar\alpha)}{\bar\alpha} - \nabla f(x),\; u - J(x,\bar\alpha) \Big\rangle. \tag{22}$$

Note also that $f(u) - f(x) \ge \langle\nabla f(x),\, u - x\rangle$. This together with (22) and (21) gives us

$$\begin{aligned}
F(u) = (f+g)(u) &\ge f(x) + g(J(x,\bar\alpha)) + \Big\langle \frac{x - J(x,\bar\alpha)}{\bar\alpha} - \nabla f(x),\; u - J(x,\bar\alpha) \Big\rangle + \langle\nabla f(x),\, u - x\rangle \\
&= f(x) + g(J(x,\bar\alpha)) + \frac{1}{\bar\alpha}\big\langle x - J(x,\bar\alpha),\, u - J(x,\bar\alpha)\big\rangle + \langle\nabla f(x),\, J(x,\bar\alpha) - x\rangle \\
&\ge f(J(x,\bar\alpha)) + g(J(x,\bar\alpha)) + \frac{1}{\bar\alpha}\big\langle x - J(x,\bar\alpha),\, u - J(x,\bar\alpha)\big\rangle - \frac{1}{2\bar\alpha}\|J(x,\bar\alpha) - x\|^2.
\end{aligned}$$

It follows that

$$\big\langle x - J(x,\bar\alpha),\; J(x,\bar\alpha) - u\big\rangle \ge \bar\alpha\big[F(J(x,\bar\alpha)) - F(u)\big] - \frac{1}{2}\|J(x,\bar\alpha) - x\|^2.$$

Since $2\langle x - J(x,\bar\alpha),\, J(x,\bar\alpha) - u\rangle = \|x - u\|^2 - \|J(x,\bar\alpha) - x\|^2 - \|J(x,\bar\alpha) - u\|^2$, the latter implies

$$\|x - u\|^2 - \|J(x,\bar\alpha) - u\|^2 \ge 2\bar\alpha\big[F(J(x,\bar\alpha)) - F(u)\big],$$

which clearly ensures (ii).

Finally, (iii) is a direct consequence of (ii) with $u = x$. It then follows that $J(x,\bar\alpha)$ belongs to the sublevel set $\{F \le F(x)\}$. By assumption A3, $J(x,\bar\alpha) \in \operatorname{int}(\operatorname{dom} f)$. The proof is complete. □

Now, we recall the forward–backward splitting method with line search proposed in [6], as follows.

Forward–Backward Splitting Method with Backtracking Line Search (FBS method)

Step 0. Take $x_0 \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, $\sigma > 0$, and $0 < \theta < 1$.

Step k. Set

$$x_{k+1} := J(x_k, \alpha_k) = \operatorname{prox}_{\alpha_k g}\big(x_k - \alpha_k\nabla f(x_k)\big) \tag{23}$$

with $\alpha_{-1} := \sigma$ and

$$\alpha_k := LS(x_k, \alpha_{k-1}, \theta). \tag{24}$$

The following result, a direct consequence of Proposition 3.1, plays a central role in our study.

Corollary 3.1 (Well-definedness of the FBS method) Let $x_0 \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. The sequence $(x_k)_{k\in\mathbb{N}}$ from the FBS method is well-defined, $x_k \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, and $f$ is differentiable at every $x_k$, $k \in \mathbb{N}$. Moreover, for any $x \in H$, we have:

(i) $\|x_k - x\|^2 - \|x_{k+1} - x\|^2 \ge 2\alpha_k\big[F(x_{k+1}) - F(x)\big]$.

(ii) $F(x_{k+1}) - F(x_k) \le -\dfrac{1}{2\alpha_k}\|x_{k+1} - x_k\|^2$.

Proof Thanks to Proposition 3.1(iii), $x_k \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ and $f$ is differentiable at every $x_k$ by assumption A2. This verifies the well-definedness of $(x_k)_{k\in\mathbb{N}}$. Moreover, both (i) and (ii) are consequences of (ii) and (iii) from Proposition 3.1 with $u = x$, $x = x_k$, $\bar\alpha = \alpha_k$, and $J(x,\bar\alpha) = J(x_k,\alpha_k) = x_{k+1}$. □

The following result recovers the global convergence of the FBS method without supposing the Lipschitz continuity of the gradient $\nabla f$, as in [13, Theorem 3.18] and [12, Theorem 4.2], under our simplified assumptions A1–A3. The proof is similar to that of [12, Theorem 4.2] with some modifications and ideas from [13].

Theorem 3.1 (Global Convergence of the FBS Method) Let $(x_k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. The following statements hold:

(i) If $S \ne \emptyset$, then $(x_k)_{k\in\mathbb{N}}$ converges weakly to a point in $S$. Moreover,

$$\lim_{k\to\infty} F(x_k) = \min_{x\in H} F(x). \tag{25}$$

(ii) If $S = \emptyset$, then we have

$$\lim_{k\to\infty}\|x_k\| = +\infty \quad \text{and} \quad \lim_{k\to\infty} F(x_k) = \inf_{x\in H} F(x).$$

Proof Let us justify (i) by supposing that $S \ne \emptyset$. By Corollary 3.1(i), for any $x^* \in S$ we have

$$\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \ge 2\alpha_k\big[F(x_{k+1}) - F(x^*)\big] \ge 0. \tag{26}$$

It follows that the sequence $(x_k)_{k\in\mathbb{N}}$ is Fejér monotone with respect to $S$; thus, it is bounded according to [4, Proposition 5.4]. Defining $M := \sup\{\|x_k - x^*\| : k \in \mathbb{N}\} < +\infty$ for some $x^* \in S$, we get from (26) that

$$\begin{aligned}
0 \le 2\alpha_k\big[F(x_{k+1}) - F(x^*)\big] &\le \|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \\
&= \big(\|x_k - x^*\| + \|x_{k+1} - x^*\|\big)\big(\|x_k - x^*\| - \|x_{k+1} - x^*\|\big) \\
&\le 2M\,\|x_k - x_{k+1}\|.
\end{aligned}$$

This yields

$$0 \le F(x_{k+1}) - F(x^*) \le M\,\frac{\|x_k - x_{k+1}\|}{\alpha_k}. \tag{27}$$

Note further from Corollary 3.1(ii) that

$$2\sigma\big[F(x_k) - F(x_{k+1})\big] \ge 2\alpha_k\big[F(x_k) - F(x_{k+1})\big] \ge \|x_k - x_{k+1}\|^2.$$

As $(F(x_k))_{k\in\mathbb{N}}$ is decreasing and bounded below by $F(x^*)$, the latter implies

$$\infty > 2\sigma\big[F(x_0) - F(x^*)\big] \ge \sum_{k=0}^{\infty} 2\sigma\big[F(x_k) - F(x_{k+1})\big] \ge \sum_{k=0}^{\infty}\|x_k - x_{k+1}\|^2. \tag{28}$$

Thus, $\|x_k - x_{k+1}\| \to 0$ as $k\to\infty$. Let $\bar{x}$ be a weak accumulation point of $(x_k)_{k\in\mathbb{N}}$, and find a subsequence $(x_{n_k})_{k\in\mathbb{N}}$ weakly converging to $\bar{x}$. As $F$ is l.s.c. and convex, it is also weakly l.s.c. Moreover, since $(F(x_k))_{k\in\mathbb{N}}$ is decreasing, we have $F(x_0) \ge F(\bar{x})$. According to assumptions A2 and A3, $f$ is differentiable at $\bar{x} \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. As $(\alpha_k)_{k\in\mathbb{N}}$ is decreasing by the construction of the line search in the FBS method, $\alpha := \lim_{k\to\infty}\alpha_k = \lim_{k\to\infty}\alpha_{n_k}$ exists.

Case 1: $\alpha > 0$. Note from (9) and (23) that

$$\frac{x_{n_k} - \alpha_{n_k}\nabla f(x_{n_k}) - x_{n_k+1}}{\alpha_{n_k}} \in \partial g(x_{n_k+1}),$$

which implies

$$\frac{x_{n_k} - x_{n_k+1}}{\alpha_{n_k}} + \nabla f(x_{n_k+1}) - \nabla f(x_{n_k}) \in \nabla f(x_{n_k+1}) + \partial g(x_{n_k+1}). \tag{29}$$

Applying Corollary 3.1(ii), we have $\frac{\|x_{n_k} - x_{n_k+1}\|^2}{\alpha_{n_k}} \to 0$, as the sequence $(F(x_k))_{k\in\mathbb{N}}$ is decreasing and thus convergent. Since $\alpha_{n_k} \ge \alpha > 0$, $\frac{x_{n_k} - x_{n_k+1}}{\alpha_{n_k}}$ also converges to 0. Moreover, note from (28) that $\|x_{n_k} - x_{n_k+1}\| \to 0$ and $\|\nabla f(x_{n_k}) - \nabla f(x_{n_k+1})\| \to 0$, due to the uniform continuity of $\nabla f$ on the weakly compact subset $\{\bar{x}\}\cup(x_{n_k})_{k\in\mathbb{N}}\cup(x_{n_k+1})_{k\in\mathbb{N}}$ of $[F \le F(x_0)]$ as in assumption A2. Taking $k\to\infty$ in (29) gives us $0 \in \nabla f(\bar{x}) + \partial g(\bar{x})$ due to the demiclosedness of the subdifferential, which implies $\bar{x} \in S$. Furthermore, since $(F(x_k))_{k\in\mathbb{N}}$ is decreasing, (25) is a consequence of (27).

Case 2: $\alpha = 0$. Define $\hat\alpha_{n_k} := \frac{\alpha_{n_k}}{\theta} > \alpha_{n_k} > 0$ and $\hat{x}_{n_k} := J(x_{n_k}, \hat\alpha_{n_k}) \in \operatorname{dom} g$. It follows from the definition of Linesearch BT that

$$f(\hat{x}_{n_k}) > f(x_{n_k}) + \langle\nabla f(x_{n_k}),\, \hat{x}_{n_k} - x_{n_k}\rangle + \frac{1}{2\hat\alpha_{n_k}}\|x_{n_k} - \hat{x}_{n_k}\|^2. \tag{30}$$

Due to Lemma 3.2, we have

$$\|x_{n_k} - \hat{x}_{n_k}\| = \|x_{n_k} - J(x_{n_k},\hat\alpha_{n_k})\| \le \frac{\hat\alpha_{n_k}}{\alpha_{n_k}}\|x_{n_k} - J(x_{n_k},\alpha_{n_k})\| = \frac{1}{\theta}\|x_{n_k} - x_{n_k+1}\| \to 0,$$

which tells us that $(\hat{x}_{n_k})_{k\in\mathbb{N}}$ weakly converges to $\bar{x} \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$. Define $A := \{\bar{x}\}\cup(x_{n_k})_{k\in\mathbb{N}}$. It follows that $A \subseteq [F \le F(x_0)]$, and $A$ is weakly compact. By assumption A3, $d\big(A;\ H\setminus\operatorname{int}(\operatorname{dom} f)\big) > 0$. As $\|x_{n_k} - \hat{x}_{n_k}\| \to 0$, we find some $K > 0$ such that $\hat{x}_{n_k} \in \operatorname{int}(\operatorname{dom} f)$, and thus $\hat{x}_{n_k} \in \operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, for any $k > K$. Hence, $\nabla f(\hat{x}_{n_k})$ is well-defined by assumption A2. It follows that

$$f(x_{n_k}) \ge f(\hat{x}_{n_k}) + \langle\nabla f(\hat{x}_{n_k}),\, x_{n_k} - \hat{x}_{n_k}\rangle.$$

This together with (30) gives us

$$\frac{1}{2\hat\alpha_{n_k}}\|x_{n_k} - \hat{x}_{n_k}\|^2 < \big\langle\nabla f(x_{n_k}) - \nabla f(\hat{x}_{n_k}),\; x_{n_k} - \hat{x}_{n_k}\big\rangle \le \|\nabla f(x_{n_k}) - \nabla f(\hat{x}_{n_k})\|\cdot\|x_{n_k} - \hat{x}_{n_k}\|.$$

Hence, we have

$$\|\nabla f(x_{n_k}) - \nabla f(\hat{x}_{n_k})\| \ge \frac{1}{2\hat\alpha_{n_k}}\|x_{n_k} - \hat{x}_{n_k}\|. \tag{31}$$

As $\{\bar{x}\}\cup(x_{n_k})_{k\in\mathbb{N}}\cup(\hat{x}_{n_k})_{k\in\mathbb{N}}$ is a weakly compact subset of $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ and $\|\hat{x}_{n_k} - x_{n_k}\| \to 0$, assumption A2 tells us that $\|\nabla f(x_{n_k}) - \nabla f(\hat{x}_{n_k})\| \to 0$ as $k\to\infty$. We derive from (31) that

$$\lim_{k\to\infty}\frac{1}{\hat\alpha_{n_k}}\|x_{n_k} - \hat{x}_{n_k}\| = 0. \tag{32}$$

Applying (9) with $z = x_{n_k} - \hat\alpha_{n_k}\nabla f(x_{n_k})$ gives us

$$\frac{x_{n_k} - \hat{x}_{n_k}}{\hat\alpha_{n_k}} + \nabla f(\hat{x}_{n_k}) - \nabla f(x_{n_k}) = \nabla f(\hat{x}_{n_k}) + \frac{x_{n_k} - \hat\alpha_{n_k}\nabla f(x_{n_k}) - \hat{x}_{n_k}}{\hat\alpha_{n_k}} \in \nabla f(\hat{x}_{n_k}) + \partial g(\hat{x}_{n_k}).$$

By letting $k\to\infty$, similarly to the conclusion after (29), we get $0 \in \nabla f(\bar{x}) + \partial g(\bar{x})$ due to the demiclosedness of the subdifferential, i.e., $\bar{x} \in S$. To proceed with the proof of (25) in this case, we observe from Lemma 3.2 that

$$\|x_{n_k} - \hat{x}_{n_k}\| = \Big\|x_{n_k} - J\Big(x_{n_k}, \frac{\alpha_{n_k}}{\theta}\Big)\Big\| \ge \|x_{n_k} - J(x_{n_k},\alpha_{n_k})\| = \|x_{n_k} - x_{n_k+1}\|.$$

This together with (32) yields $\frac{\|x_{n_k} - x_{n_k+1}\|}{\alpha_{n_k}} \to 0$ as $k\to\infty$. Since $(F(x_k))_{k\in\mathbb{N}}$ is decreasing, we derive from the latter and (27) that

$$0 = \lim_{k\to\infty} M\,\frac{\|x_{n_k} - x_{n_k+1}\|}{\alpha_{n_k}} \ge \lim_{k\to\infty}\big[F(x_{n_k+1}) - F(x^*)\big] = \lim_{k\to\infty}\big[F(x_k) - F(x^*)\big] \ge 0,$$

which ensures (25).

From the above two cases, we have (25) and the fact that any weak accumulation point of the Fejér monotone sequence $(x_k)_{k\in\mathbb{N}}$ belongs to $S$. The classical Fejér theorem [4, Theorem 5.5] tells us that $(x_k)_{k\in\mathbb{N}}$ converges weakly to some point of $S$. This verifies (i) of the theorem.

The proof of the second part (ii) follows the same lines as the proof of [12, Theorem 4.2]. □

The following proposition shows that, when $\nabla f$ is locally Lipschitz continuous, the step size $\alpha_k$ is bounded below by a positive number. The second part of this result coincides with [6, Remark 1.2].

Proposition 3.2 (Boundedness from Below of the Step Sizes) Let $(x_k)_{k\in\mathbb{N}}$ and $(\alpha_k)_{k\in\mathbb{N}}$ be the two sequences generated by the FBS method. Suppose that $S \ne \emptyset$ and that the sequence $(x_k)_{k\in\mathbb{N}}$ converges to some $x^* \in S$. If $\nabla f$ is locally Lipschitz continuous around $x^*$ with modulus $L$, then there exists some $K \in \mathbb{N}$ such that

$$\alpha_k \ge \min\Big\{\alpha_K,\, \frac{\theta}{L}\Big\} > 0 \quad \text{for all } k > K. \tag{33}$$

Consequently, for $k > K$ the line search $LS(x_k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\{1, \frac{\theta}{\alpha_K L}\}\big)$ steps.

Furthermore, if $\nabla f$ is globally Lipschitz continuous on $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$ with uniform modulus $L$, then $\alpha_k \ge \min\{\sigma, \frac{\theta}{L}\}$ for all $k \in \mathbb{N}$. In this case, the line search $LS(x_k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\{1, \frac{\theta}{\sigma L}\}\big)$ steps for any $k$.

Proof Suppose that $S \ne \emptyset$, that the sequence $(x_k)_{k\in\mathbb{N}}$ converges to $x^* \in S$, and that $\nabla f$ is locally Lipschitz continuous around $x^*$ with constant $L > 0$. We find some $\varepsilon > 0$ such that

$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\| \quad \text{for all } x, y \in B_\varepsilon(x^*), \tag{34}$$

where $B_\varepsilon(x^*)$ is the closed ball in $H$ with center $x^*$ and radius $\varepsilon$. Since $(x_k)_{k\in\mathbb{N}}$ converges to $x^*$, there exists some $K \in \mathbb{N}$ such that

$$\|x_k - x^*\| \le \frac{\theta\varepsilon}{2+\theta} < \varepsilon \quad \text{for all } k > K \tag{35}$$

with $0 < \theta < 1$ as in Linesearch BT. We claim that

$$\alpha_k \ge \min\Big\{\alpha_{k-1},\, \frac{\theta}{L}\Big\} \quad \text{for all } k > K. \tag{36}$$

Suppose by contradiction that $\alpha_k < \min\{\alpha_{k-1}, \frac{\theta}{L}\}$. Then $\alpha_k < \alpha_{k-1}$, and so the loop in Linesearch BT at $(x_k,\alpha_{k-1})$ needs more than one iteration. Defining $\hat\alpha_k := \frac{\alpha_k}{\theta} > 0$ and $\hat{x}_k := J(x_k,\hat\alpha_k)$, we have

$$f(\hat{x}_k) > f(x_k) + \langle\nabla f(x_k),\, \hat{x}_k - x_k\rangle + \frac{1}{2\hat\alpha_k}\|x_k - \hat{x}_k\|^2. \tag{37}$$

Furthermore, it follows from Lemma 3.2 that

$$\|x_k - \hat{x}_k\| = \|x_k - J(x_k,\hat\alpha_k)\| \le \frac{\hat\alpha_k}{\alpha_k}\|x_k - J(x_k,\alpha_k)\| = \frac{1}{\theta}\|x_k - x_{k+1}\|.$$

This together with (35) yields

$$\|\hat{x}_k - x^*\| \le \|\hat{x}_k - x_k\| + \|x_k - x^*\| \le \frac{1}{\theta}\|x_k - x_{k+1}\| + \|x_k - x^*\| \le \frac{1}{\theta}\cdot\frac{2\theta\varepsilon}{2+\theta} + \frac{\theta\varepsilon}{2+\theta} = \varepsilon. \tag{38}$$

Since $x_k, \hat{x}_k \in B_\varepsilon(x^*)$ by (35) and (38), we get from (34) that

$$f(\hat{x}_k) - f(x_k) - \langle\nabla f(x_k),\, \hat{x}_k - x_k\rangle = \int_0^1 \big\langle\nabla f(x_k + t(\hat{x}_k - x_k)) - \nabla f(x_k),\; \hat{x}_k - x_k\big\rangle\, dt \le \int_0^1 tL\,\|\hat{x}_k - x_k\|^2\, dt = \frac{L}{2}\|\hat{x}_k - x_k\|^2.$$

Combining this with (37) yields $\hat\alpha_k \ge \frac{1}{L}$ and thus $\alpha_k \ge \frac{\theta}{L}$. This is a contradiction.

If there is some $k_0 > K$ with $k_0 \in \mathbb{N}$ such that $\alpha_{k_0} \ge \frac{\theta}{L}$, we get from (36) that $\alpha_k \ge \frac{\theta}{L}$ for all $k \ge k_0$. Otherwise, $\alpha_k < \frac{\theta}{L}$ for every $k > K$, which implies that $\alpha_k = \alpha_{k-1} = \alpha_K$ for all $k > K$ due to (36) and the decreasing property of $(\alpha_k)_{k\in\mathbb{N}}$. In both cases, we have (33).

Now suppose that the line search $LS(x_k,\alpha_{k-1},\theta)$ needs $d$ repetitions. It follows from (33) that

$$\alpha_K\theta^d \ge \alpha_{k-1}\theta^d = \alpha_k \ge \min\Big\{\alpha_K,\, \frac{\theta}{L}\Big\},$$

which tells us that $d \le \log_\theta\big(\min\{1, \frac{\theta}{\alpha_K L}\}\big)$.

Finally, suppose that $\nabla f$ is globally Lipschitz continuous with modulus $L$ on $\operatorname{int}(\operatorname{dom} f)\cap\operatorname{dom} g$, a subset of $\operatorname{int}(\operatorname{dom} f)$. By using Proposition 3.1(iii), we can repeat the above proof without the need for $\varepsilon$ and $K$, replacing (36) by $\alpha_k \ge \min\{\sigma, \frac{\theta}{L}\}$. □
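Proposition 3.2 can be observed numerically. The sketch below (ours, not the authors' code) runs the FBS iteration (23)–(24) with Linesearch BT on the one-dimensional problem $f(x) = \frac{1}{2}(x - b)^2$, $g = \lambda|\cdot|$, where $\nabla f$ is globally Lipschitz with $L = 1$, and checks the lower bound $\alpha_k \ge \min\{\sigma, \theta/L\}$:

```python
# Illustration (not the authors' code): run the FBS method with Linesearch BT
# on the one-dimensional Lasso-type problem f(x) = 0.5*(x - b)**2, g = lam*|x|
# and check the step-size lower bound alpha_k >= min{sigma, theta/L} of
# Proposition 3.2; here grad f is globally Lipschitz with L = 1.

b, lam, sigma, theta = 4.0, 1.0, 10.0, 0.5
f = lambda x: 0.5 * (x - b) ** 2
fprime = lambda x: x - b

def J(x, alpha):
    z = x - alpha * fprime(x)        # forward step
    t = alpha * lam                  # backward step (soft-thresholding)
    return z - t if z > t else (z + t if z < -t else 0.0)

def linesearch_BT(x, alpha):
    while f(J(x, alpha)) > (f(x) + fprime(x) * (J(x, alpha) - x)
                            + (x - J(x, alpha)) ** 2 / (2 * alpha)):
        alpha *= theta
    return alpha

x, alpha, steps = 0.0, sigma, []
for _ in range(60):
    alpha = linesearch_BT(x, alpha)  # alpha_k = LS(x_k, alpha_{k-1}, theta)
    steps.append(alpha)
    x = J(x, alpha)                  # x_{k+1} = J(x_k, alpha_k)

L = 1.0
print("final iterate:", x)           # converges to b - lam = 3.0 (b > lam)
print("smallest accepted step:", min(steps))
assert min(steps) >= min(sigma, theta / L) - 1e-12
```

As predicted, the line search backtracks only during the first iteration and the accepted steps never fall below $\theta/L = 0.5$.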

Next, we present the conventional complexity $o(\frac{1}{k})$ of the FBS method [10, Theorem 3], but under our standing assumptions and the requirement that $\nabla f$ is only *locally* Lipschitz continuous. The complexity remains valid when this local Lipschitz condition is replaced by the weaker one that the step sizes $(\alpha_k)$ are bounded below by a positive number; see Proposition 3.2. This idea was initiated in [12, Theorem 4.3] in finite dimensions and extended to different kinds of line searches in Hilbert spaces in [13], e.g., [13, Corollary 3.20]. For completeness, we provide a short proof², which could also be obtained by following the method of proof of [13, Theorem 3.18(iii)(d)].

Theorem 3.2 (Sublinear Convergence of the FBS Method) Let $(x_k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that $S \ne \emptyset$ and that $\nabla f$ is locally Lipschitz continuous around every point of $S$. Then,

$$\lim_{k\to\infty} k\Big(F(x_k) - \min_{x\in H} F(x)\Big) = 0. \tag{39}$$

Proof By Proposition 3.2, $\alpha_k$ is bounded below by some $\alpha > 0$. Taking any $x^* \in S$, we obtain from Corollary 3.1(i) that

$$\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \ge 2\alpha\big[F(x_{k+1}) - F(x^*)\big] \ge 0. \tag{40}$$

As $(\|x_k - x^*\|^2)_{k\in\mathbb{N}}$ is decreasing, it is convergent. Given $\varepsilon > 0$, there exists $K \in \mathbb{N}$ (large enough) such that $\big|\|x_{K+k} - x^*\|^2 - \|x_K - x^*\|^2\big| \le \varepsilon$ for any $k\in\mathbb{N}$. It follows from (40) that

$$\varepsilon \ge \|x_K - x^*\|^2 - \|x_{k+K} - x^*\|^2 = \sum_{\ell=K}^{k+K-1}\big(\|x_\ell - x^*\|^2 - \|x_{\ell+1} - x^*\|^2\big) \ge 2\alpha\sum_{\ell=K}^{k+K-1}\big[F(x_{\ell+1}) - F(x^*)\big].$$

Since $(F(x_k))_{k\in\mathbb{N}}$ is decreasing, we get that

$$\varepsilon \ge 2\alpha k\big(F(x_{K+k}) - F(x^*)\big) = 2\alpha\,\frac{k}{K+k}\,(K+k)\big(F(x_{K+k}) - F(x^*)\big),$$

which implies that

$$0 \le \liminf_{k\to\infty} k\big(F(x_k) - F(x^*)\big) \le \limsup_{k\to\infty} k\big(F(x_k) - F(x^*)\big) = \limsup_{k\to\infty}\,(K+k)\big(F(x_{K+k}) - F(x^*)\big) \le \frac{\varepsilon}{2\alpha}.$$

As $\varepsilon > 0$ was arbitrary, this verifies (39) and completes the proof of the theorem. □
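As an informal numerical companion to (39) (ours, not from the paper), consider the smooth case $g = 0$, where the FBS step reduces to a gradient step, applied to $f(x) = \frac{x^4}{4}$: its gradient $x^3$ is locally but not globally Lipschitz, $F^* = 0$, and a small fixed step stands in for the line search, whose output is bounded below by Proposition 3.2. One can watch $k\,F(x_k) \to 0$, as the theorem predicts:

```python
# Illustration (not the authors' code) of the o(1/k) rate (39) for the FBS
# iteration in the smooth case g = 0 (so prox is the identity), with
# f(x) = x**4 / 4, whose gradient x**3 is locally but not globally
# Lipschitz. Here F* = 0 and we track k * (F(x_k) - F*).

alpha = 0.1                          # fixed step standing in for the line search
x = 1.0
vals = []
for k in range(1, 200001):
    x = x - alpha * x ** 3           # x_{k+1} = prox_{alpha g}(x_k - alpha f'(x_k)), g = 0
    vals.append(k * (x ** 4) / 4.0)  # k * (F(x_k) - F*)

print("k*(F(x_k)-F*) at k = 10, 1000, 200000:",
      vals[9], vals[999], vals[-1])  # decreases toward 0
```

Here $x_k^2 \approx \frac{1}{2\alpha k}$, so $F(x_k) = O(k^{-2})$ and $k\,F(x_k) \to 0$, consistent with the $o(\frac{1}{k})$ rate (but faster; (39) is only an upper estimate).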

4 Local Linear Convergence of Forward–Backward Splitting Methods

In this section, we obtain the local Q-linear convergence of the FBS method under the quadratic growth condition and local Lipschitz continuity of $\nabla f$, which hold automatically in many problems including the Lasso problem and Poisson linear inverse regularized

2 We are grateful to one of the referees whose remarks led us to this simple proof.
