https://doi.org/10.1007/s10957-020-01787-7
On the Linear Convergence of Forward–Backward Splitting Method: Part I—Convergence Analysis
Yunier Bello-Cruz1 ·Guoyin Li2 ·Tran T. A. Nghia3
Received: 19 March 2019 / Accepted: 9 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
In this paper, we study the complexity of the forward–backward splitting method with Beck–Teboulle's line search for solving convex optimization problems, where the objective function can be split into the sum of a differentiable function and a nonsmooth function. We show that the method converges weakly to an optimal solution in Hilbert spaces, under mild standing assumptions without the global Lipschitz continuity of the gradient of the differentiable function involved. Our standing assumptions are weaker than the corresponding conditions in the paper of Salzo (SIAM J Optim 27:2153–2181, 2017). The conventional complexity of sublinear convergence for the functional value is also obtained under the local Lipschitz continuity of the gradient of the differentiable function. Our main results concern the linear convergence of this method (in the quotient sense), in terms of both the function value sequence and the iterative sequence, under only the quadratic growth condition. Our proof technique proceeds directly from the quadratic growth condition and some properties of the forward–backward splitting method, without using error bounds or the Kurdyka–Łojasiewicz inequality as in other publications in this direction.
Keywords Nonsmooth and convex optimization problems · Forward–backward splitting method · Linear convergence · Quadratic growth condition
Mathematics Subject Classification 65K05 · 90C25 · 90C30
Guoyin Li: g.li@unsw.edu.au
Yunier Bello-Cruz: yunierbello@niu.edu
Tran T. A. Nghia: nttran@oakland.edu
1 Department of Mathematical Sciences, Northern Illinois University, DeKalb, IL 60115, USA
2 Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia
3 Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA
1 Introduction
In this paper, we mainly consider an optimization problem of minimizing the sum of two convex functions, one of which is differentiable and the other nondifferentiable. Many problems in this format have appeared in different fields of science and engineering, including machine learning, compressed sensing, and image processing. A particular class of this problem is the so-called $\ell_1$-regularized problem, which usually provides sparse solutions and has many applications in signal processing and statistics [1–3].
Among the many methods for solving the aforementioned optimization problems, the forward–backward splitting method (FBS in brief, also known as the proximal gradient method) [2,4–9] is very popular due to its simplicity and efficiency. It is well known that this method is globally convergent to an optimal solution with the complexity $O(k^{-1})$ on the iterative cost values under the assumption that the gradient of the differentiable function is globally Lipschitz continuous [6]. An important advance was due to [10], where the sublinear convergence was improved to $o(k^{-1})$ in Hilbert spaces. Independently, by using some line searches motivated by the work of Tseng [11] and under a more restrictive step size rule, Bello-Cruz and Nghia [12] achieved the same complexity under the milder assumption that the aforementioned gradient is only locally Lipschitz continuous in finite-dimensional spaces. The results in [12] were further extended in [13] to different line searches for the FBS method and more general classes of optimization problems, even in infinite-dimensional spaces. A recent work of Bauschke, Bolte, and Teboulle [14] also tackles the absence of a Lipschitz continuous gradient by introducing the so-called NoLips algorithm, close to the FBS method, with the involvement of a Bregman distance. Their algorithm provides the sublinear complexity $O(1/k)$ and guarantees global convergence under an additional hypothesis about the closedness of the domain of an auxiliary Legendre function defined there.
Unfortunately, the latter assumption, as well as those in [12], is not satisfied for the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16], one of the main applications in [14]. This situation was overcome in [13] by using the FBS method and will be revisited in our sequel [17] with further advances on linear convergence in the quotient sense (often referred to as Q-linear convergence). Indeed, Salzo in [13, Section 4] proposed some new hypotheses to avoid the standard assumptions [6,10,12] on the domains of the cost functions. Although these hypotheses are valid in many natural situations, some nontrivial parts of them can be further relaxed. In Sect. 3, by reanalyzing the theory of the FBS method in [12,13], we relax and weaken several conditions assumed in [13] to acquire the same global convergence and sublinear complexity in Hilbert spaces.¹ The proofs of many results in this section modify our corresponding ones in [12] under the simplified standing assumptions. This section is not a major part of the paper, but its results will be used intensively in our main Sect. 4.
The central part of our paper, Sect. 4, is devoted to the linear convergence of the FBS method in finite dimensions, without the global Lipschitz continuity of the gradient mentioned above.
1 We are indebted to some remarks from one referee that allow us to extend the results from finite dimensions to Hilbert spaces in the current version.
Despite the popularity of the FBS method, the linear convergence of this method has been established only recently, by using the Kurdyka–Łojasiewicz inequality [18–22]
(even for nonconvex optimization problems) or some error bound conditions [23–27] building on [28]. Those conditions are somehow equivalent in convex settings; see, e.g., [18,19,25]. Linear convergence of the FBS method was also obtained in Hilbert spaces [5] for Lasso problems under some nontrivial conditions such as finite basis injectivity and strict sparsity pattern. These conditions were fully relaxed in the papers [19,20], which also work in infinite dimensions for more general optimization problems. Our results can also be extended to infinite dimensions by following some ideas of [19], but for simplicity we restrict ourselves to finite dimensions in this section. Our approach is close to the works of Drusvyatskiy and Lewis [25] and Garrigos, Rosasco, and Villa [19,20] in using the so-called quadratic growth condition, also known as the 2-conditioned property. However, our proof of linear convergence proceeds directly from the quadratic growth condition, without using the KŁ inequality as in [18–21] or the error bound [28] as in [23,25], and reveals the Q-linear convergence of the FBS sequence rather than the R-linear convergence obtained in all the aforementioned works. Some of our linear rates are sharper than those in [19,25]. The quadratic growth condition is indeed automatic in many classes of optimization problems, including the Poisson inverse regularized problem [14–16], least-squares $\ell_1$-regularized problems [3,5,6,19], group Lasso [23], or under mild second-order conditions on initial data [2,30–34]; see our second part [17] for further studies in this direction.
2 Preliminaries
Throughout the paper, $H$ is a Hilbert space, where $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the corresponding norm and inner product in $H$. We use $\Gamma_0(H)$ to denote the set of proper, lower semicontinuous, and convex functions on $H$. Let $h\in\Gamma_0(H)$; we write $\mathrm{dom}\,h:=\{x\in H:\ h(x)<+\infty\}$. The subdifferential of $h$ at $\bar x\in\mathrm{dom}\,h$ is defined by
$$\partial h(\bar x):=\{v\in H:\ \langle v,x-\bar x\rangle\le h(x)-h(\bar x)\ \text{for all}\ x\in H\}. \tag{1}$$
We say $h$ satisfies the quadratic growth condition at $\bar x$ with modulus $\kappa>0$ if there exists $\varepsilon>0$ such that
$$h(x)\ge h(\bar x)+\frac{\kappa}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in B_\varepsilon(\bar x). \tag{2}$$
Moreover, if additionally $(\partial h)^{-1}(0)=\{\bar x\}$, then $h$ is said to satisfy the strong quadratic growth condition at $\bar x$ with modulus $\kappa$.
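As a quick illustration (ours, not from the paper): the function $h(x)=x^2$ on $\mathbb R$ has $(\partial h)^{-1}(0)=\{0\}$ and satisfies the strong quadratic growth condition (2) at $\bar x=0$ with modulus $\kappa=2$, since $h(x)-h(0)=x^2=\frac{2}{2}\,d^2(x;\{0\})$. A minimal numerical sanity check:

```python
import numpy as np

# Sanity check (illustration only): h(x) = x^2 has solution set {0} and
# satisfies the strong quadratic growth condition (2) with kappa = 2.
def h(x):
    return x ** 2

kappa, x_bar = 2.0, 0.0
for x in np.linspace(-1.0, 1.0, 201):
    dist = abs(x - x_bar)  # d(x; (dh)^{-1}(0)), since the solution set is {0}
    assert h(x) >= h(x_bar) + 0.5 * kappa * dist ** 2 - 1e-12
print("quadratic growth with kappa = 2 holds on [-1, 1]")
```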
Some relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [18,29–31,34], even for the case of nonconvex functions. The quadratic growth condition (2) is also called the quadratic functional growth property in [26] when $h$ is continuously differentiable over a closed convex set. In [19,20], $h$ is said to be 2-conditioned on $B_\varepsilon(\bar x)$ if it satisfies the quadratic growth condition (2).
The quadratic growth condition is slightly different in [25] as follows:
$$h(x)\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in[h<h^*+\nu] \tag{3}$$
for some constants $c,\nu>0$, where $[h<h^*+\nu]:=\{x\in H:\ h(x)<h^*+\nu\}$ and $h^*:=\inf h(x)=h(\bar x)$. It is easy to check that this growth condition implies (2). Indeed, suppose that (3) is satisfied for some $c,\nu>0$. Define $\eta:=\sqrt{\frac{2\nu}{c}}$ and note that $\bar x\in[h<h^*+\nu]$. Take any $x\in B_\eta(\bar x)$; if $x\in[h<h^*+\nu]$, inequality (2) is trivial. If $x\notin[h<h^*+\nu]$, it follows that
$$h(x)\ge h(\bar x)+\nu=h(\bar x)+\frac{c}{2}\eta^2\ge h(\bar x)+\frac{c}{2}\|x-\bar x\|^2\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big),$$
which clearly verifies (2). Thus, (2) is weaker than (3), but it is equivalent to the local version of (3):
$$h(x)\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in[h<h^*+\nu]\cap B_\varepsilon(\bar x)$$
for some constants $c,\nu,\varepsilon>0$. This property has been shown recently in [18, Theorem 5] to be equivalent to the fact that $h$ satisfies the Kurdyka–Łojasiewicz inequality with order $\frac12$.
To complete this section, we recall two important notions of linear convergence in our study. Let $\mu$ be a real number with $0<\mu<1$. A sequence $(x^k)_{k\in\mathbb N}\subset H$ is said to be R-linearly convergent to $x^*$ with rate $\mu$ if there exists $M>0$ such that
$$\|x^k-x^*\|\le M\mu^k=O(\mu^k)\quad\text{for all}\ k\in\mathbb N.$$
We say $(x^k)_{k\in\mathbb N}$ is Q-linearly convergent to $x^*$ with rate $\mu$ if there exists $K\in\mathbb N$ with
$$\|x^{k+1}-x^*\|\le\mu\|x^k-x^*\|\quad\text{for all}\ k\ge K.$$
From the definitions, it can be directly verified that Q-linear convergence implies R-linear convergence with the same rate. On the other hand, in general, R-linear convergence does not imply Q-linear convergence.
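A standard example (ours, for illustration) separating the two notions: for $0<\mu<1$, consider

```latex
x^k := \begin{cases} \mu^k, & k \text{ even},\\ 0, & k \text{ odd}.\end{cases}
```

Then $\|x^k-0\|\le\mu^k$, so $(x^k)_{k\in\mathbb N}$ converges R-linearly to $x^*=0$ with rate $\mu$; but it is not Q-linearly convergent, since $\|x^{k+1}-0\|=\mu^{k+1}>0=\mu\|x^k-0\|$ whenever $k$ is odd.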
3 Global Convergence of Forward–Backward Splitting Methods in Hilbert Spaces
Throughout the paper, we consider the following optimization problem
$$\min_{x\in H}\ F(x):=f(x)+g(x), \tag{4}$$
where $f,g\in\Gamma_0(H)$ and $f$ is differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$; see our standing assumptions below. This section reanalyzes the theory for forward–backward splitting (FBS) methods:
$$x^{k+1}=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big) \tag{5}$$
with the proximal operator defined later in (8) and the step size $\alpha_k>0$, when the gradient $\nabla f$ is not globally Lipschitz continuous [12,13]. The results here will be employed in our main topic, the linear convergence of the FBS method, in Sect. 4.
The convergence of the FBS method without Lipschitz continuity of $\nabla f$ was studied in [12] via some line searches under the standard assumption $\mathrm{dom}\,g\subset\mathrm{dom}\,f$ together with some assumptions on the continuity of $\nabla f$. In [13], Salzo simplified these assumptions and significantly extended the results to more general frameworks via many line searches. It is worth noting that the standard assumption $\mathrm{dom}\,g\subset\mathrm{dom}\,f$ used in [12] is not satisfied by several important optimization problems, including the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16]. This situation was overcome in [13, Sections 4 and 5] with the following assumptions on $f$ and $g$:
H1. $f,g\in\Gamma_0(H)$ are bounded from below with $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.
H2. $f$ is differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\nabla f$ is uniformly continuous on any weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $\nabla f$ is bounded on any sublevel set of $F$.
H3. For $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\{F\le F(x)\}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and
$$d\big(\{F\le F(x)\};\,H\setminus\mathrm{int}(\mathrm{dom}\,f)\big)>0. \tag{6}$$
These assumptions hold under some mild conditions; see [13, Proposition 4.2] for details. However, they are not trivial even in finite dimensions. Throughout the paper, we focus on the FBS method with the popular Beck–Teboulle line search [6], which was also studied in [13]. Next, following the techniques used in [12,13], we show that the global convergence of the FBS method holds true under the simplified standing assumptions below, which relax some conditions from H1–H3:
A1. $f,g\in\Gamma_0(H)$ and $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.
A2. $f$ is differentiable at any point in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $\nabla f$ is uniformly continuous on any weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
A3. For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\{F\le F(x)\}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and for any weakly compact subset $X$ of $\{F\le F(x)\}$ we have
$$d\big(X;\,H\setminus\mathrm{int}(\mathrm{dom}\,f)\big)>0. \tag{7}$$
Remark 3.1 (The standing assumptions in finite dimensions) When $\dim H<\infty$, A2 means that $f$ is continuously differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Moreover, in A3 the distance gap requirement (7) is automatically true, as $X\cap(H\setminus\mathrm{int}(\mathrm{dom}\,f))=\emptyset$ and $X$ is compact. On the other hand, the condition (6) is still nontrivial; see the example below.
Clearly, some boundedness assumptions in H1–H2 are relaxed in our A1–A2. Moreover, the positive distance gap requirement (6) in H3 is slightly stronger than (7) in A3. It should be noted, however, that condition (6) can in general be easier to verify in infinite-dimensional spaces. Even in finite dimensions, A1–A3 are strictly weaker than H1–H3. To see this, consider $p\in\{1,2\}$,
$$C_p:=\{x=(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge p,\ x_1>0,\ x_2>0\},$$
and let $f,g:\mathbb R^2\to\mathbb R\cup\{+\infty\}$ be defined by $f(x)=-\log x_1-\log x_2$ if $x\in\mathbb R^2_{++}$ and $+\infty$ otherwise, and $g(x)=\delta_{C_1}(x)$, where $\delta_{C_1}$ is the indicator function of the set $C_1$ defined above. It can be directly verified that the assumptions A1–A3 are satisfied. On the other hand, H1 fails because $f$ is unbounded below on
$$\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g=\mathbb R^2_{++}\cap C_1=C_1=\{(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge 1,\ x_1>0,\ x_2>0\}.$$
Moreover, $(2,1)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and since $F(2,1)=(f+g)(2,1)=-\log 2$, we get
$$\{x:\ F(x)\le-\log 2\}=\{(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge 2,\ x_1>0,\ x_2>0\}=C_2.$$
So H2 also fails, because $\nabla f(x_1,x_2)=\big(-\frac{1}{x_1},-\frac{1}{x_2}\big)$ is unbounded on the sublevel set of $F$ above. Finally, observing that $\big(\frac{2}{k},k\big)\in C_2$ and $(0,k)\in\mathbb R^2\setminus\mathrm{int}(\mathrm{dom}\,f)=\mathbb R^2\setminus\mathbb R^2_{++}$, we see that
$$d\big(\{F\le-\log 2\};\,\mathbb R^2\setminus\mathrm{int}(\mathrm{dom}\,f)\big)=d\big(C_2;\,\mathbb R^2\setminus\mathbb R^2_{++}\big)\le\lim_{k\to\infty}\Big\|\Big(\tfrac{2}{k},k\Big)-(0,k)\Big\|=\lim_{k\to\infty}\frac{2}{k}=0.$$
This shows that H3 fails in this case.
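The failure of the gap condition (6) above can also be checked numerically; a small script (ours, illustration only):

```python
import math

# Check: (2/k, k) lies in C_2 = {x1*x2 >= 2, x1 > 0, x2 > 0} while (0, k)
# lies outside int(dom f) = R^2_{++}, yet their distance 2/k shrinks to 0.
for k in [1, 10, 100, 1000]:
    p = (2.0 / k, float(k))   # in C_2, since (2/k) * k = 2
    q = (0.0, float(k))       # on the boundary of R^2_{++}
    assert p[0] * p[1] >= 2.0 - 1e-12 and p[0] > 0 and p[1] > 0
    assert math.hypot(p[0] - q[0], p[1] - q[1]) == 2.0 / k
print("d(C_2; R^2 \\ R^2_{++}) = 0, so the gap condition (6) in H3 fails")
```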
Next, let us recall the proximal operator $\mathrm{prox}_g:H\to\mathrm{dom}\,g$ given by
$$\mathrm{prox}_g(z):=(\mathrm{Id}+\partial g)^{-1}(z)\quad\text{for all}\ z\in H, \tag{8}$$
which is well known to be a single-valued mapping with full domain. With $\alpha>0$, it is easy to check that
$$\frac{z-\mathrm{prox}_{\alpha g}(z)}{\alpha}\in\partial g\big(\mathrm{prox}_{\alpha g}(z)\big)\quad\text{for all}\ z\in H. \tag{9}$$
Let $S^*$ be the optimal solution set to problem (4) and let $x^*$ be in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ due to our assumption A3. Then $x^*\in S^*$ iff
$$0\in\partial(f+g)(x^*)=\nabla f(x^*)+\partial g(x^*). \tag{10}$$
The following lemma, a consequence of [4, Theorem 23.47], is helpful in our proof of the finite termination of Beck–Teboulle's line search under our standing assumptions.
Lemma 3.1 Let $g\in\Gamma_0(H)$ and let $\alpha>0$. Then, for every $x\in\mathrm{dom}\,g$, we have
$$\mathrm{prox}_{\alpha g}(x)\to x\quad\text{as}\ \alpha\downarrow 0. \tag{11}$$
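For a concrete instance of (8): when $g=\lambda\|\cdot\|_1$ on $\mathbb R^n$ (the $\ell_1$-regularizer from the introduction), $\mathrm{prox}_{\alpha g}$ is the componentwise soft-thresholding operator. A minimal sketch (ours), which also illustrates the limit (11):

```python
import numpy as np

def prox_l1(z, t):
    """prox of t*||.||_1: argmin_x t*||x||_1 + 0.5*||x - z||^2,
    computed componentwise by soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([3.0, -0.5, 1.0])
# entries with |z_i| <= t are set to 0, the rest are shrunk toward 0
assert np.allclose(prox_l1(z, 1.0), [2.0, 0.0, 0.0])
# Lemma 3.1: prox_{alpha*g}(z) -> z as alpha goes down to 0
assert np.allclose(prox_l1(z, 1e-9), z)
```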
Under our standing assumptions, we define the proximal forward–backward operator $J:\big(\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\big)\times\mathbb R_{++}\to\mathrm{dom}\,g$ by
$$J(x,\alpha):=\mathrm{prox}_{\alpha g}\big(x-\alpha\nabla f(x)\big)\quad\text{for all}\ x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g,\ \alpha>0. \tag{12}$$
The following result is essentially from [12, Lemma 2.4]. Even though the standing assumptions are different, the proof is quite similar. A more general variant of this result can be found in [35, Lemma 3].
Lemma 3.2 For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, we have
$$\frac{\alpha_2}{\alpha_1}\|x-J(x,\alpha_1)\|\ \ge\ \|x-J(x,\alpha_2)\|\ \ge\ \|x-J(x,\alpha_1)\|\quad\text{for all}\ \alpha_2\ge\alpha_1>0. \tag{13}$$
Proof By using (9) and (12) with $z=x-\alpha\nabla f(x)$, we have
$$\frac{x-\alpha\nabla f(x)-J(x,\alpha)}{\alpha}\in\partial g\big(J(x,\alpha)\big) \tag{14}$$
for all $\alpha>0$. For any $\alpha_2\ge\alpha_1>0$, it follows from the monotonicity of $\partial g$ and (14) that
$$\begin{aligned}
0&\le\Big\langle \frac{x-\alpha_2\nabla f(x)-J(x,\alpha_2)}{\alpha_2}-\frac{x-\alpha_1\nabla f(x)-J(x,\alpha_1)}{\alpha_1},\,J(x,\alpha_2)-J(x,\alpha_1)\Big\rangle\\
&=\Big\langle \frac{x-J(x,\alpha_2)}{\alpha_2}-\frac{x-J(x,\alpha_1)}{\alpha_1},\,\big(x-J(x,\alpha_1)\big)-\big(x-J(x,\alpha_2)\big)\Big\rangle\\
&=-\frac{\|x-J(x,\alpha_2)\|^2}{\alpha_2}-\frac{\|x-J(x,\alpha_1)\|^2}{\alpha_1}+\Big(\frac{1}{\alpha_2}+\frac{1}{\alpha_1}\Big)\big\langle x-J(x,\alpha_2),\,x-J(x,\alpha_1)\big\rangle\\
&\le-\frac{\|x-J(x,\alpha_2)\|^2}{\alpha_2}-\frac{\|x-J(x,\alpha_1)\|^2}{\alpha_1}+\Big(\frac{1}{\alpha_2}+\frac{1}{\alpha_1}\Big)\|x-J(x,\alpha_2)\|\cdot\|x-J(x,\alpha_1)\|.
\end{aligned}$$
This gives us that
$$\big(\|x-J(x,\alpha_2)\|-\|x-J(x,\alpha_1)\|\big)\cdot\Big(\|x-J(x,\alpha_2)\|-\frac{\alpha_2}{\alpha_1}\|x-J(x,\alpha_1)\|\Big)\le 0.$$
Since $\frac{\alpha_2}{\alpha_1}\ge 1$, we derive (13) and thus complete the proof of the lemma.
Next, let us present Beck–Teboulle's backtracking line search [6], which is specifically useful for forward–backward methods when the Lipschitz constant of $\nabla f$ is unknown or hard to estimate.

Linesearch BT (Beck–Teboulle's Line Search). Given $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\sigma>0$, and $0<\theta<1$.
Input. Set $\alpha=\sigma$ and $J(x,\alpha)=\mathrm{prox}_{\alpha g}(x-\alpha\nabla f(x))$ with $x\in\mathrm{dom}\,g$.
While $f(J(x,\alpha))>f(x)+\langle\nabla f(x),J(x,\alpha)-x\rangle+\frac{1}{2\alpha}\|x-J(x,\alpha)\|^2$, do $\alpha=\theta\alpha$.
End While
Output. $\alpha$.
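In code, the line search above can be sketched as follows (our illustration; `f`, `grad_f`, and `prox_g` are assumed to be supplied by the caller, with `prox_g(z, alpha)` computing $\mathrm{prox}_{\alpha g}(z)$):

```python
import numpy as np

def linesearch_bt(x, f, grad_f, prox_g, sigma, theta):
    """Beck-Teboulle line search: shrink alpha by theta until
    f(J) <= f(x) + <grad f(x), J - x> + ||x - J||^2 / (2*alpha),
    where J = prox_{alpha*g}(x - alpha*grad f(x)). Returns (alpha, J)."""
    alpha = sigma
    fx, gx = f(x), grad_f(x)
    while True:
        J = prox_g(x - alpha * gx, alpha)
        if f(J) <= fx + gx @ (J - x) + np.dot(x - J, x - J) / (2 * alpha):
            return alpha, J
        alpha *= theta

# e.g. f(x) = 0.5*||x||^2 (so grad f = Id, L = 1) and g = 0:
alpha, J = linesearch_bt(
    np.array([4.0, 0.0]),
    f=lambda x: 0.5 * (x @ x),
    grad_f=lambda x: x,
    prox_g=lambda z, a: z,       # prox of g = 0 is the identity
    sigma=2.0, theta=0.5,
)
# one backtrack stops at alpha = 1 = 1/L, giving J = x - alpha*grad_f(x) = 0
assert alpha == 1.0 and np.allclose(J, 0.0)
```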
The output $\alpha$ of this line search will be denoted by $LS(x,\sigma,\theta)$. Let us show the well-definedness and finite termination of this line search under the standing assumptions A1–A3 below.
Proposition 3.1 (Finite Termination of Beck–Teboulle's Line Search) Suppose that assumptions A1–A2 hold. Then, for any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, we have:
(i) The above line search terminates after finitely many iterations with the positive output $\bar\alpha=LS(x,\sigma,\theta)$.
(ii) $\|x-u\|^2-\|J(x,\bar\alpha)-u\|^2\ge 2\bar\alpha\,[F(J(x,\bar\alpha))-F(u)]$ for any $u\in H$.
(iii) $F(J(x,\bar\alpha))-F(x)\le-\frac{1}{2\bar\alpha}\|J(x,\bar\alpha)-x\|^2\le 0$. Consequently, by A3, $J(x,\bar\alpha)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
Proof Take any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Let us justify (i) first. Note that $J(x,\alpha)$ is well-defined for any $\alpha>0$ because $\nabla f(x)$ exists due to assumption A2. If $x\in S^*$, where $S^*$ is the optimal solution set to problem (4), then $x=J(x,\sigma)$ due to (10) and (9). Thus, the line search stops with zero backtracking steps and gives us the output $\sigma$ and $x=J(x,\sigma)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. If $x\notin S^*$, suppose by contradiction that the line search does not terminate after finitely many steps. Hence, for all $\alpha\in P:=\{\sigma,\sigma\theta,\sigma\theta^2,\ldots\}$ it follows that
$$\langle\nabla f(x),J(x,\alpha)-x\rangle+\frac{1}{2\alpha}\|x-J(x,\alpha)\|^2<f(J(x,\alpha))-f(x). \tag{15}$$
Since $\mathrm{prox}_{\alpha g}$ is non-expansive, we have
$$\|J(x,\alpha)-x\|\le\|\mathrm{prox}_{\alpha g}(x-\alpha\nabla f(x))-\mathrm{prox}_{\alpha g}(x)\|+\|\mathrm{prox}_{\alpha g}(x)-x\|\le\alpha\|\nabla f(x)\|+\|\mathrm{prox}_{\alpha g}(x)-x\|. \tag{16}$$
Lemma 3.1 tells us that
$$\|J(x,\alpha)-x\|\to 0\quad\text{as}\ \alpha\downarrow 0. \tag{17}$$
Since $x\in\mathrm{int}(\mathrm{dom}\,f)$, there exists $\ell\in\mathbb N$ such that $J(x,\alpha)\in\mathrm{int}(\mathrm{dom}\,f)$ for all $\alpha\in P_\ell:=\{\sigma\theta^\ell,\sigma\theta^{\ell+1},\ldots\}\subseteq P$. Thanks to the convexity of $f$, we have
$$f(x)-f(J(x,\alpha))\ge\langle\nabla f(J(x,\alpha)),\,x-J(x,\alpha)\rangle\quad\text{for}\ \alpha\in P_\ell. \tag{18}$$
This inequality together with (15) implies
$$\frac{1}{2\alpha}\|J(x,\alpha)-x\|^2<\big\langle\nabla f(J(x,\alpha))-\nabla f(x),\,J(x,\alpha)-x\big\rangle\le\|\nabla f(J(x,\alpha))-\nabla f(x)\|\cdot\|J(x,\alpha)-x\|,$$
which yields $J(x,\alpha)\ne x$ and
$$0<\frac{\|J(x,\alpha)-x\|}{\alpha}<2\,\|\nabla f(J(x,\alpha))-\nabla f(x)\|\quad\text{for all}\ \alpha\in P_\ell. \tag{19}$$
Since $\|x-J(x,\alpha)\|\to 0$ as $\alpha\to 0$ by (17), $J(x,\alpha)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ for sufficiently small $\alpha>0$. Due to the continuity of $\nabla f$ at $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ by assumption A2, we obtain from (19) that
$$\lim_{\alpha\to 0,\ \alpha\in P_\ell}\frac{\|x-J(x,\alpha)\|}{\alpha}=0. \tag{20}$$
Applying (9) with $z=x-\alpha\nabla f(x)$ gives us that $\frac{x-J(x,\alpha)}{\alpha}-\nabla f(x)\in\partial g(J(x,\alpha))$. Taking $\alpha\to 0$, we have $-\nabla f(x)\in\partial g(x)$ due to the demiclosedness of the subdifferentials; see, e.g., [38, Theorem 4.7.1 and Proposition 4.2.1(i)]. It follows that $0\in\nabla f(x)+\partial g(x)$, which contradicts the hypothesis that $x\notin S^*$ by (10). Hence, the line search terminates after finitely many steps with the output $\bar\alpha$.
To proceed to the proof of (ii), note that
$$f(J(x,\bar\alpha))\le f(x)+\langle\nabla f(x),J(x,\bar\alpha)-x\rangle+\frac{1}{2\bar\alpha}\|x-J(x,\bar\alpha)\|^2. \tag{21}$$
Moreover, by (9), we have
$$\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x)\in\partial g\big(J(x,\bar\alpha)\big).$$
Pick any $u\in H$; we get from the latter that
$$g(u)-g(J(x,\bar\alpha))\ge\Big\langle\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x),\,u-J(x,\bar\alpha)\Big\rangle. \tag{22}$$
Note also that $f(u)-f(x)\ge\langle\nabla f(x),u-x\rangle$. This together with (22) and (21) gives us that
$$\begin{aligned}
F(u)=(f+g)(u)&\ge f(x)+g(J(x,\bar\alpha))+\Big\langle\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x),\,u-J(x,\bar\alpha)\Big\rangle+\langle\nabla f(x),u-x\rangle\\
&=f(x)+g(J(x,\bar\alpha))+\frac{1}{\bar\alpha}\big\langle x-J(x,\bar\alpha),\,u-J(x,\bar\alpha)\big\rangle+\langle\nabla f(x),J(x,\bar\alpha)-x\rangle\\
&\ge f(J(x,\bar\alpha))+g(J(x,\bar\alpha))+\frac{1}{\bar\alpha}\big\langle x-J(x,\bar\alpha),\,u-J(x,\bar\alpha)\big\rangle-\frac{1}{2\bar\alpha}\|J(x,\bar\alpha)-x\|^2.
\end{aligned}$$
It follows that
$$\big\langle x-J(x,\bar\alpha),\,J(x,\bar\alpha)-u\big\rangle\ge\bar\alpha\,[F(J(x,\bar\alpha))-F(u)]-\frac{1}{2}\|J(x,\bar\alpha)-x\|^2.$$
Since $2\langle x-J(x,\bar\alpha),\,J(x,\bar\alpha)-u\rangle=\|x-u\|^2-\|J(x,\bar\alpha)-x\|^2-\|J(x,\bar\alpha)-u\|^2$, the latter implies that
$$\|x-u\|^2-\|J(x,\bar\alpha)-u\|^2\ge 2\bar\alpha\,[F(J(x,\bar\alpha))-F(u)],$$
which clearly ensures (ii).
Finally, (iii) is a direct consequence of (ii) with $u=x$. It follows that $J(x,\bar\alpha)$ belongs to the sublevel set $\{F\le F(x)\}$. By assumption A3, $J(x,\bar\alpha)\in\mathrm{int}(\mathrm{dom}\,f)$. The proof is complete.
Now, we recall the forward–backward splitting method with the line search proposed in [6] as follows.

Forward–Backward Splitting Method with Backtracking Line Search (FBS method)
Step 0. Take $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\sigma>0$, and $0<\theta<1$.
Step k. Set
$$x^{k+1}:=J(x^k,\alpha_k)=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big) \tag{23}$$
with $\alpha_{-1}:=\sigma$ and
$$\alpha_k:=LS(x^k,\alpha_{k-1},\theta). \tag{24}$$
The following result, a direct consequence of Proposition 3.1, plays a central role in our study.
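As an aside, the FBS method (23)–(24) with Linesearch BT can be sketched end-to-end on a small least-squares $\ell_1$-regularized (Lasso-type) instance; all data and parameter values below are made up for illustration:

```python
import numpy as np

# f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1 (hypothetical data)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda z, a: np.sign(z) * np.maximum(np.abs(z) - a * lam, 0.0)
F = lambda x: f(x) + lam * np.sum(np.abs(x))

x = np.zeros(5)
alpha = 1.0                               # alpha_{-1} = sigma
theta = 0.5
for k in range(300):
    fx, gx = f(x), grad_f(x)
    while True:                           # Linesearch BT, started at alpha_{k-1}
        J = prox_g(x - alpha * gx, alpha)
        if f(J) <= fx + gx @ (J - x) + np.dot(x - J, x - J) / (2 * alpha):
            break
        alpha *= theta
    x = J                                 # x^{k+1} = J(x^k, alpha_k), cf. (23)

# x is (approximately) a fixed point of J(., alpha), i.e., x is near S*
assert np.linalg.norm(x - prox_g(x - alpha * grad_f(x), alpha)) < 1e-5
```

Note that each line search is warm-started at $\alpha_{k-1}$, matching (24), so the step sizes are nonincreasing; Proposition 3.2 below shows they stay bounded away from zero when $\nabla f$ is locally Lipschitz continuous.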
Corollary 3.1 (Well-definedness of the FBS method) Let $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. The sequence $(x^k)_{k\in\mathbb N}$ from the FBS method is well-defined, $x^k\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $f$ is differentiable at any $x^k$ for all $k\in\mathbb N$. Moreover, for any $x\in H$, we have:
(i) $\|x^k-x\|^2-\|x^{k+1}-x\|^2\ge 2\alpha_k\,[F(x^{k+1})-F(x)]$.
(ii) $F(x^{k+1})-F(x^k)\le-\frac{1}{2\alpha_k}\|x^{k+1}-x^k\|^2$.
Proof Thanks to Proposition 3.1(iii), $x^k\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ and $f$ is differentiable at any $x^k$ by assumption A2. This verifies the well-definedness of $(x^k)_{k\in\mathbb N}$. Moreover, both (i) and (ii) are consequences of (ii) and (iii) from Proposition 3.1 by taking $u=x$, $x=x^k$, $\bar\alpha=\alpha_k$, and $J(x,\bar\alpha)=J(x^k,\alpha_k)=x^{k+1}$.

The following result recovers the global convergence of the FBS method, without supposing the Lipschitz continuity of the gradient $\nabla f$, from [13, Theorem 3.18] and [12, Theorem 4.2] under our simplified assumptions A1–A3. The proof is similar to that of [12, Theorem 4.2] with some modifications and ideas from [13].
Theorem 3.1 (Global Convergence of the FBS Method) Let $(x^k)_{k\in\mathbb N}$ be the sequence generated by the FBS method. The following statements hold:
(i) If $S^*\ne\emptyset$, then $(x^k)_{k\in\mathbb N}$ weakly converges to a point in $S^*$. Moreover,
$$\lim_{k\to\infty}F(x^k)=\min_{x\in H}F(x). \tag{25}$$
(ii) If $S^*=\emptyset$, then we have
$$\lim_{k\to\infty}\|x^k\|=+\infty\quad\text{and}\quad\lim_{k\to\infty}F(x^k)=\inf_{x\in H}F(x).$$
Proof Let us justify (i) by supposing that $S^*\ne\emptyset$. By Corollary 3.1(i), for any $x^*\in S^*$ we have
$$\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2\ge 2\alpha_k\,[F(x^{k+1})-F(x^*)]\ge 0. \tag{26}$$
It follows that the sequence $(x^k)_{k\in\mathbb N}$ is Fejér monotone with respect to $S^*$. Thus, it is bounded according to [4, Proposition 5.4]. Defining $M:=\sup\{\|x^k-x^*\|:\ k\in\mathbb N\}<+\infty$ for some $x^*\in S^*$, we get from (26) that
$$0\le 2\alpha_k\,[F(x^{k+1})-F(x^*)]\le\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2=\big(\|x^k-x^*\|+\|x^{k+1}-x^*\|\big)\cdot\big(\|x^k-x^*\|-\|x^{k+1}-x^*\|\big)\le 2M\|x^k-x^{k+1}\|.$$
This yields
$$0\le F(x^{k+1})-F(x^*)\le M\,\frac{\|x^k-x^{k+1}\|}{\alpha_k}. \tag{27}$$
Note further from Corollary 3.1(ii) that
$$2\sigma\,[F(x^k)-F(x^{k+1})]\ge 2\alpha_k\,[F(x^k)-F(x^{k+1})]\ge\|x^k-x^{k+1}\|^2.$$
As $(F(x^k))_{k\in\mathbb N}$ is decreasing and bounded below by $F(x^*)$, the latter implies
$$\infty>2\sigma\,[F(x^0)-F(x^*)]\ge\sum_{k=0}^{\infty}2\sigma\,[F(x^k)-F(x^{k+1})]\ge\sum_{k=0}^{\infty}\|x^k-x^{k+1}\|^2. \tag{28}$$
Thus, $\|x^k-x^{k+1}\|\to 0$ as $k\to\infty$. Let $\bar x$ be a weak accumulation point of $(x^k)_{k\in\mathbb N}$ and find a subsequence $(x^{n_k})_{k\in\mathbb N}$ weakly converging to $\bar x$. As $F$ is l.s.c. and convex, it is also weakly l.s.c. Moreover, since $(F(x^k))_{k\in\mathbb N}$ is decreasing, we have $F(x^0)\ge F(\bar x)$. According to assumptions A2 and A3, $f$ is differentiable at $\bar x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. As $(\alpha_k)_{k\in\mathbb N}$ is decreasing by the construction of the line search in the FBS method, $\alpha:=\lim_{k\to\infty}\alpha_k=\lim_{k\to\infty}\alpha_{n_k}$ exists.

Case 1: $\alpha>0$. Note from (9) and (23) that
$$\frac{x^{n_k}-\alpha_{n_k}\nabla f(x^{n_k})-x^{n_k+1}}{\alpha_{n_k}}\in\partial g(x^{n_k+1}),$$
which implies
$$\frac{x^{n_k}-x^{n_k+1}}{\alpha_{n_k}}+\nabla f(x^{n_k+1})-\nabla f(x^{n_k})\in\nabla f(x^{n_k+1})+\partial g(x^{n_k+1}). \tag{29}$$
Applying Corollary 3.1(ii), we have $\frac{\|x^{n_k}-x^{n_k+1}\|^2}{\alpha_{n_k}}\to 0$, as the sequence $(F(x^k))_{k\in\mathbb N}$ is decreasing and thus converging. Since $\alpha_{n_k}\ge\alpha>0$, $\frac{x^{n_k}-x^{n_k+1}}{\alpha_{n_k}}$ also converges to $0$. Moreover, note from (28) that $\|x^{n_k}-x^{n_k+1}\|\to 0$, and $\|\nabla f(x^{n_k})-\nabla f(x^{n_k+1})\|\to 0$ due to the uniform continuity of $\nabla f$ on the weakly compact subset $\{\bar x,(x^{n_k})_{k\in\mathbb N},(x^{n_k+1})_{k\in\mathbb N}\}$ of $[F\le F(x^0)]$ as in assumption A2. Taking $k\to\infty$ in (29) gives us that $0\in\nabla f(\bar x)+\partial g(\bar x)$ due to the demiclosedness of the subdifferentials, which implies $\bar x\in S^*$. Furthermore, since $(F(x^k))_{k\in\mathbb N}$ is decreasing, (25) is a consequence of (27).
Case 2: $\alpha=0$. Define $\hat\alpha_{n_k}:=\frac{\alpha_{n_k}}{\theta}>\alpha_{n_k}>0$ and $\hat x^{n_k}:=J(x^{n_k},\hat\alpha_{n_k})\in\mathrm{dom}\,g$. It follows from the definition of Linesearch BT that
$$f(\hat x^{n_k})>f(x^{n_k})+\langle\nabla f(x^{n_k}),\hat x^{n_k}-x^{n_k}\rangle+\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|^2. \tag{30}$$
Due to Lemma 3.2, we have
$$\|x^{n_k}-\hat x^{n_k}\|=\|x^{n_k}-J(x^{n_k},\hat\alpha_{n_k})\|\le\frac{\hat\alpha_{n_k}}{\alpha_{n_k}}\|x^{n_k}-J(x^{n_k},\alpha_{n_k})\|=\frac{1}{\theta}\|x^{n_k}-x^{n_k+1}\|\to 0,$$
which tells us that $(\hat x^{n_k})_{k\in\mathbb N}$ weakly converges to $\bar x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Define $A:=\{\bar x,(x^{n_k})_{k\in\mathbb N}\}$. It follows that $A\subseteq[F\le F(x^0)]$, which is weakly compact. By assumption A3, $d(A;H\setminus\mathrm{int}(\mathrm{dom}\,f))>0$. As $\|x^{n_k}-\hat x^{n_k}\|\to 0$, we find some $K>0$ such that $\hat x^{n_k}\in\mathrm{int}(\mathrm{dom}\,f)$, and thus $\hat x^{n_k}\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, for any $k>K$. Hence, $\nabla f(\hat x^{n_k})$ is well-defined by assumption A2. It follows that
$$f(x^{n_k})\ge f(\hat x^{n_k})+\langle\nabla f(\hat x^{n_k}),\,x^{n_k}-\hat x^{n_k}\rangle.$$
This together with (30) gives us that
$$\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|^2<\big\langle\nabla f(x^{n_k})-\nabla f(\hat x^{n_k}),\,x^{n_k}-\hat x^{n_k}\big\rangle\le\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\cdot\|x^{n_k}-\hat x^{n_k}\|.$$
Hence, we have
$$\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\ge\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|. \tag{31}$$
As $\{\bar x,(x^{n_k})_{k\in\mathbb N},(\hat x^{n_k})_{k\in\mathbb N}\}$ is a weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ and $\|\hat x^{n_k}-x^{n_k}\|\to 0$, assumption A2 tells us that $\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\to 0$ as $k\to\infty$. We derive from (31) that
$$\lim_{k\to\infty}\frac{1}{\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|=0. \tag{32}$$
Applying (9) with $z=x^{n_k}-\hat\alpha_{n_k}\nabla f(x^{n_k})$ gives us that
$$\frac{x^{n_k}-\hat x^{n_k}}{\hat\alpha_{n_k}}+\nabla f(\hat x^{n_k})-\nabla f(x^{n_k})=\nabla f(\hat x^{n_k})+\frac{x^{n_k}-\hat\alpha_{n_k}\nabla f(x^{n_k})-\hat x^{n_k}}{\hat\alpha_{n_k}}\in\nabla f(\hat x^{n_k})+\partial g(\hat x^{n_k}).$$
By letting $k\to\infty$, similarly to the conclusion after (29), we get $0\in\nabla f(\bar x)+\partial g(\bar x)$ due to the demiclosedness of the subdifferentials, i.e., $\bar x\in S^*$. To proceed to the proof of (25) in this case, we observe from Lemma 3.2 that
$$\|x^{n_k}-\hat x^{n_k}\|=\Big\|x^{n_k}-J\Big(x^{n_k},\frac{\alpha_{n_k}}{\theta}\Big)\Big\|\ge\|x^{n_k}-J(x^{n_k},\alpha_{n_k})\|=\|x^{n_k}-x^{n_k+1}\|.$$
This together with (32) yields $\frac{\|x^{n_k}-x^{n_k+1}\|}{\alpha_{n_k}}\to 0$ as $k\to\infty$. Since $(F(x^k))_{k\in\mathbb N}$ is decreasing, we derive from the latter and (27) that
$$0=\lim_{k\to\infty}M\,\frac{\|x^{n_k}-x^{n_k+1}\|}{\alpha_{n_k}}\ge\lim_{k\to\infty}F(x^{n_k+1})-F(x^*)=\lim_{k\to\infty}F(x^k)-F(x^*)\ge 0,$$
which ensures (25).

From the above two cases, we have (25) and the fact that any weak accumulation point of the Fejér monotone sequence $(x^k)_{k\in\mathbb N}$ belongs to $S^*$. The classical Fejér theorem [4, Theorem 5.5] tells us that $(x^k)_{k\in\mathbb N}$ weakly converges to some point of $S^*$. This verifies (i) of the theorem.
The proof of the second part (ii) follows the same lines as the proof of [12, Theorem 4.2].
The following proposition shows that when $\nabla f$ is locally Lipschitz continuous, the step size $\alpha_k$ is bounded below by a positive number. The second part of this result coincides with [6, Remark 1.2].
Proposition 3.2 (Boundedness from Below for the Step Sizes) Let $(x^k)_{k\in\mathbb N}$ and $(\alpha_k)_{k\in\mathbb N}$ be the two sequences generated by the FBS method. Suppose that $S^*\ne\emptyset$ and that the sequence $(x^k)_{k\in\mathbb N}$ converges to some $x^*\in S^*$. If $\nabla f$ is locally Lipschitz continuous around $x^*$ with modulus $L$, then there exists some $K\in\mathbb N$ such that
$$\alpha_k\ge\min\Big\{\alpha_K,\frac{\theta}{L}\Big\}>0\quad\text{for all}\ k>K. \tag{33}$$
Consequently, for $k>K$ the line search $LS(x^k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\big\{1,\frac{\theta}{\alpha_K L}\big\}\big)$ steps.
Furthermore, if $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with uniform modulus $L$, then $\alpha_k\ge\min\{\sigma,\frac{\theta}{L}\}$ for all $k\in\mathbb N$. In this case, the line search $LS(x^k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\big\{1,\frac{\theta}{\sigma L}\big\}\big)$ steps for any $k$.
Proof Suppose that $S^*\ne\emptyset$, that the sequence $(x^k)_{k\in\mathbb N}$ converges to $x^*\in S^*$, and that $\nabla f$ is locally Lipschitz continuous around $x^*$ with constant $L>0$. We find some $\varepsilon>0$ such that
$$\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|\quad\text{for all}\ x,y\in B_\varepsilon(x^*), \tag{34}$$
where $B_\varepsilon(x^*)$ is the closed ball in $H$ with center $x^*$ and radius $\varepsilon$. Since $(x^k)_{k\in\mathbb N}$ converges to $x^*$, there exists some $K\in\mathbb N$ such that
$$\|x^k-x^*\|\le\frac{\theta\varepsilon}{2+\theta}<\varepsilon\quad\text{for all}\ k>K \tag{35}$$
with $0<\theta<1$ defined in Linesearch BT. We claim that
$$\alpha_k\ge\min\Big\{\alpha_{k-1},\frac{\theta}{L}\Big\}\quad\text{for all}\ k>K. \tag{36}$$
Suppose by contradiction that $\alpha_k<\min\{\alpha_{k-1},\frac{\theta}{L}\}$. Then $\alpha_k<\alpha_{k-1}$, and so the loop in Linesearch BT at $(x^k,\alpha_{k-1})$ needs more than one iteration. Defining $\hat\alpha_k:=\frac{\alpha_k}{\theta}>0$ and $\hat x^k:=J(x^k,\hat\alpha_k)$, we have
$$f(\hat x^k)>f(x^k)+\langle\nabla f(x^k),\hat x^k-x^k\rangle+\frac{1}{2\hat\alpha_k}\|x^k-\hat x^k\|^2. \tag{37}$$
Furthermore, it follows from Lemma 3.2 that
$$\|x^k-\hat x^k\|=\|x^k-J(x^k,\hat\alpha_k)\|\le\frac{\hat\alpha_k}{\alpha_k}\|x^k-J(x^k,\alpha_k)\|=\frac{1}{\theta}\|x^k-x^{k+1}\|.$$
This together with (35) yields
$$\|\hat x^k-x^*\|\le\|\hat x^k-x^k\|+\|x^k-x^*\|\le\frac{1}{\theta}\|x^k-x^{k+1}\|+\|x^k-x^*\|\le\frac{1}{\theta}\cdot\frac{2\theta\varepsilon}{2+\theta}+\frac{\theta\varepsilon}{2+\theta}=\varepsilon. \tag{38}$$
Since $x^k,\hat x^k\in B_\varepsilon(x^*)$ by (35) and (38), we get from (34) that
$$f(\hat x^k)-f(x^k)-\langle\nabla f(x^k),\hat x^k-x^k\rangle=\int_0^1\big\langle\nabla f(x^k+t(\hat x^k-x^k))-\nabla f(x^k),\,\hat x^k-x^k\big\rangle\,dt\le\int_0^1 tL\|\hat x^k-x^k\|^2\,dt=\frac{L}{2}\|\hat x^k-x^k\|^2.$$
Combining this with (37) yields $\hat\alpha_k\ge\frac{1}{L}$ and thus $\alpha_k\ge\frac{\theta}{L}$. This is a contradiction, which verifies (36).
If there is some $k_0>K$ with $k_0\in\mathbb N$ such that $\alpha_{k_0}>\frac{\theta}{L}$, we get from (36) that $\alpha_k\ge\frac{\theta}{L}$ for all $k\ge k_0$. Otherwise, $\alpha_k<\frac{\theta}{L}$ for any $k>K$, which implies that $\alpha_k=\alpha_{k-1}=\alpha_K$ for all $k>K$ due to (36) and the decreasing property of $(\alpha_k)_{k\in\mathbb N}$. In both cases, we have (33).
Now suppose that the line search $LS(x^k,\alpha_{k-1},\theta)$ needs $d$ repetitions. It follows from (33) that
$$\alpha_K\theta^d\ge\alpha_{k-1}\theta^d=\alpha_k\ge\min\Big\{\alpha_K,\frac{\theta}{L}\Big\},$$
which tells us that $d\le\log_\theta\big(\min\big\{1,\frac{\theta}{\alpha_K L}\big\}\big)$.
Finally, suppose that $\nabla f$ is globally Lipschitz continuous with modulus $L$ on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. By using Proposition 3.1(iii), we can repeat the above proof without involving $\varepsilon$ and $K$ and replace (36) by $\alpha_k\ge\min\{\sigma,\frac{\theta}{L}\}$.
Next, we present the conventional complexity $o(k^{-1})$ of the FBS method [10, Theorem 3], but under our standing assumptions and the condition that $\nabla f$ is locally Lipschitz continuous. The complexity remains valid when replacing the local Lipschitz condition by the weaker one that the step sizes $(\alpha_k)$ are bounded below by a positive number; see Proposition 3.2. This idea was initiated in [12, Theorem 4.3] in finite dimensions and extended to different kinds of line searches in Hilbert spaces in [13], e.g., [13, Corollary 3.20]. For completeness, we provide a short proof,² which could also be obtained by following the method of proof of [13, Theorem 3.18(iii)(d)].
Theorem 3.2 (Sublinear Convergence of the FBS Method) Let $(x^k)_{k\in\mathbb N}$ be the sequence generated by the FBS method. Suppose that $S^*\ne\emptyset$ and that $\nabla f$ is locally Lipschitz continuous around any point in $S^*$. Then,
$$\lim_{k\to\infty}k\Big(F(x^k)-\min_{x\in H}F(x)\Big)=0. \tag{39}$$
Proof By Proposition 3.2, $\alpha_k$ is bounded below by some $\alpha>0$. Take any $x^*\in S^*$; we obtain from Corollary 3.1(i) that
$$\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2\ge 2\alpha\,[F(x^{k+1})-F(x^*)]\ge 0. \tag{40}$$
As $(\|x^k-x^*\|^2)_{k\in\mathbb N}$ is decreasing, it is converging. Given $\varepsilon>0$, there exists $K\in\mathbb N$ (large enough) such that $\big|\|x^{K+k}-x^*\|^2-\|x^K-x^*\|^2\big|\le\varepsilon$ for any $k\in\mathbb N$. It follows from (40) that
$$\varepsilon\ge\|x^K-x^*\|^2-\|x^{k+K}-x^*\|^2=\sum_{\ell=K}^{k+K-1}\big(\|x^\ell-x^*\|^2-\|x^{\ell+1}-x^*\|^2\big)\ge 2\alpha\sum_{\ell=K}^{k+K-1}\big[F(x^{\ell+1})-F(x^*)\big].$$
Since $(F(x^k))$ is decreasing, we get that
$$\varepsilon\ge 2\alpha k\,\big(F(x^{K+k})-F(x^*)\big)=2\alpha\,\frac{k}{K+k}\,(K+k)\big(F(x^{K+k})-F(x^*)\big),$$
which implies that
$$0\le\liminf_{k\to\infty}k\big(F(x^k)-F(x^*)\big)\le\limsup_{k\to\infty}k\big(F(x^k)-F(x^*)\big)=\limsup_{k\to\infty}(K+k)\big(F(x^{K+k})-F(x^*)\big)\le\frac{\varepsilon}{2\alpha}.$$
Since $\varepsilon>0$ was arbitrarily chosen, this verifies (39) and completes the proof of the theorem.
4 Local Linear Convergence of Forward–Backward Splitting Methods
In this section, we obtain the local Q-linear convergence of the FBS method under the quadratic growth condition and the local Lipschitz continuity of $\nabla f$, which is automatic in many problems including the Lasso problem and the Poisson linear inverse regularized
2 We are grateful to one of the referees whose remarks lead us to this simple proof.