https://doi.org/10.1007/s10957-020-01787-7
On the Linear Convergence of Forward–Backward Splitting Method: Part I—Convergence Analysis
Yunier Bello-Cruz1 ·Guoyin Li2 ·Tran T. A. Nghia3
Received: 19 March 2019 / Accepted: 9 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
In this paper, we study the complexity of the forward–backward splitting method with Beck–Teboulle's line search for solving convex optimization problems, where the objective function can be split into the sum of a differentiable function and a nonsmooth function. We show that the method converges weakly to an optimal solution in Hilbert spaces, under mild standing assumptions without the global Lipschitz continuity of the gradient of the differentiable function involved. Our standing assumptions are weaker than the corresponding conditions in the paper of Salzo (SIAM J Optim 27:2153–2181, 2017). The conventional complexity of sublinear convergence for the functional value is also obtained under the local Lipschitz continuity of the gradient of the differentiable function. Our main results concern the linear convergence of this method (in the quotient sense), in terms of both the function value sequence and the iterative sequence, under only the quadratic growth condition. Our proof technique proceeds directly from the quadratic growth condition and some properties of the forward–backward splitting method, without using error bounds or the Kurdyka–Łojasiewicz inequality as in other publications in this direction.
Keywords Nonsmooth and convex optimization problems · Forward–backward splitting method · Linear convergence · Quadratic growth condition
Mathematics Subject Classification 65K05 · 90C25 · 90C30
Guoyin Li: g.li@unsw.edu.au
Yunier Bello-Cruz: yunierbello@niu.edu
Tran T. A. Nghia: nttran@oakland.edu
1 Department of Mathematical Sciences, Northern Illinois University, DeKalb, IL 60115, USA
2 Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia
3 Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA
1 Introduction
In this paper, we mainly consider an optimization problem of minimizing the sum of two convex functions, one of which is differentiable and the other nondifferentiable. Many problems in this format have appeared in different fields of science and engineering, including machine learning, compressed sensing, and image processing. A particular class of this problem is the so-called $\ell_1$-regularized problem, which usually provides sparse solutions and has many applications in signal processing and statistics [1–3].
Among the many methods for solving the aforementioned optimization problems, the forward–backward splitting method (FBS in brief, also known as the proximal gradient method) [2,4–9] is very popular due to its simplicity and efficiency. It is well known that this method is globally convergent to an optimal solution with the complexity $O(k^{-1})$ on the iterative cost values under the assumption that the gradient of the differentiable function is globally Lipschitz continuous [6]. An important advance was due to [10], where the sublinear convergence was improved to $o(k^{-1})$ in Hilbert spaces. Independently, by using some line searches motivated by the work of Tseng [11] and under a more restrictive step size rule, Bello-Cruz and Nghia [12] achieved the same complexity under the milder assumption that the aforementioned gradient is only locally Lipschitz continuous in finite-dimensional spaces. The results in [12] were further extended in [13] to different line searches for the FBS method and more general classes of optimization problems, even in infinite-dimensional spaces. A recent work of Bauschke, Bolte, and Teboulle [14] also tackles the absence of a Lipschitz continuous gradient by introducing the so-called NoLips algorithm, close to the FBS method, with the involvement of a Bregman distance. Their algorithm provides the sublinear complexity $O(1/k)$ and guarantees global convergence under an additional hypothesis about the closedness of the domain of an auxiliary Legendre function defined there.
Unfortunately, the latter assumption, as well as those in [12], is not satisfied for the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16], one of the main applications in [14]. This situation was overcome in [13] by using the FBS method and will be revisited in our sequel [17] with further advances on linear convergence in the quotient sense (often referred to as Q-linear convergence). Indeed, Salzo in [13, Section 4] proposed some new hypotheses to avoid the standard assumptions [6,10,12] on the domains of the cost functions. Although these hypotheses are valid in many natural situations, some nontrivial parts of them can be further relaxed. In Sect. 3, by reanalyzing the theory of the FBS method in [12,13], we relax and weaken several conditions assumed in [13] to acquire the same global convergence and sublinear complexity in Hilbert spaces.¹ The proofs of many results in this section modify our corresponding ones in [12] under the simplified standing assumptions. This section is not a major part of the paper, but its results will be used intensively in our main Sect. 4.
The central part of our paper, Sect. 4, is devoted to the linear convergence of the FBS method in finite dimensions, without the global Lipschitz continuity of the gradient mentioned above.
1 We are indebted to some remarks from one referee that allow us to extend the results from finite dimensions to Hilbert spaces in the current version.
Despite the popularity of the FBS method, the linear convergence of this method has been established only recently, by using the Kurdyka–Łojasiewicz inequality [18–22]
(even for nonconvex optimization problems) or some error bound conditions [23–27] building on [28]. Those conditions are somehow equivalent in convex settings; see, e.g., [18,19,25]. Linear convergence of the FBS method was also obtained in Hilbert spaces [5] for Lasso problems under some nontrivial conditions such as finite basis injectivity and strict sparsity pattern. These conditions were fully relaxed in the papers [19,20], which also work in infinite dimensions for more general optimization problems. Our results can also be extended to infinite dimensions by following some ideas of [19], but for simplicity we restrict ourselves to finite dimensions in this section. Our approach is close to the works of Drusvyatskiy and Lewis [25] and Garrigos, Rosasco, and Villa [19,20] in using the so-called quadratic growth condition, also known as the 2-conditioned property. However, our proof of linear convergence proceeds directly from the quadratic growth condition, without using the KŁ inequality as in [18–21] or the error bound [28] as in [23,25], and reveals the Q-linear convergence of the FBS sequence rather than the R-linear convergence obtained in all the aforementioned works. Some of our linear rates are sharper than those in [19,25]. The quadratic growth condition is indeed automatic in many classes of optimization problems, including the Poisson inverse regularized problem [14–16], least-squares $\ell_1$-regularized problems [3,5,6,19], group Lasso [23], or under mild second-order conditions on initial data [2,30–34]; see our second part [17] for further studies in this direction.
2 Preliminaries
Throughout the paper, $H$ is a Hilbert space, where $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the corresponding norm and inner product in $H$. We use $\Gamma_0(H)$ to denote the set of proper, lower semicontinuous, and convex functions on $H$. Let $h\in\Gamma_0(H)$; we write $\mathrm{dom}\,h:=\{x\in H:\ h(x)<+\infty\}$. The subdifferential of $h$ at $\bar x\in\mathrm{dom}\,h$ is defined by
$$\partial h(\bar x):=\{v\in H:\ \langle v,x-\bar x\rangle\le h(x)-h(\bar x)\ \text{for all}\ x\in H\}. \tag{1}$$
We say $h$ satisfies the quadratic growth condition at $\bar x$ with modulus $\kappa>0$ if there exists $\varepsilon>0$ such that
$$h(x)\ge h(\bar x)+\frac{\kappa}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in B_\varepsilon(\bar x). \tag{2}$$
Moreover, if additionally $(\partial h)^{-1}(0)=\{\bar x\}$, then $h$ is said to satisfy the strong quadratic growth condition at $\bar x$ with modulus $\kappa$.
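As a quick illustration (ours, not from the paper): the function $h(x)=x^2$ on $\mathbb R$ has $(\partial h)^{-1}(0)=\{0\}$ and satisfies the strong quadratic growth condition (2) at $\bar x=0$ with modulus $\kappa=2$, since $h(x)-h(0)=x^2=\frac{2}{2}\,d^2(x;\{0\})$. A minimal numerical sanity check:

```python
import numpy as np

# Sanity check (illustration only): h(x) = x^2 has solution set {0} and
# satisfies the strong quadratic growth condition (2) with kappa = 2.
def h(x):
    return x ** 2

kappa, x_bar = 2.0, 0.0
for x in np.linspace(-1.0, 1.0, 201):
    dist = abs(x - x_bar)  # d(x; (dh)^{-1}(0)), since the solution set is {0}
    assert h(x) >= h(x_bar) + 0.5 * kappa * dist ** 2 - 1e-12
print("quadratic growth with kappa = 2 holds on [-1, 1]")
```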
Some relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [18,29–31,34], even for the case of nonconvex functions. The quadratic growth condition (2) is also called the quadratic functional growth property in [26] when $h$ is continuously differentiable over a closed convex set. In [19,20], $h$ is said to be 2-conditioned on $B_\varepsilon(\bar x)$ if it satisfies the quadratic growth condition (2).
The quadratic growth condition is slightly different in [25] as follows:
$$h(x)\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in[h<h^*+\nu] \tag{3}$$
for some constants $c,\nu>0$, where $[h<h^*+\nu]:=\{x\in H:\ h(x)<h^*+\nu\}$ and $h^*:=\inf h(x)=h(\bar x)$. It is easy to check that this growth condition implies (2). Indeed, suppose that (3) is satisfied for some $c,\nu>0$. Define $\eta:=\sqrt{\frac{2\nu}{c}}$ and note that $\bar x\in[h<h^*+\nu]$. Take any $x\in B_\eta(\bar x)$; if $x\in[h<h^*+\nu]$, inequality (2) is trivial. If $x\notin[h<h^*+\nu]$, it follows that
$$h(x)\ge h(\bar x)+\nu=h(\bar x)+\frac{c}{2}\eta^2\ge h(\bar x)+\frac{c}{2}\|x-\bar x\|^2\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big),$$
which clearly verifies (2). Thus, (2) is weaker than (3), but it is equivalent to the local version of (3):
$$h(x)\ge h(\bar x)+\frac{c}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all}\ x\in[h<h^*+\nu]\cap B_\varepsilon(\bar x)$$
for some constants $c,\nu,\varepsilon>0$. This property has been shown recently in [18, Theorem 5] to be equivalent to the fact that $h$ satisfies the Kurdyka–Łojasiewicz inequality with order $\frac12$.
To complete this section, we recall two important notions of linear convergence in our study. Let $\mu$ be a real number with $0<\mu<1$. A sequence $(x^k)_{k\in\mathbb N}\subset H$ is said to be R-linearly convergent to $x^*$ with rate $\mu$ if there exists $M>0$ such that
$$\|x^k-x^*\|\le M\mu^k=O(\mu^k)\quad\text{for all}\ k\in\mathbb N.$$
We say $(x^k)_{k\in\mathbb N}$ is Q-linearly convergent to $x^*$ with rate $\mu$ if there exists $K\in\mathbb N$ with
$$\|x^{k+1}-x^*\|\le\mu\|x^k-x^*\|\quad\text{for all}\ k\ge K.$$
From the definitions, it can be directly verified that Q-linear convergence implies R-linear convergence with the same rate. On the other hand, in general, R-linear convergence does not imply Q-linear convergence.
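A standard example (ours, for illustration) separating the two notions: for $0<\mu<1$, consider

```latex
x^k := \begin{cases} \mu^k, & k \text{ even},\\ 0, & k \text{ odd}.\end{cases}
```

Then $\|x^k-0\|\le\mu^k$, so $(x^k)_{k\in\mathbb N}$ converges R-linearly to $x^*=0$ with rate $\mu$; but it is not Q-linearly convergent, since $\|x^{k+1}-0\|=\mu^{k+1}>0=\mu\|x^k-0\|$ whenever $k$ is odd.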
3 Global Convergence of Forward–Backward Splitting Methods in Hilbert Spaces
Throughout the paper, we consider the following optimization problem
$$\min_{x\in H}\ F(x):=f(x)+g(x), \tag{4}$$
where $f,g\in\Gamma_0(H)$ and $f$ is differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$; see our standing assumptions below. This section reanalyzes the theory for forward–backward splitting (FBS) methods:
$$x^{k+1}=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big) \tag{5}$$
with the proximal operator defined later in (8) and the step size $\alpha_k>0$, when the gradient $\nabla f$ is not globally Lipschitz continuous [12,13]. The results here will be employed in our main topic, the linear convergence of the FBS method, in Sect. 4.
The convergence of the FBS method without Lipschitz continuity of $\nabla f$ was studied in [12] via some line searches under the standard assumption $\mathrm{dom}\,g\subset\mathrm{dom}\,f$ together with some assumptions on the continuity of $\nabla f$. In [13], Salzo simplified these assumptions and significantly extended the results to more general frameworks via many line searches. It is worth noting that the standard assumption $\mathrm{dom}\,g\subset\mathrm{dom}\,f$ used in [12] is not satisfied by several important optimization problems, including the Poisson inverse regularized problems with Kullback–Leibler divergence [15,16]. This situation was overcome in [13, Sections 4 and 5] with the following assumptions on $f$ and $g$:
H1. $f,g\in\Gamma_0(H)$ are bounded from below with $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.
H2. $f$ is differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\nabla f$ is uniformly continuous on any weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $\nabla f$ is bounded on any sublevel set of $F$.
H3. For $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\{F\le F(x)\}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and
$$d\big(\{F\le F(x)\};\,H\setminus\mathrm{int}(\mathrm{dom}\,f)\big)>0. \tag{6}$$
These assumptions hold under some mild conditions; see [13, Proposition 4.2] for details. However, they are not trivial even in finite dimensions. Throughout the paper, we focus on the FBS method with the popular Beck–Teboulle line search [6], which was also studied in [13]. Next, following the techniques used in [12,13], we show that the global convergence of the FBS method holds true under the simplified standing assumptions below, which relax some conditions from H1–H3:
A1. $f,g\in\Gamma_0(H)$ and $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.
A2. $f$ is differentiable at any point in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $\nabla f$ is uniformly continuous on any weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
A3. For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\{F\le F(x)\}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and for any weakly compact subset $X$ of $\{F\le F(x)\}$ we have
$$d\big(X;\,H\setminus\mathrm{int}(\mathrm{dom}\,f)\big)>0. \tag{7}$$
Remark 3.1 (The standing assumptions in finite dimensions) When $\dim H<\infty$, A2 means that $f$ is continuously differentiable on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Moreover, in A3 the distance gap requirement (7) is automatically true, as $X\cap(H\setminus\mathrm{int}(\mathrm{dom}\,f))=\emptyset$ and $X$ is compact. On the other hand, the condition (6) is still nontrivial; see the example below.
Clearly, some boundedness assumptions in H1–H2 are relaxed in our A1–A2. Moreover, the positive distance gap requirement (6) in H3 is slightly stronger than (7) in A3. It should be noted, however, that condition (6) can in general be easier to verify in infinite-dimensional spaces. Even in finite dimensions, A1–A3 are strictly weaker than H1–H3. To see this, consider $p\in\{1,2\}$,
$$C_p:=\{x=(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge p,\ x_1>0,\ x_2>0\},$$
and let $f,g:\mathbb R^2\to\mathbb R\cup\{+\infty\}$ be defined by $f(x)=-\log x_1-\log x_2$ if $x\in\mathbb R^2_{++}$ and $+\infty$ otherwise, and $g(x)=\delta_{C_1}(x)$, where $\delta_{C_1}$ is the indicator function of the set $C_1$ defined above. It can be directly verified that the assumptions A1–A3 are satisfied. On the other hand, H1 fails because $f$ is unbounded below on
$$\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g=\mathbb R^2_{++}\cap C_1=C_1=\{(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge 1,\ x_1>0,\ x_2>0\}.$$
Moreover, $(2,1)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and since $F(2,1)=(f+g)(2,1)=-\log 2$, we get
$$\{x:\ F(x)\le-\log 2\}=\{(x_1,x_2)\in\mathbb R^2:\ x_1x_2\ge 2,\ x_1>0,\ x_2>0\}=C_2.$$
So H2 also fails, because $\nabla f(x_1,x_2)=\big(-\frac{1}{x_1},-\frac{1}{x_2}\big)$ is unbounded on the sublevel set of $F$ above. Finally, observing that $\big(\frac{2}{k},k\big)\in C_2$ and $(0,k)\in\mathbb R^2\setminus\mathrm{int}(\mathrm{dom}\,f)=\mathbb R^2\setminus\mathbb R^2_{++}$, we see that
$$d\big(\{F\le-\log 2\};\,\mathbb R^2\setminus\mathrm{int}(\mathrm{dom}\,f)\big)=d\big(C_2;\,\mathbb R^2\setminus\mathbb R^2_{++}\big)\le\lim_{k\to\infty}\Big\|\Big(\tfrac{2}{k},k\Big)-(0,k)\Big\|=\lim_{k\to\infty}\frac{2}{k}=0.$$
This shows that H3 fails in this case.
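The failure of the gap condition (6) above can also be checked numerically; a small script (ours, illustration only):

```python
import math

# Check: (2/k, k) lies in C_2 = {x1*x2 >= 2, x1 > 0, x2 > 0} while (0, k)
# lies outside int(dom f) = R^2_{++}, yet their distance 2/k shrinks to 0.
for k in [1, 10, 100, 1000]:
    p = (2.0 / k, float(k))   # in C_2, since (2/k) * k = 2
    q = (0.0, float(k))       # on the boundary of R^2_{++}
    assert p[0] * p[1] >= 2.0 - 1e-12 and p[0] > 0 and p[1] > 0
    assert math.hypot(p[0] - q[0], p[1] - q[1]) == 2.0 / k
print("d(C_2; R^2 \\ R^2_{++}) = 0, so the gap condition (6) in H3 fails")
```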
Next, let us recall the proximal operator $\mathrm{prox}_g:H\to\mathrm{dom}\,g$ given by
$$\mathrm{prox}_g(z):=(\mathrm{Id}+\partial g)^{-1}(z)\quad\text{for all}\ z\in H, \tag{8}$$
which is well known to be a single-valued mapping with full domain. With $\alpha>0$, it is easy to check that
$$\frac{z-\mathrm{prox}_{\alpha g}(z)}{\alpha}\in\partial g\big(\mathrm{prox}_{\alpha g}(z)\big)\quad\text{for all}\ z\in H. \tag{9}$$
Let $S^*$ be the optimal solution set to problem (4) and let $x^*$ be in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ due to our assumption A3. Then $x^*\in S^*$ iff
$$0\in\partial(f+g)(x^*)=\nabla f(x^*)+\partial g(x^*). \tag{10}$$
The following lemma, a consequence of [4, Theorem 23.47], is helpful in our proof of the finite termination of Beck–Teboulle's line search under our standing assumptions.
Lemma 3.1 Let $g\in\Gamma_0(H)$ and let $\alpha>0$. Then, for every $x\in\mathrm{dom}\,g$, we have
$$\mathrm{prox}_{\alpha g}(x)\to x\quad\text{as}\ \alpha\downarrow 0. \tag{11}$$
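For a concrete instance of (8): when $g=\lambda\|\cdot\|_1$ on $\mathbb R^n$ (the $\ell_1$-regularizer from the introduction), $\mathrm{prox}_{\alpha g}$ is the componentwise soft-thresholding operator. A minimal sketch (ours), which also illustrates the limit (11):

```python
import numpy as np

def prox_l1(z, t):
    """prox of t*||.||_1: argmin_x t*||x||_1 + 0.5*||x - z||^2,
    computed componentwise by soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([3.0, -0.5, 1.0])
# entries with |z_i| <= t are set to 0, the rest are shrunk toward 0
assert np.allclose(prox_l1(z, 1.0), [2.0, 0.0, 0.0])
# Lemma 3.1: prox_{alpha*g}(z) -> z as alpha goes down to 0
assert np.allclose(prox_l1(z, 1e-9), z)
```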
Under our standing assumptions, we define the proximal forward–backward operator $J:\big(\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\big)\times\mathbb R_{++}\to\mathrm{dom}\,g$ by
$$J(x,\alpha):=\mathrm{prox}_{\alpha g}\big(x-\alpha\nabla f(x)\big)\quad\text{for all}\ x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g,\ \alpha>0. \tag{12}$$
The following result is essentially from [12, Lemma 2.4]. Even though the standing assumptions are different, the proof is quite similar. A more general variant of this result can be found in [35, Lemma 3].
Lemma 3.2 For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, we have
$$\frac{\alpha_2}{\alpha_1}\|x-J(x,\alpha_1)\|\ \ge\ \|x-J(x,\alpha_2)\|\ \ge\ \|x-J(x,\alpha_1)\|\quad\text{for all}\ \alpha_2\ge\alpha_1>0. \tag{13}$$
Proof By using (9) and (12) with $z=x-\alpha\nabla f(x)$, we have
$$\frac{x-\alpha\nabla f(x)-J(x,\alpha)}{\alpha}\in\partial g\big(J(x,\alpha)\big) \tag{14}$$
for all $\alpha>0$. For any $\alpha_2\ge\alpha_1>0$, it follows from the monotonicity of $\partial g$ and (14) that
$$\begin{aligned}
0&\le\Big\langle \frac{x-\alpha_2\nabla f(x)-J(x,\alpha_2)}{\alpha_2}-\frac{x-\alpha_1\nabla f(x)-J(x,\alpha_1)}{\alpha_1},\,J(x,\alpha_2)-J(x,\alpha_1)\Big\rangle\\
&=\Big\langle \frac{x-J(x,\alpha_2)}{\alpha_2}-\frac{x-J(x,\alpha_1)}{\alpha_1},\,\big(x-J(x,\alpha_1)\big)-\big(x-J(x,\alpha_2)\big)\Big\rangle\\
&=-\frac{\|x-J(x,\alpha_2)\|^2}{\alpha_2}-\frac{\|x-J(x,\alpha_1)\|^2}{\alpha_1}+\Big(\frac{1}{\alpha_2}+\frac{1}{\alpha_1}\Big)\big\langle x-J(x,\alpha_2),\,x-J(x,\alpha_1)\big\rangle\\
&\le-\frac{\|x-J(x,\alpha_2)\|^2}{\alpha_2}-\frac{\|x-J(x,\alpha_1)\|^2}{\alpha_1}+\Big(\frac{1}{\alpha_2}+\frac{1}{\alpha_1}\Big)\|x-J(x,\alpha_2)\|\cdot\|x-J(x,\alpha_1)\|.
\end{aligned}$$
This gives us that
$$\big(\|x-J(x,\alpha_2)\|-\|x-J(x,\alpha_1)\|\big)\cdot\Big(\|x-J(x,\alpha_2)\|-\frac{\alpha_2}{\alpha_1}\|x-J(x,\alpha_1)\|\Big)\le 0.$$
Since $\frac{\alpha_2}{\alpha_1}\ge 1$, we derive (13) and thus complete the proof of the lemma.
Next, let us present Beck–Teboulle's backtracking line search [6], which is specifically useful for forward–backward methods when the Lipschitz constant of $\nabla f$ is unknown or hard to estimate.

Linesearch BT (Beck–Teboulle's Line Search). Given $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\sigma>0$, and $0<\theta<1$.
Input. Set $\alpha=\sigma$ and $J(x,\alpha)=\mathrm{prox}_{\alpha g}(x-\alpha\nabla f(x))$ with $x\in\mathrm{dom}\,g$.
While $f(J(x,\alpha))>f(x)+\langle\nabla f(x),J(x,\alpha)-x\rangle+\frac{1}{2\alpha}\|x-J(x,\alpha)\|^2$, do $\alpha=\theta\alpha$.
End While
Output. $\alpha$.
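In code, the line search above can be sketched as follows (our illustration; `f`, `grad_f`, and `prox_g` are assumed to be supplied by the caller, with `prox_g(z, alpha)` computing $\mathrm{prox}_{\alpha g}(z)$):

```python
import numpy as np

def linesearch_bt(x, f, grad_f, prox_g, sigma, theta):
    """Beck-Teboulle line search: shrink alpha by theta until
    f(J) <= f(x) + <grad f(x), J - x> + ||x - J||^2 / (2*alpha),
    where J = prox_{alpha*g}(x - alpha*grad f(x)). Returns (alpha, J)."""
    alpha = sigma
    fx, gx = f(x), grad_f(x)
    while True:
        J = prox_g(x - alpha * gx, alpha)
        if f(J) <= fx + gx @ (J - x) + np.dot(x - J, x - J) / (2 * alpha):
            return alpha, J
        alpha *= theta

# e.g. f(x) = 0.5*||x||^2 (so grad f = Id, L = 1) and g = 0:
alpha, J = linesearch_bt(
    np.array([4.0, 0.0]),
    f=lambda x: 0.5 * (x @ x),
    grad_f=lambda x: x,
    prox_g=lambda z, a: z,       # prox of g = 0 is the identity
    sigma=2.0, theta=0.5,
)
# one backtrack stops at alpha = 1 = 1/L, giving J = x - alpha*grad_f(x) = 0
assert alpha == 1.0 and np.allclose(J, 0.0)
```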
The output $\alpha$ of this line search will be denoted by $LS(x,\sigma,\theta)$. Let us show the well-definedness and finite termination of this line search under the standing assumptions A1–A3 below.
Proposition 3.1 (Finite Termination of Beck–Teboulle's Line Search) Suppose that assumptions A1–A2 hold. Then, for any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, we have:
(i) The above line search terminates after finitely many iterations with the positive output $\bar\alpha=LS(x,\sigma,\theta)$.
(ii) $\|x-u\|^2-\|J(x,\bar\alpha)-u\|^2\ge 2\bar\alpha\,[F(J(x,\bar\alpha))-F(u)]$ for any $u\in H$.
(iii) $F(J(x,\bar\alpha))-F(x)\le-\frac{1}{2\bar\alpha}\|J(x,\bar\alpha)-x\|^2\le 0$. Consequently, by A3, $J(x,\bar\alpha)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
Proof Take any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Let us justify (i) first. Note that $J(x,\alpha)$ is well-defined for any $\alpha>0$ because $\nabla f(x)$ exists due to assumption A2. If $x\in S^*$, where $S^*$ is the optimal solution set to problem (4), then $x=J(x,\sigma)$ due to (10) and (9). Thus, the line search stops with zero backtracking steps and gives us the output $\sigma$ and $x=J(x,\sigma)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. If $x\notin S^*$, suppose by contradiction that the line search does not terminate after finitely many steps. Hence, for all $\alpha\in P:=\{\sigma,\sigma\theta,\sigma\theta^2,\ldots\}$ it follows that
$$\langle\nabla f(x),J(x,\alpha)-x\rangle+\frac{1}{2\alpha}\|x-J(x,\alpha)\|^2<f(J(x,\alpha))-f(x). \tag{15}$$
Since $\mathrm{prox}_{\alpha g}$ is non-expansive, we have
$$\|J(x,\alpha)-x\|\le\|\mathrm{prox}_{\alpha g}(x-\alpha\nabla f(x))-\mathrm{prox}_{\alpha g}(x)\|+\|\mathrm{prox}_{\alpha g}(x)-x\|\le\alpha\|\nabla f(x)\|+\|\mathrm{prox}_{\alpha g}(x)-x\|. \tag{16}$$
Lemma 3.1 tells us that
$$\|J(x,\alpha)-x\|\to 0\quad\text{as}\ \alpha\downarrow 0. \tag{17}$$
Since $x\in\mathrm{int}(\mathrm{dom}\,f)$, there exists $\ell\in\mathbb N$ such that $J(x,\alpha)\in\mathrm{int}(\mathrm{dom}\,f)$ for all $\alpha\in P_\ell:=\{\sigma\theta^\ell,\sigma\theta^{\ell+1},\ldots\}\subseteq P$. Thanks to the convexity of $f$, we have
$$f(x)-f(J(x,\alpha))\ge\langle\nabla f(J(x,\alpha)),\,x-J(x,\alpha)\rangle\quad\text{for}\ \alpha\in P_\ell. \tag{18}$$
This inequality together with (15) implies
$$\frac{1}{2\alpha}\|J(x,\alpha)-x\|^2<\big\langle\nabla f(J(x,\alpha))-\nabla f(x),\,J(x,\alpha)-x\big\rangle\le\|\nabla f(J(x,\alpha))-\nabla f(x)\|\cdot\|J(x,\alpha)-x\|,$$
which yields $J(x,\alpha)\ne x$ and
$$0<\frac{\|J(x,\alpha)-x\|}{\alpha}<2\,\|\nabla f(J(x,\alpha))-\nabla f(x)\|\quad\text{for all}\ \alpha\in P_\ell. \tag{19}$$
Since $\|x-J(x,\alpha)\|\to 0$ as $\alpha\to 0$ by (17), $J(x,\alpha)\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ for sufficiently small $\alpha>0$. Due to the continuity of $\nabla f$ at $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ by assumption A2, we obtain from (19) that
$$\lim_{\alpha\to 0,\ \alpha\in P_\ell}\frac{\|x-J(x,\alpha)\|}{\alpha}=0. \tag{20}$$
Applying (9) with $z=x-\alpha\nabla f(x)$ gives us that $\frac{x-J(x,\alpha)}{\alpha}-\nabla f(x)\in\partial g(J(x,\alpha))$. Taking $\alpha\to 0$, we have $-\nabla f(x)\in\partial g(x)$ due to the demiclosedness of the subdifferentials; see, e.g., [38, Theorem 4.7.1 and Proposition 4.2.1(i)]. It follows that $0\in\nabla f(x)+\partial g(x)$, which contradicts the hypothesis that $x\notin S^*$ by (10). Hence, the line search terminates after finitely many steps with the output $\bar\alpha$.
To proceed to the proof of (ii), note that
$$f(J(x,\bar\alpha))\le f(x)+\langle\nabla f(x),J(x,\bar\alpha)-x\rangle+\frac{1}{2\bar\alpha}\|x-J(x,\bar\alpha)\|^2. \tag{21}$$
Moreover, by (9), we have
$$\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x)\in\partial g\big(J(x,\bar\alpha)\big).$$
Pick any $u\in H$; we get from the latter that
$$g(u)-g(J(x,\bar\alpha))\ge\Big\langle\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x),\,u-J(x,\bar\alpha)\Big\rangle. \tag{22}$$
Note also that $f(u)-f(x)\ge\langle\nabla f(x),u-x\rangle$. This together with (22) and (21) gives us that
$$\begin{aligned}
F(u)=(f+g)(u)&\ge f(x)+g(J(x,\bar\alpha))+\Big\langle\frac{x-J(x,\bar\alpha)}{\bar\alpha}-\nabla f(x),\,u-J(x,\bar\alpha)\Big\rangle+\langle\nabla f(x),u-x\rangle\\
&=f(x)+g(J(x,\bar\alpha))+\frac{1}{\bar\alpha}\big\langle x-J(x,\bar\alpha),\,u-J(x,\bar\alpha)\big\rangle+\langle\nabla f(x),J(x,\bar\alpha)-x\rangle\\
&\ge f(J(x,\bar\alpha))+g(J(x,\bar\alpha))+\frac{1}{\bar\alpha}\big\langle x-J(x,\bar\alpha),\,u-J(x,\bar\alpha)\big\rangle-\frac{1}{2\bar\alpha}\|J(x,\bar\alpha)-x\|^2.
\end{aligned}$$
It follows that
$$\big\langle x-J(x,\bar\alpha),\,J(x,\bar\alpha)-u\big\rangle\ge\bar\alpha\,[F(J(x,\bar\alpha))-F(u)]-\frac{1}{2}\|J(x,\bar\alpha)-x\|^2.$$
Since $2\langle x-J(x,\bar\alpha),\,J(x,\bar\alpha)-u\rangle=\|x-u\|^2-\|J(x,\bar\alpha)-x\|^2-\|J(x,\bar\alpha)-u\|^2$, the latter implies that
$$\|x-u\|^2-\|J(x,\bar\alpha)-u\|^2\ge 2\bar\alpha\,[F(J(x,\bar\alpha))-F(u)],$$
which clearly ensures (ii).
Finally, (iii) is a direct consequence of (ii) with $u=x$. It follows that $J(x,\bar\alpha)$ belongs to the sublevel set $\{F\le F(x)\}$. By assumption A3, $J(x,\bar\alpha)\in\mathrm{int}(\mathrm{dom}\,f)$. The proof is complete.
Now, we recall the forward–backward splitting method with the line search proposed in [6] as follows.

Forward–Backward Splitting Method with Backtracking Line Search (FBS method)
Step 0. Take $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, $\sigma>0$, and $0<\theta<1$.
Step k. Set
$$x^{k+1}:=J(x^k,\alpha_k)=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big) \tag{23}$$
with $\alpha_{-1}:=\sigma$ and
$$\alpha_k:=LS(x^k,\alpha_{k-1},\theta). \tag{24}$$
The following result, a direct consequence of Proposition 3.1, plays a central role in our study.
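As an aside, the FBS method (23)–(24) with Linesearch BT can be sketched end-to-end on a small least-squares $\ell_1$-regularized (Lasso-type) instance; all data and parameter values below are made up for illustration:

```python
import numpy as np

# f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1 (hypothetical data)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda z, a: np.sign(z) * np.maximum(np.abs(z) - a * lam, 0.0)
F = lambda x: f(x) + lam * np.sum(np.abs(x))

x = np.zeros(5)
alpha = 1.0                               # alpha_{-1} = sigma
theta = 0.5
for k in range(300):
    fx, gx = f(x), grad_f(x)
    while True:                           # Linesearch BT, started at alpha_{k-1}
        J = prox_g(x - alpha * gx, alpha)
        if f(J) <= fx + gx @ (J - x) + np.dot(x - J, x - J) / (2 * alpha):
            break
        alpha *= theta
    x = J                                 # x^{k+1} = J(x^k, alpha_k), cf. (23)

# x is (approximately) a fixed point of J(., alpha), i.e., x is near S*
assert np.linalg.norm(x - prox_g(x - alpha * grad_f(x), alpha)) < 1e-5
```

Note that each line search is warm-started at $\alpha_{k-1}$, matching (24), so the step sizes are nonincreasing; Proposition 3.2 below shows they stay bounded away from zero when $\nabla f$ is locally Lipschitz continuous.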
Corollary 3.1 (Well-definedness of the FBS method) Let $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. The sequence $(x^k)_{k\in\mathbb N}$ from the FBS method is well-defined, $x^k\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, and $f$ is differentiable at any $x^k$ for all $k\in\mathbb N$. Moreover, for any $x\in H$, we have:
(i) $\|x^k-x\|^2-\|x^{k+1}-x\|^2\ge 2\alpha_k\,[F(x^{k+1})-F(x)]$.
(ii) $F(x^{k+1})-F(x^k)\le-\frac{1}{2\alpha_k}\|x^{k+1}-x^k\|^2$.
Proof Thanks to Proposition 3.1(iii), $x^k\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ and $f$ is differentiable at any $x^k$ by assumption A2. This verifies the well-definedness of $(x^k)_{k\in\mathbb N}$. Moreover, both (i) and (ii) are consequences of (ii) and (iii) from Proposition 3.1 by taking $u=x$, $x=x^k$, $\bar\alpha=\alpha_k$, and $J(x,\bar\alpha)=J(x^k,\alpha_k)=x^{k+1}$.

The following result recovers the global convergence of the FBS method, without supposing the Lipschitz continuity of the gradient $\nabla f$, from [13, Theorem 3.18] and [12, Theorem 4.2] under our simplified assumptions A1–A3. The proof is similar to that of [12, Theorem 4.2] with some modifications and ideas from [13].
Theorem 3.1 (Global Convergence of the FBS Method) Let $(x^k)_{k\in\mathbb N}$ be the sequence generated by the FBS method. The following statements hold:
(i) If $S^*\ne\emptyset$, then $(x^k)_{k\in\mathbb N}$ weakly converges to a point in $S^*$. Moreover,
$$\lim_{k\to\infty}F(x^k)=\min_{x\in H}F(x). \tag{25}$$
(ii) If $S^*=\emptyset$, then we have
$$\lim_{k\to\infty}\|x^k\|=+\infty\quad\text{and}\quad\lim_{k\to\infty}F(x^k)=\inf_{x\in H}F(x).$$
Proof Let us justify (i) by supposing that $S^*\ne\emptyset$. By Corollary 3.1(i), for any $x^*\in S^*$ we have
$$\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2\ge 2\alpha_k\,[F(x^{k+1})-F(x^*)]\ge 0. \tag{26}$$
It follows that the sequence $(x^k)_{k\in\mathbb N}$ is Fejér monotone with respect to $S^*$. Thus, it is bounded according to [4, Proposition 5.4]. Defining $M:=\sup\{\|x^k-x^*\|:\ k\in\mathbb N\}<+\infty$ for some $x^*\in S^*$, we get from (26) that
$$0\le 2\alpha_k\,[F(x^{k+1})-F(x^*)]\le\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2=\big(\|x^k-x^*\|+\|x^{k+1}-x^*\|\big)\cdot\big(\|x^k-x^*\|-\|x^{k+1}-x^*\|\big)\le 2M\|x^k-x^{k+1}\|.$$
This yields
$$0\le F(x^{k+1})-F(x^*)\le M\,\frac{\|x^k-x^{k+1}\|}{\alpha_k}. \tag{27}$$
Note further from Corollary 3.1(ii) that
$$2\sigma\,[F(x^k)-F(x^{k+1})]\ge 2\alpha_k\,[F(x^k)-F(x^{k+1})]\ge\|x^k-x^{k+1}\|^2.$$
As $(F(x^k))_{k\in\mathbb N}$ is decreasing and bounded below by $F(x^*)$, the latter implies
$$\infty>2\sigma\,[F(x^0)-F(x^*)]\ge\sum_{k=0}^{\infty}2\sigma\,[F(x^k)-F(x^{k+1})]\ge\sum_{k=0}^{\infty}\|x^k-x^{k+1}\|^2. \tag{28}$$
Thus, $\|x^k-x^{k+1}\|\to 0$ as $k\to\infty$. Let $\bar x$ be a weak accumulation point of $(x^k)_{k\in\mathbb N}$ and find a subsequence $(x^{n_k})_{k\in\mathbb N}$ weakly converging to $\bar x$. As $F$ is l.s.c. and convex, it is also weakly l.s.c. Moreover, since $(F(x^k))_{k\in\mathbb N}$ is decreasing, we have $F(x^0)\ge F(\bar x)$. According to assumptions A2 and A3, $f$ is differentiable at $\bar x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. As $(\alpha_k)_{k\in\mathbb N}$ is decreasing by the construction of the line search in the FBS method, $\alpha:=\lim_{k\to\infty}\alpha_k=\lim_{k\to\infty}\alpha_{n_k}$ exists.

Case 1: $\alpha>0$. Note from (9) and (23) that
$$\frac{x^{n_k}-\alpha_{n_k}\nabla f(x^{n_k})-x^{n_k+1}}{\alpha_{n_k}}\in\partial g(x^{n_k+1}),$$
which implies
$$\frac{x^{n_k}-x^{n_k+1}}{\alpha_{n_k}}+\nabla f(x^{n_k+1})-\nabla f(x^{n_k})\in\nabla f(x^{n_k+1})+\partial g(x^{n_k+1}). \tag{29}$$
Applying Corollary 3.1(ii), we have $\frac{\|x^{n_k}-x^{n_k+1}\|^2}{\alpha_{n_k}}\to 0$, as the sequence $(F(x^k))_{k\in\mathbb N}$ is decreasing and thus converging. Since $\alpha_{n_k}\ge\alpha>0$, $\frac{x^{n_k}-x^{n_k+1}}{\alpha_{n_k}}$ also converges to $0$. Moreover, note from (28) that $\|x^{n_k}-x^{n_k+1}\|\to 0$, and $\|\nabla f(x^{n_k})-\nabla f(x^{n_k+1})\|\to 0$ due to the uniform continuity of $\nabla f$ on the weakly compact subset $\{\bar x,(x^{n_k})_{k\in\mathbb N},(x^{n_k+1})_{k\in\mathbb N}\}$ of $[F\le F(x^0)]$ as in assumption A2. Taking $k\to\infty$ in (29) gives us that $0\in\nabla f(\bar x)+\partial g(\bar x)$ due to the demiclosedness of the subdifferentials, which implies $\bar x\in S^*$. Furthermore, since $(F(x^k))_{k\in\mathbb N}$ is decreasing, (25) is a consequence of (27).
Case 2: $\alpha=0$. Define $\hat\alpha_{n_k}:=\frac{\alpha_{n_k}}{\theta}>\alpha_{n_k}>0$ and $\hat x^{n_k}:=J(x^{n_k},\hat\alpha_{n_k})\in\mathrm{dom}\,g$. It follows from the definition of Linesearch BT that
$$f(\hat x^{n_k})>f(x^{n_k})+\langle\nabla f(x^{n_k}),\hat x^{n_k}-x^{n_k}\rangle+\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|^2. \tag{30}$$
Due to Lemma 3.2, we have
$$\|x^{n_k}-\hat x^{n_k}\|=\|x^{n_k}-J(x^{n_k},\hat\alpha_{n_k})\|\le\frac{\hat\alpha_{n_k}}{\alpha_{n_k}}\|x^{n_k}-J(x^{n_k},\alpha_{n_k})\|=\frac{1}{\theta}\|x^{n_k}-x^{n_k+1}\|\to 0,$$
which tells us that $(\hat x^{n_k})_{k\in\mathbb N}$ weakly converges to $\bar x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Define $A:=\{\bar x,(x^{n_k})_{k\in\mathbb N}\}$. It follows that $A\subseteq[F\le F(x^0)]$, which is weakly compact. By assumption A3, $d(A;H\setminus\mathrm{int}(\mathrm{dom}\,f))>0$. As $\|x^{n_k}-\hat x^{n_k}\|\to 0$, we find some $K>0$ such that $\hat x^{n_k}\in\mathrm{int}(\mathrm{dom}\,f)$, and thus $\hat x^{n_k}\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, for any $k>K$. Hence, $\nabla f(\hat x^{n_k})$ is well-defined by assumption A2. It follows that
$$f(x^{n_k})\ge f(\hat x^{n_k})+\langle\nabla f(\hat x^{n_k}),\,x^{n_k}-\hat x^{n_k}\rangle.$$
This together with (30) gives us that
$$\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|^2<\big\langle\nabla f(x^{n_k})-\nabla f(\hat x^{n_k}),\,x^{n_k}-\hat x^{n_k}\big\rangle\le\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\cdot\|x^{n_k}-\hat x^{n_k}\|.$$
Hence, we have
$$\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\ge\frac{1}{2\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|. \tag{31}$$
As $\{\bar x,(x^{n_k})_{k\in\mathbb N},(\hat x^{n_k})_{k\in\mathbb N}\}$ is a weakly compact subset of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ and $\|\hat x^{n_k}-x^{n_k}\|\to 0$, assumption A2 tells us that $\|\nabla f(x^{n_k})-\nabla f(\hat x^{n_k})\|\to 0$ as $k\to\infty$. We derive from (31) that
$$\lim_{k\to\infty}\frac{1}{\hat\alpha_{n_k}}\|x^{n_k}-\hat x^{n_k}\|=0. \tag{32}$$
Applying (9) with $z=x^{n_k}-\hat\alpha_{n_k}\nabla f(x^{n_k})$ gives us that
$$\frac{x^{n_k}-\hat x^{n_k}}{\hat\alpha_{n_k}}+\nabla f(\hat x^{n_k})-\nabla f(x^{n_k})=\nabla f(\hat x^{n_k})+\frac{x^{n_k}-\hat\alpha_{n_k}\nabla f(x^{n_k})-\hat x^{n_k}}{\hat\alpha_{n_k}}\in\nabla f(\hat x^{n_k})+\partial g(\hat x^{n_k}).$$
By letting $k\to\infty$, similarly to the conclusion after (29), we get $0\in\nabla f(\bar x)+\partial g(\bar x)$ due to the demiclosedness of the subdifferentials, i.e., $\bar x\in S^*$. To proceed to the proof of (25) in this case, we observe from Lemma 3.2 that
$$\|x^{n_k}-\hat x^{n_k}\|=\Big\|x^{n_k}-J\Big(x^{n_k},\frac{\alpha_{n_k}}{\theta}\Big)\Big\|\ge\|x^{n_k}-J(x^{n_k},\alpha_{n_k})\|=\|x^{n_k}-x^{n_k+1}\|.$$
This together with (32) yields $\frac{\|x^{n_k}-x^{n_k+1}\|}{\alpha_{n_k}}\to 0$ as $k\to\infty$. Since $(F(x^k))_{k\in\mathbb N}$ is decreasing, we derive from the latter and (27) that
$$0=\lim_{k\to\infty}M\,\frac{\|x^{n_k}-x^{n_k+1}\|}{\alpha_{n_k}}\ge\lim_{k\to\infty}F(x^{n_k+1})-F(x^*)=\lim_{k\to\infty}F(x^k)-F(x^*)\ge 0,$$
which ensures (25).

From the above two cases, we have (25) and the fact that any weak accumulation point of the Fejér monotone sequence $(x^k)_{k\in\mathbb N}$ belongs to $S^*$. The classical Fejér theorem [4, Theorem 5.5] tells us that $(x^k)_{k\in\mathbb N}$ weakly converges to some point of $S^*$. This verifies (i) of the theorem.
The proof of the second part (ii) follows the same lines as the proof of [12, Theorem 4.2].
The following proposition shows that when $\nabla f$ is locally Lipschitz continuous, the step size $\alpha_k$ is bounded below by a positive number. The second part of this result coincides with [6, Remark 1.2].
Proposition 3.2 (Boundedness from Below for the Step Sizes) Let $(x^k)_{k\in\mathbb N}$ and $(\alpha_k)_{k\in\mathbb N}$ be the two sequences generated by the FBS method. Suppose that $S^*\ne\emptyset$ and that the sequence $(x^k)_{k\in\mathbb N}$ converges to some $x^*\in S^*$. If $\nabla f$ is locally Lipschitz continuous around $x^*$ with modulus $L$, then there exists some $K\in\mathbb N$ such that
$$\alpha_k\ge\min\Big\{\alpha_K,\frac{\theta}{L}\Big\}>0\quad\text{for all}\ k>K. \tag{33}$$
Consequently, for $k>K$ the line search $LS(x^k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\big\{1,\frac{\theta}{\alpha_K L}\big\}\big)$ steps.
Furthermore, if $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with uniform modulus $L$, then $\alpha_k\ge\min\{\sigma,\frac{\theta}{L}\}$ for all $k\in\mathbb N$. In this case, the line search $LS(x^k,\alpha_{k-1},\theta)$ needs at most $\log_\theta\big(\min\big\{1,\frac{\theta}{\sigma L}\big\}\big)$ steps for any $k$.
Proof Suppose that $S^*\ne\emptyset$, that the sequence $(x^k)_{k\in\mathbb N}$ converges to $x^*\in S^*$, and that $\nabla f$ is locally Lipschitz continuous around $x^*$ with constant $L>0$. We find some $\varepsilon>0$ such that
$$\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|\quad\text{for all}\ x,y\in B_\varepsilon(x^*), \tag{34}$$
where $B_\varepsilon(x^*)$ is the closed ball in $H$ with center $x^*$ and radius $\varepsilon$. Since $(x^k)_{k\in\mathbb N}$ converges to $x^*$, there exists some $K\in\mathbb N$ such that
$$\|x^k-x^*\|\le\frac{\theta\varepsilon}{2+\theta}<\varepsilon\quad\text{for all}\ k>K \tag{35}$$
with $0<\theta<1$ defined in Linesearch BT. We claim that
$$\alpha_k\ge\min\Big\{\alpha_{k-1},\frac{\theta}{L}\Big\}\quad\text{for all}\ k>K. \tag{36}$$
Suppose by contradiction that $\alpha_k<\min\{\alpha_{k-1},\frac{\theta}{L}\}$. Then $\alpha_k<\alpha_{k-1}$, and so the loop in Linesearch BT at $(x^k,\alpha_{k-1})$ needs more than one iteration. Defining $\hat\alpha_k:=\frac{\alpha_k}{\theta}>0$ and $\hat x^k:=J(x^k,\hat\alpha_k)$, we have
$$f(\hat x^k)>f(x^k)+\langle\nabla f(x^k),\hat x^k-x^k\rangle+\frac{1}{2\hat\alpha_k}\|x^k-\hat x^k\|^2. \tag{37}$$
Furthermore, it follows from Lemma 3.2 that
$$\|x^k-\hat x^k\|=\|x^k-J(x^k,\hat\alpha_k)\|\le\frac{\hat\alpha_k}{\alpha_k}\|x^k-J(x^k,\alpha_k)\|=\frac{1}{\theta}\|x^k-x^{k+1}\|.$$
This together with (35) yields
$$\|\hat x^k-x^*\|\le\|\hat x^k-x^k\|+\|x^k-x^*\|\le\frac{1}{\theta}\|x^k-x^{k+1}\|+\|x^k-x^*\|\le\frac{1}{\theta}\cdot\frac{2\theta\varepsilon}{2+\theta}+\frac{\theta\varepsilon}{2+\theta}=\varepsilon. \tag{38}$$
Since $x^k,\hat x^k\in B_\varepsilon(x^*)$ by (35) and (38), we get from (34) that
$$f(\hat x^k)-f(x^k)-\langle\nabla f(x^k),\hat x^k-x^k\rangle=\int_0^1\big\langle\nabla f(x^k+t(\hat x^k-x^k))-\nabla f(x^k),\,\hat x^k-x^k\big\rangle\,dt\le\int_0^1 tL\|\hat x^k-x^k\|^2\,dt=\frac{L}{2}\|\hat x^k-x^k\|^2.$$
Combining this with (37) yields $\hat\alpha_k\ge\frac{1}{L}$ and thus $\alpha_k\ge\frac{\theta}{L}$. This is a contradiction, which verifies (36).
If there is some $k_0>K$ with $k_0\in\mathbb N$ such that $\alpha_{k_0}>\frac{\theta}{L}$, we get from (36) that $\alpha_k\ge\frac{\theta}{L}$ for all $k\ge k_0$. Otherwise, $\alpha_k<\frac{\theta}{L}$ for any $k>K$, which implies that $\alpha_k=\alpha_{k-1}=\alpha_K$ for all $k>K$ due to (36) and the decreasing property of $(\alpha_k)_{k\in\mathbb N}$. In both cases, we have (33).
Now suppose that the line search $LS(x^k,\alpha_{k-1},\theta)$ needs $d$ repetitions. It follows from (33) that
$$\alpha_K\theta^d\ge\alpha_{k-1}\theta^d=\alpha_k\ge\min\Big\{\alpha_K,\frac{\theta}{L}\Big\},$$
which tells us that $d\le\log_\theta\big(\min\big\{1,\frac{\theta}{\alpha_K L}\big\}\big)$.
Finally, suppose that $\nabla f$ is globally Lipschitz continuous with modulus $L$ on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. By using Proposition 3.1(iii), we can repeat the above proof without involving $\varepsilon$ and $K$ and replace (36) by $\alpha_k\ge\min\{\sigma,\frac{\theta}{L}\}$.
Next, we present the conventional complexity $o(k^{-1})$ of the FBS method [10, Theorem 3], but under our standing assumptions and the condition that $\nabla f$ is locally Lipschitz continuous. The complexity remains valid when replacing the local Lipschitz condition by the weaker one that the step sizes $(\alpha_k)$ are bounded below by a positive number; see Proposition 3.2. This idea was initiated in [12, Theorem 4.3] in finite dimensions and extended to different kinds of line searches in Hilbert spaces in [13], e.g., [13, Corollary 3.20]. For completeness, we provide a short proof,² which could also be obtained by following the method of proof of [13, Theorem 3.18(iii)(d)].
Theorem 3.2 (Sublinear Convergence of the FBS Method) Let $(x^k)_{k\in\mathbb N}$ be the sequence generated by the FBS method. Suppose that $S^*\ne\emptyset$ and that $\nabla f$ is locally Lipschitz continuous around any point in $S^*$. Then,
$$\lim_{k\to\infty}k\Big(F(x^k)-\min_{x\in H}F(x)\Big)=0. \tag{39}$$
Proof By Proposition 3.2, $\alpha_k$ is bounded below by some $\alpha>0$. Take any $x^*\in S^*$; we obtain from Corollary 3.1(i) that
$$\|x^k-x^*\|^2-\|x^{k+1}-x^*\|^2\ge 2\alpha\,[F(x^{k+1})-F(x^*)]\ge 0. \tag{40}$$
As $(\|x^k-x^*\|^2)_{k\in\mathbb N}$ is decreasing, it is converging. Given $\varepsilon>0$, there exists $K\in\mathbb N$ (large enough) such that $\big|\|x^{K+k}-x^*\|^2-\|x^K-x^*\|^2\big|\le\varepsilon$ for any $k\in\mathbb N$. It follows from (40) that
$$\varepsilon\ge\|x^K-x^*\|^2-\|x^{k+K}-x^*\|^2=\sum_{\ell=K}^{k+K-1}\big(\|x^\ell-x^*\|^2-\|x^{\ell+1}-x^*\|^2\big)\ge 2\alpha\sum_{\ell=K}^{k+K-1}\big[F(x^{\ell+1})-F(x^*)\big].$$
Since $(F(x^k))$ is decreasing, we get that
$$\varepsilon\ge 2\alpha k\,\big(F(x^{K+k})-F(x^*)\big)=2\alpha\,\frac{k}{K+k}\,(K+k)\big(F(x^{K+k})-F(x^*)\big),$$
which implies that
$$0\le\liminf_{k\to\infty}k\big(F(x^k)-F(x^*)\big)\le\limsup_{k\to\infty}k\big(F(x^k)-F(x^*)\big)=\limsup_{k\to\infty}(K+k)\big(F(x^{K+k})-F(x^*)\big)\le\frac{\varepsilon}{2\alpha}.$$
Since $\varepsilon>0$ was arbitrarily chosen, this verifies (39) and completes the proof of the theorem.
4 Local Linear Convergence of Forward–Backward Splitting Methods
In this section, we obtain the local Q-linear convergence of the FBS method under the quadratic growth condition and the local Lipschitz continuity of $\nabla f$, which is automatic in many problems including the Lasso problem and the Poisson linear inverse regularized
2 We are grateful to one of the referees whose remarks lead us to this simple proof.