
On the Q-linear convergence of forward-backward splitting method. Part II: Quadratic growth conditions and uniqueness of optimal solution to Lasso

J.Y. Bello-Cruz · G. Li · T.T.A. Nghia


Abstract In the previous paper [1], we showed that the quadratic growth condition is the key tool in studying Q-linear convergence of the forward-backward splitting method. In this paper, the quadratic growth condition is analyzed via second-order variational analysis for several structured optimization problems that arise in machine learning and signal processing, including the Poisson linear inverse problem as well as $\ell_1$-regularized optimization problems. Moreover, via this approach, we obtain several full characterizations of the uniqueness of the optimal solution to the Lasso problem, which cover some recent results in this direction.

This work was partially supported by the National Science Foundation (NSF) Grants DMS-1816386 and DMS-1816449.

J.Y. Bello-Cruz

Department of Mathematical Sciences, Northern Illinois University, DeKalb, IL 60115, USA E-mail: yunierbello@niu.edu

G. Li

Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia E-mail: g.li@unsw.edu.au

T.T.A. Nghia

Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA E-mail: nttran@oakland.edu


Keywords: Nonsmooth and convex optimization problems; Forward-backward splitting method; Linear convergence; Uniqueness; Lasso; Quadratic growth condition; Variational analysis.

Mathematics Subject Classification (2010): 65K05; 90C25; 90C30.

1 Introduction

In this paper, we mainly consider an optimization problem of minimizing the sum of two convex functions, one of which is differentiable on its domain and the other is nondifferentiable. Many problems in this format appear in different fields of science and engineering, including machine learning, compressed sensing, and image processing.

Among the many methods for solving the aforementioned optimization problems, the forward-backward splitting method (FBS for short, also known as the proximal gradient method) [2, 3, 6–11] is very popular due to its simplicity and efficiency. It is well known that this method is globally convergent to an optimal solution with complexity $O(k^{-1})$ on the iterative cost values under the assumption that the gradient of the differentiable function is globally Lipschitz continuous. In our previous part [1], we also obtained the global convergence of FBS without the Lipschitz continuity of the gradient and under mild assumptions on the initial data that are weaker than many known ones [13, 15, 16]. The complexity of the cost values is improved to $o(k^{-1})$ when the aforementioned gradient is only locally Lipschitz continuous; see also [13, 14, 16] for further developments in this direction. Furthermore, when the cost function satisfies the quadratic growth condition, we showed that both the FBS iterative sequence and the cost sequence are Q-linearly convergent to an optimal solution and the optimal value, respectively. Our results extend several works in this direction [19, 21, 23, 24, 26, 27] without borrowing conventional techniques of error bounds [25] or the Kurdyka–Łojasiewicz inequality.

One of the main goals of this second part is to analyze the quadratic growth condition in several structured optimization problems. This allows us to better understand the performance of the FBS method when solving specific optimization problems. In particular, we show that the quadratic growth condition is valid for the standard Poisson regularized inverse problem with the Kullback–Leibler divergence [17, 18], whose gradient is not Lipschitz continuous. Using FBS to solve Poisson regularized inverse problems was raised in [15]. Recently, Salzo [16] proved that the FBS method could solve it with possible rate $o(k^{-1})$.

We advance this direction by indicating that the convergence rate is indeed Q-linear.

Linear convergence of the FBS iterative sequence for solving some structured optimization problems was also studied in [3, 7, 32–34] when the nonsmooth function is partly smooth relative to a manifold, by using the idea of finite support identification. The latter notion, introduced by Lewis [35], allows Liang, Fadili, and Peyré [32, 33] to cover many important problems such as the total variation semi-norm, the $\ell_1$-norm, the $\ell_\infty$-norm, and the nuclear norm problems. In their paper, a second-order condition was introduced to guarantee the local Q-linear convergence of the FBS sequence under the non-degeneracy assumption [35].

When considering the $\ell_1$-regularized problem, we are able to avoid the non-degeneracy assumption. This allows us to improve the well-known work of Hale, Yin, and Zhang [3] in two aspects: (a) we completely drop the aforementioned non-degeneracy assumption; (b) our second-order condition is strictly weaker than the one in [3, Theorem 4.10]. The broader view is that, when considering the particular optimization problems listed in the spirit of [32, 33], the non-degeneracy assumption may not be necessary. Furthermore, we revisit the iterative shrinkage thresholding algorithm (ISTA) [2, 8], which is exactly FBS for solving the Lasso problem [4]. It is well known that the complexity of this algorithm is $O(k^{-1})$; however, recent works [32, 34] indicate the local linear convergence of ISTA. A stronger conclusion in this direction was obtained lately by Bolte, Nguyen, Peypouquet, and Suter [21, 26]: the iterative sequence of ISTA is R-linearly convergent and its corresponding cost sequence is globally Q-linearly convergent, but the rate may depend on the initial point. Inspired by these achievements, we provide two new pieces of information:

(c) the iterative sequence from ISTA is indeed globally Q-linearly convergent; (d) it is eventually Q-linearly convergent to an optimal solution with a uniform rate that does not depend on the initial point.

Finally, we study the uniqueness of the optimal solution to the Lasso problem as one of the main applications of our approach via second-order variational analysis. This property of the optimal solution has been investigated vastly in the literature, with immediate applications to recovering sparse signals in compressed sensing; see, e.g., [5, 40–44] and the references therein. It is also used in [7, 34] to establish the linear convergence of ISTA. To the best of our knowledge, Fuchs [40] initiated this direction by introducing a simple sufficient condition for this property, which has been extended in the other cited papers. Then Tibshirani [41] showed that a sufficient condition closely related to Fuchs' is also necessary almost everywhere. The first full characterization of this property was obtained recently in [43, 44] by using strong duality in linear programming. This characterization, which is based on the existence of a vector satisfying a system of linear equations and inequalities, allows [43, 44] to recover the aforementioned sufficient conditions and to provide some situations in which these conditions turn necessary. As a direct application of our new approach via second-order variational analysis, we also derive several new full characterizations. Our conditions, stated in terms of positive linear independence and of Slater type, are well recognized to be verifiable.

The outline of the paper is as follows. Section 2 briefly presents a second-order characterization of the quadratic growth condition in terms of the subgradient graphical derivative [45] and recalls some convergence results from our Part I [1]. Section 3 is devoted to the study of the quadratic growth condition in some structured optimization problems involving Poisson regularized inverse, partly smooth, $\ell_1$-regularized, and $\ell_1$-regularized least squares optimization problems. In Section 4, we obtain several new full characterizations of the uniqueness of the optimal solution to the Lasso problem. The final Section 5 gives some conclusions and potential future work in this direction.

2 Preliminary results on metric subregularity of the subdifferential and quadratic growth condition

Throughout the paper, $\mathbb{R}^n$ is the usual Euclidean space of dimension $n$, where $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the corresponding Euclidean norm and inner product in $\mathbb{R}^n$. We use $\Gamma_0(\mathbb{R}^n)$ to denote the set of proper, lower semicontinuous, and convex functions on $\mathbb{R}^n$. For $h\in\Gamma_0(\mathbb{R}^n)$, we write $\mathrm{dom}\,h:=\{x\in\mathbb{R}^n\mid h(x)<+\infty\}$. The subdifferential of $h$ at $\bar x\in\mathrm{dom}\,h$ is defined by

$$\partial h(\bar x):=\{v\in\mathbb{R}^n\mid\langle v,x-\bar x\rangle\le h(x)-h(\bar x),\ x\in\mathbb{R}^n\}. \qquad (1)$$

We say $h$ satisfies the quadratic growth condition at $\bar x$ with modulus $\kappa>0$ if there exists $\varepsilon>0$ such that

$$h(x)\ge h(\bar x)+\frac{\kappa}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all } x\in B_\varepsilon(\bar x). \qquad (2)$$

Moreover, if in addition $(\partial h)^{-1}(0)=\{\bar x\}$, then $h$ is said to satisfy the strong quadratic growth condition at $\bar x$ with modulus $\kappa$.

Some relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [26, 29–31, 38], even for nonconvex functions. The quadratic growth condition (2) is also called the quadratic functional growth property in [23] when $h$ is continuously differentiable over a closed convex set. In [21, 22], $h$ is said to be 2-conditioned on $B_\varepsilon(\bar x)$ if it satisfies the quadratic growth condition (2).

The following proposition, a slight improvement of [38, Corollary 3.7], provides a useful characterization of the strong quadratic growth condition via the subgradient graphical derivative [45, Chapter 13].

Proposition 2.1 (Characterization of strong quadratic growth condition) Let $h\in\Gamma_0(\mathbb{R}^n)$ and let $\bar x$ be an optimal solution, i.e., $0\in\partial h(\bar x)$. The following are equivalent:

(i) $h$ satisfies the strong quadratic growth condition at $\bar x$.
(ii) $D(\partial h)(\bar x\,|\,0)$ is positive-definite in the sense that

$$\langle v,u\rangle>0\quad\text{for all } v\in D(\partial h)(\bar x\,|\,0)(u),\ u\in\mathbb{R}^n,\ u\ne0, \qquad (3)$$

where $D(\partial h)(\bar x\,|\,0):\mathbb{R}^n\rightrightarrows\mathbb{R}^n$ is the subgradient graphical derivative defined by

$$D(\partial h)(\bar x\,|\,0)(u):=\{v\in\mathbb{R}^n\mid\exists\,(u_n,v_n)\to(u,v),\ t_n\downarrow0:\ t_nv_n\in\partial h(\bar x+t_nu_n)\}.$$

Moreover, if (ii) is satisfied then

$$\ell:=\inf\Big\{\frac{\langle v,u\rangle}{\|u\|^2}\ \Big|\ v\in D(\partial h)(\bar x\,|\,0)(u),\ u\in\mathbb{R}^n\Big\}>0 \qquad (4)$$

with the convention $\frac{0}{0}=\infty$, and $h$ satisfies the strong quadratic growth condition at $\bar x$ with any modulus $\kappa<\ell$.

Proof The implication [(i)$\Rightarrow$(ii)] follows from [38, Theorem 3.6 and Corollary 3.7]. If (ii) is satisfied, we obtain from (3) that $\|v\|\ge\ell\|u\|$. Combining [39, Theorem 4C.1] and [31, Corollary 3.3] tells us that $h$ satisfies the strong quadratic growth condition at $\bar x$ with any modulus $\kappa<\ell$. The proof is complete. □

Next let us recall some main results from our Part I [1] regarding the convergence of the forward-backward splitting method (FBS) for solving the following optimization problem:

$$\min_{x\in\mathbb{R}^n} F(x):=f(x)+g(x), \qquad (5)$$

where $f,g:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ are proper, lower semicontinuous, and convex functions. The standing assumptions on the initial data for (5) used throughout the paper are:

A1. $f,g\in\Gamma_0(\mathbb{R}^n)$ and $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.

A2. $f$ is continuously differentiable at any point in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.

A3. For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, the sublevel set $\{F\le F(x)\}$ is contained in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.

The forward-backward splitting method for solving (5) is described by

$$x^{k+1}=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big):=(\mathrm{Id}+\alpha_k\partial g)^{-1}\big(x^k-\alpha_k\nabla f(x^k)\big) \qquad (6)$$

with stepsize $\alpha_k>0$ determined from Beck–Teboulle's line search as follows:

Linesearch BT (Beck–Teboulle's line search). Given $\sigma>0$ and $\theta\in(0,1)$.
Input. Set $\alpha_k=\alpha_{k-1}$ and $J(x^k,\alpha_k)=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big)$.
While $f(J(x^k,\alpha_k))>f(x^k)+\langle\nabla f(x^k),J(x^k,\alpha_k)-x^k\rangle+\frac{1}{2\alpha_k}\|x^k-J(x^k,\alpha_k)\|^2$, do $\alpha_k=\theta\alpha_k$.
End While
Output. $\alpha_k$,

with $\alpha_{-1}:=\sigma$ and $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
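As an illustration only, the scheme (6) with the line search above can be sketched in a few lines of Python. This is not the authors' code: the names (`fbs`, `grad_f`, `prox_g`) and the simple relative-change stopping test are our own assumptions, and the routine expects the proximal mapping of $g$ to be available in closed form.

```python
import numpy as np

def fbs(f, grad_f, prox_g, x0, sigma=1.0, theta=0.5, max_iter=500, tol=1e-10):
    """Forward-backward splitting with the Beck-Teboulle backtracking line search.

    f      : callable, smooth part f(x)
    grad_f : callable, gradient of f
    prox_g : callable, (z, alpha) -> prox_{alpha g}(z)
    x0     : starting point in int(dom f) and dom g
    sigma  : initial trial stepsize alpha_{-1} = sigma
    theta  : backtracking factor in (0, 1)
    """
    x, alpha = np.asarray(x0, dtype=float), sigma
    for _ in range(max_iter):
        g = grad_f(x)
        while True:
            J = prox_g(x - alpha * g, alpha)      # forward-backward step J(x, alpha)
            d = J - x
            # Beck-Teboulle test on the smooth part; shrink alpha until it passes
            if f(J) <= f(x) + g @ d + (1.0 / (2.0 * alpha)) * (d @ d):
                break
            alpha *= theta
        if np.linalg.norm(J - x) <= tol * max(1.0, np.linalg.norm(x)):
            return J
        x = J                                      # alpha is reused as the next trial stepsize
    return x
```

For instance, with `prox_g = lambda z, a: np.sign(z) * np.maximum(np.abs(z) - a * mu, 0.0)` and $f(x)=\frac12\|Ax-b\|^2$, this sketch reduces to ISTA for the Lasso problem studied in Section 3.3.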

In [1, Proposition 3.1 and Corollary 3.1], we showed that the line search above terminates after finitely many steps, so the FBS sequence $(x^k)_{k\in\mathbb{N}}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ is well defined, and thus $f$ is differentiable at $x^k$ by assumption A2. The global convergence result [1, Theorem 3.1] is recalled here.

Theorem 2.1 (Global convergence of FBS method) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that the solution set is not empty. Then $(x^k)_{k\in\mathbb{N}}$ converges to an optimal solution. Moreover, $(F(x^k))_{k\in\mathbb{N}}$ also converges to the optimal value.

When the cost function $F$ satisfies the quadratic growth condition and $\nabla f$ is locally Lipschitz continuous, [1, Theorem 4.1] shows that both the iterative and cost sequences of FBS are Q-linearly convergent.

Theorem 2.2 (Q-linear convergence under quadratic growth condition) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that $S\ne\emptyset$ and let $x\in S$ be the limit point of $(x^k)_{k\in\mathbb{N}}$ as in Theorem 2.1. Suppose further that $\nabla f$ is locally Lipschitz continuous around $x$ with constant $L>0$. If $F$ satisfies the quadratic growth condition at $x$ with modulus $\kappa>0$, then there exists $K\in\mathbb{N}$ such that

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\frac{\alpha\kappa}{4}}}\,\|x^k-x\| \qquad (7)$$

$$|F(x^{k+1})-F(x)|\le\frac{\sqrt{1+\alpha\kappa}+1}{2\sqrt{1+\alpha\kappa}}\,|F(x^k)-F(x)| \qquad (8)$$

for any $k>K$, where $\alpha:=\min\big\{\alpha_K,\frac{\theta}{L}\big\}$.

If, in addition, $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with constant $L>0$, then $\alpha$ could be chosen as $\min\big\{\sigma,\frac{\theta}{L}\big\}$.

Under the strong quadratic growth condition, a sharper rate is obtained in [1, Corollary 4.1].

Corollary 2.1 (Sharper Q-linear convergence rate under strong quadratic growth condition) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that the solution set $S$ is not empty and let $x\in S$ be the limit point of $(x^k)_{k\in\mathbb{N}}$ as in Theorem 2.1. Suppose further that $\nabla f$ is locally Lipschitz continuous around $x$ with constant $L>0$. If $F$ satisfies the strong quadratic growth condition at $x$ with modulus $\kappa>0$, then there exists some $K\in\mathbb{N}$ such that for any $k>K$ we have

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\alpha\kappa}}\,\|x^k-x\|\quad\text{with}\quad\alpha:=\min\Big\{\alpha_K,\frac{\theta}{L}\Big\}.$$

Additionally, if $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with constant $L>0$, $\alpha$ above could be chosen as $\min\big\{\sigma,\frac{\theta}{L}\big\}$.

3 Quadratic growth conditions and linear convergence of forward-backward splitting method in some structured optimization problems

In this section, we mainly show that the quadratic growth condition is automatic or fulfilled under mild assumptions in several important classes of convex optimization problems.

3.1 Poisson linear inverse problem

This subsection is devoted to the study of the eventual linear convergence of FBS when solving the following standard Poisson regularized problem [17, 18]

$$\min_{x\in\mathbb{R}^n_+}\ \sum_{i=1}^m b_i\log\frac{b_i}{(Ax)_i}+(Ax)_i-b_i, \qquad (9)$$

where $A\in\mathbb{R}^{m\times n}_+$ is an $m\times n$ matrix with nonnegative entries and nontrivial rows, and $b\in\mathbb{R}^m_{++}$ is a positive vector. This problem is usually used to recover a signal $x\in\mathbb{R}^n_+$ from the measurement $b$ corrupted by Poisson noise satisfying $Ax\simeq b$. Problem (9) can be written in terms of (5) with

$$f(x):=h(Ax),\quad g(x):=\delta_{\mathbb{R}^n_+}(x),\quad\text{and}\quad F_1(x):=h(Ax)+g(x), \qquad (10)$$

where $h$ is the Kullback–Leibler divergence defined by

$$h(y)=\begin{cases}\displaystyle\sum_{i=1}^m b_i\log\frac{b_i}{y_i}+y_i-b_i & \text{if } y\in\mathbb{R}^m_{++},\\ +\infty & \text{if } y\in\mathbb{R}^m_+\setminus\mathbb{R}^m_{++}.\end{cases} \qquad (11)$$

Note from (10) and (11) that $\mathrm{dom}\,f=A^{-1}(\mathbb{R}^m_{++})$, which is an open set. Moreover, since $A\in\mathbb{R}^{m\times n}_+$, we have $\mathrm{dom}\,f\cap\mathrm{dom}\,g=A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+\ne\emptyset$ and $f$ is continuously differentiable at any point of $\mathrm{dom}\,f\cap\mathrm{dom}\,g$. The standing assumptions A1 and A2 are satisfied for problem (9). Moreover, since the function $F_1$ is bounded below and coercive, the optimal solution set of problem (9) is always nonempty.

It is worth noting further that $\nabla f$ is locally Lipschitz continuous at any point of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ but not globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Our [1, Theorem 3.2] is applicable to solving (9) with global convergence rate $o(\frac{1}{k})$. In the recent work [15], a new algorithm rather close to FBS was designed with applications to solving (9). However, the theory developed in [15] cannot guarantee the global convergence of the iterative sequence $(x^k)_{k\in\mathbb{N}}$ when solving (9), since one of their assumptions, the closedness of the domain of their auxiliary Legendre function in [15, Theorem 2], is not satisfied.

Our intent here is to reveal the Q-linear convergence of our method when solving (9) in the sense of Theorem 2.2. In order to do so, we need to verify the quadratic growth condition for $F_1$ at any optimal solution. Note further that the Kullback–Leibler divergence $h$ is not strongly convex and $\nabla f$ is not globally Lipschitz continuous; hence, the standing assumptions in [19] are not satisfied. Proving the quadratic growth condition for $F_1$ at an optimal solution via the approach of [19] needs to proceed with caution.

Lemma 3.1 Let $\bar x$ be an optimal solution to problem (9). Then for any $R>0$, we have

$$F_1(x)-F_1(\bar x)\ge\frac{\nu}{2}\,d^2(x;S)\quad\text{for all } x\in B_R(\bar x) \qquad (12)$$

with some constant $\nu>0$.

Proof Pick any $R>0$ and $x\in B_R(\bar x)$. We only need to prove (12) for the case that $x\in\mathrm{dom}\,F_1\cap B_R(\bar x)$, i.e., $x\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+\cap B_R(\bar x)$. Note that

$$\nabla f(x)=\sum_{i=1}^m\Big(1-\frac{b_i}{\langle a_i,x\rangle}\Big)a_i\quad\text{and}\quad\langle\nabla^2f(x)d,d\rangle=\sum_{i=1}^m b_i\frac{\langle a_i,d\rangle^2}{\langle a_i,x\rangle^2}\quad\text{for all } d\in\mathbb{R}^n,$$

where $a_i$ is the $i$-th row of $A$. Define $\bar y:=A\bar x$. For any $x,u\in B_R(\bar x)\cap\mathrm{dom}\,f$ we have $[x,u]\subset B_R(\bar x)\cap\mathrm{dom}\,f$ and obtain from the mean-value theorem that

$$\begin{aligned}
f(x)-f(u)-\langle\nabla f(u),x-u\rangle&=\frac{1}{2}\int_0^1\langle\nabla^2f(u+t(x-u))(x-u),x-u\rangle\,dt\\
&=\frac{1}{2}\int_0^1\sum_{i=1}^m b_i\frac{\langle a_i,x-u\rangle^2}{\langle a_i,u+t(x-u)\rangle^2}\,dt\\
&\ge\frac{1}{2}\int_0^1\sum_{i=1}^m b_i\frac{\langle a_i,x-u\rangle^2}{\big[|\langle a_i,\bar x\rangle|+\|a_i\|(\|u-\bar x\|+t\|x-u\|)\big]^2}\,dt\\
&\ge\frac{1}{2}\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,x-u\rangle^2.
\end{aligned}$$

Similarly, we have

$$f(u)-f(x)-\langle\nabla f(x),u-x\rangle\ge\frac{1}{2}\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,u-x\rangle^2\quad\text{for } x,u\in B_R(\bar x)\cap\mathrm{dom}\,f. \qquad (13)$$

Adding the above two inequalities gives us

$$\langle\nabla f(x)-\nabla f(u),x-u\rangle\ge\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,x-u\rangle^2\quad\text{for all } x,u\in B_R(\bar x)\cap\mathrm{dom}\,f. \qquad (14)$$

We claim that the optimal solution set $S$ of problem (9) satisfies

$$S=A^{-1}(\bar y)\cap(\partial g)^{-1}(-\nabla f(\bar x))\quad\text{with}\quad\bar y=A\bar x. \qquad (15)$$

Pick another optimal solution $\bar u\in S$; we have $\bar u_t:=\bar x+t(\bar u-\bar x)\in S\subset\mathrm{dom}\,f$ for any $t\in[0,1]$ due to the convexity of $S$. By choosing $t$ sufficiently small, we have $\bar u_t\in B_R(\bar x)\cap\mathrm{dom}\,f$. Note further that $-\nabla f(\bar u_t)\in\partial g(\bar u_t)$ and $-\nabla f(\bar x)\in\partial g(\bar x)$. Since $\partial g$ is a monotone operator, we obtain that $0\ge\langle\nabla f(\bar x)-\nabla f(\bar u_t),\bar x-\bar u_t\rangle$. This together with (14) tells us that $\langle a_i,\bar x-\bar u_t\rangle=0$ for all $i=1,\ldots,m$. Hence $A\bar x=A\bar u=\bar y$ for any $\bar u\in S$, which also implies that

$$\nabla f(\bar u)=A^T\nabla h(A\bar u)=A^T\nabla h(A\bar x)=\nabla f(\bar x). \qquad (16)$$

This verifies the inclusion "⊂" in (15). The opposite inclusion is trivial. Indeed, take any $u$ satisfying $Au=\bar y$ and $-\nabla f(\bar x)\in\partial g(u)$; similarly to (16) we have $-\nabla f(u)=-\nabla f(\bar x)\in\partial g(u)$. This shows that $0\in\nabla f(u)+\partial g(u)$, i.e., $u\in S$. The proof of equality (15) is complete.

Note from (15) that the optimal solution set $S$ is a polyhedron of the form

$$S=\{u\in\mathbb{R}^n\mid Au=\bar y=A\bar x,\ \langle\nabla f(\bar x),u\rangle=0,\ u\in\mathbb{R}^n_+\}$$

due to the fact that $(\partial g)^{-1}(-\nabla f(\bar x))=\{u\in\mathbb{R}^n_+\mid\langle\nabla f(\bar x),u\rangle=0=\langle\nabla f(\bar x),\bar x\rangle\}$. Thanks to Hoffman's lemma, there exists a constant $\gamma>0$ such that

$$d(x;S)\le\gamma\big(\|Ax-A\bar x\|+|\langle\nabla f(\bar x),x-\bar x\rangle|\big)\quad\text{for all } x\in\mathbb{R}^n_+. \qquad (17)$$

Fix any $x\in B_R(\bar x)\cap\mathbb{R}^n_+$; (13) tells us that

$$f(x)-f(\bar x)-\langle\nabla f(\bar x),x-\bar x\rangle\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2. \qquad (18)$$

Since $-\nabla f(\bar x)\in\partial g(\bar x)$, we have $\langle\nabla f(\bar x),x-\bar x\rangle\ge0$. This together with (18) implies that

$$\begin{aligned}
F_1(x)-F_1(\bar x)&\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2+\langle\nabla f(\bar x),x-\bar x\rangle\\
&\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2+\frac{1}{(\|\nabla f(\bar x)\|+1)\|x-\bar x\|}\langle\nabla f(\bar x),x-\bar x\rangle^2\\
&\ge\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\big[\|Ax-A\bar x\|^2+\langle\nabla f(\bar x),x-\bar x\rangle^2\big]\\
&\ge\frac{1}{2}\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\big(\|Ax-A\bar x\|+|\langle\nabla f(\bar x),x-\bar x\rangle|\big)^2\\
&\ge\frac{1}{2\gamma^2}\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\,d^2(x;S),
\end{aligned}$$

where the fourth inequality follows from the elementary inequality $\frac{(a+b)^2}{2}\le a^2+b^2$ with $a,b\ge0$, and the last inequality follows from (17). This clearly ensures (12). □


When applying FBS to solve problem (9), we have

$$x^{k+1}=P_{\mathbb{R}^n_+}\Big(x^k-\alpha_k\sum_{i=1}^m\Big(1-\frac{b_i}{\langle a_i,x^k\rangle}\Big)a_i\Big)\quad\text{with}\quad x^0\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+, \qquad (19)$$

where $P_{\mathbb{R}^n_+}(\cdot)$ is the projection mapping onto $\mathbb{R}^n_+$.
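For readers who want to experiment with (19), the following Python sketch is one possible implementation; it is our own illustration, not the authors' code. It uses a crude constant stepsize with halving whenever the trial point leaves the domain $Ax>0$, whereas the analysis above relies on the line search of Section 2, so the stepsize rule here is an assumption.

```python
import numpy as np

def poisson_fbs(A, b, x0, alpha=1e-3, max_iter=1000):
    """Projected-gradient sketch of iteration (19) for the Poisson problem (9).

    A : (m, n) nonnegative matrix with nontrivial rows
    b : (m,) positive vector
    x0: starting point with A @ x0 > 0 and x0 >= 0
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Ax = A @ x
        grad = A.T @ (1.0 - b / Ax)                 # gradient of f(x) = h(Ax)
        step = alpha
        x_new = np.maximum(x - step * grad, 0.0)    # projection onto the nonnegative orthant
        while np.any(A @ x_new <= 0.0):             # keep A x > 0 so that f stays finite
            step *= 0.5
            x_new = np.maximum(x - step * grad, 0.0)
        x = x_new
    return x
```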

Corollary 3.1 (Q-linear convergence of method (19)) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by (19) with $x^0\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+$ for solving the Poisson regularized problem (9). Then the sequences $(x^k)_{k\in\mathbb{N}}$ and $(F_1(x^k))_{k\in\mathbb{N}}$ are Q-linearly convergent to an optimal solution and to the optimal value of (9), respectively.

Proof Since both functions $f$ and $g$ in problem (9) satisfy our standing assumptions A1 and A2, and problem (9) always has optimal solutions, the sequence $(x^k)_{k\in\mathbb{N}}$ converges to an optimal solution $\bar x$ of problem (9) by Theorem 2.1. Since $\nabla f$ is locally Lipschitz continuous around $\bar x$, the combination of Theorem 2.2 and Lemma 3.1 tells us that $(x^k)_{k\in\mathbb{N}}$ is Q-linearly convergent to $\bar x$. □

By using this approach, it is similar to show that the quadratic growth condition in Lemma 3.1 is also valid for the Poisson inverse problem with sparse regularization [15]:

$$\min_{x\in\mathbb{R}^n_+}\ \sum_{i=1}^m b_i\log\frac{b_i}{(Ax)_i}+(Ax)_i-b_i+\mu\|x\|_1, \qquad (20)$$

where $\mu>0$ is the penalty parameter. Indeed, since $\|x\|_1=\langle e,x\rangle$ on $\mathbb{R}^n_+$, the FBS method for solving (20) is practical by modifying the function $f(x)$ in (10) to $h(Ax)+\mu\langle e,x\rangle$ with $e=(1,1,\ldots,1)\in\mathbb{R}^n$. This together with Corollary 3.1 clearly shows that FBS (19) solves (20) linearly.

3.2 $\ell_1$-regularized optimization problems

In this section we consider the $\ell_1$-regularized optimization problem

$$\min_{x\in\mathbb{R}^n} F_2(x):=f(x)+\mu\|x\|_1. \qquad (21)$$

In order to use Proposition 2.1 for characterizing the strong quadratic growth condition for $F_2$, we need the following calculation of the graphical derivative of $\partial(\mu\|\cdot\|_1)$.

Proposition 3.1 (Subgradient graphical derivative of $\partial(\mu\|\cdot\|_1)$) Suppose that $\bar s\in\partial(\mu\|x\|_1)$. Define $I:=\{j\in\{1,\ldots,n\}\mid|\bar s_j|=\mu\}$, $J:=\{j\in I\mid x_j\ne0\}$, $K:=\{j\in I\mid x_j=0\}$, and $H(x):=\{u\in\mathbb{R}^n\mid u_j=0,\ j\notin I\ \text{and}\ u_j\bar s_j\ge0,\ j\in K\}$. Then $D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$ is nonempty if and only if $u\in H(x)$. Furthermore, we have

$$D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)=\Big\{v\in\mathbb{R}^n\ \Big|\ v_j=0,\ j\in J;\ u_jv_j=0,\ \bar s_jv_j\le0,\ j\in K\Big\}\quad\text{for all } u\in H(x). \qquad (22)$$

Proof For any $x\in\mathbb{R}^n$, note that

$$\partial(\mu\|x\|_1)=\Big\{s\in\mathbb{R}^n\ \Big|\ s_j=\mu\,\mathrm{sgn}(x_j)\ \text{if}\ x_j\ne0,\quad s_j\in[-\mu,\mu]\ \text{if}\ x_j=0\Big\}, \qquad (23)$$

where $\mathrm{sgn}:\mathbb{R}\to\{-1,1\}$ is the sign function. Take any $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$; there exist sequences $t_k\downarrow0$ and $(u^k,v^k)\to(u,v)$ such that $(x,\bar s)+t_k(u^k,v^k)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$. Let us consider the following three partitions of the index $j$:

Partition 1.1: $j\notin I$, i.e., $|\bar s_j|<\mu$. It follows from (23) that $x_j=0$. For sufficiently large $k$, we have $|(\bar s+t_kv^k)_j|<\mu$ and thus $(x+t_ku^k)_j=0$ by (23) again. Hence $u^k_j=0$, which implies that $u_j=0$ for all $j\notin I$.

Partition 1.2: $j\in J$, i.e., $|\bar s_j|=\mu$ and $x_j\ne0$. When $k$ is sufficiently large, we have $(x+t_ku^k)_j\ne0$ and derive from (23) that

$$(\bar s+t_kv^k)_j=\mu\,\mathrm{sgn}(x+t_ku^k)_j=\mu\,\mathrm{sgn}\,x_j=\bar s_j,$$

which implies that $v_j=0$ for all $j\in J$.

Partition 1.3: $j\in K$, i.e., $|\bar s_j|=\mu$ and $x_j=0$. If there is a subsequence of $(x,\bar s)_j+t_k(u^k,v^k)_j$ (without relabeling) such that $|(\bar s+t_kv^k)_j|<\mu=|\bar s_j|$, we have $\bar s_jv^k_j<0$ and $(x+t_ku^k)_j=0$ by (23). It follows that $u^k_j=0$. Letting $k\to\infty$, we have $u_j=0$ and $\bar s_jv_j\le0$. Otherwise, we find some $L>0$ such that $|(\bar s+t_kv^k)_j|=\mu=|\bar s_j|$ for all $k>L$, which yields $v^k_j=0$. Taking $k\to\infty$ gives us that $v_j=0$. Furthermore, by (23) again, we have

$$\bar s_j=(\bar s+t_kv^k)_j=\mu\,\mathrm{sgn}(x+t_ku^k)_j=\mu\,\mathrm{sgn}(u^k_j)\quad\text{or}\quad 0=(x+t_ku^k)_j=t_ku^k_j,\ \text{i.e.,}\ u^k_j=0,$$

which imply that $\bar s_ju_j\ge0$ after passing to the limit $k\to\infty$.

Combining the conclusions in the three cases above gives us that $u\in H(x)$ and also verifies the inclusion "⊂" in (22). To justify the converse inclusion "⊃", take $u\in H(x)$ and any $v\in\mathbb{R}^n$ with $v_j=0$ for $j\in J$ and $u_jv_j=0$, $\bar s_jv_j\le0$ for $j\in K$. For any $t_k\downarrow0$, we prove that $(x,\bar s)+t_k(u,v)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$ and thus verify that $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$. For any $t\in\mathbb{R}$, define the set-valued mapping

$$\mathrm{SGN}(t):=\partial|t|=\begin{cases}\mathrm{sgn}(t) & \text{if } t\ne0,\\ [-1,1] & \text{if } t=0.\end{cases}$$

Similarly to the proof of the "⊂" inclusion, we consider three partitions of $j$ as follows:

Partition 2.1: $j\notin I$, i.e., $|\bar s_j|<\mu$. Since $u\in H(x)$, we have $u_j=0$. Note also that $x_j=0$. Hence we get $(x+t_ku)_j=0$ and $(\bar s+t_kv)_j\in[-\mu,\mu]$ when $k$ is sufficiently large, which means $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

Partition 2.2: $j\in J$, i.e., $|\bar s_j|=\mu$ and $x_j\ne0$. Since $v_j=0$, we have

$$\mathrm{sgn}(\bar s+t_kv)_j=\mathrm{sgn}\,\bar s_j=\mathrm{sgn}(x_j)=\mathrm{sgn}(x+t_ku)_j$$

and $(x+t_ku)_j\ne0$ when $k$ is large. It follows that $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

Partition 2.3: $j\in K$, i.e., $|\bar s_j|=\mu$ and $x_j=0$. If $u_j=0$, we have $(x+t_ku)_j=0$ and $|(\bar s+t_kv)_j|\le|\bar s_j|\le\mu$ for sufficiently large $k$, since $\bar s_jv_j\le0$. If $u_j\ne0$, we have $v_j=0$ and

$$(\bar s+t_kv)_j=\bar s_j=\mu\,\mathrm{sgn}(u_j)=\mu\,\mathrm{sgn}(x+t_ku)_j$$

when $k$ is large, since $u_j\bar s_j\ge0$. In both cases, we have $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

From those cases, we always have $(x,\bar s)+t_k(u,v)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$ and thus $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$. □

As a consequence, we establish a characterization of the strong quadratic growth condition for $F_2$.

Theorem 3.1 (Characterization of strong quadratic growth condition for $F_2$) Let $x$ be an optimal solution to problem (21). Suppose that $\nabla f$ is differentiable at $x$. Define $\mathcal{E}:=\{j\in\{1,\ldots,n\}\mid|(\nabla f(x))_j|=\mu\}$, $K:=\{j\in\mathcal{E}\mid x_j=0\}$, $\mathcal{U}:=\{u\in\mathbb{R}^{\mathcal{E}}\mid u_j(\nabla f(x))_j\le0,\ j\in K\}$, and $H_{\mathcal{E}}(x):=[\nabla^2f(x)_{i,j}]_{i,j\in\mathcal{E}}$. Then the following statements are equivalent:

(i) $F_2$ satisfies the strong quadratic growth condition at $x$.
(ii) $H_{\mathcal{E}}(x)$ is positive definite over $\mathcal{U}$ in the sense that

$$\langle H_{\mathcal{E}}(x)u,u\rangle>0\quad\text{for all } u\in\mathcal{U}\setminus\{0\}. \qquad (24)$$

(iii) $H_{\mathcal{E}}(x)$ is nonsingular over $\mathcal{U}$ in the sense that

$$\ker H_{\mathcal{E}}(x)\cap\mathcal{U}=\{0\}. \qquad (25)$$

Moreover, if (24) is satisfied then $F_2$ satisfies the strong quadratic growth condition with any positive modulus $\kappa<\ell$, where

$$\ell:=\min\Big\{\frac{\langle H_{\mathcal{E}}(x)u,u\rangle}{\|u\|^2}\ \Big|\ u\in\mathcal{U}\Big\}>0 \qquad (26)$$

with the convention $\frac{0}{0}=\infty$.

Proof First let us verify the equivalence between (i) and (ii) by using Proposition 2.1. Indeed, for any $v\in D(\partial F_2)(x\,|\,0)(u)$ we get from the sum rule [39, Proposition 4A.2] that

$$v-\nabla^2f(x)u\in D\partial(\mu\|\cdot\|_1)(x\,|\,-\nabla f(x))(u).$$

Define $\mathcal{V}:=\{u\in\mathbb{R}^n\mid u_j=0,\ j\notin\mathcal{E},\ u_j(\nabla f(x))_j\le0,\ j\in K\}$. Thanks to Proposition 3.1, we have

$$\langle v-\nabla^2f(x)u,u\rangle=0\quad\text{for all } u\in\mathcal{V}. \qquad (27)$$

This tells us that (24) is the same as (3) when $h=F_2$. By Proposition 2.1, (i) and (ii) are equivalent. Moreover, $F_2$ satisfies the strong quadratic growth condition with any positive modulus $\kappa<\ell$.

Finally, the equivalence between (ii) and (iii) is trivial due to the fact that $f$ is convex and thus $H_{\mathcal{E}}(x)$ is positive semi-definite. □

Corollary 3.2 (Linear convergence of FBS method for $\ell_1$-regularized problems) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method for problem (21). Suppose that the solution set $S$ is not empty, $(x^k)_{k\in\mathbb{N}}$ converges to some $x\in S$, and $f$ is $C^2$ around $x$. If condition (24) holds, then $(x^k)_{k\in\mathbb{N}}$ and $(F_2(x^k))_{k\in\mathbb{N}}$ are Q-linearly convergent to $x$ and $F_2(x)$, respectively, with rates determined in Corollary 2.1, where $\kappa$ is any positive number smaller than $\ell$ in (26).

Proof Since $f$ is $C^2$ around $x$, $\nabla f$ is locally Lipschitz continuous around $x$. The result follows from Corollary 2.1 and Theorem 3.1. □

Remark 3.1 It is worth noting that condition (25) is strictly weaker than the assumption used in [3] that $H_{\mathcal{E}}$ has full rank to obtain the linear convergence of FBS for (21). Indeed, consider the case $n=2$, $\mu=1$, and $f(x_1,x_2)=\frac{1}{2}(x_1+x_2)^2+x_1+x_2$. Note that $x=(0,0)$ is an optimal solution to problem (21). Moreover, direct computation gives us that $\nabla f(x)=(1,1)$, $\mathcal{E}=\{1,2\}$, and $H_{\mathcal{E}}(x)=\begin{pmatrix}1&1\\1&1\end{pmatrix}$. It is clear that $H_{\mathcal{E}}(x)$ does not have full rank, but condition (24) and its equivalent form (25) hold.
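A quick numerical check of Remark 3.1 (our own illustration in Python) confirms this: since $\nabla f(0,0)=(1,1)$, the cone $\mathcal{U}$ consists of vectors with nonpositive entries, and $\langle H_{\mathcal{E}}u,u\rangle=(u_1+u_2)^2>0$ for every nonzero such $u$, even though $H_{\mathcal{E}}$ is singular.

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, 1.0]])                 # H_E(x) from Remark 3.1
print(np.linalg.matrix_rank(H))            # 1: H_E(x) does not have full rank

# sample the cone U = {u : u_1 <= 0, u_2 <= 0}, keeping only nonzero points
rng = np.random.default_rng(0)
U = -rng.random((1000, 2))
U = U[np.linalg.norm(U, axis=1) > 1e-8]
vals = np.einsum('ij,jk,ik->i', U, H, U)   # <H u, u> for each sampled u
print(vals.min() > 0)                      # True: condition (24) holds on U \ {0}
```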

3.3 Global Q-linear convergence of ISTA on Lasso problem

In this section we study the linear convergence of ISTA for the Lasso problem

$$\min_{x\in\mathbb{R}^n} F_3(x):=\frac{1}{2}\|Ax-b\|^2+\mu\|x\|_1, \qquad (28)$$

where $A$ is an $m\times n$ real matrix and $b$ is a vector in $\mathbb{R}^m$.
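For concreteness, a minimal ISTA implementation for (28) is sketched below in Python. It is our own illustrative code: the constant stepsize $1/\lambda_{\max}(A^TA)$ replaces the line search used in the analysis, and the soft-thresholding operator plays the role of the proximal mapping of $\mu\|\cdot\|_1$.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal mapping of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, mu, x0=None, max_iter=5000):
    """ISTA for the Lasso problem (28): min 0.5*||Ax - b||^2 + mu*||x||_1."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / lambda_max(A^T A)
    for _ in range(max_iter):
        x = soft_threshold(x - alpha * A.T @ (A @ x - b), alpha * mu)
    return x
```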

The following lemma, taken from [26, Lemma 10], plays an important role in our proof.

Lemma 3.2 (Global error bound) Fix any $R>\frac{\|b\|^2}{2\mu}$ and let $\bar x$ be an optimal solution to problem (28). Then we have

$$F_3(x)-F_3(\bar x)\ge\frac{\gamma_R}{2}\,d^2(x;S)\quad\text{for all } \|x\|_1\le R,$$

where

$$\gamma_R^{-1}:=\nu^2\Big(1+\frac{\sqrt{5}}{2}\,\mu R+(R\|A\|+\|b\|)(4R\|A\|+\|b\|)\Big) \qquad (29)$$

while $\nu$ is the Hoffman constant defined in [26, Definition 1], depending only on the initial data $A,b,\mu$.

Global R-linear convergence of $(x^k)_{k\in\mathbb{N}}$ from ISTA and Q-linear convergence of $(F_3(x^k))_{k\in\mathbb{N}}$ for solving the Lasso problem were obtained in [21, Theorem 4.2 and Remark 4.3] and also [22, Theorem 4.8]. We add another feature: the iterative sequence $(x^k)_{k\in\mathbb{N}}$ is also globally Q-linearly convergent.

Theorem 3.2 (Global Q-linear convergence of ISTA) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by ISTA for problem (28) that converges to an optimal solution $x\in S$. Then $(x^k)_{k\in\mathbb{N}}$ and $(F_3(x^k))_{k\in\mathbb{N}}$ are globally Q-linearly convergent to $x$ and $F_3(x)$, respectively:

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\frac{\alpha\gamma_R}{4}}}\,\|x^k-x\| \qquad (30)$$

$$|F_3(x^{k+1})-F_3(x)|\le\frac{\sqrt{1+\alpha\gamma_R}+1}{2\sqrt{1+\alpha\gamma_R}}\,|F_3(x^k)-F_3(x)| \qquad (31)$$

for all $k\in\mathbb{N}$, where $R$ is any number bigger than $\|x^0\|+\frac{\|b\|^2}{\mu}$, $\gamma_R$ is given as in (29), and $\alpha:=\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$.

Proof Note that the Lasso problem always has optimal solutions. With $x\in S$, we have

$$F_3(0)=\frac{1}{2}\|b\|^2\ge F_3(x)\ge\mu\|x\|_1,$$

which implies that $\|x\|\le\|x\|_1\le\frac{1}{2\mu}\|b\|^2$. It follows from [1, Corollary 3.1] that

$$\|x^k\|\le\|x^k-x\|+\|x\|\le\|x^0-x\|+\|x\|\le\|x^0\|+2\|x\|\le\|x^0\|+\frac{\|b\|^2}{\mu}<R\quad\text{for all } k\in\mathbb{N}.$$

Thanks to Lemma 3.2, [1, Corollary 3.1], and [1, Proposition 3.2], we have

$$\|x^k-x\|^2-\|x^{k+1}-x\|^2\ge\alpha\gamma_R\,d^2(x^{k+1};S) \qquad (32)$$

with $\alpha=\frac{1}{2}\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$ and the note that $\lambda_{\max}(A^TA)$ is the global Lipschitz constant of the gradient of $\frac{1}{2}\|Ax-b\|^2$. The proofs of (30) and (31) are quite similar to those of (7) and (8) in Theorem 2.2; see [1, Theorem 4.1] for further details. □

Observe further that the linear rates in Theorem 3.2 depend on the initial point $x^0$; see also [22, Theorem 4.8]. Next we show that the local linear rates around optimal solutions are uniform and independent of the choice of $x^0$.

Corollary 3.3 (Local Q-linear convergence of ISTA with uniform rate) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by ISTA for problem (28) that converges to an optimal solution $x\in S$. Then (30) and (31) are satisfied when $k$ is sufficiently large, where $\alpha=\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$ and $R$ is any number bigger than $\frac{\|b\|^2}{2\mu}$.

Proof Note from the proof of Theorem 3.2 that $\|x\|\le\frac{\|b\|^2}{2\mu}<R$. Since $x^k$ converges to $x\in S$, there exists $K\in\mathbb{N}$ such that $\|x^k\|<R$ for any $k>K$. By using Lemma 3.2 and also [1, Corollary 3.1], we obtain (32) for all $k>K$. Following the same arguments as in Theorem 3.2 justifies the corollary. □

4 Uniqueness of the optimal solution to $\ell_1$-regularized least squares optimization problems

As discussed in Section 1, the linear convergence of ISTA for Lasso was sometimes obtained by imposing the additional assumption that Lasso has a unique optimal solution $x$; see, e.g., [34]. Since $F_3$ satisfies the quadratic growth condition at $x$ (Lemma 3.2), the uniqueness of $x$ is equivalent to the strong quadratic growth condition of $F_3$ at $x$. This observation together with Theorem 3.1 allows us to characterize the uniqueness of the optimal solution to Lasso in the next result. A different characterization of this property can be found in [43, Theorem 2.1]. Suppose that $x$ is an optimal solution, which means $-A^T(Ax-b)\in\mu\,\partial\|x\|_1$. In the spirit of Proposition 3.1 with $f(x)=\frac{1}{2}\|Ax-b\|^2$, define

$$\mathcal{E}:=\big\{j\in\{1,\ldots,n\}\ \big|\ |(A^T(Ax-b))_j|=\mu\big\},\quad K:=\{j\in\mathcal{E}\mid x_j=0\},\quad J:=\mathcal{E}\setminus K. \qquad (33)$$

Since $-A^T(Ax-b)\in\partial(\mu\|x\|_1)$, if $x_j\ne0$ then $(A^T(Ax-b))_j=-\mu\,\mathrm{sign}(x_j)$. This tells us that $J=\{j\in\{1,\ldots,n\}\mid x_j\ne0\}=:\mathrm{supp}(x)$. Furthermore, given an index set $I\subset\{1,\ldots,n\}$, we denote by $A_I$ the submatrix of $A$ formed by its columns $A_i$, $i\in I$, and by $x_I$ the subvector of $x\in\mathbb{R}^n$ formed by $x_i$, $i\in I$. For any $x\in\mathbb{R}^n$, we also define $\mathrm{sign}(x):=(\mathrm{sign}(x_1),\ldots,\mathrm{sign}(x_n))^T$ and $\mathrm{Diag}(x)$ as the square diagonal matrix with main entries $x_1,x_2,\ldots,x_n$.

Theorem 4.1 (Uniqueness of optimal solution to Lasso problem) Let $x$ be an optimal solution to problem (28). The following statements are equivalent:

(i) $x$ is the unique optimal solution to Lasso (28).
(ii) The system $A_Jx_J-A_KQ_Kx_K=0$ with $x_K\in\mathbb{R}^K_+$ has a unique solution $(x_J,x_K)=(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$, where $Q_K:=\mathrm{Diag}\big(\mathrm{sign}(A_K^T(A_Jx_J-b))\big)$.
(iii) The submatrix $A_J$ has full column rank and the columns of $A_JA_J^{\dagger}A_KQ_K-A_KQ_K$ are positively linearly independent in the sense that

$$\mathrm{Ker}\,(A_JA_J^{\dagger}A_KQ_K-A_KQ_K)\cap\mathbb{R}^K_+=\{0_K\}, \qquad (34)$$

where $A_J^{\dagger}:=(A_J^TA_J)^{-1}A_J^T$ is the Moore–Penrose pseudoinverse of $A_J$.
(iv) The submatrix $A_J$ has full column rank and there exists a Slater point $y\in\mathbb{R}^m$ such that

$$(Q_KA_K^TA_JA_J^{\dagger}-Q_KA_K^T)y<0. \qquad (35)$$

Proof Since $F_3$ satisfies the quadratic growth condition at $x$ as in Lemma 3.2, (i) means that $F_3$ satisfies the strong quadratic growth condition at $x$. Thus, by Theorem 3.1, (i) is equivalent to

$$\langle H_{\mathcal{E}}(x)u,u\rangle>0\quad\text{for all } u\in\mathcal{U}\setminus\{0\} \qquad (36)$$

with $f(x)=\frac{1}{2}\|Ax-b\|^2$ and $\mathcal{U}=\{u\in\mathbb{R}^{\mathcal{E}}\mid u_j(\nabla f(x))_j\le0,\ j\in K\}$. Note that $H_{\mathcal{E}}=[\nabla^2f(x)_{i,j}]_{i,j\in\mathcal{E}}=[(A^TA)_{i,j}]_{i,j\in\mathcal{E}}=A_{\mathcal{E}}^TA_{\mathcal{E}}$. Hence (36) means that the system

$$0=A_{\mathcal{E}}u=A_Ju_J+A_Ku_K\quad\text{and}\quad u_K\in\mathcal{U}_K \qquad (37)$$

has a unique solution $u=(u_J,u_K)=(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$, where $\mathcal{U}_K$ is defined by $\mathcal{U}_K:=\{u\in\mathbb{R}^K\mid u_k(A^T(Ax-b))_k\le0,\ k\in K\}$. As observed after (33), $J=\mathrm{supp}(x)$, so for each $k\in K$ we have

$$(A^T(Ax-b))_k=(A^T(A_Jx_J-b))_k=(A_K^T(A_Jx_J-b))_k.$$

It follows that $\mathcal{U}_K=-Q_K(\mathbb{R}^K_+)$ and $Q_K$ is a nonsingular diagonal square matrix (each diagonal entry is either $1$ or $-1$). Uniqueness for system (37) is equivalent to (ii). This verifies the equivalence between (i) and (ii).

Let us justify the equivalence between (ii) and (iii). To proceed, suppose that (ii) is valid, i.e., the system

$$A_Jx_J-A_KQ_Kx_K=0\quad\text{with}\quad (x_J,x_K)\in\mathbb{R}^J\times\mathbb{R}^K_+ \qquad (38)$$

has a unique solution $(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$. Choosing $x_K=0_K$, the latter tells us that the equation $A_Jx_J=0$ has a unique solution $x_J=0$, i.e., $A_J$ has full column rank. Thus $A_J^TA_J$ is nonsingular. Furthermore, it follows from (38) that $A_J^TA_Jx_J=A_J^TA_KQ_Kx_K$, which means

$$x_J=(A_J^TA_J)^{-1}A_J^TA_KQ_Kx_K=A_J^{\dagger}A_KQ_Kx_K. \qquad (39)$$

This together with (38) tells us that the system

$$A_JA_J^{\dagger}A_KQ_Kx_K-A_KQ_Kx_K=(A_JA_J^{\dagger}A_KQ_K-A_KQ_K)x_K=0,\quad x_K\in\mathbb{R}^K_+ \qquad (40)$$

has a unique solution $x_K=0_K\in\mathbb{R}^K$, which clearly verifies (34) and thus (iii).

To justify the converse implication, suppose that (iii) is valid. Consider the system (38) in (ii); since $A_J$ has full column rank, we also have (39). Similarly to the above justification, $x_K$ satisfies equation (40). Thanks to (34) in (iii), we get from (40) that $x_K=0_K$ and thus $x_J=0_J$ by (39). This verifies that the system (38) in (ii) has a unique solution $(x_J,x_K)=(0_J,0_K)$.

Finally, the equivalence between (iii) and (iv) follows from the well-known Gordan lemma and the fact that the matrix $A_JA_J^{\dagger}$ is symmetric. □
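To illustrate how Theorem 4.1 can be used in practice, the following Python sketch (our own, assuming scipy is available) tests condition (iii) numerically at a candidate solution $x$: it builds $J$, $K$, $Q_K$ from (33), checks that $A_J$ has full column rank, and then checks (34) by solving a small linear program that searches for a nonzero $x_K\ge0$ in the kernel of $A_JA_J^{\dagger}A_KQ_K-A_KQ_K$. The floating-point tolerance `tol` is a practical assumption, not part of the theorem.

```python
import numpy as np
from scipy.optimize import linprog

def lasso_solution_is_unique(A, b, x, mu, tol=1e-8):
    """Numerically test condition (iii) of Theorem 4.1 at an optimal solution x."""
    r = A.T @ (A @ x - b)
    E = np.where(np.abs(np.abs(r) - mu) <= tol)[0]        # E from (33)
    J = E[np.abs(x[E]) > tol]                              # support of x
    K = E[np.abs(x[E]) <= tol]
    AJ, AK = A[:, J], A[:, K]
    if np.linalg.matrix_rank(AJ) < len(J):                 # A_J must have full column rank
        return False
    if len(K) == 0:
        return True                                        # (34) holds vacuously
    QK = np.diag(np.sign(AK.T @ (AJ @ x[J] - b)))
    AJ_pinv = np.linalg.pinv(AJ)                           # Moore-Penrose pseudoinverse A_J^dagger
    M = AJ @ AJ_pinv @ AK @ QK - AK @ QK
    # (34) fails iff there is a nonzero x_K >= 0 with M x_K = 0;
    # search for one by maximizing sum(x_K) over {M x_K = 0, 0 <= x_K <= 1}.
    res = linprog(c=-np.ones(len(K)), A_eq=M, b_eq=np.zeros(M.shape[0]),
                  bounds=[(0.0, 1.0)] * len(K))
    return bool(res.success) and -res.fun <= tol
```

Condition (iv) could be tested analogously via a Slater-point linear program; we only sketch (iii) here.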

Next let us discuss some known conditions related to the uniqueness of the optimal solution to Lasso. In [40], Fuchs introduced a sufficient condition for the above property:

$$A_J^T(A_Jx_J-b)=-\mu\,\mathrm{sign}(x_J), \qquad (41)$$
$$\|A_{J^c}^T(A_Jx_J-b)\|_\infty<\mu, \qquad (42)$$
$$A_J\ \text{has full column rank}. \qquad (43)$$

The first equality (41) indeed tells us that $x$ is an optimal solution to the Lasso problem. Inequality (42) means that $\mathcal{E}=J$, i.e., $K=\emptyset$ in Theorem 4.1. Condition (43) is also present in our characterizations. Hence Fuchs' condition implies (iii) in Theorem 4.1 and is clearly not a necessary condition for the uniqueness of the optimal solution to the Lasso problem, since in many situations the set $K$ is not empty.
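For comparison, Fuchs' sufficient condition (41)-(43) is straightforward to test directly; the helper below is again our own illustrative Python sketch with a numerical tolerance as an assumption.

```python
import numpy as np

def fuchs_condition_holds(A, b, x, mu, tol=1e-8):
    """Check the sufficient condition (41)-(43) of Fuchs at a candidate solution x."""
    J = np.where(np.abs(x) > tol)[0]
    Jc = np.setdiff1d(np.arange(A.shape[1]), J)
    AJ = A[:, J]
    res = AJ @ x[J] - b
    eq_41 = np.allclose(AJ.T @ res, -mu * np.sign(x[J]), atol=tol)        # (41)
    ineq_42 = np.max(np.abs(A[:, Jc].T @ res), initial=0.0) < mu - tol    # (42), sup-norm
    rank_43 = np.linalg.matrix_rank(AJ) == len(J)                         # (43)
    return eq_41 and ineq_42 and rank_43
```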

Furthermore, in the recent work [41], Tibshirani shows that the optimal solution $x$ to problem (28) is unique when the matrix $A_{\mathcal{E}}$ has full column rank. This condition is sufficient for our (ii) in Theorem 4.1. Indeed, if $(x_J,x_K)$ satisfies system (38) in (ii), we have $A_{\mathcal{E}}(x_J,-Q_Kx_K)^T=0$, which implies that $x_J=0$ and $Q_Kx_K=0$ when $\ker A_{\mathcal{E}}=\{0\}$. Since $Q_K$ is invertible, the latter tells us that $x_J=0$ and $x_K=0$, which clearly verifies (ii). Tibshirani's condition is also necessary for the uniqueness of the optimal solution to the Lasso problem for almost all $b$ in (28), but it is not for every $b$; a concrete example can be found in [43].

In the recent works [43, 44], the following useful characterization of the unique solution to Lasso has been established under mild assumptions:

$$\text{There exists } y\in\mathbb{R}^m \text{ satisfying } A_J^Ty=\mathrm{sign}(x_J) \text{ and } \|A_K^Ty\|_\infty<1, \qquad (44)$$
$$A_J\ \text{has full column rank}.$$

It is still open to us to connect this condition directly to the ones in Theorem 4.1, although they must be logically equivalent under the assumptions required in [43, 44]. However, our approach via second-order variational analysis is completely different and also provides several new characterizations of the uniqueness of the optimal solution to Lasso. It is also worth mentioning here that the standing assumption in [43] that $A$ has full row rank is relaxed in our study.

5 Conclusion

In this paper we analyzed quadratic growth conditions for some structured optimization problems, which allows us to show the Q-linear convergence of FBS with no assumption on the initial data or with mild assumptions via second-order conditions. In future research we intend to study the quadratic growth condition for the well-known nuclear norm regularized least squares optimization problem

$$\min_{X\in\mathbb{R}^{p\times q}}\ h(X):=\|\mathcal{A}X-B\|^2+\mu\|X\|_*,$$

where $\mathcal{A}:\mathbb{R}^{p\times q}\to\mathbb{R}^{m\times n}\ni B$ is a linear operator and $\|X\|_*$ is the trace norm (also known as the nuclear norm) of $X$. The quadratic growth condition for this problem can be obtained under the non-degeneracy condition [24], which could be very restrictive. We plan to use second-order information to relax this assumption and extend the approach in Section 4 to investigate the uniqueness of the optimal solution, with further applications to matrix completion.

Acknowledgements The authors are indebted to both anonymous referees for their careful readings and thoughtful suggestions that allowed us to improve the original presentation significantly.

References

1. Bello-Cruz, J.Y., Li, G., Nghia, T.T.A.: On the Q-linear convergence of forward-backward splitting method. Part I: Convergence analysis. (2019)
2. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57, 1413–1457 (2004)
3. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
4. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. 58, 267–288 (1996)
5. Tropp, J.: Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52, 1030–1051 (2006)
6. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
7. Bredies, K., Lorenz, D.A.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14, 813–837 (2008)
8. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Convex Optimization in Signal Processing and Communications (D. Palomar and Y. Eldar, eds.), 42–88. Cambridge University Press, Cambridge (2010)
9. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications 49, 185–212. Springer, New York (2011)
10. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
11. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 127–239 (2014)
12. Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38, 431–446 (2000)
13. Bello Cruz, J.Y., Nghia, T.T.A.: On the convergence of the proximal forward-backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)
14. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Splitting Methods in Communications, Image Science, and Engineering. Scientific Computation. Springer, Cham (2016)
15. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42, 330–348 (2017)
16. Salzo, S.: The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM J. Optim. 27, 2153–2181 (2017)
17. Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist. 19, 2032–2066 (1991)
18. Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography. J. Amer. Statist. Assoc. 80, 8–37 (1985)
19. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43, 693–1050 (2018)
20. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2000)
21. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward-backward algorithm: beyond the worst case with the help of geometry. arXiv:1703.09477 (2017)
22. Garrigos, G., Rosasco, L., Villa, S.: Thresholding gradient methods in Hilbert spaces: support identification and linear convergence. arXiv:1712.00357 (2017)
23. Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018). doi.org/10.1007/s10107-018-1232-1
24. Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization. Math. Program. 165, 689–728 (2017)
25. Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)
26. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
27. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
