
On the Q-linear convergence of forward-backward splitting method. Part II: Quadratic growth conditions and uniqueness of optimal solution to Lasso

J.Y. Bello-Cruz · G. Li · T.T.A. Nghia


Abstract In the previous paper [1], we showed that the quadratic growth condition is the key tool in studying Q-linear convergence of the forward-backward splitting method. In this paper, the quadratic growth condition is analyzed via second-order variational analysis for several structured optimization problems that arise in machine learning and signal processing, including the Poisson linear inverse problem as well as $\ell_1$-regularized optimization problems. Moreover, via this approach, we obtain several full characterizations of the uniqueness of the optimal solution to the Lasso problem, which cover some recent results in this direction.

This work was partially supported by the National Science Foundation (NSF) Grants DMS-1816386 and DMS-1816449.

J.Y. Bello-Cruz

Department of Mathematical Sciences, Northern Illinois University, DeKalb, IL 60115, USA E-mail: yunierbello@niu.edu

G. Li

Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia E-mail: g.li@unsw.edu.au

T.T.A. Nghia

Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA E-mail: nttran@oakland.edu


Keywords: Nonsmooth and convex optimization problems; Forward-backward splitting method; Linear convergence; Uniqueness; Lasso; Quadratic growth condition; Variational analysis.

Mathematics Subject Classification (2010): 65K05; 90C25; 90C30.

1 Introduction

In this paper, we mainly consider an optimization problem of minimizing the sum of two convex functions, one of which is differentiable on its domain and the other is nondifferentiable. Many problems in this format appear in different fields of science and engineering, including machine learning, compressed sensing, and image processing.

Among the many methods for solving the aforementioned optimization problems, the forward-backward splitting method (FBS for short, also known as the proximal gradient method) [2, 3, 6–11] is very popular due to its simplicity and efficiency. It is well known that this method is globally convergent to an optimal solution with complexity $O(k^{-1})$ on the iterative cost values under the assumption that the gradient of the differentiable function is globally Lipschitz continuous. In our previous part [1], we also obtained the global convergence of FBS without the Lipschitz continuity of the gradient and under mild assumptions on the initial data that are weaker than many known ones [13, 15, 16]. The complexity of the cost values is improved to $o(k^{-1})$ when the aforementioned gradient is only locally Lipschitz continuous; see also [13, 14, 16] for further developments in this direction. Furthermore, when the cost function satisfies the quadratic growth condition, we showed that both the FBS iterative sequence and the cost sequence are Q-linearly convergent to an optimal solution and the optimal value, respectively. Our results extend several works in this direction [19, 21, 23, 24, 26, 27] without borrowing conventional techniques of error bounds [25] or the Kurdyka–Łojasiewicz inequality.

One of the main goals of this second part is to analyze the quadratic growth condition in several structured optimization problems. This allows us to better understand the performance of the FBS method when solving specific optimization problems. In particular, we show that the quadratic growth condition is valid for the standard Poisson regularized inverse problem with the Kullback–Leibler divergence [17, 18], whose gradient is not Lipschitz continuous. Using FBS to solve Poisson regularized inverse problems was raised in [15]. Recently, Salzo [16] proved that the FBS method could solve it with possible rate $o(k^{-1})$.

We advance this direction by indicating that the convergence rate is indeed Q-linear.

Linear convergence of the FBS iterative sequence for solving some structured optimization problems was also studied in [3, 7, 32–34] when the nonsmooth function is partly smooth relative to a manifold, by using the idea of finite support identification. The latter notion, introduced by Lewis [35], allows Liang, Fadili, and Peyré [32, 33] to cover many important problems such as the total variation semi-norm, the $\ell_1$-norm, the $\ell_\infty$-norm, and the nuclear norm problems. In their paper, a second-order condition was introduced to guarantee the local Q-linear convergence of the FBS sequence under the non-degeneracy assumption [35].

When considering the $\ell_1$-regularized problem, we are able to avoid the non-degeneracy assumption. This allows us to improve the well-known work of Hale, Yin, and Zhang [3] in two aspects: (a) we completely drop the aforementioned non-degeneracy assumption; (b) our second-order condition is strictly weaker than the one in [3, Theorem 4.10]. The broader view is that, when considering the particular optimization problems listed in the spirit of [32, 33], the non-degeneracy assumption may not be necessary. Furthermore, we revisit the iterative shrinkage thresholding algorithm (ISTA) [2, 8], which is exactly FBS for solving the Lasso problem [4]. It is well known that the complexity of this algorithm is $O(k^{-1})$; however, recent works [32, 34] indicate the local linear convergence of ISTA. A stronger conclusion in this direction was obtained lately by Bolte, Nguyen, Peypouquet, and Suter [21, 26]: the iterative sequence of ISTA is R-linearly convergent and its corresponding cost sequence is globally Q-linearly convergent, but the rate may depend on the initial point. Inspired by these achievements, we provide two new pieces of information:

(c) the iterative sequence from ISTA is indeed globally Q-linearly convergent; (d) it is eventually Q-linearly convergent to an optimal solution with a uniform rate that does not depend on the initial point.

Finally, we study the uniqueness of the optimal solution to the Lasso problem as one of the main applications of our approach via second-order variational analysis. This property of the optimal solution has been investigated vastly in the literature, with immediate applications to recovering sparse signals in compressed sensing; see, e.g., [5, 40–44] and the references therein. It is also used in [7, 34] to establish the linear convergence of ISTA. To the best of our knowledge, Fuchs [40] initiated this direction by introducing a simple sufficient condition for this property, which has been extended in the other cited papers. Then Tibshirani [41] showed that a sufficient condition closely related to Fuchs' is also necessary almost everywhere. The first full characterization of this property was obtained recently in [43, 44] by using strong duality in linear programming. This characterization, which is based on the existence of a vector satisfying a system of linear equations and inequalities, allows [43, 44] to recover the aforementioned sufficient conditions and to provide some situations in which these conditions turn necessary. As a direct application of our new approach via second-order variational analysis, we also derive several new full characterizations. Our conditions, stated in terms of positive linear independence and of Slater type, are well recognized to be verifiable.

The outline of the paper is as follows. Section 2 briefly presents a second-order characterization of the quadratic growth condition in terms of the subgradient graphical derivative [45] and recalls some convergence results from our Part I [1]. Section 3 is devoted to the study of the quadratic growth condition in some structured optimization problems involving Poisson regularized inverse, partly smooth, $\ell_1$-regularized, and $\ell_1$-regularized least squares optimization problems. In Section 4, we obtain several new full characterizations of the uniqueness of the optimal solution to the Lasso problem. The final Section 5 gives some conclusions and potential future work in this direction.

2 Preliminary results on metric subregularity of the subdifferential and quadratic growth condition

Throughout the paper, $\mathbb{R}^n$ is the usual Euclidean space of dimension $n$, where $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the corresponding Euclidean norm and inner product in $\mathbb{R}^n$. We use $\Gamma_0(\mathbb{R}^n)$ to denote the set of proper, lower semicontinuous, and convex functions on $\mathbb{R}^n$. For $h\in\Gamma_0(\mathbb{R}^n)$, we write $\mathrm{dom}\,h:=\{x\in\mathbb{R}^n\mid h(x)<+\infty\}$. The subdifferential of $h$ at $\bar x\in\mathrm{dom}\,h$ is defined by

$$\partial h(\bar x):=\{v\in\mathbb{R}^n\mid\langle v,x-\bar x\rangle\le h(x)-h(\bar x),\ x\in\mathbb{R}^n\}. \qquad (1)$$

We say $h$ satisfies the quadratic growth condition at $\bar x$ with modulus $\kappa>0$ if there exists $\varepsilon>0$ such that

$$h(x)\ge h(\bar x)+\frac{\kappa}{2}\,d^2\big(x;(\partial h)^{-1}(0)\big)\quad\text{for all } x\in B_\varepsilon(\bar x). \qquad (2)$$

Moreover, if in addition $(\partial h)^{-1}(0)=\{\bar x\}$, then $h$ is said to satisfy the strong quadratic growth condition at $\bar x$ with modulus $\kappa$.

Some relationships between the quadratic growth condition and the so-called metric subregularity of the subdifferential can be found in [26, 29–31, 38], even for nonconvex functions. The quadratic growth condition (2) is also called the quadratic functional growth property in [23] when $h$ is continuously differentiable over a closed convex set. In [21, 22], $h$ is said to be 2-conditioned on $B_\varepsilon(\bar x)$ if it satisfies the quadratic growth condition (2).

The following proposition, a slight improvement of [38, Corollary 3.7], provides a useful characterization of the strong quadratic growth condition via the subgradient graphical derivative [45, Chapter 13].

Proposition 2.1 (Characterization of strong quadratic growth condition) Let $h\in\Gamma_0(\mathbb{R}^n)$ and let $\bar x$ be an optimal solution, i.e., $0\in\partial h(\bar x)$. The following are equivalent:

(i) $h$ satisfies the strong quadratic growth condition at $\bar x$.
(ii) $D(\partial h)(\bar x\,|\,0)$ is positive-definite in the sense that

$$\langle v,u\rangle>0\quad\text{for all } v\in D(\partial h)(\bar x\,|\,0)(u),\ u\in\mathbb{R}^n,\ u\ne0, \qquad (3)$$

where $D(\partial h)(\bar x\,|\,0):\mathbb{R}^n\rightrightarrows\mathbb{R}^n$ is the subgradient graphical derivative defined by

$$D(\partial h)(\bar x\,|\,0)(u):=\{v\in\mathbb{R}^n\mid\exists\,(u_n,v_n)\to(u,v),\ t_n\downarrow0:\ t_nv_n\in\partial h(\bar x+t_nu_n)\}.$$

Moreover, if (ii) is satisfied then

$$\ell:=\inf\Big\{\frac{\langle v,u\rangle}{\|u\|^2}\ \Big|\ v\in D(\partial h)(\bar x\,|\,0)(u),\ u\in\mathbb{R}^n\Big\}>0 \qquad (4)$$

with the convention $\frac{0}{0}=\infty$, and $h$ satisfies the strong quadratic growth condition at $\bar x$ with any modulus $\kappa<\ell$.

Proof The implication [(i)$\Rightarrow$(ii)] follows from [38, Theorem 3.6 and Corollary 3.7]. If (ii) is satisfied, we obtain from (3) that $\|v\|\ge\ell\|u\|$. Combining [39, Theorem 4C.1] and [31, Corollary 3.3] tells us that $h$ satisfies the strong quadratic growth condition at $\bar x$ with any modulus $\kappa<\ell$. The proof is complete. □

Next let us recall some main results from our Part I [1] regarding the convergence of the forward-backward splitting method (FBS) for solving the following optimization problem:

$$\min_{x\in\mathbb{R}^n} F(x):=f(x)+g(x), \qquad (5)$$

where $f,g:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ are proper, lower semicontinuous, and convex functions. The standing assumptions on the initial data for (5) used throughout the paper are:

A1. $f,g\in\Gamma_0(\mathbb{R}^n)$ and $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g\ne\emptyset$.

A2. $f$ is continuously differentiable at any point in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.

A3. For any $x\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$, the sublevel set $\{F\le F(x)\}$ is contained in $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.

The forward-backward splitting method for solving (5) is described by

$$x^{k+1}=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big):=(\mathrm{Id}+\alpha_k\partial g)^{-1}\big(x^k-\alpha_k\nabla f(x^k)\big) \qquad (6)$$

with stepsize $\alpha_k>0$ determined from Beck–Teboulle's line search as follows:

Linesearch BT (Beck–Teboulle's line search). Given $\sigma>0$ and $\theta\in(0,1)$.
Input. Set $\alpha_k=\alpha_{k-1}$ and $J(x^k,\alpha_k)=\mathrm{prox}_{\alpha_k g}\big(x^k-\alpha_k\nabla f(x^k)\big)$.
While $f(J(x^k,\alpha_k))>f(x^k)+\langle\nabla f(x^k),J(x^k,\alpha_k)-x^k\rangle+\frac{1}{2\alpha_k}\|x^k-J(x^k,\alpha_k)\|^2$, do $\alpha_k=\theta\alpha_k$.
End While
Output. $\alpha_k$,

with $\alpha_{-1}:=\sigma$ and $x^0\in\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$.
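As an illustration only, the scheme (6) with the line search above can be sketched in a few lines of Python. This is not the authors' code: the names (`fbs`, `grad_f`, `prox_g`) and the simple relative-change stopping test are our own assumptions, and the routine expects the proximal mapping of $g$ to be available in closed form.

```python
import numpy as np

def fbs(f, grad_f, prox_g, x0, sigma=1.0, theta=0.5, max_iter=500, tol=1e-10):
    """Forward-backward splitting with the Beck-Teboulle backtracking line search.

    f      : callable, smooth part f(x)
    grad_f : callable, gradient of f
    prox_g : callable, (z, alpha) -> prox_{alpha g}(z)
    x0     : starting point in int(dom f) and dom g
    sigma  : initial trial stepsize alpha_{-1} = sigma
    theta  : backtracking factor in (0, 1)
    """
    x, alpha = np.asarray(x0, dtype=float), sigma
    for _ in range(max_iter):
        g = grad_f(x)
        while True:
            J = prox_g(x - alpha * g, alpha)      # forward-backward step J(x, alpha)
            d = J - x
            # Beck-Teboulle test on the smooth part; shrink alpha until it passes
            if f(J) <= f(x) + g @ d + (1.0 / (2.0 * alpha)) * (d @ d):
                break
            alpha *= theta
        if np.linalg.norm(J - x) <= tol * max(1.0, np.linalg.norm(x)):
            return J
        x = J                                      # alpha is reused as the next trial stepsize
    return x
```

For instance, with `prox_g = lambda z, a: np.sign(z) * np.maximum(np.abs(z) - a * mu, 0.0)` and $f(x)=\frac12\|Ax-b\|^2$, this sketch reduces to ISTA for the Lasso problem studied in Section 3.3.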

In [1, Proposition 3.1 and Corollary 3.1], we showed that the line search above terminates after finitely many steps, so the FBS sequence $(x^k)_{k\in\mathbb{N}}\subset\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ is well defined, and thus $f$ is differentiable at $x^k$ by assumption A2. The global convergence result [1, Theorem 3.1] is recalled here.

Theorem 2.1 (Global convergence of FBS method) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that the solution set is not empty. Then $(x^k)_{k\in\mathbb{N}}$ converges to an optimal solution. Moreover, $(F(x^k))_{k\in\mathbb{N}}$ also converges to the optimal value.

When the cost function $F$ satisfies the quadratic growth condition and $\nabla f$ is locally Lipschitz continuous, [1, Theorem 4.1] shows that both the iterative and cost sequences of FBS are Q-linearly convergent.

Theorem 2.2 (Q-linear convergence under quadratic growth condition) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that $S\ne\emptyset$ and let $x\in S$ be the limit point of $(x^k)_{k\in\mathbb{N}}$ as in Theorem 2.1. Suppose further that $\nabla f$ is locally Lipschitz continuous around $x$ with constant $L>0$. If $F$ satisfies the quadratic growth condition at $x$ with modulus $\kappa>0$, then there exists $K\in\mathbb{N}$ such that

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\frac{\alpha\kappa}{4}}}\,\|x^k-x\| \qquad (7)$$

$$|F(x^{k+1})-F(x)|\le\frac{\sqrt{1+\alpha\kappa}+1}{2\sqrt{1+\alpha\kappa}}\,|F(x^k)-F(x)| \qquad (8)$$

for any $k>K$, where $\alpha:=\min\big\{\alpha_K,\frac{\theta}{L}\big\}$.

If, in addition, $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with constant $L>0$, then $\alpha$ could be chosen as $\min\big\{\sigma,\frac{\theta}{L}\big\}$.

Under the strong quadratic growth condition, a sharper rate is obtained in [1, Corollary 4.1].

Corollary 2.1 (Sharper Q-linear convergence rate under strong quadratic growth condition) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method. Suppose that the solution set $S$ is not empty and let $x\in S$ be the limit point of $(x^k)_{k\in\mathbb{N}}$ as in Theorem 2.1. Suppose further that $\nabla f$ is locally Lipschitz continuous around $x$ with constant $L>0$. If $F$ satisfies the strong quadratic growth condition at $x$ with modulus $\kappa>0$, then there exists some $K\in\mathbb{N}$ such that for any $k>K$ we have

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\alpha\kappa}}\,\|x^k-x\|\quad\text{with}\quad\alpha:=\min\Big\{\alpha_K,\frac{\theta}{L}\Big\}.$$

Additionally, if $\nabla f$ is globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ with constant $L>0$, $\alpha$ above could be chosen as $\min\big\{\sigma,\frac{\theta}{L}\big\}$.

3 Quadratic growth conditions and linear convergence of forward-backward splitting method in some structured optimization problems

In this section, we mainly show that the quadratic growth condition is automatic or fulfilled under mild assumptions in several important classes of convex optimization problems.

3.1 Poisson linear inverse problem

This subsection is devoted to the study of the eventual linear convergence of FBS when solving the following standard Poisson regularized problem [17, 18]

$$\min_{x\in\mathbb{R}^n_+}\ \sum_{i=1}^m b_i\log\frac{b_i}{(Ax)_i}+(Ax)_i-b_i, \qquad (9)$$

where $A\in\mathbb{R}^{m\times n}_+$ is an $m\times n$ matrix with nonnegative entries and nontrivial rows, and $b\in\mathbb{R}^m_{++}$ is a positive vector. This problem is usually used to recover a signal $x\in\mathbb{R}^n_+$ from the measurement $b$ corrupted by Poisson noise satisfying $Ax\simeq b$. Problem (9) can be written in terms of (5) with

$$f(x):=h(Ax),\quad g(x):=\delta_{\mathbb{R}^n_+}(x),\quad\text{and}\quad F_1(x):=h(Ax)+g(x), \qquad (10)$$

where $h$ is the Kullback–Leibler divergence defined by

$$h(y)=\begin{cases}\displaystyle\sum_{i=1}^m b_i\log\frac{b_i}{y_i}+y_i-b_i & \text{if } y\in\mathbb{R}^m_{++},\\ +\infty & \text{if } y\in\mathbb{R}^m_+\setminus\mathbb{R}^m_{++}.\end{cases} \qquad (11)$$

Note from (10) and (11) that $\mathrm{dom}\,f=A^{-1}(\mathbb{R}^m_{++})$, which is an open set. Moreover, since $A\in\mathbb{R}^{m\times n}_+$, we have $\mathrm{dom}\,f\cap\mathrm{dom}\,g=A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+\ne\emptyset$ and $f$ is continuously differentiable at any point of $\mathrm{dom}\,f\cap\mathrm{dom}\,g$. The standing assumptions A1 and A2 are satisfied for problem (9). Moreover, since the function $F_1$ is bounded below and coercive, the optimal solution set of problem (9) is always nonempty.

It is worth noting further that $\nabla f$ is locally Lipschitz continuous at any point of $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$ but not globally Lipschitz continuous on $\mathrm{int}(\mathrm{dom}\,f)\cap\mathrm{dom}\,g$. Our [1, Theorem 3.2] is applicable to solving (9) with global convergence rate $o(\frac{1}{k})$. In the recent work [15], a new algorithm rather close to FBS was designed with applications to solving (9). However, the theory developed in [15] cannot guarantee the global convergence of the iterative sequence $(x^k)_{k\in\mathbb{N}}$ when solving (9), since one of their assumptions, the closedness of the domain of their auxiliary Legendre function in [15, Theorem 2], is not satisfied.

Our intent here is to reveal the Q-linear convergence of our method when solving (9) in the sense of Theorem 2.2. In order to do so, we need to verify the quadratic growth condition for $F_1$ at any optimal solution. Note further that the Kullback–Leibler divergence $h$ is not strongly convex and $\nabla f$ is not globally Lipschitz continuous; hence, the standing assumptions in [19] are not satisfied. Proving the quadratic growth condition for $F_1$ at an optimal solution via the approach of [19] needs to proceed with caution.

Lemma 3.1 Let $\bar x$ be an optimal solution to problem (9). Then for any $R>0$, we have

$$F_1(x)-F_1(\bar x)\ge\frac{\nu}{2}\,d^2(x;S)\quad\text{for all } x\in B_R(\bar x) \qquad (12)$$

with some constant $\nu>0$.

Proof Pick any $R>0$ and $x\in B_R(\bar x)$. We only need to prove (12) for the case that $x\in\mathrm{dom}\,F_1\cap B_R(\bar x)$, i.e., $x\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+\cap B_R(\bar x)$. Note that

$$\nabla f(x)=\sum_{i=1}^m\Big(1-\frac{b_i}{\langle a_i,x\rangle}\Big)a_i\quad\text{and}\quad\langle\nabla^2f(x)d,d\rangle=\sum_{i=1}^m b_i\frac{\langle a_i,d\rangle^2}{\langle a_i,x\rangle^2}\quad\text{for all } d\in\mathbb{R}^n,$$

where $a_i$ is the $i$-th row of $A$. Define $\bar y:=A\bar x$. For any $x,u\in B_R(\bar x)\cap\mathrm{dom}\,f$ we have $[x,u]\subset B_R(\bar x)\cap\mathrm{dom}\,f$ and obtain from the mean-value theorem that

$$\begin{aligned}
f(x)-f(u)-\langle\nabla f(u),x-u\rangle&=\frac{1}{2}\int_0^1\langle\nabla^2f(u+t(x-u))(x-u),x-u\rangle\,dt\\
&=\frac{1}{2}\int_0^1\sum_{i=1}^m b_i\frac{\langle a_i,x-u\rangle^2}{\langle a_i,u+t(x-u)\rangle^2}\,dt\\
&\ge\frac{1}{2}\int_0^1\sum_{i=1}^m b_i\frac{\langle a_i,x-u\rangle^2}{\big[|\langle a_i,\bar x\rangle|+\|a_i\|(\|u-\bar x\|+t\|x-u\|)\big]^2}\,dt\\
&\ge\frac{1}{2}\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,x-u\rangle^2.
\end{aligned}$$

Similarly, we have

$$f(u)-f(x)-\langle\nabla f(x),u-x\rangle\ge\frac{1}{2}\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,u-x\rangle^2\quad\text{for } x,u\in B_R(\bar x)\cap\mathrm{dom}\,f. \qquad (13)$$

Adding the above two inequalities gives us

$$\langle\nabla f(x)-\nabla f(u),x-u\rangle\ge\sum_{i=1}^m\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\langle a_i,x-u\rangle^2\quad\text{for all } x,u\in B_R(\bar x)\cap\mathrm{dom}\,f. \qquad (14)$$

We claim that the optimal solution set $S$ of problem (9) satisfies

$$S=A^{-1}(\bar y)\cap(\partial g)^{-1}(-\nabla f(\bar x))\quad\text{with}\quad\bar y=A\bar x. \qquad (15)$$

Pick another optimal solution $\bar u\in S$; we have $\bar u_t:=\bar x+t(\bar u-\bar x)\in S\subset\mathrm{dom}\,f$ for any $t\in[0,1]$ due to the convexity of $S$. By choosing $t$ sufficiently small, we have $\bar u_t\in B_R(\bar x)\cap\mathrm{dom}\,f$. Note further that $-\nabla f(\bar u_t)\in\partial g(\bar u_t)$ and $-\nabla f(\bar x)\in\partial g(\bar x)$. Since $\partial g$ is a monotone operator, we obtain that $0\ge\langle\nabla f(\bar x)-\nabla f(\bar u_t),\bar x-\bar u_t\rangle$. This together with (14) tells us that $\langle a_i,\bar x-\bar u_t\rangle=0$ for all $i=1,\ldots,m$. Hence $A\bar x=A\bar u=\bar y$ for any $\bar u\in S$, which also implies that

$$\nabla f(\bar u)=A^T\nabla h(A\bar u)=A^T\nabla h(A\bar x)=\nabla f(\bar x). \qquad (16)$$

This verifies the inclusion "⊂" in (15). The opposite inclusion is trivial. Indeed, take any $u$ satisfying $Au=\bar y$ and $-\nabla f(\bar x)\in\partial g(u)$; similarly to (16) we have $-\nabla f(u)=-\nabla f(\bar x)\in\partial g(u)$. This shows that $0\in\nabla f(u)+\partial g(u)$, i.e., $u\in S$. The proof of equality (15) is complete.

Note from (15) that the optimal solution set $S$ is a polyhedron of the form

$$S=\{u\in\mathbb{R}^n\mid Au=\bar y=A\bar x,\ \langle\nabla f(\bar x),u\rangle=0,\ u\in\mathbb{R}^n_+\}$$

due to the fact that $(\partial g)^{-1}(-\nabla f(\bar x))=\{u\in\mathbb{R}^n_+\mid\langle\nabla f(\bar x),u\rangle=0=\langle\nabla f(\bar x),\bar x\rangle\}$. Thanks to Hoffman's lemma, there exists a constant $\gamma>0$ such that

$$d(x;S)\le\gamma\big(\|Ax-A\bar x\|+|\langle\nabla f(\bar x),x-\bar x\rangle|\big)\quad\text{for all } x\in\mathbb{R}^n_+. \qquad (17)$$

Fix any $x\in B_R(\bar x)\cap\mathbb{R}^n_+$; (13) tells us that

$$f(x)-f(\bar x)-\langle\nabla f(\bar x),x-\bar x\rangle\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2. \qquad (18)$$

Since $-\nabla f(\bar x)\in\partial g(\bar x)$, we have $\langle\nabla f(\bar x),x-\bar x\rangle\ge0$. This together with (18) implies that

$$\begin{aligned}
F_1(x)-F_1(\bar x)&\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2+\langle\nabla f(\bar x),x-\bar x\rangle\\
&\ge\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big]\|Ax-A\bar x\|^2+\frac{1}{(\|\nabla f(\bar x)\|+1)\|x-\bar x\|}\langle\nabla f(\bar x),x-\bar x\rangle^2\\
&\ge\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\big[\|Ax-A\bar x\|^2+\langle\nabla f(\bar x),x-\bar x\rangle^2\big]\\
&\ge\frac{1}{2}\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\big(\|Ax-A\bar x\|+|\langle\nabla f(\bar x),x-\bar x\rangle|\big)^2\\
&\ge\frac{1}{2\gamma^2}\min\Big\{\frac{1}{2}\min_{1\le i\le m}\Big[\frac{b_i}{\big[|\langle a_i,\bar x\rangle|+3\|a_i\|R\big]^2}\Big],\ \frac{1}{(\|\nabla f(\bar x)\|+1)R}\Big\}\,d^2(x;S),
\end{aligned}$$

where the fourth inequality follows from the elementary inequality $\frac{(a+b)^2}{2}\le a^2+b^2$ with $a,b\ge0$, and the last inequality follows from (17). This clearly ensures (12). □


When applying FBS to solve problem (9), we have

$$x^{k+1}=P_{\mathbb{R}^n_+}\Big(x^k-\alpha_k\sum_{i=1}^m\Big(1-\frac{b_i}{\langle a_i,x^k\rangle}\Big)a_i\Big)\quad\text{with}\quad x^0\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+, \qquad (19)$$

where $P_{\mathbb{R}^n_+}(\cdot)$ is the projection mapping onto $\mathbb{R}^n_+$.
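For readers who want to experiment with (19), the following Python sketch is one possible implementation; it is our own illustration, not the authors' code. It uses a crude constant stepsize with halving whenever the trial point leaves the domain $Ax>0$, whereas the analysis above relies on the line search of Section 2, so the stepsize rule here is an assumption.

```python
import numpy as np

def poisson_fbs(A, b, x0, alpha=1e-3, max_iter=1000):
    """Projected-gradient sketch of iteration (19) for the Poisson problem (9).

    A : (m, n) nonnegative matrix with nontrivial rows
    b : (m,) positive vector
    x0: starting point with A @ x0 > 0 and x0 >= 0
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Ax = A @ x
        grad = A.T @ (1.0 - b / Ax)                 # gradient of f(x) = h(Ax)
        step = alpha
        x_new = np.maximum(x - step * grad, 0.0)    # projection onto the nonnegative orthant
        while np.any(A @ x_new <= 0.0):             # keep A x > 0 so that f stays finite
            step *= 0.5
            x_new = np.maximum(x - step * grad, 0.0)
        x = x_new
    return x
```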

Corollary 3.1 (Q-linear convergence of method (19)) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by (19) with $x^0\in A^{-1}(\mathbb{R}^m_{++})\cap\mathbb{R}^n_+$ for solving the Poisson regularized problem (9). Then the sequences $(x^k)_{k\in\mathbb{N}}$ and $(F_1(x^k))_{k\in\mathbb{N}}$ are Q-linearly convergent to an optimal solution and to the optimal value of (9), respectively.

Proof Since both functions $f$ and $g$ in problem (9) satisfy our standing assumptions A1 and A2, and problem (9) always has optimal solutions, the sequence $(x^k)_{k\in\mathbb{N}}$ converges to an optimal solution $\bar x$ of problem (9) by Theorem 2.1. Since $\nabla f$ is locally Lipschitz continuous around $\bar x$, the combination of Theorem 2.2 and Lemma 3.1 tells us that $(x^k)_{k\in\mathbb{N}}$ is Q-linearly convergent to $\bar x$. □

By using this approach, it is similar to show that the quadratic growth condition in Lemma 3.1 is also valid for the Poisson inverse problem with sparse regularization [15]:

$$\min_{x\in\mathbb{R}^n_+}\ \sum_{i=1}^m b_i\log\frac{b_i}{(Ax)_i}+(Ax)_i-b_i+\mu\|x\|_1, \qquad (20)$$

where $\mu>0$ is the penalty parameter. Indeed, since $\|x\|_1=\langle e,x\rangle$ on $\mathbb{R}^n_+$, the FBS method for solving (20) is practical by modifying the function $f(x)$ in (10) to $h(Ax)+\mu\langle e,x\rangle$ with $e=(1,1,\ldots,1)\in\mathbb{R}^n$. This together with Corollary 3.1 clearly shows that FBS (19) solves (20) linearly.

3.2 $\ell_1$-regularized optimization problems

In this section we consider the $\ell_1$-regularized optimization problem

$$\min_{x\in\mathbb{R}^n} F_2(x):=f(x)+\mu\|x\|_1. \qquad (21)$$

In order to use Proposition 2.1 for characterizing the strong quadratic growth condition for $F_2$, we need the following calculation of the graphical derivative of $\partial(\mu\|\cdot\|_1)$.

Proposition 3.1 (Subgradient graphical derivative of $\partial(\mu\|\cdot\|_1)$) Suppose that $\bar s\in\partial(\mu\|x\|_1)$. Define $I:=\{j\in\{1,\ldots,n\}\mid|\bar s_j|=\mu\}$, $J:=\{j\in I\mid x_j\ne0\}$, $K:=\{j\in I\mid x_j=0\}$, and $H(x):=\{u\in\mathbb{R}^n\mid u_j=0,\ j\notin I\ \text{and}\ u_j\bar s_j\ge0,\ j\in K\}$. Then $D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$ is nonempty if and only if $u\in H(x)$. Furthermore, we have

$$D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)=\Big\{v\in\mathbb{R}^n\ \Big|\ v_j=0,\ j\in J;\ u_jv_j=0,\ \bar s_jv_j\le0,\ j\in K\Big\}\quad\text{for all } u\in H(x). \qquad (22)$$

Proof For any $x\in\mathbb{R}^n$, note that

$$\partial(\mu\|x\|_1)=\Big\{s\in\mathbb{R}^n\ \Big|\ s_j=\mu\,\mathrm{sgn}(x_j)\ \text{if}\ x_j\ne0,\quad s_j\in[-\mu,\mu]\ \text{if}\ x_j=0\Big\}, \qquad (23)$$

where $\mathrm{sgn}:\mathbb{R}\to\{-1,1\}$ is the sign function. Take any $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$; there exist sequences $t_k\downarrow0$ and $(u^k,v^k)\to(u,v)$ such that $(x,\bar s)+t_k(u^k,v^k)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$. Let us consider the following three partitions of the index $j$:

Partition 1.1: $j\notin I$, i.e., $|\bar s_j|<\mu$. It follows from (23) that $x_j=0$. For sufficiently large $k$, we have $|(\bar s+t_kv^k)_j|<\mu$ and thus $(x+t_ku^k)_j=0$ by (23) again. Hence $u^k_j=0$, which implies that $u_j=0$ for all $j\notin I$.

Partition 1.2: $j\in J$, i.e., $|\bar s_j|=\mu$ and $x_j\ne0$. When $k$ is sufficiently large, we have $(x+t_ku^k)_j\ne0$ and derive from (23) that

$$(\bar s+t_kv^k)_j=\mu\,\mathrm{sgn}(x+t_ku^k)_j=\mu\,\mathrm{sgn}\,x_j=\bar s_j,$$

which implies that $v_j=0$ for all $j\in J$.

Partition 1.3: $j\in K$, i.e., $|\bar s_j|=\mu$ and $x_j=0$. If there is a subsequence of $(x,\bar s)_j+t_k(u^k,v^k)_j$ (without relabeling) such that $|(\bar s+t_kv^k)_j|<\mu=|\bar s_j|$, we have $\bar s_jv^k_j<0$ and $(x+t_ku^k)_j=0$ by (23). It follows that $u^k_j=0$. Letting $k\to\infty$, we have $u_j=0$ and $\bar s_jv_j\le0$. Otherwise, we find some $L>0$ such that $|(\bar s+t_kv^k)_j|=\mu=|\bar s_j|$ for all $k>L$, which yields $v^k_j=0$. Taking $k\to\infty$ gives us that $v_j=0$. Furthermore, by (23) again, we have

$$\bar s_j=(\bar s+t_kv^k)_j=\mu\,\mathrm{sgn}(x+t_ku^k)_j=\mu\,\mathrm{sgn}(u^k_j)\quad\text{or}\quad 0=(x+t_ku^k)_j=t_ku^k_j,\ \text{i.e.,}\ u^k_j=0,$$

which imply that $\bar s_ju_j\ge0$ after passing to the limit $k\to\infty$.

Combining the conclusions in the three cases above gives us that $u\in H(x)$ and also verifies the inclusion "⊂" in (22). To justify the converse inclusion "⊃", take $u\in H(x)$ and any $v\in\mathbb{R}^n$ with $v_j=0$ for $j\in J$ and $u_jv_j=0$, $\bar s_jv_j\le0$ for $j\in K$. For any $t_k\downarrow0$, we prove that $(x,\bar s)+t_k(u,v)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$ and thus verify that $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$. For any $t\in\mathbb{R}$, define the set-valued mapping

$$\mathrm{SGN}(t):=\partial|t|=\begin{cases}\mathrm{sgn}(t) & \text{if } t\ne0,\\ [-1,1] & \text{if } t=0.\end{cases}$$

Similarly to the proof of the "⊂" inclusion, we consider three partitions of $j$ as follows:

Partition 2.1: $j\notin I$, i.e., $|\bar s_j|<\mu$. Since $u\in H(x)$, we have $u_j=0$. Note also that $x_j=0$. Hence we get $(x+t_ku)_j=0$ and $(\bar s+t_kv)_j\in[-\mu,\mu]$ when $k$ is sufficiently large, which means $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

Partition 2.2: $j\in J$, i.e., $|\bar s_j|=\mu$ and $x_j\ne0$. Since $v_j=0$, we have

$$\mathrm{sgn}(\bar s+t_kv)_j=\mathrm{sgn}\,\bar s_j=\mathrm{sgn}(x_j)=\mathrm{sgn}(x+t_ku)_j$$

and $(x+t_ku)_j\ne0$ when $k$ is large. It follows that $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

Partition 2.3: $j\in K$, i.e., $|\bar s_j|=\mu$ and $x_j=0$. If $u_j=0$, we have $(x+t_ku)_j=0$ and $|(\bar s+t_kv)_j|\le|\bar s_j|\le\mu$ for sufficiently large $k$, since $\bar s_jv_j\le0$. If $u_j\ne0$, we have $v_j=0$ and

$$(\bar s+t_kv)_j=\bar s_j=\mu\,\mathrm{sgn}(u_j)=\mu\,\mathrm{sgn}(x+t_ku)_j$$

when $k$ is large, since $u_j\bar s_j\ge0$. In both cases, we have $(\bar s+t_kv)_j\in\mu\,\mathrm{SGN}(x+t_ku)_j$.

From those cases, we always have $(x,\bar s)+t_k(u,v)\in\mathrm{gph}\,\partial(\mu\|\cdot\|_1)$ and thus $v\in D\partial(\mu\|\cdot\|_1)(x\,|\,\bar s)(u)$. □

As a consequence, we establish a characterization of the strong quadratic growth condition for $F_2$.

Theorem 3.1 (Characterization of strong quadratic growth condition for $F_2$) Let $x$ be an optimal solution to problem (21). Suppose that $\nabla f$ is differentiable at $x$. Define $\mathcal{E}:=\{j\in\{1,\ldots,n\}\mid|(\nabla f(x))_j|=\mu\}$, $K:=\{j\in\mathcal{E}\mid x_j=0\}$, $\mathcal{U}:=\{u\in\mathbb{R}^{\mathcal{E}}\mid u_j(\nabla f(x))_j\le0,\ j\in K\}$, and $H_{\mathcal{E}}(x):=[\nabla^2f(x)_{i,j}]_{i,j\in\mathcal{E}}$. Then the following statements are equivalent:

(i) $F_2$ satisfies the strong quadratic growth condition at $x$.
(ii) $H_{\mathcal{E}}(x)$ is positive definite over $\mathcal{U}$ in the sense that

$$\langle H_{\mathcal{E}}(x)u,u\rangle>0\quad\text{for all } u\in\mathcal{U}\setminus\{0\}. \qquad (24)$$

(iii) $H_{\mathcal{E}}(x)$ is nonsingular over $\mathcal{U}$ in the sense that

$$\ker H_{\mathcal{E}}(x)\cap\mathcal{U}=\{0\}. \qquad (25)$$

Moreover, if (24) is satisfied then $F_2$ satisfies the strong quadratic growth condition with any positive modulus $\kappa<\ell$, where

$$\ell:=\min\Big\{\frac{\langle H_{\mathcal{E}}(x)u,u\rangle}{\|u\|^2}\ \Big|\ u\in\mathcal{U}\Big\}>0 \qquad (26)$$

with the convention $\frac{0}{0}=\infty$.

Proof First let us verify the equivalence between (i) and (ii) by using Proposition 2.1. Indeed, for any $v\in D(\partial F_2)(x\,|\,0)(u)$ we get from the sum rule [39, Proposition 4A.2] that

$$v-\nabla^2f(x)u\in D\partial(\mu\|\cdot\|_1)(x\,|\,-\nabla f(x))(u).$$

Define $\mathcal{V}:=\{u\in\mathbb{R}^n\mid u_j=0,\ j\notin\mathcal{E},\ u_j(\nabla f(x))_j\le0,\ j\in K\}$. Thanks to Proposition 3.1, we have

$$\langle v-\nabla^2f(x)u,u\rangle=0\quad\text{for all } u\in\mathcal{V}. \qquad (27)$$

This tells us that (24) is the same as (3) when $h=F_2$. By Proposition 2.1, (i) and (ii) are equivalent. Moreover, $F_2$ satisfies the strong quadratic growth condition with any positive modulus $\kappa<\ell$.

Finally, the equivalence between (ii) and (iii) is trivial due to the fact that $f$ is convex and thus $H_{\mathcal{E}}(x)$ is positive semi-definite. □

Corollary 3.2 (Linear convergence of FBS method for $\ell_1$-regularized problems) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by the FBS method for problem (21). Suppose that the solution set $S$ is not empty, $(x^k)_{k\in\mathbb{N}}$ converges to some $x\in S$, and $f$ is $C^2$ around $x$. If condition (24) holds, then $(x^k)_{k\in\mathbb{N}}$ and $(F_2(x^k))_{k\in\mathbb{N}}$ are Q-linearly convergent to $x$ and $F_2(x)$, respectively, with rates determined in Corollary 2.1, where $\kappa$ is any positive number smaller than $\ell$ in (26).

Proof Since $f$ is $C^2$ around $x$, $\nabla f$ is locally Lipschitz continuous around $x$. The result follows from Corollary 2.1 and Theorem 3.1. □

Remark 3.1 It is worth noting that condition (25) is strictly weaker than the assumption used in [3] that $H_{\mathcal{E}}$ has full rank to obtain the linear convergence of FBS for (21). Indeed, consider the case $n=2$, $\mu=1$, and $f(x_1,x_2)=\frac{1}{2}(x_1+x_2)^2+x_1+x_2$. Note that $x=(0,0)$ is an optimal solution to problem (21). Moreover, direct computation gives us that $\nabla f(x)=(1,1)$, $\mathcal{E}=\{1,2\}$, and $H_{\mathcal{E}}(x)=\begin{pmatrix}1&1\\1&1\end{pmatrix}$. It is clear that $H_{\mathcal{E}}(x)$ does not have full rank, but condition (24) and its equivalent form (25) hold.
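A quick numerical check of Remark 3.1 (our own illustration in Python) confirms this: since $\nabla f(0,0)=(1,1)$, the cone $\mathcal{U}$ consists of vectors with nonpositive entries, and $\langle H_{\mathcal{E}}u,u\rangle=(u_1+u_2)^2>0$ for every nonzero such $u$, even though $H_{\mathcal{E}}$ is singular.

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, 1.0]])                 # H_E(x) from Remark 3.1
print(np.linalg.matrix_rank(H))            # 1: H_E(x) does not have full rank

# sample the cone U = {u : u_1 <= 0, u_2 <= 0}, keeping only nonzero points
rng = np.random.default_rng(0)
U = -rng.random((1000, 2))
U = U[np.linalg.norm(U, axis=1) > 1e-8]
vals = np.einsum('ij,jk,ik->i', U, H, U)   # <H u, u> for each sampled u
print(vals.min() > 0)                      # True: condition (24) holds on U \ {0}
```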

3.3 Global Q-linear convergence of ISTA on Lasso problem

In this section we study the linear convergence of ISTA for the Lasso problem

$$\min_{x\in\mathbb{R}^n} F_3(x):=\frac{1}{2}\|Ax-b\|^2+\mu\|x\|_1, \qquad (28)$$

where $A$ is an $m\times n$ real matrix and $b$ is a vector in $\mathbb{R}^m$.
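For concreteness, a minimal ISTA implementation for (28) is sketched below in Python. It is our own illustrative code: the constant stepsize $1/\lambda_{\max}(A^TA)$ replaces the line search used in the analysis, and the soft-thresholding operator plays the role of the proximal mapping of $\mu\|\cdot\|_1$.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal mapping of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, mu, x0=None, max_iter=5000):
    """ISTA for the Lasso problem (28): min 0.5*||Ax - b||^2 + mu*||x||_1."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / lambda_max(A^T A)
    for _ in range(max_iter):
        x = soft_threshold(x - alpha * A.T @ (A @ x - b), alpha * mu)
    return x
```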

The following lemma, taken from [26, Lemma 10], plays an important role in our proof.

Lemma 3.2 (Global error bound) Fix any $R>\frac{\|b\|^2}{2\mu}$ and let $\bar x$ be an optimal solution to problem (28). Then we have

$$F_3(x)-F_3(\bar x)\ge\frac{\gamma_R}{2}\,d^2(x;S)\quad\text{for all } \|x\|_1\le R,$$

where

$$\gamma_R^{-1}:=\nu^2\Big(1+\frac{\sqrt{5}}{2}\,\mu R+(R\|A\|+\|b\|)(4R\|A\|+\|b\|)\Big) \qquad (29)$$

while $\nu$ is the Hoffman constant defined in [26, Definition 1], depending only on the initial data $A,b,\mu$.

Global R-linear convergence of $(x^k)_{k\in\mathbb{N}}$ from ISTA and Q-linear convergence of $(F_3(x^k))_{k\in\mathbb{N}}$ for solving the Lasso problem were obtained in [21, Theorem 4.2 and Remark 4.3] and also [22, Theorem 4.8]. We add another feature: the iterative sequence $(x^k)_{k\in\mathbb{N}}$ is also globally Q-linearly convergent.

Theorem 3.2 (Global Q-linear convergence of ISTA) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by ISTA for problem (28) that converges to an optimal solution $x\in S$. Then $(x^k)_{k\in\mathbb{N}}$ and $(F_3(x^k))_{k\in\mathbb{N}}$ are globally Q-linearly convergent to $x$ and $F_3(x)$, respectively:

$$\|x^{k+1}-x\|\le\frac{1}{\sqrt{1+\frac{\alpha\gamma_R}{4}}}\,\|x^k-x\| \qquad (30)$$

$$|F_3(x^{k+1})-F_3(x)|\le\frac{\sqrt{1+\alpha\gamma_R}+1}{2\sqrt{1+\alpha\gamma_R}}\,|F_3(x^k)-F_3(x)| \qquad (31)$$

for all $k\in\mathbb{N}$, where $R$ is any number bigger than $\|x^0\|+\frac{\|b\|^2}{\mu}$, $\gamma_R$ is given as in (29), and $\alpha:=\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$.

Proof Note that the Lasso problem always has optimal solutions. With $x\in S$, we have

$$F_3(0)=\frac{1}{2}\|b\|^2\ge F_3(x)\ge\mu\|x\|_1,$$

which implies that $\|x\|\le\|x\|_1\le\frac{1}{2\mu}\|b\|^2$. It follows from [1, Corollary 3.1] that

$$\|x^k\|\le\|x^k-x\|+\|x\|\le\|x^0-x\|+\|x\|\le\|x^0\|+2\|x\|\le\|x^0\|+\frac{\|b\|^2}{\mu}<R\quad\text{for all } k\in\mathbb{N}.$$

Thanks to Lemma 3.2, [1, Corollary 3.1], and [1, Proposition 3.2], we have

$$\|x^k-x\|^2-\|x^{k+1}-x\|^2\ge\alpha\gamma_R\,d^2(x^{k+1};S) \qquad (32)$$

with $\alpha=\frac{1}{2}\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$ and the note that $\lambda_{\max}(A^TA)$ is the global Lipschitz constant of the gradient of $\frac{1}{2}\|Ax-b\|^2$. The proofs of (30) and (31) are quite similar to those of (7) and (8) in Theorem 2.2; see [1, Theorem 4.1] for further details. □

Observe further that the linear rates in Theorem 3.2 depend on the initial point $x^0$; see also [22, Theorem 4.8]. Next we show that the local linear rates around optimal solutions are uniform and independent of the choice of $x^0$.

Corollary 3.3 (Local Q-linear convergence of ISTA with uniform rate) Let $(x^k)_{k\in\mathbb{N}}$ be the sequence generated by ISTA for problem (28) that converges to an optimal solution $x\in S$. Then (30) and (31) are satisfied when $k$ is sufficiently large, where $\alpha=\min\Big\{\sigma,\frac{\theta}{\lambda_{\max}(A^TA)}\Big\}$ and $R$ is any number bigger than $\frac{\|b\|^2}{2\mu}$.

Proof Note from the proof of Theorem 3.2 that $\|x\|\le\frac{\|b\|^2}{2\mu}<R$. Since $x^k$ converges to $x\in S$, there exists $K\in\mathbb{N}$ such that $\|x^k\|<R$ for any $k>K$. By using Lemma 3.2 and also [1, Corollary 3.1], we obtain (32) for all $k>K$. Following the same arguments as in Theorem 3.2 justifies the corollary. □

4 Uniqueness of the optimal solution to $\ell_1$-regularized least squares optimization problems

As discussed in Section 1, the linear convergence of ISTA for Lasso was sometimes obtained by imposing the additional assumption that Lasso has a unique optimal solution $x$; see, e.g., [34]. Since $F_3$ satisfies the quadratic growth condition at $x$ (Lemma 3.2), the uniqueness of $x$ is equivalent to the strong quadratic growth condition of $F_3$ at $x$. This observation together with Theorem 3.1 allows us to characterize the uniqueness of the optimal solution to Lasso in the next result. A different characterization of this property can be found in [43, Theorem 2.1]. Suppose that $x$ is an optimal solution, which means $-A^T(Ax-b)\in\mu\,\partial\|x\|_1$. In the spirit of Proposition 3.1 with $f(x)=\frac{1}{2}\|Ax-b\|^2$, define

$$\mathcal{E}:=\big\{j\in\{1,\ldots,n\}\ \big|\ |(A^T(Ax-b))_j|=\mu\big\},\quad K:=\{j\in\mathcal{E}\mid x_j=0\},\quad J:=\mathcal{E}\setminus K. \qquad (33)$$

Since $-A^T(Ax-b)\in\partial(\mu\|x\|_1)$, if $x_j\ne0$ then $(A^T(Ax-b))_j=-\mu\,\mathrm{sign}(x_j)$. This tells us that $J=\{j\in\{1,\ldots,n\}\mid x_j\ne0\}=:\mathrm{supp}(x)$. Furthermore, given an index set $I\subset\{1,\ldots,n\}$, we denote by $A_I$ the submatrix of $A$ formed by its columns $A_i$, $i\in I$, and by $x_I$ the subvector of $x\in\mathbb{R}^n$ formed by $x_i$, $i\in I$. For any $x\in\mathbb{R}^n$, we also define $\mathrm{sign}(x):=(\mathrm{sign}(x_1),\ldots,\mathrm{sign}(x_n))^T$ and $\mathrm{Diag}(x)$ as the square diagonal matrix with main entries $x_1,x_2,\ldots,x_n$.

Theorem 4.1 (Uniqueness of optimal solution to Lasso problem) Let $x$ be an optimal solution to problem (28). The following statements are equivalent:

(i) $x$ is the unique optimal solution to Lasso (28).
(ii) The system $A_Jx_J-A_KQ_Kx_K=0$ with $x_K\in\mathbb{R}^K_+$ has a unique solution $(x_J,x_K)=(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$, where $Q_K:=\mathrm{Diag}\big(\mathrm{sign}(A_K^T(A_Jx_J-b))\big)$.
(iii) The submatrix $A_J$ has full column rank and the columns of $A_JA_J^{\dagger}A_KQ_K-A_KQ_K$ are positively linearly independent in the sense that

$$\mathrm{Ker}\,(A_JA_J^{\dagger}A_KQ_K-A_KQ_K)\cap\mathbb{R}^K_+=\{0_K\}, \qquad (34)$$

where $A_J^{\dagger}:=(A_J^TA_J)^{-1}A_J^T$ is the Moore–Penrose pseudoinverse of $A_J$.
(iv) The submatrix $A_J$ has full column rank and there exists a Slater point $y\in\mathbb{R}^m$ such that

$$(Q_KA_K^TA_JA_J^{\dagger}-Q_KA_K^T)y<0. \qquad (35)$$

Proof Since $F_3$ satisfies the quadratic growth condition at $x$ as in Lemma 3.2, (i) means that $F_3$ satisfies the strong quadratic growth condition at $x$. Thus, by Theorem 3.1, (i) is equivalent to

$$\langle H_{\mathcal{E}}(x)u,u\rangle>0\quad\text{for all } u\in\mathcal{U}\setminus\{0\} \qquad (36)$$

with $f(x)=\frac{1}{2}\|Ax-b\|^2$ and $\mathcal{U}=\{u\in\mathbb{R}^{\mathcal{E}}\mid u_j(\nabla f(x))_j\le0,\ j\in K\}$. Note that $H_{\mathcal{E}}=[\nabla^2f(x)_{i,j}]_{i,j\in\mathcal{E}}=[(A^TA)_{i,j}]_{i,j\in\mathcal{E}}=A_{\mathcal{E}}^TA_{\mathcal{E}}$. Hence (36) means that the system

$$0=A_{\mathcal{E}}u=A_Ju_J+A_Ku_K\quad\text{and}\quad u_K\in\mathcal{U}_K \qquad (37)$$

has a unique solution $u=(u_J,u_K)=(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$, where $\mathcal{U}_K$ is defined by $\mathcal{U}_K:=\{u\in\mathbb{R}^K\mid u_k(A^T(Ax-b))_k\le0,\ k\in K\}$. As observed after (33), $J=\mathrm{supp}(x)$, so for each $k\in K$ we have

$$(A^T(Ax-b))_k=(A^T(A_Jx_J-b))_k=(A_K^T(A_Jx_J-b))_k.$$

It follows that $\mathcal{U}_K=-Q_K(\mathbb{R}^K_+)$ and $Q_K$ is a nonsingular diagonal square matrix (each diagonal entry is either $1$ or $-1$). Uniqueness for system (37) is equivalent to (ii). This verifies the equivalence between (i) and (ii).

Let us justify the equivalence between (ii) and (iii). To proceed, suppose that (ii) is valid, i.e., the system

$$A_Jx_J-A_KQ_Kx_K=0\quad\text{with}\quad (x_J,x_K)\in\mathbb{R}^J\times\mathbb{R}^K_+ \qquad (38)$$

has a unique solution $(0_J,0_K)\in\mathbb{R}^J\times\mathbb{R}^K$. Choosing $x_K=0_K$, the latter tells us that the equation $A_Jx_J=0$ has a unique solution $x_J=0$, i.e., $A_J$ has full column rank. Thus $A_J^TA_J$ is nonsingular. Furthermore, it follows from (38) that $A_J^TA_Jx_J=A_J^TA_KQ_Kx_K$, which means

$$x_J=(A_J^TA_J)^{-1}A_J^TA_KQ_Kx_K=A_J^{\dagger}A_KQ_Kx_K. \qquad (39)$$

This together with (38) tells us that the system

$$A_JA_J^{\dagger}A_KQ_Kx_K-A_KQ_Kx_K=(A_JA_J^{\dagger}A_KQ_K-A_KQ_K)x_K=0,\quad x_K\in\mathbb{R}^K_+ \qquad (40)$$

has a unique solution $x_K=0_K\in\mathbb{R}^K$, which clearly verifies (34) and thus (iii).

To justify the converse implication, suppose that (iii) is valid. Consider the system (38) in (ii); since $A_J$ has full column rank, we also have (39). Similarly to the above justification, $x_K$ satisfies equation (40). Thanks to (34) in (iii), we get from (40) that $x_K=0_K$ and thus $x_J=0_J$ by (39). This verifies that the system (38) in (ii) has a unique solution $(x_J,x_K)=(0_J,0_K)$.

Finally, the equivalence between (iii) and (iv) follows from the well-known Gordan lemma and the fact that the matrix $A_JA_J^{\dagger}$ is symmetric. □
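To illustrate how Theorem 4.1 can be used in practice, the following Python sketch (our own, assuming scipy is available) tests condition (iii) numerically at a candidate solution $x$: it builds $J$, $K$, $Q_K$ from (33), checks that $A_J$ has full column rank, and then checks (34) by solving a small linear program that searches for a nonzero $x_K\ge0$ in the kernel of $A_JA_J^{\dagger}A_KQ_K-A_KQ_K$. The floating-point tolerance `tol` is a practical assumption, not part of the theorem.

```python
import numpy as np
from scipy.optimize import linprog

def lasso_solution_is_unique(A, b, x, mu, tol=1e-8):
    """Numerically test condition (iii) of Theorem 4.1 at an optimal solution x."""
    r = A.T @ (A @ x - b)
    E = np.where(np.abs(np.abs(r) - mu) <= tol)[0]        # E from (33)
    J = E[np.abs(x[E]) > tol]                              # support of x
    K = E[np.abs(x[E]) <= tol]
    AJ, AK = A[:, J], A[:, K]
    if np.linalg.matrix_rank(AJ) < len(J):                 # A_J must have full column rank
        return False
    if len(K) == 0:
        return True                                        # (34) holds vacuously
    QK = np.diag(np.sign(AK.T @ (AJ @ x[J] - b)))
    AJ_pinv = np.linalg.pinv(AJ)                           # Moore-Penrose pseudoinverse A_J^dagger
    M = AJ @ AJ_pinv @ AK @ QK - AK @ QK
    # (34) fails iff there is a nonzero x_K >= 0 with M x_K = 0;
    # search for one by maximizing sum(x_K) over {M x_K = 0, 0 <= x_K <= 1}.
    res = linprog(c=-np.ones(len(K)), A_eq=M, b_eq=np.zeros(M.shape[0]),
                  bounds=[(0.0, 1.0)] * len(K))
    return bool(res.success) and -res.fun <= tol
```

Condition (iv) could be tested analogously via a Slater-point linear program; we only sketch (iii) here.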

Next let us discuss some known conditions related to the uniqueness of the optimal solution to Lasso. In [40], Fuchs introduced a sufficient condition for the above property:

$$A_J^T(A_Jx_J-b)=-\mu\,\mathrm{sign}(x_J), \qquad (41)$$
$$\|A_{J^c}^T(A_Jx_J-b)\|_\infty<\mu, \qquad (42)$$
$$A_J\ \text{has full column rank}. \qquad (43)$$

The first equality (41) indeed tells us that $x$ is an optimal solution to the Lasso problem. Inequality (42) means that $\mathcal{E}=J$, i.e., $K=\emptyset$ in Theorem 4.1. Condition (43) is also present in our characterizations. Hence Fuchs' condition implies (iii) in Theorem 4.1 and is clearly not a necessary condition for the uniqueness of the optimal solution to the Lasso problem, since in many situations the set $K$ is not empty.
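For comparison, Fuchs' sufficient condition (41)-(43) is straightforward to test directly; the helper below is again our own illustrative Python sketch with a numerical tolerance as an assumption.

```python
import numpy as np

def fuchs_condition_holds(A, b, x, mu, tol=1e-8):
    """Check the sufficient condition (41)-(43) of Fuchs at a candidate solution x."""
    J = np.where(np.abs(x) > tol)[0]
    Jc = np.setdiff1d(np.arange(A.shape[1]), J)
    AJ = A[:, J]
    res = AJ @ x[J] - b
    eq_41 = np.allclose(AJ.T @ res, -mu * np.sign(x[J]), atol=tol)        # (41)
    ineq_42 = np.max(np.abs(A[:, Jc].T @ res), initial=0.0) < mu - tol    # (42), sup-norm
    rank_43 = np.linalg.matrix_rank(AJ) == len(J)                         # (43)
    return eq_41 and ineq_42 and rank_43
```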

Furthermore, in the recent work [41], Tibshirani shows that the optimal solution $x$ to problem (28) is unique when the matrix $A_{\mathcal{E}}$ has full column rank. This condition is sufficient for our (ii) in Theorem 4.1. Indeed, if $(x_J,x_K)$ satisfies system (38) in (ii), we have $A_{\mathcal{E}}(x_J,-Q_Kx_K)^T=0$, which implies that $x_J=0$ and $Q_Kx_K=0$ when $\ker A_{\mathcal{E}}=\{0\}$. Since $Q_K$ is invertible, the latter tells us that $x_J=0$ and $x_K=0$, which clearly verifies (ii). Tibshirani's condition is also necessary for the uniqueness of the optimal solution to the Lasso problem for almost all $b$ in (28), but it is not for every $b$; a concrete example can be found in [43].

In the recent works [43, 44], the following useful characterization of the unique solution to Lasso has been established under mild assumptions:

$$\text{There exists } y\in\mathbb{R}^m \text{ satisfying } A_J^Ty=\mathrm{sign}(x_J) \text{ and } \|A_K^Ty\|_\infty<1, \qquad (44)$$
$$A_J\ \text{has full column rank}.$$

It is still open to us to connect this condition directly to the ones in Theorem 4.1, although they must be logically equivalent under the assumptions required in [43, 44]. However, our approach via second-order variational analysis is completely different and also provides several new characterizations of the uniqueness of the optimal solution to Lasso. It is also worth mentioning here that the standing assumption in [43] that $A$ has full row rank is relaxed in our study.

5 Conclusion

In this paper we analyzed quadratic growth conditions for some structured optimization problems, which allows us to show the Q-linear convergence of FBS with no assumption on the initial data or with mild assumptions via second-order conditions. In future research we intend to study the quadratic growth condition for the well-known nuclear norm regularized least squares optimization problem

$$\min_{X\in\mathbb{R}^{p\times q}}\ h(X):=\|\mathcal{A}X-B\|^2+\mu\|X\|_*,$$

where $\mathcal{A}:\mathbb{R}^{p\times q}\to\mathbb{R}^{m\times n}\ni B$ is a linear operator and $\|X\|_*$ is the trace norm (also known as the nuclear norm) of $X$. The quadratic growth condition for this problem can be obtained under the non-degeneracy condition [24], which could be very restrictive. We plan to use second-order information to relax this assumption and extend the approach in Section 4 to investigate the uniqueness of the optimal solution, with further applications to matrix completion.

Acknowledgements The authors are indebted to both anonymous referees for their careful readings and thoughtful suggestions that allowed us to improve the original presentation significantly.

References

1. Bello-Cruz, J.Y., Li, G., Nghia, T.T.A.: On the Q-linear convergence of forward-backward splitting method. Part I: Convergence analysis. (2019)
2. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57, 1413–1457 (2004)
3. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
4. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. 58, 267–288 (1996)
5. Tropp, J.: Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52, 1030–1051 (2006)
6. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
7. Bredies, K., Lorenz, D.A.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14, 813–837 (2008)
8. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Convex Optimization in Signal Processing and Communications (D. Palomar and Y. Eldar, eds.), 42–88. Cambridge University Press, Cambridge (2010)
9. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications 49, 185–212. Springer, New York (2011)
10. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
11. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 127–239 (2014)
12. Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38, 431–446 (2000)
13. Bello Cruz, J.Y., Nghia, T.T.A.: On the convergence of the proximal forward-backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)
14. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Splitting Methods in Communications, Image Science, and Engineering. Scientific Computation. Springer, Cham (2016)
15. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42, 330–348 (2017)
16. Salzo, S.: The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM J. Optim. 27, 2153–2181 (2017)
17. Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist. 19, 2032–2066 (1991)
18. Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography. J. Amer. Statist. Assoc. 80, 8–37 (1985)
19. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43, 693–1050 (2018)
20. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2000)
21. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward-backward algorithm: beyond the worst case with the help of geometry. arXiv:1703.09477 (2017)
22. Garrigos, G., Rosasco, L., Villa, S.: Thresholding gradient methods in Hilbert spaces: support identification and linear convergence. arXiv:1712.00357 (2017)
23. Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018). doi.org/10.1007/s10107-018-1232-1
24. Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization. Math. Program. 165, 689–728 (2017)
25. Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)
26. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
27. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
