Sequential constant rank constraint qualifications for nonlinear semidefinite programming with algorithmic applications

Roberto Andreani

Gabriel Haeser

Leonardo M. Mito

Héctor Ramírez C.

June 7, 2021. Last revised on September 15, 2022

Abstract

We present new constraint qualification conditions for nonlinear semidefinite programming that extend some of the constant rank-type conditions from nonlinear programming. As an application of these conditions, we provide a unified global convergence proof for a class of algorithms to stationary points, without assuming either uniqueness of the Lagrange multiplier or boundedness of the Lagrange multiplier set.

This class of algorithms includes, for instance, general forms of augmented Lagrangian, sequential quadratic programming, and interior point methods. In particular, we do not assume boundedness of the dual sequence generated by the algorithm. The weaker sequential condition we present is shown to be strictly weaker than Robinson's condition while still implying metric subregularity.

Keywords: Constant rank, Constraint qualifications, Semidefinite programming, Algorithms, Global convergence.

1 Introduction

Constraint qualification (CQ) conditions play a crucial role in optimization. They make it possible to establish first- and second-order necessary optimality conditions for local minima, and they support the convergence theory of many practical algorithms (see, for instance, a unified convergence analysis for a whole class of algorithms by Andreani et al. [10, Thm. 6]). Some of the well-known CQs in nonlinear programming (NLP) are the constant-rank constraint qualification (CRCQ), introduced by Janin [25], and the constant positive linear dependence (CPLD) condition. The latter was first conceptualized by Qi and Wei [30], and then proved to be a constraint qualification by Andreani et al. [14]. Moreover, it has inspired other authors to define even weaker constraint qualifications for NLP, such as the constant rank of the subspace component (CRSC) [11] and the relaxed versions of CRCQ [27] and CPLD [10]. Our interest in constant rank-type conditions is motivated mainly by their applications to global convergence of iterative algorithms to stationary points without relying on boundedness or uniqueness of Lagrange multipliers. Several other applications that we do not pursue in this paper may also be expected to extend to the conic context, such as the computation of the derivative of the value function [25, 28] and the validity of strong second-order necessary optimality conditions that do not rely on the whole set of Lagrange multipliers [1]. Besides, their ability to deal with redundant constraints, up to some extent, gives modellers some degree of freedom without losing regularity or convergence guarantees for algorithms. For instance, the standard NLP trick of replacing one nondegenerate equality constraint by two inequalities of opposite sign does not violate CRCQ, while it violates the standard Mangasarian-Fromovitz CQ (MFCQ).

Constant rank-type CQs have been proposed in conic programming only very recently. The first extension of CRCQ to nonlinear second-order cone programming (NSOCP) appeared in [36], but it was shown to be incorrect in [2]. A second proposal [9], which also encompasses nonlinear semidefinite programming (NSDP) problems, consists of transforming some of the conic constraints into NLP constraints via a reduction function, whenever possible, and then demanding constant linear dependence of the reduced constraints, locally.

The authors received financial support from FAPESP (grants 2017/18308-2, 2017/17840-2, and 2018/24293-0), CNPq (grants 301888/2017-5, 303427/2018-3, 404656/2018-8, and 306988/2021-6), and ANID (FONDECYT grant 1201982 and Basal Fund CMM FB210005).

Department of Applied Mathematics, University of Campinas, Campinas, SP, Brazil. Email: andreani@unicamp.br

Department of Applied Mathematics, University of São Paulo, São Paulo, SP, Brazil. Emails: ghaeser@ime.usp.br, leokoto@ime.usp.br

Departamento de Ingeniería Matemática and Centro de Modelamiento Matemático (AFB170001 - CNRS IRL2807), Universidad de Chile, Santiago, Chile. Email: hramirez@dim.uchile.cl


The authors themselves considered this a naive extension, since it basically avoids the main difficulties that are expected from a conic framework. What both of these works have in common is that they somehow neglect the conic structure of the problem.

In a recent article [6], we introduced weak notions of regularity for nonlinear semidefinite programming (NSDP) defined in terms of the eigenvectors of the constraints, therein called weak-nondegeneracy and weak-Robinson's CQ. These conditions take into consideration only the diagonal entries of a particular transformation of the matrix constraint. Notably, weak-nondegeneracy happens to be equivalent to the linear independence CQ (LICQ) when an NLP constraint is modeled as a structurally diagonal matrix constraint, unlike the standard nondegeneracy condition [34], which is considered the usual extension of LICQ to NSDP. Moreover, the proof technique we employed in [6] induces a direct application to the convergence theory of an external penalty method. In this paper, we use these conditions to derive our extension proposals for CRCQ and CPLD to NSDP, which we call weak-CRCQ and weak-CPLD, respectively.

However, to provide support for algorithms other than the external penalty method, we present stronger variants of these conditions, called sequential-CRCQ and sequential-CPLD (abbreviated seq-CRCQ and seq-CPLD, respectively), obtained by incorporating perturbations in their definitions. This makes them robust and easily connectible with algorithms that keep track of approximate Lagrange multipliers, but also more demanding.

Nevertheless, seq-CRCQ is still strictly weaker than nondegeneracy and independent of Robinson's CQ, while seq-CPLD is strictly weaker than Robinson's CQ. On the other hand, weak-CRCQ is strictly weaker than seq-CRCQ, while weak-CPLD is strictly weaker than weak-CRCQ and seq-CPLD. Moreover, we show that seq-CPLD implies the metric subregularity CQ. The global convergence results obtained under seq-CPLD, which is strictly weaker than Robinson's CQ, are the best results known to the authors for the augmented Lagrangian, sequential quadratic programming, and interior point methods discussed in this paper. In particular, we do not assume boundedness of the dual sequence, a somewhat stringent but commonly employed assumption on the behavior of these algorithms.

The content of this paper is organized as follows: Section 2 introduces notation and some well-known theorems and definitions that will be useful in the sequel. Our main results for NSDP are presented in Sections 3 and 4. Section 3 is devoted to the study of weak-CRCQ and weak-CPLD and their properties, which invoke weak-nondegeneracy and weak-Robinson's CQ as motivation. Section 4 studies seq-CRCQ and seq-CPLD, the main CQs of this paper, and some algorithms supported by them. In Section 5, we discuss the relationship between seq-CPLD and the metric subregularity CQ. Lastly, some final remarks are given in Section 6. To keep the paper concise, we focus on weak- and seq-CPLD, since the extensions of CRCQ follow analogously.

2 A nonlinear semidefinite programming review

In this section, $\mathbb{S}^m$ denotes the linear space of all $m\times m$ real symmetric matrices equipped with the inner product defined as $\langle M,N\rangle := \operatorname{trace}(MN) = \sum_{i,j=1}^m M_{ij}N_{ij}$ for all $M,N\in\mathbb{S}^m$, and $\mathbb{S}^m_+$ is the cone of all positive semidefinite matrices in $\mathbb{S}^m$. Additionally, for every $M\in\mathbb{S}^m$ and every $\tau>0$, we denote by $B(M,\tau) := \{Z\in\mathbb{S}^m : \|M-Z\| < \tau\}$ the open ball centered at $M$ with radius $\tau$ with respect to the Frobenius norm $\|M\| := \sqrt{\langle M,M\rangle}$, and its closure will be denoted by $\operatorname{cl}(B(M,\tau))$.

We consider the NSDP problem in standard (dual) form:

$$\underset{x\in\mathbb{R}^n}{\text{Minimize}}\ f(x), \quad \text{subject to} \quad G(x) \succeq 0, \qquad \text{(NSDP)}$$

where $f:\mathbb{R}^n\to\mathbb{R}$ and $G:\mathbb{R}^n\to\mathbb{S}^m$ are continuously differentiable functions, and $\succeq$ is the partial order induced by $\mathbb{S}^m_+$; that is, $M \succeq N$ if, and only if, $M-N \in \mathbb{S}^m_+$.

Equality constraints are omitted in (NSDP) for simplicity of notation, but our definitions and results are flexible regarding the inclusion of such constraints, which should be done in the same way as in [9]. Moreover, throughout the whole paper, we will denote the feasible set of (NSDP) by $\mathcal{F}$.

Let us recall that the orthogonal projection of an element $M\in\mathbb{S}^m$ onto $\mathbb{S}^m_+$, which is defined as

$$\Pi_{\mathbb{S}^m_+}(M) := \underset{N\in\mathbb{S}^m_+}{\operatorname{argmin}}\ \|M-N\|,$$

is a Lipschitz continuous function of $M$ with modulus 1. Furthermore, since $\mathbb{S}^m_+$ is self-dual, every $M\in\mathbb{S}^m$ has a Moreau decomposition [29, Prop. 1] in the form

$$M = \Pi_{\mathbb{S}^m_+}(M) - \Pi_{\mathbb{S}^m_+}(-M)$$

with $\langle \Pi_{\mathbb{S}^m_+}(M),\, \Pi_{\mathbb{S}^m_+}(-M)\rangle = 0$, and a spectral decomposition in the form

$$M = \lambda_1(M)\,u_1(M)u_1(M)^\top + \ldots + \lambda_m(M)\,u_m(M)u_m(M)^\top, \qquad (1)$$

where $u_1(M),\ldots,u_m(M)\in\mathbb{R}^m$ are arbitrarily chosen orthonormal eigenvectors associated with the eigenvalues $\lambda_1(M),\ldots,\lambda_m(M)$, respectively. In turn, these eigenvalues are assumed to be arranged in non-increasing order. Equivalently, we can write (1) as $M = UDU^\top$, where $U$ is an orthogonal matrix whose $i$-th column is $u_i(M)$, and $D := \operatorname{Diag}(\lambda_1(M),\ldots,\lambda_m(M))$ is the matrix whose diagonal entries are $\lambda_1(M),\ldots,\lambda_m(M)$ and whose remaining entries are zero.

A convenient property of the orthogonal projection onto $\mathbb{S}^m_+$ is that, for every $M\in\mathbb{S}^m$, we have

$$\Pi_{\mathbb{S}^m_+}(M) = [\lambda_1(M)]_+\,u_1(M)u_1(M)^\top + \ldots + [\lambda_m(M)]_+\,u_m(M)u_m(M)^\top,$$

where $[\,\cdot\,]_+ := \max\{\,\cdot\,,0\}$.
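These two facts are easy to check numerically. The sketch below (illustrative, not part of the paper) computes the projection via a spectral decomposition with NumPy and verifies the Moreau decomposition:

```python
import numpy as np

def proj_psd(M):
    """Orthogonal projection of a symmetric matrix M onto the PSD cone:
    clip the negative eigenvalues to zero in a spectral decomposition."""
    lam, U = np.linalg.eigh(M)            # M = U diag(lam) U^T
    return (U * np.maximum(lam, 0.0)) @ U.T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = (A + A.T) / 2                          # a random symmetric matrix

P, Q = proj_psd(M), proj_psd(-M)
# Moreau decomposition: M = Pi(M) - Pi(-M), with <Pi(M), Pi(-M)> = 0
assert np.allclose(M, P - Q)
assert abs(np.trace(P @ Q)) < 1e-10
# both parts are positive semidefinite
assert np.linalg.eigvalsh(P).min() > -1e-10
assert np.linalg.eigvalsh(Q).min() > -1e-10
```

Since both projections are built from the same eigenbasis of $M$ (a matrix with simple spectrum here), the two parts are supported on disjoint eigenspaces, which is exactly the orthogonality condition above.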

Given a sequence of sets $\{S_k\}_{k\in\mathbb{N}}$, recall its outer limit (or upper limit) in the sense of Painlevé-Kuratowski (cf. [32, Def. 4.1] or [17, Def. 2.52]), defined as

$$\operatorname*{Lim\,sup}_{k\in\mathbb{N}} S_k := \big\{\, y : \exists\, I\subseteq\mathbb{N},\ \exists\, \{y^k\}_{k\in I}\to y,\ \forall k\in I,\ y^k\in S_k \,\big\},$$

which is the collection of all cluster points of sequences $\{y^k\}_{k\in\mathbb{N}}$ such that $y^k\in S_k$ for every $k\in\mathbb{N}$. The notation $I\subseteq\mathbb{N}$ means that $I$ is an infinite subset of the set of natural numbers $\mathbb{N}$.

We denote the Jacobian of $G$ at a given point $x\in\mathbb{R}^n$ by $DG(x)$, and the adjoint operator of $DG(x)$ will be denoted by $DG(x)^*$. Moreover, the $i$-th partial derivative of $G$ at $x$ will be denoted by $D_{x_i}G(x)$, and the gradient of $f$ at $x$ will be written as $\nabla f(x)$, for every $x\in\mathbb{R}^n$.

2.1 Classical optimality conditions and constraint qualifications

As usual in continuous optimization, we direct our attention to local solutions of (NSDP) that satisfy the so-called Karush-Kuhn-Tucker (KKT) conditions, defined as follows:

Definition 2.1. We say that the Karush-Kuhn-Tucker (KKT) conditions hold at $x\in\mathcal{F}$ when there exists some $Y \succeq 0$ such that

$$\nabla_x L(x,Y) = 0 \quad\text{and}\quad \langle G(x), Y\rangle = 0,$$

where $L(x,Y) := f(x) - \langle G(x), Y\rangle$ is the Lagrangian function of (NSDP). The matrix $Y$ is called a Lagrange multiplier associated with $x$, and the set of all Lagrange multipliers associated with $x$ will be denoted by $\Lambda(x)$.

Of course, not every local minimizer satisfies the KKT conditions in the absence of a CQ. In order to recall some classical CQs, it is necessary to use the (Bouligand) tangent cone to $\mathbb{S}^m_+$ at a point $M \succeq 0$. This object can be characterized in terms of any matrix $E\in\mathbb{R}^{m\times(m-r)}$ whose columns form an orthonormal basis of $\operatorname{Ker} M$, as follows (e.g., [17, Ex. 2.65]):

$$T_{\mathbb{S}^m_+}(M) = \{N\in\mathbb{S}^m : E^\top N E \succeq 0\}, \qquad (2)$$

where $r$ denotes the rank of $M$. So, its lineality space, defined as the largest linear space contained in $T_{\mathbb{S}^m_+}(M)$, is computed as follows:

$$\operatorname{lin}(T_{\mathbb{S}^m_+}(M)) = \{N\in\mathbb{S}^m : E^\top N E = 0\}. \qquad (3)$$

The latter is a direct consequence of the identity $\operatorname{lin}(C) = C\cap(-C)$, satisfied for any closed convex cone $C$.

One of the most recognized constraint qualifications in NSDP is the nondegeneracy (or transversality) condition introduced by Shapiro and Fan [34], which can be characterized [17, Eq. 4.172] at a point $x\in\mathcal{F}$ by the relation

$$\operatorname{Im} DG(x) + \operatorname{lin}\big(T_{\mathbb{S}^m_+}(G(x))\big) = \mathbb{S}^m.$$

If $x$ is a local solution of (NSDP) that satisfies nondegeneracy, then $\Lambda(x)$ is a singleton, but the converse is not necessarily true unless $\operatorname{rank}(G(x)) + \operatorname{rank}(Y) = m$ holds for some $Y\in\Lambda(x)$ [17, Prop. 4.75]. This last condition is known as strict complementarity in the NSDP framework. By (2) it is possible to characterize nondegeneracy at $x$ by means of any given matrix $E$ with orthonormal columns that span $\operatorname{Ker} G(x)$. Indeed, following [17, Sec. 4.6.1], nondegeneracy holds at $x$ if, and only if, either $\operatorname{Ker} G(x) = \{0\}$ or the linear mapping $\psi_x : \mathbb{R}^n \to \mathbb{S}^{m-r}$ given by

$$\psi_x(\cdot) := E^\top DG(x)[\cdot]\, E \qquad (4)$$

is surjective, which is in turn equivalent to saying that the vectors

$$v_{ij}(x,E) := \big[\, e_i^\top D_{x_1}G(x)\, e_j,\ \ldots,\ e_i^\top D_{x_n}G(x)\, e_j \,\big]^\top, \quad 1\leq i\leq j\leq m-r, \qquad (5)$$

are linearly independent [33, Prop. 6], where $e_i$ denotes the $i$-th column of $E$ and $r$ is the rank of $G(x)$.
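For concreteness, the vectors from (5) can be assembled numerically. The helper below is an illustrative sketch (the names and the finite-difference approximation of $D_{x_\ell}G$ are ours, not the paper's):

```python
import numpy as np

def v_vectors(Gfun, x, E, h=1e-6):
    """Assemble all v_ij(x, E) from (5):
    v_ij(x, E) = (e_i^T D_{x1}G(x) e_j, ..., e_i^T D_{xn}G(x) e_j),
    with each D_{xl}G approximated by forward differences (illustrative)."""
    n, p = len(x), E.shape[1]
    G0 = Gfun(x)
    DG = []                                # partial derivatives D_{xl}G(x)
    for l in range(n):
        xl = x.copy(); xl[l] += h
        DG.append((Gfun(xl) - G0) / h)
    v = {}
    for i in range(p):
        for j in range(i, p):
            v[(i, j)] = np.array([E[:, i] @ DG[l] @ E[:, j] for l in range(n)])
    return v

# e.g. the diagonal constraint G(x) = Diag(x1, x2): v_11 = e_1 of R^2, etc.
Gdiag = lambda x: np.diag(x)
v = v_vectors(Gdiag, np.array([0.0, 0.0]), np.eye(2))
assert np.allclose(v[(0, 0)], [1, 0], atol=1e-5)
assert np.allclose(v[(1, 1)], [0, 1], atol=1e-5)
assert np.allclose(v[(0, 1)], [0, 0], atol=1e-5)
```

The diagonal check mirrors the paper's remark that a structurally diagonal constraint recovers the NLP gradients $\nabla g_i(x)$ as the $v_{ii}$ vectors.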

Another widespread constraint qualification is Robinson's CQ [31], which can be characterized at $x\in\mathcal{F}$ by the existence of some $d\in\mathbb{R}^n$ such that

$$G(x) + DG(x)[d] \in \operatorname{int} \mathbb{S}^m_+, \qquad (6)$$

where $\operatorname{int}\mathbb{S}^m_+$ stands for the topological interior of $\mathbb{S}^m_+$. It is known (e.g., [17, Props. 3.9 and 3.17]) that when $x$ is a local solution of (NSDP), then $\Lambda(x)$ is nonempty and compact if, and only if, Robinson's CQ holds at $x$.

Given the properties and characterizations recalled above, the nondegeneracy condition is typically considered the natural extension of LICQ from NLP to NSDP, while Robinson's CQ is considered the extension of MFCQ.

2.2 A sequential optimality condition connected to the external penalty method

If we do not assume any CQ, every local minimizer of (NSDP) can still be proved to satisfy at least a sequential type of optimality condition that is deeply connected to the classical external penalty method.

Namely:

Theorem 2.1. Let $x$ be a local minimizer of (NSDP), and let $\{\rho_k\}_{k\in\mathbb{N}} \to +\infty$. Then, there exists some $\{x^k\}_{k\in\mathbb{N}} \to x$ such that, for each $k\in\mathbb{N}$, $x^k$ is a local minimizer of the regularized penalized function

$$F_k(z) := f(z) + \frac{1}{2}\|z - x\|^2 + \frac{\rho_k}{2}\,\big\|\Pi_{\mathbb{S}^m_+}(-G(z))\big\|^2.$$

Proof. See [12, Thm. 2]. For a more general proof, see the first part of the proof of [4, Thm. 2].

Note that Theorem 2.1 provides a sequence $\{x^k\}_{k\in\mathbb{N}} \to x$ such that each $x^k$ satisfies, with an error $\varepsilon_k \to 0^+$, the first-order optimality condition of the unconstrained minimization problem

$$\underset{z\in\mathbb{R}^n}{\text{Minimize}}\ f(z) + \frac{\rho_k}{2}\,\big\|\Pi_{\mathbb{S}^m_+}(-G(z))\big\|^2,$$

so $\{x^k\}_{k\in\mathbb{N}}$ characterizes an output sequence of an external penalty method. Moreover, the sequence $\{Y^k\}_{k\in\mathbb{N}} \subseteq \mathbb{S}^m_+$, where

$$Y^k := \rho_k\, \Pi_{\mathbb{S}^m_+}(-G(x^k))$$

for every $k\in\mathbb{N}$, consists of approximate Lagrange multipliers for $x$, in the sense that $\nabla_x L(x^k,Y^k) \to 0$ while complementarity and feasibility are approximately fulfilled, in view of Moreau's decomposition; indeed, note that $\langle G(x^k)+\Delta_k,\, Y^k\rangle = 0$ and $G(x^k)+\Delta_k \succeq 0$, with $\Delta_k = \Pi_{\mathbb{S}^m_+}(-G(x^k)) \to 0$, for every $k\in\mathbb{N}$.
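As an illustrative sketch (on a hypothetical toy problem, not from the paper), one can mimic this external penalty scheme and watch the approximate multipliers $Y^k = \rho_k\,\Pi_{\mathbb{S}^m_+}(-G(x^k))$ converge:

```python
import numpy as np

def proj_psd(M):
    lam, U = np.linalg.eigh(M)
    return (U * np.maximum(lam, 0.0)) @ U.T

# Toy NSDP (assumed for illustration):
#   minimize f(x) = x  subject to  G(x) = Diag(x, x + 1) >= 0,
# whose solution is x = 0 with Lagrange multiplier Y = Diag(1, 0).
f = lambda x: x
G = lambda x: np.diag([x, x + 1.0])

def penalty_grad(x, rho, h=1e-7):
    # gradient of  f(x) + (rho/2) ||Pi_{S+}(-G(x))||^2  by central differences
    phi = lambda z: f(z) + 0.5 * rho * np.linalg.norm(proj_psd(-G(z)), 'fro')**2
    return (phi(x + h) - phi(x - h)) / (2 * h)

x = 1.0
for rho in [1.0, 10.0, 100.0, 1000.0]:      # external penalty loop, rho_k -> inf
    for _ in range(2000):                    # crude gradient descent per subproblem
        x -= 0.5 / rho * penalty_grad(x, rho)
    Y = rho * proj_psd(-G(x))                # approximate multiplier Y^k

assert abs(x) < 1e-2                          # x^k -> 0
assert np.allclose(Y, np.diag([1.0, 0.0]), atol=1e-2)
```

For this toy constraint the penalized minimizer is $x_k = -1/\rho_k$, so $Y^k = \operatorname{Diag}(\rho_k|x_k|, 0)$ converges to the true multiplier even though $x^k$ stays slightly infeasible, exactly the behavior described above.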

These sequences will suffice to obtain the results of the first part of this paper (Section 3), but in order to extend their scope to a larger class of iterative algorithms, in Section4, we will need a more general sequential optimality condition, which will be presented later on.

2.3 Reviewing constant rank-type constraint qualifications for NLP

This section is meant to be a brief review of the main results regarding the classical nonlinear programming problem:

$$\underset{x\in\mathbb{R}^n}{\text{Minimize}}\ f(x), \quad \text{subject to} \quad g_1(x)\geq 0,\ \ldots,\ g_m(x)\geq 0, \qquad \text{(NLP)}$$

where $f, g_1,\ldots,g_m : \mathbb{R}^n \to \mathbb{R}$ are continuously differentiable functions.

As far as we know, the first constant rank-type constraint qualification was introduced by Janin [25], to obtain directional derivatives for the optimal value function of a perturbed NLP problem. Janin’s condition is defined as follows:

Definition 2.2. Let $x\in\mathcal{F}$. The constant rank constraint qualification for (NLP) (CRCQ) holds at $x$ if there exists a neighborhood $V$ of $x$ such that, for every subset $J \subseteq \{i\in\{1,\ldots,m\} : g_i(x) = 0\}$, the rank of the family $\{\nabla g_i(x)\}_{i\in J}$ remains constant for all $x\in V$.


As noticed by Qi and Wei [30], it is possible to rephrase Definition 2.2 in terms of the "constant linear dependence" of $\{\nabla g_i(x)\}_{i\in J}$ for every $J$. That is, CRCQ holds at $x$ if, and only if, there exists a neighborhood $V$ of $x$ such that, for every $J \subseteq \{i\in\{1,\ldots,m\} : g_i(x) = 0\}$, if $\{\nabla g_i(x)\}_{i\in J}$ is linearly dependent, then $\{\nabla g_i(x)\}_{i\in J}$ remains linearly dependent for every $x\in V$. Based on this characterization, Qi and Wei proposed a relaxed version of CRCQ, called the constant positive linear dependence (CPLD) condition, to obtain improved global convergence results for a sequential quadratic programming method; however, it was only proven to be a constraint qualification a few years later, in [14]. To properly present CPLD, recall that a family of vectors $\{z_i\}_{i\in J}$ of $\mathbb{R}^n$ is said to be positively linearly independent when

$$\sum_{i\in J} \alpha_i z_i = 0,\ \ \alpha_i \geq 0\ \ \forall i\in J \quad\Longrightarrow\quad \alpha_i = 0\ \ \forall i\in J.$$

Next, we recall the CPLD constraint qualification:

Definition 2.3. Let $x\in\mathcal{F}$. The constant positive linear dependence condition for (NLP) (CPLD) holds at $x$ if there exists a neighborhood $V$ of $x$ such that, for every $J \subseteq \{i\in\{1,\ldots,m\} : g_i(x) = 0\}$, if the family $\{\nabla g_i(x)\}_{i\in J}$ is positively linearly dependent, then $\{\nabla g_i(x)\}_{i\in J}$ remains linearly dependent for all $x\in V$.

Clearly, CPLD is implied by CRCQ, which is in turn implied by LICQ and is independent of MFCQ. Moreover, CPLD is implied by MFCQ, and all these implications are strict [14, 25]. To prove the main results of this paper (Theorems 3.1 and 4.2), we take inspiration from [10], where the authors employ Theorem 2.1 together with the well-known Carathéodory's Lemma:

Lemma 2.1 (Exercise B.1.7 of [15]). Let $z_1,\ldots,z_p \in \mathbb{R}^n$, and let $\alpha_1,\ldots,\alpha_p \in \mathbb{R}$ be arbitrary. Then, there exist some $J \subseteq \{1,\ldots,p\}$ and some scalars $\tilde{\alpha}_i$ with $i\in J$, such that $\{z_i\}_{i\in J}$ is linearly independent,

$$\sum_{i=1}^{p} \alpha_i z_i = \sum_{i\in J} \tilde{\alpha}_i z_i,$$

and $\alpha_i \tilde{\alpha}_i > 0$, for all $i\in J$.

If one considers equality constraints in (NSDP) separately, one should employ an adapted version of Carathéodory's Lemma that fixes a particular subset of vectors, which can be found in [10, Lem. 2]. In our current setting, Lemma 2.1 suffices as is.
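A constructive version of the lemma, for the conic case $\alpha_i \geq 0$ used later in the proof of Theorem 3.1, can be sketched as follows (an illustrative implementation, not from the paper):

```python
import numpy as np

def caratheodory_conic(Z, alpha, tol=1e-10):
    """Given columns z_i of Z and coefficients alpha_i >= 0, return an index
    set J and coefficients a such that {z_i}_{i in J} is linearly independent
    and  sum_i alpha_i z_i == sum_{i in J} a_i z_i  with a_i > 0.
    (Conic version of Caratheodory's lemma, by iterative reduction.)"""
    J = [i for i in range(Z.shape[1]) if alpha[i] > tol]
    a = alpha.astype(float).copy()
    while J:
        ZJ = Z[:, J]
        if np.linalg.matrix_rank(ZJ, tol=tol) == len(J):
            break                          # columns indexed by J are independent
        _, _, Vt = np.linalg.svd(ZJ)       # null-space direction of ZJ
        lam = Vt[-1]
        if lam.max() <= tol:               # make sure some entry is positive
            lam = -lam
        # largest step keeping all coefficients nonnegative; one hits zero
        t = min(a[J[i]] / lam[i] for i in range(len(J)) if lam[i] > tol)
        for i, j in enumerate(J):
            a[j] -= t * lam[i]
        J = [j for j in J if a[j] > tol]
    return J, a

# Example: three dependent vectors in R^2
Z = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
alpha = np.array([1.0, 2.0, 3.0])
J, a = caratheodory_conic(Z, alpha)
assert np.allclose(Z @ alpha, Z[:, J] @ a[J])          # same combination
assert np.linalg.matrix_rank(Z[:, J]) == len(J)        # independent subset
```

Each pass shifts the coefficients along a null-space direction until one of them vanishes, so the index set shrinks and the loop terminates in at most $p$ passes.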

3 Weak constant rank constraint qualifications for NSDP

The main purpose of this section is to present an extension of the CPLD condition to NSDP, but to do so we first need to understand what CRCQ looks like in NSDP, which is a challenging problem on its own. Based on the relation between LICQ and CRCQ in NLP, the most natural candidate for CRCQ in NSDP consists of demanding that every subset of

$$\{v_{ij}(x,E) : 1\leq i\leq j\leq m-r\}$$

remain of constant rank (or constantly linearly dependent) in a neighborhood of $x$. However, this candidate cannot be a CQ, as shown in the following counterexample, adapted from [2, Eq. 2]:

Example 3.1. Consider the problem of minimizing $f(x) := -x$ subject to

$$G(x) := \begin{bmatrix} x & x+x^2 \\ x+x^2 & x \end{bmatrix} \succeq 0.$$

For this problem, $x := 0$ is the only feasible point and, therefore, the unique global minimizer of the problem. Since $G(x) = 0$, the columns of the matrix $E := I_2$ form an orthonormal basis of $\operatorname{Ker} G(x)$ (the whole space $\mathbb{R}^2$). For this choice of $E$, we have

$$v_{11}(x,E) = v_{22}(x,E) = 1 \quad\text{and}\quad v_{12}(x,E) = 1 + 2x.$$

Since these are all bounded away from zero, the rank of every subset of $\{v_{ij}(x,E) : 1\leq i\leq j\leq 2\}$ remains constant for every $x$ around $0$. However, note that $x = 0$ does not satisfy the KKT conditions, because any

$$Y := \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{12} & Y_{22} \end{bmatrix} \in \Lambda(x)$$

would necessarily be a solution of the system

$$Y_{11} \geq 0, \qquad Y_{22} \geq 0, \qquad Y_{11}Y_{22} - Y_{12}^2 \geq 0, \qquad Y_{11} + 2Y_{12} + Y_{22} = -1,$$

which has no solution.

Besides, it is well known that even if $G$ is affine, not all local minimizers of (NSDP) satisfy the KKT conditions; yet in this case every subfamily of $\{v_{ij}(x,E) : 1\leq i\leq j\leq m-r\}$ has constant rank for every $x\in\mathbb{R}^n$.

What Example 3.1 tells us is that $E = I_2$ may be a bad choice of $E$. In fact, let us choose a different $E$: denote the columns of $E$ by $e_1 := [a,b]^\top$ and $e_2 := [c,d]^\top$, and take $a = -1/\sqrt{2}$ and $b = c = d = 1/\sqrt{2}$. This choice of $E$ happens to diagonalize $G(x)$ for every $x$, but it follows that

$$v_{11}(x,E) = 1 + 2ab(1+2x) = -2x; \qquad v_{22}(x,E) = 1 + 2cd(1+2x) = 2(1+x); \qquad v_{12}(x,E) = (ad+bc)(1+2x) = 0,$$

and the rank of $\{v_{11}(x,E)\}$ does not remain constant in a neighborhood of $x = 0$.
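Both computations can be confirmed numerically (an illustrative check; $DG$ is hand-coded from Example 3.1, where $n = 1$ makes each $v_{ij}$ a scalar):

```python
import numpy as np

# Example 3.1: G(x) = [[x, x + x^2], [x + x^2, x]], so
# DG(x) = [[1, 1 + 2x], [1 + 2x, 1]]
DG = lambda x: np.array([[1.0, 1 + 2*x], [1 + 2*x, 1.0]])
vij = lambda x, E, i, j: E[:, i] @ DG(x) @ E[:, j]

# Choice 1: E = I_2 gives v_11 = v_22 = 1 and v_12 = 1 + 2x,
# all bounded away from zero near x = 0 (constant ranks).
E = np.eye(2)
for x in [-0.1, 0.0, 0.1]:
    assert vij(x, E, 0, 0) == 1.0 and vij(x, E, 1, 1) == 1.0
    assert np.isclose(vij(x, E, 0, 1), 1 + 2*x)

# Choice 2: the eigenvector basis Ep diagonalizes G(x) for every x,
# and v_11 = -2x vanishes exactly at x = 0 (rank of {v_11} not constant).
Ep = np.array([[-1.0, 1.0], [1.0, 1.0]]) / np.sqrt(2)
for x in [-0.1, 0.0, 0.1]:
    assert np.isclose(vij(x, Ep, 0, 0), -2*x)
    assert np.isclose(vij(x, Ep, 1, 1), 2*(1 + x))
    assert np.isclose(vij(x, Ep, 0, 1), 0.0)
```

The two bases thus disagree on whether the candidate "constant rank" property holds, which is precisely the point of the example.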

In light of our previous work [6], the situation presented above is not surprising. Therein, we already noted that identifying the "good" matrices $E$ allows us to obtain relaxed versions of nondegeneracy and Robinson's CQ for NSDP. This identification can also be used to extend constant rank-type conditions to NSDP, and it is the starting point for the results we will present.

For the sake of completeness, let us quickly summarize a discussion raised in [6] before presenting the results of this paper. Consider a feasible point $x\in\mathcal{F}$ and denote by $r$ the rank of $G(x)$. Observe that $\lambda_r(M) > \lambda_{r+1}(M)$ for every $M\in\mathbb{S}^m$ close enough to $G(x)$. Thus, when $r < m$, define the set

$$\mathcal{E}_r(M) := \Big\{ E\in\mathbb{R}^{m\times(m-r)} :\ ME = E\,\operatorname{Diag}(\lambda_{r+1}(M),\ldots,\lambda_m(M)),\ \ E^\top E = I_{m-r} \Big\}, \qquad (7)$$

which consists of all matrices whose columns are orthonormal eigenvectors associated with the $m-r$ smallest eigenvalues of $M$; it is well defined whenever $\lambda_r(M) > \lambda_{r+1}(M)$. In (7), $\operatorname{Diag}(\lambda_{r+1}(M),\ldots,\lambda_m(M))$ denotes the diagonal matrix whose diagonal entries are $\lambda_{r+1}(M),\ldots,\lambda_m(M)$. By convention, $\mathcal{E}_r(M) := \emptyset$ when $r = m$. By construction, $\mathcal{E}_r(M)$ is nonempty provided $r < m$ and $M$ is close enough to $G(x)$. In particular, in this situation, $\mathcal{E}_r(G(x))$ is the set of all matrices with orthonormal columns that span $\operatorname{Ker} G(x)$.
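Numerically, one element of $\mathcal{E}_r(M)$ can be obtained from any symmetric eigendecomposition; the sketch below (illustrative, with our own naming) reorders NumPy's ascending eigenvalues to the paper's non-increasing convention:

```python
import numpy as np

def one_element_of_E_r(M, r):
    """Return one element of the set E_r(M) from (7): orthonormal eigenvectors
    for the m - r smallest eigenvalues of M, with eigenvalues arranged
    non-increasingly (well defined when lambda_r(M) > lambda_{r+1}(M))."""
    m = M.shape[0]
    if r == m:
        return None                        # E_r(M) = empty set, by convention
    lam, U = np.linalg.eigh(M)             # eigh returns ascending eigenvalues
    idx = np.argsort(lam)[::-1]            # reorder to non-increasing
    lam, U = lam[idx], U[:, idx]
    E = U[:, r:]                           # eigenvectors for lambda_{r+1..m}
    assert np.allclose(M @ E, E @ np.diag(lam[r:]))   # M E = E Diag(...)
    assert np.allclose(E.T @ E, np.eye(m - r))        # orthonormal columns
    return E

# At a PSD matrix of rank 1, the columns of E span the kernel:
M = np.diag([3.0, 0.0, 0.0])
E = one_element_of_E_r(M, r=1)
assert np.allclose(M @ E, 0.0)             # columns lie in Ker M
```

This recovers only a single member of $\mathcal{E}_r(M)$; the set itself contains every orthonormal re-combination within repeated eigenspaces, which is exactly why the outer limit over $\mathcal{E}_r(G(x^k))$ matters in the definitions below.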

We showed, in [6, Prop. 3.2], that nondegeneracy can be equivalently stated as the linear independence of the smaller family $\{v_{ii}(x,E)\}_{i\in\{1,\ldots,m-r\}}$, as long as this holds for all $E\in\mathcal{E}_r(G(x))$ instead of a fixed one. Similarly, Robinson's CQ can be translated as the positive linear independence of the family $\{v_{ii}(x,E)\}_{i\in\{1,\ldots,m-r\}}$ for every $E\in\mathcal{E}_r(G(x))$ [6, Prop. 5.1]. This characterization suggested a weak form of nondegeneracy (and of Robinson's CQ) that takes into account only a particular subset of $\mathcal{E}_r(G(x))$ instead of the whole set, which reads as follows:

Definition 3.1 (Def. 3.2 and Def. 5.1 of [6]). Let $x\in\mathcal{F}$ and let $r$ be the rank of $G(x)$. We say that $x$ satisfies the:

• Weak-nondegeneracy condition for NSDP if either $r = m$ or, for each sequence $\{x^k\}_{k\in\mathbb{N}} \to x$, there exists some $E \in \operatorname*{Lim\,sup}_{k\in\mathbb{N}} \mathcal{E}_r(G(x^k))$¹ such that the family $\{v_{ii}(x,E)\}_{i\in\{1,\ldots,m-r\}}$ is linearly independent;

• Weak-Robinson's CQ condition for NSDP if either $r = m$ or, for each sequence $\{x^k\}_{k\in\mathbb{N}} \to x$, there exists some $E \in \operatorname*{Lim\,sup}_{k\in\mathbb{N}} \mathcal{E}_r(G(x^k))$ such that the family $\{v_{ii}(x,E)\}_{i\in\{1,\ldots,m-r\}}$ is positively linearly independent.

Note that, in general, $\operatorname*{Lim\,sup}_{k\in\mathbb{N}} \mathcal{E}_r(G(x^k)) \subseteq \mathcal{E}_r(G(x))$, but the reverse inclusion is not always true, meaning $\mathcal{E}_r(G(\cdot))$ is not necessarily continuous at $x$ as a set-valued mapping. It then follows that weak-nondegeneracy is indeed a strictly weaker CQ than nondegeneracy [6, Ex. 3.1]. Moreover, in contrast with nondegeneracy, weak-nondegeneracy fully recovers LICQ when $G(x)$ is a structurally diagonal matrix constraint of the form $G(x) := \operatorname{Diag}(g_1(x),\ldots,g_m(x))$ [6, Prop. 3.3]. Similarly, weak-Robinson's CQ is implied by Robinson's CQ and coincides with MFCQ when $G(x)$ is diagonal.

3.1 Weak-CRCQ and weak-CPLD

A straightforward relaxation of weak-nondegeneracy and weak-Robinson's CQ, as in NLP, leads to our first extension proposals of CRCQ and CPLD, respectively, to NSDP:

Definition 3.2 (weak-CRCQ and weak-CPLD). Let $x\in\mathcal{F}$ and let $r$ be the rank of $G(x)$. We say that $x$ satisfies the:

¹Without further mention, we will endow $\mathbb{R}^{m\times(m-r)}$ with a norm, which can be the Frobenius norm or a column-wise maximum norm, for instance.


• Weak constant rank constraint qualification for NSDP (weak-CRCQ) if either $r = m$ or, for each sequence $\{x^k\}_{k\in\mathbb{N}} \to x$, there exists some $E \in \operatorname*{Lim\,sup}_{k\in\mathbb{N}} \mathcal{E}_r(G(x^k))$ such that, for every subset $J \subseteq \{1,\ldots,m-r\}$: if the family $\{v_{ii}(x,E)\}_{i\in J}$ is linearly dependent, then $\{v_{ii}(x^k,E^k)\}_{i\in J}$ remains linearly dependent, for all $k\in I$ large enough.

• Weak constant positive linear dependence constraint qualification for NSDP (weak-CPLD) if either $r = m$ or, for each sequence $\{x^k\}_{k\in\mathbb{N}} \to x$, there exists some $E \in \operatorname*{Lim\,sup}_{k\in\mathbb{N}} \mathcal{E}_r(G(x^k))$ such that, for every subset $J \subseteq \{1,\ldots,m-r\}$: if the family $\{v_{ii}(x,E)\}_{i\in J}$ is positively linearly dependent, then $\{v_{ii}(x^k,E^k)\}_{i\in J}$ remains linearly dependent, for all $k\in I$ large enough.

For both definitions, $I\subseteq\mathbb{N}$ is infinite and $\{E^k\}_{k\in I}$ is a sequence converging to $E$ such that $E^k\in\mathcal{E}_r(G(x^k))$ for every $k\in I$, as required by the Painlevé-Kuratowski outer limit.

Observe that if weak-nondegeneracy holds at $x$, then weak-CRCQ is vacuously true (because there is no linearly dependent family of vectors $\{v_{ii}(x,E)\}_{i\in J}$ with $J\subseteq\{1,\ldots,m-r\}$), and weak-CRCQ in turn implies weak-CPLD. Similarly, weak-Robinson's CQ implies weak-CPLD as well. However, Robinson's CQ and its weak variant are both independent of weak-CRCQ. In fact, the next example shows that weak-CRCQ is not implied by either (weak-)Robinson's CQ or weak-CPLD.

Example 3.2. Let us consider the constraint

$$G(x) := \begin{bmatrix} 2x_1 + x_2^2 & -x_2^2 \\ -x_2^2 & 2x_1 + x_2^2 \end{bmatrix} \succeq 0$$

and note that, for every orthogonal matrix $E$ of the form

$$E := \begin{bmatrix} a & c \\ b & d \end{bmatrix},$$

we have

$$v_{11}(x,E) = \begin{bmatrix} 2 \\ 2(a-b)^2 x_2 \end{bmatrix} \quad\text{and}\quad v_{22}(x,E) = \begin{bmatrix} 2 \\ 2(c-d)^2 x_2 \end{bmatrix}.$$

Then, at $x = 0$, we have $v_{11}(x,E) = v_{22}(x,E) = [2,0]^\top$, so they are linearly dependent, but positively linearly independent, for all $E\in\mathcal{E}_r(G(x))$. However, choosing any sequence $\{x^k\}_{k\in\mathbb{N}} \to 0$ such that $x_2^k \neq 0$ for all $k$, it follows that the eigenvalues of $G(x^k)$,

$$\lambda_1(G(x^k)) = 2\big(x_1^k + (x_2^k)^2\big) \quad\text{and}\quad \lambda_2(G(x^k)) = 2x_1^k,$$

are simple, with associated orthonormal eigenvectors

$$u_1(G(x^k)) = \left[-\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right]^\top \quad\text{and}\quad u_2(G(x^k)) = \left[\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right]^\top,$$

respectively, for every $k\in\mathbb{N}$. Then, the only sequence $\{E^k\}_{k\in\mathbb{N}}$ such that $E^k\in\mathcal{E}_r(G(x^k))$ for every $k$, up to sign, is given by $a = -1/\sqrt{2}$ and $b = c = d = 1/\sqrt{2}$. However, keep in mind that $v_{ii}(x,E)$, $i\in\{1,2\}$, is invariant to the sign of the columns of $E$, so $v_{22}(x^k,E^k) = [2,0]^\top$ and $v_{11}(x^k,E^k) = [2,4x_2^k]^\top$ are linearly independent for all large $k$. Therefore, we conclude that (weak-)Robinson's CQ holds at $x$, and consequently weak-CPLD also holds, but weak-CRCQ does not hold at $x$.
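The eigenstructure claimed in Example 3.2 can be verified numerically (an illustrative check; the partial derivatives of $G$ are hand-coded):

```python
import numpy as np

# Example 3.2: G(x) = [[2 x1 + x2^2, -x2^2], [-x2^2, 2 x1 + x2^2]]
DG1 = lambda x2: 2.0 * np.eye(2)                        # d/dx1 G
DG2 = lambda x2: 2.0 * x2 * np.array([[1.0, -1.0],
                                      [-1.0, 1.0]])     # d/dx2 G
def vii(i, x2, E):
    e = E[:, i]
    return np.array([e @ DG1(x2) @ e, e @ DG2(x2) @ e])

# At x = 0, any orthogonal E gives v_11 = v_22 = (2, 0): linearly dependent.
for E in [np.eye(2), np.array([[-1.0, 1.0], [1.0, 1.0]]) / np.sqrt(2)]:
    assert np.allclose(vii(0, 0.0, E), [2, 0])
    assert np.allclose(vii(1, 0.0, E), [2, 0])

# Along x^k -> 0 with x2^k != 0, E^k is forced (up to sign) to be:
Ek = np.array([[-1.0, 1.0], [1.0, 1.0]]) / np.sqrt(2)
x1k, x2k = 1e-3, 1e-3
Gk = np.array([[2*x1k + x2k**2, -x2k**2], [-x2k**2, 2*x1k + x2k**2]])
# its columns are eigenvectors for the eigenvalues 2(x1 + x2^2) >= 2 x1
assert np.allclose(Gk @ Ek, Ek @ np.diag([2*(x1k + x2k**2), 2*x1k]))
# and v_11(x^k, E^k) = (2, 4 x2^k), v_22(x^k, E^k) = (2, 0) are independent
v1, v2 = vii(0, x2k, Ek), vii(1, x2k, Ek)
assert np.allclose(v1, [2, 4*x2k]) and np.allclose(v2, [2, 0])
assert np.linalg.matrix_rank(np.column_stack([v1, v2])) == 2
```

The jump from a dependent pair at the limit to an independent pair along the sequence is what breaks weak-CRCQ in this example.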

Conversely, we show with another counterexample that weak-CRCQ does not imply (weak-)Robinson's CQ, and neither does weak-CPLD.

Example 3.3. Let us consider the constraint

$$G(x) := \begin{bmatrix} x & x^2 \\ x^2 & -x \end{bmatrix} \succeq 0$$

and the point $x = 0$. Take any sequence $\{x^k\}_{k\in\mathbb{N}} \to x$ such that $x^k \neq 0$ for every $k$, and consider two subsequences of it, indexed by $I_+$ and $I_-$, such that $x^k > 0$ for every $k\in I_+$ and $x^k < 0$ for every $k\in I_-$. Then, for every $k\in I_+$, the eigenvalues

$$\lambda_1(G(x^k)) = x^k\sqrt{(x^k)^2+1} \quad\text{and}\quad \lambda_2(G(x^k)) = -x^k\sqrt{(x^k)^2+1}$$

are simple, with associated orthonormal eigenvectors uniquely determined (up to sign) by

$$u_1(G(x^k)) = \frac{1}{\eta_1^k}\left[\frac{1+\sqrt{(x^k)^2+1}}{x^k},\ 1\right]^\top \quad\text{and}\quad u_2(G(x^k)) = \frac{1}{\eta_2^k}\left[\frac{1-\sqrt{(x^k)^2+1}}{x^k},\ 1\right]^\top,$$

where

$$\eta_1^k := \sqrt{\left(\frac{1+\sqrt{(x^k)^2+1}}{x^k}\right)^{\!2} + 1} \quad\text{and}\quad \eta_2^k := \sqrt{\left(\frac{1-\sqrt{(x^k)^2+1}}{x^k}\right)^{\!2} + 1}.$$

Moreover, one can verify that whenever $I_+$ is an infinite set, $\lim_{k\in I_+} u_1(G(x^k)) = [1,0]^\top$ and $\lim_{k\in I_+} u_2(G(x^k)) = [0,1]^\top$. Then, for all $E \in \operatorname*{Lim\,sup}_{k\in I_+} \mathcal{E}_r(G(x^k))$, the vectors

$$v_{11}(x,E) = 1 \quad\text{and}\quad v_{22}(x,E) = -1$$

are positively linearly dependent. In addition, since $\eta_1^k \to \infty$ and $\eta_2^k \to 1$, the vectors

$$v_{11}(x^k,E^k) = \frac{(\eta_1^k)^2 + 4\sqrt{(x^k)^2+1} + 2}{(\eta_1^k)^2} \quad\text{and}\quad v_{22}(x^k,E^k) = \frac{(\eta_2^k)^2 - 4\sqrt{(x^k)^2+1} + 2}{(\eta_2^k)^2}$$

are nonzero and have opposite signs, and thus remain positively linearly dependent, for all large $k\in I_+$. For the indices $k\in I_-$, the order of $\lambda_1(G(x^k))$ and $\lambda_2(G(x^k))$ is swapped, together with their respective eigenvectors, and we have $\lim_{k\in I_-} u_1(G(x^k)) = [0,1]^\top$ and $\lim_{k\in I_-} u_2(G(x^k)) = [-1,0]^\top$. Hence, for all $E \in \operatorname*{Lim\,sup}_{k\in I_-} \mathcal{E}_r(G(x^k))$, the vectors

$$v_{11}(x,E) = -1 \quad\text{and}\quad v_{22}(x,E) = 1$$

are also positively linearly dependent. The order of $v_{11}(x^k,E^k)$ and $v_{22}(x^k,E^k)$ is also swapped, so they remain positively linearly dependent for all large $k\in I_-$.

By the above reasoning, any sequence $\{x^k\}_{k\in\mathbb{N}} \to x$ such that $x^k \neq 0$ for every $k\in\mathbb{N}$ shows that (weak-)Robinson's CQ fails at $x$. Moreover, if $x^k = 0$ for infinitely many indices, we may simply take $E^k = E = I_2$ for every $k$, and then $v_{11}(x^k,E^k) = v_{11}(x,E) = 1$ and $v_{22}(x^k,E^k) = v_{22}(x,E) = -1$ are positively linearly dependent for every $k\in\mathbb{N}$. This completes the verification that weak-CPLD and weak-CRCQ both hold at $x$, while (weak-)Robinson's CQ does not.

Just as in NLP, the weak-CPLD condition is strictly weaker than (weak-)Robinson's CQ and also weaker than weak-CRCQ, which are in turn independent of each other. Nevertheless, we will show next that weak-CPLD is still sufficient for the existence of Lagrange multipliers at all local solutions of (NSDP). To do so, we draw inspiration from the proof of [14, Thm. 3.1], originally presented for NLP, and the proof of [6, Thm. 3.2].

That is, we analyse the sequence from Theorem 2.1 in terms of the spectral decomposition of its approximate Lagrange multiplier candidates, under weak-CPLD. Then, we use Carathéodory's Lemma 2.1 to construct a bounded sequence from it that converges to a Lagrange multiplier. As an intermediate step, we also obtain a convergence result of the external penalty method to KKT points under weak-CPLD, a fact that is emphasized in the statement of the next theorem. The above statements regarding existence of Lagrange multipliers and convergence of the external penalty method are also true for weak-CRCQ. In fact, because almost all properties of weak-CPLD that we study in this paper carry over naturally to weak-CRCQ, we shall omit the latter from this point onwards to avoid redundancy.

Theorem 3.1. Let $\{\rho_k\}_{k\in\mathbb{N}} \to \infty$ and $\{x^k\}_{k\in\mathbb{N}} \to x \in \mathcal{F}$ be such that

$$\nabla_x L\big(x^k,\ \rho_k\,\Pi_{\mathbb{S}^m_+}(-G(x^k))\big) \to 0.$$

If $x$ satisfies weak-CPLD, then $x$ satisfies the KKT conditions. In particular, every local minimizer of (NSDP) that satisfies weak-CPLD also satisfies the KKT conditions.

Proof. Let $Y^k := \rho_k\,\Pi_{\mathbb{S}^m_+}(-G(x^k))$ for every $k\in\mathbb{N}$. Recall that we assume $\lambda_1(-G(x^k)) \geq \ldots \geq \lambda_m(-G(x^k))$ for every $k$, and denote by $r$ the rank of $G(x)$. Note that when $k$ is large enough, say greater than some $k_0$, we necessarily have $\lambda_i(-G(x^k)) = -\lambda_{m-i+1}(G(x^k)) < 0$ for all $i\in\{m-r+1,\ldots,m\}$. Let $I\subseteq\mathbb{N}$ and $\{E^k\}_{k\in I} \to E$ be such that $E^k\in\mathcal{E}_r(G(x^k))$ for every $k\in I$, as described in Definition 3.2. Then, for each $k\in I$ greater than $k_0$, the spectral decomposition of $Y^k$ is given by

$$Y^k = \sum_{i=1}^{m-r} \alpha_i^k\, e_i^k (e_i^k)^\top, \quad\text{where}\quad \alpha_i^k := \big[\rho_k \lambda_i(-G(x^k))\big]_+ \geq 0$$

and $e_i^k$ denotes the $i$-th column of $E^k$, for every $i\in\{1,\ldots,m-r\}$. Since $\nabla_x L(x^k,Y^k) \to 0$, we have

$$\nabla f(x^k) - \sum_{i=1}^{m-r} \alpha_i^k\, DG(x^k)^*\big[e_i^k(e_i^k)^\top\big] \to 0, \qquad (8)$$

but note that

$$DG(x^k)^*\big[e_i^k(e_i^k)^\top\big] = \begin{bmatrix} \big\langle D_{x_1}G(x^k),\ e_i^k(e_i^k)^\top\big\rangle \\ \vdots \\ \big\langle D_{x_n}G(x^k),\ e_i^k(e_i^k)^\top\big\rangle \end{bmatrix} = \begin{bmatrix} (e_i^k)^\top D_{x_1}G(x^k)\, e_i^k \\ \vdots \\ (e_i^k)^\top D_{x_n}G(x^k)\, e_i^k \end{bmatrix} = v_{ii}(x^k,E^k),$$

so we can rewrite (8) as

$$\nabla f(x^k) - \sum_{i=1}^{m-r} \alpha_i^k\, v_{ii}(x^k,E^k) \to 0.$$

Using Carathéodory's Lemma 2.1 for the family $\{v_{ii}(x^k,E^k)\}_{i\in\{1,\ldots,m-r\}}$, for each fixed $k\in I$, we obtain some $J_k \subseteq \{1,\ldots,m-r\}$ such that $\{v_{ii}(x^k,E^k)\}_{i\in J_k}$ is linearly independent and

$$\nabla f(x^k) - \sum_{i=1}^{m-r} \alpha_i^k\, v_{ii}(x^k,E^k) = \nabla f(x^k) - \sum_{i\in J_k} \tilde{\alpha}_i^k\, v_{ii}(x^k,E^k), \qquad (9)$$

where $\tilde{\alpha}_i^k \geq 0$ for every $k\in I$ and every $i\in J_k$. By the pigeonhole principle, we can assume $J_k$ is the same set, say equal to $J$, for infinitely many $k\in I$; without loss of generality, let us say that this holds for all $k\in I$. We claim that the sequences $\{\tilde{\alpha}_i^k\}_{k\in I}$, $i\in J$, are all bounded. In order to prove this, suppose that

$$m_k := \max_{i\in J}\{\tilde{\alpha}_i^k\}$$

is unbounded with $k\in I$, divide (9) by $m_k$, and note that $m_k \to \infty$ on a subsequence implies that the vectors $v_{ii}(x,E)$, $i\in J$, are positively linearly dependent. On the other hand, the vectors $v_{ii}(x^k,E^k)$, $i\in J$, are linearly independent for all large $k$, which contradicts weak-CPLD. Finally, note that every collection of limit points $\{\alpha_i : i\in J\}$ of the respective sequences $\{\tilde{\alpha}_i^k\}_{k\in I}$, $i\in J$, generates a Lagrange multiplier associated with $x$, namely

$$Y := \sum_{i\in J} \alpha_i\, u_i(G(x))\, u_i(G(x))^\top,$$

where $u_i(G(x))$ is the eigenvector associated with the $i$-th eigenvalue of $G(x)$ (see (1)). Thus, $x$ is a KKT point.

The second part of the statement of the theorem follows from Theorem 2.1.

Back to Example 3.1, observe that weak-CPLD does not hold at $x = 0$, as expected. Indeed, for any sequence $\{x^k\}_{k\in\mathbb{N}} \to 0$ such that $x^k < 0$ for all $k$, the matrix $G(x^k)$ has only simple eigenvalues for all large $k$, so $E^k\in\mathcal{E}_r(G(x^k))$ is unique up to sign. Without loss of generality, we can assume

$$E^k := \frac{1}{\sqrt{2}}\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix},$$

and then we have $v_{11}(x^k,E^k) = -2x^k > 0$, which is linearly independent for all $k$, while $v_{11}(x,E) = 0$ is positively linearly dependent. Thus weak-CPLD as in Definition 3.2 is not satisfied.

Remark 3.1. In [9], we presented a different extension of CPLD to NSDP problems with multiple constraints, which is weaker than Robinson's CQ for a single constraint as in (NSDP) only when the zero eigenvalue of G(x) is simple. We called that definition the "naive extension of CPLD". We remark that Definition 3.2 coincides with the naive extension of CPLD when zero is a simple eigenvalue of G(x), which makes Definition 3.2 an improvement of it, or a "non-naive variant" of it.

The phrasing of Theorem 3.1 was chosen to draw the reader's attention to the fact that it is, essentially, a convergence proof of the external penalty method to KKT points under weak-CPLD. To obtain a more general convergence result, in the next section we introduce new constant rank-type CQs for NSDP that support every algorithm whose output satisfies a more general sequential optimality condition. Then, we prove some properties of these new conditions and compare them with weak-CPLD and weak-CRCQ.


4 Stronger sequential-type constant rank CQs for NSDP and global convergence of algorithms

The so-called Approximate Karush-Kuhn-Tucker (AKKT) condition, which was brought from NLP to NSDP by Andreani et al. [12], is a general purpose sequential optimality condition that encompasses, for instance, Theorem 2.1. Such a notion has been recently refined and generalized to optimization problems in Banach spaces [18] and also to problems with geometric constraints [26]. Let us recall one (the most convenient for this paper) of its many characterizations².

Definition 4.1 (Def. 4 of [4]). We say that a point x ∈ F satisfies the AKKT condition when there exist sequences {x^k}_{k∈N} → x and {Y^k}_{k∈N} ⊆ S^m_+, and perturbation sequences {δ^k}_{k∈N} ⊆ R^n and {∆^k}_{k∈N} ⊆ S^m, such that:

1. ∇_x L(x^k, Y^k) = δ^k, for every k ∈ N;

2. G(x^k) + ∆^k ⪰ 0 and ⟨G(x^k) + ∆^k, Y^k⟩ = 0, for every k ∈ N;

3. ∆^k → 0 and δ^k → 0.

Note that {Y^k}_{k∈N} is a sequence of approximate Lagrange multipliers of x, in the sense that Y^k is an exact Lagrange multiplier, at x = x^k, for the perturbed problem
$$
\underset{x \in \mathbb{R}^n}{\text{Minimize}} \ f(x) - \langle x - x^k, \delta^k \rangle, \quad \text{subject to} \quad G(x) + \Delta^k \succeq 0.
$$
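To make Definition 4.1 concrete, the following sketch (our own illustration, with hypothetical helper names) evaluates the three AKKT residuals for a candidate pair (x, Y): the Lagrangian gradient δ, the smallest eigenvalue of G(x) + ∆ (nonnegative exactly when the perturbed constraint holds), and the complementarity value ⟨G(x) + ∆, Y⟩.

```python
import numpy as np

def akkt_residuals(grad_f, DG_star, G, x, Y, Delta):
    """AKKT residuals of Definition 4.1 at (x, Y) with perturbation Delta.
    DG_star(x, Y) evaluates the adjoint DG(x)^*[Y] (hypothetical helper)."""
    delta = grad_f(x) - DG_star(x, Y)          # item 1: grad_x L(x, Y)
    S = G(x) + Delta
    return delta, np.linalg.eigvalsh(S)[0], np.tensordot(S, Y)

# toy instance: f(x) = x, G(x) = x * I_2, checked at x = 0 with Y = 0.5 * I_2
grad_f = lambda x: np.array([1.0])
G = lambda x: x[0] * np.eye(2)
DG_star = lambda x, Y: np.array([np.trace(Y)])  # <D_{x_1}G, Y> = <I, Y>
d, lam_min, compl = akkt_residuals(grad_f, DG_star, G,
                                   np.array([0.0]), 0.5 * np.eye(2),
                                   np.zeros((2, 2)))
```

Here all three residuals vanish, since this KKT point satisfies AKKT with zero perturbations.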

The main goal in enlarging the class of approximate Lagrange multipliers Y^k and perturbations ∆^k as in Definition 4.1, instead of considering only the ones given by Theorem 2.1, is to capture the output sequences of a larger class of iterative algorithms. In the next two subsections, we illustrate this statement. What is remarkable is that the proof of Theorem 3.1 can still be essentially carried out for this more general class of sequences, arriving at strong global convergence results for such algorithms (Theorem 4.2).

4.1 Example 1: A safeguarded augmented Lagrangian method

Let us briefly recall a variant of the Powell-Hestenes-Rockafellar augmented Lagrangian algorithm that employs a safeguarding technique, which is the direct generalization of the one studied in [16]. The variant we use is also a generalization of [10, Pg. 13] and [12, Alg. 1], for instance.

For an arbitrary penalty parameter ρ > 0 and a safeguarded multiplier Ỹ ⪰ 0, we define L_{ρ,Ỹ} : R^n → R as the Augmented Lagrangian function of (NSDP), which is given by
$$
L_{\rho,\tilde{Y}}(x) \doteq f(x) + \frac{\rho}{2} \left\| \Pi_{\mathbb{S}^m_+}\!\left( -G(x) + \frac{\tilde{Y}}{\rho} \right) \right\|^2 - \frac{\|\tilde{Y}\|^2}{2\rho}.
$$

Since it will be useful in the convergence proof, we compute the gradient of L_{ρ,Ỹ} at x below:
$$
\nabla L_{\rho,\tilde{Y}}(x) = \nabla f(x) - DG(x)^*\!\left[ \rho\, \Pi_{\mathbb{S}^m_+}\!\left( -G(x) + \frac{\tilde{Y}}{\rho} \right) \right]. \tag{10}
$$
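Formula (10) has a direct numerical transcription; the sketch below is our own (not the authors' code). Here `proj_psd` implements Π_{S^m_+} by clipping negative eigenvalues, and DG(x) is assumed to be supplied as the list of partial derivative matrices D_{x_i}G(x):

```python
import numpy as np

def proj_psd(A):
    """Projection of a symmetric matrix onto S^m_+ (clip negative eigenvalues)."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.maximum(w, 0.0)) @ Q.T

def aug_lag_grad(grad_f, G, DG, x, Y_tilde, rho):
    """Gradient (10): grad f(x) - DG(x)^*[ rho * Pi(-G(x) + Y_tilde/rho) ]."""
    P = rho * proj_psd(-G(x) + Y_tilde / rho)
    # the adjoint applied to P is the vector of inner products <D_{x_i}G(x), P>
    return grad_f(x) - np.array([np.tensordot(Di, P) for Di in DG(x)])

# toy instance: f(x) = x, G(x) = x * I_2, evaluated at an infeasible point
g = aug_lag_grad(lambda x: np.array([1.0]),
                 lambda x: x[0] * np.eye(2),
                 lambda x: [np.eye(2)],
                 np.array([-1.0]), np.zeros((2, 2)), rho=1.0)
print(g)  # -> [-1.]: the penalty term pushes the iterate toward feasibility
```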

Now, we state the algorithm:

² Definition 4.1 coincides with the AKKT condition presented in [12, Def. 3.1]. See, for instance, [4, Prop. 4].


Algorithm 1 Safeguarded augmented Lagrangian method

Input: A sequence {ε_k}_{k∈N} of positive scalars such that ε_k → 0; a nonempty convex compact set B ⊂ S^m_+; real parameters τ > 1, σ ∈ (0,1), and ρ_1 > 0; and initial points (x^0, Ỹ^1) ∈ R^n × B. Also, define ‖V^0‖ = ∞.

Initialize k ← 1. Then:

Step 1 (Solving the subproblem): Compute an approximate stationary point x^k of L_{ρ_k,Ỹ^k}(x), that is, a point x^k such that ‖∇L_{ρ_k,Ỹ^k}(x^k)‖ ≤ ε_k;

Step 2 (Updating the penalty parameter): Calculate
$$
V^k \doteq \Pi_{\mathbb{S}^m_+}\!\left( -G(x^k) + \frac{\tilde{Y}^k}{\rho_k} \right) - \frac{\tilde{Y}^k}{\rho_k}; \tag{11}
$$
Then,

a. If k = 1 or ‖V^k‖ ≤ σ‖V^{k−1}‖, set ρ_{k+1} ≐ ρ_k;

b. Otherwise, take ρ_{k+1} such that ρ_{k+1} ≥ τρ_k.

Step 3 (Estimating a new safeguarded multiplier): Choose any Ỹ^{k+1} ∈ B, set k ← k + 1, and go to Step 1.
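For concreteness, the following Python sketch instantiates Algorithm 1 on a toy problem. It is our own minimal illustration, not the authors' implementation: the inner solver is plain gradient descent (any routine returning an ε_k-stationary point of the augmented Lagrangian would do), the step size and stopping tolerance are arbitrary choices, and `B_proj` plays the role of the projection onto the safeguard set B.

```python
import numpy as np

def proj_psd(A):
    """Projection onto S^m_+: clip negative eigenvalues."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.maximum(w, 0.0)) @ Q.T

def safeguarded_alm(grad_f, G, DG, x0, B_proj, rho=1.0, tau=2.0, sigma=0.5,
                    n_outer=30, n_inner=500, lr=1e-2, tol=1e-8):
    """Sketch of Algorithm 1; DG(x) is the list of matrices D_{x_i}G(x)."""
    x = x0.astype(float)
    Yt = np.zeros_like(G(x0))           # safeguarded multiplier, starts at 0 in B
    V_prev = np.inf
    for _ in range(n_outer):
        # Step 1: approximate stationary point of L_{rho,Y~} via gradient descent
        for _ in range(n_inner):
            P = rho * proj_psd(-G(x) + Yt / rho)
            x = x - lr * (grad_f(x) - np.array([np.tensordot(D, P) for D in DG(x)]))
        # Step 2: feasibility/complementarity measure (11) and penalty update
        V = proj_psd(-G(x) + Yt / rho) - Yt / rho
        if np.linalg.norm(V) < tol:
            break
        if np.linalg.norm(V) > sigma * V_prev:
            rho *= tau
        V_prev = np.linalg.norm(V)
        # Step 3: safeguard = projection of the multiplier Y^k from (12) onto B
        Yt = B_proj(rho * proj_psd(-G(x) + Yt / rho))
    return x

# toy instance: minimize f(x) = x subject to the 1x1 constraint [x] >= 0
x_star = safeguarded_alm(grad_f=lambda x: np.array([1.0]),
                         G=lambda x: np.array([[x[0]]]),
                         DG=lambda x: [np.array([[1.0]])],
                         x0=np.array([1.0]),
                         B_proj=lambda Y: np.clip(Y, 0.0, 10.0))
```

On this instance the iterates approach the solution x = 0 while the multiplier estimate approaches 1, without the penalty parameter being driven to infinity.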

By the definition of projection, we have that Ỹ^k = Π_{S^m_+}(Ỹ^k − ρ_k G(x^k)) if, and only if, Ỹ^k ∈ S^m_+, G(x^k) ∈ S^m_+, and ⟨Ỹ^k, G(x^k)⟩ = 0, which means that V^k = 0 if, and only if, the pair (x^k, Ỹ^k) is primal-dual feasible and complementary. Moreover, note that Algorithm 1 does not necessarily keep a record of the approximate multiplier sequence associated with {x^k}_{k∈N}, which is
$$
Y^k \doteq \rho_k\, \Pi_{\mathbb{S}^m_+}\!\left( -G(x^k) + \frac{\tilde{Y}^k}{\rho_k} \right). \tag{12}
$$

These multipliers are, however, usually computed in practical implementations, where Ỹ^{k+1} is chosen as the projection of Y^k onto B. Also, with these multipliers at hand, it is very easy to prove that any feasible limit point x of {x^k}_{k∈N} must satisfy the AKKT condition:

Theorem 4.1. Fix any choice of parameters in Algorithm 1 and let {x^k}_{k∈N} be the output sequence generated by it. If {x^k}_{k∈N} has a convergent subsequence {x^k}_{k∈I} → x, then:

1. The point x is stationary for the problem of minimizing ½‖Π_{S^m_+}(−G(x))‖²;

2. If x is feasible, then x satisfies the AKKT condition.

Proof. Let {ε_k}_{k∈N} → 0, {Ỹ^k}_{k∈N} ⊂ B ⊂ S^m_+, τ > 1, σ ∈ (0,1), and ρ_1 > 0 be given as required by Algorithm 1. Moreover, let {ρ_k}_{k∈N} and {V^k}_{k∈N} be computed as in Step 2. For simplicity, let us also assume that I = N.

1. This part of the proof is standard; see, for instance, [4, Prop. 4.3];

2. Define {Y^k}_{k∈N} as in (12) and take ∆^k ≐ V^k for all k ∈ N, where V^k is as given in (11). Then, it follows from Step 1 that ∇_x L(x^k, Y^k) = ∇L_{ρ_k,Ỹ^k}(x^k) → 0. We also have
$$
G(x^k) + \Delta^k = \Pi_{\mathbb{S}^m_+}\!\left( G(x^k) - \frac{\tilde{Y}^k}{\rho_k} \right) \succeq 0
$$
for every k ∈ N, which yields ⟨Y^k, G(x^k) + ∆^k⟩ = 0 for every k. If ρ_k → ∞, then recall that {Ỹ^k}_{k∈N} remains bounded, so V^k → Π_{S^m_+}(−G(x)) by definition, and Π_{S^m_+}(−G(x)) = 0 because x is assumed to be feasible; on the other hand, if ρ_k remains bounded, then V^k → 0 due to Step 2-a. Therefore, ∆^k → 0 and x satisfies the AKKT condition.
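The identity used in item 2, G(x^k) + V^k = Π_{S^m_+}(G(x^k) − Ỹ^k/ρ_k), is an instance of the Moreau decomposition A = Π_{S^m_+}(A) − Π_{S^m_+}(−A). A quick numerical check on random symmetric data (our own verification, not part of the paper):

```python
import numpy as np

def proj_psd(A):
    """Projection onto S^m_+: clip negative eigenvalues."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.maximum(w, 0.0)) @ Q.T

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4)); Gx = (S + S.T) / 2             # a symmetric G(x^k)
S = rng.standard_normal((4, 4)); Yt = proj_psd((S + S.T) / 2)   # safeguard in S^m_+
rho = 3.0

V = proj_psd(-Gx + Yt / rho) - Yt / rho      # the matrix V^k from (11)
lhs = Gx + V
rhs = proj_psd(Gx - Yt / rho)                # Moreau: Pi(A) - A = Pi(-A)
print(np.allclose(lhs, rhs))  # -> True; in particular Gx + V is PSD
```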


Note that when Ỹ^k is set to zero for every k, Algorithm 1 reduces to the external penalty method, meaning that Theorem 4.1 also covers this method.

4.2 Example 2: A sequential quadratic programming method

Next, we recall Correa and Ramírez's [20] sequential quadratic programming (SQP) method:

Algorithm 2 General SQP method

Input: A real parameter τ > 1, a pair of initial points (x^1, Y^0) ∈ R^n × S^m_+, and an approximation H_1 of ∇²_x L(x^1, Y^0).

Initialize k ← 1. Then:

Step 1 (Solving the subproblem): Compute a solution d^k, together with its Lagrange multiplier Y^k, of the problem
$$
\underset{d \in \mathbb{R}^n}{\text{Minimize}} \ d^\top H_k d + \nabla f(x^k)^\top d, \quad \text{subject to} \quad G(x^k) + DG(x^k)d \in \mathbb{S}^m_+, \tag{Lin-QP}
$$
and if d^k = 0, stop;

Step 2 (Step corrections): Perform a line search to find a steplength α_k ∈ (0,1) satisfying Armijo's rule
$$
f(x^k + \alpha_k d^k) - f(x^k) \leq \tau \alpha_k \nabla f(x^k)^\top d^k. \tag{13}
$$

Step 3 (Approximating the Hessian): Set x^{k+1} ← x^k + α_k d^k, compute a positive definite approximation H_{k+1} of ∇²_x L(x^{k+1}, Y^k), set k ← k + 1, and go to Step 1.
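Step 2 is a standard backtracking line search; the sketch below is our own, not from [20]. It halves the trial steplength until a sufficient-decrease condition of the form (13) holds; note that in this sketch the constant `tau` multiplying α∇f(x)^⊤d is taken in (0,1), as is common in backtracking implementations, and the shrink factor `beta` and initial `alpha0` are implementation choices not fixed by the algorithm statement.

```python
import numpy as np

def armijo_step(f, x, d, grad_fx, tau=0.5, beta=0.5, alpha0=1.0, max_iter=50):
    """Backtracking search: shrink alpha until
    f(x + alpha d) - f(x) <= tau * alpha * grad_f(x)^T d."""
    slope = float(grad_fx @ d)   # directional derivative, negative for descent d
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) - f(x) <= tau * alpha * slope:
            return alpha
        alpha *= beta
    return alpha

# toy usage: f(x) = ||x||^2 at x = 1 with descent direction d = -1
f = lambda x: float(x @ x)
alpha = armijo_step(f, np.array([1.0]), np.array([-1.0]), np.array([2.0]))
```

On this instance the unit step already satisfies the rule, so no backtracking occurs.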

The SQP algorithm generates AKKT sequences as well, as can be seen in the following proposition:

Proposition 4.1. Assume that Step 1 of Algorithm 2 is always well defined. If there is an infinite subset I ⊆ N such that lim_{k∈I} d^k = 0 and {‖H_k‖}_{k∈I} is bounded, then any limit point x of {x^k}_{k∈I} satisfies the AKKT condition.

Proof. By the KKT conditions for (Lin-QP), there exists some Y^k ⪰ 0 such that
$$
\nabla f(x^k) + H_k d^k - DG(x^k)^*[Y^k] = 0, \tag{14}
$$
$$
\langle G(x^k) + DG(x^k)d^k,\, Y^k \rangle = 0. \tag{15}
$$
Set ∆^k ≐ DG(x^k)d^k for every k ∈ I, and since d^k → 0, we obtain that lim_{k∈I} H_k d^k = 0 and lim_{k∈I} ∆^k = 0. Moreover, since d^k is feasible, G(x^k) + ∆^k ⪰ 0. Thus, x satisfies the AKKT condition.

The hypothesis that a subsequence of {d^k}_{k∈N} converges to zero, directly or indirectly, is somewhat common for some types of SQP methods, as is the boundedness of {H_k}; see, for instance, [11, 20, 30]. Moreover, the constant rank condition that will be presented in the next section ensures that (Lin-QP) is well defined as long as d = 0 is feasible for all k such that x^k is sufficiently close to x (see Remark 5.1 for a discussion).

4.3 Sequential constant rank CQs for NSDP

Inspired by the AKKT condition, we introduce a small perturbation in weak-CPLD and weak-CRCQ, which makes them stronger, but also brings some useful properties in return. At first, we present them in a form that resembles Definition 3.2, for comparison purposes. Later, for convenience, we provide a characterization of them without sequences.

Definition 4.2 (seq-CRCQ and seq-CPLD). Let x ∈ F and let r be the rank of G(x). We say that x satisfies the

1. Sequential CRCQ condition for NSDP (seq-CRCQ) if r = m or, for all sequences {x^k}_{k∈N} → x and {∆^k}_{k∈N} ⊆ S^m with ∆^k → 0, there exists E ∈ Lim sup_{k∈N} E_r(G(x^k) + ∆^k) such that, for every subset J ⊆ {1, …, m−r}: if the family {v_{ii}(x, E)}_{i∈J} is linearly dependent, then {v_{ii}(x^k, E_k)}_{i∈J} remains linearly dependent for all k ∈ I large enough.

2. Sequential CPLD condition for NSDP (seq-CPLD) if r = m or, for all sequences {x^k}_{k∈N} → x and {∆^k}_{k∈N} ⊆ S^m with ∆^k → 0, there exists E ∈ Lim sup_{k∈N} E_r(G(x^k) + ∆^k) such that, for every subset J ⊆ {1, …, m−r}: if {v_{ii}(x, E)}_{i∈J} is positively linearly dependent, then {v_{ii}(x^k, E_k)}_{i∈J} remains linearly dependent for all k ∈ I large enough.

For both definitions, I ⊆ N, and {E_k}_{k∈I} is a sequence converging to E and such that E_k ∈ E_r(G(x^k) + ∆^k) for every k ∈ I, as required by the Painlevé-Kuratowski outer limit.

Note that the only difference between Definitions 3.2 and 4.2 is the perturbation matrix ∆^k → 0. In particular, setting ∆^k ≐ 0 for every k shows that seq-CRCQ and seq-CPLD imply weak-CRCQ and weak-CPLD, respectively. Moreover, both implications are strict, as we can see in the following example:

Example 4.1. Consider the constraint
$$
G(x) \doteq \begin{bmatrix} x & 0 \\ 0 & -x \end{bmatrix} \succeq 0
$$
at the point x = 0, so in this case r = 0. For every sequence {x^k}_{k∈N} → x, we have
$$
\mathcal{E}_r(G(x^k)) \subset \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \right\}
$$
(up to the signs of the columns) for every k ∈ N such that x^k ≠ x, whereas if x^k = x, then E_r(G(x^k)) is the set of all orthogonal 2×2 matrices. Let us assume without loss of generality that x^k ≤ 0; thus we may take E_k = I_2 for every k ∈ N to see that both weak-CRCQ and weak-CPLD hold at x, since
$$
v_{11}(x^k, E_k) = 1 \quad \text{and} \quad v_{22}(x^k, E_k) = -1
$$
are nonzero and (positively) linearly dependent for every k ∈ N.

On the other hand, take
$$
\Delta^k \doteq \begin{bmatrix} 2x^k(x^k+1)^2 & x^k(x^k+1) \\ x^k(x^k+1) & 3x^k + x^k(x^k+1)^2 \end{bmatrix},
$$
with x^k → x, x^k > 0, and note that the eigenvectors of
$$
G(x^k) + \Delta^k
= x^k \begin{bmatrix} 2(x^k)^2 + 4x^k + 3 & x^k+1 \\ x^k+1 & (x^k)^2 + 2x^k + 3 \end{bmatrix}
= \begin{bmatrix} -1 & x^k+1 \\ x^k+1 & 1 \end{bmatrix} \begin{bmatrix} x^k & 0 \\ 0 & 2x^k \end{bmatrix} \begin{bmatrix} -1 & x^k+1 \\ x^k+1 & 1 \end{bmatrix}
$$
are uniquely determined up to sign. Then, because v_{ii}(x, E), i ∈ {1, 2}, is invariant to the sign of the columns of E, we can assume without loss of generality that any E_k ∈ E_r(G(x^k) + ∆^k) has the form
$$
E_k = \frac{1}{\sqrt{1 + (x^k+1)^2}} \begin{bmatrix} -1 & x^k+1 \\ x^k+1 & 1 \end{bmatrix}
$$
for every k ∈ N. Then, for any sequence {E_k}_{k∈N} such that E_k ∈ E_r(G(x^k) + ∆^k) for every k, we have
$$
v_{11}(x^k, E_k) = \frac{1 - (x^k+1)^2}{1 + (x^k+1)^2} \quad \text{and} \quad v_{22}(x^k, E_k) = \frac{(x^k+1)^2 - 1}{1 + (x^k+1)^2},
$$
which are both nonzero, but if E is a limit point of {E_k}_{k∈N}, then v_{11}(x, E) = v_{22}(x, E) = 0. Thus, neither seq-CRCQ nor seq-CPLD holds at x.
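The computations in this example can be checked numerically. The script below (our own verification) confirms the claimed factorization of G(x^k) + ∆^k and evaluates v_{11}(x^k, E_k) from orthonormal eigenvectors; the value is a positive rescaling of 1 − (x^k+1)², and in particular tends to 0 as x^k → 0:

```python
import numpy as np

xk = 0.1                         # any small positive value works
G = np.diag([xk, -xk])
od = xk * (xk + 1)               # off-diagonal entry of Delta^k
Delta = np.array([[2 * xk * (xk + 1) ** 2, od],
                  [od, 3 * xk + xk * (xk + 1) ** 2]])
M = G + Delta

# factorization stated in the example (P is symmetric and not normalized)
P = np.array([[-1.0, xk + 1], [xk + 1, 1.0]])
print(np.allclose(M, P @ np.diag([xk, 2 * xk]) @ P))  # -> True

# v_11(x^k, E_k) = e1^T DG(x^k) e1, with DG(x^k) = diag(1, -1) and e1 the
# orthonormal eigenvector of the smallest eigenvalue x^k (sign-invariant)
w, E = np.linalg.eigh(M)         # eigenvalues ascending, orthonormal columns
e1 = E[:, 0]
v11 = e1 @ np.diag([1.0, -1.0]) @ e1
print(v11)                       # (1 - (xk+1)^2) / (1 + (xk+1)^2), tends to 0
```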

Furthermore, since nondegeneracy can be characterized as the linear independence of v_{ii}(x, E), i ∈ {1, …, m−r}, for every E ∈ E_r(G(x)) [6, Prop. 3.2], we observe that it implies seq-CRCQ (see also Remark 4.1 at the end of this section), but this implication is also strict. Let us show this with a counterexample.
