ON REGULARIZATION AND ACTIVE-SET METHODS WITH COMPLEXITY FOR CONSTRAINED OPTIMIZATION

E. G. BIRGIN AND J. M. MARTÍNEZ

Submitted to the editors April 24, 2017.
Funding: Supported by FAPESP (grants 2013/03447-6, 2013/05475-7, 2013/07375-0, and 2014/18711-3) and CNPq (grants 309517/2014-1 and 303750/2014-6).
Dept. of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, Cidade Universitária, 05508-090, São Paulo, SP, Brazil. egbirgin@ime.usp.br
Dept. of Applied Mathematics, Institute of Mathematics, Statistics, and Scientific Computing, State University of Campinas, 13083-859, Campinas, SP, Brazil. martinez@ime.unicamp.br

Abstract. The main objective of this research is to introduce a practical method for smooth bound-constrained optimization that possesses worst-case evaluation complexity $O(\varepsilon^{-3/2})$ for finding an $\varepsilon$-approximate first-order stationary point when the Hessian of the objective function is Lipschitz continuous. As with other well-established algorithms for optimization with box constraints, the algorithm proceeds by visiting the different faces of the domain, aiming to reduce the norm of an internal projected gradient and abandoning active constraints when no additional progress is expected in the current face. The introduced method emerges as a particular case of a method for minimization with linear constraints. Moreover, the linearly-constrained minimization algorithm is an instance of a minimization algorithm with general constraints whose implementation may be unaffordable when the constraints are complicated. As the procedure for leaving faces, we employ a different method that may be regarded as an independent device for constrained optimization. Such an independent algorithm may be employed to solve linearly-constrained optimization problems on its own, without relying on the active-set strategy. A careful implementation and numerical experiments show that the algorithm that combines active sets with leaving-face iterations is more effective than the independent algorithm on which the leaving-face iterations are based, although both exhibit similar complexity $O(\varepsilon^{-3/2})$.

Key words. Nonlinear programming, bound-constrained minimization, active-set strategies, regularization, complexity.

AMS subject classifications. 90C30, 65K05, 49M37, 90C60, 68Q25.

1. Introduction. In this paper, we address the problem of minimizing a smooth and generally nonconvex function within a region of a finite-dimensional Euclidean space. Initially, we present a cubic regularization algorithm with cubic descent that finds an approximate first-order stationary point with arbitrary precision $\varepsilon$ or a boundary point of the feasible region with evaluation complexity $O(\varepsilon^{-3/2})$. In addition, complexity results for finding second-order stationary points and convergence results are presented. Secondly, we introduce an algorithm for minimization on arbitrary and generally nonconvex regions defined by inequalities and equalities. The evaluation complexity of this algorithm for finding first-order stationary points is also $O(\varepsilon^{-3/2})$ when one uses cubic regularization of the functional quadratic approximation. The most general form, in which we use $p$th Taylor polynomials to define subproblems, has complexity $O(\varepsilon^{-(p+1)/p})$. A version of this algorithm for minimizing smooth functions with linear constraints will be introduced and implemented for the case $p = 2$. Moreover, the problem of minimizing with linear constraints will also be addressed in a different way. Namely, we will consider each face of the polytope as a finite-dimensional region within which we may employ the algorithm first introduced in this paper. As mentioned above, such an algorithm either finds an approximate stationary point within the face or finds a point on its boundary. Then, as a natural combination of the first two algorithms, we will use the first one for minimizing within faces and the second one for giving up constraints when the current face is exhausted. This gives rise to a third algorithm with complexity $O(\varepsilon^{-3/2})$ for finding $\varepsilon$-approximate first-order stationary points. This method will be implemented and compared with the second one. Therefore, the present paper introduces an algorithm with complexity $O(\varepsilon^{-3/2})$ for finding approximate KKT points of linearly-constrained optimization problems, as a result of the manipulation of a theoretical algorithm for constrained optimization. Theoretical algorithms for finding approximate KKT points of general nonlinear programming problems with complexity $O(\varepsilon^{-3/2})$ have been introduced [5], but practical counterparts are not yet known.

Several numerical optimization traditions converge in the present work. Algorithms that use quadratic models for unconstrained problems and employ regularized or trust-region subproblems combined with actual-versus-predicted reduction or cubic descent, with proven complexity $O(\varepsilon^{-3/2})$, were given in [8, 16, 17, 18, 20, 21, 28, 33]. Cubic regularization was introduced as a useful optimization tool in [26, 35]. The optimality of the complexity $O(\varepsilon^{-3/2})$ was proved in [18]. The generalization of cubic regularization to arbitrary $(p+1)$th regularization was given in [6]. In [25], the complexity (close to $O(\varepsilon^{-2})$) of quasi-Newton methods for unconstrained optimization was analyzed. The complexity of unconstrained optimization and of some constrained optimization algorithms under relaxed Lipschitz (Hölder) conditions on the objective function was analyzed in [19, 24, 27]. The idea of minimizing on polytopes using a fast algorithm within faces, combined with a suitable procedure to abandon exhausted faces, is at the core of active-set methods for constrained optimization and may be found in several textbooks [11, 22, 34]. Many algorithms for solving general constrained optimization problems use subproblems that consist of the minimization of combinations of objective and constraint functions subject to linear constraints [11, 32, 22, 34]. The combined algorithm presented here is inspired by the ideas of [3, 9, 10], where the algorithm for leaving faces was the spectral projected gradient method [1, 12, 13, 14, 15].

This paper is organized as follows. Section 2 introduces the cubic regularization method that either finds an interior stationary point or stops at the boundary of the feasible region. First- and second-order complexity results are presented. In Section 3, we introduce a model algorithm with $(p+1)$-regularized models for nonlinear programming. In Section 4, we describe the method for minimization with linear constraints that combines the procedures of Sections 2 and 3. In Section 5, we compare the combined algorithm against the method presented in Section 3 with $p = 2$. In Section 6, we state some conclusions and lines for future research.

Notation. Given $\Omega \subseteq \mathbb{R}^n$, $\mathrm{Int}(\Omega)$ denotes the set of interior points of $\Omega$ and $\bar\Omega$ denotes its closure; if $\Omega$ is convex, $P_\Omega(x)$ denotes the Euclidean projection of $x$ onto $\Omega$; $\nabla$ and $\nabla^2$ are the gradient and Hessian operators, respectively; $\|\cdot\|$ denotes the Euclidean norm; $\mathbb{N} = \{0, 1, 2, \dots\}$; $g(x)$ denotes the gradient of $f(x)$ and $H(x)$ denotes its Hessian; $\lambda_1(H)$ denotes the leftmost eigenvalue of the symmetric matrix $H$; $H_k$ denotes $H(x^k)$; $h'(x)$ denotes the Jacobian of $h : \mathbb{R}^n \to \mathbb{R}^q$; and, if $a \in \mathbb{R}$, $[a]_+ = \max\{a, 0\}$.

2. Inner-to-the-face algorithm. Consider the problem

Minimize $f(x)$ subject to $x \in \Omega$.

We assume that $\Omega \subset \mathbb{R}^n$ has non-empty interior and that $f$ is a continuous function defined on an open set that contains $\Omega$. In this section, we introduce an algorithm that finds an interior point approximately satisfying first- and second-order conditions for unconstrained minimization or, alternatively, finds a point on the boundary of $\Omega$. More precisely, either the algorithm generates a finite sequence $x^0, x^1, \dots, x^{\hat k}$ such that the final point $x^{\hat k}$ is on the boundary of $\Omega$, $x^k \in \mathrm{Int}(\Omega)$ for all $k < \hat k$, and $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$; or it generates an infinite sequence $\{x^k\}_{k=0}^{\infty} \subset \mathrm{Int}(\Omega)$ such that $\{f(x^k)\}$ is strictly decreasing, $g(x^k) \to 0$, and $[-\lambda_1(H(x^k))]_+ \to 0$ as $k \to \infty$.

In the algorithm, for a given iterate $x^k$ and a trial step $s_{\rm trial}$, we consider the interiority condition

(1) $x^k + s_{\rm trial} \in \mathrm{Int}(\Omega)$

and the sufficient descent condition

(2) $f(x^k + s_{\rm trial}) \le f(x^k) - \alpha \|s_{\rm trial}\|^3.$

The first step of the algorithm consists in computing, when possible, a solution $s_{\rm trial}$ to the quadratic model

(3) Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s$

and checking whether it satisfies (1) and (2) or not. Step 2 (which is executed when the first step is not successful) consists in finding, by increasing the regularization parameter $\rho$ in a controlled way, a solution $s_{\rm trial} = s_{\rm trial}(\rho)$ to the cubic regularized model

(4) Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho \|s\|^3$

that satisfies (1) and (2). Step 3 consists in verifying whether the regularization parameter $\rho$ happened to be too large or not. Completing the verification may require the calculation of a point on the boundary of $\Omega$, and this task is performed at Step 4. At the end, the computed new iterate may be a solution to (3) or (4) that belongs to the interior of $\Omega$ and satisfies the sufficient descent condition, or a point on the boundary of $\Omega$. In the latter case, the algorithm stops. The complete description of the algorithm follows.

Algorithm 2.1. Assume that $x^0 \in \mathrm{Int}(\Omega)$, $\alpha > 0$, $\tau_2 \ge \tau_1 > 1$, $M > \rho_{\min} > 0$, and $J \ge 0$ are given. Initialize $k \leftarrow 0$.

Step 1. Set $j \leftarrow 0$.
Step 1.1. If (3) is not solvable, go to Step 2.
Step 1.2. Define $\rho_{k,j} = 0$ and $s^{k,j}$ as a solution to (3).
Step 1.3. If (1) does not hold with $s_{\rm trial} = s^{k,j}$ and $j < J$, compute, when possible, a point $w$ on the boundary of $\Omega$. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop.
Step 1.4. If (1) and (2) hold with $s_{\rm trial} = s^{k,j}$, define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k+1$, and go to Step 1. Otherwise, set $j \leftarrow j+1$ (and continue at Step 2 below).

Step 2. Set $\bar\rho \leftarrow \rho_{\min}$.
Step 2.1. Define $s^{k,j}$ as a solution to (4) with $\rho = \rho_{k,j}$ for some $\rho_{k,j} \in [\tau_1 \bar\rho, \tau_2 \bar\rho]$.
Step 2.2. If (1) does not hold with $s_{\rm trial} = s^{k,j}$ and $j < J$, compute, when possible, a point $w$ on the boundary of $\Omega$. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop.
Step 2.3. If (1) or (2) does not hold with $s_{\rm trial} = s^{k,j}$, set $\bar\rho \leftarrow \rho_{k,j}$, $j \leftarrow j+1$, and go to Step 2.1.

Step 3. If

(5) $\rho_{k,j} \le M$ or $j = 0$ or $x^k + s^{k,j-1} \in \mathrm{Int}(\Omega)$,

define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k+1$, and go to Step 1.

Step 4. Define $\sigma_1 = 3\|s^{k,j-1}\|\rho_{k,j-1}$ and $\sigma_2 = 3\|s^{k,j}\|\rho_{k,j}$.
Step 4.1. Compute $\sigma \in [\sigma_1, \sigma_2)$ and $w = x^k - (H_k + \sigma I)^{-1} g(x^k)$ such that $w$ is on the boundary of $\Omega$. (If $x^k + s^{k,j-1}$ is on the boundary of $\Omega$ then $\sigma = \sigma_1$ and $w = x^k + s^{k,j-1}$ is the natural choice.)
Step 4.2. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop. Otherwise, define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$ and go to Step 1.
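To fix ideas, the following sketch implements the control flow of Algorithm 2.1 in simplified form; it is a sketch under assumptions, not the authors' implementation. It takes $J = 0$, omits the Step 4 safeguard, and assumes user-supplied routines `interior` (a membership test for $\mathrm{Int}(\Omega)$) and `solve_model` (returning a global solution of model (3) when $\rho = 0$, of model (4) when $\rho > 0$, or `None` when (3) is unsolvable). All parameter values are illustrative.

```python
import numpy as np

def cubic_reg_iterate(f, g, H, x, interior, solve_model,
                      alpha=1e-8, tau1=2.0, rho_min=1e-3):
    """One iteration of a simplified Algorithm 2.1 (J = 0, no Step 4):
    returns x^{k+1} in Int(Omega) satisfying the interiority condition (1)
    and the sufficient descent condition (2)."""
    gk, Hk = g(x), H(x)

    def accepted(s):
        return (s is not None and interior(x + s)
                and f(x + s) <= f(x) - alpha * np.linalg.norm(s) ** 3)

    # Step 1: try the unregularized quadratic model (3).
    s = solve_model(gk, Hk, 0.0)
    if accepted(s):
        return x + s
    # Step 2: increase the regularization parameter in a controlled way.
    rho_bar = rho_min
    while True:
        rho = tau1 * rho_bar          # any value in [tau1*rho_bar, tau2*rho_bar]
        s = solve_model(gk, Hk, rho)  # solution of the cubic model (4)
        if accepted(s):
            return x + s              # Step 3 acceptance (assuming rho <= M)
        rho_bar = rho                 # reject the trial step and enlarge rho
```

Since $x^k$ is interior and a large $\rho$ forces a small step, the inner loop terminates, mirroring the argument of Lemma 2.2 below.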

Note that, at Step 1.1, (3) is solvable if and only if $H_k$ is positive definite, or $H_k$ is positive semidefinite and $g(x^k)$ belongs to the range space of $H_k$. Note also that, at Step 2.1, we solve the cubic regularization problem (4) with a regularization parameter $\rho$ which is a priori unknown and may take any value between $\tau_1 \bar\rho$ and $\tau_2 \bar\rho$. This may be done using a root-finding process that aims to compute the quadratic regularization parameter that corresponds to a cubic regularization parameter between the given bounds (see [8] for details). Recall that, as shown in [17, Thm. 3.1], the set of solutions of cubic regularized problems, varying $\rho$, coincides with the set of solutions of quadratic regularized problems, varying $\sigma$. At Steps 1.3 and 2.2, a magical step is being considered, and its definition may depend on characteristics of $\Omega$. We have in mind the case in which $\Omega$ is a closed and convex set and the projection onto it is an affordable task. In this case, if (1) does not hold with $s_{\rm trial} = s^{k,j}$, we may define $w$ as the projection of $x^k + s^{k,j}$ onto $\Omega$. This trick has proved to be very useful in practice. A similar device is used in [31]. The number of magical steps per iteration is limited to $J$.
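The root-finding process alluded to above can be sketched as follows. For $\sigma \ge 0$ such that $H_k + \sigma I$ is positive definite, $s(\sigma) = -(H_k + \sigma I)^{-1} g(x^k)$ solves the quadratic regularized subproblem and, by [17, Thm. 3.1], also solves (4) with $\rho(\sigma) = \sigma/(3\|s(\sigma)\|)$; since $\|s(\sigma)\|$ decreases as $\sigma$ grows, $\rho(\sigma)$ is increasing and plain bisection applies. A minimal sketch, assuming the target interval is reachable within the initial bracket and using dense NumPy linear algebra for illustration:

```python
import numpy as np

def rho_of_sigma(g, H, sigma):
    """Cubic parameter rho = sigma / (3 ||s(sigma)||), where s(sigma)
    solves the quadratic regularized subproblem (cf. [17, Thm. 3.1])."""
    s = np.linalg.solve(H + sigma * np.eye(H.shape[0]), -g)
    return sigma / (3.0 * np.linalg.norm(s)), s

def sigma_for_rho_interval(g, H, rho_lo, rho_hi, sigma_hi=1e12, tol=1e-12):
    """Bisection on sigma until the induced rho lands in [rho_lo, rho_hi],
    as required at Step 2.1 with [rho_lo, rho_hi] = [tau1*rho_bar, tau2*rho_bar]."""
    lam1 = np.linalg.eigvalsh(H)[0]
    lo = max(0.0, -lam1) + 1e-12       # keep H + sigma*I positive definite
    hi = sigma_hi
    while hi - lo > tol * max(1.0, hi):
        sigma = 0.5 * (lo + hi)
        rho, s = rho_of_sigma(g, H, sigma)
        if rho < rho_lo:
            lo = sigma
        elif rho > rho_hi:
            hi = sigma
        else:
            return sigma, rho, s       # rho in [rho_lo, rho_hi]
    raise RuntimeError("target interval not reached within the bracket")
```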

Algorithm 2.1 has been described in such a way that the only reason for stopping is finding an iterate on the boundary of $\Omega$. In any other case, the algorithm generates an infinite sequence. Of course, in practice, stopping criteria are necessary (and they will be defined later) but, in theory, we are interested in the behavior of the potentially infinite sequence of iterates.

When the algorithm arrives at Step 3, a point $x^k + s^{k,j}$ such that (1) and (2) hold with $s_{\rm trial} = s^{k,j}$ has been computed. If, in addition, $\rho_{k,j} \le M$, then $x^k + s^{k,j}$ is accepted as the new iterate. The obvious question is why we do not accept the trial point $x^k + s^{k,j}$, despite it being interior and satisfying the sufficient descent condition, when (5) does not hold. The reason is that, since $\rho_{k,j} > M$ could be very big, $s^{k,j}$ could be unacceptably small and, thus, sufficient descent might not mean satisfactory descent. This situation may only happen when $j > 0$ and $x^k + s^{k,j-1} \notin \mathrm{Int}(\Omega)$, which means that the previous trial point at the present iteration was rejected, without testing its functional value, because it was not interior. Deciding whether to accept $x^k + s^{k,j}$ as the new iterate involves computing an additional point on the boundary of $\Omega$, and this task is performed at Step 4.

Recall that, when the algorithm executes Step 4, we are in a situation in which $x = x^k + s^{k,j}$ is interior and satisfies (2), whereas $y = x^k + s^{k,j-1}$, the previously computed trial point, is not interior. Moreover, both $x$ and $y$ come from solving cubic regularization subproblems with parameters $\rho_x$ and $\rho_y$ ($\rho_y$ may be null), respectively, where $\rho_x \ge 2\rho_y \ge 0$. This means that both $x$ and $y$ come from solving quadratic regularization problems with regularization parameters $\sigma_x = \sigma_2 = 3\|s^{k,j}\|\rho_{k,j}$ and $\sigma_y = \sigma_1 = 3\|s^{k,j-1}\|\rho_{k,j-1}$, respectively, with $\sigma_x > \sigma_y \ge 0$. At Step 4.1, it is assumed that the root-finding process of computing a solution to a quadratic regularized subproblem with $\sigma \in [\sigma_y, \sigma_x)$ that lies on the boundary of $\Omega$ is an affordable task. The proposition below shows that the $\sigma$ and $w$ that must be computed at Step 4.1 are well defined.

Proposition 2.1. Assume that $\Omega \subset \mathbb{R}^n$, $\varphi : [a, b] \to \mathbb{R}^n$ is continuous, $\varphi(a) \in \mathrm{Int}(\Omega)$, and $\varphi(b) \notin \Omega$. Then, there exists $\bar t \in [a, b]$ such that $\varphi(\bar t)$ belongs to the boundary of $\Omega$.

Proof. Define $\bar t \in [a, b]$ as the supremum of the values of $t \in [a, b]$ such that $\varphi(t) \in \Omega$.

If $\bar t = b$, there exists a sequence $t_k \to b$ such that $\varphi(t_k) \in \Omega$ and, by continuity of $\varphi$, $\varphi(t_k) \to \varphi(b)$. Since $\varphi(b) \notin \Omega$, it follows that $\varphi(\bar t) = \varphi(b)$ is on the boundary of $\Omega$ and we are done.

Consider the case in which $\bar t < b$. On the one hand, by the definition of supremum, there exist values of $t \le \bar t$, arbitrarily close to $\bar t$, such that $\varphi(t) \in \Omega$. On the other hand, for all $t' \in (\bar t, b]$, $\varphi(t')$ does not belong to $\Omega$. Therefore, by continuity of $\varphi$, every neighborhood of $\varphi(\bar t)$ contains points that belong to $\Omega$ and points that do not belong to $\Omega$. This implies that $\varphi(\bar t)$ is on the boundary of $\Omega$.
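When membership in $\Omega$ can be tested, Proposition 2.1 justifies computing the boundary points required at Steps 1.3, 2.2, and 4.1 by bisection on the parameter $t$. A minimal sketch, assuming a user-supplied continuous path `phi` with $\varphi(a) \in \mathrm{Int}(\Omega)$ and $\varphi(b) \notin \Omega$ and a user-supplied membership test `in_omega`:

```python
def boundary_point(phi, in_omega, a, b, tol=1e-10):
    """Bisection for t such that phi(t) is (within tol) on the boundary
    of Omega, given phi(a) in Omega and phi(b) outside (Proposition 2.1)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_omega(phi(mid)):
            lo = mid   # feasible side: move the left endpoint up
        else:
            hi = mid   # infeasible side: move the right endpoint down
    return phi(lo)
```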

It must be noted that, by assuming that it is an affordable task to find a trial point $w$ that at the same time is a solution to a quadratic regularization subproblem and lies on the boundary of $\Omega$, a single functional evaluation is performed (at Step 4.2) to decide whether to reach the boundary or to remain in the interior of $\Omega$. Having computed $w$, the algorithm checks whether $f(w) < f(x^k)$ or not. If it is, the algorithm defines $x^{k+1} = w$ and stops. If $f(w) \ge f(x^k)$, this means that (2) does not hold with $s_{\rm trial} = w - x^k$. In this case, the algorithm accepts $x^k + s^{k,j}$ as the new iterate. This is because the fact that $s_{\rm trial} = w - x^k$ does not satisfy (2) reveals that the cubic regularization parameter $\rho_{k,j}$ associated with $x^k + s^{k,j}$ is not unnecessarily big and, as a consequence, that $x^k + s^{k,j}$ is acceptable as the new iterate.

Assumption A1. There exists $L > 0$ such that, for all $x^k$ computed by Algorithm 2.1 and all $s_{\rm trial}$ considered in (2), we have that

(6) $f(x^k + s_{\rm trial}) \le f(x^k) + g(x^k)^T s_{\rm trial} + \frac{1}{2} (s_{\rm trial})^T H_k s_{\rm trial} + L \|s_{\rm trial}\|^3.$

Lemma 2.1. Suppose that Assumption A1 holds. If $\rho_{k,j} \ge L + \alpha$ then (2) holds with $s_{\rm trial} = s^{k,j}$.

Proof. If $\rho_{k,j} \ge L + \alpha$, by Assumption A1 we have that

$f(x^k + s^{k,j}) \le f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2}(s^{k,j})^T H_k s^{k,j} + L\|s^{k,j}\|^3$
$= f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2}(s^{k,j})^T H_k s^{k,j} + (L+\alpha)\|s^{k,j}\|^3 - \alpha\|s^{k,j}\|^3$
$\le f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2}(s^{k,j})^T H_k s^{k,j} + \rho_{k,j}\|s^{k,j}\|^3 - \alpha\|s^{k,j}\|^3.$

Since $s^{k,j}$ being a solution to (4) with $\rho = \rho_{k,j}$ implies that

$g(x^k)^T s^{k,j} + \frac{1}{2}(s^{k,j})^T H_k s^{k,j} + \rho_{k,j}\|s^{k,j}\|^3 \le 0,$

(2) follows from the last inequality.

Lemma 2.2. Suppose that Assumption A1 holds. Then, the sequence $\{x^k\}$ generated by Algorithm 2.1 is well defined. Moreover, if the algorithm does not stop at a boundary point at iteration $k$ (in which case $\rho_k$ is undefined), we have that

(7) $\rho_k \le \max\{M, \tau_2(L+\alpha)\}.$

Proof. Proving that the sequence $\{x^k\}$ is well defined consists in showing that, given $x^k \in \mathrm{Int}(\Omega)$, the point $x^{k+1}$ is computed in finite time. If $x^{k+1}$ is a solution to (3) computed at Step 1, or a point on the boundary of $\Omega$ computed at Steps 1.3, 2.2, or 4, we are done, since those calculations involve a finite number of operations by definition. We now assume that $x^{k+1} = x^k + s^{k,j}$ and $s^{k,j}$ is a solution to (4) with $\rho = \rho_{k,j}$ (computed at Step 3). Since $x^k$ is interior, for $\rho_{k,j}$ large enough we have that $x^k + s^{k,j}$ is interior too, meaning that (1) holds with $s_{\rm trial} = s^{k,j}$. Moreover, by Lemma 2.1, if $\rho_{k,j} \ge L + \alpha$ then (2) holds with $s_{\rm trial} = s^{k,j}$. Thus, since the updating rule of $\rho_{k,j}$ implies that $\rho_{k,j} \to \infty$ when $j \to \infty$, we have that $x^k + s^{k,j}$ satisfying (1) and (2) with $s_{\rm trial} = s^{k,j}$ is found in finite time.

Let us now prove the boundedness of $\rho_k$. If $\rho_k = \rho_{k,j}$ is defined at Step 3 then we have that $\rho_k \le M$. Assume now that $\rho_k = \rho_{k,j}$ is defined at Step 4.2. By the definition of the algorithm, we have that $\rho_{k,j} \in [\tau_1 \rho_{k,j-1}, \tau_2 \rho_{k,j-1}]$. Moreover, the point $w$ on the boundary of $\Omega$ was rejected. This point is of the form $w = x^k + s_w$ with $s_w = -(H_k + \sigma I)^{-1} g(x^k)$ a solution to the quadratic regularized subproblem

Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s + \frac{\sigma}{2}\|s\|^2$

with $\sigma \in [\sigma_1, \sigma_2)$, $\sigma_1 = 3\|s^{k,j-1}\|\rho_{k,j-1}$, and $\sigma_2 = 3\|s^{k,j}\|\rho_{k,j}$. This means (see [17, Thm. 3.1]) that $s_w$ is a solution to (4) with $\rho = \rho_w = \sigma/(3\|s_w\|)$ satisfying $\rho_{k,j-1} \le \rho_w \le \rho_{k,j}$. The point $w$ was rejected because $f(w) \ge f(x^k)$, which means that (2) with $s_{\rm trial} = s_w$ does not hold. Therefore, by Lemma 2.1, we have that $\rho_w < L + \alpha$. Thus, $\rho_{k,j-1} < L + \alpha$ and, in consequence, $\rho_{k,j} < \tau_2(L + \alpha)$, as we wanted to prove.

Lemma 2.3. If $s^k = 0$ then $g(x^k) = 0$ and $H_k$ is positive semidefinite.

Proof. Since

$\nabla_s \big( g(x^k)^T s + \tfrac{1}{2} s^T H_k s \big)\big|_{s=0} = g(x^k)$ and $\nabla_s^2 \big( g(x^k)^T s + \tfrac{1}{2} s^T H_k s \big)\big|_{s=0} = H_k,$

if $s^k = 0$ solves (3) then we have that $g(x^k) = 0$ and $H_k$ is positive semidefinite.

The gradient of the objective function that defines (4) is $g(x^k) + H_k s + 3\rho\|s\| s$. Therefore, if $s^k = 0$ is a solution to (4) then we have that $g(x^k) = 0$. Moreover, for all $s \in \mathbb{R}^n$, we must have

$\tfrac{1}{2} s^T H_k s + \rho\|s\|^3 \ge 0,$

since otherwise $s^k = 0$ would not be a solution to (4). Then, for all $s \ne 0$,

$\tfrac{1}{2} \dfrac{s^T H_k s}{\|s\|^2} + \rho\|s\| \ge 0.$

In particular, if $s \ne 0$ is an eigenvector associated with $\lambda_1(H_k)$, it turns out that

$\tfrac{1}{2}\lambda_1(H_k) + \rho\|s\| \ge 0.$

Taking limits as $s \to 0$, it follows that $\lambda_1(H_k) \ge 0$, as we wanted to prove.

Lemma 2.4. Suppose that Assumption A1 holds, that Algorithm 2.1 generates an infinite sequence $\{x^k\}_{k=0}^\infty$, and that $\{f(x^k)\}_{k=0}^\infty$ is bounded below. Then,

(8) $\lim_{k\to\infty} \|s^k\| = 0$

and

(9) $\lim_{k\to\infty} [-\lambda_1(H_k)]_+ = 0.$

Proof. Since $\{f(x^k)\}$ is bounded below, (8) follows from the fulfillment of the sufficient descent condition (2) with $s_{\rm trial} = s^k$ for all $k$.

Moreover, $s^k$ is a minimizer of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k\|s\|^3$ and, by Lemma 2.3, $H_k$ is positive semidefinite if $s^k = 0$. If $s^k \ne 0$, the Hessian of the cubic function $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k\|s\|^3$ at $s = s^k$ must be positive semidefinite. This Hessian is

$H_k + 3\rho_k \left( \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right).$

Thus,

$\lambda_1\!\left( H_k + 3\rho_k \left( \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right) \right) \ge 0,$

which implies that $[-\lambda_1(H_k)]_+ \le 6\rho_k\|s^k\|$. Since $s^k \to 0$ and, by Lemma 2.2, $\rho_k$ is bounded, we have that (9) holds.

Assumption A2. Assumption A1 holds and, for all $x^k$ and $s^k$ computed by Algorithm 2.1,

(10) $\|g(x^k + s^k)\| \le (L + 3\rho_k)\|s^k\|^2.$

A sufficient condition for the fulfillment of Assumption A2 comes from assuming that $g(x^k + s^k)$ is well represented by its linear approximation, namely,

(11) $\|g(x^k + s^k) - g(x^k) - H(x^k)s^k\| \le L\|s^k\|^2.$

In this case, Assumption A2 holds because the gradient of the subproblem solved at Step 2 vanishes at $s^k$. Moreover, a sufficient condition for the fulfillment of both Assumptions A1 and A2 is that the Hessian $H(x)$ satisfies a Lipschitz condition. It is interesting to note that Assumption A2 requires (10) to be verified only with respect to increments $s^k$ that satisfy the interiority and sufficient descent conditions, whereas Assumption A1 requires (6) to hold at every trial increment $s_{\rm trial}$.

Lemma 2.5. Suppose that Assumption A2 holds. Then, for all $x^k$ and $s^k$ generated by Algorithm 2.1 such that $x^{k+1} = x^k + s^k \in \mathrm{Int}(\Omega)$, we have that

(12) $f(x^{k+1}) \le f(x^k) - \alpha \left( \dfrac{\|g(x^{k+1})\|}{L + 3\max\{M, \tau_2(L+\alpha)\}} \right)^{3/2}.$

Proof. By Assumption A2 and Lemma 2.2, we have that

(13) $\|g(x^{k+1})\| \le \big(L + 3\max\{M, \tau_2(L+\alpha)\}\big)\|s^k\|^2.$

Thus, (12) follows from (13) and the fact that (2) holds with $s_{\rm trial} = s^k$.

Theorem 2.1. Suppose that Assumption A2 holds and let $f_{\rm target} \in \mathbb{R}$, $\varepsilon_g > 0$, and $\varepsilon_h > 0$ be arbitrary. Let $\rho_{\max} = \max\{M, \tau_2(L+\alpha)\}$. Then, the number of iterations at which

$f(x^k) > f_{\rm target}$ and $\|g(x^k)\| > \varepsilon_g$

is, at most,

(14) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_g}{L + 3\rho_{\max}} \right)^{-3/2} \right\rfloor.$

Moreover, the number of iterations at which

$f(x^k) > f_{\rm target}$ and $\lambda_1(H_k) < -\varepsilon_h$

is, at most,

(15) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_h}{6\rho_{\max}} \right)^{-3} \right\rfloor.$

Finally, at each iteration $k$, the number of functional evaluations is bounded by

$J + \log_{\tau_1}\!\left( \dfrac{\rho_{\max}}{\rho_{\min}} \right) + 2.$

Proof. The maximum number of iterations (14) follows directly from Lemma 2.5.

The step $s^k$ is a minimizer of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k\|s\|^3$, where, by Lemma 2.2, $\rho_k \le \rho_{\max}$. Thus, if $s^k \ne 0$, the Hessian of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k\|s\|^3$ is positive semidefinite at $s = s^k$. By direct calculation, this Hessian is given by

$H_k + 3\rho_k \left( \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right).$

Let $v_k$ be an eigenvector of $H_k$ associated with $\lambda_1(H_k)$ with $\|v_k\| = 1$. Then, by the positive semidefiniteness of $H_k + 3\rho_k\left[\frac{s^k(s^k)^T}{\|s^k\|} + \|s^k\| I\right]$, we have that

$0 \le v_k^T \left( H_k + 3\rho_k \left[ \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right] \right) v_k \le \lambda_1(H_k) + 3\rho_k \left( (v_k^T s^k)^2/\|s^k\| + \|s^k\| \right) \le \lambda_1(H_k) + 3\rho_k \left( \|s^k\|^2/\|s^k\| + \|s^k\| \right) \le \lambda_1(H_k) + 6\rho_{\max}\|s^k\|.$

Thus, $\|s^k\| \ge -\lambda_1(H_k)/(6\rho_{\max})$ or, equivalently, $-\|s^k\|^3 \le (\lambda_1(H_k)/(6\rho_{\max}))^3$. Therefore, since (2) with $s_{\rm trial} = s^k$ holds for all $k$, we have that

$f(x^{k+1}) \le f(x^k) - \alpha\|s^k\|^3 \le f(x^k) + \alpha\big(\lambda_1(H_k)/(6\rho_{\max})\big)^3,$

from which (15) follows.

In order to establish the number of functional evaluations per iteration $k$, first note that, while the interiority condition (1) is not being satisfied with $s_{\rm trial} = s^{k,j}$, on the one hand there is no need to check whether the descent condition (2) holds with $s_{\rm trial} = s^{k,j}$ and, on the other hand, simple decrease may be checked at no more than $J$ magical steps. This means that, at every iteration $k$, while (1) does not hold with $s_{\rm trial} = s^{k,j}$, at most $J$ functional evaluations are performed. Once (1) is satisfied, the limit on the number of functional evaluations follows from the boundedness of $\rho_k$ for all $k$, given by Lemma 2.2, the fact that $\rho_{k,0} = 0$ by definition, and the updating rules for $\rho_{k,j}$, which guarantee that (a) if (3) is solvable then $\rho_{k,j} \ge \tau_1^{\,j-1}\rho_{\min}$ for all $j \ge 1$ and (b) if (3) is not solvable then $\rho_{k,j} \ge \tau_1^{\,j}\rho_{\min}$ for all $j \ge 0$. This completes the proof.
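As a worked illustration, the bounds of Theorem 2.1 can be evaluated directly once the algorithmic constants are fixed; all the numerical values below are hypothetical and chosen only to expose how the bounds scale, not taken from the paper's experiments.

```python
import math

# Hypothetical constants, for illustration only.
gap = 10.0                 # f(x^0) - f_target
alpha, L, M = 0.1, 1.0, 100.0
tau1, tau2, rho_min, J = 2.0, 4.0, 1e-3, 1
eps_g = 1e-4

rho_max = max(M, tau2 * (L + alpha))
# Iteration bound (14): floor( gap/alpha * (eps_g/(L + 3*rho_max))^{-3/2} ).
iter_bound = math.floor(gap / alpha * (eps_g / (L + 3.0 * rho_max)) ** (-1.5))
# Per-iteration evaluation bound: J + log_{tau1}(rho_max/rho_min) + 2.
eval_bound = J + math.log(rho_max / rho_min, tau1) + 2
print(iter_bound, eval_bound)  # the O(eps^{-3/2}) dependence is explicit
```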

Theorem 2.2. Suppose that Assumption A2 holds and that the sequence $\{x^k\}$ generated by Algorithm 2.1 is bounded. Then, either the sequence stops at some boundary point $x^{\hat k}$ such that $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$, or an infinite sequence is generated such that

$\lim_{k\to\infty} \|g(x^k)\| = 0$ and $\lim_{k\to\infty} [-\lambda_1(H_k)]_+ = 0.$

Proof. If the sequence stops at a boundary point $x^{\hat k}$ then $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$ follows from the definition of the algorithm. Assume that the sequence does not stop at a boundary point. Since the sequence is bounded, by continuity, there exists $f_{\rm bound} \in \mathbb{R}$ such that $f(x^k) > f_{\rm bound}$ for all $k$. Define $f_{\rm target} = f_{\rm bound}$. Let $\varepsilon > 0$ be arbitrary. By Theorem 2.1, taking $\varepsilon_g = \varepsilon$, we see that there exists $k_0$ such that, for all $k \ge k_0$, $\|g(x^k)\| \le \varepsilon$. The fact that $\lim_{k\to\infty}[-\lambda_1(H_k)]_+ = 0$ was proved in Lemma 2.4. This completes the proof.

3. High-order algorithms for constrained optimization. Consider the problem

(16) Minimize $f(x)$ subject to $x \in D$,

where $D \subset \mathbb{R}^n$ represents an arbitrary set. In this section, we introduce a class of methods for solving (16). The algorithms introduced in this section can be used in connection with the algorithm introduced in Section 2 in different ways. Therefore, this class of algorithms may be seen as a family of independent procedures for solving the main problem (16) or as auxiliary devices for continuing Algorithm 2.1 when "a face" of $D$ should be abandoned.

Algorithm 3.1 below is a high-order algorithm in which each iteration is defined by the approximate minimization of the $p$th Taylor approximation of the function $f$ around the iterate $x^k$, along the lines of [6]. If $f : \mathbb{R}^n \to \mathbb{R}$ has continuous derivatives up to order $p \in \{1, 2, 3, \dots\}$, the Taylor polynomial of order $p$ will be written in the form

(17) $T_p(x, s) = \sum_{j=1}^{p} P_j(x, s),$

where $P_j(x, s)$ is a homogeneous polynomial of degree $j$ given by

(18) $P_j(x, s) = \dfrac{1}{j!} \left( s_1 \dfrac{\partial}{\partial x_1} + \cdots + s_n \dfrac{\partial}{\partial x_n} \right)^{j} f(x).$

For example, $T_1(x, s) = g(x)^T s$ and $T_2(x, s) = g(x)^T s + \frac{1}{2} s^T H(x) s$.
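For $p = 2$, the model (17) can be evaluated from the gradient and the Hessian alone; the minimal sketch below (illustrative only) also evaluates the regularized objective used by Algorithm 3.1.

```python
import numpy as np

def taylor2(gx, Hx, s):
    """T_2(x, s) = g(x)^T s + (1/2) s^T H(x) s, the p = 2 case of (17)."""
    return gx @ s + 0.5 * s @ (Hx @ s)

def regularized_model2(gx, Hx, s, rho):
    """T_2(x, s) + rho ||s||^3: the p = 2 regularized model of Algorithm 3.1."""
    return taylor2(gx, Hx, s) + rho * np.linalg.norm(s) ** 3
```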

Algorithm 3.1. Assume that $p \in \{1, 2, 3, \dots\}$, $\alpha > 0$, $\rho_{\min} > 0$, $\tau_2 \ge \tau_1 > 1$, $\theta > 0$, and $x^0 \in D$ are given. Initialize $k \leftarrow 0$.

Step 1. Set $\rho \leftarrow 0$.

Step 2. Compute $s_{\rm trial} \in \mathbb{R}^n$ such that

(19) $x^k + s_{\rm trial} \in D$

and

(20) $T_p(x^k, s_{\rm trial}) + \rho\|s_{\rm trial}\|^{p+1} \le 0.$

Step 3. Test the condition

(21) $f(x^k + s_{\rm trial}) \le f(x^k) - \alpha\|s_{\rm trial}\|^{p+1}.$

If (21) holds, define $s^k = s_{\rm trial}$, $\rho_k = \rho$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k+1$, and go to Step 1. Otherwise, update $\rho \leftarrow \max\{\rho_{\min}, \tau\rho\}$ with $\tau \in [\tau_1, \tau_2]$, and go to Step 2.

The trial increment $s_{\rm trial}$ is intended to be an approximate solution to the subproblem

(22) Minimize $T_p(x^k, s) + \rho\|s\|^{p+1}$ subject to $x^k + s \in D.$

Condition (20) is the minimal condition that should be imposed on the approximate solution to (22) in order to obtain meaningful results, although an obvious choice that satisfies (19), (20), and (21) is $s_{\rm trial} = 0$, which is not useful at all and will be discarded later. A driver for this scheme is sketched below.
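The following minimal sketch of Algorithm 3.1 for $p = 2$ is offered for concreteness; it assumes a user-supplied routine `solve_subproblem` that returns an approximate solution of (22) satisfying (19) and (20) (how to build such a routine depends on $D$ and is outside this sketch) and uses a fixed $\tau \in [\tau_1, \tau_2]$.

```python
import numpy as np

def algorithm_3_1(f, g, H, x0, solve_subproblem,
                  alpha=1e-8, rho_min=1e-3, tau=2.0, max_iter=100):
    """Simplified Algorithm 3.1 for p = 2 (so p + 1 = 3 in (20)-(21))."""
    x = x0
    for _ in range(max_iter):
        rho = 0.0                                     # Step 1
        while True:
            s = solve_subproblem(x, g(x), H(x), rho)  # Step 2: (19)-(20)
            # Step 3: sufficient descent test (21).
            if f(x + s) <= f(x) - alpha * np.linalg.norm(s) ** 3:
                break
            rho = max(rho_min, tau * rho)             # increase regularization
        x = x + s                                     # accept s^k
    return x
```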

Lemma 3.1. Assume that the sequence $\{x^k\}$ is generated by Algorithm 3.1. If $\{f(x^k)\}$ is bounded below then

$\lim_{k\to\infty} \|s^k\| = 0.$

Proof. The proof follows straightforwardly from the hypotheses of the lemma and (21).

The following assumption coincides with Assumption A1 in the case $p = 2$.

Assumption A3. There exists $L > 0$ such that, for all $x^k$ computed by Algorithm 3.1 and all $s_{\rm trial}$ considered in (21), we have that

(23) $f(x^k + s_{\rm trial}) \le f(x^k) + T_p(x^k, s_{\rm trial}) + L\|s_{\rm trial}\|^{p+1}.$

Lemma 3.2. Suppose that Assumption A3 holds. If the regularization parameter $\rho$ in (20) satisfies $\rho \ge L + \alpha$ then a trial step $s_{\rm trial}$ that satisfies (20) also satisfies the sufficient descent condition (21).

Proof. By Assumption A3,

$f(x^k + s_{\rm trial}) \le f(x^k) + T_p(x^k, s_{\rm trial}) + L\|s_{\rm trial}\|^{p+1}$
$= f(x^k) + T_p(x^k, s_{\rm trial}) + \rho\|s_{\rm trial}\|^{p+1} - \rho\|s_{\rm trial}\|^{p+1} + L\|s_{\rm trial}\|^{p+1}.$

Then, by (20),

$f(x^k + s_{\rm trial}) \le f(x^k) - \rho\|s_{\rm trial}\|^{p+1} + L\|s_{\rm trial}\|^{p+1}.$

Therefore, if $\rho \ge L + \alpha$, (21) holds.

Assumption A4. Assumption A3 holds and, for all $x^k$ and $s^k$ computed by Algorithm 3.1, we have that

(24) $\|g(x^k + s^k) - \nabla_s T_p(x^k, s^k)\| \le L\|s^k\|^p.$

Assumptions A3 and A4 are satisfied if the $p$th derivatives of $f$ satisfy a Lipschitz condition (see [6]). As in the case of Assumptions A1 and A2, observe that Assumption A3 must hold for every trial increment $s_{\rm trial}$, whereas Assumption A4 must hold only at the accepted increments $s^k$.

Assumption A5. The set $D$ is defined by

(25) $D = \{x \in \mathbb{R}^n \mid h_i(x) \le 0 \text{ for all } i = 1, \dots, q\},$

where the functions $h_i$ are continuously differentiable. Moreover, every feasible point of (25) satisfies a constraint qualification.

Assumption A5 implies that, for any function $\varphi : \mathbb{R}^n \to \mathbb{R}$, if $z$ is a minimizer of $\varphi$ subject to $z \in D$, the associated KKT conditions hold. For the sake of simplicity and without loss of generality, we formulated the definition of $D$ only in terms of inequality constraints. Implicitly, we assume that if equality constraints are present, they are expressed as pairs of inequalities.

For all $x \in D$, we define

(26) $L(D, x) = \{z \in \mathbb{R}^n \mid h_i(x) + h_i'(x)(z - x) \le 0 \text{ for all } i = 1, \dots, q\}.$

We say that $L(D, x)$ is the linear approximation of $D$ around $x$. Given $x \in D$ and a function $\varphi$, we denote $\nabla_D \varphi(x) = P_{L(D,x)}(x - \nabla\varphi(x)) - x$. Direct calculations show that $x$ satisfies the KKT conditions of the problem of minimizing $\varphi$ on $D$ if, and only if, $\nabla_D \varphi(x) = 0$. If there exists $x^k \to x^*$ such that $\nabla_D \varphi(x^k) \to 0$, we say that $x^*$ satisfies the Approximate Gradient Projection (AGP) sequential optimality condition [2, 29].
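When $D$ is a box, $L(D, x) = D$ and the projection in the definition of $\nabla_D$ is a componentwise clipping, so the stationarity measure is cheap to evaluate. A minimal sketch for that special case:

```python
import numpy as np

def grad_D_box(gx, x, lower, upper):
    """nabla_D f(x) = P_{L(D,x)}(x - g(x)) - x for the box
    D = {x : lower <= x <= upper}, where L(D, x) = D and the Euclidean
    projection reduces to componentwise clipping."""
    return np.clip(x - gx, lower, upper) - x

# x satisfies the KKT conditions of minimizing f over the box if and only if
# grad_D_box(g(x), x, lower, upper) is the zero vector.
```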

Assumption A6. There exists $\theta > 0$ such that, for all $k \in \mathbb{N}$, the approximate solutions $s^k$ to the subproblem (22) satisfy

(27) $\big\| \nabla_D \big( T_p(x^k, x - x^k) + \rho\|x - x^k\|^{p+1} \big)\big|_{x = x^k + s^k} \big\| \le \theta\|s^k\|^p.$

Assumption A6 states that the accepted increment $s^k$ satisfies, approximately, an AGP optimality condition. If the constraints satisfy some constraint qualification, every minimizer of (22) satisfies the AGP condition and, consequently, also fulfills (27). Therefore, Assumption A6 states the degree of accuracy with which one wishes to solve the subproblems (and this is the assumption that eliminates the possibility of taking $s^k = s_{\rm trial} = 0$). Note that the degree of accuracy (27) is required not of all the subproblems but only of the ones that, ultimately, define algorithmic progress.

Lemma 3.3. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Then, at each iteration $k$, we have that

(28) $\|\nabla_D f(x^k + s^k)\| \le \big(L + \tau_2(L+\alpha)(p+1) + \theta\big)\|s^k\|^p.$

Proof. By the definition of $\nabla_D$, we have that

(29) $\|\nabla_D f(x^k + s^k) - \nabla_D T_p(x^k, x - x^k)|_{x = x^k + s^k}\|$
$= \big\| P_{L(D, x^k+s^k)}\big( (x^k + s^k) - g(x^k + s^k) \big) - P_{L(D, x^k+s^k)}\big( (x^k + s^k) - \nabla T_p(x^k, x - x^k)|_{x = x^k + s^k} \big) \big\|$
$\le \big\| \big( (x^k + s^k) - g(x^k + s^k) \big) - \big( (x^k + s^k) - \nabla T_p(x^k, x - x^k)|_{x = x^k + s^k} \big) \big\|$
$= \| g(x^k + s^k) - \nabla T_p(x^k, x - x^k)|_{x = x^k + s^k} \| \le L\|s^k\|^p,$

where the first inequality follows from the contraction property of projections and the last inequality follows from Assumption A4. This means that

$\|\nabla_D f(x^k + s^k)\| \le L\|s^k\|^p + \|\nabla_D T_p(x^k, x - x^k)|_{x = x^k + s^k}\|$
$\le (L + \theta)\|s^k\|^p + \rho\,\big\| \nabla_D \big( \|x - x^k\|^{p+1} \big)\big|_{x = x^k + s^k} \big\|$
$= (L + \theta)\|s^k\|^p + \rho\,\big\| P_{L(D, x^k+s^k)}\big( (x^k + s^k) - \nabla(\|x - x^k\|^{p+1})|_{x = x^k + s^k} \big) - (x^k + s^k) \big\|$
$\le (L + \theta)\|s^k\|^p + \rho\,\big\| (x^k + s^k) - \nabla(\|x - x^k\|^{p+1})|_{x = x^k + s^k} - (x^k + s^k) \big\|$
$= (L + \theta)\|s^k\|^p + \rho\,\big\| \nabla(\|x - x^k\|^{p+1})|_{x = x^k + s^k} \big\|$
$= \big(L + \theta + \rho(p+1)\big)\|s^k\|^p,$

where the first inequality follows from (29), the second inequality follows from Assumption A6, and the third inequality follows from the fact that $x^k + s^k$ belongs to $L(D, x^k + s^k)$ and, therefore, for any $z \in \mathbb{R}^n$, $P_{L(D, x^k+s^k)}(z)$ is closer to $x^k + s^k$ than $z$ itself. Finally, by Lemma 3.2, we have that $\rho \le \tau_2(L+\alpha)$, which implies the desired result.

Lemma 3.4. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Then, at each iteration $k$, we have that

(30) $f(x^{k+1}) \le f(x^k) - \alpha \left( \dfrac{\|\nabla_D f(x^{k+1})\|}{L + \tau_2(L+\alpha)(p+1) + \theta} \right)^{(p+1)/p}.$

Proof. The result follows straightforwardly from Lemma 3.3 and (21).

Theorem 3.1. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Let $f_{\rm target} \in \mathbb{R}$ and $\varepsilon_g > 0$ be arbitrary. Then, the number of iterations $k$ such that

$f(x^k) > f_{\rm target}$ and $\|\nabla_D f(x^{k+1})\| \ge \varepsilon_g$

is not greater than

(31) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_g}{L + \tau_2(L+\alpha)(p+1) + \theta} \right)^{-(p+1)/p} \right\rfloor$

and the number of functional evaluations per iteration is bounded above by

$\log_{\tau_1}\!\left( \dfrac{\tau_2(L+\alpha)}{\rho_{\min}} \right) + 1.$

Finally, if $\{f(x^k)\}$ is bounded below,

(32) $\lim_{k\to\infty} \|\nabla_D f(x^{k+1})\| = 0.$

Proof. The maximum number of iterations (31) follows from Lemma 3.4. The bound on the number of functional evaluations per iteration follows from the updating rule for $\rho$ and Lemma 3.2, while (32) follows from Lemma 3.4 and the boundedness of $\{f(x^k)\}$.

Assumption A7. For all $k \in \mathbb{N}$, $s^k$ is a global minimizer of $T_p(x^k, s) + \rho_k\|s\|^{p+1}$ subject to $x^k + s \in D$.

If the constraints $h(x) \le 0$ are simple enough, Assumption A7 implies Assumption A6.

Theorem 3.2. Suppose that Assumptions A4, A5, A6, and A7 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Let $x^*$ be a limit point of $\{x^k\}$. Then, there exists $\rho^* \in [0, \tau_2(L+\alpha)]$ such that $s = 0$ is a global minimizer of $T_p(x^*, s) + \rho^*\|s\|^{p+1}$ subject to $x^* + s \in D$.

Proof. By Lemma 3.2, $\rho_k \in [0, \tau_2(L+\alpha)]$ for all $k \in \mathbb{N}$. Therefore, there exist $\rho^* \in [0, \tau_2(L+\alpha)]$ and an infinite sequence of indices $K$ such that

$\lim_{k \in K} \rho_k = \rho^*$ and $\lim_{k \in K} x^k = x^*.$

Let $s \in \mathbb{R}^n$ such that $x^k + s \in D$ be arbitrary. By Assumption A7, we have that

$T_p(x^k, s^k) + \rho_k\|s^k\|^{p+1} \le T_p(x^k, s) + \rho_k\|s\|^{p+1}.$

Taking limits in this inequality for $k \in K$ and using the continuity of the derivatives of $f$ up to order $p$, together with the fact that, by Lemma 3.1, $s^k \to 0$ and $\rho_k \to \rho^*$, we obtain

$T_p(x^*, 0) + \rho^*\|0\|^{p+1} \le T_p(x^*, s) + \rho^*\|s\|^{p+1}.$

Since $s$ satisfying $x^* + s \in D$ is arbitrary, we obtain the desired result.

Recall that, if a point $x$ is an unconstrained local minimizer of a function $f$, then such a point is $p$-order stationary. A point $x \in \mathbb{R}^n$ will be said to be $q$-order stationary (with $1 \le q \le p$) if it is $(q-1)$-order stationary and, for all $v \in \mathbb{R}^n$ such that $P_0(x, v) = \cdots = P_{q-1}(x, v) = 0$, one has that $P_q(x, v) \ge 0$. By convention, we will say that every point $x \in \mathbb{R}^n$ is 0-order stationary, with $P_0(x, v) = 0$.

Corollary 3.1. Assume that $q = 0$ (that is, $D = \mathbb{R}^n$ and the problem is unconstrained). Under the hypotheses of Theorem 3.2, the limit point $x^*$ is $m$-order stationary for all $m \le p$.

Proof. By Theorem 3.2, $s = 0$ is a global minimizer of $T_p(x^*, s) + \rho^*\|s\|^{p+1}$ and, hence, $m$-order stationary for this function for all $m \le p$. But the conditions of $m$-order stationarity of this function at $s = 0$ are the same as those of $f$ at $x^*$, since all their derivatives up to order $p$ coincide.

An $m$-order stationarity condition for the local minimization of $\varphi(x)$ subject to $x \in D$ is a property that involves derivatives up to order $m$ of $\varphi$, as well as properties of $D$, and must be satisfied by any local minimizer of $\varphi$. Since, for all $m \le p$, the $m$-order partial derivatives of $f$ at $x^*$ coincide with those of $T_p(x^*, s) + \rho^*\|s\|^{p+1}$ at $s = 0$, we may extend Corollary 3.1 to the constrained optimization case as follows.

Corollary 3.2. Under the hypotheses of Theorem 3.2, the limit point $x^*$ is $m$-order stationary for all $m \le p$.

4. Linearly-constrained optimization. In this section, we consider the problem

(33) Minimize $f(x)$ subject to $x \in D$,

where $D \subset \mathbb{R}^n$ is a polytope defined by

$D = \{x \in \mathbb{R}^n \mid (a^i)^T x \le b_i \text{ for all } i = 1, \dots, q\}.$
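In the active-set strategy outlined in the Introduction, the face of $D$ containing an iterate is determined by the constraints that hold with equality at it. A minimal sketch identifying them follows; the tolerance is an implementation choice, not part of the definition.

```python
import numpy as np

def active_constraints(A, b, x, tol=1e-10):
    """Indices i with a_i^T x = b_i (up to tol); these determine the face
    of the polytope D = {x : A x <= b} that contains x."""
    return np.flatnonzero(np.abs(A @ x - b) <= tol)
```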
