COMPLEXITY FOR CONSTRAINED OPTIMIZATION
E. G. BIRGIN† AND J. M. MARTÍNEZ‡
Abstract. The main objective of this research is to introduce a practical method for smooth bound-constrained optimization that possesses worst-case evaluation complexity $O(\varepsilon^{-3/2})$ for finding an $\varepsilon$-approximate first-order stationary point when the Hessian of the objective function is Lipschitz continuous. As in other well-established algorithms for optimization with box constraints, the algorithm proceeds by visiting the different faces of the domain, aiming to reduce the norm of an internal projected gradient and abandoning active constraints when no additional progress is expected in the current face. The introduced method emerges as a particular case of a method for minimization with linear constraints. Moreover, the linearly constrained minimization algorithm is an instance of a minimization algorithm with general constraints whose implementation may be unaffordable when the constraints are complicated. As a procedure for leaving faces, we employ a different method that may be regarded as an independent device for constrained optimization. This independent algorithm may be employed to solve linearly constrained optimization problems on its own, without relying on the active-set strategy. A careful implementation and numerical experiments show that the algorithm that combines active sets with leaving-face iterations is more effective than the independent algorithm on which the leaving-face iterations are based, although both exhibit the same complexity $O(\varepsilon^{-3/2})$.
Key words. Nonlinear programming, bound-constrained minimization, active-set strategies, regularization, complexity.
AMS subject classifications. 90C30, 65K05, 49M37, 90C60, 68Q25.
1. Introduction. In this paper, we address the problem of minimizing a smooth and generally nonconvex function over a region of a finite-dimensional Euclidean space. First, we present a cubic regularization algorithm with cubic descent that finds an approximate first-order stationary point with arbitrary precision $\varepsilon$, or a boundary point of the feasible region, with evaluation complexity $O(\varepsilon^{-3/2})$. In addition, complexity results for finding second-order stationary points and convergence results are presented. Second, we introduce an algorithm for minimization on arbitrary and generally nonconvex regions defined by inequalities and equalities. The evaluation complexity of this algorithm for finding first-order stationary points is also $O(\varepsilon^{-3/2})$ when one uses cubic regularization of the quadratic approximation of the objective. The most general form, in which we use $p$th Taylor polynomials to define subproblems, has complexity $O(\varepsilon^{-(p+1)/p})$. A version of this algorithm for minimizing smooth functions with linear constraints will be introduced and implemented for the case $p = 2$. Moreover, the problem of minimizing with linear constraints will also be addressed in a different way. Namely, we consider each face of the polytope as a finite-dimensional region within which we may employ the algorithm first introduced in this paper. As mentioned above, this algorithm either finds an approximate stationary point within the face or finds a point on its boundary. Then, as a natural combination of the first two algorithms introduced so far, we use the first one for minimizing within faces and the second one for giving up constraints when the current face is exhausted.
∗Submitted to the editors April 24, 2017.
Funding: Supported by FAPESP (grants 2013/03447-6, 2013/05475-7, 2013/07375-0, and 2014/18711-3) and CNPq (grants 309517/2014-1 and 303750/2014-6).
†Dept. of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, Cidade Universitária, 05508-090, São Paulo, SP, Brazil. egbirgin@ime.usp.br
‡Dept. of Applied Mathematics, Institute of Mathematics, Statistics, and Scientific Computing, State University of Campinas, 13083-859, Campinas, SP, Brazil. martinez@ime.unicamp.br
This gives rise to a third algorithm, with complexity $O(\varepsilon^{-3/2})$ for finding $\varepsilon$-approximate first-order stationary points, which will be implemented and compared with the second one. Therefore, the present paper introduces an algorithm with complexity $O(\varepsilon^{-3/2})$ for finding approximate KKT points of linearly constrained optimization problems, obtained by specializing a theoretical algorithm for constrained optimization. Theoretical algorithms for finding approximate KKT points of general nonlinear programming problems with complexity $O(\varepsilon^{-3/2})$ have been introduced [5], but practical counterparts are not yet known.

Several numerical optimization traditions converge in the present work. Algorithms that use quadratic models for unconstrained problems and employ regularized or trust-region subproblems, combined with actual-versus-predicted-reduction tests or cubic descent, with proven complexity $O(\varepsilon^{-3/2})$, were given in [8, 16, 17, 18, 20, 21, 28, 33]. Cubic regularization was introduced as a useful optimization tool in [26, 35]. The optimality of the complexity $O(\varepsilon^{-3/2})$ was proved in [18]. The generalization of cubic regularization to arbitrary $(p+1)$th-order regularization was given in [6]. In [25], the complexity (close to $O(\varepsilon^{-2})$) of quasi-Newton methods for unconstrained optimization was analyzed. The complexity of unconstrained optimization and of some constrained optimization algorithms under relaxed Lipschitz (Hölder) conditions on the objective function was analyzed in [19, 24, 27]. The idea of minimizing on polytopes using a fast algorithm within faces, combined with a suitable procedure to abandon exhausted faces, is at the core of active-set methods for constrained optimization and may be found in several textbooks [11, 22, 34]. Many algorithms for solving general constrained optimization problems use subproblems that consist of the minimization of combinations of objective and constraint functions subject to linear constraints [11, 22, 32, 34]. The combined algorithm presented here is inspired by the ideas of [3, 9, 10], where the algorithm for leaving faces was the spectral projected gradient method [1, 12, 13, 14, 15].
This paper is organized as follows. Section 2 introduces the cubic regularization method that either finds an interior stationary point or stops at the boundary of the feasible region; first- and second-order complexity results are presented. In Section 3, we introduce a model algorithm with $(p+1)$-regularized models for nonlinear programming. In Section 4, we describe the method for minimization with linear constraints that combines the procedures of Sections 2 and 3. In Section 5, we compare the combined algorithm against the method presented in Section 3 with $p = 2$. In Section 6, we state some conclusions and lines for future research.
Notation. Given $\Omega \subseteq \mathbb{R}^n$, $\mathrm{Int}(\Omega)$ denotes the set of interior points of $\Omega$ and $\bar\Omega$ denotes its closure; if $\Omega$ is convex, $P_\Omega(x)$ denotes the Euclidean projection of $x$ onto $\Omega$; $\nabla$ and $\nabla^2$ are the gradient and Hessian operators, respectively; $\|\cdot\|$ denotes the Euclidean norm; $\mathbb{N} = \{0, 1, 2, \ldots\}$; $g(x)$ denotes the gradient of $f(x)$ and $H(x)$ denotes its Hessian; $\lambda_1(H)$ denotes the leftmost eigenvalue of the symmetric matrix $H$; $H_k$ denotes $H(x^k)$; $h'(x)$ denotes the Jacobian of $h : \mathbb{R}^n \to \mathbb{R}^q$; and, if $a \in \mathbb{R}$, $[a]_+ = \max\{a, 0\}$.
2. Inner-to-the-face algorithm. Consider the problem

Minimize $f(x)$ subject to $x \in \Omega$.
We assume that $\Omega \subset \mathbb{R}^n$ has non-empty interior and that $f$ is a continuous function defined on an open set that contains $\Omega$. In this section, we introduce an algorithm that finds an interior point that approximately satisfies first- and second-order conditions for unconstrained minimization or, alternatively, finds a point on the boundary of $\Omega$. More precisely, either the algorithm generates a finite sequence $x^0, x^1, \ldots, x^{\hat k}$ such that the final point $x^{\hat k}$ is on the boundary of $\Omega$, $x^k \in \mathrm{Int}(\Omega)$ for all $k < \hat k$, and $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$, or it generates an infinite sequence $\{x^k\}_{k=0}^\infty \subset \mathrm{Int}(\Omega)$ such that $\{f(x^k)\}$ is strictly decreasing, $g(x^k) \to 0$, and $[-\lambda_1(H(x^k))]_+ \to 0$ as $k \to \infty$.
In the algorithm, for a given iterate $x^k$ and a trial step $s_{\rm trial}$, we consider the interiority condition

(1) $x^k + s_{\rm trial} \in \mathrm{Int}(\Omega)$

and the sufficient descent condition

(2) $f(x^k + s_{\rm trial}) \le f(x^k) - \alpha \|s_{\rm trial}\|^3$.
The first step of the algorithm consists in computing, when possible, a solution $s_{\rm trial}$ to the quadratic model

(3) Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s$

and checking whether it satisfies (1) and (2) or not. Step 2, which is executed when the first step is not successful, consists in finding, by increasing the regularization parameter $\rho$ in a controlled way, a solution $s_{\rm trial} = s_{\rm trial}(\rho)$ to the cubic regularized model

(4) Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho \|s\|^3$

that satisfies (1) and (2). Step 3 consists in verifying whether the regularization parameter $\rho$ happened to be too large or not. Completing this verification may require the calculation of a point on the boundary of $\Omega$, and this task is performed at Step 4. At the end, the computed new iterate may be either a solution to (3) or (4) that belongs to the interior of $\Omega$ and satisfies the sufficient descent condition, or a point on the boundary of $\Omega$. In the latter case, the algorithm stops. The complete description of the algorithm follows.
Algorithm 2.1. Assume that $x^0 \in \mathrm{Int}(\Omega)$, $\alpha > 0$, $\tau_2 \ge \tau_1 > 1$, $M > \rho_{\min} > 0$, and $J \ge 0$ are given. Initialize $k \leftarrow 0$.

Step 1. Set $j \leftarrow 0$.
Step 1.1. If (3) is not solvable, go to Step 2.
Step 1.2. Define $\rho_{k,j} = 0$ and $s^{k,j}$ as a solution to (3).
Step 1.3. If (1) does not hold with $s_{\rm trial} = s^{k,j}$ and $j < J$, compute, when possible, a point $w$ on the boundary of $\Omega$. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop.
Step 1.4. If (1) and (2) hold with $s_{\rm trial} = s^{k,j}$, define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k + 1$, and go to Step 1. Otherwise, set $j \leftarrow j + 1$ and continue at Step 2 below.
Step 2. Set $\bar\rho \leftarrow \rho_{\min}$.
Step 2.1. Define $s^{k,j}$ as a solution to (4) with $\rho = \rho_{k,j}$ for some $\rho_{k,j} \in [\tau_1 \bar\rho, \tau_2 \bar\rho]$.
Step 2.2. If (1) does not hold with $s_{\rm trial} = s^{k,j}$ and $j < J$, compute, when possible, a point $w$ on the boundary of $\Omega$. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop.
Step 2.3. If (1) or (2) does not hold with $s_{\rm trial} = s^{k,j}$, set $\bar\rho \leftarrow \rho_{k,j}$, $j \leftarrow j + 1$, and go to Step 2.1.
Step 3. If

(5) $\rho_{k,j} \le M$ or $j = 0$ or $x^k + s^{k,j-1} \in \mathrm{Int}(\Omega)$,

define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k + 1$, and go to Step 1.
Step 4. Define $\sigma_1 = 3 \|s^{k,j-1}\| \rho_{k,j-1}$ and $\sigma_2 = 3 \|s^{k,j}\| \rho_{k,j}$.
Step 4.1. Compute $\sigma \in [\sigma_1, \sigma_2)$ and $w = x^k - (H_k + \sigma I)^{-1} g(x^k)$ such that $w$ is on the boundary of $\Omega$. (If $x^k + s^{k,j-1}$ is on the boundary of $\Omega$ then $\sigma = \sigma_1$ and $w = x^k + s^{k,j-1}$ is the natural choice.)
Step 4.2. If $f(w) < f(x^k)$ then define $x^{k+1} = w$ and stop. Otherwise, define $s^k = s^{k,j}$, $\rho_k = \rho_{k,j}$, and $x^{k+1} = x^k + s^k$ and go to Step 1.
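To make the control flow concrete, the following is a minimal Python sketch of the core of Algorithm 2.1 (Steps 1 and 2 only; the magical steps of Steps 1.3 and 2.2 and the boundary handling of Steps 3 and 4 are omitted). The helpers `solve_cubic`, which is supposed to return a global minimizer of the cubic model (4), and the membership oracle `interior` are assumptions of ours, not part of the paper.

```python
import numpy as np

def cubic_reg_step(x, f, g, H, solve_cubic, interior,
                   alpha=1e-8, tau1=2.0, tau2=3.0, rho_min=1e-8):
    """One simplified iteration of Algorithm 2.1: try the unregularized
    model (3); on failure, solve the cubic model (4) with increasing rho
    until the interiority condition (1) and cubic descent (2) hold."""
    fx, gx, Hx = f(x), g(x), H(x)

    def accepted(s):
        # conditions (1) and (2)
        return interior(x + s) and f(x + s) <= fx - alpha * np.linalg.norm(s) ** 3

    # Step 1: the quadratic model (3); here we treat it as solvable only
    # when H(x) is positive definite (a simplification of Step 1.1).
    try:
        np.linalg.cholesky(Hx)
        s = np.linalg.solve(Hx, -gx)          # Newton step, rho = 0
        if accepted(s):
            return x + s, 0.0
    except np.linalg.LinAlgError:
        pass
    # Step 2: cubic regularized model (4) with controlled increase of rho.
    rho_bar = rho_min
    while True:
        rho = 0.5 * (tau1 + tau2) * rho_bar   # any value in [tau1*rho_bar, tau2*rho_bar]
        s = solve_cubic(gx, Hx, rho)          # minimizer of g's + s'Hs/2 + rho*||s||^3
        if accepted(s):
            return x + s, rho
        rho_bar = rho
```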
Note that, at Step 1.1, (3) is solvable if and only if $H_k$ is positive definite, or $H_k$ is positive semidefinite and $g(x^k)$ belongs to the range space of $H_k$. Note also that, at Step 2.1, we solve the cubic regularization problem (4) with a regularization parameter $\rho$ that is a priori unknown and may take any value between $\tau_1 \bar\rho$ and $\tau_2 \bar\rho$. This may be done using a root-finding process that aims to compute the quadratic regularization parameter that corresponds to a cubic regularization parameter between the given bounds (see [8] for details). Recall that, as shown in [17, Thm. 3.1], the set of solutions of cubic regularized problems, varying $\rho$, coincides with the set of solutions of quadratic regularized problems, varying $\sigma$. At Steps 1.3 and 2.2, a magical step is being considered, and its definition may depend on characteristics of $\Omega$. We have in mind the case in which $\Omega$ is a closed and convex set onto which projecting is an affordable task. In this case, if (1) does not hold with $s_{\rm trial} = s^{k,j}$, we may define $w$ as the projection of $x^k + s^{k,j}$ onto $\Omega$. This trick has proved to be very useful in practice; a similar device is used in [31]. The number of magical steps per iteration is limited to $J$.
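For instance, when $\Omega$ is a box $\{x \mid \ell \le x \le u\}$, the projection that defines the magical step is a componentwise clipping. A minimal sketch (the function name is ours):

```python
import numpy as np

def magical_step(x, s, lower, upper):
    """Candidate w of Steps 1.3 and 2.2 for the box Omega = {z : lower <= z <= upper}:
    the Euclidean projection of the non-interior trial point x + s onto Omega,
    which for a box reduces to componentwise clipping."""
    return np.clip(x + s, lower, upper)
```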
Algorithm 2.1 has been described in such a way that the only reason for stopping is finding an iterate on the boundary of $\Omega$. In any other case, the algorithm generates an infinite sequence. Of course, in practice, stopping criteria are necessary (and they will be defined later) but, in theory, we are interested in the behavior of the potentially infinite sequence of iterates.
When the algorithm arrives at Step 3, a point $x^k + s^{k,j}$ such that (1) and (2) hold with $s_{\rm trial} = s^{k,j}$ has been computed. If, in addition, $\rho_{k,j} \le M$, then $x^k + s^{k,j}$ is accepted as the new iterate. The obvious question is why we do not accept the trial point $x^k + s^{k,j}$ when (5) does not hold, despite its being interior and satisfying the sufficient descent condition. The reason is that, since $\rho_{k,j} > M$ could be very large, $s^{k,j}$ could be unacceptably small and, thus, sufficient descent might not mean satisfactory descent. This situation may only happen when $j > 0$ and $x^k + s^{k,j-1} \notin \mathrm{Int}(\Omega)$, which means that the previous trial point at the present iteration was rejected, without testing its functional value, because it was not interior. Deciding whether to accept $x^k + s^{k,j}$ as the new iterate involves computing an additional point on the boundary of $\Omega$, and this task is performed at Step 4.
Recall that, when the algorithm executes Step 4, we are in a situation in which $x = x^k + s^{k,j}$ is interior and satisfies (2), whereas $y = x^k + s^{k,j-1}$, the previously computed trial point, is not interior. Moreover, both $x$ and $y$ come from solving cubic regularization subproblems with parameters $\rho_x$ and $\rho_y$ ($\rho_y$ may be null), respectively, where $\rho_x \ge 2\rho_y \ge 0$. This means that both $x$ and $y$ come from solving quadratic regularization problems with regularization parameters $\sigma_x = \sigma_2 = 3\|s^{k,j}\|\rho_{k,j}$ and $\sigma_y = \sigma_1 = 3\|s^{k,j-1}\|\rho_{k,j-1}$, respectively, with $\sigma_x > \sigma_y \ge 0$. At Step 4.1, it is assumed that the root-finding process of computing a solution to a quadratic regularized subproblem with $\sigma \in [\sigma_y, \sigma_x)$ that lies on the boundary of $\Omega$ is an affordable task. The proposition below shows that the $\sigma$ and $w$ that must be computed at Step 4.1 are well defined.
Proposition 2.1. Assume that $\Omega \subset \mathbb{R}^n$, $\varphi : [a, b] \to \mathbb{R}^n$ is continuous, $\varphi(a) \in \mathrm{Int}(\Omega)$, and $\varphi(b) \notin \Omega$. Then, there exists $\bar t \in [a, b]$ such that $\varphi(\bar t)$ belongs to the boundary of $\Omega$.

Proof. Define $\bar t \in [a, b]$ as the supremum of the values of $t \in [a, b]$ such that $\varphi(t) \in \Omega$.

If $\bar t = b$, there exists a sequence $t_k \to b$ such that $\varphi(t_k) \in \Omega$ and, by continuity of $\varphi$, $\varphi(t_k) \to \varphi(b)$. Hence $\varphi(b)$ belongs to the closure of $\Omega$ and, since $\varphi(b) \notin \Omega$, the point $\varphi(\bar t) = \varphi(b)$ is on the boundary of $\Omega$ and we are done.

Consider the case in which $\bar t < b$. On the one hand, by the definition of supremum, there exists a sequence $t_k \to \bar t$ with $t_k \le \bar t$ and $\varphi(t_k) \in \Omega$. On the other hand, for all $t \in (\bar t, b]$, $\varphi(t)$ does not belong to $\Omega$; thus, taking $t_m \to \bar t$ with $t_m > \bar t$ gives points $\varphi(t_m) \notin \Omega$ arbitrarily close to $\varphi(\bar t)$. Therefore, by continuity of $\varphi$, every neighborhood of $\varphi(\bar t)$ contains points that belong to $\Omega$ and points that do not belong to $\Omega$. This implies that $\varphi(\bar t)$ is on the boundary of $\Omega$.
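Computationally, a point as in Proposition 2.1 can be approximated by bisection whenever a membership oracle for $\Omega$ is available. The sketch below is ours; it returns a parameter whose image is within a prescribed tolerance of the boundary, and it assumes that $\{t \mid \varphi(t) \in \Omega\}$ is an interval, which holds, e.g., when $\Omega$ is convex and $\varphi$ parametrizes a line segment.

```python
def boundary_parameter(phi, in_omega, a, b, tol=1e-12):
    """Bisection in the spirit of Proposition 2.1: phi is continuous on [a, b],
    phi(a) lies in Int(Omega), phi(b) lies outside Omega, and in_omega is a
    membership oracle for Omega.  Returns t with phi(t) near the boundary."""
    lo, hi = float(a), float(b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_omega(phi(mid)):
            lo = mid          # still inside: the boundary is to the right
        else:
            hi = mid          # outside: the boundary is to the left
    return lo
```

For the segment $\varphi(t) = x^k + t\,s^{k,j}$ with $a = 0$ and $b = 1$, this yields (approximately) the largest feasible step toward a non-interior trial point.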
It must be noted that, by assuming that it is an affordable task to find a trial point $w$ that is at the same time a solution to a quadratic regularization subproblem and a point on the boundary of $\Omega$, a single functional evaluation is performed (at Step 4.2) to decide whether to reach the boundary or to remain in the interior of $\Omega$. Having computed $w$, the algorithm checks whether $f(w) < f(x^k)$ or not. If it is, the algorithm defines $x^{k+1} = w$ and stops. If $f(w) \ge f(x^k)$, then (2) does not hold with $s_{\rm trial} = w - x^k$. In this case, the algorithm accepts $x^k + s^{k,j}$ as the new iterate. This is because the fact that $s_{\rm trial} = w - x^k$ does not satisfy (2) reveals that the cubic regularization parameter $\rho_{k,j}$ associated with $x^k + s^{k,j}$ is not unnecessarily big and, as a consequence, that $x^k + s^{k,j}$ is acceptable as the new iterate.
Assumption A1. There exists $L > 0$ such that, for all $x^k$ computed by Algorithm 2.1 and all $s_{\rm trial}$ considered in (2), we have

(6) $f(x^k + s_{\rm trial}) \le f(x^k) + g(x^k)^T s_{\rm trial} + \frac{1}{2} s_{\rm trial}^T H_k s_{\rm trial} + L \|s_{\rm trial}\|^3$.

Lemma 2.1. Suppose that Assumption A1 holds. If $\rho_{k,j} \ge L + \alpha$ then (2) holds with $s_{\rm trial} = s^{k,j}$.

Proof. If $\rho_{k,j} \ge L + \alpha$ then, by Assumption A1, we have that

$f(x^k + s^{k,j}) \le f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2} (s^{k,j})^T H_k s^{k,j} + L \|s^{k,j}\|^3$
$= f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2} (s^{k,j})^T H_k s^{k,j} + (L + \alpha) \|s^{k,j}\|^3 - \alpha \|s^{k,j}\|^3$
$\le f(x^k) + g(x^k)^T s^{k,j} + \frac{1}{2} (s^{k,j})^T H_k s^{k,j} + \rho_{k,j} \|s^{k,j}\|^3 - \alpha \|s^{k,j}\|^3$.

Since $s^{k,j}$ being a solution to (4) with $\rho = \rho_{k,j}$ implies that

$g(x^k)^T s^{k,j} + \frac{1}{2} (s^{k,j})^T H_k s^{k,j} + \rho_{k,j} \|s^{k,j}\|^3 \le 0$,

(2) follows from the last inequality.
Lemma 2.2. Suppose that Assumption A1 holds. Then, the sequence $\{x^k\}$ generated by Algorithm 2.1 is well defined. Moreover, if the algorithm does not stop at a boundary point at iteration $k$ (in which case $\rho_k$ is undefined), we have that

(7) $\rho_k \le \max\{M, \tau_2 (L + \alpha)\}$.

Proof. Proving that the sequence $\{x^k\}$ is well defined consists in showing that, given $x^k \in \mathrm{Int}(\Omega)$, the point $x^{k+1}$ is computed in finite time. If $x^{k+1}$ is a solution to (3) computed at Step 1, or it is a point on the boundary of $\Omega$ computed at Steps 1.3, 2.2, or 4, we are done, since those calculations involve a finite number of operations by definition. We now assume that $x^{k+1} = x^k + s^{k,j}$ and $s^{k,j}$ is a solution to (4) with $\rho = \rho_{k,j}$ (computed at Step 3). Since $x^k$ is interior, for $\rho_{k,j}$ large enough we have that $x^k + s^{k,j}$ is interior too, meaning that (1) holds with $s_{\rm trial} = s^{k,j}$. Moreover, by Lemma 2.1, if $\rho_{k,j} \ge L + \alpha$ then (2) holds with $s_{\rm trial} = s^{k,j}$. Thus, since the updating rule of $\rho_{k,j}$ implies that $\rho_{k,j} \to \infty$ when $j \to \infty$, we have that $x^k + s^{k,j}$ satisfying (1) and (2) with $s_{\rm trial} = s^{k,j}$ is found in finite time.

Let us now prove the boundedness of $\rho_k$. If $\rho_k = \rho_{k,j}$ is defined at Step 3 then we have that $\rho_k \le M$. Assume now that $\rho_k = \rho_{k,j}$ is defined at Step 4.2. By the definition of the algorithm, we have that $\rho_{k,j} \in [\tau_1 \rho_{k,j-1}, \tau_2 \rho_{k,j-1}]$. Moreover, the point $w$ on the boundary of $\Omega$ was rejected. This point is of the form $w = x^k + s_w$ with $s_w = -(H_k + \sigma I)^{-1} g(x^k)$ a solution to the quadratic regularized subproblem

Minimize $g(x^k)^T s + \frac{1}{2} s^T H_k s + \frac{\sigma}{2} \|s\|^2$

with $\sigma \in [\sigma_1, \sigma_2)$, $\sigma_1 = 3\|s^{k,j-1}\|\rho_{k,j-1}$, and $\sigma_2 = 3\|s^{k,j}\|\rho_{k,j}$. This means (see [17, Thm. 3.1]) that $s_w$ is a solution to (4) with $\rho = \rho_w = \sigma / (3\|s_w\|)$ satisfying $\rho_{k,j-1} \le \rho_w \le \rho_{k,j}$. The point $w$ was rejected because $f(w) \ge f(x^k)$, which means that (2) with $s_{\rm trial} = s_w$ does not hold. Therefore, by Lemma 2.1, we must have $\rho_w < L + \alpha$. Thus, $\rho_{k,j-1} < L + \alpha$ and, in consequence, $\rho_{k,j} \le \tau_2 \rho_{k,j-1} < \tau_2 (L + \alpha)$, as we wanted to prove.
Lemma 2.3. If $s^k = 0$ then $g(x^k) = 0$ and $H_k$ is positive semidefinite.

Proof. Since

$\nabla \left( g(x^k)^T s + \frac{1}{2} s^T H_k s \right)\big|_{s=0} = g(x^k)$ and $\nabla^2 \left( g(x^k)^T s + \frac{1}{2} s^T H_k s \right)\big|_{s=0} = H_k$,

if $s^k = 0$ solves (3) then we have that $g(x^k) = 0$ and $H_k$ is positive semidefinite.

The gradient of the objective function that defines (4) is $g(x^k) + H_k s + 3\rho \|s\| s$. Therefore, if $s^k = 0$ is a solution to (4) then we have that $g(x^k) = 0$. Moreover, for all $s \in \mathbb{R}^n$, we must have

$\frac{1}{2} s^T H_k s + \rho \|s\|^3 \ge 0$,

since otherwise $s^k = 0$ would not be a solution to (4). Then, for all $s \ne 0$,

$\frac{1}{2} \frac{s^T H_k s}{\|s\|^2} + \rho \|s\| \ge 0$.

In particular, if $s \ne 0$ is an eigenvector associated with $\lambda_1(H_k)$, it turns out that

$\frac{1}{2} \lambda_1(H_k) + \rho \|s\| \ge 0$.

Taking limits as $s \to 0$, it follows that $\lambda_1(H_k) \ge 0$, as we wanted to prove.
Lemma 2.4. Suppose that Assumption A1 holds, that Algorithm 2.1 generates an infinite sequence $\{x^k\}_{k=0}^\infty$, and that $\{f(x^k)\}_{k=0}^\infty$ is bounded below. Then,

(8) $\lim_{k \to \infty} \|s^k\| = 0$

and

(9) $\lim_{k \to \infty} [-\lambda_1(H_k)]_+ = 0$.

Proof. Since $\{f(x^k)\}$ is bounded below, (8) follows from the fulfillment of the sufficient descent condition (2) with $s_{\rm trial} = s^k$ for all $k$.

Moreover, $s^k$ is a minimizer of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k \|s\|^3$ and, by Lemma 2.3, $H_k$ is positive semidefinite if $s^k = 0$. If $s^k \ne 0$, the Hessian of the cubic function $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k \|s\|^3$ at $s = s^k$ must be positive semidefinite. This Hessian is

$H_k + 3\rho_k \left( \frac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right)$.

Thus,

$\lambda_1\!\left( H_k + 3\rho_k \left( \frac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right) \right) \ge 0$.

Since $s^k \to 0$ and, by Lemma 2.2, $\rho_k$ is bounded, the perturbation of $H_k$ above vanishes and we have that (9) holds.
Assumption A2. Assumption A1 holds and, for all $x^k$ and $s^k$ computed by Algorithm 2.1,

(10) $\|g(x^k + s^k)\| \le (L + 3\rho_k) \|s^k\|^2$.

A sufficient condition for the fulfillment of Assumption A2 comes from assuming that $g(x^k + s^k)$ is well represented by its linear approximation, namely,

(11) $\|g(x^k + s^k) - g(x^k) - H(x^k) s^k\| \le L \|s^k\|^2$.

In this case, Assumption A2 holds because the gradient of the subproblem at Step 2 vanishes at its solution. Moreover, a sufficient condition for the fulfillment of both Assumptions A1 and A2 is the fulfillment of a Lipschitz condition by the Hessian $H(x)$. It is interesting to note that Assumption A2 requires (10) to be verified only with respect to increments $s^k$ that satisfy the interiority and the sufficient descent conditions, whereas Assumption A1 requires (6) to hold at every trial increment $s_{\rm trial}$.
Lemma 2.5. Suppose that Assumption A2 holds. Then, for all $x^k$ and $s^k$ generated by Algorithm 2.1 such that $x^{k+1} = x^k + s^k \in \mathrm{Int}(\Omega)$, we have that

(12) $f(x^{k+1}) \le f(x^k) - \alpha \left( \dfrac{\|g(x^{k+1})\|}{L + 3\max\{M, \tau_2(L+\alpha)\}} \right)^{3/2}$.

Proof. By Assumption A2 and Lemma 2.2, we have that

(13) $\|g(x^{k+1})\| \le \left( L + 3\max\{M, \tau_2(L+\alpha)\} \right) \|s^k\|^2$.

Thus, (12) follows from (13) and the fact that (2) holds with $s_{\rm trial} = s^k$.
Theorem 2.1. Suppose that Assumption A2 holds and let $f_{\rm target} \in \mathbb{R}$, $\varepsilon_g > 0$, and $\varepsilon_h > 0$ be arbitrary. Let $\rho_{\max} = \max\{M, \tau_2(L+\alpha)\}$. Then, the number of iterations at which

$f(x^k) > f_{\rm target}$ and $\|g(x^k)\| > \varepsilon_g$

is, at most,

(14) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_g}{L + 3\rho_{\max}} \right)^{-3/2} \right\rfloor$.

Moreover, the number of iterations at which

$f(x^k) > f_{\rm target}$ and $\lambda_1(H_k) < -\varepsilon_h$

is, at most,

(15) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_h}{6\rho_{\max}} \right)^{-3} \right\rfloor$.

Finally, at each iteration $k$, the number of functional evaluations is bounded by

$J + \log_{\tau_1}\!\left( \dfrac{\rho_{\max}}{\rho_{\min}} \right) + 2$.

Proof. The maximum number of iterations (14) follows directly from Lemma 2.5.

The step $s^k$ is a minimizer of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k \|s\|^3$, where, by Lemma 2.2, $\rho_k \le \rho_{\max}$. Thus, if $s^k \ne 0$, the Hessian of $g(x^k)^T s + \frac{1}{2} s^T H_k s + \rho_k \|s\|^3$ is positive semidefinite at $s = s^k$. By direct calculation, this Hessian is given by

$H_k + 3\rho_k \left( \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right)$.

Let $v^k$ be an eigenvector of $H_k$ associated with $\lambda_1(H_k)$ with $\|v^k\| = 1$. Then, by the positive semidefiniteness of $H_k + 3\rho_k \left[ \frac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right]$, we have that

$0 \le (v^k)^T \left( H_k + 3\rho_k \left[ \dfrac{s^k (s^k)^T}{\|s^k\|} + \|s^k\| I \right] \right) v^k \le \lambda_1(H_k) + 3\rho_k \left( \dfrac{((v^k)^T s^k)^2}{\|s^k\|} + \|s^k\| \right) \le \lambda_1(H_k) + 3\rho_k \left( \dfrac{\|s^k\|^2}{\|s^k\|} + \|s^k\| \right) \le \lambda_1(H_k) + 6\rho_{\max} \|s^k\|$.

Thus, $\|s^k\| \ge -\lambda_1(H_k)/(6\rho_{\max})$ or, equivalently, $-\|s^k\|^3 \le (\lambda_1(H_k)/(6\rho_{\max}))^3$. Therefore, since (2) with $s_{\rm trial} = s^k$ holds for all $k$, we have that

$f(x^{k+1}) \le f(x^k) - \alpha \|s^k\|^3 \le f(x^k) + \alpha \left( \lambda_1(H_k)/(6\rho_{\max}) \right)^3$,

from which (15) follows.

In order to establish the number of functional evaluations per iteration $k$, first note that, while the interiority condition (1) is not satisfied with $s_{\rm trial} = s^{k,j}$, on the one hand there is no need to check whether the descent condition (2) holds with $s_{\rm trial} = s^{k,j}$ and, on the other hand, simple decrease may be checked at no more than $J$ magical steps. This means that, at every iteration $k$, while (1) does not hold with $s_{\rm trial} = s^{k,j}$, at most $J$ functional evaluations are performed. Once (1) is satisfied, the limit on the number of functional evaluations follows from the boundedness of $\rho_k$ for all $k$, given by Lemma 2.2, the fact that $\rho_{k,0} = 0$ by definition, and the updating rule for $\rho_{k,j}$, which guarantees that (a) if (3) is solvable then $\rho_{k,j} \ge \tau_1^{\,j-1} \rho_{\min}$ for all $j \ge 1$ and (b) if (3) is not solvable then $\rho_{k,j} \ge \tau_1^{\,j} \rho_{\min}$ for all $j \ge 0$. This completes the proof.
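As a sanity check on how the tolerances enter the bounds, the snippet below evaluates (14) and (15) for illustrative parameter values (chosen by us, not taken from the paper), exhibiting the $O(\varepsilon_g^{-3/2})$ and $O(\varepsilon_h^{-3})$ dependence.

```python
import math

# Illustrative parameters (our choices, not values from the paper).
L, alpha, M, tau2 = 1.0, 0.1, 1e2, 3.0
rho_max = max(M, tau2 * (L + alpha))   # rho_max as defined in Theorem 2.1
gap = 10.0                             # f(x0) - f_target

for eps_g in (1e-2, 1e-4):
    bound_1st = math.floor(gap / alpha * (eps_g / (L + 3 * rho_max)) ** (-1.5))
    print(f"eps_g = {eps_g:g}: at most {bound_1st} first-order iterations")
for eps_h in (1e-2, 1e-4):
    bound_2nd = math.floor(gap / alpha * (eps_h / (6 * rho_max)) ** (-3))
    print(f"eps_h = {eps_h:g}: at most {bound_2nd} second-order iterations")
```

Dividing $\varepsilon_g$ by 100 multiplies the first bound by $100^{3/2} = 1000$, as the worst-case theory predicts.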
Theorem 2.2. Suppose that Assumption A2 holds and that the sequence $\{x^k\}$ generated by Algorithm 2.1 is bounded. Then, either the sequence stops at some boundary point $x^{\hat k}$ such that $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$, or an infinite sequence is generated such that

$\lim_{k \to \infty} \|g(x^k)\| = 0$ and $\lim_{k \to \infty} [-\lambda_1(H_k)]_+ = 0$.

Proof. If the sequence stops at a boundary point $x^{\hat k}$ then $f(x^{\hat k}) < f(x^{\hat k - 1}) < \cdots < f(x^0)$ follows from the definition of the algorithm. Assume that the sequence does not stop at a boundary point. Since the sequence is bounded, by continuity of $f$, there exists $f_{\rm bound} \in \mathbb{R}$ such that $f(x^k) > f_{\rm bound}$ for all $k$. Define $f_{\rm target} = f_{\rm bound}$ and let $\varepsilon > 0$ be arbitrary. By Theorem 2.1, taking $\varepsilon_g = \varepsilon$, we see that there exists $k_0$ such that, for all $k \ge k_0$, $\|g(x^k)\| \le \varepsilon$. The fact that $\lim_{k\to\infty} [-\lambda_1(H_k)]_+ = 0$ was proved in Lemma 2.4. This completes the proof.
3. High-order algorithms for constrained optimization. Consider the problem

(16) Minimize $f(x)$ subject to $x \in D$,

where $D \subset \mathbb{R}^n$ represents an arbitrary set. In this section, we introduce a class of methods for solving (16). The algorithms introduced in this section can be used in connection with the algorithm introduced in Section 2 in different ways. Therefore, this class of algorithms may be seen as a family of independent procedures for solving the main problem (16) or as auxiliary devices for continuing Algorithm 2.1 when "a face" of $D$ should be abandoned.
Algorithm 3.1 below is a high-order algorithm in which each iteration is defined by the approximate minimization of the $p$th Taylor approximation of the function $f$ around the iterate $x^k$, along the lines of [6]. If $f : \mathbb{R}^n \to \mathbb{R}$ has continuous derivatives up to order $p \in \{1, 2, 3, \ldots\}$, the Taylor polynomial of order $p$ will be written in the form

(17) $T_p(x, s) = \displaystyle\sum_{j=1}^{p} P_j(x, s)$,

where $P_j(x, s)$ is a homogeneous polynomial of degree $j$ given by

(18) $P_j(x, s) = \dfrac{1}{j!} \left( s_1 \dfrac{\partial}{\partial x_1} + \cdots + s_n \dfrac{\partial}{\partial x_n} \right)^j f(x)$.

For example, $T_1(x, s) = g(x)^T s$ and $T_2(x, s) = g(x)^T s + \frac{1}{2} s^T H(x) s$.
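For $p = 2$, the model used in the subproblems can be evaluated directly. A small sketch (function names are ours) of $T_2$ and of the regularized objective that appears in (22) below:

```python
import numpy as np

def taylor2(g, H, s):
    """T_2(x, s) = g(x)^T s + s^T H(x) s / 2, cf. (17)-(18) with p = 2."""
    return g @ s + 0.5 * s @ (H @ s)

def regularized_model2(g, H, s, rho):
    """T_2(x, s) + rho * ||s||^3, the p = 2 objective of subproblem (22)."""
    return taylor2(g, H, s) + rho * np.linalg.norm(s) ** 3
```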
Algorithm 3.1. Assume that $p \in \{1, 2, 3, \ldots\}$, $\alpha > 0$, $\rho_{\min} > 0$, $\tau_2 \ge \tau_1 > 1$, $\theta > 0$, and $x^0 \in D$ are given. Initialize $k \leftarrow 0$.

Step 1. Set $\rho \leftarrow 0$.
Step 2. Compute $s_{\rm trial} \in \mathbb{R}^n$ such that

(19) $x^k + s_{\rm trial} \in D$

and

(20) $T_p(x^k, s_{\rm trial}) + \rho \|s_{\rm trial}\|^{p+1} \le 0$.

Step 3. Test the condition

(21) $f(x^k + s_{\rm trial}) \le f(x^k) - \alpha \|s_{\rm trial}\|^{p+1}$.

If (21) holds, define $s^k = s_{\rm trial}$, $\rho_k = \rho$, and $x^{k+1} = x^k + s^k$, set $k \leftarrow k + 1$, and go to Step 1. Otherwise, update $\rho \leftarrow \max\{\rho_{\min}, \tau \rho\}$ with $\tau \in [\tau_1, \tau_2]$, and go to Step 2.
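The loop structure of Algorithm 3.1 is simple once the subproblem solver is abstracted away. The following Python sketch is ours; `solve_subproblem(x, rho)` stands for any oracle returning a step that satisfies (19) and (20) (in practice, an approximate solution of the subproblem (22) below), and the stopping criterion is left out, as in the paper.

```python
import numpy as np

def algorithm_3_1(f, solve_subproblem, x0, p=2, alpha=1e-8,
                  rho_min=1e-8, tau1=2.0, tau2=3.0, max_iter=100):
    """Sketch of Algorithm 3.1.  solve_subproblem(x, rho) must return s with
    x + s in D and T_p(x, s) + rho * ||s||^(p+1) <= 0, i.e. (19)-(20)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        rho = 0.0                                  # Step 1
        while True:
            s = solve_subproblem(x, rho)           # Step 2
            if f(x + s) <= f(x) - alpha * np.linalg.norm(s) ** (p + 1):
                x = x + s                          # Step 3: (21) holds, accept
                break
            tau = 0.5 * (tau1 + tau2)              # any tau in [tau1, tau2]
            rho = max(rho_min, tau * rho)          # increase regularization
    return x
```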
The trial increment $s_{\rm trial}$ is intended to be an approximate solution to the subproblem

(22) Minimize $T_p(x^k, s) + \rho \|s\|^{p+1}$ subject to $x^k + s \in D$.

Condition (20) is the minimal condition that should be imposed on the approximate solution to (22) in order to obtain meaningful results, although an obvious choice that satisfies (19), (20), and (21) is $s_{\rm trial} = 0$, which is not useful at all and will be ruled out later.
Lemma 3.1. Assume that the sequence $\{x^k\}$ is generated by Algorithm 3.1. If $\{f(x^k)\}$ is bounded below then

$\lim_{k \to \infty} \|s^k\| = 0$.

Proof. The proof follows straightforwardly from the hypotheses of the lemma and (21).
The following assumption coincides with Assumption A1 in the case $p = 2$.

Assumption A3. There exists $L > 0$ such that, for all $x^k$ computed by Algorithm 3.1 and all $s_{\rm trial}$ considered in (21), we have

(23) $f(x^k + s_{\rm trial}) \le f(x^k) + T_p(x^k, s_{\rm trial}) + L \|s_{\rm trial}\|^{p+1}$.

Lemma 3.2. Suppose that Assumption A3 holds. If the regularization parameter $\rho$ in (20) satisfies $\rho \ge L + \alpha$ then a trial step $s_{\rm trial}$ that satisfies (20) also satisfies the sufficient descent condition (21).

Proof. By Assumption A3,

$f(x^k + s_{\rm trial}) \le f(x^k) + T_p(x^k, s_{\rm trial}) + L \|s_{\rm trial}\|^{p+1} = f(x^k) + T_p(x^k, s_{\rm trial}) + \rho \|s_{\rm trial}\|^{p+1} - \rho \|s_{\rm trial}\|^{p+1} + L \|s_{\rm trial}\|^{p+1}$.

Then, by (20),

$f(x^k + s_{\rm trial}) \le f(x^k) - \rho \|s_{\rm trial}\|^{p+1} + L \|s_{\rm trial}\|^{p+1}$.

Therefore, if $\rho \ge L + \alpha$, (21) holds.
Assumption A4. Assumption A3 holds and, for all $x^k$ and $s^k$ computed by Algorithm 3.1, we have that

(24) $\|g(x^k + s^k) - \nabla_s T_p(x^k, s^k)\| \le L \|s^k\|^p$.

Assumptions A3 and A4 are satisfied if the $p$th derivatives of $f$ satisfy a Lipschitz condition (see [6]). As in the case of Assumptions A1 and A2, observe that Assumption A3 must hold for every trial increment $s_{\rm trial}$, whereas Assumption A4 must hold only at the accepted increments $s^k$.
Assumption A5. The set $D$ is defined by

(25) $D = \{x \in \mathbb{R}^n \mid h_i(x) \le 0 \text{ for all } i = 1, \ldots, q\}$,

where the functions $h_i$ are continuously differentiable. Moreover, every feasible point of (25) satisfies a constraint qualification.

Assumption A5 implies that, for any function $\varphi : \mathbb{R}^n \to \mathbb{R}$, if $z$ is a minimizer of $\varphi$ subject to $z \in D$, then the associated KKT conditions hold at $z$. For the sake of simplicity, and without loss of generality, we formulated the definition of $D$ only in terms of inequality constraints. Implicitly, we assume that if equality constraints are present, they are expressed as pairs of inequalities.
For all $x \in D$, we define

(26) $L(D, x) = \{z \in \mathbb{R}^n \mid h_i(x) + h_i'(x)(z - x) \le 0 \text{ for all } i = 1, \ldots, q\}$.

We say that $L(D, x)$ is the linear approximation of $D$ around $x$. Given $x \in D$ and a function $\varphi$, we denote $\nabla_D \varphi(x) = P_{L(D,x)}(x - \nabla \varphi(x)) - x$. Direct calculations show that $x$ satisfies the KKT conditions of the problem of minimizing $\varphi$ on $D$ if, and only if, $\nabla_D \varphi(x) = 0$. If there exists $x^k \to x^*$ such that $\nabla_D \varphi(x^k) \to 0$, we say that $x^*$ satisfies the Approximate Gradient Projection (AGP) sequential optimality condition [2, 29].
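In the particular case of bound constraints, the $h_i$ are affine, so $L(D, x) = D$ and $\nabla_D \varphi(x)$ is computable by a single clipping. A minimal sketch (ours) of this stationarity residual; for general $h_i$, evaluating the projection onto $L(D, x)$ would instead require solving a quadratic program.

```python
import numpy as np

def nabla_D(grad_phi_x, x, lower, upper):
    """Stationarity residual nabla_D phi(x) = P_{L(D,x)}(x - grad phi(x)) - x
    for bound constraints lower <= x <= upper, where L(D, x) = D and the
    Euclidean projection is componentwise clipping.  A small norm of the
    returned vector certifies an approximate KKT point."""
    return np.clip(x - grad_phi_x, lower, upper) - x
```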
Assumption A6. There exists $\theta > 0$ such that, for all $k \in \mathbb{N}$, the approximate solutions $s^k$ to the subproblem (22) satisfy

(27) $\left\| \nabla_D \left( T_p(x^k, x - x^k) + \rho \|x - x^k\|^{p+1} \right) \big|_{x = x^k + s^k} \right\| \le \theta \|s^k\|^p$.

Assumption A6 states that the accepted increment $s^k$ satisfies, approximately, an AGP optimality condition. If the constraints satisfy some constraint qualification, every minimizer of (22) satisfies the AGP condition and, consequently, also fulfills (27). Therefore, Assumption A6 states the degree of accuracy with which one wishes to solve the subproblems (and this is the assumption that rules out the possibility of taking $s^k = s_{\rm trial} = 0$). Note that the degree of accuracy (27) is required not for all the subproblems but only for the ones that, ultimately, define algorithmic progress.
Lemma 3.3. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Then, at each iteration $k$, we have that

(28) $\left\| \nabla_D f(x^k + s^k) \right\| \le \left( L + \tau_2 (L + \alpha)(p + 1) + \theta \right) \|s^k\|^p$.

Proof. By the definition of $\nabla_D$, we have that

(29) $\left\| \nabla_D f(x^k + s^k) - \nabla_D T_p(x^k, x - x^k)\big|_{x = x^k + s^k} \right\|$
$= \left\| P_{L(D, x^k+s^k)}\!\left( (x^k + s^k) - g(x^k + s^k) \right) - P_{L(D, x^k+s^k)}\!\left( (x^k + s^k) - \nabla T_p(x^k, x - x^k)\big|_{x = x^k + s^k} \right) \right\|$
$\le \left\| \left( (x^k + s^k) - g(x^k + s^k) \right) - \left( (x^k + s^k) - \nabla T_p(x^k, x - x^k)\big|_{x = x^k + s^k} \right) \right\|$
$= \left\| g(x^k + s^k) - \nabla T_p(x^k, x - x^k)\big|_{x = x^k + s^k} \right\| \le L \|s^k\|^p$,

where the first inequality follows from the nonexpansiveness of projections and the last inequality follows from Assumption A4. This means that

$\|\nabla_D f(x^k + s^k)\| \le L \|s^k\|^p + \left\| \nabla_D T_p(x^k, x - x^k)\big|_{x = x^k + s^k} \right\|$
$\le (L + \theta)\|s^k\|^p + \rho \left\| \nabla_D \left( \|x - x^k\|^{p+1} \right)\big|_{x = x^k + s^k} \right\|$
$= (L + \theta)\|s^k\|^p + \rho \left\| P_{L(D, x^k+s^k)}\!\left( (x^k + s^k) - \nabla\left( \|x - x^k\|^{p+1} \right)\big|_{x = x^k + s^k} \right) - (x^k + s^k) \right\|$
$\le (L + \theta)\|s^k\|^p + \rho \left\| (x^k + s^k) - \nabla\left( \|x - x^k\|^{p+1} \right)\big|_{x = x^k + s^k} - (x^k + s^k) \right\|$
$= (L + \theta)\|s^k\|^p + \rho \left\| \nabla\left( \|x - x^k\|^{p+1} \right)\big|_{x = x^k + s^k} \right\| = \left( L + \theta + \rho(p+1) \right)\|s^k\|^p$,

where the first inequality follows from (29), the second inequality follows from Assumption A6, and the third inequality follows from the fact that $x^k + s^k$ belongs to $L(D, x^k + s^k)$ and, therefore, for any $z \in \mathbb{R}^n$, $P_{L(D, x^k+s^k)}(z)$ is closer to $x^k + s^k$ than $z$ itself. Finally, by Lemma 3.2, we have that $\rho \le \tau_2(L + \alpha)$, which implies the desired result.
Lemma 3.4. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Then, at each iteration $k$, we have that

(30) $f(x^{k+1}) \le f(x^k) - \alpha \left( \dfrac{\|\nabla_D f(x^{k+1})\|}{L + \tau_2(L+\alpha)(p+1) + \theta} \right)^{(p+1)/p}$.

Proof. The result follows straightforwardly from Lemma 3.3 and (21).

Theorem 3.1. Suppose that Assumptions A4, A5, and A6 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Let $f_{\rm target} \in \mathbb{R}$ and $\varepsilon_g > 0$ be arbitrary. Then, the number of iterations $k$ such that

$f(x^k) > f_{\rm target}$ and $\|\nabla_D f(x^{k+1})\| \ge \varepsilon_g$

is not greater than

(31) $\left\lfloor \dfrac{f(x^0) - f_{\rm target}}{\alpha} \left( \dfrac{\varepsilon_g}{L + \tau_2(L+\alpha)(p+1) + \theta} \right)^{-(p+1)/p} \right\rfloor$

and the number of functional evaluations per iteration is bounded above by

$\log_{\tau_1}\!\left( \dfrac{\tau_2(L+\alpha)}{\rho_{\min}} \right) + 1$.

Finally, if $\{f(x^k)\}$ is bounded below,

(32) $\lim_{k \to \infty} \|\nabla_D f(x^{k+1})\| = 0$.

Proof. The maximum number of iterations (31) follows from Lemma 3.4. The bound on the number of functional evaluations per iteration follows from the updating rule for $\rho$ and Lemma 3.2, while (32) follows from Lemma 3.4 and the boundedness below of $\{f(x^k)\}$.
Assumption A7. For all $k \in \mathbb{N}$, $s^k$ is a global minimizer of $T_p(x^k, s) + \rho_k \|s\|^{p+1}$ subject to $x^k + s \in D$.

If the constraints $h(x) \le 0$ are simple enough, Assumption A7 implies Assumption A6.

Theorem 3.2. Suppose that Assumptions A4, A5, A6, and A7 hold and that the sequence $\{x^k\}$ is generated by Algorithm 3.1. Let $x^*$ be a limit point of $\{x^k\}$. Then, there exists $\rho_* \in [0, \tau_2(L+\alpha)]$ such that $s = 0$ is a global minimizer of $T_p(x^*, s) + \rho_* \|s\|^{p+1}$ subject to $x^* + s \in D$.

Proof. By Lemma 3.2, $\rho_k \in [0, \tau_2(L+\alpha)]$ for all $k \in \mathbb{N}$. Therefore, there exist $\rho_* \in [0, \tau_2(L+\alpha)]$ and an infinite sequence of indices $K$ such that

$\lim_{k \in K} \rho_k = \rho_*$ and $\lim_{k \in K} x^k = x^*$.

Let $s \in \mathbb{R}^n$ such that $x^k + s \in D$ be arbitrary. By Assumption A7, we have that

$T_p(x^k, s^k) + \rho_k \|s^k\|^{p+1} \le T_p(x^k, s) + \rho_k \|s\|^{p+1}$.

Taking limits in this inequality for $k \in K$ and using the continuity of the derivatives of $f$ up to order $p$ and the fact that, by Lemma 3.1, $s^k \to 0$ and $\rho_k \to \rho_*$, we obtain

$T_p(x^*, 0) + \rho_* \|0\|^{p+1} \le T_p(x^*, s) + \rho_* \|s\|^{p+1}$.

Since $s$ satisfying $x^* + s \in D$ is arbitrary, we obtain the desired result.
Recall that, if a point $x^*$ is an unconstrained local minimizer of a function $f$, then such a point is $p$-order stationary in the following sense. A point $x \in \mathbb{R}^n$ is said to be $q$-order stationary (with $1 \le q \le p$) if it is $(q-1)$-order stationary and, for all $v \in \mathbb{R}^n$ such that $P_0(x, v) = \cdots = P_{q-1}(x, v) = 0$, one has that $P_q(x, v) \ge 0$. By convention, we say that every point $x \in \mathbb{R}^n$ is 0-order stationary and that $P_0(x, v) = 0$.

Corollary 3.1. Assume that $q = 0$ (that is, $D = \mathbb{R}^n$ and the problem is unconstrained). Under the hypotheses of Theorem 3.2, the limit point $x^*$ is $m$-order stationary for all $m \le p$.

Proof. By Theorem 3.2, for each $m \le p$, $s = 0$ is an $m$-order stationary point of the function $T_p(x^*, s) + \rho_* \|s\|^{p+1}$. But the conditions of $m$-order stationarity of this function at $s = 0$ are the same as those of $f$ at $x^*$, since all their derivatives up to order $p$ coincide.
An $m$-order stationarity condition for the local minimization of $\varphi(x)$ subject to $x \in D$ is a property that involves the derivatives of $\varphi$ up to order $m$, as well as properties of $D$, and that must be satisfied by any local minimizer of $\varphi$. Since, for all $m \le p$, the $m$-order partial derivatives of $f$ at $x^*$ coincide with those of $T_p(x^*, s) + \rho_* \|s\|^{p+1}$ at $s = 0$, we may extend Corollary 3.1 to the constrained optimization case as follows.

Corollary 3.2. Under the hypotheses of Theorem 3.2, the limit point $x^*$ is $m$-order stationary for all $m \le p$.
4. Linearly-constrained optimization. In this section, we consider the problem

(33) Minimize $f(x)$ subject to $x \in D$,

where $D \subset \mathbb{R}^n$ is a polytope defined by

$D = \{x \in \mathbb{R}^n \mid (a^i)^T x \le b_i \text{ for all } i = 1, \ldots, q\}$.