Part IV - Algorithms
Gabriel Haeser
Department of Applied Mathematics, Institute of Mathematics and Statistics
University of São Paulo, São Paulo, SP, Brazil
Santiago de Compostela, Spain, October 28-31, 2014
Part I - Introduction to nonlinear optimization
  Examples and historical notes
  First and second order optimality conditions
  Penalty methods
  Interior point methods
Part II - Optimality Conditions
  Algorithmic proof of the Karush-Kuhn-Tucker conditions
  Sequential optimality conditions
  Algorithmic discussion
Part III - Constraint Qualifications
  Geometric interpretation
  First and second order constraint qualifications
Part IV - Algorithms
  Augmented Lagrangian methods
  Inexact Restoration algorithms
  Dual methods
Choose a sequence {ρk} with ρk → +∞ and for each k solve the subproblem

Minimize f(x) + ρk P(x), subject to x ∈ Ω̄,

obtaining the global solution xk, if it exists.

Penalty function: P(x) = 0 if h(x) = 0 and g(x) ≤ 0; P(x) > 0 otherwise.
Theorem: If {xk} is well defined, then every limit point of {xk} is a global solution of: Minimize P(x), subject to x ∈ Ω̄.

Theorem: If {xk} is well defined and there exists a point in Ω̄ where the function P vanishes (the feasible region is nonempty), then every limit point of {xk} is a global solution of

Minimize f(x), subject to h(x) = 0, g(x) ≤ 0, x ∈ Ω̄.
This method is of little practical relevance for two main reasons:
The parameter ρk must truly tend to infinity in order to reach a solution, which makes the subproblems increasingly hard to solve (ill-conditioning).
Finding global solutions, even for the subproblems, is almost impossible. In practice we can only find (approximate) stationary points.
Squared ℓ2-norm penalty:
P(x) = ‖h(x)‖₂² + ‖max{0, g(x)}‖₂²
ℓ∞-norm penalty:
P1(x) = max{0, |h1(x)|, . . . , |hm(x)|, g1(x), . . . , gp(x)}
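To make the scheme concrete, here is a minimal numeric sketch of the external penalty method with the squared penalty on a toy problem chosen for illustration: minimize (1/2)(x₁² + x₂²) subject to x₁ = 1, with solution x∗ = (1, 0). The gradient-descent inner solver and the ρ schedule are illustrative assumptions, not part of the method.

```python
import numpy as np

# External penalty method sketch on: minimize (1/2)(x1^2 + x2^2)
# subject to x1 = 1, with P(x) = (x1 - 1)^2 and solution x* = (1, 0).
def grad_penalized(x, rho):
    # gradient of f(x) + rho * P(x)
    return np.array([x[0] + 2.0 * rho * (x[0] - 1.0), x[1]])

x = np.zeros(2)
for rho in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    step = 1.0 / (1.0 + 2.0 * rho)   # 1/L for this quadratic subproblem
    for _ in range(20000):           # plain gradient descent on the subproblem
        x = x - step * grad_penalized(x, rho)

print(x)  # subproblem minimizer is x1 = 2*rho/(1 + 2*rho): feasible only as rho grows
```

Note how the subproblem minimizer x1 = 2ρ/(1 + 2ρ) approaches feasibility only as ρ → ∞, illustrating the ill-conditioning remark above.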
Theorem (exact penalty): There exists ρ̄ > 0 such that, for all ρ > ρ̄, the global solution of: Minimize f(x) + ρP1(x), subject to x ∈ Ω̄, is the global solution of the original problem.
Consider the problem

Minimize f(x), subject to h(x) = 0,

and suppose that at x∗ there exists a Lagrange multiplier λ∗ fulfilling the second order sufficient condition

dᵀ(∇²f(x∗) + Σ_{i=1}^m λ∗i ∇²hi(x∗))d > 0, for all d ≠ 0 with ∇h(x∗)ᵀd = 0.
Let us prove that x∗ is a solution. In particular, consider the equivalent problem

Minimize f(x) + (ρ/2)‖h(x)‖₂², subject to h(x) = 0,

and the "augmented Lagrangian function", defined as the usual Lagrangian function of this problem:

Lρ(x, λ) = f(x) + h(x)ᵀλ + (ρ/2)‖h(x)‖² = f(x) + (ρ/2)‖h(x) + λ/ρ‖² + c,

for the constant c = −‖λ‖²/(2ρ).
Let us prove that x∗ is a local minimizer of Lρ(x, λ∗) for sufficiently large ρ.
Lemma: Let P and Q be symmetric matrices with Q positive semidefinite and xᵀPx > 0 whenever x ≠ 0 and xᵀQx = 0. Then P + ρQ is positive definite for sufficiently large ρ.
Proof: Suppose not. Then for each k there is xk with ‖xk‖ = 1 and xkᵀ(P + kQ)xk ≤ 0; passing to a subsequence, xk → x. We have xkᵀPxk + k·xkᵀQxk ≤ 0 with xkᵀQxk ≥ 0, so taking limits gives xᵀQx = 0 (otherwise the left-hand side would tend to +∞). Hence xᵀPx > 0 by hypothesis. But xkᵀPxk ≤ −k·xkᵀQxk ≤ 0, so in the limit xᵀPx ≤ 0, a contradiction.
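A quick numeric illustration of the Lemma, with matrices chosen here for the purpose: P is indefinite but positive definite on the null space of Q, and P + ρQ becomes positive definite once ρ is large enough (here ρ > 5).

```python
import numpy as np

# P is indefinite, Q is PSD, and x^T P x > 0 on {x != 0 : x^T Q x = 0} = span{(1,0)}.
P = np.array([[1.0, 2.0],
              [2.0, -1.0]])
Q = np.array([[0.0, 0.0],
              [0.0, 1.0]])

for rho in [1.0, 5.0, 10.0]:
    lam_min = np.linalg.eigvalsh(P + rho * Q).min()
    print(rho, lam_min)  # turns positive once rho > 5, since det(P + rho*Q) = rho - 5
```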
Evaluating derivatives at (x∗, λ∗):

Lρ(x, λ) = f(x) + h(x)ᵀλ + (ρ/2)‖h(x)‖²

∇xLρ(x∗, λ∗) = ∇f(x∗) + Σ_{i=1}^m λ∗i ∇hi(x∗) + ρ Σ_{i=1}^m hi(x∗)∇hi(x∗)
= ∇f(x∗) + Σ_{i=1}^m (λ∗i + ρhi(x∗))∇hi(x∗) = ∇xL(x∗, λ∗) = 0,

since h(x∗) = 0.
∇²xxLρ(x∗, λ∗) = ∇²f(x∗) + Σ_{i=1}^m (λ∗i + ρhi(x∗))∇²hi(x∗) + ρ∇h(x∗)∇h(x∗)ᵀ
= ∇²xxL(x∗, λ∗) + ρ∇h(x∗)∇h(x∗)ᵀ.

Since dᵀ∇²xxL(x∗, λ∗)d > 0 for all d ≠ 0 with ∇h(x∗)ᵀd = 0, that is, with dᵀ∇h(x∗)∇h(x∗)ᵀd = 0, the Lemma (with P = ∇²xxL(x∗, λ∗) and Q = ∇h(x∗)∇h(x∗)ᵀ) shows that ∇²xxLρ(x∗, λ∗) is positive definite for sufficiently large ρ.
Hence x∗ is a local minimizer of Lρ(x, λ∗) = f(x) + h(x)ᵀλ∗ + (ρ/2)‖h(x)‖². The result follows since Lρ(x, λ∗) = f(x) for feasible x.
This proof suggests minimizing the augmented Lagrangian while successively increasing the penalty parameter. The difficulty is that we do not know the exact Lagrange multiplier λ∗; but if we know an approximation to λ∗, we can get an approximation to x∗.
Example: Minimize (1/2)(x₁² + x₂²), subject to x₁ = 1.
x∗ = (1, 0); from (1, 0) + λ∗(1, 0) = 0 we get λ∗ = −1.

Lρ(x, λ) = (1/2)(x₁² + x₂²) + λ(x₁ − 1) + (ρ/2)(x₁ − 1)²

∇xLρ(x, λ) = (x₁ + λ + ρ(x₁ − 1), x₂) = (0, 0), so x₁ = (ρ − λ)/(ρ + 1), x₂ = 0.

If λ → λ∗, then x → x∗ (with ρ fixed).
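The closed-form minimizer x₁ = (ρ − λ)/(ρ + 1) from the example can be checked numerically: with the exact multiplier λ∗ = −1, minimizing Lρ(·, λ∗) recovers x∗ for any fixed ρ, with no need to drive ρ to infinity.

```python
# Minimizer of L_rho(., lambda) for the example above (closed form from the slide).
def x1_min(lam, rho):
    return (rho - lam) / (rho + 1.0)

rho = 10.0                            # penalty parameter held fixed
for lam in [0.0, -0.5, -0.9, -1.0]:   # lambda approaching lambda* = -1
    print(lam, x1_min(lam, rho))      # x1 approaches x1* = 1
```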
For inequality constraints: Minimize f(x), subject to g(x) ≤ 0. Define hj(x, z) := gj(x) + zj² = 0; then the subproblem is

Minimize Lρ(x, z, µ) = f(x) + (ρ/2) Σ_{j=1}^p (gj(x) + zj² + µj/ρ)².

Minimizing over z gives z∗j = 0 if gj(x) + µj/ρ ≥ 0, and z∗j = √(−gj(x) − µj/ρ) otherwise. Hence we may consider

Lρ(x, µ) = f(x) + (ρ/2) Σ_{j=1}^p max{0, gj(x) + µj/ρ}².
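The elimination of the slack zj above can be verified numerically: minimizing (ρ/2)(g + z² + µ/ρ)² over z indeed gives (ρ/2)max{0, g + µ/ρ}². The values of g, µ, and ρ below are arbitrary test data.

```python
import numpy as np

def with_slack(gval, mu, rho):
    # minimize (rho/2) * (g + z^2 + mu/rho)^2 over z, on a fine grid
    z = np.linspace(0.0, 5.0, 500001)
    return (rho / 2.0) * ((gval + z**2 + mu / rho)**2).min()

def max_form(gval, mu, rho):
    # the slack-free form used in the slide
    return (rho / 2.0) * max(0.0, gval + mu / rho)**2

for gval in [-3.0, -0.5, 0.0, 2.0]:
    print(gval, with_slack(gval, 1.0, 2.0), max_form(gval, 1.0, 2.0))
```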
Given a penalty parameter ρk and approximations λk and µk to the Lagrange multipliers, we find xk minimizing the augmented Lagrangian

Lρk(x, λk, µk) = f(x) + (ρk/2) [ Σ_{i=1}^m (hi(x) + λki/ρk)² + Σ_{j=1}^p max{0, gj(x) + µkj/ρk}² ].
Since ∇xLρk(xk, λk, µk) = 0, we get

∇f(xk) + Σ_{i=1}^m (λki + ρk hi(xk))∇hi(xk) + Σ_{j=1}^p max{0, µkj + ρk gj(xk)}∇gj(xk) = 0.
This suggests the updates λk+1,i = λki + ρk hi(xk) and µk+1,j = max{0, µkj + ρk gj(xk)} ≥ 0.
Note that if gj(x∗) < 0 and {µkj} is bounded, then for ρk sufficiently large, µk+1,j = 0 and complementarity is fulfilled.
Since we are using approximations of the Lagrange multipliers, if a "good" iteration is performed we can avoid increasing the penalty parameter and only update the multipliers. A "good" iteration should sufficiently decrease infeasibility and a complementarity measure.
Consider Vkj = max{gj(xk), −µkj/ρk} and note that Vkj = 0 if and only if gj(xk) ≤ 0 and µkj = 0 whenever gj(xk) < 0.
Initialize x0, λ1, µ1 ≥ 0, ρ1 > 0 and parameters σ ∈ (0, 1) and γ > 1. For k = 1, 2, . . .
1. Find xk, a solution of: Minimize Lρk(x, λk, µk).
2. Compute Vkj = max{gj(xk), −µkj/ρk}, j = 1, . . . , p.
3. If max{‖h(xk)‖, ‖V(xk)‖} ≤ σ max{‖h(xk−1)‖, ‖V(xk−1)‖}, define ρk+1 = ρk; otherwise, define ρk+1 = γρk.
4. Define λk+1,i = P[λmin, λmax](λki + ρk hi(xk)) and µk+1,j = P[0, µmax](µkj + ρk gj(xk)), i = 1, . . . , m, j = 1, . . . , p, where P[a,b] denotes the projection onto the interval [a, b].
This framework allows one to replace
1. Find xk, a solution of: Minimize Lρk(x, λk, µk).
by
1. Find xk, an approximate stationary point of: Minimize Lρk(x, λk, µk), that is, ‖∇xLρk(xk, λk, µk)‖ ≤ εk,
with εk → 0+.
Theorem: The limit points of a sequence generated by the algorithm are stationary points of the problem of minimizing the infeasibility measure ‖h(x)‖₂² + ‖max{0, g(x)}‖₂². If a limit point is feasible, it is an Approximate-KKT point (which means that under a weak constraint qualification, the KKT condition holds). Moreover, under the Mangasarian-Fromovitz constraint qualification, {λk} and {µk} converge to true Lagrange multipliers.
Remark 1: Requiring approximate second-order stationarity in the subproblems, we get convergence (under a weak constraint qualification) to a second-order stationary point.
Remark 2: Under the linear independence constraint qualification and the second-order sufficient condition, the penalty parameter remains bounded.
Minimize f(x), subject to h(x) = 0, x ∈ Ω,

where f: Rⁿ → R and h: Rⁿ → Rᵐ are smooth functions and Ω is a bounded polytope.
Restoration phase: Given a point xk, obtain yk closer to the feasible region. This should be done in such a way that the objective function at yk is not radically worse than at xk.
Optimization phase: Obtain a trial point zk with a better objective function value, without losing much of the feasibility of yk.
Globalization: If the trial point is not good enough, obtain a new one closer to yk.
Dealing with feasibility and optimality at the same time may have some drawbacks:
The subproblems may be infeasible.
The optimization model for the subproblem is based on a point away from the feasible set.
One of the goals may be much easier than the other.
One (or both) of the goals may have a structure that can be exploited.
Electronic structure calculation (Martínez et al.)
Optimal control (Kaya et al.)
Derivative-free optimization (Bueno et al.)
Bilevel optimization (Andreani et al.)
Multiobjective problems
Parameters: r ∈ [0, 1), β > 0, τ > 0, γ > 0, γ̄ > 0.
Step 0: Initialization: Choose x0 ∈ Ω and θ−1 ∈ (0, 1). Define k = 0.
Step 1: Restoration phase: Compute yk ∈ Ω such that
‖h(yk)‖ ≤ r‖h(xk)‖ and f(yk) − f(xk) ≤ β‖h(xk)‖.
Step 2: Optimality phase: Compute the direction dk = Pk(−∇f(yk)), where Pk is the Euclidean projection onto
Tk = {d ∈ Rⁿ | yk + d ∈ Ω, ∇h(yk)ᵀd = 0}.
Step 3.1: Globalization, penalty parameter: Compute θk as the supremum of the values of θ ∈ [0, θk−1] such that
Φ(yk, θ) − Φ(xk, θ) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖),
where Φ(x, θ) = θf(x) + (1 − θ)‖h(x)‖.
Step 3.2: Globalization, line search: Compute tk ∈ [0, 1] as large as possible such that
Φ(yk + tk dk, θk) − Φ(xk, θk) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖) and f(yk + tk dk) < f(yk).
Step 4: Update: Define xk+1 = yk + tk dk, set k := k + 1, and repeat.
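A compact numeric sketch of these steps on a toy problem (the problem, the exact restoration by normalization, the closed-form tangent projection, and the omission of the extra test f(yk + tk dk) < f(yk) are all illustrative simplifications): minimize x₁ on the unit circle, whose solution is x∗ = (−1, 0). Ω is taken large enough to be inactive, so Pk reduces to projection onto the tangent space of the constraint.

```python
import numpy as np

# Toy problem: minimize f(x) = x1 subject to h(x) = x1^2 + x2^2 - 1 = 0,
# with solution x* = (-1, 0).
def f(x):  return x[0]
def df(x): return np.array([1.0, 0.0])
def h(x):  return x[0]**2 + x[1]**2 - 1.0
def dh(x): return 2.0 * x

def phi(x, theta):  # merit function
    return theta * f(x) + (1.0 - theta) * abs(h(x))

r, theta = 0.5, 0.99
x = np.array([0.0, 2.0])
for k in range(60):
    y = x / np.linalg.norm(x)                              # restoration: h(y) = 0
    n = dh(y)
    d = -df(y) + (np.dot(n, df(y)) / np.dot(n, n)) * n     # projected direction
    # penalty parameter: largest theta in [0, theta_prev] with
    #   phi(y, th) - phi(x, th) <= (1 - r)/2 * (|h(y)| - |h(x)|)
    A = f(y) - f(x)
    B = abs(h(y)) - abs(h(x))
    if A - B > 0:
        theta = min(theta, max(0.0, 0.5 * (1.0 + r) * (-B) / (A - B)))
    # backtracking line search on the merit function
    t = 1.0
    while t > 1e-12 and (phi(y + t * d, theta) - phi(x, theta)
                         > 0.5 * (1.0 - r) * B + 1e-15):
        t *= 0.5
    x = y + t * d

print(x)  # close to x* = (-1, 0)
```

On this example the penalty parameter θk settles near 1/2 and stays bounded away from zero, in line with the convergence results that follow.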
The penalty parameter is bounded away from zero (θk ≥ θ̄ > 0).
The stepsize tk is accepted after a finite number of attempts (tk ≥ t̄ > 0).
The series Σ_{k=0}^∞ ‖h(xk)‖ is convergent.
The sequence {dk} converges to zero.
All limit points of {xk} satisfy the L-AGP sequential optimality condition. In particular, the KKT condition holds under the Constant Positive Generators (CPG) constraint qualification.
Step 2: Optimality phase: Compute the direction dk as the solution of
Minimize (1/2)dᵀHk d + ∇f(yk)ᵀd, subject to ∇h(yk)ᵀd = 0, yk + d ∈ Ω,
where {Hk} is uniformly positive definite.
Step 3.1: Globalization, penalty parameter: Compute θk as the supremum of the values of θ ∈ [0, θk−1] such that
Φ(yk, λk, θ) − Φ(xk, λk−1, θ) ≤ (1/2)(1 − sk)(‖h(yk)‖ − ‖h(xk)‖),
where Φ(x, λ, θ) = θ(f(x) + λᵀh(x)) + (1 − θ)‖h(x)‖.
Step 3.2: Globalization, line search: Compute tk ∈ [0, 1] as large as possible such that
Φ(yk + tk dk, λk, θk) − Φ(xk, λk−1, θk) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖) and L(yk + tk dk, λk) < L(yk, λk−1).
The original Fischer-Friedlander algorithm is recovered if Hk is the identity matrix, sk = r, and λk ≡ 0.
In our implementation:
Hk is the Hessian of the Lagrangian function L(x, λk−1) = f(x) + (λk−1)ᵀh(x).
sk = 0 (larger penalty parameter).
λk is the Lagrange multiplier obtained from the optimization-phase subproblem.
We use Fletcher's filterSD.f and qlcpd.f for the restoration and optimization phases, respectively.
The penalty parameter is bounded away from zero (θk ≥ θ̄ > 0).
The stepsize tk is accepted after a finite number of attempts (tk ≥ t̄ > 0).
The series Σ_{k=0}^∞ ‖h(xk)‖ is convergent.
The sequence {dk} converges to zero.
All limit points of {xk} satisfy the L-AGP sequential optimality condition. In particular, the KKT condition holds under the Constant Positive Generators (CPG) constraint qualification.
Under MFCQ, a suitable subsequence of {λk} converges to a Lagrange multiplier.
We consider the mean-variance problem in portfolio optimization: an investor aims to maximize return and minimize risk when investing in n assets:

min −cᵀx and min xᵀQx, subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Under certain conditions, Pareto points are solutions of

min_x α(−cᵀx) + (1 − α)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0,

for some α ≥ 0.
min_{α,x} ‖x − xd‖², subject to α ≥ 0 and x a solution of:

min_x α(−cᵀx) + (1 − α)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Classical approach: Reformulate the "lower level Pareto problem" by its KKT conditions.
Restoration phase: Given (xk, αk), the restored point (yk, αk) is given by

min_x αk(−cᵀx) + (1 − αk)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Optimization phase: We write the KKT conditions of the problem above as H(x, α, λ) = 0 and solve the linearized subproblem

min_{dx, dα, dλ} ‖yk + dx − xd‖², subject to ∇H(yk, αk, λk)ᵀ(dx, dα, dλ) = 0, a ≤ (dx, dα, dλ) ≤ b.

We obtain xk+1 = yk + t dx, αk+1 = αk + t dα, λk+1 = λk + t dλ.
Test 1
In our simulation we used seven shares from the London exchange market plus a risk-free asset to generate problems with n ∈ {8, 100, 1000}. We generated scenarios using the historical data and computed the expected return c and covariance matrix Q.
Problems could be solved in under 1.5 seconds, and no major difference could be detected between the inexact restoration and the classical approach.
Test 2
We compared the formulations on 300 problems with two objectives and randomly chosen fourth-degree polynomials.
With the IR formulation we found a KKT point for the Pareto lower level problem in all the problems.
Number of problems in which the KKT reformulation did not find a KKT point for the Pareto lower level problem:

n=1: 13    n=10: 84    n=20: 100

Performance profile for 120 two-objective problems from the Moré-Garbow-Hillstrom unconstrained optimization collection.
Minimize f(x), subject to g(x) ≤ 0, x ∈ X,

where f∗ = min_{x∈X, g(x)≤0} f(x) is assumed to be finite.

Lagrangian: L(x, µ) = f(x) + µᵀg(x).

f∗ = min_{x∈X} max_{µ≥0} L(x, µ), since max_{µ≥0} L(x, µ) equals f(x) if g(x) ≤ 0 and +∞ otherwise.

The dual problem:

q∗ = max_{µ≥0} min_{x∈X} L(x, µ) = max_{µ≥0} q(µ),

where q(µ) = min_{x∈X} L(x, µ) is the dual function.

Effective domain of q: Dq = {µ | q(µ) > −∞}.

Theorem: The domain Dq is convex and q is a concave function over Dq. (Hence the dual problem is convex.)
Theorem (weak duality): q∗ ≤ f∗.
Proof: For all µ ≥ 0 and x ∈ X with g(x) ≤ 0:
q(µ) = min_{t∈X} L(t, µ) ≤ f(x) + µᵀg(x) ≤ f(x).

Theorem: If q∗ = f∗, then given a primal-dual solution (x∗, µ∗), it holds that L(x∗, µ∗) = min_{x∈X} L(x, µ∗) and (µ∗)ᵀg(x∗) = 0.
Proof:
f(x∗) = q(µ∗) = min_{x∈X} L(x, µ∗) ≤ f(x∗) + (µ∗)ᵀg(x∗) ≤ f(x∗).
Example: Minimize f(x) = (1/2)(x₁² + x₂²), subject to g(x) = x₁ − 1 ≤ 0.
q(µ) = min_x (1/2)(x₁² + x₂²) + µ(x₁ − 1) = −(1/2)µ² − µ, attained at x₁ = −µ, x₂ = 0.
max_{µ≥0} q(µ) ⇒ µ∗ = 0, x∗ = (0, 0).
Note that ∇q(µ) = g(−µ, 0).

In general, let xµ attain L(xµ, µ) = min_{x∈X} L(x, µ). Then
q(µ̄) = min_{x∈X} L(x, µ̄) ≤ f(xµ) + µ̄ᵀg(xµ) = q(µ) + (µ̄ − µ)ᵀg(xµ).
Thus q(µ) may be non-differentiable, but g(xµ) acts as a gradient (subgradient) of q at µ.
Subgradient method:
µk+1 = [µk + sk g(xµk)]+, where [·]+ is the orthogonal projection onto M = {µ ≥ 0 | q(µ) > −∞}.
Theorem: If sk is sufficiently small, then ‖µk+1 − µ∗‖ < ‖µk − µ∗‖.

Cutting planes method:
µk+1 is the solution of
Maximize_{µ∈M} Qk(µ) = min_{i=0,...,k} ( q(µi) + (µ − µi)ᵀg(xµi) ).
Theorem: q(µ) ≤ Qk+1(µ) ≤ Qk(µ), and every limit point of {µk} is a dual solution.
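A minimal projected-subgradient sketch on a toy dual (problem chosen here for illustration): minimize (1/2)‖x‖² subject to 1 − x₁ ≤ 0, whose dual function is q(µ) = µ − µ²/2 with maximizer µ∗ = 1 and xµ = (µ, 0).

```python
# Projected subgradient ascent on the dual of
#   minimize (1/2)(x1^2 + x2^2)  subject to  g(x) = 1 - x1 <= 0.
def x_of_mu(mu):
    # x_mu attaining min_x L(x, mu) = (1/2)||x||^2 + mu*(1 - x1)
    return (mu, 0.0)

def g(x):
    return 1.0 - x[0]

mu, s = 0.0, 0.1                             # fixed stepsize s_k (illustrative)
for k in range(300):
    mu = max(0.0, mu + s * g(x_of_mu(mu)))   # mu_{k+1} = [mu_k + s_k g(x_mu)]_+

print(mu, x_of_mu(mu))  # mu -> mu* = 1, and x_mu -> x* = (1, 0)
```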
Example: Separable problem - Resource allocation

Minimize Σ_{i=1}^r fi(xi), subject to Σ_{i=1}^r gi(xi) ≤ 0, xi ∈ Xi.

q(µ) = min_{x∈X} Σ_{i=1}^r fi(xi) + µᵀ( Σ_{i=1}^r gi(xi) )
= Σ_{i=1}^r min_{xi∈Xi} ( fi(xi) + µᵀgi(xi) )
Example: Dual of a Linear Problem

Minimize cᵀx, subject to Ax ≥ b.

L(x, µ) = cᵀx + µᵀ(b − Ax) = µᵀb + (c − Aᵀµ)ᵀx

q(µ) = min_{x∈Rⁿ} L(x, µ) = bᵀµ if Aᵀµ = c; −∞ otherwise.

The dual problem: Maximize bᵀµ, subject to Aᵀµ = c, µ ≥ 0.
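A tiny numeric instance of this primal-dual pair (data chosen here): with A = I and c > 0, the primal min cᵀx s.t. x ≥ b has solution x∗ = b, and the only dual-feasible point is µ = c, so the two optimal values coincide.

```python
import numpy as np

# Primal: min c^T x  s.t.  A x >= b, with A = I, so x* = b (c > 0).
# Dual:   max b^T mu s.t.  A^T mu = c, mu >= 0, so mu* = c.
A = np.eye(2)
b = np.array([1.0, 2.0])
c = np.array([3.0, 4.0])

x_star, mu_star = b, c
f_star = c @ x_star     # primal optimal value
q_star = b @ mu_star    # dual optimal value
print(f_star, q_star)   # equal: strong duality, no gap
```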
1. E. G. Birgin and J. M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, SIAM, Philadelphia, 2014.
2. L. F. Bueno, G. Haeser, and J. M. Martínez, A Flexible Inexact Restoration Method for Constrained Optimization, Journal of Optimization Theory and Applications, June 2014.
3. D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.