Part IV - Algorithms
Gabriel Haeser
Department of Applied Mathematics, Institute of Mathematics and Statistics
University of São Paulo, São Paulo, SP, Brazil
Santiago de Compostela, Spain, October 28-31, 2014
Part I - Introduction to nonlinear optimization
  Examples and historical notes
  First and second order optimality conditions
  Penalty methods
  Interior point methods
Part II - Optimality Conditions
  Algorithmic proof of the Karush-Kuhn-Tucker conditions
  Sequential optimality conditions
  Algorithmic discussion
Part III - Constraint Qualifications
  Geometric interpretation
  First and second order constraint qualifications
Part IV - Algorithms
  Augmented Lagrangian methods
  Inexact Restoration algorithms
  Dual methods
Choose a sequence {ρk} with ρk → +∞ and for each k solve the subproblem

Minimize f(x) + ρk P(x), subject to x ∈ Ω̄,

obtaining the global solution xk, if it exists.

Penalty function: P(x) = 0 if h(x) = 0 and g(x) ≤ 0; P(x) > 0 otherwise.
Theorem: If {xk} is well defined, then every limit point of {xk} is a global solution of: Minimize P(x), subject to x ∈ Ω̄.

Theorem: If {xk} is well defined and there exists a point in Ω̄ where the function P vanishes (the feasible region is nonempty), then every limit point of {xk} is a global solution of

Minimize f(x), subject to h(x) = 0, g(x) ≤ 0, x ∈ Ω̄.
This method is of little practical relevance for two main reasons:
The parameter ρk must truly tend to infinity in order to reach a solution, which makes the subproblems increasingly hard to solve (ill-conditioning).
Finding global solutions, even for the subproblems, is almost impossible. In practice we can only find (approximate) stationary points.
Squared ℓ2-norm penalty:
P(x) = ‖h(x)‖₂² + ‖max{0, g(x)}‖₂²
ℓ∞-norm penalty:
P1(x) = max{0, |h1(x)|, . . . , |hm(x)|, g1(x), . . . , gp(x)}
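To make the scheme concrete, here is a minimal numeric sketch of the external penalty method with the squared penalty on a toy problem chosen for illustration: minimize (1/2)(x₁² + x₂²) subject to x₁ = 1, with solution x∗ = (1, 0). The gradient-descent inner solver and the ρ schedule are illustrative assumptions, not part of the method.

```python
import numpy as np

# External penalty method sketch on: minimize (1/2)(x1^2 + x2^2)
# subject to x1 = 1, with P(x) = (x1 - 1)^2 and solution x* = (1, 0).
def grad_penalized(x, rho):
    # gradient of f(x) + rho * P(x)
    return np.array([x[0] + 2.0 * rho * (x[0] - 1.0), x[1]])

x = np.zeros(2)
for rho in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    step = 1.0 / (1.0 + 2.0 * rho)   # 1/L for this quadratic subproblem
    for _ in range(20000):           # plain gradient descent on the subproblem
        x = x - step * grad_penalized(x, rho)

print(x)  # subproblem minimizer is x1 = 2*rho/(1 + 2*rho): feasible only as rho grows
```

Note how the subproblem minimizer x1 = 2ρ/(1 + 2ρ) approaches feasibility only as ρ → ∞, illustrating the ill-conditioning remark above.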
Theorem (exact penalty): There exists ρ̄ > 0 such that, for all ρ > ρ̄, the global solution of: Minimize f(x) + ρP1(x), subject to x ∈ Ω̄, is the global solution of the original problem.
Consider the problem

Minimize f(x), subject to h(x) = 0,

and suppose that at x∗ there exists a Lagrange multiplier λ∗ fulfilling the second order sufficient condition

dᵀ(∇²f(x∗) + Σ_{i=1}^m λ∗i ∇²hi(x∗))d > 0, for all d ≠ 0 with ∇h(x∗)ᵀd = 0.
Let us prove that x∗ is a solution. In particular, consider the equivalent problem

Minimize f(x) + (ρ/2)‖h(x)‖₂², subject to h(x) = 0,

and the "augmented Lagrangian function", defined as the usual Lagrangian function of this problem:

Lρ(x, λ) = f(x) + h(x)ᵀλ + (ρ/2)‖h(x)‖² = f(x) + (ρ/2)‖h(x) + λ/ρ‖² + c,

for the constant c = −‖λ‖²/(2ρ).
Let us prove that x∗ is a local minimizer of Lρ(x, λ∗) for sufficiently large ρ.
Lemma: Let P and Q be symmetric matrices with Q positive semidefinite and xᵀPx > 0 whenever x ≠ 0 and xᵀQx = 0. Then P + ρQ is positive definite for sufficiently large ρ.
Proof: Suppose not. Then for each k there is xk with ‖xk‖ = 1 and xkᵀ(P + kQ)xk ≤ 0; passing to a subsequence, xk → x. We have xkᵀPxk + k·xkᵀQxk ≤ 0 with xkᵀQxk ≥ 0, so taking limits gives xᵀQx = 0 (otherwise the left-hand side would tend to +∞). Hence xᵀPx > 0 by hypothesis. But xkᵀPxk ≤ −k·xkᵀQxk ≤ 0, so in the limit xᵀPx ≤ 0, a contradiction.
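A quick numeric illustration of the Lemma, with matrices chosen here for the purpose: P is indefinite but positive definite on the null space of Q, and P + ρQ becomes positive definite once ρ is large enough (here ρ > 5).

```python
import numpy as np

# P is indefinite, Q is PSD, and x^T P x > 0 on {x != 0 : x^T Q x = 0} = span{(1,0)}.
P = np.array([[1.0, 2.0],
              [2.0, -1.0]])
Q = np.array([[0.0, 0.0],
              [0.0, 1.0]])

for rho in [1.0, 5.0, 10.0]:
    lam_min = np.linalg.eigvalsh(P + rho * Q).min()
    print(rho, lam_min)  # turns positive once rho > 5, since det(P + rho*Q) = rho - 5
```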
Evaluating derivatives at (x∗, λ∗):

Lρ(x, λ) = f(x) + h(x)ᵀλ + (ρ/2)‖h(x)‖²

∇xLρ(x∗, λ∗) = ∇f(x∗) + Σ_{i=1}^m λ∗i ∇hi(x∗) + ρ Σ_{i=1}^m hi(x∗)∇hi(x∗)
= ∇f(x∗) + Σ_{i=1}^m (λ∗i + ρhi(x∗))∇hi(x∗) = ∇xL(x∗, λ∗) = 0,

since h(x∗) = 0.
∇²xxLρ(x∗, λ∗) = ∇²f(x∗) + Σ_{i=1}^m (λ∗i + ρhi(x∗))∇²hi(x∗) + ρ∇h(x∗)∇h(x∗)ᵀ
= ∇²xxL(x∗, λ∗) + ρ∇h(x∗)∇h(x∗)ᵀ.

Since dᵀ∇²xxL(x∗, λ∗)d > 0 for all d ≠ 0 with ∇h(x∗)ᵀd = 0, that is, with dᵀ∇h(x∗)∇h(x∗)ᵀd = 0, the Lemma (with P = ∇²xxL(x∗, λ∗) and Q = ∇h(x∗)∇h(x∗)ᵀ) shows that ∇²xxLρ(x∗, λ∗) is positive definite for sufficiently large ρ.
Hence x∗ is a local minimizer of Lρ(x, λ∗) = f(x) + h(x)ᵀλ∗ + (ρ/2)‖h(x)‖². The result follows since Lρ(x, λ∗) = f(x) for feasible x.
This proof suggests minimizing the augmented Lagrangian while successively increasing the penalty parameter. The difficulty is that we do not know the exact Lagrange multiplier λ∗; but if we know an approximation to λ∗, we can get an approximation to x∗.
Example: Minimize (1/2)(x₁² + x₂²), subject to x₁ = 1.
x∗ = (1, 0); from (1, 0) + λ∗(1, 0) = 0 we get λ∗ = −1.

Lρ(x, λ) = (1/2)(x₁² + x₂²) + λ(x₁ − 1) + (ρ/2)(x₁ − 1)²

∇xLρ(x, λ) = (x₁ + λ + ρ(x₁ − 1), x₂) = (0, 0), so x₁ = (ρ − λ)/(ρ + 1), x₂ = 0.

If λ → λ∗, then x → x∗ (with ρ fixed).
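The closed-form minimizer x₁ = (ρ − λ)/(ρ + 1) from the example can be checked numerically: with the exact multiplier λ∗ = −1, minimizing Lρ(·, λ∗) recovers x∗ for any fixed ρ, with no need to drive ρ to infinity.

```python
# Minimizer of L_rho(., lambda) for the example above (closed form from the slide).
def x1_min(lam, rho):
    return (rho - lam) / (rho + 1.0)

rho = 10.0                            # penalty parameter held fixed
for lam in [0.0, -0.5, -0.9, -1.0]:   # lambda approaching lambda* = -1
    print(lam, x1_min(lam, rho))      # x1 approaches x1* = 1
```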
For inequality constraints: Minimize f(x), subject to g(x) ≤ 0. Define hj(x, z) := gj(x) + zj² = 0; then the subproblem is

Minimize Lρ(x, z, µ) = f(x) + (ρ/2) Σ_{j=1}^p (gj(x) + zj² + µj/ρ)².

Minimizing over z gives z∗j = 0 if gj(x) + µj/ρ ≥ 0, and z∗j = √(−gj(x) − µj/ρ) otherwise. Hence we may consider

Lρ(x, µ) = f(x) + (ρ/2) Σ_{j=1}^p max{0, gj(x) + µj/ρ}².
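The elimination of the slack zj above can be verified numerically: minimizing (ρ/2)(g + z² + µ/ρ)² over z indeed gives (ρ/2)max{0, g + µ/ρ}². The values of g, µ, and ρ below are arbitrary test data.

```python
import numpy as np

def with_slack(gval, mu, rho):
    # minimize (rho/2) * (g + z^2 + mu/rho)^2 over z, on a fine grid
    z = np.linspace(0.0, 5.0, 500001)
    return (rho / 2.0) * ((gval + z**2 + mu / rho)**2).min()

def max_form(gval, mu, rho):
    # the slack-free form used in the slide
    return (rho / 2.0) * max(0.0, gval + mu / rho)**2

for gval in [-3.0, -0.5, 0.0, 2.0]:
    print(gval, with_slack(gval, 1.0, 2.0), max_form(gval, 1.0, 2.0))
```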
Given a penalty parameter ρk and approximations λk and µk to the Lagrange multipliers, we find xk minimizing the augmented Lagrangian

Lρk(x, λk, µk) = f(x) + (ρk/2) [ Σ_{i=1}^m (hi(x) + λki/ρk)² + Σ_{j=1}^p max{0, gj(x) + µkj/ρk}² ].
Since ∇xLρk(xk, λk, µk) = 0, we get

∇f(xk) + Σ_{i=1}^m (λki + ρk hi(xk))∇hi(xk) + Σ_{j=1}^p max{0, µkj + ρk gj(xk)}∇gj(xk) = 0.
This suggests the updates λk+1,i = λki + ρk hi(xk) and µk+1,j = max{0, µkj + ρk gj(xk)} ≥ 0.
Note that if gj(x∗) < 0 and {µkj} is bounded, then for ρk sufficiently large, µk+1,j = 0 and complementarity is fulfilled.
Since we are using approximations of the Lagrange multipliers, if a "good" iteration is performed we can avoid increasing the penalty parameter and only update the multipliers. A "good" iteration should sufficiently decrease infeasibility and a complementarity measure.
Consider Vkj = max{gj(xk), −µkj/ρk} and note that Vkj = 0 if and only if gj(xk) ≤ 0 and µkj = 0 whenever gj(xk) < 0.
Initialize x0, λ1, µ1 ≥ 0, ρ1 > 0 and parameters σ ∈ (0, 1) and γ > 1. For k = 1, 2, . . .
1. Find xk, a solution of: Minimize Lρk(x, λk, µk).
2. Compute Vkj = max{gj(xk), −µkj/ρk}, j = 1, . . . , p.
3. If max{‖h(xk)‖, ‖V(xk)‖} ≤ σ max{‖h(xk−1)‖, ‖V(xk−1)‖}, define ρk+1 = ρk; otherwise, define ρk+1 = γρk.
4. Define λk+1,i = P[λmin, λmax](λki + ρk hi(xk)) and µk+1,j = P[0, µmax](µkj + ρk gj(xk)), i = 1, . . . , m, j = 1, . . . , p, where P[a,b] denotes the projection onto the interval [a, b].
This framework allows one to replace
1. Find xk, a solution of: Minimize Lρk(x, λk, µk).
by
1. Find xk, an approximate stationary point of: Minimize Lρk(x, λk, µk), that is, ‖∇xLρk(xk, λk, µk)‖ ≤ εk,
with εk → 0+.
Theorem: The limit points of a sequence generated by the algorithm are stationary points of the problem of minimizing the infeasibility measure ‖h(x)‖₂² + ‖max{0, g(x)}‖₂². If a limit point is feasible, it is an Approximate-KKT point (which means that under a weak constraint qualification, the KKT condition holds). Moreover, under the Mangasarian-Fromovitz constraint qualification, {λk} and {µk} converge to true Lagrange multipliers.
Remark 1: Requiring approximate second-order stationarity in the subproblems, we get convergence (under a weak constraint qualification) to a second-order stationary point.
Remark 2: Under the linear independence constraint qualification and the second-order sufficient condition, the penalty parameter remains bounded.
Minimize f(x), subject to h(x) = 0, x ∈ Ω,

where f: Rⁿ → R and h: Rⁿ → Rᵐ are smooth functions and Ω is a bounded polytope.
Restoration phase: Given a point xk, obtain yk closer to the feasible region. This should be done in such a way that the objective function at yk is not radically worse than at xk.
Optimization phase: Obtain a trial point zk with a better objective function value, without losing much of the feasibility of yk.
Globalization: If the trial point is not good enough, obtain a new one closer to yk.
Dealing with feasibility and optimality at the same time may have some drawbacks:
The subproblems may be infeasible.
The optimization model for the subproblem is based on a point away from the feasible set.
One of the goals may be much easier than the other.
One (or both) of the goals may have a structure that can be exploited.
Electronic structure calculation (Martínez et al.)
Optimal control (Kaya et al.)
Derivative-free optimization (Bueno et al.)
Bilevel optimization (Andreani et al.)
Multiobjective problems
Parameters: r ∈ [0, 1), β > 0, τ > 0, γ > 0, γ̄ > 0.
Step 0: Initialization: Choose x0 ∈ Ω and θ−1 ∈ (0, 1). Define k = 0.
Step 1: Restoration phase: Compute yk ∈ Ω such that
‖h(yk)‖ ≤ r‖h(xk)‖ and f(yk) − f(xk) ≤ β‖h(xk)‖.
Step 2: Optimality phase: Compute the direction dk = Pk(−∇f(yk)), where Pk is the Euclidean projection onto
Tk = {d ∈ Rⁿ | yk + d ∈ Ω, ∇h(yk)ᵀd = 0}.
Step 3.1: Globalization, penalty parameter: Compute θk as the supremum of the values of θ ∈ [0, θk−1] such that
Φ(yk, θ) − Φ(xk, θ) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖),
where Φ(x, θ) = θf(x) + (1 − θ)‖h(x)‖.
Step 3.2: Globalization, line search: Compute tk ∈ [0, 1] as large as possible such that
Φ(yk + tk dk, θk) − Φ(xk, θk) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖) and f(yk + tk dk) < f(yk).
Step 4: Update: Define xk+1 = yk + tk dk, set k := k + 1, and repeat.
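A compact numeric sketch of these steps on a toy problem (the problem, the exact restoration by normalization, the closed-form tangent projection, and the omission of the extra test f(yk + tk dk) < f(yk) are all illustrative simplifications): minimize x₁ on the unit circle, whose solution is x∗ = (−1, 0). Ω is taken large enough to be inactive, so Pk reduces to projection onto the tangent space of the constraint.

```python
import numpy as np

# Toy problem: minimize f(x) = x1 subject to h(x) = x1^2 + x2^2 - 1 = 0,
# with solution x* = (-1, 0).
def f(x):  return x[0]
def df(x): return np.array([1.0, 0.0])
def h(x):  return x[0]**2 + x[1]**2 - 1.0
def dh(x): return 2.0 * x

def phi(x, theta):  # merit function
    return theta * f(x) + (1.0 - theta) * abs(h(x))

r, theta = 0.5, 0.99
x = np.array([0.0, 2.0])
for k in range(60):
    y = x / np.linalg.norm(x)                              # restoration: h(y) = 0
    n = dh(y)
    d = -df(y) + (np.dot(n, df(y)) / np.dot(n, n)) * n     # projected direction
    # penalty parameter: largest theta in [0, theta_prev] with
    #   phi(y, th) - phi(x, th) <= (1 - r)/2 * (|h(y)| - |h(x)|)
    A = f(y) - f(x)
    B = abs(h(y)) - abs(h(x))
    if A - B > 0:
        theta = min(theta, max(0.0, 0.5 * (1.0 + r) * (-B) / (A - B)))
    # backtracking line search on the merit function
    t = 1.0
    while t > 1e-12 and (phi(y + t * d, theta) - phi(x, theta)
                         > 0.5 * (1.0 - r) * B + 1e-15):
        t *= 0.5
    x = y + t * d

print(x)  # close to x* = (-1, 0)
```

On this example the penalty parameter θk settles near 1/2 and stays bounded away from zero, in line with the convergence results that follow.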
The penalty parameter is bounded away from zero (θk ≥ θ̄ > 0).
The stepsize tk is accepted after a finite number of attempts (tk ≥ t̄ > 0).
The series Σ_{k=0}^∞ ‖h(xk)‖ is convergent.
The sequence {dk} converges to zero.
All limit points of {xk} satisfy the L-AGP sequential optimality condition. In particular, the KKT condition holds under the Constant Positive Generators (CPG) constraint qualification.
Step 2: Optimality phase: Compute the direction dk as the solution of
Minimize (1/2)dᵀHk d + ∇f(yk)ᵀd, subject to ∇h(yk)ᵀd = 0, yk + d ∈ Ω,
where {Hk} is uniformly positive definite.
Step 3.1: Globalization, penalty parameter: Compute θk as the supremum of the values of θ ∈ [0, θk−1] such that
Φ(yk, λk, θ) − Φ(xk, λk−1, θ) ≤ (1/2)(1 − sk)(‖h(yk)‖ − ‖h(xk)‖),
where Φ(x, λ, θ) = θ(f(x) + λᵀh(x)) + (1 − θ)‖h(x)‖.
Step 3.2: Globalization, line search: Compute tk ∈ [0, 1] as large as possible such that
Φ(yk + tk dk, λk, θk) − Φ(xk, λk−1, θk) ≤ (1/2)(1 − r)(‖h(yk)‖ − ‖h(xk)‖) and L(yk + tk dk, λk) < L(yk, λk−1).
The original Fischer-Friedlander algorithm is recovered if Hk is the identity matrix, sk = r, and λk ≡ 0.
In our implementation:
Hk is the Hessian of the Lagrangian function L(x, λk−1) = f(x) + (λk−1)ᵀh(x).
sk = 0 (larger penalty parameter).
λk is the Lagrange multiplier obtained from the optimization-phase subproblem.
We use Fletcher's filterSD.f and qlcpd.f for the restoration and optimization phases, respectively.
The penalty parameter is bounded away from zero (θk ≥ θ̄ > 0).
The stepsize tk is accepted after a finite number of attempts (tk ≥ t̄ > 0).
The series Σ_{k=0}^∞ ‖h(xk)‖ is convergent.
The sequence {dk} converges to zero.
All limit points of {xk} satisfy the L-AGP sequential optimality condition. In particular, the KKT condition holds under the Constant Positive Generators (CPG) constraint qualification.
Under MFCQ, a suitable subsequence of {λk} converges to a Lagrange multiplier.
We consider the mean-variance problem in portfolio optimization: an investor aims to maximize return and minimize risk when investing in n assets:

min −cᵀx and min xᵀQx, subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Under certain conditions, Pareto points are solutions of

min_x α(−cᵀx) + (1 − α)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0,

for some α ≥ 0.
min_{α,x} ‖x − xd‖², subject to α ≥ 0 and x a solution of:

min_x α(−cᵀx) + (1 − α)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Classical approach: Reformulate the "lower level Pareto problem" by its KKT conditions.
Restoration phase: Given (xk, αk), the restored point (yk, αk) is given by

min_x αk(−cᵀx) + (1 − αk)(xᵀQx), subject to Σ_{i=1}^n xi = 1, x ≥ 0.

Optimization phase: We write the KKT conditions of the problem above as H(x, α, λ) = 0 and solve the linearized subproblem

min_{dx, dα, dλ} ‖yk + dx − xd‖², subject to ∇H(yk, αk, λk)ᵀ(dx, dα, dλ) = 0, a ≤ (dx, dα, dλ) ≤ b.

We obtain xk+1 = yk + t dx, αk+1 = αk + t dα, λk+1 = λk + t dλ.
Test 1
In our simulation we used seven shares from the London exchange market plus a risk-free asset to generate problems with n ∈ {8, 100, 1000}. We generated scenarios using the historical data and computed the expected return c and covariance matrix Q.
Problems could be solved in under 1.5 seconds, and no major difference could be detected between the inexact restoration and the classical approach.
Test 2
We compared the formulations on 300 problems with two objectives and randomly chosen fourth-degree polynomials.
With the IR formulation we found a KKT point for the Pareto lower level problem in all the problems.
Number of problems in which the KKT reformulation did not find a KKT point for the Pareto lower level problem:

n=1: 13    n=10: 84    n=20: 100

Performance profile for 120 two-objective problems from the Moré-Garbow-Hillstrom unconstrained optimization collection.
Minimize f(x), subject to g(x) ≤ 0, x ∈ X,

where f∗ = min_{x∈X, g(x)≤0} f(x) is assumed to be finite.

Lagrangian: L(x, µ) = f(x) + µᵀg(x).

f∗ = min_{x∈X} max_{µ≥0} L(x, µ), since max_{µ≥0} L(x, µ) equals f(x) if g(x) ≤ 0 and +∞ otherwise.

The dual problem:

q∗ = max_{µ≥0} min_{x∈X} L(x, µ) = max_{µ≥0} q(µ),

where q(µ) = min_{x∈X} L(x, µ) is the dual function.

Effective domain of q: Dq = {µ | q(µ) > −∞}.

Theorem: The domain Dq is convex and q is a concave function over Dq. (Hence the dual problem is convex.)
Theorem (weak duality): q∗ ≤ f∗.
Proof: For all µ ≥ 0 and x ∈ X with g(x) ≤ 0:
q(µ) = min_{t∈X} L(t, µ) ≤ f(x) + µᵀg(x) ≤ f(x).

Theorem: If q∗ = f∗, then given a primal-dual solution (x∗, µ∗), it holds that L(x∗, µ∗) = min_{x∈X} L(x, µ∗) and (µ∗)ᵀg(x∗) = 0.
Proof:
f(x∗) = q(µ∗) = min_{x∈X} L(x, µ∗) ≤ f(x∗) + (µ∗)ᵀg(x∗) ≤ f(x∗).
Example: Minimize f(x) = (1/2)(x₁² + x₂²), subject to g(x) = x₁ − 1 ≤ 0.
q(µ) = min_x (1/2)(x₁² + x₂²) + µ(x₁ − 1) = −(1/2)µ² − µ, attained at x₁ = −µ, x₂ = 0.
max_{µ≥0} q(µ) ⇒ µ∗ = 0, x∗ = (0, 0).
Note that ∇q(µ) = g(−µ, 0).

In general, let xµ attain L(xµ, µ) = min_{x∈X} L(x, µ). Then
q(µ̄) = min_{x∈X} L(x, µ̄) ≤ f(xµ) + µ̄ᵀg(xµ) = q(µ) + (µ̄ − µ)ᵀg(xµ).
Thus q(µ) may be non-differentiable, but g(xµ) acts as a gradient (subgradient) of q at µ.
Subgradient method:
µk+1 = [µk + sk g(xµk)]+, where [·]+ is the orthogonal projection onto M = {µ ≥ 0 | q(µ) > −∞}.
Theorem: If sk is sufficiently small, then ‖µk+1 − µ∗‖ < ‖µk − µ∗‖.

Cutting planes method:
µk+1 is the solution of
Maximize_{µ∈M} Qk(µ) = min_{i=0,...,k} ( q(µi) + (µ − µi)ᵀg(xµi) ).
Theorem: q(µ) ≤ Qk+1(µ) ≤ Qk(µ), and every limit point of {µk} is a dual solution.
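A minimal projected-subgradient sketch on a toy dual (problem chosen here for illustration): minimize (1/2)‖x‖² subject to 1 − x₁ ≤ 0, whose dual function is q(µ) = µ − µ²/2 with maximizer µ∗ = 1 and xµ = (µ, 0).

```python
# Projected subgradient ascent on the dual of
#   minimize (1/2)(x1^2 + x2^2)  subject to  g(x) = 1 - x1 <= 0.
def x_of_mu(mu):
    # x_mu attaining min_x L(x, mu) = (1/2)||x||^2 + mu*(1 - x1)
    return (mu, 0.0)

def g(x):
    return 1.0 - x[0]

mu, s = 0.0, 0.1                             # fixed stepsize s_k (illustrative)
for k in range(300):
    mu = max(0.0, mu + s * g(x_of_mu(mu)))   # mu_{k+1} = [mu_k + s_k g(x_mu)]_+

print(mu, x_of_mu(mu))  # mu -> mu* = 1, and x_mu -> x* = (1, 0)
```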
Example: Separable problem - Resource allocation

Minimize Σ_{i=1}^r fi(xi), subject to Σ_{i=1}^r gi(xi) ≤ 0, xi ∈ Xi.

q(µ) = min_{x∈X} Σ_{i=1}^r fi(xi) + µᵀ( Σ_{i=1}^r gi(xi) )
= Σ_{i=1}^r min_{xi∈Xi} ( fi(xi) + µᵀgi(xi) )
Example: Dual of a Linear Problem

Minimize cᵀx, subject to Ax ≥ b.

L(x, µ) = cᵀx + µᵀ(b − Ax) = µᵀb + (c − Aᵀµ)ᵀx

q(µ) = min_{x∈Rⁿ} L(x, µ) = bᵀµ if Aᵀµ = c; −∞ otherwise.

The dual problem: Maximize bᵀµ, subject to Aᵀµ = c, µ ≥ 0.
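A tiny numeric instance of this primal-dual pair (data chosen here): with A = I and c > 0, the primal min cᵀx s.t. x ≥ b has solution x∗ = b, and the only dual-feasible point is µ = c, so the two optimal values coincide.

```python
import numpy as np

# Primal: min c^T x  s.t.  A x >= b, with A = I, so x* = b (c > 0).
# Dual:   max b^T mu s.t.  A^T mu = c, mu >= 0, so mu* = c.
A = np.eye(2)
b = np.array([1.0, 2.0])
c = np.array([3.0, 4.0])

x_star, mu_star = b, c
f_star = c @ x_star     # primal optimal value
q_star = b @ mu_star    # dual optimal value
print(f_star, q_star)   # equal: strong duality, no gap
```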
1. E. G. Birgin and J. M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, SIAM, Philadelphia, 2014.
2. L. F. Bueno, G. Haeser, and J. M. Martínez, A Flexible Inexact Restoration Method for Constrained Optimization, Journal of Optimization Theory and Applications, June 2014.
3. D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.