Short course: Optimality Conditions and Algorithms in Nonlinear Optimization
(1)

Part IV - Algorithms

Gabriel Haeser

Department of Applied Mathematics, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, SP, Brazil

Santiago de Compostela, Spain, October 28-31, 2014

(2)

Part I - Introduction to nonlinear optimization
  Examples and historical notes
  First and second order optimality conditions
  Penalty methods
  Interior point methods

Part II - Optimality Conditions
  Algorithmic proof of the Karush-Kuhn-Tucker conditions
  Sequential optimality conditions
  Algorithmic discussion

Part III - Constraint Qualifications
  Geometric interpretation
  First and second order constraint qualifications

Part IV - Algorithms
  Augmented Lagrangian methods
  Inexact Restoration algorithms
  Dual methods

(3)

Choose a sequence {ρ_k} with ρ_k → +∞ and for each k solve the subproblem

Minimize f(x) + ρ_k P(x), Subject to x ∈ Ω̄,

obtaining the global solution x^k, if it exists.

Penalty function: P(x) = 0 if h(x) = 0 and g(x) ≤ 0; P(x) > 0 otherwise.
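As a concrete sketch of this scheme (a minimal illustration, taking Ω̄ = R² for simplicity and using the squared ℓ2 penalty defined a few slides ahead), the loop below solves each penalized subproblem with scipy; the test problem, starting point and ρ_k schedule are illustrative choices, not part of the slides.

import numpy as np
from scipy.optimize import minimize

def f(x):                       # objective
    return x[0] + x[1]

def h(x):                       # equality constraint h(x) = 0 (circle of radius sqrt(2))
    return np.array([x[0]**2 + x[1]**2 - 2.0])

def P(x):                       # penalty: zero on the feasible set, positive outside it
    return float(np.sum(h(x)**2))

x = np.zeros(2)
for rho in [1.0, 10.0, 100.0, 1000.0]:               # rho_k -> +infinity
    x = minimize(lambda x: f(x) + rho * P(x), x).x   # (approximate) subproblem solution
    print(rho, x, abs(h(x)[0]))
# The iterates approach the solution (-1, -1) as rho grows, while the
# subproblems become increasingly ill-conditioned.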

(4)

Theorem: If {x^k} is well defined, then every limit point of {x^k} is a global solution to Minimize P(x), Subject to x ∈ Ω̄.

Theorem: If {x^k} is well defined and there exists a point in Ω̄ where the function P vanishes (the feasible region is not empty), then every limit point of {x^k} is a global solution of

Minimize f(x), Subject to h(x) = 0, g(x) ≤ 0, x ∈ Ω̄.

(5)

This method is of little practical relevance for two main reasons:

The parameter ρ_k must really tend to infinity in order to reach a solution, which makes the subproblems hard to solve (ill-conditioning).

Finding global solutions, even for the subproblems, is almost impossible. In practice we can only find (approximate) stationary points.

(6)

Squared ℓ2-norm penalty:

P(x) = ‖h(x)‖₂² + ‖max{0, g(x)}‖₂²

ℓ∞-norm penalty:

P∞(x) = max{0, |h₁(x)|, …, |h_m(x)|, g₁(x), …, g_p(x)}

Theorem (exact penalty): The global solution of Minimize f(x) + ρ P∞(x), subject to x ∈ Ω̄, is the global solution of the original problem for all ρ > ρ̄.
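A tiny numeric illustration of this exactness, assuming the one-dimensional problem minimize x subject to 1 − x ≤ 0 (not from the slides), whose solution is x* = 1 with multiplier µ* = 1: for any ρ > 1 the non-smooth penalized function already attains its minimum exactly at x*, with no need to send ρ to infinity.

import numpy as np

xs = np.linspace(-1.0, 3.0, 4001)               # grid containing x* = 1 exactly
for rho in [0.5, 2.0, 10.0]:
    pen = xs + rho * np.maximum(0.0, 1.0 - xs)  # f(x) + rho * max{0, g(x)}
    print(rho, xs[np.argmin(pen)])
# rho = 0.5 (below the threshold) pushes the minimizer to the left edge of the
# grid (the penalized function is unbounded below), while rho = 2 and rho = 10
# give exactly 1.0.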

(7)

Consider the problem

Minimize f(x), subject to h(x) = 0,

such that at x* there exists a Lagrange multiplier λ* fulfilling the second order sufficient condition

dᵀ( ∇²f(x*) + Σ_{i=1}^m λ*_i ∇²h_i(x*) ) d > 0, for all d ≠ 0 with ∇h(x*)ᵀd = 0.

Let's prove that x* is a solution. In particular, let's consider the equivalent problem

Minimize f(x) + (ρ/2)‖h(x)‖₂², subject to h(x) = 0.

(8)

Consider the "augmented Lagrangian function" defined as the usual Lagrangian function for the previous problem:

L_ρ(x, λ) = f(x) + h(x)ᵀλ + (ρ/2)‖h(x)‖² = f(x) + (ρ/2)‖h(x) + λ/ρ‖² + c, for some constant c.

Let's prove that x* is a local minimizer of L_ρ(x, λ*) for sufficiently large ρ.

(9)

Lemma: Let P and Q be symmetric matrices with Q positive semidefinite and xᵀPx > 0 whenever x ≠ 0 and xᵀQx = 0. Then P + ρQ is positive definite for sufficiently large ρ.

Proof: Assume, by contradiction, that x_kᵀ(P + kQ)x_k ≤ 0 with ‖x_k‖ = 1 and (taking a subsequence) x_k → x. Then x_kᵀPx_k + k x_kᵀQx_k ≤ 0 with x_kᵀQx_k ≥ 0, so x_kᵀQx_k → 0.

Hence, taking limits, xᵀQx = 0.

Then xᵀPx > 0. Since x_kᵀPx_k ≤ 0, we have a contradiction.

(10)

Evaluating derivatives at (x*, λ*):

L_ρ(x, λ*) = f(x) + h(x)ᵀλ* + (ρ/2)‖h(x)‖²

∇_x L_ρ(x*, λ*) = ∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) + ρ Σ_{i=1}^m h_i(x*) ∇h_i(x*)
= ∇f(x*) + Σ_{i=1}^m (λ*_i + ρ h_i(x*)) ∇h_i(x*) = ∇_x L(x*, λ*) = 0 (using h(x*) = 0).

∇²_xx L_ρ(x*, λ*) = ∇²f(x*) + Σ_{i=1}^m (λ*_i + ρ h_i(x*)) ∇²h_i(x*) + ρ ∇h(x*) ∇h(x*)ᵀ
= ∇²_xx L(x*, λ*) + ρ ∇h(x*) ∇h(x*)ᵀ.

(11)

Since dᵀ ∇²_xx L(x*, λ*) d > 0 for all d ≠ 0 with ∇h(x*)ᵀd = 0, and ∇h(x*)ᵀd = 0 exactly when dᵀ ∇h(x*) ∇h(x*)ᵀ d = 0, the Lemma (with P = ∇²_xx L(x*, λ*) and Q = ∇h(x*) ∇h(x*)ᵀ) gives that ∇²_xx L_ρ(x*, λ*) is positive definite for sufficiently large ρ.

Hence x* is a local minimizer of L_ρ(x, λ*) = f(x) + h(x)ᵀλ* + (ρ/2)‖h(x)‖².

The result follows as L_ρ(x, λ*) = f(x) for feasible x.

(12)

This proof suggests minimizing the augmented Lagrangian and successively increasing the penalty parameter. The problem is that we don't know the exact Lagrange multiplier λ*. But if we know an approximation to λ*, we can get an approximation to x*.

Minimize ½(x₁² + x₂²), subject to x₁ = 1.

x* = (1, 0), (1, 0) + λ*(1, 0) = 0, λ* = −1.

L_ρ(x, λ) = ½(x₁² + x₂²) + λ(x₁ − 1) + (ρ/2)(x₁ − 1)²

∇_x L_ρ(x, λ) = (x₁ + λ + ρ(x₁ − 1), x₂) = (0, 0) ⟹ x₁ = (ρ − λ)/(ρ + 1), x₂ = 0.

If λ → λ*, then x → x*.
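The closed form above makes the effect of updating λ easy to see. The short loop below (a sketch, combining it with the first-order multiplier update λ ← λ + ρ h(x) that is suggested on a later slide) keeps ρ fixed and still drives x₁ → 1 and λ → λ* = −1.

rho, lam = 10.0, 0.0
for k in range(8):
    x1 = (rho - lam) / (rho + 1.0)   # minimizer of L_rho(., lam), from the formula above
    lam = lam + rho * (x1 - 1.0)     # first-order multiplier update with h(x) = x1 - 1
    print(k, x1, lam)
# x1 -> 1 and lam -> -1 at a linear rate 1/(rho + 1), without rho -> infinity.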

(13)

Minimize f(x), subject to g(x) ≤ 0.

Define h_j(x, z) := g_j(x) + z_j² = 0. Then

Minimize L_ρ(x, z, µ) = f(x) + (ρ/2) Σ_{j=1}^p (g_j(x) + z_j² + µ_j/ρ)².

Minimizing first with respect to z gives z_j = 0 if g_j(x) + µ_j/ρ ≥ 0, and z_j = √(−g_j(x) − µ_j/ρ) otherwise.

Hence we may consider L_ρ(x, µ) = f(x) + (ρ/2) Σ_{j=1}^p max{0, g_j(x) + µ_j/ρ}².

(14)

Given a penalty parameter ρ_k and approximations λ^k and µ^k to the Lagrange multipliers, we find x^k minimizing the augmented Lagrangian

L_{ρ_k}(x, λ^k, µ^k) = f(x) + (ρ_k/2) [ Σ_{i=1}^m (h_i(x) + λ^k_i/ρ_k)² + Σ_{j=1}^p max{0, g_j(x) + µ^k_j/ρ_k}² ].

Since ∇_x L_{ρ_k}(x^k, λ^k, µ^k) = 0, we get

∇f(x^k) + Σ_{i=1}^m (λ^k_i + ρ_k h_i(x^k)) ∇h_i(x^k) + Σ_{j=1}^p max{0, µ^k_j + ρ_k g_j(x^k)} ∇g_j(x^k) = 0.

This suggests that we should define λ^{k+1}_i = λ^k_i + ρ_k h_i(x^k) and µ^{k+1}_j = max{0, µ^k_j + ρ_k g_j(x^k)} ≥ 0.

Note that if g_j(x*) < 0 and {µ^k_j} is bounded, then for ρ_k sufficiently large, µ^{k+1}_j = 0 and complementarity is fulfilled.

(15)

Since we are using Lagrange multiplier approximations, if a "good" iteration is performed, we can avoid increasing the penalty parameter and only update the Lagrange multipliers.

A "good" iteration should sufficiently decrease infeasibility and a complementarity measure.

Consider V^k_j = max{g_j(x^k), −µ^k_j/ρ_k} and note that V^k_j = 0 if and only if g_j(x^k) ≤ 0 and µ^k_j = 0 whenever g_j(x^k) < 0.

(16)

Initialize x^0, λ^1, µ^1 ≥ 0, ρ_1 > 0 and parameters σ ∈ (0, 1) and γ > 1. For k = 1, 2, …

1. Find x^k, a solution to Minimize L_{ρ_k}(x, λ^k, µ^k).

2. Compute V^k_j = max{g_j(x^k), −µ^k_j/ρ_k}, j = 1, …, p.

3. If max{‖h(x^k)‖, ‖V^k‖} ≤ σ max{‖h(x^{k−1})‖, ‖V^{k−1}‖}, define ρ_{k+1} = ρ_k; otherwise, define ρ_{k+1} = γ ρ_k.

4. Define λ^{k+1}_i = P_{[λ_min, λ_max]}(λ^k_i + ρ_k h_i(x^k)) and µ^{k+1}_j = P_{[0, µ_max]}(µ^k_j + ρ_k g_j(x^k)), i = 1, …, m, j = 1, …, p.
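A minimal sketch of this safeguarded loop in Python, assuming an unconstrained inner solver (scipy) in place of a solver over Ω, a small made-up test problem and illustrative parameter and safeguard values:

import numpy as np
from scipy.optimize import minimize

def f(x):  return (x[0] - 2.0)**2 + (x[1] - 1.0)**2
def h(x):  return np.array([x[0] + x[1] - 2.0])          # equality constraints
def g(x):  return np.array([x[0] - 1.0])                 # inequality constraints, g(x) <= 0

def L_aug(x, lam, mu, rho):                              # augmented Lagrangian from slide (14)
    return f(x) + 0.5 * rho * (np.sum((h(x) + lam / rho)**2)
                               + np.sum(np.maximum(0.0, g(x) + mu / rho)**2))

x, lam, mu = np.zeros(2), np.zeros(1), np.zeros(1)
rho, sigma, gamma = 1.0, 0.5, 10.0
lam_max, mu_max = 1e6, 1e6                               # safeguarding bounds
prev = np.inf
for k in range(12):
    x = minimize(L_aug, x, args=(lam, mu, rho)).x        # step 1: solve the subproblem
    V = np.maximum(g(x), -mu / rho)                      # step 2: complementarity measure
    infeas = max(np.linalg.norm(h(x)), np.linalg.norm(V))
    if infeas > sigma * prev:                            # step 3: penalty parameter update
        rho *= gamma
    prev = infeas
    lam = np.clip(lam + rho * h(x), -lam_max, lam_max)   # step 4: safeguarded multiplier updates
    mu = np.clip(mu + rho * g(x), 0.0, mu_max)
print(x, lam, mu, rho)   # for this toy problem the iterates approach x* = (1, 1),
                         # with multipliers near lambda* = 0 and mu* = 2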

(17)

This framework allows us to replace

1. Find x^k, a solution to Minimize L_{ρ_k}(x, λ^k, µ^k).

by

1. Find x^k, an approximate stationary point of Minimize L_{ρ_k}(x, λ^k, µ^k), that is, ‖∇_x L_{ρ_k}(x^k, λ^k, µ^k)‖ ≤ ε_k, with ε_k → 0+.

Theorem: The limit points of a sequence generated by the algorithm are stationary points of the problem of minimizing the infeasibility measure ‖h(x)‖₂² + ‖max{0, g(x)}‖₂². If a limit point is feasible, it is an Approximate-KKT point (this means that under a weak constraint qualification, the KKT conditions hold). Also, under the Mangasarian-Fromovitz constraint qualification, {λ^k} and {µ^k} converge to true Lagrange multipliers.

(18)

Remark 1: Requiring second-order approximate stationarity in the subproblems, we get convergence (under a weak constraint qualification) to a second-order stationary point.

Remark 2: Under linear independence constraint qualification and second-order sufficient condition, the penalty parameter remains bounded.

(19)

Minimize f(x), Subject to h(x) = 0, x ∈ Ω,

f : Rⁿ → R, h : Rⁿ → Rᵐ smooth functions. Ω is a bounded polytope.

(20)

Restoration phase: Given a point x^k, obtain y^k closer to the feasible region. This should be done in such a way that the objective function at y^k is not radically worse than at x^k.

Optimization phase: Obtain a trial point z^k with a better objective function value, without losing much of the feasibility of y^k.

Globalization: If the trial point is not good enough, obtain a new one closer to y^k.

(21)
(22)
(23)
(24)
(25)
(26)
(27)

Dealing with feasibility and optimality at the same time may have some drawbacks:

The subproblems may be infeasible.

The optimization model for the subproblem is based on a point away from the feasible set.

One of the goals may be much easier than the other.

One (or both) of the goals may have a structure that can be exploited.

(28)

One (or both) of the goals may have a structure that can be exploited.

Electronic structure calculation (Martínez et al.)
Optimal control (Kaya et al.)
Derivative-free optimization (Bueno et al.)
Bilevel optimization (Andreani et al.)
Multiobjective problems

(29)

Parameters: r ∈ [0, 1), β > 0, τ > 0, γ > 0.

Step 0: Initialization: Choose x^0 ∈ Ω and θ_{−1} ∈ (0, 1). Define k = 0.

Step 1: Restoration phase: Compute y^k ∈ Ω such that

‖h(y^k)‖ ≤ r ‖h(x^k)‖ and f(y^k) − f(x^k) ≤ β ‖h(x^k)‖.

Step 2: Optimality phase: Compute the direction d^k = P_k(−∇f(y^k)), where P_k is the Euclidean projection onto

T_k = {d ∈ Rⁿ | y^k + d ∈ Ω, ∇h(y^k)ᵀd = 0}.

(30)

Step 3.1: Globalization: Penalty parameter

Compute θ_k as the supremum of the values of θ ∈ [0, θ_{k−1}] such that

Φ(y^k, θ) − Φ(x^k, θ) ≤ ½ (1 − r) (‖h(y^k)‖ − ‖h(x^k)‖),

where Φ(x, θ) = θ f(x) + (1 − θ) ‖h(x)‖.

Step 3.2: Globalization: Line search

Compute t_k ∈ [0, 1] as large as possible such that

Φ(y^k + t_k d^k, θ_k) − Φ(x^k, θ_k) ≤ ½ (1 − r) (‖h(y^k)‖ − ‖h(x^k)‖)

and f(y^k + t_k d^k) < f(y^k).

Step 4: Update

Define x^{k+1} = y^k + t_k d^k, k := k + 1 and repeat.
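A minimal sketch of Steps 3.1, 3.2 and 4 in Python. Everything here is a placeholder: f, h, the iterates x^k, y^k and the direction d^k would come from the restoration and optimality phases, d^k is assumed to be a descent direction for f at y^k, and the supremum in Step 3.1 is approximated by simple halving instead of being computed exactly.

import numpy as np

def phi(x, theta, f, h):
    """Merit function Phi(x, theta) = theta * f(x) + (1 - theta) * ||h(x)||."""
    return theta * f(x) + (1.0 - theta) * np.linalg.norm(h(x))

def ir_globalization(xk, yk, dk, theta_prev, f, h, r=0.1):
    rhs = 0.5 * (1.0 - r) * (np.linalg.norm(h(yk)) - np.linalg.norm(h(xk)))
    # Step 3.1: find theta in [0, theta_prev] with Phi(y, theta) - Phi(x, theta) <= rhs
    # (Phi is affine in theta, so the supremum could also be computed in closed form).
    theta = theta_prev
    while theta > 1e-12 and phi(yk, theta, f, h) - phi(xk, theta, f, h) > rhs:
        theta *= 0.5
    # Step 3.2: backtracking on t in [0, 1] along d^k.
    t = 1.0
    while t > 1e-12:
        z = yk + t * dk
        if phi(z, theta, f, h) - phi(xk, theta, f, h) <= rhs and f(z) < f(yk):
            break
        t *= 0.5
    return yk + t * dk, theta, t       # Step 4: x^{k+1} = y^k + t_k d^k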

(31)

The penalty parameter is bounded away from zero (θ_k ≥ θ̄ > 0).

The stepsize t_k is accepted after a finite number of attempts (t_k ≥ t̄ > 0).

The series Σ_{k=0}^∞ ‖h(x^k)‖ is convergent.

The sequence {d^k} converges to zero.

All limit points of {x^k} satisfy the L-AGP sequential optimality condition. In particular, the KKT conditions hold under the Constant Positive Generators (CPG) constraint qualification.

(32)

Step 2: Optimality phase:

Compute the direction d^k as the solution of

Minimize ½ dᵀH_k d + ∇f(y^k)ᵀd, Subject to ∇h(y^k)ᵀd = 0, y^k + d ∈ Ω,

where {H_k} is uniformly positive definite.
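A minimal sketch of this quadratic optimality-phase subproblem, with illustrative data for H_k, ∇f(y^k), ∇h(y^k), y^k and a box Ω = [0, 1]², solved via scipy's general-purpose SLSQP (a practical code would use a dedicated QP solver, such as the qlcpd.f routine mentioned later):

import numpy as np
from scipy.optimize import minimize

H = np.array([[2.0, 0.0], [0.0, 4.0]])        # uniformly positive definite H_k
grad_f = np.array([1.0, -1.0])                # gradient of f at y^k
grad_h = np.array([[1.0, 1.0]])               # rows: gradients of h at y^k
yk = np.array([0.3, 0.7])                     # current (restored) point, Omega = [0, 1]^2

res = minimize(lambda d: 0.5 * d @ H @ d + grad_f @ d,
               np.zeros(2), method='SLSQP',
               constraints=[{'type': 'eq', 'fun': lambda d: grad_h @ d}],  # grad h(y)^T d = 0
               bounds=[(0.0 - yk[i], 1.0 - yk[i]) for i in range(2)])      # y^k + d in Omega
print(res.x)        # the direction d^k; the trial point is y^k + t * d^k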

(33)

Step 3.1: Globalization: Penalty parameter

Compute θ_k as the supremum of the values of θ ∈ [0, θ_{k−1}] such that

Φ(y^k, λ^k, θ) − Φ(x^k, λ^{k−1}, θ) ≤ ½ (1 − s_k) (‖h(y^k)‖ − ‖h(x^k)‖),

where Φ(x, λ, θ) = θ (f(x) + λᵀh(x)) + (1 − θ) ‖h(x)‖.

Step 3.2: Globalization: Line search

Compute t_k ∈ [0, 1] as large as possible such that

Φ(y^k + t_k d^k, λ^k, θ_k) − Φ(x^k, λ^{k−1}, θ_k) ≤ ½ (1 − r) (‖h(y^k)‖ − ‖h(x^k)‖)

and L(y^k + t_k d^k, λ^k) < L(y^k, λ^{k−1}).

(34)

The original Fischer-Friedlander algorithm is recovered if:

H_k is the identity matrix,
s_k = r, and
λ^k ≡ 0.

In our implementation:

H_k is the Hessian of the Lagrangian function L(x, λ^{k−1}) = f(x) + (λ^{k−1})ᵀh(x).
s_k = 0 (larger penalty parameter).
λ^k is the Lagrange multiplier obtained from the optimization phase subproblem.

We use Fletcher's filterSD.f and qlcpd.f for the restoration and optimization phases, respectively.

(35)

The penalty parameter is bounded away from zero (θ_k ≥ θ̄ > 0).

The stepsize t_k is accepted after a finite number of attempts (t_k ≥ t̄ > 0).

The series Σ_{k=0}^∞ ‖h(x^k)‖ is convergent.

The sequence {d^k} converges to zero.

All limit points of {x^k} satisfy the L-AGP sequential optimality condition. In particular, the KKT conditions hold under the Constant Positive Generators (CPG) constraint qualification.

Under MFCQ, a suitable subsequence of {λ^k} converges to a Lagrange multiplier.

(36)

We consider the Mean-Variance problem in portfolio optimization: the investor aims to maximize return and minimize risk when investing in n assets.

min −cᵀx   and   min xᵀQx,   s.t. Σ_{i=1}^n x_i = 1, x ≥ 0.

(37)
(38)

Under certain conditions, Pareto points are solutions of

min_x α(−cᵀx) + (1 − α)(xᵀQx),   s.t. Σ_{i=1}^n x_i = 1, x ≥ 0,

for some α ≥ 0.

(39)

min_{α,x} ‖x − x_d‖²

s.t. α ≥ 0 and x is a solution of:

min_x α(−cᵀx) + (1 − α)(xᵀQx),   s.t. Σ_{i=1}^n x_i = 1, x ≥ 0.

Classical approach: Reformulate the "lower level Pareto problem" by its KKT conditions.

(40)

Restoration phase: Given (x^k, α^k), the restored point (y^k, α^k) is given by

min_x α^k(−cᵀx) + (1 − α^k)(xᵀQx),   s.t. Σ_{i=1}^n x_i = 1, x ≥ 0.

Optimization phase: We write the KKT conditions of the problem above as H(x, α, λ) = 0 and we solve the linearized subproblem

min_{dx, dα, dλ} ‖y^k + dx − x_d‖²   s.t. ∇H(y^k, α^k, λ^k)ᵀ(dx, dα, dλ) = 0,   a ≤ (dx, dα, dλ) ≤ b.

Obtain x^{k+1} = y^k + t dx, α^{k+1} = α^k + t dα, λ^{k+1} = λ^k + t dλ.

(41)

Test 1

In our simulation we used seven shares from the London exchange market plus a risk-free asset to generate problems with n ∈ {8, 100, 1000}. We generated scenarios using the historical data and computed the expected return c and the covariance matrix Q.

Problems could be solved in under 1.5 seconds and no major difference could be detected between the inexact restoration and the classical approach.

(42)

Test 2

We compared the formulations on 300 problems with two objectives and fourth-degree polynomials randomly chosen.

With the IR formulation we found a KKT point for the Pareto lower level problem in all the problems.

Number of problems in which the KKT reformulation did not find a KKT point for the Pareto lower level problem:

n = 1: 13    n = 10: 84    n = 20: 100

(43)

Performance profile for 120 two-objective problems from the Moré-Garbow-Hillstrom unconstrained optimization collection.

(44)

Minimize f(x), Subject to g(x) ≤ 0, x ∈ X,

where f* = min { f(x) : x ∈ X, g(x) ≤ 0 } is assumed to be finite.

(45)

Lagrangian: L(x, µ) = f(x) + µᵀg(x)

f* = min_{x∈X} max_{µ≥0} L(x, µ) = min_{x∈X} { f(x) if g(x) ≤ 0; +∞ otherwise }

The dual problem:

q* = max_{µ≥0} min_{x∈X} L(x, µ) = max_{µ≥0} q(µ),

where q(µ) = min_{x∈X} L(x, µ) is the dual function.

(46)
(47)

Effective domain of q: D_q = {µ | q(µ) > −∞}.

Theorem: The domain D_q is convex and q is a concave function over D_q. (The dual problem is convex.)

Theorem (weak duality): q* ≤ f*.

Proof: For all µ ≥ 0 and x ∈ X with g(x) ≤ 0: q(µ) = min_{t∈X} L(t, µ) ≤ f(x) + µᵀg(x) ≤ f(x).

Theorem: If q* = f*, given a primal-dual solution (x*, µ*), it holds that L(x*, µ*) = min_{x∈X} L(x, µ*) and (µ*)ᵀg(x*) = 0.

Proof: f(x*) = q(µ*) = min_{x∈X} L(x, µ*) ≤ f(x*) + (µ*)ᵀg(x*) ≤ f(x*).

(48)

Example: f(x) = ½(x₁² + x₂²), g(x) = x₁ − 1.

q(µ) = min_x ½(x₁² + x₂²) + µ(x₁ − 1) = −½µ² − µ (attained at x₁ = −µ, x₂ = 0)

max_{µ≥0} q(µ) ⇒ µ* = 0, (x₁ = 0, x₂ = 0)

Note that ∇q(µ) = g(−µ, 0).

(49)

L(x_µ, µ) = min_{x∈X} L(x, µ)

q(µ̄) = min_{x∈X} L(x, µ̄) ≤ f(x_µ) + µ̄ᵀg(x_µ) = q(µ) + (µ̄ − µ)ᵀg(x_µ)

q(µ) may be non-differentiable, but g(x_µ) acts as the gradient (a subgradient) of q at µ.

(50)

Subgradient method:

µ^{k+1} = [µ^k + s_k g(x_{µ^k})]₊, where [·]₊ is the orthogonal projection onto M = {µ ≥ 0 | q(µ) > −∞}.

Theorem: If s_k is sufficiently small, ‖µ^{k+1} − µ*‖ < ‖µ^k − µ*‖.
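A minimal sketch of this projected subgradient iteration, assuming the toy problem minimize ½(x₁² + x₂²) subject to 1 − x₁ ≤ 0 (a variant of the earlier example, with dual solution µ* = 1), for which x_µ = (µ, 0), q(µ) = µ − µ²/2 and g(x_µ) = 1 − µ:

import numpy as np

def x_of_mu(mu):                        # minimizer of L(., mu) over X = R^2
    return np.array([mu[0], 0.0])

def g(x):                               # constraint g(x) = 1 - x1
    return np.array([1.0 - x[0]])

mu, s = np.array([5.0]), 0.5            # starting multiplier and fixed stepsize s_k = s
for k in range(30):
    mu = np.maximum(0.0, mu + s * g(x_of_mu(mu)))   # mu^{k+1} = [mu^k + s_k g(x_{mu^k})]_+
print(mu)                               # converges to the dual solution mu* = 1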

(51)

Cutting planes method:

µ^{k+1} is the solution of

Maximize_{µ∈M} Q_k(µ) = min_{i=0,…,k} ( q(µ^i) + (µ − µ^i)ᵀg(x_{µ^i}) ).

Theorem: q(µ) ≤ Q_{k+1}(µ) ≤ Q_k(µ) and every limit point of {µ^k} is a dual solution.
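A minimal sketch of the cutting planes iteration on the same toy dual as in the previous sketch (q(µ) = µ − µ²/2, subgradient 1 − µ, µ* = 1). The master problem max_{µ∈M} Q_k(µ) is written as a small LP in (µ, t) and solved with scipy.optimize.linprog; M is taken as the box [0, 10] so that the early master problems stay bounded.

import numpy as np
from scipy.optimize import linprog

def q(mu):    return mu - 0.5 * mu**2        # dual function of the toy problem
def gsub(mu): return 1.0 - mu                # subgradient g(x_mu)

mus = [5.0]                                  # mu^0
for k in range(10):
    # cuts: t <= q(mu_i) + gsub(mu_i)*(mu - mu_i), rewritten as -g_i*mu + t <= q_i - g_i*mu_i
    A_ub = [[-gsub(m), 1.0] for m in mus]
    b_ub = [q(m) - gsub(m) * m for m in mus]
    res = linprog(c=[0.0, -1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 10.0), (None, None)])    # maximize t over (mu, t)
    mus.append(res.x[0])
    print(k, mus[-1], q(mus[-1]))
# The model values Q_k decrease and the iterates approach mu* = 1, q(mu*) = 0.5.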

(52)

Example: Separable problem - Resource allocation

Minimize Σ_{i=1}^r f_i(x_i), Subject to Σ_{i=1}^r g_i(x_i) ≤ 0, x_i ∈ X_i.

q(µ) = min_{x∈X} [ Σ_{i=1}^r f_i(x_i) + µᵀ( Σ_{i=1}^r g_i(x_i) ) ] = Σ_{i=1}^r min_{x_i∈X_i} [ f_i(x_i) + µᵀg_i(x_i) ]
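A minimal sketch of this decomposition, with made-up quadratic f_i, affine g_i and box sets X_i = [0, 2]: each evaluation of the dual function solves the r one-dimensional subproblems independently.

import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.0, 2.0, 0.5, 1.5])       # f_i(x_i) = a_i * (x_i - 1)^2
w = np.array([1.0, 1.0, 2.0, 0.5])       # g_i(x_i) = w_i * x_i - 1  (one shared resource)

def q(mu):
    """q(mu) = sum_i min_{x_i in [0,2]} [ f_i(x_i) + mu * g_i(x_i) ], one subproblem per i."""
    total = 0.0
    for i in range(len(a)):
        res = minimize_scalar(lambda t, i=i: a[i] * (t - 1.0)**2 + mu * (w[i] * t - 1.0),
                              bounds=(0.0, 2.0), method='bounded')
        total += res.fun
    return total

print(q(0.0), q(1.0))                    # each call solves r = 4 independent problems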

(53)

Example: Dual of a Linear Problem

Minimize cᵀx, Subject to Ax ≥ b.

L(x, µ) = cᵀx + µᵀ(b − Ax) = µᵀb + (c − Aᵀµ)ᵀx

q(µ) = min_{x∈Rⁿ} L(x, µ) = bᵀµ if Aᵀµ = c, and −∞ otherwise.

The dual problem: Maximize bᵀµ, Subject to Aᵀµ = c, µ ≥ 0.
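A quick numeric check of this primal-dual pair, assuming small random data built so that both problems are feasible and bounded, and using scipy.optimize.linprog (which expects inequalities in the form A_ub x ≤ b_ub):

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = A @ rng.standard_normal(3) - rng.random(5)   # some x0 satisfies A x0 >= b (primal feasible)
c = A.T @ rng.random(5)                          # c = A^T y with y >= 0 (dual feasible, primal bounded)

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(None, None))    # min c^T x,  Ax >= b
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=(0, None))        # max b^T mu, A^T mu = c, mu >= 0
print(primal.fun, -dual.fun)                     # the two optimal values coincide (strong LP duality)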

(54)

1. E. G. Birgin and J. M. Martínez, Practical Augmented Lagrangian Methods for Constrained Optimization, SIAM, Philadelphia, 2014.

2. L. F. Bueno, G. Haeser and J. M. Martínez, A Flexible Inexact-Restoration Method for Constrained Optimization, Journal of Optimization Theory and Applications, June 2014.

3. D. Bertsekas, Nonlinear Programming, 1999.
