(1)

Short course: Optimality Conditions and Algorithms in Nonlinear Optimization

Part I - Introduction to nonlinear optimization

Gabriel Haeser

Department of Applied Mathematics
Institute of Mathematics and Statistics
University of São Paulo
São Paulo, SP, Brazil

Santiago de Compostela, Spain, October 28-31, 2014

(2)

Outline

Part I - Introduction to nonlinear optimization
Examples and historical notes
First and second order optimality conditions
Penalty methods
Interior point methods

Part II - Optimality Conditions
Algorithmic proof of the Karush-Kuhn-Tucker conditions
Sequential Optimality Conditions
Algorithmic discussion

Part III - Constraint Qualifications
Geometric Interpretation
First and second order constraint qualifications

Part IV - Algorithms
Augmented Lagrangian methods
Inexact Restoration algorithms
Dual methods

www.ime.usp.br/∼ghaeser

(3)
(4)

Optimization

Optimization is a mathematical problem with many “real world” applications. The goal is to find minimizers or maximizers of a multivariable real function over a restricted domain.

Examples:

to draw a map of America with areas proportional to the real areas;

hard-spheres problem: to place m points on an n-dimensional sphere in such a way that the smallest distance between two points is maximized.

(5)

Problem America

To draw a map of America, similar to the usual map, with areas proportional to real areas.

Minimize (1/2) Σ_{i=1}^{m} ‖pi − p̄i‖²,
Subject to (1/2) Σ_{i=1}^{nj} (pi^x p(i+1)^y − p(i+1)^x pi^y) = βj, j = 1, …, c

c = 17 countries
βj is the real area of country j
m = 132 given points p̄i on the frontiers of the usual map

Green-Gauss formula to compute areas

(6)

Problem America

United States (without Alaska and Hawaii) = 8,080,464 km²
Brazil = 8,514,876 km²
Usual map ratio ≈ 1.32
Real ratio ≈ 0.95

Figures: usual map vs. areas proportional to real areas

(7)

Problem America

Figures: areas proportional to GDP vs. areas proportional to population

(8)

Kissing and hard-spheres problems

The kissing number of dimension n is the largest number of unit spheres that may be put touching an n-dimensional unit sphere without overlapping.

The hard-spheres problem consists of maximizing the smallest distance d between m points on the n-dimensional sphere of radius 2.

n    Kissing number
2    6
3    12
4    24
5    40-44
6    72-78
7    126-134
8    240
9    306-364
10   500-554

d ≥ 2 ⇒ kissing number ≥ m

n = 2, m = 6, d = 2
n = 3, m = 12, d ≈ 2.194
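As a quick sanity check of the n = 2, m = 6 row above, the following sketch (an illustration added here, not part of the original slides) places 6 points evenly on the circle of radius 2 and confirms that the smallest pairwise distance is d = 2, i.e. a regular hexagon attains the optimum:

```python
import math

# Hard-spheres check for n = 2, m = 6: six points evenly spaced on the
# circle of radius 2; the optimal smallest distance is d = 2.
def min_pairwise_distance(points):
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )

m, radius = 6, 2.0
hexagon = [
    (radius * math.cos(2 * math.pi * i / m),
     radius * math.sin(2 * math.pi * i / m))
    for i in range(m)
]
d = min_pairwise_distance(hexagon)
print(round(d, 6))  # 2.0: each side of the hexagon equals the radius
```

For n = 3, m = 12 the analogous optimal configuration (the icosahedron) gives d ≈ 2.194, as stated on the slide.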

(9)

Applications: Packing

(10)

Applications: Packing

Initial configuration for molecular dynamics


(11)

Large scale problems: Finance

Jacek Gondzio and Andreas Grothey (May 2005): a convex quadratic program with 353 million constraints and 1,010 million variables.

Tool: Interior Point Method

(12)

Large scale problems: Localization

Find a point in the rectangle but not in the ellipse such that the sum of the distances to the polygons is minimized.

1,567,804 polygons.
3,135,608 variables.
1,567,804 upper-level constraints.
12,833,106 lower-level constraints.

Convergence in 10 outer iterations, 56 inner iterations, 133 function evaluations, 185 seconds.

Tool: Augmented Lagrangian method

(13)

TANGO Project - www.ime.usp.br/∼egbirgin/tango

Trustable Algorithms for Nonlinear General Optimization

(14)

TANGO Project - www.ime.usp.br/∼egbirgin/tango

40,370 visits registered by Google Analytics since 2007 (more than 3,000 downloads)

USA: 7,969; Brazil: 7,230; Germany: 2,974

(15)

TANGO Project - www.ime.usp.br/∼egbirgin/tango

Spain: 733

(16)

Historical Notes

Military programs formulated as a system of linear inequalities gave rise to the term Programming in a linear structure (title of the first paper by G. Dantzig, 1948).

Koopmans shortened the term to Linear Programming.

Dorfman (in 1949) thought that Linear Programming was too restrictive and suggested the more general term Mathematical Programming, now called Mathematical Optimization.

Nonlinear Programming is the title of the 1951 paper by Kuhn and Tucker that deals with optimality conditions.

These results extend the Lagrange rule of multipliers (1813) to the case of equality and inequality constraints. They were previously considered in the 1939 unpublished master's thesis of Karush (KKT conditions).

These works are particularly important because they suggest the development of algorithms to deal with practical problems.

(17)

Historical Notes

Linear Programming is part of a revolutionary development that gave humanity the capability to formulate an objective and to determine a detailed sequence of decisions to reach this goal in the best way possible.

Tools: models, algorithms, computers and software.

The impossibility of performing large computations is, according to Dantzig, the main reason for the lack of interest in optimization before 1947.

Important topics in computing: (a) dealing with sparsity allows for solving larger problems; (b) global optimization; (c) automatic differentiation of a function represented in a programming language.

(18)

Automatic Differentiation

f(x1, x2) = sin(x1) + x1 x2
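Forward-mode automatic differentiation can be sketched with dual numbers; the code below is a minimal illustration added to these notes (not the tool behind the slide's figure). Each value carries a pair (val, dot), where dot is the derivative with respect to a chosen input, and the chain and product rules propagate the pair through f(x1, x2) = sin(x1) + x1 x2:

```python
import math

# Minimal forward-mode AD with dual numbers: (val, dot) pairs, where
# dot is the derivative w.r.t. the input seeded with dot = 1.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):  # product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(u):  # chain rule for sin
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)

def f(x1, x2):  # f(x1, x2) = sin(x1) + x1*x2, as on the slide
    return sin(x1) + x1 * x2

# Seed dot = 1 on the variable we differentiate with respect to.
x1, x2 = 1.5, 2.0
df_dx1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).dot  # equals cos(x1) + x2
df_dx2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).dot  # equals x1
```

One forward pass per input variable yields exact derivatives (up to floating point), with no symbolic manipulation and no finite-difference truncation error.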


(19)

Duality

Game theory and linear programming:

1948 - G. Dantzig visited John von Neumann in Princeton.

J. von Neumann, 1963. Discussion of a maximum problem.

D. Gale, H. W. Kuhn, A. W. Tucker, 1951. Linear programming and the theory of games.

Elements of duality:

a pair of optimization problems, one a maximum problem with objective function f and the other a minimum problem with objective function h, based on the same data

for feasible solutions to the pair of problems, always h ≥ f

a necessary and sufficient condition for optimality is h = f

(20)

Duality

(Fermat, XVII century): Given 3 points p1, p2 and p3 on the plane, find the point x that minimizes the sum of the distances from x to p1, p2 and p3.

(21)

Duality

(Thomas Moss, The Ladies Diary, 1755): “In the three sides of an equiangular field stand three trees, at the distances of 10, 12 and 16 chains from one another: to find the content of the field, it being the greatest the data will admit.”

(22)

Duality

(J.D. Gergonne (ed.), Annales de Mathématiques Pures et Appliquées, 1810-1811): Given any triangle, circumscribe the largest possible equilateral triangle about it.

Solution given in the 1811-1812 edition by Rochat, Vecten, Fauguier and Pilatte, where duality was acknowledged.

(23)

The problem (NLP)

Minimize f(x),
Subject to hi(x) = 0, i = 1, …, m,
gj(x) ≤ 0, j = 1, …, p.

f, hi, gj : Rn → R are (twice) continuously differentiable functions.

Ω = {x ∈ Rn | h(x) = 0, g(x) ≤ 0} (feasible set)

(24)

Solution

Global solution: a feasible point x* ∈ Ω is a global minimizer of NLP when

f(x*) ≤ f(x), ∀x ∈ Ω

Local solution: a feasible point x* ∈ Ω is a local minimizer of NLP when there exists a neighbourhood B(x*, ε) of x* such that

f(x*) ≤ f(x), ∀x ∈ Ω ∩ B(x*, ε)

A(x) = {j ∈ {1, …, p} | gj(x) = 0} (set of active inequalities at x ∈ Ω)

(25)

Example

Minimize x² + y²,
Subject to x + y − 1 = 0.

(26)

First order optimality condition - Lagrange multipliers

Minimize x² + y²,
Subject to x + y − 1 = 0.

At x* = (1/2, 1/2):

(1, 1)ᵀ + (−1)(1, 1)ᵀ = 0

i.e. ∇f(x*) + λ∇h(x*) = 0 with λ = −1.

(27)

Example

Maximize x² + y²,
Subject to x + 2y − 2 ≤ 0,
x ≥ 0,
y ≥ 0.

(28)

Minimize −x² − y²,
Subject to x + 2y − 2 ≤ 0,
−x ≤ 0,
−y ≤ 0.

At x* = (2, 0): (−4, 0)ᵀ + 4(1, 2)ᵀ + 8(0, −1)ᵀ = 0

At x* = (0, 1): (0, −2)ᵀ + 1(1, 2)ᵀ + 1(−1, 0)ᵀ = 0

At x* = (0.4, 0.8): (−0.8, −1.6)ᵀ + 0.8(1, 2)ᵀ = 0

(29)

First order optimality condition - KKT condition

(Karush-Kuhn-Tucker) Under some condition (constraint qualification), if x* is a local solution, there exist Lagrange multipliers λ ∈ Rm and µ ∈ Rp such that:

∇f(x*) + Σ_{i=1}^{m} λi ∇hi(x*) + Σ_{j=1}^{p} µj ∇gj(x*) = 0, (Lagrange condition)

µj gj(x*) = 0, j = 1, …, p, (complementarity)

h(x*) = 0, g(x*) ≤ 0, (feasibility)

µ ≥ 0. (dual feasibility)

Interpretation: up to first order, a feasible direction cannot be a descent direction.
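The Lagrange (stationarity) part of the KKT condition can be verified numerically. The sketch below (added for illustration) checks it for the earlier example, Minimize −x² − y² subject to x + 2y − 2 ≤ 0, −x ≤ 0, −y ≤ 0, at the point (2, 0) with the multipliers µ = (4, 0, 8) from the slides:

```python
# Numerical check of the Lagrange condition grad f + sum mu_j grad g_j = 0
# for: Minimize -x^2 - y^2  s.t.  x + 2y - 2 <= 0, -x <= 0, -y <= 0.
def grad_f(x, y):
    return (-2 * x, -2 * y)

# Gradients of g1, g2, g3 (constant, since the constraints are linear).
grad_g = [(1, 2), (-1, 0), (0, -1)]

def kkt_residual(point, mu):
    gx, gy = grad_f(*point)
    for mj, (a, b) in zip(mu, grad_g):
        gx += mj * a
        gy += mj * b
    return (gx, gy)

print(kkt_residual((2, 0), (4, 0, 8)))  # (0, 0): stationarity holds
```

The same check succeeds at (0, 1) with µ = (1, 1, 0) and at (0.4, 0.8) with µ = (0.8, 0, 0), matching the three candidate points of the example.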

(30)

Second order optimality condition

x* = (0.4, 0.8), ∇g1(x*) = (1, 2)ᵀ, ∇²f(x*) = [−2 0; 0 −2].

There exists some d ∈ Rn with ∇g1(x*)ᵀd ≤ 0 and dᵀ∇²f(x*)d < 0.

Theorem: Under some conditions, if x* is a local minimizer, then

dᵀ (∇²f(x*) + Σ_{i=1}^{m} λi ∇²hi(x*) + Σ_{j=1}^{p} µj ∇²gj(x*)) d ≥ 0,

for every d ∈ Rn such that

∇f(x*)ᵀd ≤ 0,
∇hi(x*)ᵀd = 0, i = 1, …, m,
∇gj(x*)ᵀd ≤ 0, j ∈ A(x*).

Interpretation: All critical directions must be of ascent nature.

(31)

History of nonlinear programming

Kuhn, Tucker, 1951.

Nonlinear programming.

Albert William Tucker (1905 - 1995)

Princeton University Topology

Harold William Kuhn (1925 - 2014) Princeton University PhD 1950, Algebra

Game Theory, Optimization

Saddle point problem

φ(x*, u) ≤ φ(x*, u*) ≤ φ(x, u*), ∀x, u

(32)

History of nonlinear programming

William Karush (1917-1997)

1939. Minima of Functions of Several Variables with Inequalities as Side Conditions.

M.Sc. thesis, Department of Mathematics, University of Chicago

Calculus of Variations and Optimization

University of Chicago and California State University (also Manhattan Project)

“I concluded that you two had exploited and developed the subject so much further than I, that there was no justification for my announcing to the world, ‘Look what I did, first.’” (1975)

(33)

History of nonlinear programming

Fritz John (1910 - 1994)

1948. Extremum problems with inequalities as subsidiary conditions.

PhD 1933 in Göttingen under Courant
New York University

Partial differential equations, convex geometry, nonlinear elasticity

(34)

History of nonlinear programming

Fritz John (1910 - 1994)

Let S be a bounded set in Rm. Find the sphere of least positive radius enclosing S.

Minimize F(x) := x_{m+1},
Subject to G(x, y) := x_{m+1} − Σ_{i=1}^{m} (xi − yi)² ≥ 0 for all y ∈ S.

The boundary of a compact convex set S in Rn lies between two homothetic ellipsoids of ratio ≤ n, and the outer ellipsoid can be taken to be the ellipsoid of least volume containing S.

(35)

Snell’s law of refraction

sin θy / vy = sin θz / vz

(36)

Snell’s law of refraction

sin θy / vy = sin θz / vz

Minimize T(x) := ‖x − y‖/vy + ‖x − z‖/vz,
Subject to h(x) = 0.

At the solution x*, ∇T(x*) = (x* − y)/(vy‖y − x*‖) + (x* − z)/(vz‖z − x*‖) is parallel to ∇h(x*), the normal vector to the surface.

Define ȳ = x* + (y − x*)/(vy‖y − x*‖) and z̄ = x* + (z − x*)/(vz‖z − x*‖).

Hence −∇T(x*) = (ȳ − x*) + (z̄ − x*) is the diagonal of the following parallelogram:

(37)

Snell’s law of refraction

sin θy / vy = sin θz / vz

By triangle similarity, ȳ and z̄ are equally far from the normal line.

Hence ‖ȳ − x*‖ sin θy = ‖z̄ − x*‖ sin θz. The calculation ‖ȳ − x*‖ = 1/vy and ‖z̄ − x*‖ = 1/vz yields Snell’s law.
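The same law can be recovered numerically from Fermat's principle. The sketch below (a hypothetical setup added to these notes) sends light from y = (0, 1) at speed vy to z = (3, −1) at speed vz across a flat interface on the x-axis, minimizes the travel time T over the crossing point (u, 0), and checks Snell's equality at the minimizer:

```python
import math

# Fermat's principle: minimize travel time across a flat interface and
# verify sin(theta_y)/vy = sin(theta_z)/vz at the minimizer.
vy, vz = 1.0, 1.5          # speeds in the two media (assumed values)
d, hy, hz = 3.0, 1.0, 1.0  # horizontal offset and heights of y and z

def travel_time(u):
    return math.hypot(u, hy) / vy + math.hypot(d - u, hz) / vz

# Ternary search: travel_time is strictly convex in u on [0, d].
lo, hi = 0.0, d
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if travel_time(m1) < travel_time(m2):
        hi = m2
    else:
        lo = m1
u = (lo + hi) / 2

sin_y = u / math.hypot(u, hy)            # sine of incidence angle
sin_z = (d - u) / math.hypot(d - u, hz)  # sine of refraction angle
print(abs(sin_y / vy - sin_z / vz) < 1e-9)  # True
```

Since vz > vy here, the ray bends away from the normal in the lower medium, as expected.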

(38)

External Penalty Method

Choose a sequence {ρk} with ρk → +∞ and for each k solve the problem

Minimize f(x) + ρk P(x),

obtaining the (global) solution xk, if it exists.

P is a smooth function
P(x) ≥ 0
P(x) = 0 ⇔ h(x) = 0, g(x) ≤ 0

For example: P(x) = ‖h(x)‖₂² + ‖max{0, g(x)}‖₂²

(39)

External Penalty Method

Theorem: If {xk} is well defined, then every limit point of {xk} is a global solution of Minimize P(x).

Theorem: If {xk} is well defined and there exists a point where the function P vanishes (the feasible region is not empty), then every limit point of {xk} is a global solution of

Minimize f(x), Subject to h(x) = 0, g(x) ≤ 0.

The External Penalty Method can be used as a theoretical tool to prove the KKT conditions, but it can also be adjusted into an efficient algorithm (the Augmented Lagrangian method).

(40)

External Penalty Method

Minimize x1² + x2²,
Subject to x1 − 1 = 0,
x2 − 1 ≤ 0.

Minimize x1² + x2² + ρk((x1 − 1)² + max{0, x2 − 1}²) (= Φk(x)).

Solving ∇Φk(x) = 0 we get xk = (ρk/(1 + ρk), 0) → (1, 0).

Show simulation
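The simulation can be sketched in a few lines. The code below (a minimal illustration, using plain gradient descent as the inner solver, which is an assumption of this sketch) minimizes Φk for growing ρk and shows the iterates approaching the solution (1, 0):

```python
# External (quadratic) penalty method on the example:
#   Minimize x1^2 + x2^2  s.t.  x1 - 1 = 0,  x2 - 1 <= 0.
# Gradient of Phi_k(x) = x1^2 + x2^2 + rho*((x1-1)^2 + max(0, x2-1)^2):
def grad_phi(x1, x2, rho):
    g1 = 2 * x1 + 2 * rho * (x1 - 1)
    g2 = 2 * x2 + 2 * rho * max(0.0, x2 - 1)
    return g1, g2

x = (2.0, 2.0)  # arbitrary starting point
for rho in (1.0, 10.0, 100.0, 1000.0):
    step = 1.0 / (2 + 2 * rho)  # safe stepsize for this quadratic
    for _ in range(5000):
        g1, g2 = grad_phi(x[0], x[1], rho)
        x = (x[0] - step * g1, x[1] - step * g2)
    print(rho, round(x[0], 4), round(x[1], 4))
# x1 tends to rho/(1 + rho) -> 1 and x2 -> 0, matching the closed form.
```

Note that each subproblem minimizer is infeasible (x1 < 1); feasibility is only attained in the limit ρk → +∞, which is characteristic of external penalty methods.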


(41)

Internal Penalty Method

Choose a sequence {µk} with µk → 0+ and for each k solve the problem

Minimize f(x) + µk B(x),
Subject to h(x) = 0,
g(x) < 0.

B is smooth
B(x) ≥ 0
B(x) → +∞ if some gi(x) → 0 with g(x) < 0

For example: B(x) = −Σi log(−gi(x))
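A one-dimensional example makes the barrier mechanism concrete. The sketch below (an illustration added to these notes) takes Minimize x subject to −x ≤ 0, whose solution x* = 0 lies on the boundary; the barrier subproblem min x − µ log(x) over x > 0 has stationary point x(µ) = µ, so the barrier minimizers approach x* from the interior as µ → 0+:

```python
# Internal (log-barrier) penalty on:  Minimize x  s.t.  -x <= 0.
# Barrier subproblem: min_{x > 0}  phi(x) = x - mu*log(x),
# with phi'(x) = 1 - mu/x and phi''(x) = mu/x^2; its minimizer is x = mu.
def barrier_minimizer(mu, iters=100):
    x = 1.0  # strictly feasible (interior) starting point
    for _ in range(iters):
        # Newton step on phi'(x) = 0: x_new = x - phi'(x)/phi''(x)
        x_new = x - (1 - mu / x) * (x * x / mu)
        x = x_new if x_new > 0 else x / 2  # safeguard: stay interior
    return x

for mu in (1.0, 0.1, 0.01, 0.001):
    print(mu, barrier_minimizer(mu))  # x(mu) = mu, tending to x* = 0
```

Unlike the external penalty, every iterate is strictly feasible; the boundary solution is reached only in the limit µ → 0+.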

(42)

Interior Point Method

Consider the convex quadratic problem

Minimize cᵀx + (1/2)xᵀQx,
Subject to Ax = b,
x ≥ 0,

and the barrier subproblem

Minimize cᵀx + (1/2)xᵀQx − µ Σ_{j=1}^{n} log xj,
Subject to Ax = b,
x > 0.

KKT condition:

c − Aᵀλ + Qx − µX⁻¹e = 0, Ax = b,

where X⁻¹ = diag{x1⁻¹, …, xn⁻¹} and e = (1, …, 1)ᵀ. Denoting s = µX⁻¹e we get

Aᵀλ + s − Qx = c, Ax = b, XSe = µe, (x, s) > 0.

(43)

Interior Point Method

Active-set methods:
Aᵀλ + s − Qx = c,
Ax = b,
XSe = 0,
(x, s) ≥ 0.

Interior point methods:
Aᵀλ + s − Qx = c,
Ax = b,
XSe = µe,
(x, s) > 0.

(44)

Interior Point Method

Complementarity: xi si = 0, ∀i = 1, …, n.

Active-set methods try to guess the optimal active subset A ⊆ {1, …, n} and set xi = 0 for i ∈ A (active constraints), si = 0 for i ∉ A (inactive constraints).

Interior point methods use ε-mathematics:

Replace xi si = 0, ∀i = 1, …, n, by xi si = µ, ∀i = 1, …, n.

Force convergence by letting µ → 0+.

(45)

Interior Point Method

Solve the nonlinear system of equations f(x, λ, s) = 0, where f : R^(2n+m) → R^(2n+m) is the mapping:

f(x, λ, s) = (Aᵀλ + s − Qx − c, Ax − b, XSe − µe).

(46)

Interior Point Method

Newton direction:

[ −Q  Aᵀ  I ] [ Δx ]   [ c − Aᵀλ − s + Qx ]
[  A   0  0 ] [ Δλ ] = [ b − Ax           ]
[  S   0  X ] [ Δs ]   [ µe − XSe         ]

Reduce µ at each Newton iteration.

(47)

Interior Point Method

Algorithm:

Step 0: Choose (x0, λ0, s0) with (x0, s0) > 0, µ0 > 0 and parameters 0 < γ < 1 and ε > 0. Set k = 0.

Step 1: Compute the Newton direction (Δx, Δλ, Δs) at (x, λ, s) := (xk, λk, sk).

Step 2: Choose a stepsize α such that (xk + αΔx, sk + αΔs) > 0.

Step 3: Update µk+1 = γµk.

Step 4: If xkᵀsk ≤ ε x0ᵀs0, stop. Else set k := k + 1 and go to Step 1.
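The steps above can be sketched end to end. The code below is a minimal illustration, not the implementation behind the lecture's experiments: it uses dense Gaussian elimination in place of a sparse solver, a fixed fraction-to-the-boundary factor of 0.9 in Step 2, a fixed iteration count in place of the stopping test, and a tiny assumed test problem (min x1² + x2² s.t. x1 + x2 = 2, x ≥ 0, whose solution is x = (1, 1)):

```python
# Primal-dual interior point sketch for the convex QP of the slides:
#   Minimize c^T x + (1/2) x^T Q x,  subject to  Ax = b, x >= 0.

def solve(M, rhs):
    """Solve M d = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][c] * d[c] for c in range(r + 1, n))) / M[r][r]
    return d

def ipm(Q, A, b, c, mu=1.0, gamma=0.5, iters=50):
    n, m = len(c), len(b)
    x, lam, s = [1.0] * n, [0.0] * m, [1.0] * n    # Step 0
    for _ in range(iters):
        # Step 1: assemble the Newton system [-Q A^T I; A 0 0; S 0 X].
        N = 2 * n + m
        M = [[0.0] * N for _ in range(N)]
        rhs = [0.0] * N
        for i in range(n):
            for j in range(n):
                M[i][j] = -Q[i][j]
            for k in range(m):
                M[i][n + k] = A[k][i]              # A^T block
            M[i][n + m + i] = 1.0                  # I block
            rhs[i] = (c[i] - sum(A[k][i] * lam[k] for k in range(m))
                      - s[i] + sum(Q[i][j] * x[j] for j in range(n)))
        for k in range(m):
            for j in range(n):
                M[n + k][j] = A[k][j]
            rhs[n + k] = b[k] - sum(A[k][j] * x[j] for j in range(n))
        for i in range(n):
            M[n + m + i][i] = s[i]                 # S block
            M[n + m + i][n + m + i] = x[i]         # X block
            rhs[n + m + i] = mu - x[i] * s[i]
        d = solve(M, rhs)
        dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]
        # Step 2: damped step keeping (x, s) strictly positive.
        alpha = 1.0
        for i in range(n):
            if dx[i] < 0:
                alpha = min(alpha, -0.9 * x[i] / dx[i])
            if ds[i] < 0:
                alpha = min(alpha, -0.9 * s[i] / ds[i])
        x = [x[i] + alpha * dx[i] for i in range(n)]
        lam = [lam[k] + alpha * dlam[k] for k in range(m)]
        s = [s[i] + alpha * ds[i] for i in range(n)]
        mu *= gamma                                # Step 3
    return x

# Assumed test problem: min x1^2 + x2^2 s.t. x1 + x2 = 2, x >= 0,
# i.e. Q = 2I, c = 0; the solution is x = (1, 1).
xs = ipm([[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]], [2.0], [0.0, 0.0])
print([round(v, 6) for v in xs])
```

All iterates keep (x, s) strictly positive, while XSe tracks the shrinking µe; this is the ε-mathematics of the previous slide in action.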

(48)

Interior Point Method

Consider the merit function

ψ(x, s) = (n + √n) log(xᵀs) − Σ_{i=1}^{n} log(xi si).

(Note that ψ(x, s) → −∞ ⇒ xᵀs → 0.)

Choosing the stepsize α that minimizes ψ(xk + αΔx, sk + αΔs) (exact line search) we get:

Theorem: If γ = n/(n + √n), we have xkᵀsk ≤ ε x0ᵀs0 in O(√n log(n/ε)) iterations.

(49)

Algorithms

There is no “direct method” to solve NLP.

NLP is solved using iterative methods.

An iterative method generates a sequence of points xk ∈ Rn that converges (or not) to a solution of the problem.

Iterative methods are programmed and implemented on computers, where real mathematical operations are replaced by floating point operations.

(50)

Algorithms

Theory is necessary to avoid performing an infinite number of experiments.

Useful theory should be able to predict the behavior of many experiments.

Usually, the theory does not refer to the real sequences generated by the computer, but to theoretical sequences defined by the algorithms.

The analogy between real sequences and theoretical sequences is not perfect.

There are practical phenomena that the theory is not able to predict, but relevant theory is the kind that contributes to explaining practical phenomena.
