Short course: Optimality Conditions and Algorithms in Nonlinear Optimization
Part III - Constraint Qualifications
Gabriel Haeser
Department of Applied Mathematics, Institute of Mathematics and Statistics
University of São Paulo, São Paulo, SP, Brazil
Santiago de Compostela, Spain, October 28-31, 2014
www.ime.usp.br/~ghaeser Gabriel Haeser
Outline
Part I - Introduction to nonlinear optimization
  Examples and historical notes
  First- and second-order optimality conditions
  Penalty methods
  Interior point methods
Part II - Optimality Conditions
  Algorithmic proof of the Karush-Kuhn-Tucker conditions
  Sequential optimality conditions
  Algorithmic discussion
Part III - Constraint Qualifications
  Geometric interpretation
  First- and second-order constraint qualifications
Part IV - Algorithms
  Augmented Lagrangian methods
  Inexact Restoration algorithms
  Dual methods
The general optimization problem
Minimize f(x)
Subject to h_i(x) = 0, i ∈ I = {1, ..., m}
           g_j(x) ≤ 0, j ∈ J = {m+1, ..., m+p},
where f, h_i, g_j : R^n → R are continuously differentiable.

Feasible set:
Ω = {x | h_i(x) = 0, i ∈ I, g_j(x) ≤ 0, j ∈ J}

Set of indices of active inequality constraints:
A(x) = {j ∈ J | g_j(x) = 0}.
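As a small illustration, the active set A(x) can be computed by checking which inequality constraints hold with equality up to a tolerance. The constraints below are a hypothetical example chosen here, not one from the slides:

```python
# Hypothetical feasible set in R^2: h1(x) = x1 + x2 - 1 = 0,
# g2(x) = -x1 <= 0, g3(x) = -x2 <= 0 (inequalities indexed by J = {2, 3}).
def active_set(x, tol=1e-8):
    g = {2: -x[0], 3: -x[1]}  # values of the inequality constraints at x
    return {j for j, v in g.items() if abs(v) <= tol}  # g_j(x) = 0 within tol

print(active_set((0.0, 1.0)))  # {2}: only g2 is active
print(active_set((0.5, 0.5)))  # set(): no inequality is active
```

Strictly feasible inequalities drop out of A(x), which is why only the active gradients enter the constraint qualifications below.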
Optimality conditions
x solution ⇒ optimality condition. Examples:
  x is feasible
  x is a local minimizer
Desirable characteristics:
  Strong
  Easy to verify
  Associated with the convergence of algorithms
Optimality conditions
Sequential optimality conditions:
  x is a local minimizer ⇒ there exists a sequence x^k → x such that P({x^k}).
Punctual optimality conditions:
  x is a local minimizer ⇒ KKT or not-CQ.
Weaker Constraint Qualifications (CQs) generate stronger optimality conditions.
Feasible set (figure)

The tangent cone (figure)

Geometric optimality condition (figure)
Geometric optimality condition
If x is a local solution, then
−∇f(x)ᵀd ≤ 0, ∀ d ∈ T(x),
where the tangent cone at x is
T(x) := {d ∈ R^n | ∃ x^k ∈ Ω, x^k → x, (x^k − x)/‖x^k − x‖ → d/‖d‖} ∪ {0}.
Alternatively, the geometric optimality condition can be written as
−∇f(x) ∈ T(x)°,
where T(x)° = {v ∈ R^n | vᵀd ≤ 0, ∀ d ∈ T(x)} is the polar of T(x).
Geometric optimality condition
The tangent cone is not easy to compute, nor is its polar.
But the tangent cone can be approximated by the linearized cone
F(x) := {d | ∇h_i(x)ᵀd = 0, i ∈ I, ∇g_j(x)ᵀd ≤ 0, j ∈ A(x)}.
Its polar is easily computed:
F(x)° = {v | v = Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x), μ_j ≥ 0}.
[Figures: examples where T(x) = F(x); where T(x) ≠ F(x) but T(x)° = F(x)°; and where T(x) ≠ F(x) and T(x)° ≠ F(x)°]
Karush-Kuhn-Tucker (KKT) condition
Under the condition that T(x)° = F(x)°, the geometric optimality condition becomes
−∇f(x) ∈ F(x)° = {v | v = Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x), μ_j ≥ 0},
that is,
∇f(x) + Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0, μ_j ≥ 0.
A sufficient condition: {∇h_i(x)}_{i∈I} ∪ {∇g_j(x)}_{j∈A(x)} is linearly independent.
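For a concrete problem, the KKT multipliers can be recovered numerically from the gradients of the active constraints. The instance below (minimize x1 + x2 over the unit disk) is an illustration chosen here, not one from the slides; the multiplier is found by least squares and then checked for nonnegativity:

```python
import numpy as np

# min x1 + x2  s.t.  g(x) = x1^2 + x2^2 - 1 <= 0; the solution is
# x* = -(1, 1)/sqrt(2), where g is active.
x = -np.ones(2) / np.sqrt(2.0)
grad_f = np.array([1.0, 1.0])
grad_g = 2.0 * x  # gradient of g at x*

# Solve grad_f + mu * grad_g = 0 for mu in the least-squares sense.
mu, *_ = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f, rcond=None)
residual = np.linalg.norm(grad_f + mu[0] * grad_g)

print(mu[0], residual)  # mu = 1/sqrt(2) > 0, residual = 0: x* is a KKT point
```

Here the single active gradient is nonzero, so LICQ holds and the multiplier is unique.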
Constraint qualifications
A Constraint Qualification (CQ) is a condition, at a feasible point x, on the functions that define the feasible set, such that the KKT condition holds whenever x is a local solution (for every objective function f).
The condition
T(x)° = F(x)° (Guignard, 1969)
is the weakest possible CQ (Gould and Tolle, 1971), in the sense that if x is a KKT point for every objective function f that attains a local minimum at x (subject to x ∈ Ω), then Guignard's CQ holds.
But Guignard's CQ is too weak for practical applications, in the sense that no practical algorithm is known that converges to KKT points assuming only Guignard's CQ.
Constraint qualifications
KKT is not an optimality condition without an additional assumption on the problem:
Minimize x
Subject to x² = 0
KKT: 1 + λ·0 = 0, which has no solution.
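A numerical look at this example: along x_k → 0 with x_k ≠ 0, the multiplier λ_k = −1/(2x_k) makes the KKT residual vanish while the multipliers blow up; this is a small sketch of the AKKT behaviour from Part II:

```python
# min x  s.t.  h(x) = x^2 = 0. At the minimizer x* = 0,
# grad f(0) + lam * grad h(0) = 1 + lam * 0 = 1 for every lam: KKT fails.
residuals, multipliers = [], []
for k in range(1, 6):
    xk = -1.0 / 10**k           # x_k -> 0, x_k != 0
    lam = -1.0 / (2.0 * xk)     # nearly zeroes the residual 1 + lam * 2 x_k
    residuals.append(abs(1.0 + lam * 2.0 * xk))
    multipliers.append(abs(lam))

print(residuals)    # ~0: an AKKT sequence exists at x* = 0
print(multipliers)  # 5, 50, 500, ...: the multipliers are unbounded
```

The minimizer is detected sequentially even though no finite multiplier exists at the limit point.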
Constraint qualifications
Linear Independence Constraint Qualification (LICQ) - Regularity
{∇h_i(x)}_{i=1}^m ∪ {∇g_j(x)}_{j∈A(x)} is linearly independent,
or equivalently:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
{∇h_i(y)}_{i∈I} ∪ {∇g_j(y)}_{j∈J} is linearly independent ∀ y ∈ N(x).
This avoids the existence of a sequence of linearly independent gradients that become linearly dependent in the limit.
Constraint qualifications
Constant Rank Constraint Qualification (CRCQ, Janin 1984)
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x):
if {∇h_i(x)}_{i∈I} ∪ {∇g_j(x)}_{j∈J} is linearly dependent, then
{∇h_i(y)}_{i∈I} ∪ {∇g_j(y)}_{j∈J} is linearly dependent ∀ y ∈ N(x).
Example: linear constraints.
Constraint qualifications
Given u_1, ..., u_{m+p} in R^n, we say that ((u_1, ..., u_m), (u_{m+1}, ..., u_{m+p})) is positive-linearly dependent if there exist α_1, ..., α_{m+p} such that:
  α_1, ..., α_{m+p} are not all zero;
  α_{m+1} ≥ 0, ..., α_{m+p} ≥ 0;
  Σ_{i=1}^{m+p} α_i u_i = 0.
Otherwise we say that ((u_1, ..., u_m), (u_{m+1}, ..., u_{m+p})) is positive-linearly independent.
Note that: linearly independent ⇒ positive-linearly independent.
Constraint qualifications
Mangasarian-Fromovitz Constraint Qualification (MFCQ, 1967)
({∇h_i(x)}_{i=1}^m, {∇g_j(x)}_{j∈A(x)}) is positive-linearly independent,
or equivalently:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
({∇h_i(y)}_{i∈I}, {∇g_j(y)}_{j∈J}) is positive-linearly independent ∀ y ∈ N(x).
Example: g(x) ≤ 0, g(x) ≤ 0, with ∇g(x) ≠ 0 (a duplicated constraint).
Example: there exists d with ∇g_j(x)ᵀd < 0, ∀ j ∈ A(x).
This avoids the existence of a sequence of positive-linearly independent gradients that become positive-linearly dependent in the limit.
Constraint qualifications
Constant Positive Linear Dependence (CPLD; Qi, Wei, 2000; Andreani, Martínez, Schuverdt, 2005)
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x):
if ({∇h_i(x)}_{i∈I}, {∇g_j(x)}_{j∈J}) is positive-linearly dependent, then
({∇h_i(y)}_{i∈I}, {∇g_j(y)}_{j∈J}) is (positive-)linearly dependent ∀ y ∈ N(x).
Example: h(x) ≤ 0, −h(x) ≤ 0, with ∇h(x) ≠ 0.
Constraint qualifications
Relaxed Constant Rank (RCRCQ) [Minchenko, Stakhovski, 2011]:
∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).

Constant Rank (CRCQ) [Janin, 1984]:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i∈I} ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).
Example
h_1 := x − y = 0
g_1 := −x + y² ≤ 0
g_2 := −x ≤ 0
At the point x = 0, y = 0:
∇h_1 = (1, −1), ∇g_1 = (−1, 2y), ∇g_2 = (−1, 0).
CRCQ fails, but RCRCQ holds.
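A direct numerical check of this example, using matrix ranks over a small grid of nearby points (a sketch; the neighborhood N(x) is approximated by sample points):

```python
import numpy as np

# Gradients of h1 = x - y, g1 = -x + y^2, g2 = -x as functions of (x, y).
grads = {
    "h1": lambda x, y: np.array([1.0, -1.0]),
    "g1": lambda x, y: np.array([-1.0, 2.0 * y]),
    "g2": lambda x, y: np.array([-1.0, 0.0]),
}

def rank_at(names, x, y):
    return np.linalg.matrix_rank(np.array([grads[n](x, y) for n in names]))

# CRCQ inspects every subset: {g1, g2} has rank 1 at the origin but
# rank 2 at nearby points with y != 0, so CRCQ fails.
print(rank_at(["g1", "g2"], 0.0, 0.0), rank_at(["g1", "g2"], 0.0, 0.1))  # 1 2

# RCRCQ always includes all equality gradients: every family
# {h1} + J, with J a subset of {g1, g2}, keeps a constant rank near 0.
for J in ([], ["g1"], ["g2"], ["g1", "g2"]):
    ranks = {rank_at(["h1"] + J, x, y)
             for x in (-0.1, 0.0, 0.1) for y in (-0.1, 0.0, 0.1)}
    print(J, ranks)  # a single rank value for each J: RCRCQ holds
```

The difference is exactly that RCRCQ never drops the equality gradient ∇h_1, which restores full rank in the subsets that break CRCQ.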
Constraint qualifications
Relaxed Constant Positive Linear Dependence (RCPLD):
{∇h_i(y)}_{i=1}^m has the same rank for every y ∈ N(x). Fix I ⊂ {1, ..., m} such that {∇h_i(x)}_{i∈I} is a basis for span{∇h_i(x)}_{i=1}^m.
For every J ⊂ A(x), if ({∇h_i(x)}_{i∈I}, {∇g_i(x)}_{i∈J}) is positive-linearly dependent, then ({∇h_i(y)}_{i∈I}, {∇g_i(y)}_{i∈J}) is (positive-)linearly dependent for every y ∈ N(x).
Example
−(x+1)² − y² + 1 = 0
x² + (y+1)² − 1 ≤ 0
−y ≤ 0
At the point x = 0, y = 0, the gradients are
∇h = (−2x − 2, −2y), ∇g_1 = (2x, 2y + 2), ∇g_2 = (0, −1).
At the origin, 1·(0, 2) + 2·(0, −1) = 0, so the gradients of the active inequalities are positive-linearly dependent.
For x ≠ 0, α(2x, 2y + 2) + β(0, −1) = 0 ⇒ α = β = 0, so they are linearly independent.
CPLD fails.
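Numerically, for this example (a quick sketch with numpy):

```python
import numpy as np

# Gradients of g1 = x^2 + (y+1)^2 - 1 and g2 = -y.
g1 = lambda x, y: np.array([2.0 * x, 2.0 * y + 2.0])
g2 = lambda x, y: np.array([0.0, -1.0])

# At the origin: 1*(0, 2) + 2*(0, -1) = 0 with nonnegative weights,
# so (grad g1, grad g2) is positive-linearly dependent there.
print(1.0 * g1(0.0, 0.0) + 2.0 * g2(0.0, 0.0))  # [0. 0.]

# CPLD would need the pair to stay linearly dependent at nearby points,
# but for x != 0 the two gradients span R^2: CPLD fails.
for x in (-0.1, 0.1):
    print(np.linalg.matrix_rank(np.array([g1(x, 0.0), g2(x, 0.0)])))  # 2
```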
Example
0·(−2, 0) + 1·(0, 2) + 2·(0, −1) = 0, so (∇h(x), (∇g_1(x), ∇g_2(x))) is positive-linearly dependent.
{(−2x − 2, −2y), (2x, 2y + 2), (0, −1)} are three vectors in R², hence linearly dependent for every (x, y).
RCPLD holds.
How can we get rid of verifications for every subset of inequality constraints?
RCRCQ:
∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).
RCPLD:
{∇h_i(y)}_{i=1}^m has the same rank for every y ∈ N(x). For every J ⊂ A(x), if ({∇h_i(x)}_{i∈I}, {∇g_i(x)}_{i∈J}) is positive-linearly dependent, then ({∇h_i(y)}_{i∈I}, {∇g_i(y)}_{i∈J}) is (positive-)linearly dependent for every y ∈ N(x).
Constraint qualifications
Cone generated by the full set of constraint gradients (polar of the linearized cone):
F°(x) = {y | y = Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{i∈A(x)} μ_i ∇g_i(x), μ_i ≥ 0}.
Constant Rank of the Subspace Component (CRSC):
Let J_−(x) = {i ∈ A(x) | −∇g_i(x) ∈ F°(x)}.
The rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J_−(x)} is constant ∀ y ∈ N(x).
Example
g_1(x) := x ≤ 0
g_2(x) := −x ≤ 0
g_3(x) := x² ≤ 0
At the point x = 0:
∇g_1(x) = 1, ∇g_2(x) = −1, ∇g_3(x) = 2x.
F°(x) = R ⇒ J_−(x) = {1, 2, 3}.
rank{∇g_1(x), ∇g_2(x), ∇g_3(x)} = 1 for every x (CRSC holds).
{∇g_3(0)} = {0} is positive-linearly dependent, but {∇g_3(x)} is linearly independent for x ≠ 0 (RCPLD fails).
Properties
RCRCQ ensures strong second-order optimality conditions.
Stable CQs - not sensitive to perturbations of x.
Inequalities in J_−(x) are locally equalities.
An error bound holds: for every y ∈ N(x),
d(y, Ω) ≤ α max{‖max{0, g(y)}‖_∞, ‖h(y)‖_∞}.
Practical CQs (global convergence of approximate-KKT algorithms):
Augmented Lagrangian [Andreani, Birgin, Martínez, Schuverdt, 2007, 2008]
Inexact Restoration [Martínez, Pilotta, 2000], [Fischer, Friedlander, 2010]
Sequential Quadratic Programming [Qi, Wei, 2000], [Panier, Tits, 1993]
Interior Point [Chen, Goldfarb, 2006]
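For the one-dimensional CRSC example above, Ω = {x ∈ R | x ≤ 0, −x ≤ 0, x² ≤ 0} = {0}, and the error bound can be checked directly with α = 1 near the origin (a small numerical sketch):

```python
# Omega = {0}: d(y, Omega) = |y|, and for |y| <= 1 the infinity-norm
# constraint violation max{y, -y, y^2}_+ equals |y|, so
# d(y, Omega) <= 1 * violation(y) on that neighborhood.
def violation(y):
    return max(max(0.0, y), max(0.0, -y), max(0.0, y * y))

for y in (-0.5, -0.01, 0.02, 0.4):
    assert abs(y) <= 1.0 * violation(y) + 1e-15

print("error bound verified with alpha = 1")
```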
Constraint qualifications
Guignard CQ (1969): F°(x) = T°(x)
Abadie CQ (1967): F(x) = T(x)
Constraint qualifications
Quasinormality [Hestenes, 1975]:
For all λ ∈ R^m, μ ≥ 0 such that
Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0,
there is no sequence y^k → x such that (λ_i ≠ 0 ⇒ λ_i h_i(y^k) > 0) and (μ_j > 0 ⇒ g_j(y^k) > 0).
Example: −x² ≤ 0 (holds); x² ≤ 0 (does not hold).
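The two one-variable examples can be checked against the definition directly (m = 0, a single inequality; a small sketch):

```python
# Case g(x) = x^2 <= 0 at x = 0: mu = 1 >= 0 gives mu * g'(0) = 0, and the
# sequence y_k = 1/k -> 0 has g(y_k) > 0, so quasinormality fails.
g, dg = (lambda x: x * x), (lambda x: 2.0 * x)
mu = 1.0
print(mu * dg(0.0))                               # 0.0: the gradient relation holds
print(all(g(1.0 / k) > 0 for k in range(1, 50)))  # True: the bad sequence exists

# Case g(x) = -x^2 <= 0 at x = 0: g(y) <= 0 for every y, so no sequence
# with mu > 0 and g(y_k) > 0 can exist, and quasinormality holds.
print(any(-y * y > 0 for y in (0.1, 0.01, 0.001)))  # False
```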
Constraint qualifications
Pseudonormality [Bertsekas, Ozdaglar, 2002]:
For every λ ∈ R^m, μ ≥ 0 such that
Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0,
there is no sequence y^k → x such that
Σ_{i=1}^m λ_i h_i(y^k) + Σ_{j∈A(x)} μ_j g_j(y^k) > 0.
Relation
[Diagram: implications among LICQ, MFCQ, CRCQ, RCRCQ, CPLD, RCPLD, CRSC, Pseudonormality, Quasinormality, and Abadie]
Punctual vs Sequential optimality conditions
Practical CQs:
AKKT + CQ ⇒ KKT?
Yes: LICQ, MFCQ, (R)CRCQ, (R)CPLD, CRSC.
No: Pseudonormality, Quasinormality, Abadie, Guignard.
CRSC is not the weakest practical CQ.
Weak practical CQs
V = {v_1, ..., v_K}, I, J ⊂ {1, ..., K},
span+(I, J, V) = {Σ_{i∈I} λ_i v_i + Σ_{j∈J} μ_j v_j | λ_i ∈ R, μ_j ≥ 0}.
We can always find a "basis" for this cone: I_0 ⊂ I ∪ J, J_0 ⊂ J such that
span+(I_0, J_0, V) = span+(I, J, V)
and {v_i}_{i∈I_0} ∪ {v_j}_{j∈J_0} is positive-linearly independent:
Σ_{i∈I_0} α_i v_i + Σ_{j∈J_0} β_j v_j = 0, β ≥ 0 ⇒ (α, β) = 0.
Weak practical CQs
Ω = {x ∈ R | g_1(x) := x ≤ 0, g_2(x) := −x ≤ 0}, x = 0.
V = {∇g_1(x), ∇g_2(x)}, I = ∅, J = {1, 2}.
span+(I, J, V) = {μ_1 ∇g_1(x) + μ_2 ∇g_2(x) | μ_1 ≥ 0, μ_2 ≥ 0}.
Basis: I_0 = {1}, J_0 = ∅:
span+(I_0, J_0, V) = {λ_1 ∇g_1(x) | λ_1 ∈ R}.
Weak practical CQs
x ∈ Ω, I := {1, ..., m}, J := A(x), V = {∇f_i(x)}_{i∈I∪J},
where f_i := h_i, i ∈ I, and f_j := g_j, j ∈ J.
Note that span+(I, J, {∇f_i(x)}_{i∈I∪J}) = F°(x). The Constant Positive Generators CQ (CPG) holds at x if there is a basis I_0, J_0 of span+(I, J, {∇f_i(x)}_{i∈I∪J}) such that
span+(I_0, J_0, {∇f_i(y)}_{i∈I∪J}) ⊃ span+(I, J, {∇f_i(y)}_{i∈I∪J}), for all y in some neighborhood of x.
Every nice property of CRSC is lost, but CPG is still practical.
AKKT + CPG ⇒ KKT
Suppose
∇f(x^k) + Σ_{i∈I} λ_i^k ∇f_i(x^k) + Σ_{j∈J} μ_j^k ∇f_j(x^k) → 0, μ^k ≥ 0.
By CPG, the multiplier combination can be rewritten over the basis I_0, J_0:
∇f(x^k) + Σ_{i∈I_0} λ̄_i^k ∇f_i(x^k) + Σ_{j∈J_0} μ̄_j^k ∇f_j(x^k) → 0, μ̄^k ≥ 0.
If (λ̄^k, μ̄^k) is unbounded, divide by ‖(λ̄^k, μ̄^k)‖_∞:
∇f(x^k)/‖(λ̄^k, μ̄^k)‖_∞ + Σ_{i∈I_0} (λ̄_i^k/‖(λ̄^k, μ̄^k)‖_∞) ∇f_i(x^k) + Σ_{j∈J_0} (μ̄_j^k/‖(λ̄^k, μ̄^k)‖_∞) ∇f_j(x^k) → 0,
so, in the limit,
Σ_{i∈I_0} λ_i ∇f_i(x) + Σ_{j∈J_0} μ_j ∇f_j(x) = 0, μ ≥ 0, (λ, μ) ≠ 0,
contradicting the positive-linear independence of the basis. Otherwise, passing to a convergent subsequence,
∇f(x) + Σ_{i∈I_0} λ*_i ∇f_i(x) + Σ_{j∈J_0} μ*_j ∇f_j(x) = 0, μ* ≥ 0.
Example
Ω = {x | f_1(x) := x_1³ − x_2 ≤ 0, f_2(x) := x_1³ + x_2 ≤ 0, f_3(x) := x_1 ≤ 0}
∇f_1(x) = (3x_1², −1), ∇f_2(x) = (3x_1², 1), ∇f_3(x) = (1, 0).
x = 0, I = ∅, J = {1, 2, 3}, I_0 = {1}, J_0 = {3}.
Weak practical CQs
The weakest CQ such that AKKT + CQ ⇒ KKT:
AKKT-CQ (Ramos, 2014): it holds at x if the point-to-set mapping
y ↦ F°(y) = {v | v = Σ_{i=1}^m λ_i ∇h_i(y) + Σ_{i∈A(x)} μ_i ∇g_i(y), μ_i ≥ 0}
is continuous at x.
References
1. R. Andreani, G. Haeser, M.L. Schuverdt, P.J.S. Silva - Two new weak constraint qualifications and applications. SIAM Journal on Optimization, 22(3), 1109-1135, 2012.
2. R. Andreani, G. Haeser, M.L. Schuverdt, P.J.S. Silva - A relaxed constant positive linear dependence constraint qualification and applications. Mathematical Programming, 135, 255-273, 2012.
3. E.G. Birgin, J.M. Martínez - Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, Philadelphia, 2014.