Short course: Optimality Conditions and Algorithms in Nonlinear Optimization
Part III - Constraint Qualifications
Gabriel Haeser
Department of Applied Mathematics, Institute of Mathematics and Statistics
University of São Paulo, São Paulo, SP, Brazil
Santiago de Compostela, Spain, October 28-31, 2014
www.ime.usp.br/~ghaeser Gabriel Haeser
Outline
Part I - Introduction to nonlinear optimization
  Examples and historical notes
  First- and second-order optimality conditions
  Penalty methods
  Interior point methods
Part II - Optimality Conditions
  Algorithmic proof of the Karush-Kuhn-Tucker conditions
  Sequential optimality conditions
  Algorithmic discussion
Part III - Constraint Qualifications
  Geometric interpretation
  First- and second-order constraint qualifications
Part IV - Algorithms
  Augmented Lagrangian methods
  Inexact Restoration algorithms
  Dual methods
The general optimization problem
Minimize f(x)
Subject to h_i(x) = 0, i ∈ I = {1, ..., m}
           g_j(x) ≤ 0, j ∈ J = {m+1, ..., m+p},
where f, h_i, g_j : R^n → R are continuously differentiable.

Feasible set:
Ω = {x | h_i(x) = 0, i ∈ I, g_j(x) ≤ 0, j ∈ J}

Set of indices of active inequality constraints:
A(x) = {j ∈ J | g_j(x) = 0}.
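As a small illustration, the active set A(x) can be computed by checking which inequality constraints hold with equality up to a tolerance. The constraints below are a hypothetical example chosen here, not one from the slides:

```python
# Hypothetical feasible set in R^2: h1(x) = x1 + x2 - 1 = 0,
# g2(x) = -x1 <= 0, g3(x) = -x2 <= 0 (inequalities indexed by J = {2, 3}).
def active_set(x, tol=1e-8):
    g = {2: -x[0], 3: -x[1]}  # values of the inequality constraints at x
    return {j for j, v in g.items() if abs(v) <= tol}  # g_j(x) = 0 within tol

print(active_set((0.0, 1.0)))  # {2}: only g2 is active
print(active_set((0.5, 0.5)))  # set(): no inequality is active
```

Strictly feasible inequalities drop out of A(x), which is why only the active gradients enter the constraint qualifications below.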
Optimality conditions
x solution ⇒ optimality condition. Examples:
  x is feasible
  x is a local minimizer
Desirable characteristics:
  Strong
  Easy to verify
  Associated with the convergence of algorithms
Optimality conditions
Sequential optimality conditions:
  x is a local minimizer ⇒ there exists a sequence x^k → x such that P({x^k}).
Punctual optimality conditions:
  x is a local minimizer ⇒ KKT or not-CQ.
Weaker Constraint Qualifications (CQs) generate stronger optimality conditions.
Feasible set (figure)

The tangent cone (figure)

Geometric optimality condition (figure)
Geometric optimality condition
If x is a local solution, then
−∇f(x)ᵀd ≤ 0, ∀ d ∈ T(x),
where the tangent cone at x is
T(x) := {d ∈ R^n | ∃ x^k ∈ Ω, x^k → x, (x^k − x)/‖x^k − x‖ → d/‖d‖} ∪ {0}.
Alternatively, the geometric optimality condition can be written as
−∇f(x) ∈ T(x)°,
where T(x)° = {v ∈ R^n | vᵀd ≤ 0, ∀ d ∈ T(x)} is the polar of T(x).
Geometric optimality condition
The tangent cone is not easy to compute, nor is its polar.
But the tangent cone can be approximated by the linearized cone
F(x) := {d | ∇h_i(x)ᵀd = 0, i ∈ I, ∇g_j(x)ᵀd ≤ 0, j ∈ A(x)}.
Its polar is easily computed:
F(x)° = {v | v = Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x), μ_j ≥ 0}.
[Figures: examples where T(x) = F(x); where T(x) ≠ F(x) but T(x)° = F(x)°; and where T(x) ≠ F(x) and T(x)° ≠ F(x)°]
Karush-Kuhn-Tucker (KKT) condition
Under the condition that T(x)° = F(x)°, the geometric optimality condition becomes
−∇f(x) ∈ F(x)° = {v | v = Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x), μ_j ≥ 0},
that is,
∇f(x) + Σ_{i∈I} λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0, μ_j ≥ 0.
A sufficient condition: {∇h_i(x)}_{i∈I} ∪ {∇g_j(x)}_{j∈A(x)} is linearly independent.
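For a concrete problem, the KKT multipliers can be recovered numerically from the gradients of the active constraints. The instance below (minimize x1 + x2 over the unit disk) is an illustration chosen here, not one from the slides; the multiplier is found by least squares and then checked for nonnegativity:

```python
import numpy as np

# min x1 + x2  s.t.  g(x) = x1^2 + x2^2 - 1 <= 0; the solution is
# x* = -(1, 1)/sqrt(2), where g is active.
x = -np.ones(2) / np.sqrt(2.0)
grad_f = np.array([1.0, 1.0])
grad_g = 2.0 * x  # gradient of g at x*

# Solve grad_f + mu * grad_g = 0 for mu in the least-squares sense.
mu, *_ = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f, rcond=None)
residual = np.linalg.norm(grad_f + mu[0] * grad_g)

print(mu[0], residual)  # mu = 1/sqrt(2) > 0, residual = 0: x* is a KKT point
```

Here the single active gradient is nonzero, so LICQ holds and the multiplier is unique.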
Constraint qualifications
A Constraint Qualification (CQ) is a condition, at a feasible point x, on the functions that define the feasible set, such that the KKT condition holds whenever x is a local solution (for every objective function f).
The condition
T(x)° = F(x)° (Guignard, 1969)
is the weakest possible CQ (Gould and Tolle, 1971), in the sense that if x is a KKT point for every objective function f that attains a local minimum at x (subject to x ∈ Ω), then Guignard's CQ holds.
But Guignard's CQ is too weak for practical applications, in the sense that no practical algorithm is known that converges to KKT points assuming only Guignard's CQ.
Constraint qualifications
KKT is not an optimality condition without an additional assumption on the problem:
Minimize x
Subject to x² = 0
KKT: 1 + λ·0 = 0, which has no solution.
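A numerical look at this example: along x_k → 0 with x_k ≠ 0, the multiplier λ_k = −1/(2x_k) makes the KKT residual vanish while the multipliers blow up; this is a small sketch of the AKKT behaviour from Part II:

```python
# min x  s.t.  h(x) = x^2 = 0. At the minimizer x* = 0,
# grad f(0) + lam * grad h(0) = 1 + lam * 0 = 1 for every lam: KKT fails.
residuals, multipliers = [], []
for k in range(1, 6):
    xk = -1.0 / 10**k           # x_k -> 0, x_k != 0
    lam = -1.0 / (2.0 * xk)     # nearly zeroes the residual 1 + lam * 2 x_k
    residuals.append(abs(1.0 + lam * 2.0 * xk))
    multipliers.append(abs(lam))

print(residuals)    # ~0: an AKKT sequence exists at x* = 0
print(multipliers)  # 5, 50, 500, ...: the multipliers are unbounded
```

The minimizer is detected sequentially even though no finite multiplier exists at the limit point.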
Constraint qualifications
Linear Independence Constraint Qualification (LICQ) - Regularity
{∇h_i(x)}_{i=1}^m ∪ {∇g_j(x)}_{j∈A(x)} is linearly independent,
or equivalently:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
{∇h_i(y)}_{i∈I} ∪ {∇g_j(y)}_{j∈J} is linearly independent ∀ y ∈ N(x).
This avoids the existence of a sequence of linearly independent gradients that become linearly dependent in the limit.
Constraint qualifications
Constant Rank Constraint Qualification (CRCQ, Janin 1984)
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x):
if {∇h_i(x)}_{i∈I} ∪ {∇g_j(x)}_{j∈J} is linearly dependent, then
{∇h_i(y)}_{i∈I} ∪ {∇g_j(y)}_{j∈J} is linearly dependent ∀ y ∈ N(x).
Example: linear constraints.
Constraint qualifications
Given u_1, ..., u_{m+p} in R^n, we say that ((u_1, ..., u_m), (u_{m+1}, ..., u_{m+p})) is positive-linearly dependent if there exist α_1, ..., α_{m+p} such that:
  α_1, ..., α_{m+p} are not all zero;
  α_{m+1} ≥ 0, ..., α_{m+p} ≥ 0;
  Σ_{i=1}^{m+p} α_i u_i = 0.
Otherwise we say that ((u_1, ..., u_m), (u_{m+1}, ..., u_{m+p})) is positive-linearly independent.
Note that: linearly independent ⇒ positive-linearly independent.
Constraint qualifications
Mangasarian-Fromovitz Constraint Qualification (MFCQ, 1967)
({∇h_i(x)}_{i=1}^m, {∇g_j(x)}_{j∈A(x)}) is positive-linearly independent,
or equivalently:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
({∇h_i(y)}_{i∈I}, {∇g_j(y)}_{j∈J}) is positive-linearly independent ∀ y ∈ N(x).
Example: g(x) ≤ 0, g(x) ≤ 0, with ∇g(x) ≠ 0 (a duplicated constraint).
Example: there exists d with ∇g_j(x)ᵀd < 0, ∀ j ∈ A(x).
This avoids the existence of a sequence of positive-linearly independent gradients that become positive-linearly dependent in the limit.
Constraint qualifications
Constant Positive Linear Dependence (CPLD; Qi, Wei, 2000; Andreani, Martínez, Schuverdt, 2005)
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x):
if ({∇h_i(x)}_{i∈I}, {∇g_j(x)}_{j∈J}) is positive-linearly dependent, then
({∇h_i(y)}_{i∈I}, {∇g_j(y)}_{j∈J}) is (positive-)linearly dependent ∀ y ∈ N(x).
Example: h(x) ≤ 0, −h(x) ≤ 0, with ∇h(x) ≠ 0.
Constraint qualifications
Relaxed Constant Rank (RCRCQ) [Minchenko, Stakhovski, 2011]:
∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).

Constant Rank (CRCQ) [Janin, 1984]:
∀ I ⊂ {1, ..., m}, ∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i∈I} ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).
Example
h_1 := x − y = 0
g_1 := −x + y² ≤ 0
g_2 := −x ≤ 0
At the point x = 0, y = 0:
∇h_1 = (1, −1), ∇g_1 = (−1, 2y), ∇g_2 = (−1, 0).
CRCQ fails, but RCRCQ holds.
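A direct numerical check of this example, using matrix ranks over a small grid of nearby points (a sketch; the neighborhood N(x) is approximated by sample points):

```python
import numpy as np

# Gradients of h1 = x - y, g1 = -x + y^2, g2 = -x as functions of (x, y).
grads = {
    "h1": lambda x, y: np.array([1.0, -1.0]),
    "g1": lambda x, y: np.array([-1.0, 2.0 * y]),
    "g2": lambda x, y: np.array([-1.0, 0.0]),
}

def rank_at(names, x, y):
    return np.linalg.matrix_rank(np.array([grads[n](x, y) for n in names]))

# CRCQ inspects every subset: {g1, g2} has rank 1 at the origin but
# rank 2 at nearby points with y != 0, so CRCQ fails.
print(rank_at(["g1", "g2"], 0.0, 0.0), rank_at(["g1", "g2"], 0.0, 0.1))  # 1 2

# RCRCQ always includes all equality gradients: every family
# {h1} + J, with J a subset of {g1, g2}, keeps a constant rank near 0.
for J in ([], ["g1"], ["g2"], ["g1", "g2"]):
    ranks = {rank_at(["h1"] + J, x, y)
             for x in (-0.1, 0.0, 0.1) for y in (-0.1, 0.0, 0.1)}
    print(J, ranks)  # a single rank value for each J: RCRCQ holds
```

The difference is exactly that RCRCQ never drops the equality gradient ∇h_1, which restores full rank in the subsets that break CRCQ.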
Constraint qualifications
Relaxed Constant Positive Linear Dependence (RCPLD):
{∇h_i(y)}_{i=1}^m has the same rank for every y ∈ N(x). Fix I ⊂ {1, ..., m} such that {∇h_i(x)}_{i∈I} is a basis for span{∇h_i(x)}_{i=1}^m.
For every J ⊂ A(x), if ({∇h_i(x)}_{i∈I}, {∇g_i(x)}_{i∈J}) is positive-linearly dependent, then ({∇h_i(y)}_{i∈I}, {∇g_i(y)}_{i∈J}) is (positive-)linearly dependent for every y ∈ N(x).
Example
−(x+1)² − y² + 1 = 0
x² + (y+1)² − 1 ≤ 0
−y ≤ 0
At the point x = 0, y = 0, the gradients are
∇h = (−2x − 2, −2y), ∇g_1 = (2x, 2y + 2), ∇g_2 = (0, −1).
At the origin, 1·(0, 2) + 2·(0, −1) = 0, so the gradients of the active inequalities are positive-linearly dependent.
For x ≠ 0, α(2x, 2y + 2) + β(0, −1) = 0 ⇒ α = β = 0, so they are linearly independent.
CPLD fails.
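Numerically, for this example (a quick sketch with numpy):

```python
import numpy as np

# Gradients of g1 = x^2 + (y+1)^2 - 1 and g2 = -y.
g1 = lambda x, y: np.array([2.0 * x, 2.0 * y + 2.0])
g2 = lambda x, y: np.array([0.0, -1.0])

# At the origin: 1*(0, 2) + 2*(0, -1) = 0 with nonnegative weights,
# so (grad g1, grad g2) is positive-linearly dependent there.
print(1.0 * g1(0.0, 0.0) + 2.0 * g2(0.0, 0.0))  # [0. 0.]

# CPLD would need the pair to stay linearly dependent at nearby points,
# but for x != 0 the two gradients span R^2: CPLD fails.
for x in (-0.1, 0.1):
    print(np.linalg.matrix_rank(np.array([g1(x, 0.0), g2(x, 0.0)])))  # 2
```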
Example
0·(−2, 0) + 1·(0, 2) + 2·(0, −1) = 0, so (∇h(x), (∇g_1(x), ∇g_2(x))) is positive-linearly dependent.
{(−2x − 2, −2y), (2x, 2y + 2), (0, −1)} are three vectors in R², hence linearly dependent for every (x, y).
RCPLD holds.
How can we get rid of verifications for every subset of inequality constraints?
RCRCQ:
∀ J ⊂ A(x),
the rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J} is constant ∀ y ∈ N(x).
RCPLD:
{∇h_i(y)}_{i=1}^m has the same rank for every y ∈ N(x). For every J ⊂ A(x), if ({∇h_i(x)}_{i∈I}, {∇g_i(x)}_{i∈J}) is positive-linearly dependent, then ({∇h_i(y)}_{i∈I}, {∇g_i(y)}_{i∈J}) is (positive-)linearly dependent for every y ∈ N(x).
Constraint qualifications
Cone generated by the full set of constraint gradients (polar of the linearized cone):
F°(x) = {y | y = Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{i∈A(x)} μ_i ∇g_i(x), μ_i ≥ 0}.
Constant Rank of the Subspace Component (CRSC):
Let J_−(x) = {i ∈ A(x) | −∇g_i(x) ∈ F°(x)}.
The rank of {∇h_i(y)}_{i=1}^m ∪ {∇g_i(y)}_{i∈J_−(x)} is constant ∀ y ∈ N(x).
Example
g_1(x) := x ≤ 0
g_2(x) := −x ≤ 0
g_3(x) := x² ≤ 0
At the point x = 0:
∇g_1(x) = 1, ∇g_2(x) = −1, ∇g_3(x) = 2x.
F°(x) = R ⇒ J_−(x) = {1, 2, 3}.
rank{∇g_1(x), ∇g_2(x), ∇g_3(x)} = 1 for every x (CRSC holds).
{∇g_3(0)} = {0} is positive-linearly dependent, but {∇g_3(x)} is linearly independent for x ≠ 0 (RCPLD fails).
Properties
RCRCQ ensures strong second-order optimality conditions.
Stable CQs - not sensitive to perturbations of x.
Inequalities in J_−(x) are locally equalities.
An error bound holds: for every y ∈ N(x),
d(y, Ω) ≤ α max{‖max{0, g(y)}‖_∞, ‖h(y)‖_∞}.
Practical CQs (global convergence of approximate-KKT algorithms):
Augmented Lagrangian [Andreani, Birgin, Martínez, Schuverdt, 2007, 2008]
Inexact Restoration [Martínez, Pilotta, 2000], [Fischer, Friedlander, 2010]
Sequential Quadratic Programming [Qi, Wei, 2000], [Panier, Tits, 1993]
Interior Point [Chen, Goldfarb, 2006]
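For the one-dimensional CRSC example above, Ω = {x ∈ R | x ≤ 0, −x ≤ 0, x² ≤ 0} = {0}, and the error bound can be checked directly with α = 1 near the origin (a small numerical sketch):

```python
# Omega = {0}: d(y, Omega) = |y|, and for |y| <= 1 the infinity-norm
# constraint violation max{y, -y, y^2}_+ equals |y|, so
# d(y, Omega) <= 1 * violation(y) on that neighborhood.
def violation(y):
    return max(max(0.0, y), max(0.0, -y), max(0.0, y * y))

for y in (-0.5, -0.01, 0.02, 0.4):
    assert abs(y) <= 1.0 * violation(y) + 1e-15

print("error bound verified with alpha = 1")
```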
Constraint qualifications
Guignard CQ (1969): F°(x) = T°(x)
Abadie CQ (1967): F(x) = T(x)
Constraint qualifications
Quasinormality [Hestenes, 1975]:
For all λ ∈ R^m, μ ≥ 0 such that
Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0,
there is no sequence y^k → x such that (λ_i ≠ 0 ⇒ λ_i h_i(y^k) > 0) and (μ_j > 0 ⇒ g_j(y^k) > 0).
Example: −x² ≤ 0 (holds); x² ≤ 0 (does not hold).
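The two one-variable examples can be checked against the definition directly (m = 0, a single inequality; a small sketch):

```python
# Case g(x) = x^2 <= 0 at x = 0: mu = 1 >= 0 gives mu * g'(0) = 0, and the
# sequence y_k = 1/k -> 0 has g(y_k) > 0, so quasinormality fails.
g, dg = (lambda x: x * x), (lambda x: 2.0 * x)
mu = 1.0
print(mu * dg(0.0))                               # 0.0: the gradient relation holds
print(all(g(1.0 / k) > 0 for k in range(1, 50)))  # True: the bad sequence exists

# Case g(x) = -x^2 <= 0 at x = 0: g(y) <= 0 for every y, so no sequence
# with mu > 0 and g(y_k) > 0 can exist, and quasinormality holds.
print(any(-y * y > 0 for y in (0.1, 0.01, 0.001)))  # False
```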
Constraint qualifications
Pseudonormality [Bertsekas, Ozdaglar, 2002]:
For every λ ∈ R^m, μ ≥ 0 such that
Σ_{i=1}^m λ_i ∇h_i(x) + Σ_{j∈A(x)} μ_j ∇g_j(x) = 0,
there is no sequence y^k → x such that
Σ_{i=1}^m λ_i h_i(y^k) + Σ_{j∈A(x)} μ_j g_j(y^k) > 0.
Relation
[Diagram: implications among LICQ, MFCQ, CRCQ, RCRCQ, CPLD, RCPLD, CRSC, Pseudonormality, Quasinormality, and Abadie]
Punctual vs Sequential optimality conditions
Practical CQs:
AKKT + CQ ⇒ KKT?
Yes: LICQ, MFCQ, (R)CRCQ, (R)CPLD, CRSC.
No: Pseudonormality, Quasinormality, Abadie, Guignard.
CRSC is not the weakest practical CQ.
Weak practical CQs
V = {v_1, ..., v_K}, I, J ⊂ {1, ..., K},
span+(I, J, V) = {Σ_{i∈I} λ_i v_i + Σ_{j∈J} μ_j v_j | λ_i ∈ R, μ_j ≥ 0}.
We can always find a "basis" for this cone: I_0 ⊂ I ∪ J, J_0 ⊂ J such that
span+(I_0, J_0, V) = span+(I, J, V)
and {v_i}_{i∈I_0} ∪ {v_j}_{j∈J_0} is positive-linearly independent:
Σ_{i∈I_0} α_i v_i + Σ_{j∈J_0} β_j v_j = 0, β ≥ 0 ⇒ (α, β) = 0.
Weak practical CQs
Ω = {x ∈ R | g_1(x) := x ≤ 0, g_2(x) := −x ≤ 0}, x = 0.
V = {∇g_1(x), ∇g_2(x)}, I = ∅, J = {1, 2}.
span+(I, J, V) = {μ_1 ∇g_1(x) + μ_2 ∇g_2(x) | μ_1 ≥ 0, μ_2 ≥ 0}.
Basis: I_0 = {1}, J_0 = ∅:
span+(I_0, J_0, V) = {λ_1 ∇g_1(x) | λ_1 ∈ R}.
Weak practical CQs
x ∈ Ω, I := {1, ..., m}, J := A(x), V = {∇f_i(x)}_{i∈I∪J},
where f_i := h_i, i ∈ I, and f_j := g_j, j ∈ J.
Note that span+(I, J, {∇f_i(x)}_{i∈I∪J}) = F°(x). The Constant Positive Generators CQ (CPG) holds at x if there is a basis I_0, J_0 of span+(I, J, {∇f_i(x)}_{i∈I∪J}) such that
span+(I_0, J_0, {∇f_i(y)}_{i∈I∪J}) ⊃ span+(I, J, {∇f_i(y)}_{i∈I∪J}), for all y in some neighborhood of x.
Every nice property of CRSC is lost, but CPG is still practical.
AKKT + CPG ⇒ KKT
Suppose
∇f(x^k) + Σ_{i∈I} λ_i^k ∇f_i(x^k) + Σ_{j∈J} μ_j^k ∇f_j(x^k) → 0, μ^k ≥ 0.
By CPG, the multiplier combination can be rewritten over the basis I_0, J_0:
∇f(x^k) + Σ_{i∈I_0} λ̄_i^k ∇f_i(x^k) + Σ_{j∈J_0} μ̄_j^k ∇f_j(x^k) → 0, μ̄^k ≥ 0.
If (λ̄^k, μ̄^k) is unbounded, divide by ‖(λ̄^k, μ̄^k)‖_∞:
∇f(x^k)/‖(λ̄^k, μ̄^k)‖_∞ + Σ_{i∈I_0} (λ̄_i^k/‖(λ̄^k, μ̄^k)‖_∞) ∇f_i(x^k) + Σ_{j∈J_0} (μ̄_j^k/‖(λ̄^k, μ̄^k)‖_∞) ∇f_j(x^k) → 0,
so, in the limit,
Σ_{i∈I_0} λ_i ∇f_i(x) + Σ_{j∈J_0} μ_j ∇f_j(x) = 0, μ ≥ 0, (λ, μ) ≠ 0,
contradicting the positive-linear independence of the basis. Otherwise, passing to a convergent subsequence,
∇f(x) + Σ_{i∈I_0} λ*_i ∇f_i(x) + Σ_{j∈J_0} μ*_j ∇f_j(x) = 0, μ* ≥ 0.
Example
Ω = {x | f_1(x) := x_1³ − x_2 ≤ 0, f_2(x) := x_1³ + x_2 ≤ 0, f_3(x) := x_1 ≤ 0}
∇f_1(x) = (3x_1², −1), ∇f_2(x) = (3x_1², 1), ∇f_3(x) = (1, 0).
x = 0, I = ∅, J = {1, 2, 3}, I_0 = {1}, J_0 = {3}.
Weak practical CQs
The weakest CQ such that AKKT + CQ ⇒ KKT:
AKKT-CQ (Ramos, 2014): it holds at x if the point-to-set mapping
y ↦ F°(y) = {v | v = Σ_{i=1}^m λ_i ∇h_i(y) + Σ_{i∈A(x)} μ_i ∇g_i(y), μ_i ≥ 0}
is continuous at x.
References
1. R. Andreani, G. Haeser, M.L. Schuverdt, P.J.S. Silva - Two new weak constraint qualifications and applications. SIAM Journal on Optimization, 22(3), 1109-1135, 2012.
2. R. Andreani, G. Haeser, M.L. Schuverdt, P.J.S. Silva - A relaxed constant positive linear dependence constraint qualification and applications. Mathematical Programming, 135, 255-273, 2012.
3. E.G. Birgin, J.M. Martínez - Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, Philadelphia, 2014.