Semideﬁnite Optimization

(1)

Semidefinite Optimization

Mastermath Spring 2012

Monique Laurent

Centrum Wiskunde & Informatica Science Park 123 1098 XG Amsterdam

The Netherlands

[email protected]

Frank Vallentin

Delft Institute of Applied Mathematics Technical University of Delft

P.O. Box 5031 2600 GA Delft The Netherlands

[email protected]

May 22, 2012

(2)

Theory and algorithms for

semidefinite optimization

(9)

CHAPTER 1 BACKGROUND: CONVEX SETS AND POSITIVE SEMIDEFINITE

MATRICES

A setCis called convex if, given any two pointsxandy inC, the straight line segment connectingxandy lies completely inside ofC. For instance, cubes, balls or ellipsoids are convex sets whereas a torus is not. Intuitively, convex sets do not have holes or dips.

Usually, arguments involving convex sets are easy to visualize by two-dimensional drawings. One reason being that the definition of convexity only involves three points which always lie in some two-dimensional plane. On the other hand, convexity is a very powerful concept which appears (sometimes unexpected) in many branches of mathematics and its applications. Here are a few areas where convexity is an important concept: mathematical optimization, high-dimensional geometry, analysis, probability theory, system and con- trol, harmonic analysis, calculus of variations, game theory, computer science, functional analysis, economics, and there are many more.

Our aim is to work with convex sets algorithmically. So we have to discuss ways to represent them in the computer, in particular which data do we want to give to the computer. Roughly speaking, there are two convenient possibil- ities to represent convex sets: By an implicit description as an intersection of halfspaces or by an explicit description as the convex combination of extreme points. The goal of this chapter is to discuss these two representations. In the context of functional analysis they are connected to two famous theorems, the Hahn-Banach theorem and the Krein-Milman theorem. Since we are only work- ing in finite-dimensional Euclidean spaces (and not in the more general setting of infinite-dimensional topological vector spaces) we can derive the statements

(10)

using simple geometric arguments.

Later we develop the theory of convex optimization in the framework of conic programs. For this we need a special class of convex sets, namely convex cones. The for optimization most relevant convex cones are at the moment two involving vectors inRⁿand two involving symmetric matrices inR^nˆn, namely the non-negative orthant, the second order cone, the cone of positive semidefinite matrices, and the cone of copositive matrices. Clearly, the cone of positive semidefinite matrices plays the main role here. As background information we collect a number of basic properties of positive semidefinite matrices.

1.1 Some fundamental notions

Before we turn to convex sets we recall some fundamental geometric notions.

The following is a brief review, without proofs, of some basic definitions and notations appearing frequently in the sequel.

1.1.1 Euclidean space

LetEbe ann-dimensionalEuclidean spacewhich is ann-dimensional real vector space having an inner product. We usually use the notationx¨y for the inner product between the vectorsxandy. This inner product defines a norm onE by}x} “?

x¨xand a metric bydpx, yq “ }x´y}.

For sake of concreteness we will work with coordinates most of the time:

One can always identify E with Rⁿ where the inner product of the column vectorsx“ px₁, . . . , x_nq^Tandy“ py₁, . . . , y_nq^Tis the usual one: x¨y“x^Ty “ řn

i“1x_iy_i. This identification involves a linear transformation T : E Ñ Rⁿ which is an isometry, i.e.x¨y“T x¨T yholds for allx, yPE. Then the norm is the Euclidean norm (or`₂-norm):}x}₂“ař

ix²_i anddpx, yq “ }x´y}₂is the Euclidean distance between two pointsx, yPRⁿ.

1.1.2 Topology in finite-dimensional metric spaces

Theballwith centerxPRⁿand radiusris

Bpx, rq “ tyPRⁿ:dpx, yq ďru.

LetAbe a subset ofn-dimensional Euclidean space. A pointxPAis aninterior point of A if there is a positive radius ε ą 0 so that Bpx, εq Ď A. The set of all interior points of A is denoted by intA. We say that a set A is open if all points ofA are interior points, i.e. ifA “ intA. The set A is closed if its complementRⁿzAis open. The(topological) closureAofAis the smallest (inclusion-wise) closed set containingA. One can show that a set A inRⁿ is closed if and only if every converging sequence of points inAhas a limit which also lies inA. A pointxPAbelongs to theboundary BAofAif for everyεą0 the ballBpx, εqcontains points inAand inRⁿzA. The boundaryBAis a closed

(11)

set and we haveA“AY BA, andBA“AzintA. The setAiscompactif every sequence inAcontains a convergent subsequence. The setAis compact if and only if it is closed and bounded (i.e. it is contained in a ball of sufficiently large, but finite, radius).

Figure 1.1: A compact, non-convex setA. Which points lie inintA,A,BA?

For instance, the boundary of the ball with radius1and center0is theunit sphere

BBp0,1q “ tyPRⁿ:dp0, yq “1u “ txPRⁿ:x^Tx“1u.

Traditionally, it is called thepn´1q-dimensional unit sphere, denoted asS^n´1, where the superscriptn´1indicates the dimension of the manifold.

1.1.3 Affine geometry

A subsetA Ď Rⁿ is called an affine subspaceof Rⁿ if it is a translated linear subspace: One can writeAin the form

A“x`L“ tx`y:yPLu

where x P Rⁿ and where L is a linear subspace of Rⁿ. The dimensionof A is defined asdimA “ dimL. Affine subspaces are closed under affine linear combinations:

@N PN@x1, . . . , xN PA@α1, . . . , αN PR:

N

ÿ

i“1

αi“1ùñ

N

ÿ

i“1

αixiPA.

The smallest affine subspace containing a set of given points is itsaffine hull.

The affine hull ofAĎRⁿis the set of all possible affine linear combinations affA“

#_N ÿ

i“1

αixi:N PN, x1, . . . , xN PA, α1, . . . , αN PR,

N

ÿ

i“1

αi“1, +

.

(12)

A fact which requires a little proof (exercise). The dimension of an arbitrary set AisdimA“dimpaffAq. One-dimensional affine subspaces arelinesandpn´1q- dimensional affine subspaces arehyperplanes. A hyperplane can be specified as

H“ txPRⁿ :c^Tx“βu,

wherec PRⁿzt0uis the normal ofH (which lies orthogonal toH) and where βPR. Sometimes we writeH_c,βfor it.

•

Figure 1.2: Determine (as accurate as possible) the coefficientsα₁, α₂, α₃of the affine combinationy“α1x1`α2x2`α3x3withα1`α2`α3“1.

If the dimension ofAĎRⁿ is strictly smaller thann, thenAdoes not have an interior, intA “ H. In this situation one is frequently interested in the interior points ofA relative to the affine subspaceaffA. We say that a point x P A belongs to the relative interior of A when there is a ball Bpx, εq with strictly positive radiusεą0so thataffAXBpx, εq ĎA. We denote the set of all relative interior points ofAby relintA. Of course, if dimA “ n, then the interior coincides with the relative interior:intA“relintA.

1.2 Convex sets

A subsetC ĎRⁿ is called aconvex setif for every pair of pointsx, y P Calso the entire line segment betweenxandy is contained in C. The line segment between the pointsxandyis defined as

rx, ys “ tp1´αqx`αy: 0ďαď1u.

Convex sets are closed underconvex combinations:

@N PN@x₁, . . . , x_N PC@α₁, . . . , α_N PRě0:

N

ÿ

i“1

α_i“1ùñ

N

ÿ

i“1

α_ix_iPC.

The convex hull ofAĎRⁿis the smallest convex set containingA. It is convA“

#_N ÿ

i“1

αixi:NPN, x1, . . . , xN PA, α1, . . . , αN PRě0,

N

ÿ

i“1

αi “1 +

,

(13)

which requires an argument. We can give a mechanical interpretation of the convex hull of finitely many pointconvtx1, . . . , xNu: The convex hull consists of all centres of gravity of point massesα1, . . . , αN at the positionsx1, . . . , xN.

The convex hull of finitely many points is called apolytope. Two-dimensional, planar, polytopes are polygons. Other important examples of convex sets are balls, halfspaces, and line segments. Furthermore, arbitrary intersections of convex sets are convex again. TheMinkowski sumof convex setsC, Dgiven by

C`D“ tx`y:xPC, yPDu is a convex set.

Figure 1.3: Exercise: What is the Minkowski sum of a square and a disk?

Here are two useful properties of convex sets. The first result gives an al- ternative description of the relative interior of a convex set and the second one permits to embed a convex set with an empty interior into a lower dimensional affine space.

Lemma 1.2.1. LetC Ď Rⁿ be a convex set. A point x P C lies in the relative interior ofCif and only if

@yPCDzPC, αP p0,1q:x“αy` p1´αqz, wherep0,1qdenotes the open interval0ăαă1.

Theorem 1.2.2. LetCĎRⁿbe a convex set. IfintC“ Hthen the dimension of its affine closure is at mostn´1.

1.3 Implicit description of convex sets

In this section we show how one can describe a closed convex set implicitly as the intersection of halfspaces (Theorem 1.3.7). For this we show the intuitive fact that through every of its boundary points there is a hyperplane which has the convex set on only one of its sides (Lemma 1.3.5). We also prove an important fact which we will need later: Any two convex sets whose relative interiors do not intersect can be properly separated by a hyperplane (Theorem 1.3.8).

After giving the definitions of separating and supporting hyperplanes we look

(14)

at the metric projection which is a useful tool to construct these separating hyperplanes.

Thehyperplaneat a pointxPRⁿwith normal vectorcPRⁿzt0uis H “ tyPRⁿ :c^Ty“c^Txu.

It is an affine subspace of dimensionn´1. The hyperplaneH dividesRⁿ into two closedhalfspaces

H^`“ tyPRⁿ :c^Tyěc^Txu, H^´“ tyPRⁿ:c^Tyďc^Txu.

A hyperplaneH is said toseparatetwo setsAĎRⁿ andB ĎRⁿ if they lie on different sides of the hyperplane, i.e., ifAĎH^` andBĎH^´or conversely. In other words,AandBare separated by a hyperplane if there exists a non-zero vectorcPRⁿ and a scalarβPRsuch that

@xPA, yPB:c^Txďβďc^Ty.

The separation is said to bestrictif both inequalities are strict, i.e.,

@xPA, yPB:c^Txăβăc^Ty.

The separation is said to be proper whenH separates A andB but does not contain bothAandB.

A hyperplaneHis said tosupportAatxPAifxPHand ifAis contained in one of the two halfspacesH^`orH^´, sayH^´. ThenHis asupporting hyperplane ofAatxandH^´is a supporting halfspace.

Figure 1.4: The hyperplaneH supportsAand separatesAandB.

1.3.1 Metric projection

Let C P Rⁿ be a non-empty closed convex set. One can project every point xPRⁿ ontoCby simply taking the point inCwhich is closest to it. This fact is very intuitive and in the case whenCis a linear subspace we are talking simply about the orthogonal projection ontoC.

(15)

Lemma 1.3.1. LetCbe a non-empty closed convex set inRⁿ. LetxPRⁿzCbe a point outside ofC. Then there exists a unique pointπCpxqinCwhich is closest to x. Moreover,πCpxq P BC.

Proof. The argument forexistenceis a compactness argument: AsCis not empty, pickz₀PCand consider the intersectionC¹ofCwith the ballBpz₀, rqcentered atz₀ and with radiusr “ }z₀´x}. Then C¹ is closed, convex and bounded.

Moreover the minimum of the distance}y´x}foryPCis equal to the minimum taken overC¹. As we minimize a continuous function over a compact set, the minimum is attained. Hence there is at least one closest point toxinC.

The argument foruniquenessrequires convexity: Letyandzbe two distinct points inC, both having minimum distance tox. In this case, the midpoint ofy andz, which lies in C, would even be closer to x, because the distance dpx,¹₂py`zqqis the height of the isosceles triangle with verticesx, y, z.

Hence there is a unique point inCwhich is at minimum distance tox, which we denote byπCpxq. Clearly, πCpxq P BC, otherwise one would find another point inCcloser toxlying in some small ballBpπCpxq, εq ĎC.

Thus, the mapπC:Rⁿ ÑCdefined by the property

@yPC:dpy, xq ědpπ_Cpxq, xq

is well-defined. This map is calledmetric projectionand sometimes we refer to the vectorπ_Cpxqas thebest approximationofxin the setC.

The metric projectionπ_Cis a contraction:

Lemma 1.3.2. LetCbe a non-empty closed and convex set inRⁿ. Then,

@x, yPRⁿ:dpπCpxq, πCpyqq ďdpx, yq.

In particular, the metric projectionπCis a Lipschitz continuous map.

Proof. We can assume that dpπCpxq, πCpyqq ‰ 0. Consider the line segment rπCpxq, πCpyqs and the two parallel hyperplanes Hx andHy atπCpxq and at πCpyqboth having normal vectorπCpxq ´πCpyq. The points xandπCpyqare separated byHxbecause otherwise there would be a point inrπCpxq, πCpyqs Ď Cwhich is closer to xthan toπ_Cpxq, which is impossible. In the same way,y andπ_Cpxq are separated byH_y. Hence,xandy are on different sides of the

“slab” bounded by the parallel hyperplanesH_x and byH_y. So their distance dpx, yqis at least the width of the slab, which isdpπ_Cpxq, π_Cpyqq.

The metric projection can reach every point on the boundary ofC:

Lemma 1.3.3. LetCbe a non-empty closed and convex set inRⁿ. Then, for every boundary pointyP BCthere is a pointxlying outside ofCso thaty“πCpxq.

Proof. First note that one can assume thatC is bounded (since otherwise re- place C by its intersection with a ball around y). Since C is bounded it is contained in a ballBof sufficiently large radius. We will construct the desired

(16)

pointxwhich lies on the boundaryBB by a limit argument. For this choose a sequence of pointsyiPRⁿzCsuch thatdpy, yiq ă1{i, and hencelimiÑ8yi“y.

Because the metric projection is a contraction (Lemma 1.3.2) we have dpy, πCpyiqq “dpπCpyq, πCpyiqq ďdpy, yiq ă1{i.

By intersecting the lineafftyi, πCpyiquwith the boundaryBBone can determine a pointxi P BB so that πCpxiq “ πCpyiq. Since the boundary BB is compact there is a convergent subsequencepxi_jqhaving a limitxP BB. Then, because of the previous considerations and becauseπCis continuous

y“πCpyq “πC

ˆ

jÑ8lim yij

˙

“ lim

jÑ8πCpyijq

“ lim

jÑ8π_Cpx_i_jq “π_C ˆ

lim

jÑ8x_i_j

˙

“π_Cpxq, which proves the lemma.

Figure 1.5: The construction which proves Lemma 1.3.3.

1.3.2 Separating and supporting hyperplanes

One can use the metric projection to construct separating and supporting hyperplanes:

Lemma 1.3.4. LetCbe a non-empty closed convex set inRⁿ. LetxPRⁿzCbe a point outsideCand letπ_Cpxqits closest point inC. Then the following holds.

(i) The hyperplane throughxwith normalx´πCpxqsupportsCatπCpxqand thus it separatestxuandC.

(ii) The hyperplane throughpx`πCpxqq{2with normalx´πCpxqstrictly sepa- ratestxuandC.

(17)

Proof. It suffices to prove (i) and then (ii) follows directly. Consider the hyper- planeHthroughxwith normal vectorc“x´πCpxq, defined by

H “ tyPRⁿ:c^Ty“c^TπCpxqu.

As c^Tx ą c^Tπ_Cpxq, x lies in the open halfspace ty : c^Ty ą c^Tπ_Cpxqu. We show that C lies in the closed halfspacety : c^Ty ď c^Tπ_Cpxqu. Suppose for a contradiction that there existsyPCsuch thatc^Tpy´π_Cpxqq ą0. Then select a scalarλP p0,1qsuch that0ăλă ^2c

Tpy´π_Cpxqq

}y´πCpxq}² ă1and setw“λy`p1´λqπCpxq which is a pointC. Now verify that}w´x} ă }πCpxq ´x} “ }c}, which follows from

}w´x}²“ }λpy´πCpxqq ´c}²“ }c}²`λ²}y´πCpxq}²´2λc^Tpy´πCpxqq and which contradicts the fact thatπCpxqis the closest point inCtox.

Figure 1.6: A separating hyperplane constructed usingπ_C.

Combining Lemma 1.3.3 and Lemma 1.3.4 we deduce that one can construct a supporting hyperplane at every boundary point.

Lemma 1.3.5. LetCĎRⁿbe a closed convex set and letxP BCbe a point lying on the boundary ofC. Then there is a hyperplane which supportsCatx.

One can generalize Lemma 1.3.4 (i) and remove the assumption thatC is closed.

Lemma 1.3.6. LetC Ď Rⁿ be a non-empty convex set and let x P RⁿzC be a point lying outsideC. Then,txuandCcan be separated by a hyperplane.

Proof. In view of Lemma 1.3.1 we only have to show the result for non-closed convex setsC. We are left with two cases: IfxRC, then a hyperplane separating txuand the closed and convex setC also separatestxuandC. If xPC, then xP BC. By Lemma 1.3.5 there is a hyperplane supportingCatx. In particular, it separatestxuandC.

(18)

As a direct application of the strict separation result in Lemma 1.3.4 (ii), we can formulate the following fundamental structural result for closed convex sets.

Theorem 1.3.7. A non-empty closed convex set is the intersection of its supporting halfspaces.

This is an implicit description as it gives a method to verify whether a point belongs to the closed convex set in question: One has to check whether the point lies in all these supporting halfspaces. If the closed convex set is given as an intersection of finitely many halfspaces, then it is called apolyhedronand the test we just described is a simple algorithmic membership test.

We conclude with the following result which characterizes when two convex sets can be separated properly. When both sets are closed and one of them is bounded, one can show a strict separation. These separation results will be the basis in our discussion of the duality theory of conic programs.

Theorem 1.3.8. LetC, DĎRⁿ be non-empty convex sets.

(i) C andD can be properly separated if and only if their relative interiors do not have a point in common:relintCXrelintD“ H.

(ii) Assume thatCandDare closed and that at least one of them is bounded. If CXD“ H, then there is a hyperplane strictly separatingCandD.

Proof. (i)The “only if” part (ùñ):LetH_c,βbe a hyperplane properly separating CandDwithCĎH^´andDĎH^`, i.e.,

@xPC, yPD:c^Txďβďc^Ty.

Suppose there is a pointx0 PrelintCXrelintD. Thenc^Tx0 “β, i.e.,x0 PH. Pick anyxP C. By Lemma 1.2.1 there existsx¹ P C andα P p0,1qsuch that x0“αx` p1´αqx¹. Now

β“c^Tx0“αc^Tx` p1´αqc^Tx¹ ďαβ` p1´αqβ“β,

hence all inequalities have to be tight and soc^Tx“β. ThusC is contained in the hyperplaneH. Similarly,D ĎH. This contradicts the assumption that the separation is proper.

The “if part” (ðù): Consider the set

E“relintC´relintD“ tx´y:xPrelintC, yPrelintDu,

which is convex. By assumption, the origin0does not lie inE. By Lemma 1.3.6 there is a hyperplaneH separatingt0uandE which goes through the origin.

SayH “Hc,0and

@xPrelintC, yPrelintD:c^Tpx´yq ě0.

(19)

Define

β “inftc^Tx:xPrelintCu.

Then,

CĎ txPRⁿ :c^Txěβu, and we want to show that

DĎ ty:c^Tyďβu.

For suppose not. Then there is a pointy PrelintD so thatc^Ty ąβ. Moreover, by definition of the infimum there is a pointxPrelintCso thatβďc^Txăc^Ty.

But then we findc^Tpx´yq ă0, a contradiction. Thus,CandD are separated by the hyperplaneHc,β.

If CYD lies in some lower dimensional affine subspace, then the argument above gives a hyperplane in the affine subspaceaffpCYDqwhich can be extended to a hyperplane inRⁿ which properly separatesCandD.

(ii) Assume thatCis bounded andCXD“ H. Consider now the set E“C´D

which is closed (check it) and convex. As the origin 0 does not lie in E, by Lemma 1.3.4 (ii), there is a hyperplane strictly separatingtxuandE: There is a non-zero vectorcand a positive scalarβ such that

@xPC, yPD:c^Tpx´yq ąβą0.

This implies inf

xPCc^Txěβ`sup

yPD

c^Tyąβ 2 `sup

yPD

c^Tyąsup

yPD

c^Ty.

Hence the hyperplaneH_c,αwithα“ ^β₂`sup

yPD

c^Tystrictly separatesCandD.

1.4 Explicit description of convex sets

Now we turn to an explicit description of convex sets. An explicit description gives an easy way to generate points lying in the convex set.

We say that a point x P C is extreme if it is not a relative interior point of any line segment inC. In other words, ifxcannot be written in the form x“ p1´αqy`αzwithy, zPCand0ăαă1. The set of all extreme points of Cwe denote byextC.

Theorem 1.4.1. LetCĎRⁿbe a compact and convex set. Then, C“convpextCq.

(20)

Proof. We prove the theorem by induction on the dimensionn. Ifn“0, thenC is a point and the result follows.

Let the dimensionnbe at least one. If the interior ofCis empty, thenClies in an affine subspace of dimension at mostn´1and the theorem follows from the induction hypothesis. Suppose thatintC‰ H. We have to show that every xPCcan be written as the convex hull of extreme points ofC. We distinguish between two cases:

First case: If xlies on the boundary ofC, then by Lemma 1.3.5 there is a supporting hyperplaneH ofCthroughx. Consider the setF “H XC. This is a compact and convex set which lies in an affine subspace of dimension at most n´1 and hence we have by the induction hypothesesxP convpextFq. Since extF ĎextC, we are done.

Second case: Ifxdoes not lie on the boundary ofC, then the intersection of a line throughxwithC is a line segmentry, zswithy, zP BC. By the previous argument we havey, zPconvpextCq. Sincexis a convex combination ofyand z, the theorem follows.

1.5 Convex cones

We will develop the theory of convex optimization using the concept of conic programs. Before we can say what a “conic program” is, we have to define convex cones.

Definition 1.5.1. A non-empty subset K of Rⁿ is called aconvex cone if it is closed under non-negative linear combinations:

@α, βPRě0@x, yPK:αx`βyPK.

Moreover,Kispointedif

x,´xPK ùñ x“0.

One can easily check that convex cones are indeed convex sets. Furthermore, thedirect product

KˆK¹“ tpx, x¹q PR^n`n

1:xPK, x¹PK¹u of two convex conesKĎRⁿandK¹ĎRⁿ

1 is a convex cone again.

A pointed convex cone inRⁿdefines apartial orderonRⁿby xľyðñx´yPK

forx, yPRⁿ. This partial order satisfies the following conditions:

reflexivity:

@xPRⁿ :xľx

(21)

antisymmetry:

@x, yPRⁿ:xľy, yľxùñx“y transitivity:

@x, y, zPRⁿ:xľy, yľzùñxľz homogenity:

@x, yPRⁿ @αPRě0:xľyùñαxľαy additivity:

@x, y, x¹, y¹ PRⁿ :xľy, x¹ ľy¹ùñx`x¹ľy`y¹.

In order that a convex cone is useful for practical algorithmic optimization methods we will need two additional properties to eliminate undesired degen- erate conditions: A convex cone should be closed and full-dimensional, that is, it has a non-empty interior. Then, we define strict inequalities by:

xąyðñx´yPintK.

LetpxiqiPNandpyiqiPNbe sequences of elements inRⁿ which have limitsxand y, then we can pass to limits in the inequalities:

pDN PN@iěN :x_iľy_iq ðñxľy.

The separation result from Lemma 1.3.4 specializes to convex cones in the following way.

Lemma 1.5.2. Let C Ď Rⁿ be a closed convex cone and let x P RⁿzC be a point outside ofC. Then there is a linear hyperplane separatingtxuandC. Even stronger, there is a non-zero vectorcPRⁿsuch that

@yPC:c^Tyě0ąc^Tx, thus with the strict inequalityc^Txă0.

1.6 Examples

The convex cone generated by a set of vectorsA ĎRⁿ is the smallest convex cone containingA. It is

coneA“

#_N ÿ

i“1

αixi:N PN, x1, . . . , xN PA, α1, . . . , αN PRě0

+ . Furthermore, every linear subspace ofEis a convex cone, however a somewhat boring one. More interesting are the following examples. We will use them, especially cone of positive semidefinite matrices, very often.

(22)

1.6.1 The non-negative orthant and linear programming

The convex cone which is connected to linear programming is the non-negative orthant. It lies in the Euclidean spaceRⁿ with the standard inner product. The non-negative orthantis defined as

Rⁿě0“ tx“ px₁, . . . , x_nq^TPRⁿ :x₁, . . . , x_ně0u.

It is a pointed, closed and full-dimensional cone. Alinear programis an optimization problem of the following form

maximize c1x1` ¨ ¨ ¨ `cnxn

subject to a11x1` ¨ ¨ ¨ `a1nxněb1

a₂₁x₁` ¨ ¨ ¨ `a_2nx_něb₂ ...

am1x1` ¨ ¨ ¨ `amnxněbm.

One can express the above linear program more conveniently using the partial order defined by the non-negative orthantRⁿě0:

maximizec^Tx subject toAxľb,

wherec “ pc₁, . . . , c_nq^T P Rⁿ is the objective vector,A “ pa_ijq P R^mˆn is the matrix of linearconstraints,x“ px₁, . . . , x_nq^TPRⁿis theoptimization variable, andb“ pb₁, . . . , b_mq^TPR^mis theright hand side. Here, the partial orderxľy means inequality coordinate-wise:x_iěy_ifor alliP rns.

1.6.2 The second-order cone

While the non-negative orthant is a polyhedron, the following cone is not. The second-order cone is defined in the Euclidean space R^n`1 “ RⁿˆR with the standard inner product. It is

L^n`1“

"

px, tq PRⁿˆR:}x}₂“ b

x²₁` ¨ ¨ ¨ `x²_nďt

* .

Sometimes it is also calledice cream cone(make a drawing to convince yourself) orLorentz cone. The second-order cone will turn out to be connected to conic quadratic programming.

1.6.3 The cone of semidefinite matrices

The convex cone which will turn out to be connected to semidefinite programming is the cone of positive semidefinite matrices. It lies in the npn`1q{2- dimensional Euclidean space of nˆn-symmetric matrices Sⁿ with the trace

(23)

inner product. Namely, for two matricesX, Y PR^nˆn, xX, Yy “TrpX^TYq “

n

ÿ

i“1 n

ÿ

j“1

X_ijY_ij, where TrX“

n

ÿ

i“1

X_ii.

Here we identify the Euclidean space Sⁿ with R^npn`1q{2 by the isometry T : SⁿÑR^npn`1q{2defined by

TpXq “ pX₁₁,?

2X₁₂,?

2X₁₃, . . . ,?

2X_1n, X₂₂,?

2X₂₃, . . . ,?

2X_2n, . . . , X_nnq where we only consider the upper triangular part (in good old FORTRAN 77 tradition) of the matrixX.

Thecone of semidefinite matricesis

S_ľ0ⁿ “ tXPSⁿ :X is positive semidefiniteu, where a matrixXispositive semidefiniteif

@xPRⁿ:x^TXxě0.

More characterizations are given in Section 1.7 below.

1.6.4 The copositive cone

The copositive cone is a cone inSⁿ which contains the semidefinite cone. It is the basis of copositive programming and it is defined as the set of all copositive matrices:

Cⁿ “ tX PSⁿ: x^TXxě0 @xPRⁿě0u.

Unlike for the semidefinite cone no easy characterization (for example in terms of eigenvalues) of copositive matrices is known. Even stronger: Unless the complexity classesP andN P coincide no easy characterization (meaning one which is polynomial-time computable) exists.

1.7 Positive semidefinite matrices

1.7.1 Basic facts

A matrix P is orthogonal if P P^T “ In or, equivalently, P^TP “ In, i.e. the rows (resp., the columns) ofP form an orthonormal basis of Rⁿ. ByOpnqwe denote the set ofnˆnorthogonal matrices which forms a group under matrix multiplication.

The spectral decomposition theorem is probably the most important theorem about real symmetric matrices.

(24)

Theorem 1.7.1. (Spectral decomposition theorem)Any real symmetric matrix XPSⁿcan be decomposed as

X“

n

ÿ

i“1

λ_iu_iu^T_i,

whereλ₁, . . . , λ_n P Rare the eigenvalues of X and where u₁, . . . , u_n P Rⁿ are the corresponding eigenvectors which form an orthonormal basis ofRⁿ. In matrix terms,X “P DP^T, whereDis the diagonal matrix with theλi’s on the diagonal andP is the orthogonal matrix with theui’s as its columns.

Theorem 1.7.2. (Positive semidefinite matrices)LetX PSⁿ be a symmetric matrix. The following assertions are equivalent.

(1) Xis positive semidefinite, written asXľ0, which is defined by the property:

x^TXxě0for allxPRⁿ.

(2) The smallest eigenvalue ofX is non-negative, i.e., the spectral decomposition ofX is of the formX“řn

i“1λ_iu_iu^T_i with allλ_iě0.

(3) X “LL^Tfor some matrixL PR^nˆk (for somek ě 1), called aCholesky decompositionofX.

(4) There exist vectorsv₁, . . . , v_n PR^k (for some kě1) such thatX_ij “v^T_iv_j for alli, jP rns; the vectorsvi’s are called aGram representationofX.

(5) All principal minors ofXare non-negative.

The setS_ľ0ⁿ of all positive semidefinite matrices is a pointed, closed, convex, full-dimensional cone inSⁿ. Moreover, it is generated by rank one matrices, i.e.

S_ľ0ⁿ “conetxx^T:xPRⁿu.

Matrices lying in the interior of the coneS_ľ0ⁿ are calledpositive definite. The above result extends to positive definite matrices. A matrixX is positive definite (denoted asXą0) if it satisfies any of the following equivalent properties:

(1)x^TXx ą0 for allxP Rⁿzt0u, (2) all eigenvalues are strictly positive, (3) in a Cholesky decomposition the matrixLis non-singular, (4) any Gram representation has full rankn, and (5) all the principal minors are positive (in fact already positivity of all the leading principal minors implies positive definite- ness; Sylvester’s criterion).

1.7.2 The trace inner product

Thetrace of annˆn matrixA is defined as TrpAq “ řn

i“1Aii. The trace is a linear form onR^nˆn and satisfies the following properties: TrpAq “TrpA^Tq, TrpABq “TrpBAq. Moreover, ifAis symmetric then the trace ofAis equal to the sum of the eigenvalues ofA.

(25)

One can define an inner product onR^nˆnby setting xA, By “TrpA^TBq “

n

ÿ

i,j“1

A_ijB_ij.

This defines theFrobenius normonR^nˆnby setting}A} “a

xA, Ay. For a vector xP Rⁿ we have x^TAx“ xA, xx^Ty. For positive semidefinite matrices we have the following result.

Lemma 1.7.3. For a symmetric matrixAPSⁿ,

Aľ0 ðñ @BPS_ľ0ⁿ :xA, By ě0.

In other words, the coneS_ľ0ⁿ is self-dual:pS_ľ0ⁿ q^˚“S_ľ0ⁿ .

Proof. Direct verification using the conditions (1) and (2) in Theorem 1.7.2.

1.7.3 Hoffman-Wielandt inequality

Here is a nice inequality to know about eigenvalues.

Theorem 1.7.4. (Hoffman, Wielandt (1953)) LetA, BPSⁿ be symmetric ma- trices with respective eigenvalues α₁, . . . , α_n and β₁, . . . , β_n ordered as follows:

α₁ď. . .ďα_nandβ₁ě. . .ěβ_n. Then,

n

ÿ

i“1

αiβi “mintTrpAXBX^Tq:X POpnqu. (1.1) In particular,

TrpABq ě

n

ÿ

i“1

αiβi.

Proof. WriteA“P DP^TandB “QEQ^TwhereP, QPOpnqandD (resp.,E) is the diagonal matrix with diagonal entriesα_i (resp.β_i). As TrpAXBX^Tq “ TrpDY EY^Tq where Y “ P^TXQ P Opnq, the optimization problem (1.1) is equivalent to

mintTrpDXEX^Tq:X POpnqu.

We want to prove that the minimum is TrpDEq, which is attainedX “I. For this consider the linear program¹

max

x,yPRⁿ

# _n ÿ

i“1

xi`

n

ÿ

j“1

yj:αiβj´xi´yj ě0@i, jP rns +

(1.2)

1We now use only the dual linear program (1.3), but we will use also the primal linear program (1.2) in Chapter 2 for reformulating program (1.1) as a a semidefinite program.

(26)

and its dual linear program min

ZPR^nˆn

# _n ÿ

i,j“1

αiβjZij:

n

ÿ

i“1

Zij“1@jP rns,

n

ÿ

j“1

Zij“1@iP rns, Zě0 +

. (1.3) Note that the feasible region of the linear program (1.3) is the set of alldoubly- stochastic matrices,i.e., the matrices with non-negative entries where all rows and all columns sum up to one. By Birkhoff’s theorem the set of doubly- stochastic matrices is equal to the convex hull of all permutation matrices. In other words, the minimum of (1.3) is equal to the minimum value ofřn

i“1αiβ_σpiq taken over all permutationsσofrns. It is an easy exercise to verify that this minimum is attained for the identity permutation. This shows that the optimum value of (1.3) (and thus of (1.2)) is equal toř

iαiβi“TrpDEq.

Now pick any X P Opnq. Observe that the matrix Z “ ppXijq²qⁿ_i,j“1 is doubly-stochastic (by the definition thatXis orthogonal) and that

TrpDXEX^Tq “

n

ÿ

i,j“1

α_iβ_jpX_ijq²,

which implies that TrpDXEX^Tq is at least the minimum TrpDEq of program (1.3). This shows that the minimum of (1.1) is at least TrpDEq, which finishes the proof of the theorem.

1.7.4 Schur complements

The following notion ofSchur complementcan be very useful for showing positive semidefiniteness.

Definition 1.7.5. (Schur complement)Consider a symmetric matrixXin block form

X“

ˆA B B^T C

˙

, (1.4)

withA PR^nˆn, B P R^nˆl andC PR^lˆl. Assume thatAis non-singular. Then, the matrixC´B^TA^´1B is called theSchur complementofAinX.

Lemma 1.7.6. LetX PSⁿ be in block form (1.4) whereAis non-singular. Then, X ľ0 ðñ Aľ0andC´B^TA^´1B ľ0.

Proof. The following identity holds:

X “P^T

ˆA 0

0 C´B^TA^´1B

˙

P, whereP “

ˆI A^´1B

0 I

˙ .

AsP is non-singular, we deduce thatX ľ 0 if and only ifpP^´1q^TXP^´1 ľ 0 which is thus equivalent toAľ0andC´B^TA^´1Bľ0.

(27)

1.7.5 Block-diagonal matrices

Given matricesX1 P Sⁿ¹, . . . , Xr P Sⁿ^r, X1‘ ¨ ¨ ¨ ‘Xr denotes the following block-diagonal matrixX PSⁿ, wheren“n₁` ¨ ¨ ¨ `n_r,

X “X1‘ ¨ ¨ ¨Xr“

¨

˚

˝

X1 0 . . . 0 0 X2 . . . 0 ... . .. ...

0 0 . . . Xr

˛

‹

‚

. (1.5)

Then, X is positive semidefinite if and only if all the blocks X₁, . . . , X_r are positive semidefinite.

Given two sets of matricesAandB, A‘B denotes the set of all matrices X‘Y, whereX PAandY PB. Moreover, for an integermě1,mAdenotes A‘ ¨ ¨ ¨ ‘A, them-fold sum.

From an algorithmic point of view it is much more economical to deal with positive semidefinite matrices in block-form like (1.5).

For instance, if we have a setAof matrices that pairwise commute, then it is well known that they admit a common set of eigenvectors. In other words, there exists an orthogonal matrixP POpnqsuch that the matricesP^TXP are diagonal for allX PA.

In general one may use the following powerful result about C^˚-algebras which permits to show that certain sets of matrices can be block-diagonalized.

Consider a non-empty setAĎC^nˆnof matrices.Ais said to be aC^˚-algebra if it satisfies the following conditions:

1. A is closed under matrix addition and multiplication, and under scalar multiplication.

2. For any matrixAPA, its conjugate transposeA^˚also belongs toA.

For instance, the full matrix algebraC^nˆnis a simple instance ofC^˚-algebra, and the algebraÀr

i“1m_iCⁿⁱ^ˆnⁱ as well, wheren_i, m_iare integers. The following fundamental result shows that up to an orthogonal transformation this is the general form of aC^˚-algebra.

Theorem 1.7.7. (Wedderburn-Artin theorem) Assume A is a C^˚-algebra of matrices inC^nˆncontaining the identity matrix. Then there exists a unitary matrix P (i.e., such thatP P^˚ “In) and integersr, n1, m1, . . . , nr, mrě1such that the setP^˚AP “ tP^˚XP :X PAuis equal to

m1Cⁿ¹^ˆn¹‘. . .‘mrCⁿ^r^ˆn^r.

See e.g. the thesis of Gijswijt [4] for a detailed exposition and its use for bounding the size of error correcting codes in finite fields.

(28)

1.7.6 Kronecker and Hadamard products

Given two matricesA“ pAijq PR^nˆmandB “ pBhkq PR^pˆq, theirKronecker productis the matrixAbB PR^npˆmqwith entries

A_ih,jk“A_ijB_hk@iP rns, jP rms, hP rps, kP rqs.

It can also be seen as thenˆmblock matrix whoseij-th block is thepˆqmatrix AijB for alliP rns, jP rms.

This includes in particular defining the Kronecker product ubv P R^npof two vectorsuPRⁿandvPR^p, with entriespubvqih“uivhforiP rns, hP rps.

Given two matrices A, B P R^nˆm, their Hadamard product is the matrix A˝BPR^nˆmwith entries

pA˝Bqij“AijBij @iP rns, jP rms.

Note thatA˝Bcoincides with the principle submatrix ofAbBindexed by the subset of all ‘diagonal’ pairs of indices of the formpii, jjqforiP rns, jP rms.

Here are some (easy to verify) facts about these products, where the matrices and vectors have the appropriate sizes.

1. pAbBqpCbDq “ pACq b pBDq.

2. In particular,pAbBqpubvq “ pAuq b pBvq.

3. AssumeAPSⁿandBPS^phave, respectively, eigenvaluesα1, . . . , αnand β1, . . . , βp. ThenAbB PS^nphas eigenvaluesαiβhforiP rns, hP rps. In particular,

A, Bľ0ùñAbBľ0 and A˝Bľ0, Aľ0ùñA^˝k“ ppAijq^kqľ0@kPN.

1.8 Historical remarks

The history of convexity is astonishing: On the one hand, the notion of convexity is very natural and it can be found even in prehistoric arts. For instance, the Platonic solids are convex polyhedra and carved stone models of some of them were crafted by the late neolithic people of Scotland more than 4,000 years ago. For more information on the history, which unearthed some good hoax, see also John Baez’ discussion of “Who discovered the icosahedron?” http:

//math.ucr.edu/home/baez/icosahedron/.

On the other hand, the first mathematician who realized how important convexity is as a geometric concept was the brilliant Hermann Minkowski (1864–

1909) who in a series of very influential papers “Allgemeine Lehrsätze über die konvexen Polyeder” (1897), “Theorie der konvexen Körper, insbesondere

(29)

Begr¨undung ihres Oberfl¨achenbegriffs” (published posthumously) initiated the mathematical study of convex sets and their properties. All the results in this chapter on the implicit and the explicit representation of convex sets can be found there (although with different proofs).

Not much can be added to David Hilbert’s (1862–1943) praise in his obituary of his close friend Minkowski:

Dieser Beweis eines tiefliegenden zahlentheoretischen Satzes²ohne rech- nerische Hilfsmittel wesentlich auf Grund einer geometrisch anschau- lichen Betrachtung ist eine Perle Minkowskischer Erfindungskunst. Bei der Verallgemeinerung auf Formen mitnVariablen führte der Minkowski- sche Beweis auf eine natürlichere und weit kleinere obere Schranke für jenes Minimum M, als sie bis dahin Hermite gefunden hatte. Noch wichtiger aber als dies war es, daß der wesentliche Gedanke des Mink- owskischen Schlußverfahrens nur die Eigenschaft des Ellipsoids, daß dasselbe eine konvexe Figur ist und einen Mittelpunkt besitzt, benutzte und daher auf beliebige konvexe Figuren mit Mittelpunkt übertragen werden konnte. Dieser Umstand führte Minkowski zum ersten Male zu der Erkenntnis, daß überhaupt derBegriff des konvexen Körpersein fundamentaler Begriff in unserer Wissenschaft ist und zu deren frucht- barsten Forschungsmitteln gehört.

Ein konvexer (nirgends konkaver) Körper ist nach Minkowski als ein solcher Körper definiert, der die Eigenschaft hat, daß, wenn man zwei seiner Punkte in Auge faßt, auch die ganze geradlinige Strecke zwischen denselben zu dem Körper gehört.³

Until the end of the 1940s convex geometry was a small discipline in pure mathematics. This changed dramatically when in 1947 the breakthrough of general linear programming came. Then Dantzig formulated the linear programming problem and designed the simplex algorithm for solving it. Nowa- days, convex geometry is an important toolbox for researchers, algorithm de- signers and practitioners in mathematical optimization.

2Hilbert is refering to Minkowski’s lattice point theorem. It states that for any invertible matrix APR^nˆndefining a latticeAZⁿand any convex set inRⁿwhich is symmetric with respect to the origin and with volume greater than2ⁿdetpAq²contains a non-zero lattice point.

3It is not easy to translate Hilbert’s praise into English without losing its poetic tone, but here is an attempt. This proof of a deep theorem in number theory contains little calculation. Using chiefly geometry, it is a gem of Minkowski’s mathematical craft. With a generalization to forms havingn variables Minkowski’s proof lead to an upper boundMwhich is more natural and also much smaller than the bound due to Hermite. More important than the result itself was his insight, namely that the only salient features of ellipsoids used in the proof were that ellipsoids are convex and have a center, thereby showing that the proof could be immediately generalized to arbitrary convex bodies having a center. This circumstances led Minkowski for the first time to the insight that the notion of a convex body is a fundamental and very fruitful notion in our scientific investigations ever since.

Minkowski defines a convex (nowhere concave) body as one having the property that, when one looks at two of its points, the straight line segment joining them entirely belongs to the body.

(30)

1.9 Further reading

Two very good books which emphasize the relation between convex geometry and optimization are by Barvinok [1] and by Gruber [5] (available online).

Less optimization but more convex geometry is discussed in the little book of Bonnesen, Fenchel [3] and the encyclopedic book by Schneider [7]. The first one is now mainly interesting for historical reasons. Somewhat exceptional, and fun to read, is Chapter VII in the book of Berger [2] (available online) where he gives a panoramic view on the concept of convexity and its many relations to modern higher geometry.

Let us briefly mention connections to functional analysis. Rudin in his clas- sical book “Functional analysis” discusses Theorem 1.3.8 and Theorem 1.4.1 in an infinite-dimensional setting. Although we will not need these more general theorems, they are nice to know.

The Hahn-Banachseparation theoremis Theorem 3.4 in Rudin.

Theorem 1.9.1. SupposeAandBare disjoint, nonempty, convex sets in a topo- logical vector spaceX.

(a) IfAis open there existΛPX^˚andγPRsuch that

<Λxăγď<Λy

for everyx P A and for everyy P B. (Here,<z is the real part of the complex numberz.)

(b) IfAis compact,B is closed, andX is locally convex, there existΛ P X^˚, γ1PR,γ2PR, such that

<Λxăγ₁ăγ₂ă<Λy for everyxPAand for everyyPB.

The Krein-Milman theorem is Theorem 3.23 in Rudin.

Theorem 1.9.2. SupposeX is a topological vector space on whichX^˚ separates points. IfK is a nonempty compact convex set inX, thenKis the closed convex hull of the set of its extreme points.

In symbols,K“convpextpKqq.

In his blog “What’s new?” Terry Tao [8] gives an insightful discussion of the finite-dimensional Hahn-Banach theorem.

The book “Matrix analyis” by Horn and Johnson [6] contains a wealth of very useful information, more than 70 pages, about positive definite matrices.

1.10 Exercises

1.1. Give a proof for the following statement:

LetCĎRⁿbe a convex set. IfC‰ H, thenrelintC‰ H

(31)

1.2. Give a proof for the following statement:

LetCĎRⁿ be a closed convex set and letxPRⁿzCa point lying outside of C. A separating hyperplane H is defined in Lemma 1.3.4. Consider a point y on the line afftx, πCpxqu which lies on the same side of the separating hyperplaneH asx. Then,π_Cpxq “π_Cpyq.

1.3. (a) Prove or disprove: LetAĎRⁿbe a subset. Then, convA“convA.

(b) Construct two convex setsC, D ĎR²so that they can be separated by a hyperplane but which cannot be properly separated.

1.4. Show that thelⁿ_p unit ball

$

&

%

px1, . . . , xnq^TPRⁿ :}x}p“

˜ _n ÿ

i“1

|xi|^p

¸1{p

ď1 , . -

is convex forp“ 1, p“ 2andp“ 8(}x}₈ “ maxi“1,...,n|xi|). Deter- mine the extreme points and determine a supporting hyperplane for every boundary point.

(*) What happens for the otherp?

Semideﬁnite Optimization

Semidefinite Optimization

Monique Laurent

Frank Vallentin

May 22, 2012

CONTENTS

I Theory and algorithms for semidefinite optimization 1

II Applications in combinatorics 75

III Applications in geometry 138

IV Applications in algebra 197

Part I

Theory and algorithms for

semidefinite optimization

CHAPTER 1

BACKGROUND: CONVEX SETS AND POSITIVE SEMIDEFINITE

MATRICES

1.1 Some fundamental notions

1.1.1 Euclidean space

1.1.2 Topology in finite-dimensional metric spaces

1.1.3 Affine geometry

1.2 Convex sets

1.3 Implicit description of convex sets

1.3.1 Metric projection

1.3.2 Separating and supporting hyperplanes

1.4 Explicit description of convex sets

1.5 Convex cones

1.6 Examples

1.6.1 The non-negative orthant and linear programming

1.6.2 The second-order cone

1.6.3 The cone of semidefinite matrices

1.6.4 The copositive cone

1.7 Positive semidefinite matrices

1.7.1 Basic facts

1.7.2 The trace inner product

1.7.3 Hoffman-Wielandt inequality

1.7.4 Schur complements

1.7.5 Block-diagonal matrices

1.7.6 Kronecker and Hadamard products

1.8 Historical remarks

1.9 Further reading

1.10 Exercises