
Introduction to Convex Analysis
Microeconomics II - Tutoring Class∗

Professor: V. Filipe Martins-da-Rocha TA: Cinthia Konichi

April 2010

1 Basic Concepts and Results

This is a first look at the basic results of convex analysis that will be used extensively during the course. Convexity is a very important concept in optimization theory since, when it is assumed, necessary conditions for optimality become sufficient conditions.

For an introduction to convex analysis as applied to optimization theory see Florenzano and Le Van (2001) and Izmailov and Solodov (2005). Borwein and Lewis (2000) and Bertsekas et al. (2003) are other useful references. For a thorough treatment see Rockafellar (1970).

We begin by characterizing convex sets. Let E be a real vector space. Then the following definitions are appropriate:

Definition 1 A set $D \subset E$ is said to be convex if for any $x, y \in D$ the set $\{\alpha x + (1-\alpha) y : \alpha \in [0,1]\}$ is contained in $D$.

The point $\alpha x + (1-\alpha) y$ is called a convex combination of $x$ and $y$ (with parameter $\alpha$). By induction, it is easily seen that $D$ is convex if and only if $\sum_{k=1}^{p} \lambda_k x^k \in D$ for every finite set $\{x^1, \ldots, x^p\}$ of $p$ elements of $D$ and for every system of $p$ nonnegative real coefficients $\{\lambda_1, \ldots, \lambda_p\}$ such that $\sum_{k=1}^{p} \lambda_k = 1$. Hence, a subset $D$ of $E$ is convex if and only if every convex combination of finitely many elements of $D$ belongs to $D$.

Some examples of convex sets are the open (or closed) balls of a normed vector space, line segments, the vector space itself and the empty set.
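The characterization by finite convex combinations can be checked numerically on one of these examples. The following is only a minimal sketch, assuming the closed unit ball of R^2 with the Euclidean norm; the helper names (`convex_combination`, `in_closed_unit_ball`, and so on) are illustrative and not part of the notes.

```python
import math
import random

def convex_combination(points, weights):
    """Return sum_k weights[k] * points[k] for points given as tuples."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))

def in_closed_unit_ball(x, tol=1e-12):
    """Membership test for the closed Euclidean unit ball (with a rounding tolerance)."""
    return math.sqrt(sum(c * c for c in x)) <= 1.0 + tol

def random_point_in_ball(rng, dim=2):
    # Rejection sampling from the cube [-1, 1]^dim.
    while True:
        x = tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
        if in_closed_unit_ball(x, tol=0.0):
            return x

def random_weights(rng, p):
    # Nonnegative coefficients normalized to sum to one.
    raw = [rng.random() for _ in range(p)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
all_inside = True
for _ in range(1000):
    pts = [random_point_in_ball(rng) for _ in range(5)]
    lam = random_weights(rng, 5)
    if not in_closed_unit_ball(convex_combination(pts, lam)):
        all_inside = False
```

A sampled check of this kind cannot prove convexity; it only fails to find a counterexample, which is what the definition predicts for a ball.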

∗EPGE-FGV


Figure 1: A convex and a non-convex set

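Lemma 2 Let $\Lambda$ be an arbitrary set and $\{D_\lambda\}_{\lambda \in \Lambda}$ be a family of convex subsets of $E$. Then $D = \bigcap_{\lambda \in \Lambda} D_\lambda$ is convex.

Proof: Let $x \in D$ and $y \in D$. Then $x \in D_\lambda$ and $y \in D_\lambda$ for all $\lambda \in \Lambda$. Since the sets $D_\lambda$, $\lambda \in \Lambda$, are convex, $\alpha x + (1-\alpha) y \in D_\lambda$ for any $\alpha \in [0,1]$ and for all $\lambda \in \Lambda$. Hence $\alpha x + (1-\alpha) y \in D$, that is, $D$ is convex.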

Thus, in an optimization problem, for example, the feasible set defined by an arbitrary number of convex constraints on the same space is itself convex.

With any subset $C$ of $E$ we can associate another set, called the convex hull of $C$ and denoted by $\mathrm{co}\, C$, which is the intersection of all convex subsets of $E$ containing $C$.

By Lemma 2, the convex hull is convex. Moreover, it is the smallest convex subset of $E$ containing $C$.

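Lemma 3 Let $D \subset E$ be a convex set. Then $\mathrm{cl}\, D$ is convex.¹

Proof:² Let $x, y \in \mathrm{cl}\, D$ and $\alpha \in (0,1)$. Then there are sequences $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ in $D$ such that $x_n \to x$ and $y_n \to y$. By the convexity of $D$, $\alpha x_n + (1-\alpha) y_n \in D$ for all $n \in \mathbb{N}$. Then $\alpha x_n + (1-\alpha) y_n \to \alpha x + (1-\alpha) y$ implies $\alpha x + (1-\alpha) y \in \mathrm{cl}\, D$.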

Figure 2: The closure and the convex hull of a set C

¹ Here $\mathrm{cl}\, D$ is the closure of $D$. Recall that the closure of an arbitrary set $B$ in a normed vector space is the set of limits of convergent sequences of elements of $B$.

² For convenience, assume that $E$ is a normed real vector space and take the usual notion of convergence.


Definition 4 Let $D \subset E$ be a convex set. The function $f : D \to \mathbb{R}$ is convex in $D$ when for any $x \in D$, $y \in D$ and $\alpha \in [0,1]$ we have:

$$f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y)$$

The function $f$ is said to be strictly convex if the inequality above is strict for all $x \ne y$ and $\alpha \in (0,1)$.

Figure 3: An illustration of a convex function
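The inequality in Definition 4 can be tested on a grid. The sketch below is only illustrative: it assumes a function of one real variable, and the helper `violates_convexity` is a hypothetical name. It searches for a triple (x, y, α) that breaks the inequality.

```python
def violates_convexity(f, xs, alphas, tol=1e-12):
    """Return a triple (x, y, alpha) violating the convexity inequality, or None."""
    for x in xs:
        for y in xs:
            for a in alphas:
                lhs = f(a * x + (1 - a) * y)
                rhs = a * f(x) + (1 - a) * f(y)
                if lhs > rhs + tol:
                    return (x, y, a)
    return None

def square(x):
    return x * x      # convex on the whole real line

def cube(x):
    return x ** 3     # not convex on an interval containing negative numbers

xs = [i / 10 for i in range(-30, 31)]      # grid on [-3, 3]
alphas = [j / 20 for j in range(21)]       # grid on [0, 1]

witness_square = violates_convexity(square, xs, alphas)
witness_cube = violates_convexity(cube, xs, alphas)
```

Finding no witness on a grid does not prove convexity, but a single witness, as for the cube, is enough to refute it.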

Definition 5 Let D ⊂ E be a convex set. The function f : D → R is concave in D if (−f) is convex in D.

The next lemma states an equivalent definition of concavity:

Lemma 6 Let $D \subset E$ be a convex set. Then $f : D \to \mathbb{R}$ is concave if and only if the set

$$\mathrm{hypo}\, f := \{(x, \mu) \in D \times \mathbb{R} : f(x) \ge \mu\}$$

is convex. This set is called the hypograph of $f$.

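Proof: First, suppose that $\mathrm{hypo}\, f$ is convex. Let $x \in D$ and $y \in D$. Clearly, $(x, f(x)) \in \mathrm{hypo}\, f$ and $(y, f(y)) \in \mathrm{hypo}\, f$. Because of the convexity of $\mathrm{hypo}\, f$, for all $\alpha \in [0,1]$ we have:

$$(\alpha x + (1-\alpha) y,\ \alpha f(x) + (1-\alpha) f(y)) = \alpha (x, f(x)) + (1-\alpha)(y, f(y)) \in \mathrm{hypo}\, f$$

By the definition of $\mathrm{hypo}\, f$, we have:

$$f(\alpha x + (1-\alpha) y) \ge \alpha f(x) + (1-\alpha) f(y)$$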


So $f$ is concave.

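Conversely, suppose now that $f$ is concave. Let $(x, c_1) \in \mathrm{hypo}\, f$ and $(y, c_2) \in \mathrm{hypo}\, f$. Since $f(x) \ge c_1$ and $f(y) \ge c_2$, by the concavity of $f$, for all $\alpha \in [0,1]$ we have:

$$f(\alpha x + (1-\alpha) y) \ge \alpha f(x) + (1-\alpha) f(y) \ge \alpha c_1 + (1-\alpha) c_2$$

which means that:

$$\alpha (x, c_1) + (1-\alpha)(y, c_2) = (\alpha x + (1-\alpha) y,\ \alpha c_1 + (1-\alpha) c_2) \in \mathrm{hypo}\, f$$

Hence, $\mathrm{hypo}\, f$ is convex.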

We say that

$$\max f(x) \quad \text{subject to } x \in D \qquad (1)$$

is a convex maximization problem when $D \subset E$ is a convex set and $f : D \to \mathbb{R}$ is concave in $D$. The importance of the convexity assumption can be seen in the following result.

Theorem 7 Let $D \subset E$ be a convex set and $f : D \to \mathbb{R}$ a concave function in $D$. Then every local maximum of problem (1) is global. Moreover, the set of maximizers of the problem is convex. If $f$ is strictly concave, the problem has at most one maximizer.

Proof: Suppose by way of contradiction that $\bar x \in D$ is a local maximizer that is not global. Then there exists $y \in D$ such that $f(y) > f(\bar x)$. Define $x(\alpha) = \alpha y + (1-\alpha) \bar x$. By the convexity of $D$, $x(\alpha) \in D$ for all $\alpha \in [0,1]$. By the concavity of $f$, for all $\alpha \in (0,1]$ we have:

$$f(x(\alpha)) \ge \alpha f(y) + (1-\alpha) f(\bar x) = f(\bar x) + \alpha (f(y) - f(\bar x)) > f(\bar x)$$

Taking $\alpha > 0$ sufficiently small, we can guarantee that the point $x(\alpha)$ is arbitrarily close to the point $\bar x$ while $f(x(\alpha)) > f(\bar x)$. This contradicts the fact that $\bar x$ is a local maximizer of problem (1). Hence any local solution must be a global solution.

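Let $S \subset D$ be the set of (global) maximizers and $\bar v \in \mathbb{R}$ the optimum value of the problem. Note that $f(x) = \bar v$ for any $x \in S$. For any $x \in S$, $\bar x \in S$ and $\alpha \in [0,1]$, by the concavity of $f$ we have:

$$f(\alpha x + (1-\alpha) \bar x) \ge \alpha f(x) + (1-\alpha) f(\bar x) = \alpha \bar v + (1-\alpha) \bar v = \bar v$$

Since $\alpha x + (1-\alpha) \bar x \in D$ and $\bar v$ is the optimum value, this implies that $f(\alpha x + (1-\alpha) \bar x) = \bar v$ and then $\alpha x + (1-\alpha) \bar x \in S$.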

Suppose now that $f$ is strictly concave and that there exist $x \in S$ and $\bar x \in S$ with $x \ne \bar x$.

Let $\alpha \in (0,1)$. Since $x$ and $\bar x$ are global maximizers and $\alpha x + (1-\alpha) \bar x \in D$ by the convexity of $D$, it follows that:

$$f(\alpha x + (1-\alpha) \bar x) \le f(x) = f(\bar x) = \bar v$$


However, as $f$ is strictly concave:

$$f(\alpha x + (1-\alpha) \bar x) > \alpha f(x) + (1-\alpha) f(\bar x) = \alpha \bar v + (1-\alpha) \bar v = \bar v \qquad (2)$$

which is a contradiction.
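The last two claims of Theorem 7 can be illustrated on a grid. This is a rough sketch under stated assumptions: the two test functions and the helper `argmax_on_grid` are illustrative choices, and the domain is the interval [-3, 3] discretized with step 0.01.

```python
def argmax_on_grid(f, grid, tol=1e-12):
    """Return all grid points whose value is within tol of the grid maximum."""
    best = max(f(x) for x in grid)
    return [x for x in grid if f(x) >= best - tol]

def flat_top(x):
    # Concave but not strictly concave: constant (zero) on [-1, 1].
    return -max(abs(x) - 1.0, 0.0)

def parabola(x):
    # Strictly concave, maximized at x = 0.5.
    return -(x - 0.5) ** 2

grid = [i / 100 for i in range(-300, 301)]

S_flat = argmax_on_grid(flat_top, grid)   # grid points of the interval [-1, 1]
S_par = argmax_on_grid(parabola, grid)    # a single point
```

For the merely concave function the maximizers fill a whole interval, which is a convex set, while strict concavity leaves exactly one maximizer.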

2 Projection and Convex Sets

Henceforth, let $E$ be a real vector space equipped with an inner product $\langle \cdot, \cdot \rangle : E \times E \to \mathbb{R}$, and let $\|\cdot\| : E \to \mathbb{R}_+$ be the norm generated by this inner product.

Definition 8 Let $B \subset E$ be a nonempty set and let $x_0 \in E$ be an arbitrary point. The distance from the point $x_0$ to the set $B$ is given by the function $d_B : E \to \mathbb{R}_+$, where

$$d_B(x_0) := \inf_{x \in B} \|x - x_0\|$$

The set $P_B(x_0) := \{x \in B : \|x - x_0\| = d_B(x_0)\}$ is called the projection of $x_0$ on $B$.

Note that, since $B \ne \emptyset$ and $\|\cdot\| \ge 0$, this function is well defined. It is easy to see that $d_B$ is continuous.³ If $E$ is finite dimensional and $B$ is a closed set, the infimum is attained, that is, $P_B(x_0) \ne \emptyset$. Indeed, we can choose a sequence in $B$ whose distance from $x_0$ converges to $d_B(x_0)$. This sequence of distances is bounded, which in turn implies that the sequence in $B$ is bounded. Therefore, by the Bolzano-Weierstrass Theorem,⁴ it admits a convergent subsequence. Its limit belongs to $B$ because $B$ is closed, and by continuity of the norm its distance from $x_0$ equals $d_B(x_0)$.

Although under closedness alone the minimizer is not necessarily unique, adding convexity guarantees uniqueness. The geometrical intuition (for $E = \mathbb{R}^2$) is that the distance is measured along a path which is orthogonal to the set. Therefore, two different projections imply two different paths that are orthogonal to the set, and these paths together with the segment joining the two projections form a triangle. However, the convexity of $B$ implies that the line segment joining the projections is in $B$. Hence we have a non-degenerate triangle with two right angles, a contradiction.
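A two-point set already shows why closedness alone does not give uniqueness. The sketch below assumes a finite set B in R^2, for which the infimum in Definition 8 is a minimum over finitely many points; `projection_set` is a hypothetical helper name.

```python
def projection_set(B, x0, tol=1e-12):
    """Return (d_B(x0), P_B(x0)) for a finite set B of points in R^n."""
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    d = min(dist(x, x0) for x in B)
    return d, [x for x in B if dist(x, x0) <= d + tol]

# B is closed but not convex: the segment joining its two points is missing.
B = [(-1.0, 0.0), (1.0, 0.0)]

d_mid, P_mid = projection_set(B, (0.0, 0.0))   # equidistant point: two projections
d_off, P_off = projection_set(B, (0.5, 0.0))   # closer to (1, 0): one projection
```

The origin has two nearest points in B, whereas a point off the perpendicular bisector has only one.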

We now state the result formally:

Theorem 9 Let $E$ be a finite dimensional real vector space, with a norm defined by an inner product. Let $D \subset E$ be a nonempty, closed and convex set, and fix $x_0 \in E$. Then

1. $\bar x \in P_D(x_0)$ if and only if $\bar x \in D$ and $\langle \bar x - x_0, \bar x - y \rangle \le 0$, $\forall y \in D$;

2. there is a unique $x^\star \in D$ such that $\|x^\star - x_0\| = d_D(x_0)$.

Figure 4: Geometrical intuition for the unique minimum

³ Let $x, y \in E$. By the triangle inequality, $\|x - x'\| \le \|x - y\| + \|y - x'\|$ for all $x' \in B$, so $\inf_{x' \in B} \|x - x'\| \le \|x - y\| + \inf_{x' \in B} \|y - x'\|$, that is, $d_B(x) \le \|x - y\| + d_B(y)$. On the other hand, $\|y - x'\| \le \|y - x\| + \|x - x'\|$ for all $x' \in B$, which gives $d_B(y) \le \|x - y\| + d_B(x)$. Therefore $|d_B(x) - d_B(y)| \le \|x - y\|$, i.e., $d_B$ is a Lipschitz function (and thus continuous).

⁴ Note that the assertion that a bounded sequence has a convergent subsequence may fail in an infinite dimensional space. For example, in $\mathbb{R}^\infty$ consider the sequence $(x_n)_{n \in \mathbb{N}}$ such that $x_n = (y_{n,t})_{t \in \mathbb{N}}$, with $y_{n,t} = 1$ if $t = n$ and $0$ otherwise. It is easily seen that $(x_n)_{n \in \mathbb{N}}$ is bounded; however, it has no convergent subsequence (under the usual definition of convergence).

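Proof: Let us prove the first assertion and then use it to demonstrate the second one. Let $\bar x \in P_D(x_0)$. Then, since $D$ is convex, for any $\alpha \in (0,1)$ and any $y \in D \setminus \{\bar x\}$ we have $(1-\alpha) \bar x + \alpha y \in D$, and therefore

$$\|\bar x - x_0\| \le \|(1-\alpha) \bar x + \alpha y - x_0\| \;\Rightarrow\; \|\bar x - x_0\|^2 - \|(1-\alpha) \bar x + \alpha y - x_0\|^2 \le 0$$

$$\langle \bar x - x_0, \bar x - x_0 \rangle - \langle \bar x - x_0 - \alpha \bar x + \alpha y,\ \bar x - x_0 - \alpha \bar x + \alpha y \rangle \le 0$$

$$\langle \bar x - x_0, \bar x - x_0 \rangle - \langle \bar x - x_0,\ \bar x - x_0 - \alpha \bar x + \alpha y \rangle - \langle -\alpha \bar x + \alpha y,\ \bar x - x_0 - \alpha \bar x + \alpha y \rangle \le 0$$

$$\langle \bar x - x_0, \alpha \bar x - \alpha y \rangle - \langle -\alpha \bar x + \alpha y,\ \bar x - x_0 - \alpha \bar x + \alpha y \rangle \le 0$$

$$\langle \bar x - x_0, \alpha \bar x - \alpha y \rangle - \langle -\alpha \bar x + \alpha y, \bar x - x_0 \rangle - \langle -\alpha \bar x + \alpha y, -\alpha \bar x + \alpha y \rangle \le 0$$

$$2 \alpha \langle \bar x - x_0, \bar x - y \rangle - \alpha^2 \|\bar x - y\|^2 \le 0$$

Dividing both sides of the inequality above by $2\alpha > 0$ and letting $\alpha \to 0$, we get $\langle \bar x - x_0, \bar x - y \rangle \le 0$.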

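Conversely, let $\bar x \in D$ be such that $\langle \bar x - x_0, \bar x - y \rangle \le 0$, $\forall y \in D$. Note that, for any $y \in D$, $\langle \bar x - x_0, \bar x - y \rangle = \|\bar x - x_0\|^2 - \langle \bar x - x_0, y - x_0 \rangle$. By the Cauchy-Schwarz inequality we have $\langle \bar x - x_0, y - x_0 \rangle \le \|\bar x - x_0\| \, \|y - x_0\|$. Therefore, $\forall y \in D$,

$$0 \ge \langle \bar x - x_0, \bar x - y \rangle \;\Rightarrow\; \|\bar x - x_0\|^2 \le \langle \bar x - x_0, y - x_0 \rangle \le \|\bar x - x_0\| \, \|y - x_0\|$$

If $x_0 \in D$, then taking $y = x_0$ gives $\|\bar x - x_0\|^2 \le 0$, so $\bar x = x_0 \in P_D(x_0)$. If $x_0 \notin D$, then $\|\bar x - x_0\| > 0$ and dividing by it gives $\|\bar x - x_0\| \le \|y - x_0\|$, $\forall y \in D$, so $\bar x \in P_D(x_0)$. Thus (1) is proved.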

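Existence of a projection was shown above, since $D$ is closed and $E$ is finite dimensional; it remains to prove uniqueness. Let $x, x' \in P_D(x_0)$. Then, $\forall y \in D$, $\langle x - x_0, x - y \rangle \le 0$ and $\langle x' - x_0, x' - y \rangle \le 0$.

In particular, $\langle x - x_0, x - x' \rangle \le 0$ and $\langle x' - x_0, x' - x \rangle \le 0$. Adding these two inequalities, we get

$$0 \ge \langle x - x_0, x - x' \rangle - \langle x' - x_0, x - x' \rangle = \langle x - x', x - x' \rangle = \|x - x'\|^2$$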


Hence, $x = x'$.

Therefore, for a nonempty closed convex set $D \subset E$, it is possible to define a function $p_D : E \to D$ such that $P_D(x) = \{p_D(x)\}$, $\forall x \in E$.
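The variational characterization in Theorem 9 can be checked on a set whose projection has a closed form. The sketch below assumes D is the unit square [0,1]^2 in R^2 and relies on the standard fact that the Euclidean projection onto a box is obtained by clipping each coordinate; the function names are illustrative.

```python
import random

def project_onto_box(x0, lower, upper):
    """Euclidean projection onto the box prod_i [lower_i, upper_i] (coordinatewise clipping)."""
    return tuple(min(max(c, lo), hi) for c, lo, hi in zip(x0, lower, upper))

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

lower, upper = (0.0, 0.0), (1.0, 1.0)
x0 = (2.0, 0.5)                                  # a point outside D
x_bar = project_onto_box(x0, lower, upper)       # candidate for p_D(x0)

# Sample points y in D and evaluate <x_bar - x0, x_bar - y>, which should be <= 0.
rng = random.Random(1)
samples = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(1000)]
worst = max(inner(sub(x_bar, x0), sub(x_bar, y)) for y in samples)
```

Here `worst` is the largest sampled value of the inner product in item 1 of the theorem; it stays nonpositive, as the characterization requires.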

3 Separating Hyperplane Theorems

Before going to the results, let us define some objects.

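Definition 10 For $a \in E \setminus \{0\}$ and $c \in \mathbb{R}$ the set

$$H(a, c) := \{x \in E : \langle a, x \rangle = c\}$$

is said to be a hyperplane.

Note that $E$ may be written as the union of the hyperplane and two disjoint sets (the open half-spaces it determines). That is, $E = H(a, c) \cup \{x \in E : \langle a, x \rangle < c\} \cup \{x \in E : \langle a, x \rangle > c\}$.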

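Definition 11 Let $B_1, B_2 \subset E$. The hyperplane $H(a, c)$ is said to separate $B_1$ and $B_2$ if

$$\forall x \in B_1, \quad \langle a, x \rangle \le c \le \langle a, y \rangle, \quad \forall y \in B_2$$

If both inequalities are strict, we say that $H(a, c)$ strictly separates $B_1$ and $B_2$.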

In the geometric sense, separability means that a set is on one side of the hyperplane and the other set on the other side (see Figure 5). The next result is important for convex sets:

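Lemma 12 Let $D \subset E$ be a convex set and let $H(a, c)$ be a hyperplane such that $H(a, c) \cap D = \emptyset$. Then $D \cap \{x \in E : \langle a, x \rangle < c\} = \emptyset$ or $D \cap \{x \in E : \langle a, x \rangle > c\} = \emptyset$.

Proof: Assume by way of contradiction that there are $x', x'' \in D$ with $x' \in \{x \in E : \langle a, x \rangle < c\}$ and $x'' \in \{x \in E : \langle a, x \rangle > c\}$. Then $\bar x = \lambda x' + (1-\lambda) x'' \in H(a, c)$, where

$$\lambda = \frac{\langle a, x'' \rangle - c}{\langle a, x'' \rangle - \langle a, x' \rangle} \in (0,1)$$

Since $D$ is convex, $\bar x \in D \cap H(a, c)$, a contradiction.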
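The weight λ used in this proof can be computed explicitly. The sketch below assumes vectors of R^n stored as tuples; `crossing_point` is a hypothetical helper that returns λ together with the point of the segment lying on H(a, c).

```python
def crossing_point(a, c, x1, x2):
    """Given <a, x1> < c < <a, x2>, return (lam, x_bar) with x_bar on H(a, c)."""
    s1 = sum(ai * xi for ai, xi in zip(a, x1))
    s2 = sum(ai * xi for ai, xi in zip(a, x2))
    assert s1 < c < s2, "the two points must lie strictly on opposite sides"
    lam = (s2 - c) / (s2 - s1)
    x_bar = tuple(lam * u + (1 - lam) * v for u, v in zip(x1, x2))
    return lam, x_bar

a, c = (1.0, 2.0), 4.0
lam, x_bar = crossing_point(a, c, (0.0, 0.0), (4.0, 2.0))
value_on_hyperplane = sum(ai * xi for ai, xi in zip(a, x_bar))
```

With these numbers the segment from (0, 0) to (4, 2) crosses the hyperplane at its midpoint.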

3.1 Support Theorem

The next lemma states that a point that does not belong to the closure of a convex set can be strictly separated from that set.

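Lemma 13 (Minkowski Lemma) Let $D \subset E$ be a non-empty convex set, where $E$ is finite dimensional. If $x \notin \mathrm{cl}\, D$, then there are $a \in E \setminus \{0\}$ and $c \in \mathbb{R}$ such that $x \in H(a, c)$ and $\langle a, y \rangle > c$, $\forall y \in D$.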


Figure 5: Strict and non-strict separability

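Proof: By Lemma 3, we know that $\mathrm{cl}\, D$ is convex. By Theorem 9, $x$ has a unique projection $\bar x = p_{\mathrm{cl}\, D}(x) \in \mathrm{cl}\, D$. Let $a = \bar x - x$ ($a \ne 0$ since $x \notin \mathrm{cl}\, D$) and $c = \langle a, x \rangle$, so that $x \in H(a, c)$. By Theorem 9, $\langle \bar x - x, \bar x - y \rangle \le 0$, $\forall y \in \mathrm{cl}\, D$, and in particular $\forall y \in D$. Then,

$$\forall y \in D, \quad \langle a, y \rangle = \langle \bar x - x, y \rangle \ge \langle \bar x - x, \bar x \rangle$$

Therefore, $\langle a, y \rangle \ge \langle \bar x - x, \bar x \rangle = \|\bar x - x\|^2 + \langle \bar x - x, x \rangle = \|\bar x - x\|^2 + c > c$, since $\bar x \ne x$.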
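The construction in this proof (the normal vector is the projection minus the point, and c is the value of the linear form at the point) can be reproduced numerically. The sketch below assumes that D is the closed unit disc of R^2, whose projection has the closed form x/‖x‖ for points outside it; all names are illustrative.

```python
import math
import random

def inner(u, v):
    return sum(p * q for p, q in zip(u, v))

def project_onto_unit_disc(x):
    """Euclidean projection onto the closed unit disc {y : ||y|| <= 1}."""
    n = math.sqrt(inner(x, x))
    return x if n <= 1.0 else tuple(coord / n for coord in x)

x = (3.0, 4.0)                                   # a point outside cl D
x_bar = project_onto_unit_disc(x)                # projection of x on cl D
a = tuple(p - q for p, q in zip(x_bar, x))       # a = x_bar - x
c = inner(a, x)                                  # so that x lies on H(a, c)

# Sample points of D by rejection and measure how far <a, y> stays above c.
rng = random.Random(2)
samples = []
while len(samples) < 1000:
    y = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    if inner(y, y) <= 1.0:
        samples.append(y)

margin = min(inner(a, y) - c for y in samples)
```

The sampled margin is not only positive but at least ‖x̄ − x‖², in line with the last inequality of the proof.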

Definition 14 A hyperplane $H(a, c)$ is said to support a set $B$ if $H(a, c) \cap \mathrm{fr}\, B \ne \emptyset$⁵ and either $B \cap \{x \in E : \langle a, x \rangle < c\} = \emptyset$ or $B \cap \{x \in E : \langle a, x \rangle > c\} = \emptyset$. If $x' \in H(a, c) \cap \mathrm{fr}\, B$, it is said that $H(a, c)$ supports $B$ at $x'$.

Figure 6: An illustration of Minkowski Lemma

⁵ Here $\mathrm{fr}\, B$ is the boundary of $B$. Recall that $x \in E$ belongs to $\mathrm{fr}\, B$ if and only if there are sequences in $B$ and in $B^c$ which both converge to $x$. Thus, $\mathrm{fr}\, B \subset \mathrm{cl}\, B$.

That is, a hyperplane supports a set if it contains at least one boundary point of the set and all the points of the set lie on the same side of the hyperplane. Note that a set can admit more than one supporting hyperplane at the same point.

Theorem 15 (Support Theorem) Let $E$ be finite dimensional and $D \subset E$ be a non-empty convex set. If $x \in \mathrm{fr}\, D$, then there are $a \in E \setminus \{0\}$ and $c \in \mathbb{R}$ such that $x \in H(a, c)$ and $\langle a, y \rangle \ge c$, $\forall y \in D$, i.e., the hyperplane $H(a, c)$ supports $D$ at $x$.

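Proof: Since $x \in \mathrm{fr}\, D$, there is a sequence $(x_n)_{n \in \mathbb{N}}$ such that $x_n \to x$ and $x_n \notin \mathrm{cl}\, D$, $\forall n \in \mathbb{N}$. By the Minkowski Lemma, for each $n \in \mathbb{N}$ there are $a_n \in E \setminus \{0\}$ and $c_n \in \mathbb{R}$ such that $x_n \in H(a_n, c_n)$ and $\langle a_n, y \rangle > c_n$, $\forall y \in D$. Note that

$$\langle a_n, x_n \rangle = c_n \;\Rightarrow\; \left\langle \frac{a_n}{\|a_n\|}, x_n \right\rangle = \frac{c_n}{\|a_n\|}$$

Since $(x_n)_{n \in \mathbb{N}}$ is convergent, it is bounded. Furthermore, $\left( a_n / \|a_n\| \right)_{n \in \mathbb{N}}$ is bounded. Then, by the Cauchy-Schwarz inequality, $\left( c_n / \|a_n\| \right)_{n \in \mathbb{N}}$ is bounded. Therefore, there is an infinite subset $N' \subset \mathbb{N}$ such that $\left( a_n / \|a_n\| \right)_{n \in N'}$ and $\left( c_n / \|a_n\| \right)_{n \in N'}$ are convergent. Let $a \in E$ and $c \in \mathbb{R}$ be the respective limits of those subsequences. Since $\left\| a_n / \|a_n\| \right\| = 1$, $\forall n \in \mathbb{N}$, we have $\|a\| = 1$ and hence $a \in E \setminus \{0\}$. By continuity of the inner product, $\langle a, x \rangle = c$, that is, $x \in H(a, c)$. On the other hand, since $\langle a_n, y \rangle > c_n$, $\forall y \in D$, $\forall n \in \mathbb{N}$, dividing by $\|a_n\|$ and passing to the limit along $N'$ gives $\langle a, y \rangle \ge c$, $\forall y \in D$.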

3.2 Separating Theorems

Finally, the most important result of this material:

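Theorem 16 (Separating Hyperplane) Let $E$ be finite dimensional and $D_1, D_2 \subset E$ be disjoint non-empty convex sets. Then there are $a \in E \setminus \{0\}$ and $c \in \mathbb{R}$ such that

$$\forall x \in D_1, \quad \langle a, x \rangle \le c \le \langle a, y \rangle, \quad \forall y \in D_2$$

i.e., there is a hyperplane $H(a, c)$ which separates $D_1$ and $D_2$.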

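Proof: Define $D = D_2 - D_1 := \{y - x : y \in D_2,\ x \in D_1\}$. Then $D$ is convex and $0 \notin D$ because $D_1 \cap D_2 = \emptyset$. If $0 \notin \mathrm{cl}\, D$ we can apply the Minkowski Lemma. Otherwise $0 \in \mathrm{fr}\, D$, and we apply the Support Theorem. In both cases, there exists $a \in E \setminus \{0\}$ such that $\langle a, z \rangle \ge 0$, $\forall z \in D$. Therefore,

$$\forall x \in D_1, \quad \langle a, x \rangle \le \langle a, y \rangle, \quad \forall y \in D_2$$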


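In particular, the function $\langle a, \cdot \rangle$ is bounded from below in $D_2$ and bounded from above in $D_1$. Then,

$$\forall x \in D_1, \quad \langle a, x \rangle \le \sup_{x' \in D_1} \langle a, x' \rangle \le \inf_{x'' \in D_2} \langle a, x'' \rangle \le \langle a, y \rangle, \quad \forall y \in D_2$$

Since $D_1$ and $D_2$ are non-empty, the last inequality is well defined, and defining

$$c = \frac{\sup_{x' \in D_1} \langle a, x' \rangle + \inf_{x'' \in D_2} \langle a, x'' \rangle}{2}$$

the result holds.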

If, in addition to being convex, the sets are closed and at least one of them is compact, then we have strict separation. The intuition is that under compactness the sets cannot be, at the same time, arbitrarily close and disjoint.

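Theorem 17 (Strict Separating Hyperplane) Let $E$ be finite dimensional and $D_1, D_2 \subset E$ be disjoint, non-empty, closed and convex sets. In addition, assume that $D_1$ is compact.

Then there are $a \in E \setminus \{0\}$ and $b, c \in \mathbb{R}$ such that

$$\forall x \in D_1, \quad \langle a, x \rangle \le b < c \le \langle a, y \rangle, \quad \forall y \in D_2$$

i.e., there is a hyperplane which strictly separates $D_1$ and $D_2$ (for instance $H(a, (b+c)/2)$).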

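Proof: Since $d_{D_2}$ is continuous and $D_1$ is compact, by the Weierstrass Theorem

$$\exists\, x^\star \in \arg\min \{d_{D_2}(x) : x \in D_1\}$$

As $D_2$ is closed and $D_1 \cap D_2 = \emptyset$, we have $x^\star \notin D_2 = \mathrm{cl}\, D_2$ and hence $d_{D_2}(x^\star) > 0$. Let $y^\star = p_{D_2}(x^\star)$.

Define $a = y^\star - x^\star \ne 0$.

By Theorem 9, we have $0 \ge \langle y^\star - x^\star, y^\star - y \rangle$, $\forall y \in D_2$ $\Rightarrow$ $\langle a, y \rangle \ge \langle a, y^\star \rangle$, $\forall y \in D_2$. Define $c_2 = \langle a, y^\star \rangle$. Note that, for any $x \in D_1$ we have

$$\|y^\star - x\| \ge d_{D_2}(x) \ge d_{D_2}(x^\star) = \|x^\star - p_{D_2}(x^\star)\| = \|x^\star - y^\star\| \;\Rightarrow\; x^\star = p_{D_1}(y^\star)$$

Hence, again by Theorem 9, we have $\langle -a, x \rangle \ge \langle -a, x^\star \rangle$, $\forall x \in D_1$. Define $c_1 = \langle a, x^\star \rangle$, so that $\langle a, x \rangle \le c_1$, $\forall x \in D_1$. Note that $c_2 - c_1 = \|y^\star - x^\star\|^2 > 0$. Therefore, let $b = c_1 + (c_2 - c_1)/4$ and $c = c_1 + (c_2 - c_1)/2$, so that $c_1 < b < c < c_2$. Then, we have

$$\forall x \in D_1, \quad \langle a, x \rangle \le b < c \le \langle a, y \rangle, \quad \forall y \in D_2$$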
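The steps of this proof can be traced on two disjoint discs. The sketch below is written under strong simplifying assumptions: both sets are closed discs in R^2, so the closest pair has a closed form along the line joining the centers, and the thresholds b and c are chosen strictly between c₁ = ⟨a, x⋆⟩ and c₂ = ⟨a, y⋆⟩. All function names are illustrative.

```python
import math
import random

def inner(u, v):
    return sum(p * q for p, q in zip(u, v))

def closest_points_of_discs(p, r1, q, r2):
    """Closest pair (x_star, y_star) between two disjoint closed discs in R^2."""
    d = math.hypot(q[0] - p[0], q[1] - p[1])
    assert d > r1 + r2, "discs must be disjoint"
    u = tuple((qi - pi) / d for pi, qi in zip(p, q))   # unit vector from p to q
    x_star = tuple(pi + r1 * ui for pi, ui in zip(p, u))
    y_star = tuple(qi - r2 * ui for qi, ui in zip(q, u))
    return x_star, y_star

def sample_disc(rng, center, radius, n):
    """Draw n points from a closed disc by rejection sampling."""
    pts = []
    while len(pts) < n:
        u = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if inner(u, u) <= 1.0:
            pts.append((center[0] + radius * u[0], center[1] + radius * u[1]))
    return pts

p, r1 = (0.0, 0.0), 1.0        # D1: compact convex set
q, r2 = (4.0, 0.0), 1.0        # D2: closed convex set
x_star, y_star = closest_points_of_discs(p, r1, q, r2)

a = tuple(yi - xi for xi, yi in zip(x_star, y_star))   # a = y_star - x_star
c1, c2 = inner(a, x_star), inner(a, y_star)
b = c1 + (c2 - c1) / 4
c = c1 + (c2 - c1) / 2

rng = random.Random(3)
max_D1 = max(inner(a, x) for x in sample_disc(rng, p, r1, 1000))
min_D2 = min(inner(a, y) for y in sample_disc(rng, q, r2, 1000))
```

Every sampled point of the first disc satisfies ⟨a, x⟩ ≤ b and every sampled point of the second satisfies ⟨a, y⟩ ≥ c, with a gap between b and c.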


4 Applications

In this section, we will give an important application of the results seen above.

Lemma 18 Assume $E$ is finite dimensional and let $D \subset E$ be a convex set. If $f : D \to \mathbb{R}$ is concave and $x \in \mathrm{int}\, D$, then the following set is non-empty:

$$\partial f(x) = \{b \in E : f(y) + \langle b, y - x \rangle \le f(x),\ \forall y \in D\}$$

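Proof: By Lemma 6, $\mathrm{hypo}\, f$ is convex. In this proof we write the elements of $\mathrm{hypo}\, f$ with the real coordinate first, so that $(f(x), x)$ is a boundary point of $\mathrm{hypo}\, f$. Then, by the Support Theorem (replacing $a$ by $-a$ and $c$ by $-c$ if necessary), there are $a \in (\mathbb{R} \times E) \setminus \{0\}$ and $c \in \mathbb{R}$ such that $(f(x), x) \in H(a, c)$ and $\langle a, z \rangle \le c$, $\forall z \in \mathrm{hypo}\, f$. Let $a = (a_1, a_2)$ where $a_1 \in \mathbb{R}$ and $a_2 \in E$. Analogously, let $z = (z_1, z_2)$. For each $z \in \mathrm{hypo}\, f$ we have $a_1 z_1 + \langle a_2, z_2 \rangle \le c$. Since for all $v < z_1$, $(v, z_2) \in \mathrm{hypo}\, f$, it must be the case that $a_1 \ge 0$. Otherwise, for $v$ sufficiently negative the inequality would be violated.

Now we have to prove that $a_1 > 0$. Assume by way of contradiction that $a_1 = 0$. Since $(f(x), x) \in H(a, c)$, we have $\langle a_2, x \rangle = c$. But since $x \in \mathrm{int}\, D$, for $\delta > 0$ sufficiently small we have $x' = x + \delta a_2 \in D$, and $\langle a_2, x' \rangle = \langle a_2, x \rangle + \delta \|a_2\|^2 \le c$ $\Rightarrow$ $a_2 = 0$, a contradiction because the Support Theorem guarantees that $a \ne 0$. Therefore, $a_1 > 0$. Define $b = a_2 / a_1$ and $\hat c = c / a_1$. Since $(f(y), y) \in \mathrm{hypo}\, f$ for every $y \in D$, we have $f(y) + \langle b, y \rangle \le \hat c = f(x) + \langle b, x \rangle$. Thus, $f(y) + \langle b, y - x \rangle \le f(x)$ $\Rightarrow$ $b \in \partial f(x)$.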

Note the implication of the Support Theorem as applied to concave functions. For each interior point of the function's domain,⁶ there is a concave programming problem whose solution is given by this point. In fact, for any $x \in \mathrm{int}\, D$ and $b \in \partial f(x)$ we have:

$$x \in \arg\max_{x' \in D} \{f(x') + \langle b, x' \rangle\}$$
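A one-dimensional example makes the sign convention of Lemma 18 concrete. The sketch below assumes f(t) = −t² on the real line, evaluated on a grid, with x = 1; under the definition of ∂f(x) used here, the element b equals minus the derivative of f at x. The helper `in_subdifferential` is a hypothetical name.

```python
def in_subdifferential(f, b, x, ys, tol=1e-12):
    """Check f(y) + b * (y - x) <= f(x) for every sampled y (one-dimensional case)."""
    return all(f(y) + b * (y - x) <= f(x) + tol for y in ys)

def f(t):
    return -t * t          # concave on the real line

ys = [i / 100 for i in range(-500, 501)]   # grid on [-5, 5]
x = 1.0
b = 2.0 * x                # equals -f'(x) under the sign convention of Lemma 18

def tilted(t):
    # Objective of the associated problem: f(t) + <b, t>.
    return f(t) + b * t

best = max(ys, key=tilted)  # grid maximizer of the tilted objective
```

The tilted objective −t² + 2t is maximized at t = 1, so the chosen point x becomes the global maximizer once the linear term is added.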

Hence, the supporting hyperplanes give us a way to tilt (rotate around a point) the graph of the objective function in such a way that any point may be turned into a global optimum. But then, one may think about rotating the function around a special point, namely a point which is a solution of a constrained optimization problem.

The Lagrange Theorem builds upon this principle, but instead of using only the hypograph of the function, it uses the convex sets defined by the constraint functions.

This way, it is possible to create restrictions on the hyperplane which can arise as support for the restricted optimum. Indeed, under some conditions, we can uniquely identify such hyperplane. Then we can solve the global optimization problem first.

⁶ If the function is assumed to be continuous over the entire domain, then the property is valid for all points. An important remark should be made (we omit the proof): a concave function is continuous on the interior of its domain.


References

D.P. Bertsekas, A. Nedić, and A.E. Ozdaglar. Convex analysis and optimization. Athena Scientific, Belmont, Mass., 2003.

J.M. Borwein and A.S. Lewis. Convex analysis and nonlinear optimization. Springer Hong Kong, 2000.

M. Florenzano and C. Le Van. Finite dimensional convexity and optimization. Springer, 2001.

A. Izmailov and M. Solodov. Otimização, volume 1: Condições de Otimalidade, Elementos de Análise Convexa e de Dualidade. IMPA, 2005.

R.T. Rockafellar. Convex analysis. Princeton University Press, 1970.
