
Since our focus in this text is the optimization of convex functions, we concentrate on the theory and results obtained by applying the ideas from the previous section to convex functions. Before jumping to the main definition of this section, let us apply some of the ideas regarding separating hyperplanes to the epigraph of a proper closed convex function $f\colon \mathbb{E} \to (-\infty, +\infty]$.

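By Theorem 3.3.2, the set $\operatorname{epi} f \subseteq \mathbb{E} \oplus \mathbb{R}$ is the intersection of a collection of closed half-spaces in $\mathbb{E} \oplus \mathbb{R}$. To study the form of the half-spaces that contain $\operatorname{epi} f$, let us look at the form of hyperplanes in $\mathbb{E} \oplus \mathbb{R}$. If $H \subseteq \mathbb{E} \oplus \mathbb{R}$ is a hyperplane, then there are $y \oplus \alpha \in \mathbb{E} \oplus \mathbb{R}$ and $\gamma \in \mathbb{R}$ such that

$$H = \{x \oplus \mu \in \mathbb{E} \oplus \mathbb{R} : \langle x \oplus \mu, y \oplus \alpha \rangle = \gamma\}.$$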

If we multiply the equation that defines the points in $H$ by a non-zero scalar, the hyperplane $H$ remains unchanged. With this in mind, let us look at the two cases into which the hyperplane $H$ may fall.

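Either $\alpha = 0$, in which case we say that $H$ is vertical, or $\alpha \neq 0$, which we assume is the case for the remainder of this discussion. In this case, define $x^* \oplus \mu^* := -\alpha^{-1}(y \oplus \gamma)$ and multiply the equation in the definition of $H$ by $-\alpha^{-1}$. Then,

$$\begin{aligned}
H &= \{x \oplus \mu \in \mathbb{E} \oplus \mathbb{R} : \langle x \oplus \mu, x^* \oplus -1 \rangle = \mu^*\} \\
&= \{x \oplus \mu \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu = \mu^*\}.
\end{aligned}$$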

Finally, define the closed half-spaces

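$$H_{\leq} := \{x \oplus \mu \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu \leq \mu^*\}$$

and

$$H_{\geq} := \{x \oplus \mu \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu \geq \mu^*\}.$$

Since $x \oplus \mu \in \operatorname{epi} f$ for any $x \in \operatorname{dom} f$ and $\mu \in \mathbb{R}$ such that $\mu \geq f(x)$, taking $\mu$ large enough shows that $\operatorname{epi} f \not\subseteq H_{\geq}$. Moreover, by setting $h(x) := \langle x^*, x \rangle - \mu^*$, one can readily see that $H_{\leq} = \operatorname{epi} h$. Not only that, we also have $\operatorname{epi} f \subseteq \operatorname{epi} h$ if and only if $h(x) \leq f(x)$ for every $x \in \mathbb{E}$.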

From the above discussion, we conclude that the epigraph of a proper closed convex function $f\colon \mathbb{E} \to (-\infty, +\infty]$ is the intersection of half-spaces of two types. Half-spaces of the first type are epigraphs of affine functions which lower bound $f$ everywhere, that is, functions of the form $x \in \mathbb{E} \mapsto \langle x^*, x \rangle - \mu^*$ where $x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R}$ is such that $\langle x^*, x \rangle - \mu^* \leq f(x)$ for every $x \in \mathbb{E}$. Half-spaces of the second type are associated with vertical hyperplanes in $\mathbb{E} \oplus \mathbb{R}$. Since $f$ is proper, $f(x) \neq -\infty$ for any $x \in \mathbb{E}$ and, thus, $\operatorname{epi} f$ cannot be the intersection of only half-spaces associated with vertical hyperplanes. The next theorem tells us an interesting fact: half-spaces associated with vertical hyperplanes do not need to be considered at all, that is, $\operatorname{epi} f$ is the intersection of the epigraphs of all affine functions which lower bound $f$ everywhere.

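Theorem 3.4.1 ([59, Theorem 12.1]). Let $f\colon \mathbb{E} \to (-\infty, +\infty]$ be a proper closed convex function and define$^2$

$$\mathcal{H} := \{x \in \mathbb{E} \mapsto \langle x^*, x \rangle - \mu^* : x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} \text{ and } \langle x^*, x \rangle - \mu^* \leq f(x) \text{ for each } x \in \mathbb{E}\}.$$

Then $f(x) = \sup_{h \in \mathcal{H}} h(x)$ for every $x \in \mathbb{E}$ and $\operatorname{epi} f = \bigcap_{h \in \mathcal{H}} \operatorname{epi} h$.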
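As a small numerical illustration of Theorem 3.4.1 (a sketch only, not part of the original development), take the one-dimensional function $f(x) = x^2$. Its tangent lines $h_a(x) = 2ax - a^2$ are affine functions that lower bound $f$, and their pointwise supremum recovers $f$. The names `tangent`, `anchors`, and `sup_of_minorants` are hypothetical helpers introduced here, and only finitely many tangency points are used, so the supremum is exact only at those points.

```python
def f(x):
    """Example closed convex function: f(x) = x^2."""
    return x * x


def tangent(a):
    """Affine minorant of f touching its graph at a: h(x) = 2*a*x - a^2."""
    return lambda x: 2.0 * a * x - a * a


# Finite family of affine minorants, one per tangency point in [-3, 3].
anchors = [i / 100.0 for i in range(-300, 301)]
minorants = [tangent(a) for a in anchors]


def sup_of_minorants(x):
    """Pointwise maximum of the finite family of affine minorants at x."""
    return max(h(x) for h in minorants)
```

Each $h_a$ satisfies $f(x) - h_a(x) = (x - a)^2 \geq 0$, which is why every tangent line lies below the graph of $f$ and touches it only at $x = a$.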

The above discussion helps us gain some intuition on how the results from Section 3.3 can be interpreted when applied to epigraphs of convex functions. Maybe more importantly, it gives some intuition about the Fenchel conjugate of a convex function. Take the set

$$F := \{x^* \oplus \mu^* \in \mathbb{E} \oplus (-\infty, +\infty] : \operatorname{epi} f \subseteq \operatorname{epi} h, \text{ where } h := \langle x^*, \cdot \rangle - \mu^*\}, \tag{3.3}$$

that is, the set of points which define the affine functions that lower bound $f$. In this section we will study the function whose epigraph is the set $F$. This function is known as the Fenchel conjugate of $f$.

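Formally, let $f\colon \mathbb{E} \to [-\infty, +\infty]$. The (Fenchel) conjugate of $f$ is the function $f^*\colon \mathbb{E} \to (-\infty, +\infty]$ defined by

$$f^*(x^*) := \sup_{x \in \mathbb{E}} \big(\langle x^*, x \rangle - f(x)\big), \quad \forall x^* \in \mathbb{E}.$$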
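To make the definition concrete, the following sketch approximates the conjugate of a one-dimensional function by replacing the supremum over $\mathbb{E} = \mathbb{R}$ with a maximum over a finite grid. The helper `conjugate_on_grid` and the choice of grid are assumptions made for illustration; the result is only a lower estimate of the true conjugate and is reliable only when the maximizer lies inside the grid.

```python
def conjugate_on_grid(f, grid):
    """Return y -> max over the grid of (y*x - f(x)), a lower estimate of f*(y)."""
    def f_star(y):
        return max(y * x - f(x) for x in grid)
    return f_star


# Grid of points in [-5, 5] with spacing 0.001.
grid = [i / 1000.0 for i in range(-5000, 5001)]

# Example: f(x) = x^2 / 2, whose conjugate is known to be f*(y) = y^2 / 2.
f = lambda x: 0.5 * x * x
f_star = conjugate_on_grid(f, grid)
```

For this choice of $f$ the supremum is attained at $x = y$, so for $|y| \leq 5$ the grid estimate `f_star(y)` agrees with $y^2/2$ up to the grid resolution.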

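Note that for any function $f\colon \mathbb{E} \to [-\infty, +\infty]$ we have that $\operatorname{epi} f^*$ can be written as follows:

$$\begin{aligned}
\operatorname{epi} f^* &= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : f^*(x^*) \leq \mu^*\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - f(x) \leq \mu^* \text{ for all } x \in \mathbb{E}\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu \leq \mu^* \text{ for all } x \oplus \mu \in \operatorname{epi} f\} \\
&= \bigcap_{x \oplus \mu \in \operatorname{epi} f} \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu \leq \mu^*\} \\
&= \bigcap_{x \oplus \mu \in \operatorname{epi} f} \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^* \oplus \mu^*, x \oplus -1 \rangle \leq \mu\}.
\end{aligned}$$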

That is, $\operatorname{epi} f^*$ is the intersection of closed half-spaces. Thus, $\operatorname{epi} f^*$ is a closed convex set and, by Theorem 3.2.4, $f^*$ is a closed convex function. Moreover, note that the epigraph of the conjugate matches the set from (3.3) in our discussion. In Figure 3.3 we give an illustration of the evaluation of the conjugate $f^*$ of a function $f$ at a point $x^* \in \mathbb{E}$.

In spite of the discussion regarding the connections between the Fenchel conjugate and separating hyperplanes, one may still feel that the results from Section 3.3 were not useful. Indeed, the only result so far which relies on the Hyperplane Separation Theorem is Theorem 3.4.1, which was not yet put to use. One may even note that the definition of Fenchel conjugate applies to general functions, not only to convex functions. The importance of Theorem 3.4.1 and, thus, of the Hyperplane Separation Theorem, is to show that the conjugate of the conjugate of a closed convex function is the function itself. This result is fundamental for most of the results regarding Fenchel conjugates.

Additionally, the property that the dual of the dual of some object is the object itself is usually one of the most important properties of many duality theories (see [9] for some examples).

Theorem 3.4.2 ([59, Theorem 12.2]). Let $f\colon \mathbb{E} \to (-\infty, +\infty]$ be a convex function. Then $f^*$ is a closed convex function, and it is proper if and only if $f$ is proper. Moreover, $(\operatorname{cl} f)^* = f^*$ and $f^{**} = \operatorname{cl} f$.

$^2$ In words, $\mathcal{H}$ is the set of affine functions which lower bound $f$.

Figure 3.3: Illustration of the Fenchel conjugate of a function $f$ evaluated at a point $x^* \in \mathbb{E}$. One can think of the supremum in the definition of $f^*$ as sliding the red line (the graph of the affine function) vertically up to the point where it touches the graph of the function.

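Proof. As we have already discussed, $\operatorname{epi} f^*$ is the intersection of closed half-spaces. Thus, $\operatorname{epi} f^*$ is a closed convex set and $f^*$ is a closed function by Theorem 3.2.4. For the remainder of the claims in the statement, let us first look at the case when $f$ is improper. If there is $\bar{x} \in \mathbb{E}$ such that $f(\bar{x}) = -\infty$, then $f^*(x^*) = +\infty$ for any $x^* \in \mathbb{E}$. In this case we have $(\operatorname{cl} f)^* = f^*$ since $\operatorname{cl} f$ is the constant $-\infty$ function by definition. Moreover, $f^{**}$ is the constant $-\infty$ function, that is, $f^{**} = \operatorname{cl} f$. If $\operatorname{dom} f = \emptyset$, then $f = \operatorname{cl} f$, we clearly have $f^*(x^*) = -\infty$ for every $x^* \in \mathbb{E}$, and, thus, $f^{**}$ is the constant $+\infty$ function, that is, $f^{**} = \operatorname{cl} f$ in this case as well.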

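Suppose now that $f$ is proper. In this case, $\operatorname{cl} f$ is the function whose epigraph is $\operatorname{cl}(\operatorname{epi} f)$. With this in mind, we have

$$\begin{aligned}
\operatorname{epi} f^* &= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : f^*(x^*) \leq \mu^*\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - f(x) \leq \mu^* \text{ for every } x \in \mathbb{E}\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \langle x^*, x \rangle - \mu^* \leq f(x) \text{ for every } x \in \mathbb{E}\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : h(x) \leq f(x) \text{ for every } x \in \mathbb{E}, \text{ where } h := \langle x^*, \cdot \rangle - \mu^*\} && (3.4) \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \operatorname{epi} f \subseteq \operatorname{epi} h, \text{ where } h := \langle x^*, \cdot \rangle - \mu^*\} \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \operatorname{cl}(\operatorname{epi} f) \subseteq \operatorname{epi} h, \text{ where } h := \langle x^*, \cdot \rangle - \mu^*\} && (3.5) \\
&= \{x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R} : \operatorname{epi}(\operatorname{cl} f) \subseteq \operatorname{epi} h, \text{ where } h := \langle x^*, \cdot \rangle - \mu^*\} \\
&= \operatorname{epi}\big((\operatorname{cl} f)^*\big),
\end{aligned}$$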

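where in (3.5) we used that $\operatorname{epi} h$ is closed since it is a closed half-space in $\mathbb{E} \oplus \mathbb{R}$. Thus, $f^* = (\operatorname{cl} f)^*$. By Theorem 3.4.1, $\operatorname{cl} f$ is the pointwise supremum of all affine functions $h := \langle x^*, \cdot \rangle - \mu^*$ with $x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R}$ such that $h(x) \leq (\operatorname{cl} f)(x)$ holds for every $x \in \mathbb{E}$. By (3.4), the latter holds for $x^* \oplus \mu^* \in \mathbb{E} \oplus \mathbb{R}$ if and only if we have $x^* \oplus \mu^* \in \operatorname{epi} f^*$. Therefore, for every $x \in \mathbb{E}$ we have

$$(\operatorname{cl} f)(x) = \sup\{\langle x^*, x \rangle - \mu^* : x^* \oplus \mu^* \in \operatorname{epi} f^*\} = \sup\{\langle x^*, x \rangle - f^*(x^*) : x^* \in \mathbb{E}\} = f^{**}(x).$$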

If $f\colon \mathbb{E} \to (-\infty, +\infty]$ is a proper function, then by the definition of conjugate one can easily get the Fenchel-Young inequality:

$$\langle x^*, x \rangle \leq f(x) + f^*(x^*), \quad \forall x, x^* \in \mathbb{E}.$$
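The inequality can be checked numerically on a simple example. The sketch below uses $f(x) = e^x$ on $\mathbb{R}$, an example chosen here and not taken from the text, whose conjugate is $f^*(y) = y \ln y - y$ for $y > 0$, $f^*(0) = 0$, and $f^*(y) = +\infty$ for $y < 0$. The function `fenchel_young_gap` is a hypothetical helper that returns $f(x) + f^*(y) - xy$, which should never be negative.

```python
import math


def f(x):
    """Example function: f(x) = exp(x)."""
    return math.exp(x)


def f_star(y):
    """Conjugate of exp: y*ln(y) - y for y > 0, 0 at y = 0, +inf for y < 0."""
    if y < 0:
        return math.inf
    if y == 0:
        return 0.0
    return y * math.log(y) - y


def fenchel_young_gap(x, y):
    """Return f(x) + f*(y) - x*y, which the Fenchel-Young inequality bounds below by 0."""
    return f(x) + f_star(y) - x * y
```

In this example the gap vanishes exactly when $y = e^x$, that is, when $y$ is the derivative of $f$ at $x$.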

In spite of the simplicity of the above inequality, the case when this inequality holds as an equation will be very important when we look at subgradients in Section 3.5.

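For the sake of concreteness, let us quickly compute the conjugates of some functions. As a warm-up, let us compute the conjugate of the indicator function of a convex set $C \subseteq \mathbb{E}$. For each $x^* \in \mathbb{E}$, we have

$$\big(\delta(\cdot \mid C)\big)^*(x^*) = \sup_{x \in \mathbb{E}} \big(\langle x^*, x \rangle - \delta(x \mid C)\big) = \sup_{x \in C} \langle x^*, x \rangle = \delta^*(x^* \mid C).$$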

That is, the conjugate of the indicator function is the support function! This is one of the reasons for the similarity between the notation for these two functions.
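As an illustration under assumptions chosen here, take $C$ to be a box $\prod_i [l_i, u_i] \subseteq \mathbb{R}^d$. Its support function has the closed form $\sum_i \max(l_i y_i, u_i y_i)$, and the sketch below compares it with a brute-force maximum over the vertices of the box, which suffices because a linear function on a box attains its maximum at a vertex. The names `support_box` and `support_by_vertices` are hypothetical.

```python
from itertools import product


def support_box(y, lo, hi):
    """Support function of the box prod_i [lo[i], hi[i]] at y (closed form)."""
    return sum(max(l * yi, h * yi) for yi, l, h in zip(y, lo, hi))


def support_by_vertices(y, lo, hi):
    """Same value by brute force: maximize <y, v> over all vertices v of the box."""
    return max(
        sum(v * yi for v, yi in zip(vertex, y))
        for vertex in product(*zip(lo, hi))
    )
```

For the box $[-1, 1]^d$ both functions return the $\ell_1$-norm of $y$.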

Fortunately, we will not need to compute the conjugates of very complex functions. Still, if we know the conjugate of a convex function $f\colon \mathbb{E} \to (-\infty, +\infty]$, we often need to deal with the conjugate of the function $\lambda f$ for some $\lambda \in \mathbb{R}_{++}$. The following theorem shows how to compute such conjugates and, even though we skip the proof for the sake of conciseness, it follows easily from the definition of conjugate.

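Theorem 3.4.3 ([59, Theorem 16.1]). If $f\colon \mathbb{E} \to (-\infty, +\infty]$ is a proper convex function, then for any $\lambda \in \mathbb{R}_{++}$ and $x^* \in \mathbb{E}$ we have $(\lambda f)^*(x^*) = \lambda f^*(\lambda^{-1} x^*)$.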
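As a reading aid, and not as the proof given in [59], the omitted argument can be sketched in one line. It uses only the definition of the conjugate and the fact that the positive scalar $\lambda$ can be pulled out of the supremum:

$$(\lambda f)^*(x^*) = \sup_{x \in \mathbb{E}} \big(\langle x^*, x \rangle - \lambda f(x)\big) = \lambda \sup_{x \in \mathbb{E}} \big(\langle \lambda^{-1} x^*, x \rangle - f(x)\big) = \lambda f^*(\lambda^{-1} x^*).$$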

Let us now compute the conjugate of the negative entropy function. This function will be used extensively in the text and understanding the behavior of its conjugate shall be very useful later on.

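Proposition 3.4.4. Define $R(x) := \frac{1}{\eta} \sum_{i=1}^{d} [x_i > 0]\, x_i \ln x_i + \delta(x \mid \mathbb{R}^d_+)$ for each $x \in \mathbb{R}^d$, where $\eta \in \mathbb{R}$ is some positive constant. Then $R$ is a proper closed convex function and

$$R^*(y) = \frac{1}{\eta} \sum_{i=1}^{d} e^{\eta y_i - 1}, \quad \forall y \in \mathbb{R}^d.$$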

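Proof. Define $\psi(\alpha) := [\alpha > 0]\, \alpha \ln \alpha + \delta(\alpha \mid \mathbb{R}_+)$ for every $\alpha \in \mathbb{R}$. Note that $R(x) = \sum_{i=1}^{d} \frac{1}{\eta} \psi(x_i)$.

First, let us show that $\psi$ is a closed convex function, which implies that so is $R$. Define the function $\varphi := \psi + \delta(\cdot \mid \mathbb{R}_{++})$. Since $\varphi''(\alpha) = \alpha^{-1} > 0$ for any $\alpha \in \mathbb{R}_{++}$, by Lemma 3.1.1 we conclude that $\varphi$ is a convex function. Moreover, $\lim_{\alpha \downarrow 0} \varphi(\alpha) = 0 = \psi(0)$. Thus, $\operatorname{cl} \varphi = \psi$ and we conclude that $\psi$ is a closed convex function. Thus, $R$ is a closed convex function and one can easily see that $R$ is proper.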

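Let us now show that, for any $\beta \in \mathbb{R}$,

$$\text{the supremum } \sup_{\alpha \in \mathbb{R}} \big(\beta\alpha - \psi(\alpha)\big) = \psi^*(\beta) \text{ is attained by } e^{\beta - 1}, \text{ and } \psi^*(\beta) = e^{\beta - 1}. \tag{3.6}$$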

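To see this, let $\beta \in \mathbb{R}$ and define $h(\alpha) := \beta\alpha - \psi(\alpha)$ for each $\alpha \in \mathbb{R}$. Note that for any $\alpha \in \mathbb{R}_{++}$ we have

$$h'(\alpha) = \beta - \psi'(\alpha) = \beta - 1 - \ln \alpha.$$

Therefore, for $\alpha \in \mathbb{R}_{++}$ we have that $h'(\alpha) = 0$ if and only if $\alpha = e^{\beta - 1}$, that is, $e^{\beta - 1}$ is a critical point of $h$. Since $h''(\alpha) = -\alpha^{-1} < 0$ for every $\alpha \in \mathbb{R}_{++}$, the function $h$ is concave on $\mathbb{R}_{++}$, and $h(\alpha) = \alpha(\beta - \ln \alpha) \to -\infty$ as $\alpha \to +\infty$. Thus, $\sup_{\alpha \in \mathbb{R}_{++}} (\beta\alpha - \psi(\alpha))$ is attained by $e^{\beta - 1}$. Moreover, one can check that $h(e^{\beta - 1}) = e^{\beta - 1}$. Since $\psi(0) = 0$, we have that $e^{\beta - 1} > 0 = 0 \cdot \beta - \psi(0)$. Finally, noting that $\beta\alpha - \psi(\alpha) = -\infty$ for $\alpha \in \mathbb{R}$ with $\alpha < 0$ finishes the proof of (3.6).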

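Therefore, for every $y \in \mathbb{R}^d$,

$$\begin{aligned}
R^*(y) &= \sup_{z \in \mathbb{R}^d} \big(y^{\mathsf{T}} z - R(z)\big) = \sup_{z \in \mathbb{R}^d} \sum_{i=1}^{d} \Big(y_i z_i - \frac{1}{\eta} \psi(z_i)\Big) = \sum_{i=1}^{d} \Big(\frac{1}{\eta} \psi\Big)^{*}(y_i) \\
&\overset{\text{Thm. 3.4.3}}{=} \sum_{i=1}^{d} \frac{1}{\eta} \psi^*(\eta y_i) \overset{(3.6)}{=} \frac{1}{\eta} \sum_{i=1}^{d} e^{\eta y_i - 1}.
\end{aligned}$$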
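The closed form can be sanity-checked numerically. The sketch below, with hypothetical helper names (`psi`, `R`, `R_star`, `objective`, `maximizer`), evaluates the objective $y^{\mathsf{T}} z - R(z)$ at the candidate maximizer $z_i = e^{\eta y_i - 1}$ suggested by (3.6) and compares the value with the formula from Proposition 3.4.4. It is a check on specific inputs, not a proof.

```python
import math


def psi(a):
    """psi(a) = a*ln(a) for a > 0, 0 at a = 0, +inf for a < 0."""
    if a < 0:
        return math.inf
    return a * math.log(a) if a > 0 else 0.0


def R(x, eta):
    """Scaled negative entropy: (1/eta) * sum_i psi(x_i)."""
    return sum(psi(xi) for xi in x) / eta


def R_star(y, eta):
    """Closed form of the conjugate from Proposition 3.4.4."""
    return sum(math.exp(eta * yi - 1.0) for yi in y) / eta


def objective(y, z, eta):
    """The quantity y^T z - R(z) whose supremum over z defines R*(y)."""
    return sum(yi * zi for yi, zi in zip(y, z)) - R(z, eta)


def maximizer(y, eta):
    """Candidate maximizer suggested by (3.6): z_i = exp(eta*y_i - 1)."""
    return [math.exp(eta * yi - 1.0) for yi in y]
```

Evaluating `objective(y, maximizer(y, eta), eta)` should reproduce `R_star(y, eta)` up to rounding, while any other choice of `z` should give a value that is no larger.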