
On the Euclidean space $E$, norms are functions which assign non-negative “lengths” or “sizes” to each point in $E$, assigning length zero only to the zero vector. In the case of $\mathbb{R}^d$, one is most familiar with the Euclidean norm

$$x \in \mathbb{R}^d \mapsto \Big(\sum_{i=1}^d x_i^2\Big)^{1/2}.$$

In this text we will be interested in cases where we use somewhat less standard norms. Thus, let us first define which properties a function needs to satisfy to be a norm, and then look at some concepts and results from convex-analytic duality applied to norms.

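Let $\|\cdot\| \colon E \to \mathbb{R}$. We say that $\|\cdot\|$ is a norm on $E$ if, for all $u, v \in E$,

(i) $\|v\| \ge 0$, and equality holds if and only if $v = 0$,

(ii) $\|\alpha v\| = |\alpha| \, \|v\|$ for every $\alpha \in \mathbb{R}$,

(iii) $\|u + v\| \le \|u\| + \|v\|$, also known as the triangle inequality.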

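If $\|\cdot\|$ satisfies only conditions (ii) and (iii) and $\|v\| \ge 0$ for any $v \in E$, then $\|\cdot\|$ is a semi-norm on $E$. Conditions (ii) and (iii) imply that semi-norms and norms are convex functions: for any $u, v \in E$ and $\lambda \in [0, 1]$ we have $\|\lambda u + (1 - \lambda) v\| \le \|\lambda u\| + \|(1 - \lambda) v\| = \lambda \|u\| + (1 - \lambda) \|v\|$.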

Moreover, one may verify that if $\|\cdot\|$ is a norm on $E$, then it is a continuous$^3$ function on $E$.

$^3$ The definition of a continuous function itself (at least at first sight) depends on a norm, so saying that all norms are continuous without defining continuity is a somewhat circular statement. Thus, we use the following definition of continuity: a function $f \colon E \to [-\infty, +\infty]$ is continuous at a point $\bar{x} \in E$ if for every $\varepsilon > 0$ there is $\delta > 0$ such that, for any $x \in E$ with $\langle x - \bar{x}, x - \bar{x} \rangle \le \delta$, we have $|f(\bar{x}) - f(x)| \le \varepsilon$. That is, we use the (squared) Euclidean norm on $E$ to define continuity.

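The Euclidean norm or $\ell_2$-norm on $E$ is the norm $\|\cdot\|_2$ given by $\|x\|_2 := \sqrt{\langle x, x \rangle}$ for every $x \in E$. Additionally, throughout the text we may use some special and well-known norms for $\mathbb{R}^d$:

• the $\ell_1$-norm given by $\|x\|_1 := \sum_{i=1}^d |x_i|$ for every $x \in \mathbb{R}^d$,

• the $\ell_\infty$-norm given by $\|x\|_\infty := \max_{i \in [d]} |x_i|$ for every $x \in \mathbb{R}^d$,

• the $\ell_p$-norm for $p \in (1, \infty)$ given by $\|x\|_p := \Big(\sum_{i=1}^d |x_i|^p\Big)^{1/p}$ for every $x \in \mathbb{R}^d$.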
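The norms listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`norm_1`, `norm_2`, `norm_inf`, `norm_p`) and the sample vector are not from the text, and vectors are represented as lists of floats.

```python
import math

def norm_1(x):
    """l1-norm: sum of the absolute values of the coordinates."""
    return sum(abs(xi) for xi in x)

def norm_2(x):
    """l2-norm (Euclidean norm): square root of the sum of squares."""
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    """l_inf-norm: largest absolute value of a coordinate."""
    return max(abs(xi) for xi in x)

def norm_p(x, p):
    """lp-norm for p in (1, inf)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
print(norm_1(x), norm_2(x), norm_inf(x), norm_p(x, 3))
```

For this particular `x`, the $\ell_3$-norm lies between the $\ell_\infty$-norm and the $\ell_1$-norm, and `norm_p(x, 2)` agrees with `norm_2(x)`.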

As we have already said, our main focus in this chapter is to define and gain intuition about duality relations and concepts in convex analysis. One very interesting dual object related to a norm is its dual norm. If $\|\cdot\|$ is a norm on $E$, the dual norm of $\|\cdot\|$ is the norm$^4$ $\|\cdot\|_*$ on $E$ defined by

$$\|x^*\|_* := \max\{ \langle x^*, x \rangle : x \in E, \ \|x\| \le 1 \}, \qquad \forall x^* \in E.$$

At first sight, the definition of the dual norm $\|\cdot\|_*$ of a norm $\|\cdot\|$ on $E$ hardly has any intuitive meaning. There are two ways of looking at $\|\cdot\|_*$ which may be helpful in gaining some intuition. One way is to see dual norms as special cases of support functions. Set $B_{\|\cdot\|} := \{x \in E : \|x\| \le 1\}$, that is, $B_{\|\cdot\|}$ is the unit ball (w.r.t. the norm $\|\cdot\|$). Then, by the definition of conjugate function we have

$$\|x^*\|_* = \delta^*(x^* \mid B_{\|\cdot\|}), \qquad \forall x^* \in E.$$

This way of seeing the dual norm as the support function of the unit ball of the original norm may be useful in some cases, but it is arguably still too abstract. A more concrete way of seeing dual norms is as norms on the space of linear functionals on $E$, that is, linear functions from $E$ to $\mathbb{R}$. Let $x^* \in E$ and fix a norm $\|\cdot\|$ on $E$. The point $x^*$ can be seen as representing the linear functional $T_{x^*}$ given by $T_{x^*}(x) := \langle x^*, x \rangle$ for every $x \in E$. Given that we have the norm $\|\cdot\|$ to measure the sizes of elements in $E$, it would be interesting to have a related way to measure the “size” of $T_{x^*}$. Intuitively, we want a measure such that the bigger the value $T_{x^*}(x)$ when compared to the norm of $x \in E$, the bigger the size of $T_{x^*}$. That is, we want to measure how much $T_{x^*}$ stretches vectors when we measure lengths with $\|\cdot\|$. Of course, for distinct non-zero vectors $x, y \in E$ the ratios $T_{x^*}(x)/\|x\|$ and $T_{x^*}(y)/\|y\|$ may differ. Thus, we measure a linear functional by the direction in which it stretches the most, that is,

$$\sup_{x \in E \setminus \{0\}} \frac{T_{x^*}(x)}{\|x\|} = \sup_{x \in E \setminus \{0\}} \frac{\langle x^*, x \rangle}{\|x\|} = \sup_{x \in E : \|x\| \le 1} \langle x^*, x \rangle = \|x^*\|_*.$$

Let us look at some properties and interesting special cases of dual norms. One interesting fact that we will use repeatedly during the remainder of the text, usually without reference, is that the $\ell_2$-norm on $E$ is self-dual, that is, we have $(\|\cdot\|_2)_* = \|\cdot\|_2$.

Lemma 3.8.1. The dual norm of $\|\cdot\|_2$ on $E$ is $\|\cdot\|_2$.

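Proof. Let $\|\cdot\|_{2,*}$ be the norm dual to $\|\cdot\|_2$. By the Cauchy–Schwarz inequality we have, for any $x^* \in E \setminus \{0\}$,

$$\|x^*\|_{2,*} = \max\{ \langle x^*, x \rangle : x \in E, \ \|x\|_2 \le 1 \} \le \max\{ \|x^*\|_2 \|x\|_2 : x \in E, \ \|x\|_2 \le 1 \} = \|x^*\|_2,$$

and since the above inequality holds as an equation for $x := \|x^*\|_2^{-1} x^*$, we have $\|\cdot\|_{2,*} = \|\cdot\|_2$.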
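The argument of Lemma 3.8.1 can be checked numerically on a small example. The sketch below assumes $E = \mathbb{R}^3$ with the standard inner product; the helper names (`inner`, `normalize`) and the sample vector are illustrative choices, not part of the text. It evaluates $\langle x^*, x \rangle$ at the point $x = x^*/\|x^*\|_2$ used in the proof and compares it against random points on the unit sphere, which is a sanity check and not a proof.

```python
import math
import random

def inner(u, v):
    """Standard inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))

def norm_2(x):
    return math.sqrt(inner(x, x))

def normalize(x):
    """Return x / ||x||_2 (x must be non-zero)."""
    n = norm_2(x)
    return [c / n for c in x]

x_star = [1.0, -2.0, 2.0]

# The point from the proof: x = x* / ||x*||_2 has norm one and attains ||x*||_2.
attained = inner(x_star, normalize(x_star))

# Random points with ||x||_2 = 1 never do better, as Cauchy-Schwarz predicts.
random.seed(0)
best_random = max(
    inner(x_star, normalize([random.gauss(0.0, 1.0) for _ in x_star]))
    for _ in range(1000)
)
print(attained, norm_2(x_star), best_random)
```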

$^4$ We skip the proof that the dual norm is indeed a norm for the sake of conciseness.

As expected, we show in the next theorem that the dual norm of a dual norm is the original norm. Maybe more interestingly, the proof follows relatively easily from the results about Fenchel conjugates, mainly the fact that the conjugate of the conjugate of a closed convex function is the function itself. Since norms are continuous (and thus closed) convex functions, we are guaranteed that the conjugate of the conjugate of (half the square of) a norm is that same function.

Theorem 3.8.2. Let $\|\cdot\|$ be a norm on $E$. Then $(\tfrac{1}{2}\|\cdot\|^2)^* = \tfrac{1}{2}\|\cdot\|_*^2$. In particular, $\|\cdot\|_{**} = \|\cdot\|$.

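Proof. Let $x^* \in E$. Note that

$$\big(\tfrac{1}{2}\|\cdot\|^2\big)^*(x^*) = \sup_{x \in E} \big( \langle x^*, x \rangle - \tfrac{1}{2}\|x\|^2 \big) \le \sup_{x \in E} \big( \|x^*\|_* \|x\| - \tfrac{1}{2}\|x\|^2 \big) = \tfrac{1}{2}\|x^*\|_*^2, \qquad (3.8)$$

where in the last equation we used that $\sup_{\alpha \in \mathbb{R}} (\alpha \|x^*\|_* - \alpha^2/2)$ is attained by $\alpha = \|x^*\|_*$. Let $\bar{y} \in E$ attain $\max\{ \langle x^*, x \rangle : x \in E, \ \|x\| \le 1 \} = \|x^*\|_*$, and set $\bar{x} := \|x^*\|_* \bar{y}$. We have

$$\langle x^*, \bar{x} \rangle - \tfrac{1}{2}\|\bar{x}\|^2 = \|x^*\|_* \langle x^*, \bar{y} \rangle - \tfrac{1}{2}\|x^*\|_*^2 \|\bar{y}\|^2 = \|x^*\|_*^2 - \tfrac{1}{2}\|x^*\|_*^2 \|\bar{y}\|^2 = \tfrac{1}{2}\|x^*\|_*^2,$$

where the last equation uses that $\|\bar{y}\| = 1$ whenever $x^* \neq 0$ (if $x^* = 0$, both sides are zero).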

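Hence, (3.8) holds as an equation. Finally, since $\tfrac{1}{2}\|\cdot\|^2$ is continuous (and, thus, closed), by what we have just proved and by Theorem 3.4.2 we have

$$\tfrac{1}{2}\|\cdot\|_{**}^2 = \big(\tfrac{1}{2}\|\cdot\|_*^2\big)^* = \big(\tfrac{1}{2}\|\cdot\|^2\big)^{**} = \tfrac{1}{2}\|\cdot\|^2,$$

that is, $\|\cdot\|_{**} = \|\cdot\|$.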
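To make the first part of Theorem 3.8.2 concrete, the following sketch evaluates the conjugate of $\tfrac{1}{2}\|\cdot\|_1^2$ on $\mathbb{R}^2$. It takes for granted that the dual of the $\ell_1$-norm is the $\ell_\infty$-norm (Lemma 3.8.3), builds the maximizer $\bar{x} = \|x^*\|_* \bar{y}$ from the proof, and compares the result with a coarse grid search. The function names, the sample point, and the grid are illustrative assumptions, and the grid search is only a sanity check.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm_1(x):
    return sum(abs(c) for c in x)

def norm_inf(x):
    return max(abs(c) for c in x)

def conjugate_objective(x_star, x):
    """Value of <x*, x> - (1/2)||x||_1^2, the function maximized in (3.8)."""
    return inner(x_star, x) - 0.5 * norm_1(x) ** 2

def maximizer(x_star):
    """x_bar = ||x*||_inf * y_bar, where y_bar = sign(x*_i) e_i for a
    coordinate i of largest absolute value (y_bar attains the dual norm)."""
    i = max(range(len(x_star)), key=lambda j: abs(x_star[j]))
    sign = 1.0 if x_star[i] >= 0 else -1.0
    x_bar = [0.0] * len(x_star)
    x_bar[i] = sign * norm_inf(x_star)
    return x_bar

x_star = [1.5, -2.0]
closed_form = 0.5 * norm_inf(x_star) ** 2            # (1/2)||x*||_inf^2
at_x_bar = conjugate_objective(x_star, maximizer(x_star))

# Coarse grid search over [-4, 4]^2: no grid point beats the closed form.
grid = [k / 10.0 for k in range(-40, 41)]
grid_best = max(conjugate_objective(x_star, [a, b]) for a in grid for b in grid)
print(closed_form, at_x_bar, grid_best)
```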

One result that we will use extensively in this text is the fact that the $\ell_1$-norm and the $\ell_\infty$-norm are dual to each other.

Lemma 3.8.3. The dual norm of $\|\cdot\|_1$ on $\mathbb{R}^d$ is $\|\cdot\|_\infty$.

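Proof. Let $x^* \in \mathbb{R}^d$ and let $x \in \mathbb{R}^d$ be such that $\|x\|_1 \le 1$. We have

$$(x^*)^{\mathsf T} x = \sum_{i=1}^d x^*_i x_i \le \sum_{i=1}^d |x^*_i| \, |x_i| \le \|x^*\|_\infty \sum_{i=1}^d |x_i| \le \|x^*\|_\infty.$$

Since the above chain of inequalities holds as an equation for $x := \operatorname{sign}(x^*_i) e_i$, where $i \in \arg\max_{i \in [d]} |x^*_i|$, we are done.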
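Lemma 3.8.3 can also be checked by brute force. A linear function on a compact polytope attains its maximum at a vertex, and the vertices of the $\ell_1$ unit ball are the signed canonical vectors $\pm e_i$, so enumerating them computes the dual norm exactly. The sketch below uses illustrative names (`l1_ball_vertices`, `dual_norm_of_l1`) that are not from the text.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm_inf(x):
    return max(abs(c) for c in x)

def l1_ball_vertices(d):
    """Vertices of the l1 unit ball in R^d: the 2d signed canonical vectors."""
    vertices = []
    for i in range(d):
        for sign in (1.0, -1.0):
            e = [0.0] * d
            e[i] = sign
            vertices.append(e)
    return vertices

def dual_norm_of_l1(x_star):
    """max{<x*, x> : ||x||_1 <= 1}, computed by enumerating the vertices."""
    return max(inner(x_star, v) for v in l1_ball_vertices(len(x_star)))

x_star = [0.5, -3.0, 2.0]
print(dual_norm_of_l1(x_star), norm_inf(x_star))
```

The maximizing vertex here is $-e_2$, which is exactly the point $\operatorname{sign}(x^*_i) e_i$ with $i$ a coordinate of largest absolute value.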

When we start to look at regret bounds for OCO algorithms, most of them will depend on the norms of the subgradients of the functions used by the enemy. Thus, one may already imagine that if the player has any control over the norm which measures the sizes of the subgradients, she could pick a norm under which the subgradients of the enemy's functions have small norm. Still, to make such a choice, the player needs to have some information on the functions the enemy is allowed to pick. In optimization problems one usually assumes that the functions which one has to handle are Lipschitz continuous, that is, the functions cannot change too much between points which are close to each other (w.r.t. some fixed norm). For differentiable functions, Lipschitz continuity means that the derivative in any direction is bounded by a constant. Interestingly, Lipschitz continuity and the (dual) norms of the subgradients of the function are deeply connected. Before proving this result, let us define Lipschitz continuity.

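Let $\rho > 0$. A function $f \colon E \to (-\infty, +\infty]$ is $\rho$-Lipschitz continuous on a set $X \subseteq \operatorname{dom} f$ w.r.t. a norm $\|\cdot\|$ on $E$ if

$$|f(x) - f(y)| \le \rho \|x - y\|, \qquad \forall x, y \in X.$$

When $X$ is not explicitly stated, we take $X = \operatorname{dom} f$.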

Theorem 3.8.4 (Based on [67, Lemma 2.6]). Let $X \subseteq E$ be a convex set with nonempty interior, and let $f \colon E \to (-\infty, +\infty]$ be a proper closed convex function which is $\rho$-Lipschitz continuous on $X$ w.r.t. a norm $\|\cdot\|$ on $E$. Then, for every $x \in X$ there is $g \in \partial f(x)$ such that $\|g\|_* \le \rho$. Additionally, for every $x \in \operatorname{int} X$ we have $\partial f(x) \subseteq \{g \in E : \|g\|_* \le \rho\}$.

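Proof. First, let us show that

$$\emptyset \neq \partial f(\mathring{x}) \subseteq \{y \in E : \|y\|_* \le \rho\}, \qquad \forall \mathring{x} \in \operatorname{int} X. \qquad (3.9)$$

Let $\mathring{x} \in \operatorname{int} X$ and let $u \in \partial f(\mathring{x})$, which exists by Theorem 3.5.1 since $X \subseteq \operatorname{dom} f$ and, thus, $\operatorname{int} X \subseteq \operatorname{int}(\operatorname{dom} f)$. Moreover, since the set $B := \{v \in E : \|v\| \le 1\}$ is compact$^5$, $\sup_{v \in B} \langle v, u \rangle = \|u\|_*$ is attained. Let

$$y \in \mathring{x} + \arg\max_{v \in B} \langle u, v \rangle. \qquad (3.10)$$

Additionally, for every $\lambda \in [0, 1]$ define $z_\lambda := \mathring{x} + \lambda (y - \mathring{x})$. Since $\mathring{x} \in \operatorname{int} X$, there is $\varepsilon \in (0, 1]$ such that $z_\varepsilon \in X$. Therefore,

$$\varepsilon \|u\|_* \overset{(3.10)}{=} \varepsilon \langle u, y - \mathring{x} \rangle = \langle u, z_\varepsilon - \mathring{x} \rangle \le f(z_\varepsilon) - f(\mathring{x}) \le \rho \|z_\varepsilon - \mathring{x}\| = \rho \varepsilon \|y - \mathring{x}\| \overset{(3.10)}{\le} \rho \varepsilon,$$

where in the first inequality we just used the subgradient inequality. Hence, $\|u\|_* \le \rho$. This completes the proof of (3.9).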

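Let $\bar{x} \in X$, let $\mathring{x} \in \operatorname{int} X$, and define

$$x_k := \mathring{x} + \Big(1 - \frac{1}{k + 1}\Big)(\bar{x} - \mathring{x}), \qquad \forall k \in \mathbb{N}.$$

By Theorem 3.2.1, we have $x_k \in \operatorname{int} X$ for every $k \in \mathbb{N}$. Thus, by (3.9), for every $k \in \mathbb{N}$ there is $u_k \in \partial f(x_k)$ with $\|u_k\|_* \le \rho$. That is, $\{u_k\}_{k \in \mathbb{N}}$ is a bounded sequence and therefore it has a convergent subsequence. Namely, there is an increasing injection $\pi \colon \mathbb{N} \to \mathbb{N}$ such that $\lim_{k \to \infty} u_{\pi(k)} = \bar{u}$ for some $\bar{u} \in E$ with $\|\bar{u}\|_* \le \rho$. Moreover, since $f$ is closed, by Theorem 3.2.6 we have

$$\lim_{k \to \infty} f(x_{\pi(k)}) = f(\bar{x}). \qquad (3.11)$$

Finally, by the subgradient inequality we have, for every $k \in \mathbb{N}$ and $z \in E$,

$$f(z) \ge f(x_{\pi(k)}) + \langle u_{\pi(k)}, z - x_{\pi(k)} \rangle.$$

Taking the limit for $k$ tending to $+\infty$ in the above inequality together with (3.11) (and since the inner product is a continuous function) yields

$$f(z) \ge f(\bar{x}) + \langle \bar{u}, z - \bar{x} \rangle, \qquad \forall z \in E,$$

that is, $\bar{u} \in \partial f(\bar{x})$.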
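Theorem 3.8.4 can be illustrated with the function $f(x) = \|x\|_1$ on $\mathbb{R}^d$, chosen here purely as an example. This $f$ is $1$-Lipschitz w.r.t. the $\ell_1$-norm and $d$-Lipschitz w.r.t. the $\ell_\infty$-norm, so its subgradients should have $\ell_\infty$-norm at most $1$ and $\ell_1$-norm at most $d$ (the respective dual norms). The sketch below checks this on random points; all names and the sampling scheme are illustrative assumptions.

```python
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm_1(x):
    return sum(abs(c) for c in x)

def norm_inf(x):
    return max(abs(c) for c in x)

def f(x):
    """f(x) = ||x||_1, a closed convex function with dom f = R^d."""
    return norm_1(x)

def subgradient(x):
    """One subgradient of f at x: the coordinate-wise sign (0 where x_i = 0)."""
    return [float((c > 0) - (c < 0)) for c in x]

def diff(x, y):
    return [a - b for a, b in zip(x, y)]

random.seed(1)
d = 4
pairs = [([random.uniform(-1, 1) for _ in range(d)],
          [random.uniform(-1, 1) for _ in range(d)]) for _ in range(200)]

# Largest observed ratio |f(x) - f(y)| / ||x - y|| under the two norms.
ratio_l1 = max(abs(f(x) - f(y)) / norm_1(diff(x, y)) for x, y in pairs)
ratio_linf = max(abs(f(x) - f(y)) / norm_inf(diff(x, y)) for x, y in pairs)

# Dual norms of the subgradients: l_inf is dual to l_1, and l_1 to l_inf.
dual_for_l1 = max(norm_inf(subgradient(x)) for x, _ in pairs)
dual_for_linf = max(norm_1(subgradient(x)) for x, _ in pairs)
print(ratio_l1, dual_for_l1, ratio_linf, dual_for_linf)
```

The same function thus has small subgradients under one norm and large ones under another, which is the freedom the player may try to exploit.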

In Chapter 6, we will look at (semi-)norms on $\mathbb{R}^d$ which have a special form: they are the $\ell_2$-norm skewed by a positive semi-definite matrix $A \in \mathbb{S}^d_+$. Formally, for every $A \in \mathbb{S}^d_+$, define the (semi-)norm induced by $A$ by

$$\|x\|_A := \sqrt{x^{\mathsf T} A x}, \qquad \forall x \in \mathbb{R}^d.$$

The next lemma shows that the functions we have just defined are indeed semi-norms. Moreover, it shows that in the case of positive definite matrices the above functions are indeed norms, and that the dual norm of a norm induced by $A \in \mathbb{S}^d_{++}$ is the norm induced by the inverse matrix $A^{-1}$.

$^5$ Recall that any norm on $E$ is a continuous function.

Lemma 3.8.5. Let $A \in \mathbb{S}^d_+$. Then $\|\cdot\|_A$ is a semi-norm, and if $A \succ 0$, then $\|\cdot\|_A$ is actually a norm whose dual norm is $\|\cdot\|_{A^{-1}}$.

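Proof. Since $A \succeq 0$, by Proposition 1.1.4 the matrix $A^{1/2} \in \mathbb{S}^d_+$ exists and is unique. Thus, $\|v\|_A = \|A^{1/2} v\|_2$ for any $v \in \mathbb{R}^d$. Since $v \in \mathbb{R}^d \mapsto A^{1/2} v$ is a linear function, it is clear that $v \in \mathbb{R}^d \mapsto \|A^{1/2} v\|_2$ is non-negative everywhere on $\mathbb{R}^d$ and that it satisfies properties (ii) and (iii) from the definition of norm. Suppose $A \succ 0$. Then $A^{1/2}$ is invertible, and for any $v \in \mathbb{R}^d$ we have $A^{1/2} v = 0$ if and only if $v = A^{-1/2} 0 = 0$. Thus, in this case, $\|\cdot\|_A$ is a norm on $\mathbb{R}^d$. Finally, by Proposition 1.1.4 we have $(A^{-1})^{1/2} = A^{-1/2} = (A^{1/2})^{-1}$. With this, note that for every $x \in \mathbb{R}^d$ there is $y := A^{1/2} x \in \mathbb{R}^d$ such that $x = A^{-1/2} y$. Thus, $\mathbb{R}^d = \{A^{-1/2} y : y \in \mathbb{R}^d\}$. Therefore, by Theorem 3.8.2 and since the $\ell_2$-norm is dual to itself we have, for any $x^* \in \mathbb{R}^d$,

$$\begin{aligned}
\big(\tfrac{1}{2}\|\cdot\|_A^2\big)^*(x^*)
&= \sup_{x \in \mathbb{R}^d} \big( (x^*)^{\mathsf T} x - \tfrac{1}{2}\|x\|_A^2 \big)
 = \sup_{y \in \mathbb{R}^d} \big( (x^*)^{\mathsf T} A^{-1/2} y - \tfrac{1}{2}\|A^{-1/2} y\|_A^2 \big) \\
&= \sup_{y \in \mathbb{R}^d} \big( (A^{-1/2} x^*)^{\mathsf T} y - \tfrac{1}{2} y^{\mathsf T} (A^{-1/2} A A^{-1/2}) y \big) \\
&= \sup_{y \in \mathbb{R}^d} \big( (A^{-1/2} x^*)^{\mathsf T} y - \tfrac{1}{2} y^{\mathsf T} y \big) \\
&= \sup_{y \in \mathbb{R}^d} \big( (A^{-1/2} x^*)^{\mathsf T} y - \tfrac{1}{2}\|y\|_2^2 \big) \\
&\overset{\text{Thm 3.8.2}}{=} \tfrac{1}{2}\|A^{-1/2} x^*\|_2^2 = \tfrac{1}{2}\|x^*\|_{A^{-1}}^2.
\end{aligned}$$

Thus, by Theorem 3.8.2 we conclude that the norm dual to $\|\cdot\|_A$ is $\|\cdot\|_{A^{-1}}$.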
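The duality between $\|\cdot\|_A$ and $\|\cdot\|_{A^{-1}}$ can be sanity-checked on a $2 \times 2$ example. The sketch below uses a hand-picked positive definite matrix and illustrative helper names (`mat_vec`, `inverse_2x2`, `norm_A`), none of which come from the text. It verifies that the point $x := A^{-1} x^* / \|x^*\|_{A^{-1}}$ has $\|x\|_A = 1$ and attains $\langle x^*, x \rangle = \|x^*\|_{A^{-1}}$, and that points on a sampled $\|\cdot\|_A$-unit sphere do no better.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    return [inner(row, x) for row in A]

def inverse_2x2(A):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_A(A, x):
    """||x||_A = sqrt(x^T A x) for a positive semi-definite matrix A."""
    return math.sqrt(inner(x, mat_vec(A, x)))

A = [[2.0, 1.0], [1.0, 2.0]]          # positive definite (eigenvalues 1 and 3)
A_inv = inverse_2x2(A)
x_star = [1.0, -3.0]

dual_value = norm_A(A_inv, x_star)     # ||x*||_{A^{-1}}
witness = [c / dual_value for c in mat_vec(A_inv, x_star)]

# Sample the ||.||_A-unit sphere and record the best value of <x*, x>.
best_sampled = 0.0
for k in range(3600):
    t = 2.0 * math.pi * k / 3600
    direction = [math.cos(t), math.sin(t)]
    scale = norm_A(A, direction)
    x = [c / scale for c in direction]
    best_sampled = max(best_sampled, inner(x_star, x))
print(dual_value, inner(x_star, witness), best_sampled)
```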

Finally, at some points of the text we shall need some norms for the space of real square matrices. Among the better-known norms for matrices are the operator norms, which are based on norms for $\mathbb{R}^d$. In this text we shall restrict our attention to the operator norm induced by the $\ell_2$-norm. Formally, the operator norm (induced by the $\ell_2$-norm) of $A \in \mathbb{R}^{d \times d}$ is

$$\|A\|_2 := \max\{ \|A x\|_2 : x \in \mathbb{R}^d, \ \|x\|_2 \le 1 \}.$$

The next lemma shows useful connections between the operator norm of a matrix and its eigenvalues. We skip its proof for the sake of conciseness.

Lemma 3.8.6 ([39, Example 5.6.6]). If $A \in \mathbb{S}^d$, then $\|A\|_2 = \max\{|\lambda_1(A)|, |\lambda_d(A)|\}$. In particular, if $A \succeq 0$, then $\|A\|_2 \le \operatorname{Tr}(A)$.
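For symmetric $2 \times 2$ matrices both sides of Lemma 3.8.6 are easy to compute, which gives a quick numerical illustration. The sketch below is an assumption-laden example rather than a general routine: it uses the closed-form eigenvalues of a symmetric $2 \times 2$ matrix, approximates the maximum in the definition of $\|A\|_2$ by sampling the unit circle (where the maximum over the unit ball is attained), and all names and sample matrices are illustrative.

```python
import math

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm_2(x):
    return math.sqrt(sum(c * c for c in x))

def eigenvalues_sym_2x2(A):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean + radius, mean - radius

def operator_norm_sym_2x2(A):
    """Operator norm via Lemma 3.8.6: largest absolute value of an extreme eigenvalue."""
    lam_max, lam_min = eigenvalues_sym_2x2(A)
    return max(abs(lam_max), abs(lam_min))

def operator_norm_sampled(A, samples=3600):
    """Approximate max{||Ax||_2 : ||x||_2 <= 1} by sampling the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        best = max(best, norm_2(mat_vec(A, [math.cos(t), math.sin(t)])))
    return best

A = [[2.0, 1.0], [1.0, 2.0]]     # positive definite, trace 4
B = [[1.0, 2.0], [2.0, -3.0]]    # symmetric but indefinite
trace_A = A[0][0] + A[1][1]
print(operator_norm_sym_2x2(A), trace_A)
print(operator_norm_sym_2x2(B), operator_norm_sampled(B))
```

For the indefinite matrix `B` the operator norm is determined by the negative eigenvalue, which is why the lemma takes absolute values of both extreme eigenvalues.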