
In this section we describe the online convex optimization setting, which may be seen as a special case of the online learning setting. Let us first describe the setting in an intuitive way, leaving the formalization for later. Recall from Section 1.1 that, throughout the text, E denotes an arbitrary Euclidean space (a finite-dimensional real vector space equipped with an inner product), and we denote its inner product by ⟨·,·⟩.

Similarly to the online learning setting, the OCO framework is a game played in rounds by a player and an enemy. At round t, the player picks a point x_t from a convex set X ⊆ E, and the enemy simultaneously picks a convex function¹⁴ f_t : E → (−∞,+∞] from some set F. At the end of the round, the player suffers the loss f_t(x_t). As in the online learning setting, at round t the player knows the previous functions f_1, …, f_{t−1} ∈ F played by the enemy, and the enemy knows the previous points x_1, …, x_{t−1} ∈ X picked by the player. The goal of the player is to minimize, in some sense, the cumulative loss suffered along a sequence of T rounds. As one may already guess from the results and discussions of Section 2.3, minimizing the raw cumulative loss is impossible against adversarial enemy oracles. Thus, we shall define regret for OCO in a way analogous to the regret of the online learning setting. Let us now formalize this setting.

Definition 2.5.1 (Online (convex) optimization instance). An online optimization instance is a pair (X, F) where X ⊆ E is nonempty and F ⊆ (−∞,+∞]^E is a set of functions such that¹⁵ X ⊆ dom f for every f ∈ F; it is an online convex optimization (OCO) instance if X and each f ∈ F are convex.
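To make Definition 2.5.1 concrete, here is a toy instance written as a Python sketch (the helper names in_X and make_f are ours, purely for illustration): X is the Euclidean unit ball in ℝ² and F is a family of linear functions, so both X and every f ∈ F are convex, and every f is finite on X.

    import numpy as np

    # Toy OCO instance: X is the unit ball in R^2, and F consists of the
    # linear functions w -> c^T w. Linear functions are convex and finite
    # everywhere, so X ⊆ dom f holds for every f in F.
    def in_X(w: np.ndarray) -> bool:
        return float(np.linalg.norm(w)) <= 1.0

    def make_f(c: np.ndarray):
        # A member of F, parameterized by the vector c.
        return lambda w: float(c @ w)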

Let C := (X, F) be an online optimization instance. We associate with C the function¹⁶ OCO_C, which takes the following parameters:

• PLAYER : Seq(F) → X, which we call a player oracle;

• ENEMY : Seq(X) → F, which we call an enemy oracle;

• T ∈ ℕ, which we call the number of rounds or iterations,

and outputs a point in Seq(X) × Seq(F). As in the case of online learning, we define the function OCO_C in an iterative way in Algorithm 2.3.


¹⁴ We will use extended-real-valued functions, a convention justified in Chapter 3.

¹⁵ We impose this condition on the effective domain of the functions since it would be mildly unfair to the player to make her suffer infinite loss in a single round.

¹⁶ Although this function can be used for non-convex online optimization instances, we stick with the name OCO since the convex case is our main focus, with sporadic mentions of online optimization in its general form.

Algorithm 2.3 Definition of OCO_C(PLAYER, ENEMY, T)

Input:
(i) an OCO instance C,
(ii) player and enemy oracles for C, denoted by PLAYER and ENEMY, respectively, and
(iii) a number T ∈ ℕ of rounds.

Output: (x, f) ∈ X^T × F^T.

for t = 1 to T do
    x_t ← PLAYER(⟨f_1, …, f_{t−1}⟩)
    f_t ← ENEMY(⟨x_1, …, x_{t−1}⟩)
return (x, f)

For t ∈ ℕ \ {0}, we consider the t-th round to be the iteration of Algorithm 2.3 in which the t-th elements of the sequences picked by the oracles are defined. Even though this is intuitive in Algorithm 2.3, one may get confused later in the text when we define more complex algorithms and start talking about their actions on round t.
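For concreteness, the loop of Algorithm 2.3 can be transcribed as a short Python sketch; rendering the oracles Seq(F) → X and Seq(X) → F as callables on lists is our own choice of encoding.

    from typing import Callable, List, Sequence, Tuple, TypeVar

    P = TypeVar("P")  # type of the points in X
    G = TypeVar("G")  # type of the functions in F

    def oco(player: Callable[[Sequence[G]], P],
            enemy: Callable[[Sequence[P]], G],
            T: int) -> Tuple[List[P], List[G]]:
        """Sketch of OCO_C(PLAYER, ENEMY, T) as in Algorithm 2.3."""
        xs: List[P] = []
        fs: List[G] = []
        for _ in range(T):
            x_t = player(fs)  # the player sees only f_1, ..., f_{t-1}
            f_t = enemy(xs)   # the enemy sees only x_1, ..., x_{t-1}: simultaneous play
            xs.append(x_t)
            fs.append(f_t)
        return xs, fs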

Definition 2.5.2 (Regret for online convex optimization). Let C := (X, F) be an online optimization instance and let T ∈ ℕ. The regret of x ∈ X^T with respect to f ∈ F^T and to a point u ∈ E is

Regret(x, f, u) := ∑_{t=1}^T ( f_t(x_t) − f_t(u) ),

and the regret of x ∈ X^T w.r.t. f ∈ F^T and to a set U ⊆ E is

Regret(x, f, U) := sup_{u ∈ U} Regret(x, f, u).

Moreover, let PLAYER be a player oracle for C and define x′_t := PLAYER(⟨f_1, …, f_{t−1}⟩) for each t ∈ [T]. Then, the regret of PLAYER with respect to f ∈ F^T and to u ∈ E is

Regret(PLAYER, f, u) := Regret(x′, f, u),

and the regret of PLAYER w.r.t. f ∈ F^T and to a set U ⊆ E is

Regret(PLAYER, f, U) := Regret(x′, f, U).

Finally, let ENEMY be an enemy oracle for C and define the pair of sequences (x″, f′) := OCO_C(PLAYER, ENEMY, T). Then, the regret of PLAYER in T rounds w.r.t. ENEMY and to u ∈ E is

Regret_T^C(PLAYER, ENEMY, u) := Regret(x″, f′, u),

and the regret of PLAYER in T rounds w.r.t. ENEMY and to U ⊆ E is

Regret_T^C(PLAYER, ENEMY, U) := Regret(x″, f′, U),

where we omit C from the notation of regret when it is clear from context.
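In code, Definition 2.5.2 amounts to the following sketch (we assume the comparison set U is finite so that the supremum becomes a maximum; for an infinite U it must be computed analytically).

    def regret(xs, fs, u):
        # Regret of the points xs w.r.t. the functions fs and a fixed point u.
        return sum(f(x) - f(u) for x, f in zip(xs, fs))

    def regret_vs_set(xs, fs, U):
        # Regret w.r.t. a set of comparison points, here assumed finite.
        return max(regret(xs, fs, u) for u in U)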

It is interesting to note that the regret for online optimization is computed by comparing the loss of the player with that of fixed points, whereas in the case of online learning, regret is computed with respect to the loss of other functions (or hypotheses). Although this may seem arbitrary at first, the next theorem shows that the online optimization framework is a special case of the online learning setting with a nature oracle that is just a constant function. On account of this nature oracle, each hypothesis in the regret from online learning evaluates to only one point. Thus, the regret for these online learning problems is exactly the regret defined here for online optimization instances.

Theorem 2.5.3. Let C := (X, F) be an online optimization instance and define the online learning instance P := ({0}, X, F, L), where¹⁷ L(x, f) := f(x) for every (x, f) ∈ X × F. Moreover, let PLAYER_OCO and ENEMY_OCO be player and enemy oracles for C, respectively, and let T ∈ ℕ. Then, there are nature, player, and enemy oracles NATURE, PLAYER_OL, and ENEMY_OL for P, respectively, such that

(0, x, f) = OL_P(NATURE, PLAYER_OL, ENEMY_OL, T),    (2.5)

where (x, f) := OCO_C(PLAYER_OCO, ENEMY_OCO, T) and 0 is a properly-sized sequence with all entries equal to 0. Additionally, for every u ∈ E we have Regret(PLAYER_OCO, f, u) = Regret(0, PLAYER_OL, f, h_u, L), where h_u(0) := u.

Proof. Define the nature, player, and enemy oracles NATURE, PLAYER_OL, and ENEMY_OL for P by

NATURE(t) := 0 for every t ∈ ℕ,
PLAYER_OL(0, f) := PLAYER_OCO(f) for every t ∈ ℕ \ {0} and f ∈ F^{t−1}, and
ENEMY_OL(0, x) := ENEMY_OCO(x) for every t ∈ ℕ \ {0} and x ∈ X^{t−1}.

By the definition of these oracles and of the functions OL_P and OCO_C, it is clear that (2.5) holds.

Moreover, if x and f are as in (2.5), if u ∈ E, and if h_u(0) := u, then

Regret(PLAYER_OCO, f, u) = ∑_{t=1}^T ( f_t(x_t) − f_t(u) )
                         = ∑_{t=1}^T ( L(x_t, f_t) − L(u, f_t) )
                         = ∑_{t=1}^T ( L(x_t, f_t) − L(h_u(0), f_t) )
                         = Regret(0, PLAYER_OL, f, h_u, L).
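The oracle construction in this proof is mechanical, and a Python sketch makes the point plainly (function names ours): the OL oracles simply discard the constant nature signal and forward everything else to the OCO oracles.

    def make_ol_oracles(player_oco, enemy_oco):
        """Wrap OCO oracles as the OL oracles of the proof of Theorem 2.5.3."""
        def nature(t):
            return 0                   # NATURE(t) := 0 for every t

        def player_ol(zeros, fs):
            return player_oco(fs)      # PLAYER_OL(0, f) := PLAYER_OCO(f)

        def enemy_ol(zeros, xs):
            return enemy_oco(xs)       # ENEMY_OL(0, x) := ENEMY_OCO(x)

        return nature, player_ol, enemy_ol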

The above result formally proves what we commented earlier: online (convex) optimization is a special case of online learning. Still, what we want to do is to model problems from online learning in the online convex optimization framework. The reason is that, as we will see in later chapters, there are player oracles for online convex optimization instances which, under some mild assumptions, have regret upper bounds that grow sublinearly with the number of rounds.

Some problems from the online learning setting fit almost seamlessly into the online optimization setting. For example, the next proposition shows how to model online linear regression as an online optimization instance. Moreover, one may note that if the loss function L in the proposition is convex w.r.t. its first argument (that is, L(·, α) is convex for any α ∈ ℝ), the online optimization instance given is actually an OCO instance. Later, we will see one reduction of a problem from OL to OCO where convexity is essential.

Proposition 2.5.4. Let P := (ℝ^d, ℝ, ℝ, L) be an instance of online linear regression and let C := (ℝ^d, F) be an online optimization instance where

F := { w ∈ ℝ^d ↦ L(wᵀx, y) : x ∈ ℝ^d, y ∈ ℝ }.

¹⁷ One may note that there are many instances of online learning in which the loss function only evaluates one of the arguments at the other, which is the case here.

Finally, let PLAYER_OCO be a player oracle for C, let W ⊆ ℝ^d, and set H := { x ∈ ℝ^d ↦ wᵀx : w ∈ W }.

Then, there exists a player oracle PLAYER_OL for P such that, for any T ∈ ℕ and any sequences x ∈ (ℝ^d)^T and y ∈ ℝ^T, there is f ∈ F^T such that Regret(x, PLAYER_OL, y, H) = Regret(PLAYER_OCO, f, W).

Proof. For every x ∈ ℝ^d and y ∈ ℝ, define the function f_{(x,y)} : ℝ^d → ℝ by f_{(x,y)}(w) := L(wᵀx, y) for every w ∈ ℝ^d. Moreover, define the player oracle PLAYER_OL for P by

PLAYER_OL(x, y) := PLAYER_OCO(⟨f_{(x_1,y_1)}, …, f_{(x_{T−1},y_{T−1})}⟩)ᵀ x_T,

for every T ∈ ℕ and all sequences x ∈ (ℝ^d)^T and y ∈ ℝ^T.

For each w ∈ ℝ^d, set h_w(x) := wᵀx for every x ∈ ℝ^d. Let T ∈ ℕ, let x ∈ (ℝ^d)^T, let y ∈ ℝ^T, and define

d_t := PLAYER_OL(x_{1:t}, y_{1:t−1}) for each t ∈ [T],
R_OL := Regret(x, PLAYER_OL, y, h_w, L),
f_t := f_{(x_t,y_t)} for each t ∈ [T],
w_t := PLAYER_OCO(f_{1:t−1}) for each t ∈ [T].

Let w ∈ ℝ^d. In this case, we have

R_OL = ∑_{t=1}^T L(d_t, y_t) − ∑_{t=1}^T L(h_w(x_t), y_t)
     = ∑_{t=1}^T L(w_tᵀx_t, y_t) − ∑_{t=1}^T L(wᵀx_t, y_t)
     = ∑_{t=1}^T f_{(x_t,y_t)}(w_t) − ∑_{t=1}^T f_{(x_t,y_t)}(w)
     = Regret(PLAYER_OCO, f_{(x,y)}, w).

Let us look at one final example of an online learning problem which can be easily modeled as an online convex optimization problem. One may note that in this case convexity is fundamental for the reduction to yield an interesting relation between the regrets of both instances. Consider an instance of the prediction with expert advice problem P := (A^E, A, Y, L) such that A is convex and L is convex w.r.t. its first argument. This case is interesting because the player can pick a convex combination of the experts' advice and still have some information about the loss incurred by this point. Without any structure on A, the player is virtually forced to follow only one of the experts at each round (unless the player has some kind of prior information about the game), and the enemy can exploit this fact, as we have seen earlier in the impossibility results. The next proposition shows that player oracles for a closely related online convex optimization problem yield player oracles for this convex version of the prediction with expert advice problem. Recall from Section 1.1 that if E is a finite set, then ∆_E := { p ∈ [0,1]^E : 1ᵀp = 1 } denotes the simplex on the space ℝ^E.

Proposition 2.5.5. Let P := (A^E, A, Y, L) be an instance of prediction with expert advice such that A ⊆ E is a convex set and L : A × Y → ℝ is convex w.r.t. its first argument¹⁸, and let C := (∆_E, F) be an OCO problem where

F := { p ∈ ℝ^E ↦ pᵀc : c ∈ [−1,1]^E }.

Finally, let PLAYER_OCO be a player oracle for C, let U := { e_i ∈ {0,1}^E : i ∈ E }, and define the hypothesis set H := { x ∈ A^E ↦ x(i) : i ∈ E }. Then, there exists a player oracle PLAYER_OL for P such that, for any T ∈ ℕ and any sequences x ∈ (A^E)^T and y ∈ Y^T, there is f ∈ F^T such that Regret(x, PLAYER_OL, y, H) ≤ Regret(PLAYER_OCO, f, U).

¹⁸ That is, L(·, y) is convex for any y ∈ Y.

Proof. For every x ∈ A^E and y ∈ Y, define c(x, y) ∈ [−1,1]^E by (c(x, y))_e := L(x(e), y), ∀e ∈ E.

Define the player oracle PLAYER_OL for P given, for every T ∈ ℕ, x ∈ (A^E)^T, and y ∈ Y^{T−1}, by PLAYER_OL(x, y) := PLAYER_OCO(f′)ᵀ x_T, where f′ ∈ F^{T−1} is given by

f′_t(z) := c(x_t, y_t)ᵀz for each z ∈ ℝ^E and t ∈ {1, …, T−1}.

Let T ∈ ℕ, and let both x ∈ (A^E)^T and y ∈ Y^T be arbitrary sequences of length T. Moreover, define

d_t := PLAYER_OL(x_{1:t}, y_{1:t−1}), ∀t ∈ [T],
f_t(z) := c(x_t, y_t)ᵀz, ∀z ∈ ℝ^E, ∀t ∈ [T],
z_t := PLAYER_OCO(⟨f_1, …, f_{t−1}⟩), ∀t ∈ [T].

Finally, let i ∈ E, define h(x) := x(i) for every x ∈ A^E, and set R_OL := Regret(x, PLAYER_OL, y, h).

Then,

R_OL = ∑_{t=1}^T [ L(d_t, y_t) − L(x_t(i), y_t) ]
     = ∑_{t=1}^T [ L(z_tᵀx_t, y_t) − L(x_t(i), y_t) ]                    (by the def. of PLAYER_OL)
     ≤ ∑_{t=1}^T [ ∑_{e∈E} z_t(e) L(x_t(e), y_t) − L(x_t(i), y_t) ]      (by the convexity of L(·, y_t))
     = ∑_{t=1}^T [ c(x_t, y_t)ᵀz_t − c(x_t, y_t)ᵀe_i ]                   (by the def. of c(x_t, y_t))
     = ∑_{t=1}^T [ f_t(z_t) − f_t(e_i) ] = Regret(PLAYER_OCO, f, e_i)    (by the def. of f_t).
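To close, a sketch of this last construction with illustrative names of our own: the losses of all experts are collected into the cost vector c(x_t, y_t), PLAYER_OCO picks a distribution p ∈ ∆_E, and PLAYER_OL plays the convex combination pᵀx_T of the current advice.

    import numpy as np

    def make_cost_vector(x, y, L):
        # c(x, y) in [-1, 1]^E: the loss each expert's advice would incur.
        return np.array([L(advice, y) for advice in x])

    def experts_player_from_oco(player_oco, L):
        # The PLAYER_OL of the proof: weight the experts' current advice by the
        # distribution chosen by PLAYER_OCO on the past linear cost functions.
        def player_ol(xs, ys):
            fs = [(lambda z, c=make_cost_vector(x, y, L): float(c @ z))
                  for x, y in zip(xs[:-1], ys)]
            p = player_oco(fs)            # a point of the simplex Delta_E
            return p @ np.array(xs[-1])   # convex combination of the advice
        return player_ol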