6.2 The max-min approach

6.2.2 Analysis

The goal of MIN is to find the best essential infimum of the arms, where the essential infimum is defined as follows.

Definition 6.2.1. Let $\nu$ be a probability distribution and $X \sim \nu$ a real random variable. The essential infimum $a_\nu$ of $\nu$ is defined by
$$a_\nu \stackrel{\text{def}}{=} \max \, \{ a \in \mathbb{R} \,:\, P(X < a) = 0 \}$$
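For instance (an illustrative case, not from the text), if $\nu$ is the uniform distribution on $[0.2, 1]$, then $P(X < a) = 0$ for every $a \le 0.2$ while $P(X < a) > 0$ for every $a > 0.2$, so that $a_\nu = 0.2$.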

Let us make the mild assumption that the distribution $\nu$ satisfies Equation 6.2, as illustrated in Fig. 6.1. Then the empirical min taken over a uniform sampling according to $\nu$ converges exponentially fast toward the essential infimum.

Lemma 6.2.1. Let $\nu$ be a bounded distribution with support in $[0, 1]$, with $a$ its essential infimum, and assume that $\nu$ satisfies:
$$\exists A > 0, \; \forall \varepsilon > 0, \quad P(X \le a + \varepsilon) \ge A\varepsilon, \quad \text{with } X \sim \nu \tag{6.2}$$
Let $x_1 \ldots x_t$ be a $t$-sample independently drawn from $\nu$. Then the minimum value over $x_u$, $u = 1 \ldots t$, goes exponentially fast to $a$:
$$P\Big(\min_{1 \le u \le t} x_u \ge a + \varepsilon\Big) \le \exp(-tA\varepsilon) \tag{6.3}$$

Proof. As the $x_u$ are iid, it follows that:
$$P\Big(\min_{1 \le u \le t} x_u \ge a + \varepsilon\Big) = P\big(\forall u \in \{1, \ldots, t\}, \; x_u \ge a + \varepsilon\big) = \prod_{u=1}^{t} P(x_u \ge a + \varepsilon) \le (1 - A\varepsilon)^t \le \exp(-tA\varepsilon)$$
where the first inequality follows from Equation 6.2 and the last from $(1 - z) \le \exp(-z)$.
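As a sanity check, the rate in Equation 6.3 can be observed numerically. The sketch below uses illustrative parameters that are not from the text: $\nu$ is taken uniform on $[a, 1]$, for which $P(X \le a + \varepsilon) = \varepsilon/(1-a)$, so Equation 6.2 holds with $A = 1/(1-a)$.

import numpy as np

# Monte Carlo check of Lemma 6.2.1 (illustrative parameters).
# nu = uniform on [a, 1]: P(X <= a + eps) = eps / (1 - a),
# so Equation 6.2 holds with A = 1 / (1 - a).
rng = np.random.default_rng(0)
a = 0.2
A = 1.0 / (1.0 - a)
eps = 0.05
n_runs = 20_000

for t in (10, 50, 100, 200):
    x = rng.uniform(a, 1.0, size=(n_runs, t))
    p_hat = (x.min(axis=1) >= a + eps).mean()  # empirical P(min >= a + eps)
    bound = np.exp(-t * A * eps)               # right-hand side of Equation 6.3
    print(f"t={t:4d}  P(min >= a+eps) ~= {p_hat:.4f}  <=  bound {bound:.4f}")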

[Figure 6.1: Illustration of an example of a distribution satisfying the assumption of Equation 6.2; the support is contained in $[0, 1]$.]

The assumption supporting the above result (Equation 6.2, illustrated in Figure 6.1) does not require the positive constant $A$ to be known; it only requires that there is enough probability mass in the neighborhood of $a$. Equation 6.3 confirms the exponential convergence toward $a$ as a function of $A$.

A surprising result is that, under this assumption, the convergence toward the minimum might be faster than the convergence toward the mean. Specifically, the Hoeffding bound on the convergence toward the mean decreases exponentially like $\exp(-2t\varepsilon^2)$, whereas by Equation 6.3 the convergence toward the min decreases exponentially like $\exp(-tA\varepsilon)$ (as $\varepsilon$ goes to 0, $A\varepsilon \gg \varepsilon^2$).
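To fix orders of magnitude (illustrative values, not from the text): with $t = 100$, $\varepsilon = 0.05$ and $A = 1$, the Hoeffding bound gives $\exp(-2t\varepsilon^2) = \exp(-0.5) \approx 0.61$, whereas Equation 6.3 gives $\exp(-tA\varepsilon) = \exp(-5) \approx 0.007$.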

Under this assumption, it follows without difficulty that, with high probability, the empirical min of each arm is exponentially close to its essential infimum after each arm has been tried $t$ times.

Lemma 6.2.2. Let $\nu_1 \ldots \nu_K$ denote $K$ distributions with bounded support in $[0, 1]$, with $a_i$ their essential infima. Assume that $\nu_i$ satisfies Equation 6.2 for some constant $A$, for $i = 1 \ldots K$. Denoting $x_{i,u}$, $u = 1 \ldots t$, $i = 1 \ldots K$, $t$ samples independently drawn from $\nu_i$, one has:
$$P\Big(\exists i \in \{1, \ldots, K\}, \; \min_{u \le t} x_{i,u} \ge a_i + \varepsilon\Big) \le K \exp(-tA\varepsilon) \tag{6.4}$$

Proof. By Lemma 6.2.1,
$$P\Big(\exists i \in \{1, \ldots, K\}, \; \min_{u \le t} x_{i,u} \ge a_i + \varepsilon\Big) \le 1 - \big(1 - (1 - A\varepsilon)^t\big)^K \le K (1 - A\varepsilon)^t \le K \exp(-tA\varepsilon)$$
where the second inequality follows from $(1 - z)^y \ge 1 - yz$ and the last from $(1 - z) \le \exp(-z)$, which concludes the proof.

Let us consider the two distinct goals of finding the arm with best expectation, and the arm with best essential infimum. If these goals are compatible (that is, the optimal arm in terms of min value also is the optimal arm in terms of mean value), then the MIN algorithm achieves a logarithmic regret under the above assumptions.
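The selection rule of MIN is not restated here; the proof below relies on the fact that a sub-optimal pull implies that the pulled arm has a higher empirical min than the optimal arm, which is consistent with a greedy rule pulling the arm with the highest empirical minimum. The following Python sketch implements that reading; the arm distributions, horizon, and all names are illustrative assumptions, not the authors' implementation.

import numpy as np

def min_policy(arms, horizon, rng):
    """Greedy max-min sketch: after one initial pull per arm, always pull
    the arm whose empirical minimum reward is currently the highest."""
    K = len(arms)
    emp_min = np.array([arm(rng) for arm in arms])  # one initial sample per arm
    pulls = np.ones(K, dtype=int)
    for _ in range(horizon - K):
        i = int(np.argmax(emp_min))       # arm with the best empirical min
        r = arms[i](rng)
        emp_min[i] = min(emp_min[i], r)   # empirical mins only decrease
        pulls[i] += 1
    return pulls

# Two illustrative arms on [0, 1]: arm 0 uniform on [0.4, 1] (a_0 = 0.4, mu_0 = 0.7),
# arm 1 uniform on [0, 1] (a_1 = 0, mu_1 = 0.5); arm 0 is optimal for both criteria.
arms = [lambda g: g.uniform(0.4, 1.0), lambda g: g.uniform(0.0, 1.0)]
print(min_policy(arms, horizon=10_000, rng=np.random.default_rng(1)))

On such an instance the sub-optimal arm is quickly abandoned: its empirical min falls below $a_0 = 0.4$ after a few pulls with high probability, consistent with the bound on sub-optimal pulls derived below.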

Proposition 6.2.1. Let $\nu_1 \ldots \nu_K$ denote $K$ distributions with bounded support in $[0, 1]$, with $\mu_i$ (resp. $a_i$) their mean (resp. their essential infimum). Further assume that $\nu_i$ satisfies Equation 6.2 for some constant $A$ for $i = 1 \ldots K$, and that the arm with best mean value $\mu^\star$ also is the arm with best min value $a^\star$. Let $\Delta_{\mu,i} = \mu^\star - \mu_i$ (resp. $\Delta_{a,i} = a^\star - a_i$) denote the mean-related (resp. essential infimum-related) margins.

Then, with probability at least $1 - \delta$, the cumulative pseudo-regret is upper bounded as follows:
$$R_t \le \frac{K-1}{A} \, \frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \, \log\Big(\frac{tK}{\delta}\Big) + (K-1)\Delta_{\mu,\max} \tag{6.5}$$
with $\Delta_{a,\min} = \min_{i : \Delta_{a,i} > 0} \Delta_{a,i}$ and $\Delta_{\mu,\max} = \max_{i : \Delta_{\mu,i} > 0} \Delta_{\mu,i}$.

Furthermore, the expectation of the cumulative pseudo-regret is upper-bounded as follows for $t$ sufficiently large ($t \ge \frac{K}{A}\frac{\Delta_{\mu,\max}}{\Delta_{a,\min}}$):
$$E[R_t] \le \frac{K-1}{A} \, \frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \left( \log\Big(\frac{t^2 K A}{K-1} \, \frac{\Delta_{a,\min}}{\Delta_{\mu,\max}}\Big) + 1 \right) + (K-1)\Delta_{\mu,\max} \tag{6.6}$$
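To make Equation 6.5 concrete (illustrative values, not from the text): with $K = 2$, $A = 1$, $\Delta_{\mu,\max} = 0.1$, $\Delta_{a,\min} = 0.2$, $\delta = 0.05$ and $t = 1000$, the bound reads $R_t \le \frac{1}{1} \cdot \frac{0.1}{0.2} \log\frac{2000}{0.05} + 0.1 \approx 0.5 \times 10.6 + 0.1 \approx 5.4$.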

Proof. Suppose that there exists a single optimal arm (this point will be discussed below). Taking inspiration from (Sani et al., 2012a), let $x_{i,u}$ be independent samples drawn from $\nu_i$, and define the event $E$ as follows:
$$E = \Big\{ \forall i \in \{1, \ldots, K\}, \; \forall u \in \{1, \ldots, t\}, \quad \min_{1 \le s \le u} x_{i,s} - a_i \le \frac{\varepsilon}{u} \Big\} \tag{6.7}$$
The probability of the complementary event $E^c$ is bounded after Lemma 6.2.2:
$$P(E^c) = P\Big(\exists i \in \{1, \ldots, K\}, \; \exists u \in \{1, \ldots, t\}, \; \min_{1 \le s \le u} x_{i,s} - a_i > \frac{\varepsilon}{u}\Big) \le \sum_{u=1}^{t} P\Big(\exists i \in \{1, \ldots, K\}, \; \min_{1 \le s \le u} x_{i,s} - a_i > \frac{\varepsilon}{u}\Big) \le \min\big(1, \, tK\exp(-A\varepsilon)\big)$$
where each term of the sum is bounded via Lemma 6.2.2 applied with $u$ samples and tolerance $\varepsilon/u$, yielding $K\exp(-u \cdot A\varepsilon/u) = K\exp(-A\varepsilon)$.

Let $t > 1$ be an iteration where a sub-optimal arm $i$ is selected; this implies that the empirical min of the $i$-th arm is higher than that of the best arm $i^\star$:
$$\min_{u \le N_{i^\star,t-1}} x_{i^\star,u} < \min_{u \le N_{i,t-1}} x_{i,u} \;\Longleftrightarrow\; \underbrace{\min_{u \le N_{i^\star,t-1}} x_{i^\star,u} - a_i}_{\ge \, a_{i^\star} - a_i \, = \, \Delta_{a,i}} \; < \; \underbrace{\min_{u \le N_{i,t-1}} x_{i,u} - a_i}_{\le \, \varepsilon / N_{i,t-1} \;\; (\star)}$$
where the left underbrace always holds since the empirical min of the optimal arm is at least its essential infimum $a_{i^\star}$, and $(\star)$ holds if $t$ belongs to the event set $E$, thus with probability at least $1 - tK\exp(-A\varepsilon)$ after Lemma 6.2.2.

It follows that with probability at least $1 - tK\exp(-A\varepsilon)$:
$$\Delta_{a,i} < \frac{\varepsilon}{N_{i,t-1}}, \quad \text{hence} \quad N_{i,t} \le \frac{\varepsilon}{\Delta_{a,i}} + 1 \quad \text{since } N_{i,t} \le N_{i,t-1} + 1.$$

With probability at least $1 - tK\exp(-A\varepsilon)$, the cumulative regret $R_t$ can thus be upper-bounded ($N_{i,t}$ denoting the number of times arm $i$ has been selected up to time $t$):
$$R_t = \sum_{i=1}^{K} N_{i,t} \, \Delta_{\mu,i} \le \sum_{i=1}^{K} \Big(\frac{\varepsilon}{\Delta_{a,i}} + 1\Big) \Delta_{\mu,i} \le (K-1)\Big(\frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \, \varepsilon + \Delta_{\mu,\max}\Big) \tag{6.8}$$
with $\Delta_{\mu,\max} = \max_{i : \Delta_{\mu,i} > 0} \Delta_{\mu,i}$ and $\Delta_{a,\min} = \min_{i : \Delta_{a,i} > 0} \Delta_{a,i}$, where the sums only involve the $K-1$ sub-optimal arms since $\Delta_{\mu,i^\star} = 0$.

Finally, by setting $\delta = \min(1, \, tK\exp(-A\varepsilon))$, i.e. $\varepsilon = \frac{1}{A}\log(tK/\delta)$, it follows that with probability $1 - \delta$,
$$R_t \le \frac{K-1}{A} \, \frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \, \log\Big(\frac{tK}{\delta}\Big) + (K-1)\Delta_{\mu,\max} \tag{6.9}$$
In the case where there exist $k > 1$ optimal arms, Eq. 6.9 still holds, by replacing the $K-1$ factor with $K-k$.

The expectation of the cumulative regret is similarly upper-bounded:
$$E[R_t] = E[R_t \mathbb{1}_E] + E[R_t \mathbb{1}_{E^c}] \le \frac{K-1}{A} \, \frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \, \log\Big(\frac{tK}{\delta}\Big) + (K-1)\Delta_{\mu,\max} + \delta t$$
by bounding $R_t$ by $t$ over $E^c$.

For $t$ sufficiently large ($t \ge \frac{K}{A}\frac{\Delta_{\mu,\max}}{\Delta_{a,\min}}$), by setting $\delta = \frac{K-1}{tA}\frac{\Delta_{\mu,\max}}{\Delta_{a,\min}}$, one obtains:
$$E[R_t] \le \frac{K-1}{A} \, \frac{\Delta_{\mu,\max}}{\Delta_{a,\min}} \left( \log\Big(\frac{t^2 K A}{K-1} \, \frac{\Delta_{a,\min}}{\Delta_{\mu,\max}}\Big) + 1 \right) + (K-1)\Delta_{\mu,\max} \tag{6.10}$$
which concludes the proof.

Remark 6.2.1. This result can be compared to the regret bound derived for the UCB algorithm, which similarly achieves a logarithmic regret (Auer et al., 2002):
$$E[R_t] \le 8 \sum_{i \ne i^\star} \frac{\log t}{\Delta_{\mu,i}} + \Big(1 + \frac{\pi^2}{3}\Big) \sum_{i=1}^{K} \Delta_{\mu,i} \tag{6.11}$$
where $i^\star$ stands for the index of the optimal arm. MIN and UCB thus both achieve a logarithmic regret uniformly over $t$, where the regret rate involves the mean-related margin in UCB (resp. the min-related margin in MIN, multiplied by the constant $A$).
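On the same illustrative instance as above ($K = 2$, a single sub-optimal arm with $\Delta_{\mu,i} = 0.1$, $\Delta_{a,\min} = 0.2$, $A = 1$), the leading term of Equation 6.11 is $\frac{8}{0.1}\log t = 80\log t$, while that of Equation 6.9 is $\frac{1}{1}\cdot\frac{0.1}{0.2}\log t = 0.5\log t$: a small mean-related margin inflates the UCB rate (it appears in the denominator) but deflates the MIN rate (it appears in the numerator).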

A stronger result can be obtained for MIN, under an additional assumption on the lower tails of the arm distributions.

Proposition 6.2.2. With the same notations and assumptions as in Prop. 6.2.1, let us further assume that for every $i = 1 \ldots K$, $\Delta_{\mu,i} = \mu^\star - \mu_i \le a^\star - a_i = \Delta_{a,i}$.

Then, with probability at least $1 - \delta$,
$$R_t \le \frac{K-1}{A} \log\Big(\frac{tK}{\delta}\Big) + (K-1)\Delta_{\mu,\max}$$
with $\Delta_{\mu,\max} = \max_i \Delta_{\mu,i}$.

Furthermore, if $t > \frac{K}{A}$, the expectation of $R_t$ is upper-bounded as follows:
$$E[R_t] \le \frac{K-1}{A} \left( \log\Big(\frac{t^2 K A}{K-1}\Big) + 1 \right) + (K-1)\Delta_{\mu,\max} \tag{6.12}$$

Proof. The proof closely follows that of Prop. 6.2.1, noting that in Eq. 6.8, $\Delta_{a,i}$ is now greater than $\Delta_{\mu,i}$. Setting $\delta = \frac{K-1}{tA}$ concludes the proof of Eq. 6.12.

Discussion. The comparison of UCB and MIN only makes sense when the two goals coincide, that is, when the same arm is optimal in terms of expectation and in terms of essential infimum. When this is the case, Eq. 6.12 and Eq. 6.11 suggest that MIN might outperform UCB when: i) the margins $\Delta_{\mu,i}$ are small; ii) the distributions $\nu_i$ are not too thin in the neighborhood of the essential infimum (that is, $A$ is not too small); and iii) the assumption $\Delta_{a,i} \ge \Delta_{\mu,i}$ holds.

Note that the latter assumption boils down to considering that better arms (in the sense of their mean) also have a narrower support for their lower tail, and thus a lower risk. If this assumption does not hold, however, then risk minimization and regret minimization are likely to be conflicting objectives.

A last remark is that the assumptions made (a lower-bounded distribution density in the neighborhood of the essential infimum, and a minimum-related margin greater than the mean-related margin) yield a significant improvement compared to the continuous distribution-free case, where the optimal regret is known to be $O(\sqrt{t})$ (Audibert and Bubeck, 2009, 2010).