Optimization Methods II.

EM algorithms. Exercises.

Anatoli Iambartsev

IME-USP

[RC] Example 5.16.

A classic example of the EM algorithm is a genetics problem [DLR] where observations $(x_1, x_2, x_3, x_4)$ are gathered from the multinomial distribution
$$(x_1, x_2, x_3, x_4) \sim \mathcal{M}\left(n;\ \frac12 + \frac{\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right),$$
with $n = x_1 + x_2 + x_3 + x_4$. Thus the observed likelihood is
$$L(\theta \mid x_1, x_2, x_3, x_4) \propto (2+\theta)^{x_1}\,\theta^{x_4}\,(1-\theta)^{x_2+x_3}.$$

Estimation is easier if the $x_1$ cell is split into two cells, so we create the augmented model
$$(z_1, z_2, x_2, x_3, x_4) \sim \mathcal{M}\left(n;\ \frac12,\ \frac{\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right),$$
with $x_1 = z_1 + z_2$. Thus the complete likelihood is
$$L^c(\theta \mid z_1, z_2, x_2, x_3, x_4) \propto \theta^{z_2+x_4}\,(1-\theta)^{x_2+x_3}.$$
Note that
$$Z_2 \mid x \sim \mathcal{B}\left(x_1, \frac{\theta}{\theta+2}\right) \quad\text{and}\quad \mathbb{E}_\theta(Z_2 \mid x) = \frac{\theta}{\theta+2}\,x_1.$$

The expected complete log-likelihood function is
$$\mathbb{E}_{\theta_0}\big[(Z_2 + x_4)\log\theta + (x_2+x_3)\log(1-\theta)\big] = \left(\frac{\theta_0}{\theta_0+2}\,x_1 + x_4\right)\log\theta + (x_2+x_3)\log(1-\theta),$$
which can easily be maximized in $\theta$, leading to the EM step
$$\hat\theta_1 = \left\{\frac{\theta_0 x_1}{2+\theta_0} + x_4\right\} \Big/ \left\{\frac{\theta_0 x_1}{2+\theta_0} + x_2 + x_3 + x_4\right\}.$$
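As a sanity check, the EM step above is easy to code. The following is a minimal Python sketch (not from the slides); it is run on the linkage counts (125, 18, 20, 34) commonly used with this example in [DLR], and the function names and stopping tolerance are my own choices.

```python
def em_step(theta, x):
    """One EM iteration: the E-step computes E(Z2 | x) = theta*x1/(2+theta),
    the M-step maximizes the expected complete log-likelihood in theta."""
    x1, x2, x3, x4 = x
    ez2 = theta * x1 / (2 + theta)              # E-step
    return (ez2 + x4) / (ez2 + x2 + x3 + x4)    # M-step

def em(x, theta0=0.5, tol=1e-12):
    """Iterate the EM step to (numerical) convergence."""
    theta = theta0
    while True:
        theta_new = em_step(theta, x)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new

x = (125, 18, 20, 34)   # classic genetic linkage counts
theta_hat = em(x)
```

The fixed point of the iteration agrees with the maximizer of the observed likelihood, roughly 0.6268 for these counts.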

A Monte Carlo EM solution would replace the expectation $\theta_0 x_1/(2+\theta_0)$ with the empirical average
$$\bar z_m = \frac1m \sum_{i=1}^m z_i,$$
where the $z_i$ are simulated from a binomial distribution $\mathcal{B}(x_1, \theta_0/(2+\theta_0))$ or, equivalently,
$$m\bar z_m \sim \mathcal{B}\big(m x_1, \theta_0/(2+\theta_0)\big).$$
The MCEM step would then be
$$\hat{\hat\theta} = \frac{\bar z_m + x_4}{\bar z_m + x_2 + x_3 + x_4} \longrightarrow \hat\theta \quad\text{as } m \to \infty.$$
This example is merely a formal illustration of the Monte Carlo EM algorithm and its convergence properties, since plain EM can be applied.
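A matching Monte Carlo E-step can be sketched by simulating the $z_i$ directly. This is my own minimal Python illustration, with the binomial draws built from Bernoulli trials so only the standard library is needed:

```python
import random

def mcem_step(theta, x, m, rng):
    """One MCEM iteration: E(Z2 | x) is replaced by the average of
    m draws z_i ~ Binomial(x1, theta/(2+theta))."""
    x1, x2, x3, x4 = x
    p = theta / (2 + theta)
    # each z_i is a sum of x1 Bernoulli(p) indicators
    draws = [sum(rng.random() < p for _ in range(x1)) for _ in range(m)]
    zbar = sum(draws) / m
    return (zbar + x4) / (zbar + x2 + x3 + x4)

x = (125, 18, 20, 34)          # same counts as before
rng = random.Random(0)
approx = mcem_step(0.5, x, m=2000, rng=rng)
```

With m = 2000 the result is close to the exact EM step from $\theta_0 = 0.5$, namely $(25 + 34)/(25 + 18 + 20 + 34) \approx 0.608$, and the Monte Carlo error shrinks as m grows.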

[RC] Example 5.17. The next example, however, details a situation in which the E-step is too complicated to be implemented and where the Monte Carlo EM algorithm provides a realistic (if not straightforward) alternative.

A simple random effect logit model processed in Booth and Hobert (1999) represents observations $y_{ij}$ ($i = 1, \dots, n$, $j = 1, \dots, m$) as distributed, conditionally on one covariate $x_{ij}$, according to the logit model
$$P(y_{ij} = 1 \mid x_{ij}, u_i, \beta) = \frac{\exp(\beta x_{ij} + u_i)}{1 + \exp(\beta x_{ij} + u_i)},$$
where $u_i \sim \mathcal{N}(0, \sigma^2)$ is an unobserved random effect. The vector of random effects $(U_1, \dots, U_n)$ therefore corresponds to the missing data $Z$. Considering $Q(\theta' \mid \theta, \mathbf{x}, \mathbf{y})$ with $\theta = (\beta, \sigma)$,
$$Q(\theta' \mid \theta, \mathbf{x}, \mathbf{y}) = \sum_{i,j} y_{ij}\,\mathbb{E}\big[\beta' x_{ij} + U_i \mid \beta, \sigma, \mathbf{x}, \mathbf{y}\big] - \sum_{i,j} \mathbb{E}\big[\log\big(1 + \exp(\beta' x_{ij} + U_i)\big) \mid \beta, \sigma, \mathbf{x}, \mathbf{y}\big] - \sum_i \mathbb{E}\big[U_i^2 \mid \beta, \sigma, \mathbf{x}, \mathbf{y}\big]\big/2(\sigma')^2 - n\log\sigma'.$$

Considering $Q(\theta' \mid \theta, \mathbf{x}, \mathbf{y})$ as above, it is impossible to compute the expectations in $U_i$. Were those available, the M-step would be almost straightforward, since maximizing $Q(\theta' \mid \theta, \mathbf{x}, \mathbf{y})$ in $\sigma'$ leads to
$$(\hat\sigma')^2 = \frac1n \sum_i \mathbb{E}\big[U_i^2 \mid \beta, \sigma, \mathbf{x}, \mathbf{y}\big] \quad (= \sigma^2\,?),$$
while maximizing $Q(\theta' \mid \theta, \mathbf{x}, \mathbf{y})$ in $\beta'$ produces the fixed-point equation
$$\sum_{i,j} y_{ij} x_{ij} = \sum_{i,j} x_{ij}\,\mathbb{E}\left[\frac{\exp(\beta' x_{ij} + U_i)}{1 + \exp(\beta' x_{ij} + U_i)} \,\Big|\, \beta, \sigma, \mathbf{x}, \mathbf{y}\right],$$
which is not particularly easy to solve in $\beta'$.

The alternative to EM is therefore to simulate the $U_i$'s conditional on $\beta, \sigma, \mathbf{x}, \mathbf{y}$ in order to replace the expectations above with Monte Carlo approximations. While a direct simulation from
$$\pi(u_i \mid \beta, \sigma, \mathbf{x}, \mathbf{y}) \propto \frac{\exp\big\{u_i \sum_j y_{ij} - u_i^2/2\sigma^2\big\}}{\prod_j \big[1 + \exp(\beta x_{ij} + u_i)\big]}$$
is feasible ([BH] Booth and Hobert, 1999), it requires some preliminary tuning better avoided at this stage, and it is thus easier to implement an MCMC version of the simulation of the $u_i$'s toward the approximation of both expectations.
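One standard MCMC choice here is a random-walk Metropolis sampler targeting $\pi(u_i \mid \beta, \sigma, \mathbf{x}, \mathbf{y})$. The sketch below is my own illustration, not the scheme of [BH]; the step size, starting point, and test values of the covariates and responses are arbitrary:

```python
import math
import random

def log_target(u, beta, sigma, x_i, y_i):
    """log pi(u_i | beta, sigma, x, y) up to an additive constant."""
    val = u * sum(y_i) - u * u / (2 * sigma ** 2)
    for xij in x_i:
        val -= math.log1p(math.exp(beta * xij + u))
    return val

def metropolis_u(beta, sigma, x_i, y_i, n_iter, rng, step=1.0):
    """Random-walk Metropolis chain for one random effect u_i."""
    u = 0.0
    logp = log_target(u, beta, sigma, x_i, y_i)
    chain = []
    for _ in range(n_iter):
        prop = u + rng.gauss(0.0, step)
        logp_prop = log_target(prop, beta, sigma, x_i, y_i)
        if math.log(rng.random()) < logp_prop - logp:   # accept/reject
            u, logp = prop, logp_prop
        chain.append(u)
    return chain

rng = random.Random(1)
chain = metropolis_u(beta=0.5, sigma=1.0, x_i=[0.1, 0.4, 0.9],
                     y_i=[1, 0, 1], n_iter=5000, rng=rng)
u2_hat = sum(u * u for u in chain) / len(chain)  # approximates E(U_i^2 | ...)
```

Averages such as `u2_hat` over the chain then stand in for the intractable conditional expectations in the M-step.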

[FZ] First Exercise.

"Suppose that the lifetime of litebulbs follows an exponential distribution with unknown mean θ. A total of M + N litebulbs are tested in two independent experiments. In the first experiment, with N bulbs, the exact lifetimes $y_1, \dots, y_N$ are recorded. In the second experiment, the experimenter enters the laboratory at some time t > 0, and all she registers is that some of the M litebulbs are still burning, while the others have expired. Thus, the results from the second experiment are right- or left-censored, and the available data are the indicators $E_1, \dots, E_M$:
$$E_i = \begin{cases} 1, & \text{if bulb } i \text{ is still burning}, \\ 0, & \text{if the light is out}. \end{cases}$$

The observed data from both experiments combined are denoted
$$\mathbf{y} = (y_1, \dots, y_N, E_1, \dots, E_M),$$
and the unobserved data are
$$X = (X_1, \dots, X_M).$$
The complete log-likelihood is
$$\ell^c(\theta; \mathbf{y}, X) = \log\left[\prod_{i=1}^N \frac{\exp(-y_i/\theta)}{\theta} \prod_{i=1}^M \frac{\exp(-X_i/\theta)}{\theta}\right] = -N\big(\ln\theta + \bar y/\theta\big) - \sum_{i=1}^M \big(\ln\theta + X_i/\theta\big),$$
which is linear in the unobserved $X_i$. But
$$\mathbb{E}(X_i \mid \mathbf{y}) = \mathbb{E}(X_i \mid E_i) = \begin{cases} t + \theta, & \text{if } E_i = 1, \\[4pt] \theta - \dfrac{t\exp(-t/\theta)}{1 - \exp(-t/\theta)}, & \text{if } E_i = 0. \end{cases}$$

The E-step consists of replacing $X_i$ by its expected value $\mathbb{E}(X_i \mid \mathbf{y})$ evaluated at the current value $\theta_t$. Denote by $Z = \sum_{i=1}^M E_i$ the number of bulbs still burning. Thus
$$Q(\theta \mid \theta_t) = \mathbb{E}\,\ell^c(\theta; \mathbf{y}, X) = -N\big(\ln\theta + \bar y/\theta\big) - \sum_{i=1}^M \big(\ln\theta + \mathbb{E}(X_i \mid E_i)/\theta\big)$$
$$= -(N+M)\ln\theta - \frac1\theta\left[N\bar y + Z(t+\theta_t) + (M-Z)\left(\theta_t - \frac{t\exp(-t/\theta_t)}{1 - \exp(-t/\theta_t)}\right)\right].$$
The M-step yields
$$\theta_{t+1} = F(\theta_t) = \arg\max_\theta Q(\theta \mid \theta_t) = \frac{1}{N+M}\left[N\bar y + Z(t+\theta_t) + (M-Z)\left(\theta_t - \frac{t\exp(-t/\theta_t)}{1 - \exp(-t/\theta_t)}\right)\right].$$
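The update can be checked numerically. The Python below is a minimal sketch of one EM iteration; the values of N, M, Z, t, and $\bar y$ are made up for illustration:

```python
import math

def em_step_exp(theta, ybar, N, M, Z, t):
    """One EM iteration for the censored exponential lifetimes:
    bulbs still burning contribute t + theta, expired bulbs contribute
    theta - t*exp(-t/theta)/(1 - exp(-t/theta))."""
    e1 = t + theta
    e0 = theta - t * math.exp(-t / theta) / (1 - math.exp(-t / theta))
    return (N * ybar + Z * e1 + (M - Z) * e0) / (N + M)

# hypothetical data with all M bulbs still burning, so Z = M
N, M, Z, t, ybar = 10, 5, 5, 1.5, 2.0
theta = 1.0
for _ in range(200):
    theta = em_step_exp(theta, ybar, N, M, Z, t)
```

For Z = M the iterations reach the closed form $\hat\theta = (N\bar y + Mt)/N$, here $(20 + 7.5)/10 = 2.75$, which matches the explicit solution of the self-consistency equation in this special case.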

"The self-consistency equation θ = F(θ) has no explicit solution unless Z = M (i.e., all litebulbs in the second experiment are still on at time t); in this case, we obtain the well-known solution $\hat\theta = (N\bar y + Mt)/N$."

[FZ] Second Exercise.

"Contrary to litebulbs, the lifetime of havybulbs follows a uniform distribution on the interval (0, θ], where θ is unknown. Suppose the same experiments are performed as in the first exercise, and again the second experimenter registers only that Z out of M havybulbs are still burning at time t, while M − Z have expired.

... We know that for (hypothetical) complete data, the MLE would be $\max\{Y_{\max}, X_{\max}\}$, where $Y_{\max}$ is the largest of the observed lifetimes and $X_{\max}$ is the largest of the unobserved lifetimes."

"Assume for simplicity that Z ≥ 1, so that we are sure that θ ≥ t. Then
$$\mathbb{E}(X_i \mid E_i) = \begin{cases} \tfrac12(t+\theta), & \text{if } E_i = 1, \\[2pt] \tfrac12\,t, & \text{if } E_i = 0. \end{cases}$$
Thus, following the 'rule' (substitute the $X_i$ by their expectations in the maximum likelihood estimator), we obtain
$$\theta_{t+1} = F(\theta_t) \equiv \max\left\{Y_{\max},\ \tfrac12(t+\theta_t)\right\}.$$

"Starting with some θ_0 > 0, the iterations converge to the solution $\hat\theta = \max\{Y_{\max}, t\}$, and this conclusion may be obtained easily by noticing that the self-consistency equation θ = F(θ) is solved by $\hat\theta$.

The main advantage of this solution is its simplicity. Its main disadvantage is that it is wrong."
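The flawed iteration is easy to simulate. This is my own sketch, with arbitrary values for $Y_{\max}$, t, and the starting point:

```python
def naive_step(theta, ymax, t):
    """The (flawed) 'substitute the expectations' update:
    max{Ymax, (t + theta)/2}."""
    return max(ymax, 0.5 * (t + theta))

ymax, t = 1.2, 2.0     # hypothetical values with t > Ymax
theta = 10.0           # arbitrary starting point
for _ in range(200):
    theta = naive_step(theta, ymax, t)
# the iteration settles at max{Ymax, t} = 2.0, whatever the start
```

Each step halves the distance of $(t+\theta)/2$ from its fixed point t, so convergence to $\max\{Y_{\max}, t\}$ is rapid from any starting value.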

The joint likelihood function for the observed data is
$$L(\theta) = \theta^{-N}\,\mathbf{1}_{[Y_{\max}, \infty)}(\theta) \left(\frac{t}{\max(t,\theta)}\right)^{M-Z} \left(1 - \frac{t}{\max(t,\theta)}\right)^{Z}.$$
Note that if Z = 0, then
$$L(\theta) = \theta^{-N}\,\mathbf{1}_{[Y_{\max}, \infty)}(\theta) \left(\frac{t}{\max(t,\theta)}\right)^{M},$$
"which is decreasing for $\theta \ge Y_{\max}$, and therefore the maximum likelihood estimator is $\hat\theta = Y_{\max}$."

Note that if Z ≥ 1, then θ ≥ t, and
$$L(\theta) = \theta^{-N}\,\mathbf{1}_{[Y_{\max}, \infty)}(\theta) \left(\frac{t}{\theta}\right)^{M-Z} \left(1 - \frac{t}{\theta}\right)^{Z} = t^{M-Z}\,\mathbf{1}_{[Y_{\max}, \infty)}(\theta)\,\theta^{-(N+M)}(\theta - t)^{Z}.$$
For θ ≥ t the function $\theta^{-(N+M)}(\theta-t)^Z$ has a unique maximum at
$$\bar\theta = \frac{N+M}{N+M-Z}\,t$$
and is monotonically decreasing for $\theta \ge \bar\theta$. Thus, summarizing the results, the maximum likelihood estimator is
$$\hat\theta = \begin{cases} \bar\theta, & \text{if } \bar\theta > Y_{\max} \text{ and } Z \ge 1, \\ Y_{\max}, & \text{otherwise}, \end{cases}$$
in contrast with the EM "solution" $\hat\theta = \max\{Y_{\max}, t\}$.
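For comparison, the correct observed-data MLE is a one-liner; this sketch and its numbers are purely illustrative:

```python
def mle_uniform(ymax, t, N, M, Z):
    """Observed-data MLE for the uniform(0, theta] lifetime model."""
    if Z >= 1:
        theta_bar = (N + M) * t / (N + M - Z)
        if theta_bar > ymax:
            return theta_bar
    return ymax

# hypothetical example: N = 10, M = 5, Z = 2, t = 2.0, Ymax = 1.5
theta_hat = mle_uniform(ymax=1.5, t=2.0, N=10, M=5, Z=2)
```

Here $\bar\theta = 15 \cdot 2 / 13 \approx 2.31 > Y_{\max}$, so $\hat\theta = \bar\theta$; with Z = 0 the function returns $Y_{\max}$, matching the case analysis above.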

"Why is the solution given by the EM algorithm wrong? The answer is simple: the EM algorithm is not applicable because the log-likelihood function does not exist for all θ > 0, which means that its expected value is not defined."

"Indeed, assume that one havybulb has survived time t, and let $X_m$ be its (unobserved) lifetime. The unconditional distribution of $X_m$ is U[0, θ]. In the E-step we need to find $Q(\theta \mid \theta_t)$. The conditional expectation of $X_m$ is calculated by conditioning on the event $X_m > t$ and using $\theta_t$ as the parameter; thus $X_m \mid \mathbf{y}$ has the uniform distribution U[t, θ_t]. Now, for all θ < θ_t the unconditional density of $X_m$,
$$f_\theta(x) = \begin{cases} 1/\theta, & \text{if } 0 \le x \le \theta, \\ 0, & \text{elsewhere}, \end{cases}$$
takes the value 0 with positive probability, and hence $Q(\theta \mid \theta_t)$ does not exist for θ < θ_t. This could be seen from the observed-data likelihood function, but in the rush of applying the EM algorithm, it is easy to skip this check."

[H] About the Second Exercise of [FZ].

Let the EM algorithm start at some $\theta_0 > \max\{Y_{\max}, t\}$. In [H] it is shown that "the EM algorithm in this example converges to θ_0; in other words, it never goes anywhere once initialized!" Indeed,
$$Q(\theta \mid \theta_0) = \begin{cases} -(N+M)\log\theta, & \text{if } \theta \ge \theta_0, \\ -\infty, & \text{if } 0 < \theta < \theta_0. \end{cases}$$
"Since $Q(\theta \mid \theta_0)$ is strictly decreasing on $[\theta_0, \infty)$ and strictly less than $Q(\theta_0 \mid \theta_0)$ on $(0, \theta_0)$, setting θ_1 equal to the maximizer of $Q(\theta \mid \theta_0)$ gives θ_1 = θ_0. By induction, this EM algorithm is forever stuck at the initial value."

References.

[DLR] Dempster, A.P., Laird, N.M., and Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39, 1-38, 1977.

[RC] Robert, Christian P. and Casella, George. Introducing Monte Carlo Methods with R. "Use R!" series, Springer.

[BH] Booth, J.G. and Hobert, J.P. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Statist. Soc. B, 61, Part 1, pp. 265-285, 1999.

[FZ] Flury, Bernard and Zoppè, Alice. Exercises in EM. The American Statistician, Vol. 54, No. 3, August 2000.

[H] Hunter, David R. On the Geometry of EM algorithms. Technical Report 0303, Penn State University, 2003.
