Probabilistic Forests
An application to the rhythmic distinction between Brazilian and European Portuguese
Antonio Galves, Denis Lacerda, Florencia Leonardi
Instituto de Matemática e Estatística, Universidade de São Paulo
28th November, 2005
Remembering VLMC
A VLMC is a stochastic chain (X_0, X_1, ...) taking values in a finite alphabet A and characterized by two elements:
The set of all contexts. A context X_{n-ℓ}, ..., X_{n-1} is the finite portion of the past X_0, ..., X_{n-1} which is relevant to predict the next symbol X_n.
A family of transition probabilities, one associated to each context. Given a context, its associated transition probability gives the distribution of the next symbol.
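As an illustration, the definition above can be sketched in code. The alphabet, contexts, and probabilities below are hypothetical, chosen only to show how the relevant portion of the past is read off; they are not the trees estimated in this work.

```python
import random

# A hypothetical probabilistic tree over A = {0, 1}: the contexts are
# "1", "10" and "00" (written oldest symbol first), each carrying a
# made-up transition probability over the next symbol.
ALPHABET = [0, 1]
TREE = {
    (1,):   [0.3, 0.7],
    (1, 0): [0.6, 0.4],
    (0, 0): [0.5, 0.5],
}

def context(past):
    """Return the unique context that is a suffix of `past`."""
    for length in range(1, len(past) + 1):
        suffix = tuple(past[-length:])
        if suffix in TREE:
            return suffix
    raise ValueError("no context is a suffix of the given past")

def next_symbol(past, rng=random):
    """Sample X_n from the transition law attached to the context."""
    probs = TREE[context(past)]
    return rng.choices(ALPHABET, weights=probs, k=1)[0]
```

Note that for a past ending in 1 only the last symbol matters, while a past ending in 0 needs one more symbol: the relevant portion of the past has variable length.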
Remembering VLMC
The set of all contexts is a suffix code. This means that no context is a suffix of another context.
For this reason this set can be represented as a tree. This tree, with the associated transition probabilities, is called a probabilistic tree.
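The suffix-code property is easy to check mechanically; a minimal sketch (the function name and tuple encoding of contexts are illustrative choices, not part of the original work):

```python
def is_suffix_code(contexts):
    """Return True if no context is a proper suffix of another context."""
    for c in contexts:
        for d in contexts:
            # c is a proper suffix of d when it matches d's tail
            if c != d and len(c) < len(d) and d[-len(c):] == c:
                return False
    return True
```

For example, {1, 10, 00} is a suffix code over {0, 1}, while {0, 10} is not, since 0 is a suffix of 10.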
Preliminary study
Variable Length Markov Chains (VLMC) have been used to discriminate rhythmic patterns in European and Brazilian Portuguese written texts (Galves et al. 2005).
The written texts were transformed into sequences over the alphabet A = {0, 1, 2, 3, 4}.
The sequences were obtained using well-defined rules taking into account the boundaries between prosodic words and the stressed syllables.
This analysis suggests a typical context tree for each variety of the language.
Preliminary study
The typical trees for BP and EP are:
[Figure: estimated context trees for BP and EP]
This study showed significant patterns distinguishing the two varieties.
But it is also true that...
The preceding results show high variability.
The Context Algorithm used to estimate the trees is very sensitive to spurious strings.
A new approach: probabilistic forests
A probabilistic forest is a mixture of probabilistic trees, that is:
A set Γ of probabilistic trees.
A probability distribution {p(τ)}_{τ∈Γ} over this set.
How to define the set of trees Γ
The set Γ depends on the problem and can be any set with a reasonable number of trees.
For the linguistic problem, we choose the set of trees satisfying the algebraic restrictions given by the codification and with depth not bigger than 3.
Given a sample sequence x_1, x_2, ..., x_n (or a set of sample sequences), the probability of each tree in the set is estimated by the procedure followed in Eskin et al. (2002).
How to estimate the probability of each tree
Following Eskin et al. (2002):
First, propose a prior weight ω_0(τ).
Then, for each instant of time i, update this weight by the formula:
\[
\omega_i(\tau) =
\begin{cases}
\omega_{i-1}(\tau)\,\dfrac{1}{|A|}, & \text{if } i < d(\Gamma), \\[4pt]
\omega_{i-1}(\tau)\,\hat{P}_i^{\tau}(x_i \mid c_\tau(x_0,\dots,x_{i-1})), & \text{if } i \geq d(\Gamma),
\end{cases}
\]
where \(\hat{P}_i^{\tau}(x_i \mid c_\tau(x_0,\dots,x_{i-1}))\) is the maximum likelihood estimate of the transition probabilities in τ computed with the sample x_0, ..., x_{i-1}.
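The update can be sketched as follows. This is an illustrative implementation, not the authors' code: each tree is represented by a context function, and add-one smoothed conditional frequencies stand in for the maximum likelihood estimate so that no weight vanishes on the first occurrence of a symbol.

```python
def run_weights(sample, trees, alphabet, w0, depth):
    """Sequentially update the weight omega_i(tau) of each tree.

    sample   -- list of symbols, each an index into `alphabet`
    trees    -- dict: tree name -> context function (past -> context)
    w0       -- dict: tree name -> prior weight omega_0(tau)
    depth    -- d(Gamma), the depth of the deepest tree in Gamma
    """
    A = len(alphabet)
    counts = {name: {} for name in trees}   # N_i(s, a) per tree
    w = dict(w0)
    for i, x in enumerate(sample):
        past = sample[:i]
        for name, ctx in trees.items():
            if i < depth:
                w[name] *= 1.0 / A          # burn-in factor 1/|A|
            else:
                s = ctx(past)
                c = counts[name].setdefault(s, [0] * A)
                # add-one smoothed estimate of P(x | s), updated online
                w[name] *= (c[x] + 1) / (sum(c) + A)
                c[x] += 1
    return w
```

Run on a strictly alternating sample, the weight of a first-order tree should dominate the weight of a memoryless one.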
A closed formula for ω_n(τ)
These weights ω_n(τ) can be rewritten as:
\[
\begin{aligned}
\omega_n(\tau) &= \omega_{n-1}(\tau)\,\hat{P}_n^{\tau}(x_n \mid c_\tau(x_0,\dots,x_{n-1})) \\
&= \omega_0(\tau)\,\frac{1}{|A|^{d(\Gamma)}} \prod_{i=d(\Gamma)}^{n} \hat{P}_i^{\tau}(x_i \mid c_\tau(x_0,\dots,x_{i-1})) \\
&= \omega_0(\tau)\,\frac{1}{|A|^{d(\Gamma)}} \prod_{s\in\tau} \frac{\prod_{a\in A} (N_n(s,a)+1)!}{(N_n(s)+|A|)!}
\end{aligned}
\]
where N_n(s,a) is the number of occurrences of context s followed by symbol a in x_1, ..., x_n, and N_n(s) = Σ_{a∈A} N_n(s,a).
How to estimate the probability of each tree
Finally, we normalize the weights using the formula:
\[
p_n(\tau) = \frac{\omega_n(\tau)}{\sum_{\tau'\in\Gamma} \omega_n(\tau')}
\]
Therefore, p_n(τ) is the probability of tree τ estimated from the sample x_1, x_2, ..., x_n by the preceding procedure.
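The normalization step is a one-liner; a minimal sketch (the function name is a hypothetical choice):

```python
def normalize(weights):
    """Turn unnormalized weights omega_n(tau) into probabilities p_n(tau)."""
    total = sum(weights.values())
    return {tau: w / total for tau, w in weights.items()}
```

In practice the weights become extremely small for long samples, so one would normalize log-weights (e.g. via a log-sum-exp) rather than the raw products.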
Empirical results for EP
In the case of European Portuguese, we estimated the weights of all possible trees with depth less than or equal to 3. We used 31 texts by Portuguese authors.
n = 148887.
ω_0(τ) = (500n)^{-t(τ)}, where t(τ) is the number of terminal nodes of τ.
p_n(τ) = 0.99 for the following tree:
[Figure: estimated context tree for EP]
Empirical results for BP
In the case of Brazilian Portuguese, we also estimated the weights of all possible trees with depth less than or equal to 3. We used 35 texts by Brazilian authors.
n = 619282.
ω_0(τ) = (500n)^{-t(τ)}, where t(τ) is the number of terminal nodes of τ.
p_n(τ) = 0.8 for the following tree:
[Figure: estimated context tree for BP]
Consistency Theorem
Theorem
Let τ be a probabilistic tree. Then, for almost all sequences x_1, x_2, ... generated by τ, we have that
\[
p_n(\tau) \to 1 \quad \text{when } n \to \infty.
\]
Consistency Theorem
Idea of the proof
\[
\begin{aligned}
p_n(\tau) &= \frac{\omega_n(\tau)}{\sum_{\tau'\in\Gamma} \omega_n(\tau')}
= \left( \sum_{\tau'\in\Gamma} \frac{\omega_n(\tau')}{\omega_n(\tau)} \right)^{-1} \\
&= \left( 1 + \sum_{\tau'\neq\tau} \frac{\omega_n(\tau')}{\omega_n(\tau)} \right)^{-1}
\end{aligned}
\]
Consistency Theorem
Idea of the proof
\[
\begin{aligned}
\omega_n(\tau) &= \omega_0(\tau)\,\frac{1}{|A|^{d(\Gamma)}} \prod_{s\in\tau} \frac{\prod_{a\in A} (N_n(s,a)+1)!}{(N_n(s)+|A|)!} \\
&= \omega_0(\tau)\,\frac{1}{|A|^{d(\Gamma)}} \prod_{s\in\tau} \hat{P}_{KT,s}(x_1^n)
\end{aligned}
\]
Consistency Theorem
Idea of the proof
\[
\begin{aligned}
\frac{\omega_n(\tau')}{\omega_n(\tau)}
&= \frac{\omega_0(\tau')}{\omega_0(\tau)}\,
\frac{\prod_{s'\in\tau'} \hat{P}_{KT,s'}(x_1^n)}{\prod_{s\in\tau} \hat{P}_{KT,s}(x_1^n)} \\
&= \frac{\omega_0(\tau')}{\omega_0(\tau)}
\prod_{s\in\tau,\ s\prec\tau'} \frac{\prod_{s'\in\tau',\ s'>s} \hat{P}_{KT,s'}(x_1^n)}{\hat{P}_{KT,s}(x_1^n)}
\prod_{s'\in\tau',\ s'\prec\tau} \frac{\hat{P}_{KT,s'}(x_1^n)}{\prod_{s\in\tau,\ s>s'} \hat{P}_{KT,s}(x_1^n)}
\end{aligned}
\]
Consistency Theorem
Idea of the proof
\[
\begin{aligned}
\log \frac{\omega_n(\tau')}{\omega_n(\tau)}
&= \log \frac{\omega_0(\tau')}{\omega_0(\tau)} \\
&\quad + \sum_{s\in\tau,\ s\prec\tau'} \Bigl( \sum_{s'\in\tau',\ s'>s} \log \hat{P}_{KT,s'}(x_1^n) - \log \hat{P}_{KT,s}(x_1^n) \Bigr) \\
&\quad + \sum_{s'\in\tau',\ s'\prec\tau} \Bigl( \log \hat{P}_{KT,s'}(x_1^n) - \sum_{s\in\tau,\ s>s'} \log \hat{P}_{KT,s}(x_1^n) \Bigr)
\end{aligned}
\]
Consistency Theorem
Idea of the proof
Following the ideas of Csiszár and Talata (2005), we proved that
Lemma
Let s ∈ τ be a proper suffix of some context in τ'. Then there exists a constant c < 0 such that
\[
\sum_{s'\in\tau',\ s'>s} \log \hat{P}_{KT,s'}(x_1^n) - \log \hat{P}_{KT,s}(x_1^n) < c \log n,
\]
eventually almost surely as n → ∞.
Consistency Theorem
Idea of the proof
Lemma
Let s' ∈ τ' be a proper suffix of some context in τ. Then there exists a constant c < 0 such that
\[
\log \hat{P}_{KT,s'}(x_1^n) - \sum_{s\in\tau,\ s>s'} \log \hat{P}_{KT,s}(x_1^n) < c\,n,
\]
eventually almost surely as n → ∞.
Consistency Theorem
Idea of the proof
Then there exist constants C_1 ≤ 0 and C_2 ≤ 0, not vanishing simultaneously, such that
\[
\log \frac{\omega_n(\tau')}{\omega_n(\tau)} < \log \frac{\omega_0(\tau')}{\omega_0(\tau)} + C_1 \log n + C_2\,n.
\]
With ω_0(τ) = (cn)^{-t(τ)}, where t(τ) is the number of terminal nodes of τ, we have that
\[
\log \frac{\omega_n(\tau')}{\omega_n(\tau)} \to -\infty.
\]
Conclusion
“Only one tree represents the forest”