6 Testing Equality Modulo Associativity and Commutativity

In this section we will give an algorithmic application which proves the utility of FSLPs (even if we deal with binary trees). We ﬁx two subsetsA ⊆Σ(the set ofassociative symbols) andC ⊆Σ(the set ofcommutative symbols). This means that we impose the following identities for alla∈ A,c∈ C, all treest₁, . . . , t_n∈ T0(Σ), all permutationsσ:{1, . . . , n} → {1, . . . , n}, and all 1≤i≤j≤n+ 1:

a(t1· · ·t_n) =a(t1· · ·t_i−1a(t_i· · ·t_j−1)t_j· · ·t_n) (1) c(t₁· · ·t_n) =c(t_σ₍₁₎· · ·t_σ₍_n₎). (2) Note that the standard law of associativity for a binary symbol◦(i.e.,x◦(y◦z) = (x◦y)◦z) can be captured by making◦an (unranked) associative symbol in the

sense of (1). Our main result is:

Theorem 13. For treess, twe can test in polynomial time whethersandt are equal modulo the identities in (1)and (2), ifsandtare given succinctly by one of the following three formalisms: (i ) FSLPs, (ii ) top dags, (iii ) TSLPs for the fcns-encodings ofs, t.

6.1 Associative Symbols

Below, we deﬁne the associative normal form nf_A(f) of a forest f and show that from an FSLP F we can compute in linear time an FSLP F with F= nf_A(F). For treess, t∈ T0(Σ) we have thats=tmodulo the identities in (1) if and only if nf_A(s) = nf_A(t). The generalization to forests is needed for the induction, where a slight technical problem arises. Whether the forests t1· · ·t_i−1a(t_i· · ·t_j−1)t_j· · ·t_n and t1· · ·t_n are equal modulo the identities in (1) actually depends on the symbol on top of these two forests. If it is an a, and a ∈ A, then the two forests are equal modulo associativity, otherwise not. To cope with this problem, we use for every associative symbol a∈ A a function φ_a: F0(Σ)→ F0(Σ) that pulls up occurrences ofawhenever possible.

Let•∈/ Σ be a new symbol. For everya∈Σ∪ {•} letφ_a:F0(Σ)→ F0(Σ) be deﬁned as follows, wheref ∈ F0(Σ) andt₁, . . . , t_n∈ T0(Σ):

φ_a(b(f)) =

φ_a(f) ifa∈ Aanda=b,

b(φ_b(f)) otherwise, φ_a(t1· · ·t_n) =φ_a(t1)· · ·φ_a(t_n).

In particular, φ_a(ε) = ε. Moreover, deﬁne nf_A: F0(Σ) → F0(Σ) by nf_A(f) = φ_•(f).

Grammar-Based Compression of Unranked Trees 127

Example 14. Lett=a(a(cd)b(cd)a(e)) andA={a}. We obtain φ_a(t) =φ_a(a(cd)b(cd)a(e)) =φ_a(a(cd))φ_a(b(cd))φ_a(a(e))

=φ_a(cd)b(φ_b(cd))φ_a(e) =cdb(cd)e, φ_b(t) =a(φ_a(a(cd)b(cd)a(e))) =a(cdb(cd)e).

To show the following simple lemma one considers the terminating and conﬂuent rewriting system obtained by directing the Eq. (1) from right to left.

Lemma 15. For two forests f1, f2 ∈ F0(Σ), nf_A(f1) =nf_A(f2) if and only if f1 andf2 are equal modulo the identities in (1) for alla∈ A.

Lemma 16. From a given FSLP F = (V, S, ρ) over Σ one can construct in timeO(|F| · |Σ|)an FSLP F withF=nf_A(F).

For the proof of Lemma16one introduces new variables A_a for alla∈Σ∪ {•}

and deﬁnes the right-hand sides of F such that AaF = φ_a(AF) for all A∈V0andBaφb(f)F =φ_a(BfF) for all B∈V1, f ∈ F0(Σ), wherebis the label of the parent node of the parameterxinBF. This parent node exists if we assume the FSLPF to be in normal form.

6.2 Commutative Symbols

To test whether two trees overΣ are equivalent with respect to commutativity, we deﬁne a commutative normal form nf_C(t) of a tree t ∈ T0(Σ) such that nf_C(t1) = nf_C(t2) if and only if t1 and t2 are equivalent with respect to the identities in (2) for allc∈ C.

We start with a general definition: Let Δ be a possibly infinite alphabet together with a total order <. Let ≤ be the reflexive closure of<. Define the function sort^<:Δ^∗ → Δ^∗ by sort^<(a1· · ·a_n) = a_i₁· · ·a_i_n with {i1, . . . , i_n} = {1, . . . , n} anda_i₁ ≤ · · · ≤a_i_n.

Lemma 17. LetGbe an SSLP overΔ and let<be some total order onΔ. We can construct in time O(|Δ| · |G|)an SSLPG such that G=sort^<(G).

In order to deﬁne the commutative normal form, we need a total order onF0(Σ).

Recall that elements ofF0(Σ) are particular strings over the alphabetΓ :=Σ∪ {(,)}. Fix an arbitrary total order onΓ and let<_llex be thelength-lexicographic orderonΓ^∗induced by<: forx, y∈Γ^∗we havex <_llexyif|x|<|y|or (|x|=|y|, x=uav, y=ubv, anda < b foru, v, v ∈Γ^∗ anda, b∈Γ). We now consider the restriction of <llex to F0(Σ) ⊆ Γ^∗. For the proof of the following lemma one ﬁrst constructs SSLPs for the strings F1,F2 ∈ Γ^∗ (the construction is similar to the case of TSLPs, see [6]) and then uses [16, Lemma 3] according to which SSLP-encoded strings can be compared in polynomial time with respect to <llex.

Lemma 18. For two FSLPsF1andF2we can check in polynomial time whether F1=F2,F1<_llexF2orF2<_llexF1.

128 A. Gasc´on et al.

From the restriction of <llex toT0(Σ)⊆Γ^∗ we obtain the function sort^<^llex on T0(Σ)^∗=F0(Σ). We deﬁne nf_C:F0(Σ)→ F0(Σ) by

nf_C(a(f)) =

a(sort^<^llex(nf_C(f))) ifa∈ C a(nf_C(f)) otherwise, nf_C(t₁· · ·t_n) = nf_C(t₁)· · ·nf_C(t_n).

Obviously, f₁, f₂ ∈ F(Σ) are equal modulo the identities in (2) for allc ∈ C if and only if nf_C(f₁) = nf_C(f₂). Using this fact and Lemma 15 it is not hard to show:

Lemma 19. For f1, f2 ∈ F0(Σ) we have nf_C(nf_A(f1)) = nf_C(nf_A(f2)) if and only if f1 and f2 are equal modulo the identities in (1) and (2) for all a∈ A, c∈ C.

For our main technical result (Theorem21) we need a strengthening of our FSLP normal form. Recall the notion of thespinefrom Sect.3. We say that an FSLP F = (V, S, ρ) is in strong normal form if it is in normal form and for every A ∈ V₀^⊥ with ρ(A) =BC either B ∈ V₁^⊥ or |C_F| ≥ |D_F| −1 for every D∈V₁^⊥ which occurs in spine(B) (note that|D_F| −1 is the number of nodes in D_F except for the parameterx).

Lemma 20. From a given FSLPF = (V, S, ρ)in normal form we can construct in polynomial time an FSLPF= (V, S, ρ)in strong normal form withF= F.

For the proof of Lemma20we modify the right-hand sides of variablesA∈V₀^⊥ with ρ(A) = BC and |spine(B)| ≥ 2. Basically, we replace the vertical concatenations BCby polynomially many vertical concatenations B_iC_i which satisfy the condition of the strong normal form. We can now prove the main technical result of this section:

Theorem 21. From a given FSLPF we can construct in polynomial time an FSLPF with F=nf_C(F).

Proof. LetF = (V, S, ρ). By Theorem4 and Lemma20we may assume thatF is in strong normal form. For everyA∈V₁ let

args(A) ={t∈ T0(Σ)| |t| ≥ |DF| −1for each symbolD inspine(A)} We want to construct an FSLPF = (V, S, ρ) withV₀⊆V₀ andV₁=V₁ such that

(1) A_F = nf_C(A_F) for allA∈V0,

(2) A_Fnf_C(t)= nf_C(A_Ft) for allA∈V1,t∈args(A).

From (1) we obtain F=S_F = nf_C(S_F) = nf_C(F) which concludes the proof.

To deﬁneρ, letV^c=V₀^c∪V₁^cwithV₁^c={A∈V1|ρ(A) =a(BxC) witha∈ C}and V₀^c={A∈V0|ρ(A) =a(B) witha∈ C or ρ(A) =DCwithD∈V₁^c} be the set of commutative variables. We setρ(A) =ρ(A) for A∈V \V^c. For A∈V^c we deﬁneρ(A) by induction along the partial order of the dag:

Grammar-Based Compression of Unranked Trees 129 1. ρ(A) = a(B): Let M_A be the set of all C ∈ V₀^⊥ which are below A in the dag, and letw= hor(B) =B_F∈M_A^∗. By induction,ρ is already defined onM_A, and thusC_F is defined for everyC ∈M_A. By Lemma18, we can compute in polynomial time a total order<onM_Asuch thatC < Dimplies C_F ≤llex D_Ffor allC, D∈M_A. By Lemma17, we can construct in linear time an SSLPG_w= (V_w, S_w, ρ_w) withGw= sort^<(w), and we may assume that all variables D ∈ V_w are new. We add these variables to V₀ together with their right hand sidesρ(D) =ρ_w(D), and we finally set ρ(A) =a(S_w).

2. ρ(A) =BC: Let ρ(B) = a(DxE). We deﬁneG_w = (V_w, S_w, ρ_w) as before, but withw=DCE_F instead ofw=B_F, and we setρ(A) =a(S_w).

3. ρ(A) = a(BxC): We deﬁne G_w = (V_w, S_w, ρ_w) as before, this time with w=BC_F, and we setρ(B) =a(S_wx).

The main idea is that the strong normal form ensures that in right-hand sides of the forma(DxE) witha∈ Cone can move the parameterxto the last position (see point 3 above), since only trees that are larger than all trees produced from

D andE are substituted forx.

Proof of Theorem13. By Propositions6 and9 it suﬃces to show Theorem13 for the case that t1 and t2 are given by FSLPs F1 and F2, respectively. By Lemma19 and Lemma18 it suﬃces to compute in polynomial time FSLPs F₁ andF₂ for nf_C(nf_A(t₁)) and nf_C(nf_A(t₂)). This can be achieved using Lemma16

and Theorem21.

No documento 13th International Computer Science Symposium in Russia, CSR 2018 (páginas 156-159)