Marcelo Arenas
Department of Computer Science
University of Toronto
marenas@cs.toronto.edu
Leonid Libkin
Department of Computer Science
University of Toronto
libkin@cs.toronto.edu
Abstract
This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
1 Introduction
The concepts of database design and normal forms are a key component of relational database technology. In this paper, we study design principles for XML data. XML has recently emerged as a new basic format for data exchange. Although many XML documents are views of relational data, the number of applications using native XML documents is increasing rapidly. Such applications may use native XML storage facilities [20], and update XML data [28]. Updates, as in relational databases, may cause anomalies if data is redundant. In the relational world, anomalies are avoided by using well-designed database schemas. XML has its version of schema too; most often it is DTDs (Document Type Definitions), and some other proposals exist or are under development [31, 30]. What would it mean then for such a schema to be well or poorly designed? Clearly, this question has arisen in practice: one can find companies offering help in "good DTD design." This help, however, comes in the form of consulting services rather than commercially available software, as there are no clear guidelines for producing well-designed XML.
Research affiliation: Bell Laboratories.
Our goal is to find principles for good XML data design, and algorithms to produce such designs. We believe that it is important to do this research now, as a lot of data is being put on the web. Once massive web databases are created, it is very hard to change their organization; thus, there is a risk of having large amounts of widely accessible, but at the same time poorly organized legacy data.
Normalization is one of the most thoroughly researched subjects in database theory (a survey [4] produced many references more than 20 years ago), and cannot be reconstructed in a single paper in its entirety. Here we follow the standard treatment of one of the most common (if not the most common) normal forms, BCNF. It eliminates redundancies and avoids the update anomalies which they cause by decomposing into relational subschemas in which every nontrivial functional dependency defines a key. Just to retrace this development in the XML context, we need the following:
a) Understanding of what a redundancy and an update anomaly is.
b) A definition and basic properties of functional dependencies (so far, most proposals for XML constraints concentrate on keys).
c) A definition of what "bad" functional dependencies are (those that cause redundancies and update anomalies).
d) An algorithm for converting an arbitrary DTD into one that does not admit such bad functional dependencies.
[Figure 1: Examples of XML documents. (a) A courses document with two course elements, csc200 ("Automata Theory") and mat100 ("Calculus I"); each student element under taken_by carries @sno, name and grade. (b) The restructured document: student elements carry only @sno and grade, and info elements group number elements (with @sno) under a single name.]
Starting with point a), how does one identify bad designs? We have looked at a large number of DTDs and found two kinds of commonly present design problems. They are illustrated in two examples below.
Example 1.1: Consider the following DTD that describes a part of a university database:
<!ELEMENT courses (course*)>
<!ELEMENT course (title, taken_by)>
<!ATTLIST course
cno CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT taken_by (student*)>
<!ELEMENT student (name, grade)>
<!ATTLIST student
sno CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#PCDATA)>
For every course, we store its number (cno), its title and the list of students taking the course. For each student taking a course, we store his/her number (sno), name, and the grade in the course.
An example of an XML document that conforms to this DTD is shown in Figure 1 (a). This document satisfies the following constraint: any two student elements with the same sno value must have the same name. This constraint (which looks very much like a functional dependency) causes the document to store redundant information: for example, the name Deere for student st1 is stored twice. And just as in relational databases, such redundancies can lead to update anomalies: for example, updating the name of st1 for only one course results in an inconsistent document, and removing the student from a course may result in removing that student from the document altogether.
In order to eliminate redundant information, we use a technique similar to the relational one, and split the information about the name and the grade. Since we deal with just one XML document, we must do it by creating an extra element type, info, for student information, as shown below:
<!ELEMENT courses (course*, info*)>
<!ELEMENT course (title, taken_by)>
<!ATTLIST course
cno CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT taken_by (student*)>
<!ELEMENT student (grade)>
<!ATTLIST student
sno CDATA #REQUIRED>
<!ELEMENT grade (#PCDATA)>
<!ELEMENT info (number*, name)>
<!ELEMENT number EMPTY>
<!ATTLIST number
sno CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
Each info element has as children one name and a sequence of number elements, with sno as an attribute. Different students can have the same name, and we group all student numbers sno for each name under the same info element. A restructured document that conforms to this DTD is shown in Figure 1 (b). Note that st2 and st3 are put together because both students have the same name.
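The restructuring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of Section 6: the document text is reconstructed from Figure 1 (a), and the helper name `restructure` is hypothetical. The sketch first checks the constraint that equal sno values carry equal names, and then moves each name into an info element that lists the student numbers sharing it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Document reconstructed from Figure 1 (a); treat the exact values as an assumption.
DOC_A = """<courses>
  <course cno="csc200"><title>Automata Theory</title><taken_by>
    <student sno="st1"><name>Deere</name><grade>A+</grade></student>
    <student sno="st2"><name>Smith</name><grade>B-</grade></student>
  </taken_by></course>
  <course cno="mat100"><title>Calculus I</title><taken_by>
    <student sno="st1"><name>Deere</name><grade>A</grade></student>
    <student sno="st3"><name>Smith</name><grade>B+</grade></student>
  </taken_by></course>
</courses>"""


def restructure(root):
    """Move names out of student elements into info elements (one per name)."""
    name_of = {}  # sno -> name; a clash means the constraint sno -> name fails
    for student in list(root.iter("student")):
        sno = student.get("sno")
        name_elem = student.find("name")
        if name_of.setdefault(sno, name_elem.text) != name_elem.text:
            raise ValueError(f"student {sno} is stored with two different names")
        student.remove(name_elem)  # the student element keeps only its grade
    snos_of = defaultdict(list)  # name -> student numbers sharing that name
    for sno, name in name_of.items():
        snos_of[name].append(sno)
    for name, snos in snos_of.items():
        info = ET.SubElement(root, "info")
        for sno in snos:
            ET.SubElement(info, "number", sno=sno)
        ET.SubElement(info, "name").text = name
    return root


root = restructure(ET.fromstring(DOC_A))
infos = {i.find("name").text: [n.get("sno") for n in i.findall("number")]
         for i in root.findall("info")}
print(infos)
```

After the call, each name is stored once, and st2 and st3 end up under the same info element because they share a name.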
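This example is reminiscent of the canonical example of bad relational design caused by non-key functional dependencies, and so is the modification of the schema. Some examples of redundancies are more closely related to the hierarchical structure of XML documents.
Example 1.2: The DTD below is a part of the DBLP database [8] for storing data about conferences.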
<!ELEMENT db (conf*)>
<!ELEMENT conf (title, issue+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT issue (inproceedings+)>
<!ELEMENT inproceedings (author+,
title, booktitle)>
<!ATTLIST inproceedings
key ID #REQUIRED
pages CDATA #REQUIRED
year CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Each conference has a title, and one or more issues (which correspond to years when the conference was held). Papers are stored in inproceedings elements; the year of publication is one of its attributes.
Such a document satisfies the following constraint: any two inproceedings children of the same issue must have the same value of year. This too is similar to relational functional dependencies, but now we refer to the values (the year attribute) as well as the structure (children of the same issue). Moreover, we only talk about inproceedings nodes that are children of the same issue element. Thus, this functional dependency can be considered relative to each issue.
The functional dependency here leads to redundancy: year is stored multiple times for a conference. The natural solution to the problem in this case is not to create a new element for storing the year, but rather to restructure the document and make year an attribute of issue. That is, we change attribute lists as:
<!ATTLIST issue
year CDATA #REQUIRED>
<!ATTLIST inproceedings
key ID #REQUIRED
pages CDATA #REQUIRED>
Our goal is to show how to detect anomalies of those kinds, and to transform documents in a lossless fashion into ones that do not suffer from those problems.
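The attribute move from the DBLP example can be illustrated on a document instance. The following Python sketch is an assumption-laden illustration: the sample document, its titles and keys are invented for the example, and `move_year_to_issue` is a hypothetical helper. It verifies that all inproceedings children of an issue agree on year before storing the value once on the issue.

```python
import xml.etree.ElementTree as ET

# Invented sample conforming to the DBLP fragment above (all values are hypothetical).
DOC = """<db><conf><title>Sample Conference</title>
  <issue>
    <inproceedings key="p1" pages="1-10" year="2002">
      <author>A. Author</author><title>Paper One</title><booktitle>Sample</booktitle>
    </inproceedings>
    <inproceedings key="p2" pages="11-20" year="2002">
      <author>B. Author</author><title>Paper Two</title><booktitle>Sample</booktitle>
    </inproceedings>
  </issue>
</conf></db>"""


def move_year_to_issue(root):
    """Store year once per issue instead of once per paper."""
    for issue in root.iter("issue"):
        papers = issue.findall("inproceedings")
        years = {paper.get("year") for paper in papers}
        if len(years) != 1:
            # The constraint "same issue implies same year" does not hold.
            raise ValueError("papers of one issue disagree on year")
        issue.set("year", years.pop())
        for paper in papers:
            del paper.attrib["year"]
    return root


root = move_year_to_issue(ET.fromstring(DOC))
print(ET.tostring(root.find(".//issue"), encoding="unicode")[:40])
```

No information is lost: the year of any paper can still be read from its parent issue.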
The first step towards that goal is to introduce functional dependencies (FDs) for XML documents. So far, most proposals for XML constraints deal with keys and foreign keys [5, 6, 31]. We introduce FDs for XML by considering a relational representation of documents and defining FDs on them. The relational representation is somewhat similar to the total unnesting of a nested relation [26, 29]; however, we have to deal with DTDs that may contain arbitrary regular expressions, and be recursive. Our representation via tree tuples, introduced in Section 3, may contain null values. In Section 4, XML FDs are introduced via FDs on incomplete relations [3, 21].
The next step is to find a normal form that disallows redundancy-causing FDs. We give it in Section 5, and show that our normal form, called XNF, generalizes BCNF and a nested normal form NNF [22, 23].
The last step then is to find an algorithm that converts any DTD into one in XNF. We do this in Section 6. On both examples shown earlier, the algorithm produces exactly the desired reconstruction of the DTD. The main algorithm uses implication of functional dependencies (although there is a version that does not use implication, but it may produce suboptimal results). In Section 7, we show that for a large class of DTDs, covering most DTDs that occur in practice, the implication problem is tractable (in fact, quadratic).
One of the reasons for the success of the normalization theory is its simplicity, at least for the commonly used normal forms such as BCNF, 3NF and 4NF. Hence, the normalization theory for XML should not be extremely complicated in order to be applicable. In particular, this was the reason we chose to use DTDs instead of more complex formalisms [31, 30]. This is in perfect analogy with the situation in the relational world: although SQL DDL is a rather complicated language with numerous features, BCNF decomposition uses a simple model of a set of attributes and a set of functional dependencies.
Related work For a survey of relational normalization, see [1, 4]. Normalization for nested relations and object-oriented databases is studied in [23, 22, 27]. Coding nested relations into flat ones, similar to our tree tuples, is done in [26, 29]. We use FDs and relational algebra queries over incomplete relations using the techniques from [3, 7, 15, 19, 21]. XML constraints (mostly keys) have been studied in [5, 6, 12]; these constraints do not use DTDs. XML constraints that take DTDs into account are studied in [11]. Finally, [2] considers normal forms for extended context-free grammars similar to the Greibach normal form for CFGs; these, however, do not necessarily guarantee good XML design.
2 Notations
Assume that we have the following disjoint sets: $El$ of element names, $Att$ of attribute names, $Str$ of possible values of string-valued attributes, and $Vert$ of node identifiers. All attribute names start with the symbol @, and these are the only ones starting with this symbol. We let S and $\bot$ (null) be reserved symbols not in any of those sets.
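Definition 1 A DTD (Document Type Definition) is defined to be $D = (E, A, P, R, r)$, where:
• $E \subseteq El$ is a finite set of element types.
• $A \subseteq Att$ is a finite set of attributes.
• $P$ is a mapping from $E$ to element type definitions: given $\tau \in E$, $P(\tau) = S$ or $P(\tau)$ is a regular expression $\alpha$ defined as follows:
$\alpha ::= \epsilon \mid \tau' \mid \alpha|\alpha \mid \alpha,\alpha \mid \alpha^*$
where $\epsilon$ is the empty sequence, $\tau' \in E$, and "|", "," and "*" denote union, concatenation, and the Kleene closure, respectively.
• $R$ is a mapping from $E$ to the powerset of $A$. If $@l \in R(\tau)$, we say that $@l$ is defined for $\tau$.
• $r \in E$ and is called the element type of the root. Without loss of generality, we assume that $r$ does not occur in $P(\tau)$ for any $\tau \in E$.
The symbols $\epsilon$ and S represent element type declarations EMPTY and #PCDATA, respectively.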
Given a DTD $D = (E, A, P, R, r)$, a string $w = w_1 \cdots w_n$ is a path in $D$ if $w_1 = r$, $w_i$ is in the alphabet of $P(w_{i-1})$ for each $i \in [2, n-1]$, and $w_n$ is in the alphabet of $P(w_{n-1})$ or $w_n = @l$ for some $@l \in R(w_{n-1})$. We define $length(w)$ as $n$ and $last(w)$ as $w_n$. We let $paths(D)$ stand for the set of all paths in $D$ and $EPaths(D)$ for the set of all paths that end with an element type (rather than an attribute or S); that is, $EPaths(D) = \{p \in paths(D) \mid last(p) \in E\}$. A DTD is called recursive if $paths(D)$ is infinite.
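For a non-recursive DTD, $paths(D)$ is finite and can be enumerated directly. The Python sketch below is an illustration under assumptions: instead of full regular expressions it stores only the alphabet of each $P(\tau)$, which is all the definition of a path uses, and the encoding (three dictionaries/sets and dotted path strings) is a choice made for this example. It is applied to the DTD of Example 1.1.

```python
def paths(alphabet, attrs, pcdata, root):
    """Enumerate paths(D) of a non-recursive DTD as dotted strings.

    alphabet: element type -> element types occurring in its content model
    attrs:    element type -> attribute names (each starting with '@')
    pcdata:   set of element types declared as #PCDATA
    """
    result = []

    def walk(path, elem):
        result.append(path)
        for attribute in attrs.get(elem, []):
            result.append(f"{path}.{attribute}")
        if elem in pcdata:
            result.append(f"{path}.S")
        for child in alphabet.get(elem, []):
            walk(f"{path}.{child}", child)

    walk(root, root)
    return result


def epaths(all_paths):
    """Keep the paths that end with an element type (not an attribute or S)."""
    def last(path):
        return path.split(".")[-1]
    return [p for p in all_paths if not last(p).startswith("@") and last(p) != "S"]


# The DTD of Example 1.1, encoded by hand.
ALPHABET = {"courses": ["course"], "course": ["title", "taken_by"],
            "taken_by": ["student"], "student": ["name", "grade"]}
ATTRS = {"course": ["@cno"], "student": ["@sno"]}
PCDATA = {"title", "name", "grade"}

all_paths = paths(ALPHABET, ATTRS, PCDATA, "courses")
print(len(all_paths), len(epaths(all_paths)))
```

The sketch assumes that no element type is itself named S; on a recursive DTD the recursion would not terminate, which matches the remark that $paths(D)$ is infinite in that case.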
Definition 2 An XML tree $T$ is defined to be a tree $(V, lab, ele, att, root)$, where
• $V \subseteq Vert$ is a finite set of vertices (nodes).
• $lab : V \to El$.
• $ele : V \to Str \cup V^*$.
• $att$ is a partial function $V \times Att \to Str$. For each $v \in V$, the set $\{@l \in Att \mid att(v, @l) \text{ is defined}\}$ is required to be finite.
• $root \in V$ is called the root of $T$.
The parent-child edge relation on $V$, $\{(v_1, v_2) \mid v_2 \text{ occurs in } ele(v_1)\}$, is required to form a rooted tree.
Notice that we do not allow mixed content in XML trees. The children of an element node can be either zero or more element nodes or one string.
Given an XML tree $T$, a string $w_1 \cdots w_n$, with $w_1, \ldots, w_{n-1} \in El$ and $w_n \in El \cup Att \cup \{S\}$, is a path in $T$ if there are vertices $v_1 \cdots v_{n-1}$ in $V$ such that:
• $v_1 = root$, $v_{i+1}$ is a child of $v_i$ ($1 \leq i \leq n-2$), $lab(v_i) = w_i$ ($1 \leq i \leq n-1$).
• If $w_n \in El$, then there is a child $v_n$ of $v_{n-1}$ such that $lab(v_n) = w_n$. If $w_n = @l$, with $@l \in Att$, then $att(v_{n-1}, @l)$ is defined. If $w_n = S$, then $v_{n-1}$ has a child in $Str$.
We let $paths(T)$ stand for the set of paths in $T$.
We next give a standard definition of a tree conforming to a DTD ($T \models D$) as well as a weaker version of $T$ being compatible with $D$ ($T \lhd D$).
Definition 3 Given a DTD $D = (E, A, P, R, r)$ and an XML tree $T = (V, lab, ele, att, root)$, we say that $T$ conforms to $D$ ($T \models D$) if
• $lab$ is a mapping from $V$ to $E$.
• For each $v \in V$, if $P(lab(v)) = S$, then $ele(v) = [s]$, where $s \in Str$. Otherwise, if $ele(v) = [v_1, \ldots, v_n]$, then the string $lab(v_1) \cdots lab(v_n)$ must be in the regular language defined by $P(lab(v))$.
• $att$ is a partial function from $V \times A$ to $Str$ such that for any $v \in V$ and $@l \in A$, $att(v, @l)$ is defined iff $@l \in R(lab(v))$.
• $lab(root) = r$.
We say that $T$ is compatible with $D$ (written $T \lhd D$) iff $paths(T) \subseteq paths(D)$.
3 Tree Tuples
To extend the notions of functional dependencies to the XML setting, we represent XML trees as sets of tuples. While various mappings from XML to the relational model have been proposed [14, 25], the mapping that we use is of a different nature, as our goal is not to find a way of storing documents efficiently, but rather to find a correspondence between documents and relations that lends itself to a natural definition of functional dependency.
Various languages proposed for expressing XML integrity constraints such as keys [5, 6, 31] treat XML trees as unordered (for the purpose of defining the semantics of constraints): that is, the order of children of any given node is irrelevant as far as satisfaction of constraints is concerned. In XML trees, on the other hand, children of each node are ordered. We first define a notion of subsumption that disregards this ordering.
Given two XML trees $T_1 = (V_1, lab_1, ele_1, att_1, root_1)$ and $T_2 = (V_2, lab_2, ele_2, att_2, root_2)$, we say that $T_1$ is subsumed by $T_2$, written as $T_1 \preceq T_2$, if
• $V_1 \subseteq V_2$.
• $root_1 = root_2$.
• $lab_2|_{V_1} = lab_1$.
• $att_2|_{V_1 \times Att} = att_1$.
• For all $v \in V_1$, $ele_1(v)$ is a sublist of a permutation of $ele_2(v)$.
This relation is a pre-order, which gives rise to an equivalence relation: $T_1 \equiv T_2$ iff $T_1 \preceq T_2$ and $T_2 \preceq T_1$. That is, $T_1 \equiv T_2$ iff $T_1$ and $T_2$ are equal as unordered trees. We define $[T]$ to be the $\equiv$-equivalence class of $T$. We write $[T] \models D$ if $T_1 \models D$ for some $T_1 \in [T]$. It is easy to see that for any $T_1 \equiv T_2$, $paths(T_1) = paths(T_2)$; hence $T_1 \lhd D$ iff $T_2 \lhd D$. We shall also write $T_1 \prec T_2$ when $T_1 \preceq T_2$ and $T_2 \not\preceq T_1$.
Definition 4 (Tree tuples) Given a DTD $D = (E, A, P, R, r)$, a tree tuple $t$ in $D$ is a function from $paths(D)$ to $Vert \cup Str \cup \{\bot\}$ such that:
• For $p \in EPaths(D)$, $t(p) \in Vert \cup \{\bot\}$, and $t(r) \neq \bot$.
• For $p \in paths(D) - EPaths(D)$, $t(p) \in Str \cup \{\bot\}$.
• If $t(p_1) = t(p_2)$ and $t(p_1) \in Vert$, then $p_1 = p_2$.
• If $t(p_1) = \bot$ and $p_1$ is a prefix of $p_2$, then $t(p_2) = \bot$.
• $\{p \in paths(D) \mid t(p) \neq \bot\}$ is finite.
$\mathcal{T}(D)$ is defined to be the set of all tree tuples in $D$. For a tree tuple $t$ and a path $p$, we write $t.p$ for $t(p)$.
Example 3.1: Suppose that $D$ is the DTD shown in example 1.1 (a). Then a tree tuple in $D$ assigns values to each path in $paths(D)$ as is shown in Figure 2 (a).
We use nulls ($\bot$) in tree tuples because of the disjunction in DTDs. For example, let $D = (E, A, P, R, r)$, where $E = \{r, a, b\}$, $A = \emptyset$, $P(r) = (a|b)$, $P(a) = \epsilon$ and $P(b) = \epsilon$. Then $paths(D) = \{r, r.a, r.b\}$ but no tree tuple coming from an XML tree conforming to $D$ can assign non-null values to both $r.a$ and $r.b$.
If $D$ is a recursive DTD, then $paths(D)$ is infinite; however, only a finite number of values in a tree tuple are different from $\bot$. For each tree tuple $t$, its non-null values give rise to an XML tree as follows.
Definition 5 ($tree_D$) Given a DTD $D = (E, A, P, R, r)$ and a tree tuple $t \in \mathcal{T}(D)$, $tree_D(t)$ is defined to be an XML tree $(V, lab, ele, att, root)$, where $root = t.r$ and
• $V = \{v \in Vert \mid \exists p \in paths(D) \text{ such that } v = t.p\}$.
• If $v = t.p$, then $lab(v) = last(p)$.
• If $v = t.p$, then $ele(v)$ is defined to be the list containing $\{t.p' \mid t.p' \neq \bot \text{ and } p' = p.\tau, \tau \in E, \text{ or } p' = p.S\}$, ordered lexicographically.
• If $v = t.p$, then $att(v, @l) = t.p.@l$.
Example 3.2: The non-null values of the tree tuple $t$ shown in Figure 2 (a) give rise to the XML tree shown in Figure 2 (b).
Note that $tree_D(t)$ need not conform to the DTD $D$, but:
Proposition 1 If $t \in \mathcal{T}(D)$, then $tree_D(t) \lhd D$. □
We would like to describe XML trees in terms of the tuples they contain. For this, we need to select tuples containing the maximal amount of information. This is done via the usual notion of ordering on tuples (and relations) with nulls [7, 15, 16]. If we have two tree tuples $t_1, t_2$, we write $t_1 \sqsubseteq t_2$ if whenever $t_1.p$ is defined, then so is $t_2.p$, and $t_1.p \neq \bot$ implies $t_1.p = t_2.p$. As usual, $t_1 \sqsubset t_2$ means $t_1 \sqsubseteq t_2$ and $t_1 \neq t_2$. Given two sets of tree tuples, $X$ and $Y$, we write $X \sqsubseteq^{\flat} Y$ if $\forall t_1 \in X \; \exists t_2 \in Y \; t_1 \sqsubseteq t_2$.
Definition 6 ($tuples_D$) Given a DTD $D$ and an XML tree $T$ such that $T \lhd D$, $tuples_D(T)$ is defined to be the set of maximal, wrt $\sqsubseteq$, tree tuples $t$ such that $tree_D(t)$ is subsumed by $T$; that is:
$\max_{\sqsubseteq} \{t \in \mathcal{T}(D) \mid tree_D(t) \preceq T\}$.
Observe that $T_1 \equiv T_2$ implies $tuples_D(T_1) = tuples_D(T_2)$. Hence, $tuples_D$ applies to equivalence classes: $tuples_D([T]) = tuples_D(T)$.
Proposition 2 If $T \lhd D$, then $tuples_D(T)$ is a finite subset of $\mathcal{T}(D)$. Furthermore, $tuples_D(\cdot)$ is monotone: $T_1 \preceq T_2$ implies $tuples_D(T_1) \sqsubseteq^{\flat} tuples_D(T_2)$. □
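One way to compute the maximal tree tuples of a concrete document is to pick, under every selected node, exactly one child for each child label that is present, and to take all combinations of such choices. The Python sketch below follows this reading of Definition 6; it is an illustration under assumptions, not the paper's procedure. A tuple is a dictionary from dotted paths to values, a missing key stands for $\bot$, node identifiers v0, v1, ... are generated in document order, and the sample document is reconstructed from Figure 1 (a).

```python
import itertools
import xml.etree.ElementTree as ET

# Document reconstructed from Figure 1 (a); treat the exact values as an assumption.
DOC = ('<courses>'
       '<course cno="csc200"><title>Automata Theory</title><taken_by>'
       '<student sno="st1"><name>Deere</name><grade>A+</grade></student>'
       '<student sno="st2"><name>Smith</name><grade>B-</grade></student>'
       '</taken_by></course>'
       '<course cno="mat100"><title>Calculus I</title><taken_by>'
       '<student sno="st1"><name>Deere</name><grade>A</grade></student>'
       '<student sno="st3"><name>Smith</name><grade>B+</grade></student>'
       '</taken_by></course>'
       '</courses>')


def tree_tuples(root):
    """Return the maximal tree tuples of a document as a list of dicts."""
    ids = itertools.count()

    def rec(elem, path):
        base = {path: f"v{next(ids)}"}            # element path -> node identifier
        for name, value in elem.attrib.items():
            base[f"{path}.@{name}"] = value        # attribute path -> string
        groups = {}                                # child label -> children
        for child in elem:
            groups.setdefault(child.tag, []).append(child)
        if not groups and elem.text and elem.text.strip():
            base[f"{path}.S"] = elem.text.strip()  # string content
        # For each child label, collect the tuples of every child with that label;
        # a tree tuple uses one of them per label.
        options = []
        for tag, children in groups.items():
            alternatives = []
            for child in children:
                alternatives.extend(rec(child, f"{path}.{tag}"))
            options.append(alternatives)
        result = []
        for combination in itertools.product(*options):
            merged = dict(base)
            for part in combination:
                merged.update(part)
            result.append(merged)
        return result

    return rec(root, root.tag)


tups = tree_tuples(ET.fromstring(DOC))
print(len(tups))
```

On this document the sketch yields four tuples, one per (course, student) pair, each assigning non-null values to all twelve paths of the DTD of Example 1.1.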
Finally, we define the trees represented by a set of tuples $X$ as the minimal, with respect to $\preceq$, trees containing all tuples in $X$.
Definition 7 ($trees_D$) Given a DTD $D$ and a set of tree tuples $X \subseteq \mathcal{T}(D)$, $trees_D(X)$ is defined to be:
$\min_{\preceq} \{T \mid T \lhd D \text{ and } \forall t \in X, \; tree_D(t) \preceq T\}$.
For $T \in trees_D(X)$ and $T' \equiv T$, $T' \in trees_D(X)$; thus $trees_D(X)$ is a union of equivalence classes.
The following shows that every XML document can be represented as a set of tree tuples, if we consider it as an unordered tree. That is, a tree $T$ can be reconstructed from $tuples_D(T)$, up to equivalence.
Theorem 1 Given a DTD $D$ and an XML tree $T$, if $T \lhd D$, then $trees_D(tuples_D(T)) = [T]$.
(a) Values of $t$:
t(courses) = $v_0$
t(courses.course) = $v_1$
t(courses.course.@cno) = csc200
t(courses.course.title) = $v_2$
t(courses.course.title.S) = Automata Theory
t(courses.course.taken_by) = $v_3$
t(courses.course.taken_by.student) = $v_4$
t(courses.course.taken_by.student.@sno) = st1
t(courses.course.taken_by.student.name) = $v_5$
t(courses.course.taken_by.student.name.S) = Deere
t(courses.course.taken_by.student.grade) = $v_6$
t(courses.course.taken_by.student.grade.S) = A+
(b) $tree_D(t)$: [tree with nodes $v_0, \ldots, v_6$ and the values csc200, Automata Theory, st1, Deere, A+]
Figure 2: Tree tuple $t$ and its tree representation.
The converse does not hold, but can be partially recovered when $trees_D(X)$ is a single equivalence class. We say that $X \subseteq \mathcal{T}(D)$ is $D$-compatible if there is an XML tree $T$ such that $T \lhd D$ and $X \subseteq tuples_D(T)$.
Proposition 3 If $X \subseteq \mathcal{T}(D)$ is $D$-compatible, then (a) there is an XML tree $T$ such that $T \lhd D$ and $trees_D(X) = [T]$, and (b) $X \sqsubseteq^{\flat} tuples_D(trees_D(X))$.
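Theorem 1 and Proposition 3 are summarized in the diagram presented in the following figure. In this diagram, $X$ is a $D$-compatible set of tree tuples. The arrow from $X$ to $X'$ stands for the $\sqsubseteq^{\flat}$ ordering.
[Diagram: $trees_D$ maps $X$ to $[T]$; $tuples_D$ maps $[T]$ to $X'$ and $trees_D$ maps $X'$ back to $[T]$; $X \sqsubseteq^{\flat} X'$.]
4 Functional Dependencies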
We define functional dependencies for XML by using tree tuples. For a DTD $D$, a functional dependency (FD) over $D$ is an expression of the form $S_1 \to S_2$ where $S_1, S_2$ are finite non-empty subsets of $paths(D)$. The set of all FDs over $D$ is denoted by $FD(D)$.
For $S \subseteq paths(D)$, and $t, t' \in \mathcal{T}(D)$, $t.S = t'.S$ means $t.p = t'.p$ for all $p \in S$. Furthermore, $t.S \neq \bot$ means $t.p \neq \bot$ for all $p \in S$. If $S_1 \to S_2 \in FD(D)$ and $T$ is an XML tree such that $T \lhd D$ and $S_1 \cup S_2 \subseteq paths(T)$, we say that $T$ satisfies $S_1 \to S_2$ (written $T \models S_1 \to S_2$) if for every $t_1, t_2 \in tuples_D(T)$, $t_1.S_1 = t_2.S_1$ and $t_1.S_1 \neq \bot$ imply $t_1.S_2 = t_2.S_2$. This extends to equivalence classes, since for any FD $\varphi$, and $T \equiv T'$, $T \models \varphi$ iff $T' \models \varphi$.
We write $T \models \Sigma$, for $\Sigma \subseteq FD(D)$, if $T \models \varphi$ for each $\varphi \in \Sigma$, and we write $T \models (D, \Sigma)$, if $T \models D$ and $T \models \Sigma$.
Example 4.1: Referring back to Example 1.1, we have the following FDs. cno is a key of course:
courses.course.@cno → courses.course. (FD1)
Another FD says that two distinct student subelements of the same course cannot have the same sno:
{courses.course, courses.course.taken_by.student.@sno} → courses.course.taken_by.student. (FD2)
Finally, to say that two student elements with the same sno value must have the same name, we use
courses.course.taken_by.student.@sno → courses.course.taken_by.student.name.S. (FD3)
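Satisfaction of an FD can be checked mechanically once the tree tuples are available. The Python sketch below is an illustration under assumptions: tuples are dictionaries from paths to values with missing keys standing for $\bot$, and the four tuples are written out by hand for the document of Figure 1 (a), restricted to the paths that the FDs mention (the node identifiers are arbitrary). Grouping tuples by their left-hand-side values is equivalent to the pairwise condition in the definition.

```python
NULL = None  # stands for the null value


def satisfies(tuples, lhs, rhs):
    """Check lhs -> rhs on a collection of tree tuples."""
    seen = {}
    for t in tuples:
        key = tuple(t.get(p, NULL) for p in lhs)
        if any(value is NULL for value in key):
            continue  # the premise requires non-null values on the left-hand side
        value = tuple(t.get(p, NULL) for p in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


C = "courses.course"
P = "courses.course.taken_by.student"


def row(course, student, sno, name, grade):
    return {C: course, P: student, P + ".@sno": sno,
            P + ".name.S": name, P + ".grade.S": grade}


# Hand-written projections of the tree tuples of Figure 1 (a).
TUPLES = [row("v1", "v4", "st1", "Deere", "A+"),
          row("v1", "v7", "st2", "Smith", "B-"),
          row("v10", "v13", "st1", "Deere", "A"),
          row("v10", "v16", "st3", "Smith", "B+")]

fd2 = satisfies(TUPLES, [C, P + ".@sno"], [P])
fd3 = satisfies(TUPLES, [P + ".@sno"], [P + ".name.S"])
sno_to_node = satisfies(TUPLES, [P + ".@sno"], [P])
print(fd2, fd3, sno_to_node)
```

Here (FD2) and (FD3) hold, while sno alone does not determine the student node, since st1 occurs under two courses.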
We offer a few remarks on our definition of FDs. First, using the tree tuples representation, it is easy to combine node and value equality: the former corresponds to equality between vertices and the latter to equality between strings. Moreover, keys naturally appear as a subclass of FDs, and relative constraints can also be encoded. Note that by defining the semantics of $FD(D)$ on $\mathcal{T}(D)$, we essentially define satisfaction of FDs on relations with null values, and our semantics is [...]
Given a DTD $D$, a set $\Sigma \subseteq FD(D)$ and $\varphi \in FD(D)$, we say that $(D, \Sigma)$ implies $\varphi$, written $(D, \Sigma) \vdash \varphi$, if for any tree $T$ with $T \models D$ and $T \models \Sigma$, it is the case that $T \models \varphi$. The set of all FDs implied by $(D, \Sigma)$ will be denoted by $(D, \Sigma)^+$.
An FD $\varphi$ is trivial if $(D, \emptyset) \vdash \varphi$. In relational databases, the only trivial FDs are $X \to Y$, with $Y \subseteq X$. Here, the DTD forces some more interesting trivial FDs. For instance, for each $p \in EPaths(D)$ and $p'$ a prefix of $p$, $(D, \emptyset) \vdash p \to p'$. Furthermore, for $p, p.@l \in paths(D)$, $(D, \emptyset) \vdash p \to p.@l$.
5 XNF: An XML Normal Form
With the definitions of the previous section, we are ready to present the normal form that generalizes BCNF for XML documents.
Definition 8 Given a DTD $D$ and $\Sigma \subseteq FD(D)$, $(D, \Sigma)$ is in XML normal form (XNF) iff for every non-trivial FD $\varphi \in (D, \Sigma)^+$ of the form $S \to p.@l$ or $S \to p.S$, it is the case that $S \to p$ is in $(D, \Sigma)^+$.
The intuition is as follows. Suppose that $S \to p.@l$ is in $(D, \Sigma)^+$. If $T$ is an XML tree conforming to $D$ and satisfying $\Sigma$, then in $T$ for every set of values of the elements in $S$, we can find only one value of $p.@l$. Thus, for every set of values of $S$ we need to store the value of $p.@l$ only once; in other words, $S \to p$ must be implied by $(D, \Sigma)$.
In this definition, we impose the condition that $\varphi$ is a non-trivial FD. Indeed, the trivial FD $p.@l \to p.@l$ is always in $(D, \Sigma)^+$, but often $p.@l \to p \notin (D, \Sigma)^+$, which does not necessarily represent a bad design.
To show how XNF distinguishes good XML design from bad design, we revisit the examples from the introduction, and prove that XNF generalizes BCNF and NNF, a normal form for nested relations [22, 23].
Example 5.1: Consider the DTD from example 1.1 (a) whose FDs are (FD1), (FD2), (FD3) shown in the previous section. (FD3) associates a unique name with each student number, which is therefore redundant. The design is not in XNF, since it contains (FD3) but does not imply the FD
courses.course.taken_by.student.@sno → courses.course.taken_by.student.name
To remedy this, we gave a revised DTD in example 1.1 (b). The idea was to create a new element info for storing information about students. That design satisfies FDs (FD1), (FD2) as well as
courses.info.number.@sno → courses.info,
Example 5.2: Suppose that $D$ is the DBLP DTD from example 1.2. Among the set $\Sigma$ of FDs satisfied by the documents are:
db.conf.title.S → db.conf (FD4)
db.conf.issue → db.conf.issue.inproceedings.@year (FD5)
For each issue of a conference, its year is stored in every article in that issue; thus, $(D, \Sigma)$ is not in XNF, since
db.conf.issue → db.conf.issue.inproceedings
is not in $(D, \Sigma)^+$.
The solution we proposed in the introduction was to make year an attribute of issue. (FD5) is not valid in the revised specification, which can be easily verified to be in XNF. Note that we do not replace (FD5) by db.conf.issue → db.conf.issue.@year, since it is a trivial FD and thus is implied by the new DTD alone.
BCNF and XNF Relational databases can be easily mapped into XML documents. Given a schema $G(A_1, \ldots, A_n)$, a DTD $D_G$ has two element types db and G, $P(db) = G^*$, $P(G) = \epsilon$, and $R(G) = \{@A_1, \ldots, @A_n\}$. For a set $F$ of FDs over $G$, we define a set $\Sigma_F$ of FDs over $D_G$ that includes, for each $A_{i_1} \cdots A_{i_m} \to A_j$ in $F$, an FD $\{db.G.@A_{i_1}, \ldots, db.G.@A_{i_m}\} \to db.G.@A_j$, as well as $\{db.G.@A_1, \ldots, db.G.@A_n\} \to db.G$ (to avoid duplicates).
Example 5.3: A schema $G(A, B, C)$ can be coded by the following DTD:
<!ELEMENT db (G*)>
<!ELEMENT G EMPTY>
<!ATTLIST G
A CDATA #REQUIRED
B CDATA #REQUIRED
C CDATA #REQUIRED>
In this schema, an FD $A \to B$ is translated into $db.G.@A \to db.G.@B$.
Proposition 4 $(G, F)$ is in BCNF iff $(D_G, \Sigma_F)$ is in XNF. □
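The translation of relational FDs into FDs over $D_G$ is purely syntactic, as the following Python sketch suggests. The representation is an assumption made for the example: an FD is a pair (left-hand side, right-hand side), paths are dotted strings, and `translate_fds` is a hypothetical name.

```python
def translate_fds(relation, attributes, fds):
    """Map FDs over relation(attributes) to FDs over the DTD D_G.

    fds is a list of (lhs_attributes, rhs_attribute) pairs; the result is a
    list of (frozenset_of_paths, path) pairs.
    """
    def path(attribute):
        return f"db.{relation}.@{attribute}"

    xml_fds = [(frozenset(path(a) for a in lhs), path(rhs)) for lhs, rhs in fds]
    # All attributes together determine the G node, which rules out duplicates.
    xml_fds.append((frozenset(path(a) for a in attributes), f"db.{relation}"))
    return xml_fds


sigma_f = translate_fds("G", ["A", "B", "C"], [(["A"], "B")])
for lhs, rhs in sigma_f:
    print(sorted(lhs), "->", rhs)
```

For the schema of Example 5.3 this produces the FD db.G.@A → db.G.@B together with the FD stating that the three attributes determine the G element.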
NNF and XNF A nested schema is either a set of attributes $X$, or $X(G_1)^* \ldots (G_n)^*$, where the $G_i$'s are nested schemas. An example of a nested relation for the schema $H_1 = Country(H_2)^*$, $H_2 = State(H_3)^*$, $H_3 = City$ is shown in Figure 3 (a).
(a) Nested relation H1
Country: United States
    State: Texas    City: Houston, Dallas
    State: Ohio     City: Columbus, Cleveland
(b) Complete unnesting of H1
Country          State    City
United States    Texas    Houston
United States    Texas    Dallas
United States    Ohio     Columbus
United States    Ohio     Cleveland
Figure 3: Nested relation and its unnesting.
For a nested schema $G = X(G_1)^* \ldots (G_n)^*$, we introduce an element type $G$ with $P(G) = G_1^*, \ldots, G_n^*$ and $R(G) = \{@A_1, \ldots, @A_n\}$, where $X = \{A_1, \ldots, A_n\}$; at the top level we have a new element type db with $P(db) = G^*$ and $R(db) = \emptyset$. In our example the DTD is:
<!ELEMENT db (H1*)>
<!ELEMENT H1 (H2*)>
<!ATTLIST H1 Country CDATA #REQUIRED>
<!ELEMENT H2 (H3*)>
<!ATTLIST H2 State CDATA #REQUIRED>
<!ELEMENT H3 EMPTY>
<!ATTLIST H3 City CDATA #REQUIRED>
The definition of FDs for nested relations uses the notion of complete unnesting. The complete unnesting of a nested relation from our example is shown in Figure 3 (b); in general, this notion is easily defined by induction. In our example, we have a valid FD State → Country, while the FD State → City does not hold.
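The inductive definition of complete unnesting can be sketched as a short recursive function. The encoding below is an assumption made for this illustration: a nested relation is a list of tuples, and each tuple is a pair of a dictionary of atomic values and a list of nested relations. The sketch reproduces Figure 3 (b) from Figure 3 (a) and tests the two FDs mentioned above.

```python
def unnest(relation):
    """Completely unnest a nested relation into a list of flat rows (dicts)."""
    rows = []
    for atoms, nested_relations in relation:
        partial = [dict(atoms)]
        for nested in nested_relations:
            # Combine every partial row with every row of the nested relation.
            partial = [dict(p, **s) for p in partial for s in unnest(nested)]
        rows.extend(partial)
    return rows


def holds(rows, lhs, rhs):
    """Check the FD lhs -> rhs (lhs a list of attributes, rhs one attribute)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def cities(*names):
    return [({"City": name}, []) for name in names]


# The nested relation of Figure 3 (a).
H1 = [({"Country": "United States"},
       [[({"State": "Texas"}, [cities("Houston", "Dallas")]),
         ({"State": "Ohio"}, [cities("Columbus", "Cleveland")])]])]

rows = unnest(H1)
print(len(rows), holds(rows, ["State"], "Country"), holds(rows, ["State"], "City"))
```

Note that in this encoding a tuple with an empty nested relation contributes no rows, which is how unnesting usually behaves.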
Normalization is usually considered for nested relations in the partition normal form (PNF) [1, 22, 23]. A nested relation $r$ over $X(G_1)^* \ldots (G_n)^*$ is in PNF if for any two tuples $t_1, t_2$ in $r$: (1) if $t_1.X = t_2.X$, then the nested relations $t_1.G_i$ and $t_2.G_i$ are equal, for every $i \in [1, n]$, and (2) each nested relation $t_1.G_i$ must be in PNF, for every $i \in [1, n]$. Note that PNF can be enforced by using FDs on the XML representation. In our example this is done as follows:
db.H1.@Country → db.H1
{db.H1, db.H1.H2.@State} → db.H1.H2
{db.H1.H2, db.H1.H2.H3.@City} → db.H1.H2.H3
It turns out that one can define FDs over nested relations by using the XML representation. Let $U$ be a set of attributes, $G_1$ a nested relation schema over $U$ and $FD$ a set of functional dependencies over $G_1$. Assume that $G_1$ includes nested relation schemas $G_2, \ldots, G_n$ and a set of attributes $U' \subseteq U$. For each $G_i$ ($i \in [1, n]$), $path(G_i)$ is inductively defined as follows. If $G_i = G_1$, then $path(G_i) = db.G_1$. Otherwise, if $G_i$ is a nested attribute of $G_j$, then $path(G_i) = path(G_j).G_i$. Furthermore, if $A \in U'$ is an atomic attribute of $G_i$, then $path(A) = path(G_i).@A$. For instance, for the schema of the nested relation in Figure 3, $path(H_2) = db.H_1.H_2$ and $path(City) = db.H_1.H_2.H_3.@City$.
We now define $\Sigma_{FD}$ as follows:
• For each FD $A_{i_1} \cdots A_{i_m} \to A_i \in FD$, $\{path(A_{i_1}), \ldots, path(A_{i_m})\} \to path(A_i)$ is in $\Sigma_{FD}$.
• For each $i \in [1, n]$, if $A_{j_1}, \ldots, A_{j_m}$ is the set of atomic attributes of $G_i$ and $G_i$ is a nested attribute of $G_j$, $\{path(G_j), path(A_{j_1}), \ldots, path(A_{j_m})\} \to path(G_i)$ is in $\Sigma_{FD}$. Furthermore, if $B_{j_1}, \ldots, B_{j_l}$ is the set of atomic attributes of $G_1$, then $\{path(B_{j_1}), \ldots, path(B_{j_l})\} \to path(G_1)$ is in $\Sigma_{FD}$.
Note that the last rule imposes the partition normal form.
A Nested Normal Form (NNF) for nested relations was proposed in [22, 23]. Here we use the presentation of [22] restricted to FDs only. Given a nested relational schema $G$ and a subschema $R$, for each atomic attribute $A$ of $R$ we define $ancestor(A)$ as the union of the atomic attributes of all the nested relation schemas mentioned in $path(R)$. For instance, $ancestor(State) = \{Country, State\}$. If $FD$ is a set of FDs over $G$, then we say that it is in NNF if for each non-trivial FD $X \to A$ ($A \in U$), if $X \to A \in (G, FD)^+$, then $X \to ancestor(A) \in (G, FD)^+$. As before, $(G, FD)^+$ stands for the set of all FDs implied by $(G, FD)$.
The result below says that a nested relational schema $G$ with a set of FDs $FD$ is in NNF iff its XML representation, with $\Sigma_{FD}$ defined above, is in XNF.
Proposition 5 $(G, FD)$ is in NNF iff $(D_G, \Sigma_{FD})$ is in XNF. □
6 Normalizing XML Documents
We show how to transform a DTD $D$ and a set of FDs $\Sigma$ into a new specification $(D', \Sigma')$ that is in XNF and contains the same information. Throughout the section, we assume that the DTDs are non-recursive (the recursive case can be handled in a very similar fashion), and that all FDs are of the form $\{q, p_1.@l_1, \ldots, p_n.@l_n\} \to p$. That is, they contain at most one element path on the left-hand side. Note that all the FDs we have seen so far are of this form. While constraints of the form $\{q, q', \ldots\}$ are not forbidden, they appear to be quite unnatural, and can be easily eliminated by creating a new attribute $@l$ and splitting $\{q, q'\} \cup S \to p$ into $q'.@l \to q'$ and $\{q, q'.@l\} \cup S \to p$. Furthermore, we assume that paths do not contain the symbol S (since $p.S$ can always be replaced by a path of the form $p.@l$).
Given a DTD $D$ and a set of FDs $\Sigma$, a non-trivial FD $S \to p.@l$ is called anomalous, over $(D, \Sigma)$, if it violates XNF; that is, $S \to p.@l \in (D, \Sigma)^+$ but $S \to p \notin (D, \Sigma)^+$. A path on the right-hand side of an anomalous FD is called an anomalous path, and the set of all such paths is denoted by $AP(D, \Sigma)$.
The algorithm combines two basic ideas presented in the introduction: creating a new element type, and moving an attribute.
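Moving attributes Let $D = (E, A, P, R, r)$ be a DTD, $p.@l \in paths(D)$, $q \in EPaths(D)$ and $@m$ be an attribute. The DTD $D[p.@l := q.@m]$ is constructed by moving the attribute $@l$ from the set of attributes of $last(p)$ to the set of attributes of $last(q)$, and changing its name to $@m$, as shown in the following figure.
[Figure: the attribute @l of last(p), reached from the root r via p, becomes the attribute @m of last(q), reached via q.]
Formally, $D[p.@l := q.@m]$ is $(E, A', P, R', r)$, where $A' = A \cup \{@m\}$, $R'(last(q)) = R(last(q)) \cup \{@m\}$, $R'(last(p)) = R(last(p)) \setminus \{@l\}$ and $R'(\tau') = R(\tau')$ for each $\tau' \in E \setminus \{last(q), last(p)\}$. This is the same kind of transformation we saw in moving the year attribute in the DBLP example.
Given a set $\Sigma$ of FDs over $D$, the set $\Sigma[p.@l := q.@m]$ of FDs over $D[p.@l := q.@m]$ consists of all FDs $S_1 \to S_2 \in (D, \Sigma)^+$ with $S_1 \cup S_2 \subseteq paths(D[p.@l := q.@m])$.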
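At the level of the DTD, moving an attribute only changes the mapping $R$. The Python sketch below illustrates this under assumptions: $R$ is a dictionary from element types to sets of attribute names, paths are dotted strings, and `move_attribute` is a hypothetical helper that covers the DTD part of the construction only (the accompanying set of FDs requires implication and is not computed here).

```python
def last(path):
    """Last symbol of a dotted path."""
    return path.split(".")[-1]


def move_attribute(R, p, l, q, m):
    """Return R' for D[p.@l := q.@m]; E and P stay unchanged."""
    if l not in R.get(last(p), set()):
        raise ValueError(f"{l} is not an attribute of {last(p)}")
    new_R = {elem: set(attrs) for elem, attrs in R.items()}
    new_R[last(p)].discard(l)                   # R'(last(p)) = R(last(p)) \ {@l}
    new_R.setdefault(last(q), set()).add(m)     # R'(last(q)) = R(last(q)) + {@m}
    return new_R


# Attribute sets of the DBLP fragment from Example 1.2.
R = {"issue": set(), "inproceedings": {"@key", "@pages", "@year"}}
new_R = move_attribute(R, "db.conf.issue.inproceedings", "@year",
                       "db.conf.issue", "@year")
print(sorted(new_R["issue"]), sorted(new_R["inproceedings"]))
```

Applied to the DBLP fragment, this yields the attribute lists shown in the introduction, with year declared on issue.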
Creating new element types Let $D = (E, A, P, R, r)$ be a DTD, $S = \{q, p_1.@l_1, \ldots, p_n.@l_n\} \subseteq paths(D)$ such that $n \geq 1$ and $q \in EPaths(D)$. We construct a new DTD $D'$ by creating a new element type $\tau$ as a child of the last element of $q$, making $\tau_1, \ldots, \tau_n$ its children, $@l$ its attribute, and $@l_1, \ldots, @l_n$ attributes of $\tau_1, \ldots, \tau_n$. Furthermore, we remove $@l$ from the set of attributes of the last element of $p$, as shown in the following figure.
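[Figure: a new element type $\tau$ with attribute @l is created as a child of last(q); its children $\tau_1, \ldots, \tau_n$ carry the attributes $@l_1, \ldots, @l_n$ of last($p_1$), ..., last($p_n$), and @l is removed from last(p).]
Formally, if $\{\tau, \tau_1, \ldots, \tau_n\}$ are element types which are not in $E$, the new DTD, denoted by $D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$, is $(E', A, P', R', r)$, where $E' = E \cup \{\tau, \tau_1, \ldots, \tau_n\}$ and
1. $P'(last(q)) = P(last(q)), \tau^*$, $P'(\tau) = \tau_1^*, \ldots, \tau_n^*$, $P'(\tau_i) = \epsilon$, for each $i \in [1, n]$, and $P'(\tau') = P(\tau')$ for each $\tau' \in E \setminus \{last(q)\}$.
2. $R'(\tau) = \{@l\}$, $R'(\tau_i) = \{@l_i\}$, for each $i \in [1, n]$, $R'(last(p)) = R(last(p)) \setminus \{@l\}$ and $R'(\tau') = R(\tau')$ for each $\tau' \in E \setminus \{last(p)\}$.
Given $D' = D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ and a set $\Sigma$ of FDs over $D$, we define a set $\Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ of FDs over $D'$ as the set that contains the following:
1. $S_1 \to S_2 \in (D, \Sigma)^+$ with $S_1 \cup S_2 \subseteq paths(D')$;
2. Each FD over $q$, $p_i$, $p_i.@l_i$ ($i \in [1, n]$) and $p.@l$ is transferred to $\tau$ and its children. That is, if $S_1 \cup S_2 \subseteq \{q, p_1, \ldots, p_n, p_1.@l_1, \ldots, p_n.@l_n, p.@l\}$ and $S_1 \to S_2 \in (D, \Sigma)^+$, then we include an FD obtained from $S_1 \to S_2$ by changing $p_i$ to $q.\tau.\tau_i$, $p_i.@l_i$ to $q.\tau.\tau_i.@l_i$, and $p.@l$ to $q.\tau.@l$;
3. $\{q, q.\tau.\tau_1.@l_1, \ldots, q.\tau.\tau_n.@l_n\} \to q.\tau$, and $\{q.\tau, q.\tau.\tau_i.@l_i\} \to q.\tau.\tau_i$ for $i \in [1, n]$.¹
(1) If $(D, \Sigma)$ is in XNF then return $(D, \Sigma)$, otherwise go to step (2).
(2) If there is an anomalous FD $S \to p.@l$ with $q \in EPaths(D) \cap S$ and $q \to S \in (D, \Sigma)^+$, then:
    $D := D[p.@l := q.@m]$
    $\Sigma := \Sigma[p.@l := q.@m]$
    where $@m$ is fresh, and go to step (1).
(3) Choose a $(D, \Sigma)$-minimal anomalous FD $S \to p.@l$, where $S = \{q, p_1.@l_1, \ldots, p_n.@l_n\}$. Create fresh element types $\tau, \tau_1, \ldots, \tau_n$; set
    $D := D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$
    $\Sigma := \Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$
    and go to step (1).
Figure 4: XNF decomposition algorithm.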
This construction, when applied to the student example from the introduction, yields exactly the revised DTD, with $\tau$ being info, $@l$ being name, $\tau_1$ being number and $@l_1$ being sno.
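The DTD part of the "creating new element types" construction can be sketched as follows. This is an illustration under assumptions rather than the paper's implementation: content models are kept as plain strings, $R$ is a dictionary of attribute sets, and, following the convention of this section that $p.S$ paths are replaced by attributes, name and grade are coded as the attributes @name and @grade of student. The helper name `create_element_type` is hypothetical, and the accompanying FD set is not computed.

```python
def last(path):
    """Last symbol of a dotted path."""
    return path.split(".")[-1]


def create_element_type(P, R, p, l, q, tau, parts):
    """Return (P', R') for D[p.@l := q.tau[tau_1.@l_1, ..., tau_n.@l_n, @l]].

    parts is the list [(tau_1, '@l_1'), ..., (tau_n, '@l_n')].
    """
    fresh = [tau] + [t for t, _ in parts]
    if any(t in P for t in fresh):
        raise ValueError("new element types must not occur in E")
    new_P = dict(P)
    new_R = {elem: set(attrs) for elem, attrs in R.items()}
    old = P[last(q)]
    # P'(last(q)) = P(last(q)), tau*
    new_P[last(q)] = f"{tau}*" if old == "EMPTY" else f"{old}, {tau}*"
    # P'(tau) = tau_1*, ..., tau_n*  and  R'(tau) = {@l}
    new_P[tau] = ", ".join(f"{t}*" for t, _ in parts)
    new_R[tau] = {l}
    for t, attribute in parts:
        new_P[t] = "EMPTY"            # P'(tau_i) is the empty sequence
        new_R[t] = {attribute}        # R'(tau_i) = {@l_i}
    new_R[last(p)].discard(l)         # R'(last(p)) = R(last(p)) \ {@l}
    return new_P, new_R


# Example 1.1 with name and grade coded as attributes (an assumption of this sketch).
P = {"courses": "course*", "course": "title, taken_by", "title": "#PCDATA",
     "taken_by": "student*", "student": "EMPTY"}
R = {"course": {"@cno"}, "student": {"@sno", "@name", "@grade"}}

student = "courses.course.taken_by.student"
new_P, new_R = create_element_type(P, R, student, "@name", "courses",
                                   "info", [("number", "@sno")])
print(new_P["courses"], "|", new_P["info"], "|", sorted(new_R["student"]))
```

Up to the attribute coding of name, the result matches the revised DTD of Example 1.1: courses gains info* children, info has number* children and carries the name, and student keeps sno and grade.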
We are not interested in applying this transformation to an arbitrary anomalous FD, but rather to a minimal one. In the relational context, a minimal FD is $X \to A$ such that $X' \not\to A$ for any $X' \subsetneq X$. In our case the definition is a bit more complex to account for paths used in FDs. We say that $\{q, p_1.@l_1, \ldots, p_n.@l_n\} \to p_0.@l_0$ is $(D, \Sigma)$-minimal if there is no anomalous FD $S' \to p_i.@l_i \in (D, \Sigma)^+$ such that $i \in [0, n]$ and $S'$ is a subset of $\{q, p_1, \ldots, p_n, p_0.@l_0, \ldots, p_n.@l_n\}$ such that $|S'| \leq n$ and $S'$ contains at most one element path.
Proposition 6 Let $(D', \Sigma')$ be constructed from $(D, \Sigma)$ by using either the "moving attributes" construction, or the "creating new element types" construction applied to a $(D, \Sigma)$-minimal FD. Then $AP(D', \Sigma') \subsetneq AP(D, \Sigma)$. □
The algorithm The algorithm applies the two transformations until the schema is in XNF, as shown in Figure 4. It involves FD implication, that is, testing membership in $(D, \Sigma)^+$ (and consequently testing XNF and $(D, \Sigma)$-minimality), which will be described in Section 7. Since each step reduces the number of anomalous paths (Proposition 6), we obtain:
Theorem 2 The XNF decomposition algorithm terminates, and outputs a specification $(D, \Sigma)$ in XNF.
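Footnote 1. If $\bot$ can be a value of $p.@l$ in $tuples_D(T)$, the definition must be modified slightly, by letting $P'(\tau)$ be $\tau_1^*, \ldots, \tau_n^*, (\tau' \mid \epsilon)$, where $\tau'$ is fresh, making $@l$ an attribute of $\tau'$, and modifying the definition of FDs accordingly.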
still decompose into XNF, although the final result may not be as good as with using the implication. A slight modification of the proof of Proposition 6 yields:
Proposition 7 Consider a simplification of the XNF decomposition algorithm which only consists of step (3) applied to FDs $S \to p.@l \in \Sigma$, and in which the definition of $\Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ is modified by using $\Sigma$ instead of $(D,\Sigma)^+$. Then such an algorithm always terminates and its result is in XNF.
Lossless Decompositions. To prove that our transformations do not lose any information from the documents, we define the concept of lossless decompositions similarly to the relational notion of "generic dominance" from [18]. That notion requires the existence of two relational algebra queries that translate back and forth between two relational schemas. Adapting this definition poses two problems in our setting: first, no XML query language yet has the same "yardstick" status as relational algebra for relational databases, and second, our transformations generate new node ids, which cannot be described by generic queries.
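To deal with this, we use the relational representation via the $tuples_D(\cdot)$ operator, and say that $(D_2,\Sigma_2)$ is a lossless decomposition of $(D_1,\Sigma_1)$, written $(D_1,\Sigma_1) \le_{lossless} (D_2,\Sigma_2)$, if there exist relational algebra queries $Q_1, Q_1', Q_2$ such that for any $T \models (D_1,\Sigma_1)$, there exists $T' \models (D_2,\Sigma_2)$ such that the diagram below commutes:
$$
\begin{array}{ccccc}
T & & \longmapsto & & T' \\
\downarrow {\scriptstyle tuples_{D_1}} & & & & \downarrow {\scriptstyle tuples_{D_2}} \\
tuples_{D_1}(T) & \underset{Q_1'}{\overset{Q_1}{\rightleftarrows}} & Q_1(tuples_{D_1}(T)) & \xleftarrow{\;Q_2\;} & tuples_{D_2}(T')
\end{array}
$$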
The goal of query $Q_2$ is to eliminate extra node ids that may occur in $T'$ but not in $T$; then $Q_1$ and $Q_1'$ go back and forth between $tuples_{D_1}(T)$ and the result of $Q_2$ on $tuples_{D_2}(T')$. As relations of the form $tuples_D(T)$ may contain nulls, we use the semantics of Codd tables [1,19] for evaluating relational algebra queries on them.
Proposition 8 (a) The relation $\le_{lossless}$ is transitive. (b) If $(D',\Sigma')$ is obtained from $(D,\Sigma)$ by using one of the transformations from the normalization algorithm, then $(D,\Sigma) \le_{lossless} (D',\Sigma')$.
Thus, if $(D',\Sigma')$ is the output of the normalization algorithm on $(D,\Sigma)$, then $(D,\Sigma) \le_{lossless} (D',\Sigma')$. Moreover, the transformations on the documents can be
In the previous section we saw that it is possible to losslessly convert a DTD into one in XNF. The algorithm used FD implication. We now show that for most classes of DTDs used in practice, this problem is tractable. We assume, without loss of generality, that all FDs have a single path on the right-hand side.
Typically, regular expressions used in DTDs are rather simple. We now formulate a criterion for simplicity that corresponds to the usual practice of writing regular expressions in DTDs. Given an alphabet $A$, a regular expression over $A$ is called trivial if it is of the form $s_1, \ldots, s_n$, where for each $s_i$ there is a letter $a_i \in A$ such that $s_i$ is either $a_i$ or $a_i?$ (which abbreviates $a_i \mid \epsilon$), or $a_i^+$ or $a_i^*$, and for $i \ne j$, $a_i \ne a_j$. We call a regular expression $s$ simple if there is a trivial regular expression $s'$ such that any word $w$ in the language denoted by $s$ is a permutation of a word in the language denoted by $s'$, and vice versa.
For example, $(a|b|c)^*$ is simple: $a^*, b^*, c^*$ is trivial, and every word in $(a|b|c)^*$ is a permutation of a word in $a^*, b^*, c^*$ and vice versa. A DTD is called simple if all productions in it use simple regular expressions over $E \cup \{S\}$. Simple regular expressions are prevalent in DTDs. For instance, the Business Process Specification Schema of ebXML [10], a set of specifications to conduct business over the Internet, is a simple DTD. Part of this schema is shown in Figure 5.
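The definition of a trivial expression is easy to check mechanically. The Python sketch below assumes a hypothetical string encoding in which factors are separated by commas and each factor is a name optionally followed by `?`, `+` or `*`. It also illustrates the permutation condition for the example $(a|b|c)^*$: a word is a permutation of a word of a trivial expression exactly when its letter counts respect the bounds of each factor. The function names are ours, not the paper's.

```python
import re
from collections import Counter

# One factor: a name followed by an optional ?, + or * (assumed encoding).
FACTOR = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)([?+*]?)$")

# Occurrence bounds (min, max) per operator; max None means unbounded.
BOUNDS = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def parse_trivial(expr):
    """Return {letter: (min, max)} if `expr` is trivial, else None."""
    bounds = {}
    for factor in expr.split(","):
        match = FACTOR.match(factor.strip())
        if match is None:
            return None  # not of the form a, a?, a+ or a*
        letter, op = match.groups()
        if letter in bounds:
            return None  # letters must be pairwise distinct
        bounds[letter] = BOUNDS[op]
    return bounds


def is_permutation_of_word(word, bounds):
    """Check whether `word` (a list of letters) permutes a word of the trivial expression."""
    counts = Counter(word)
    if any(letter not in bounds for letter in counts):
        return False
    for letter, (low, high) in bounds.items():
        n = counts[letter]
        if n < low or (high is not None and n > high):
            return False
    return True


bounds = parse_trivial("a*, b*, c*")
print(is_permutation_of_word(list("cabba"), bounds))  # True: cabba is in (a|b|c)*
print(parse_trivial("a, a*"))                         # None: letter a repeated
print(parse_trivial("(a|b)"))                         # None: general disjunction
```

Deciding whether an arbitrary regular expression is simple is a harder question that this sketch does not attempt; it only recognizes the trivial form and tests individual words.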
Theorem 3 The implication problem for FDs over simple DTDs is solvable in quadratic time.
In a simple DTD, disjunction can appear in expressions of the form $(a|\epsilon)$ or $(a|b)^*$, but a general disjunction $(a|b)$ is not allowed. We now show that the implication problem remains tractable if the number of such unrestricted disjunctions is small.
A regular expression $s$ over an alphabet $A$ is a simple disjunction if $s = \epsilon$, $s = a$, where $a \in A$, or $s = s_1 | s_2$, where $s_1$, $s_2$ are simple disjunctions over alphabets $A_1$, $A_2$ and $A_1 \cap A_2 = \emptyset$. A DTD $D = (E, A, P, R, r)$ is called disjunctive if for every $\tau \in E$, $P(\tau) = s_1, \ldots, s_m$, where each $s_i$ is either a simple regular expression or a simple disjunction over an alphabet $A_i$ ($i \in [1,m]$), and $A_i \cap A_j = \emptyset$ ($i, j \in [1,m]$ and $i \ne j$). This generalizes the concept of a simple DTD.
With each disjunctive DTD $D$, we associate a number $N_D$ that measures the complexity of unrestricted disjunctions in $D$. Formally, for a simple regular expression $s$, $N_s = 1$. If $s$ is a simple disjunction, then $N_s$ is the number of symbols $|$ in $s$ plus 1. If $P(\tau) = s_1, \ldots, s_n$, then $N_\tau$ is 1, if $s_1, \ldots, s_n$ is a simple regular expression, and $N_\tau = |\{p \in paths(D) \mid last(p) = \tau\}| \times N_{s_1} \times \cdots \times N_{s_n}$ otherwise. Finally, $N_D = \prod_{\tau \in E} N_\tau$.
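As a small worked instance of this definition, suppose (hypothetically) that $P(\tau) = a^*, (b|c|d)$ and that exactly two paths of $D$ end in $\tau$. The production as a whole is not a simple regular expression, since a trivial expression admitting each of $b$, $c$, $d$ on its own would also admit words containing two of them. The second case of the definition therefore applies:

```latex
\begin{align*}
N_{a^*} &= 1 && \text{(simple regular expression)} \\
N_{(b|c|d)} &= 2 + 1 = 3 && \text{(two symbols $|$, plus 1)} \\
N_\tau &= |\{p \in paths(D) \mid last(p) = \tau\}| \times N_{a^*} \times N_{(b|c|d)}
        = 2 \times 1 \times 3 = 6
\end{align*}
```

If every other element type of $D$ has a simple production, each contributes a factor of 1, so $N_D = 6$ in this assumed example.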
problem for disjunctive DTDs $D$ with $N_D \le k \log(|D|)$ is solvable in polynomial time. □
There are some classes of DTDs for which the implication problem is not tractable. One such class consists of arbitrary disjunctive DTDs. Another class is that of relational DTDs. We say that $D$ is a relational DTD if for each XML tree $T \models D$, if $X$ is a non-empty subset of $tuples_D(T)$, then $trees_D(X) \models D$.
This class contains regular expressions like the one below, from a DTD for Frequently Asked Questions [17]:
<!ELEMENT section (logo*, title, (qna+ | q+ |
( p | div | section)+))>
There exist non-relational DTDs (for example,
<!ELEMENT a (b,b)>). However:
Proposition 9 Every disjunctive DTD is relational.
Theorem 5 The FD implication problem over relational DTDs and over disjunctive DTDs is coNP-complete. □
Relational DTDs have the following useful property that lets us establish the complexity of testing XNF.
Proposition 10 Given a relational DTD $D$ and a set $\Sigma$ of FDs over $D$, $(D,\Sigma)$ is in XNF iff for each nontrivial FD of the form $S \to p.@l$ or $S \to p.S$ in $\Sigma$, $S \to p \in (D,\Sigma)^+$. □
From this, we immediately derive:
Corollary 1 Testing if $(D,\Sigma)$ is in XNF can be done in cubic time for simple DTDs, and is coNP-complete for relational DTDs. □
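Proposition 10 suggests a test that inspects only the FDs listed in $\Sigma$ rather than the whole closure. The Python sketch below assumes that paths are dot-separated strings whose last step is an attribute (prefixed with `@`) or the text symbol `S`, and that the caller supplies two oracles: `implies(lhs, rhs)`, deciding membership in $(D,\Sigma)^+$, and `is_trivial(lhs, rhs)`. Both oracles, the path encoding and all names are hypothetical placeholders; the sketch does not implement FD implication itself and is meaningful only for relational DTDs.

```python
def is_value_path(path):
    """True if the last step of a dot-separated path is an attribute (@...) or S."""
    last = path.rsplit(".", 1)[-1]
    return last.startswith("@") or last == "S"


def parent_path(path):
    """Drop the last step: 'r.a.@v' becomes 'r.a'."""
    return path.rsplit(".", 1)[0]


def is_xnf(sigma, implies, is_trivial):
    """Test the condition of Proposition 10 on a list of FDs (lhs, rhs).

    `lhs` is a frozenset of paths and `rhs` is a single path.
    """
    for lhs, rhs in sigma:
        if not is_value_path(rhs) or is_trivial(lhs, rhs):
            continue  # only nontrivial FDs S -> p.@l or S -> p.S matter
        if not implies(lhs, parent_path(rhs)):
            return False  # S -> p is not implied, so the FD is anomalous
    return True


# Toy oracles (hypothetical): the only implied FDs are those listed in KNOWN,
# and an FD counts as trivial only when its right-hand side occurs on the left.
KNOWN = {(frozenset({"r.a.@k"}), "r.a")}
implies = lambda lhs, rhs: rhs in lhs or (lhs, rhs) in KNOWN
is_trivial = lambda lhs, rhs: rhs in lhs

good = [(frozenset({"r.a.@k"}), "r.a.@v")]  # @k determines the node r.a
bad = [(frozenset({"r.b.@k"}), "r.b.@v")]   # @k does not determine r.b
print(is_xnf(good, implies, is_trivial))  # True
print(is_xnf(bad, implies, is_trivial))   # False
```

The loop makes one implication query per FD in $\Sigma$, so the cost of the test is governed by the cost of the implication oracle for the class of DTDs at hand.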
8 Future Research
The decomposition algorithm can be improved in various ways, and we plan to work on making it more efficient. We also would like to find a complete classification of the complexity of the FD implication problem for various classes of DTDs.
As prevalent as BCNF is, it does not solve all the problems of relational schema design, and one cannot expect XNF to address all shortcomings of DTD design. We plan to work on extending XNF to more powerful normal forms, in particular by taking into account
ProcessSpecification | Package | BinaryCollaboration | BusinessTransaction |
MultiPartyCollaboration)*)>
<!ELEMENT Include (Documentation*)>
<!ELEMENT BusinessDocument (ConditionExpression?, Documentation*)>
<!ELEMENT SubstitutionSet (DocumentSubstitution | AttributeSubstitution | Documentation)*>
<!ELEMENT BinaryCollaboration (Documentation*, InitiatingRole, RespondingRole, (Documentation |
Start | Transition | Success | Failure | BusinessTransactionActivity | CollaborationActivity |
Fork | Join)*)>
<!ELEMENT Transition (ConditionExpression?, Documentation*)>
Figure 5: Part of the Business Process Specification Schema of ebXML.
Acknowledgments. Discussions with Michael Benedikt and Wenfei Fan were extremely helpful. The authors were supported in part by grants from the Natural Sciences and Engineering Research Council of Canada and from Bell University Laboratories.
References
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2] J. Albert, D. Giammarresi, D. Wood. Normal form algorithms for extended context-free grammars. TCS 267 (2001), 35-47.
[3] P. Atzeni, N. Morfuni. Functional dependencies in relations with null values. Information Processing Letters 18(4): 233-238 (1984).
[4] C. Beeri, P. Bernstein, N. Goodman. A sophisticate's introduction to database normalization theory. VLDB'78, pages 113-124.
[5] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. In WWW'10, 2001.
[6] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Reasoning about keys for XML. In DBPL'01.
[7] P. Buneman, A. Jung, A. Ohori. Using powerdomains to generalize relational databases. Theoretical Computer Science 91 (1991), 23-55.
[8] DBLP. http://dblp.uni-trier.de/.
[9] W. F. Dowling and J. H. Gallier. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. JLP 1(3): 267-284 (1984).
[10] ebXML. Business Process Specification Schema v1.01. http://www.ebxml.org/specs/.
[11] W. Fan, L. Libkin. On XML integrity constraints in the presence of DTDs. In PODS'01, pages 114-125.
[12] W. Fan, J. Simeon. Integrity constraints for XML. PODS'00, pages 23-34.
[13] M. Fernandez, J. Simeon, P. Wadler. A semi-monad for semi-structured data. ICDT'01, pages 263-300.
[14] D. Florescu, D. Kossmann. Storing and querying XML data using an RDMBS. IEEE Data Eng. Bull. 22 (1999), 27-34.
[15] G. Grahne. The Problem of Incomplete Information in Relational Databases. Springer, Berlin, 1991.
[16] C. Gunter. Semantics of Programming Languages. The MIT Press, 1992.
[17] J. Higgins, R. Jelliffe. QAML Version 2.4. http://xml.ascc.net/resource/qaml-xml.dtd, 1999.
[18] R. Hull. Relative information capacity of simple relational database schemata. SIAM Journal on Computing 15(3): 856-886 (1986).
[19] T. Imielinski and W. Lipski. Incomplete information in relational databases. J. ACM 31 (1984), 761-791.
[20] C. Kanne, G. Moerkotte. Efficient storage of XML data. In ICDE'00, p. 198.
[21] M. Levene, G. Loizou. Axiomatisation of functional dependencies in incomplete relations. TCS 206(1-2): 283-300, 1998.
[22] W. Y. Mok, Y. K. Ng, D. Embley. A normal form for precisely characterizing redundancy in nested relations. ACM TODS 21 (1996), 77-106.
[23] Z. M. Özsoyoglu, L.-Y. Yuan. A new normal form for nested relations. ACM TODS 12(1): 111-136, 1987.
[24] Y. Sagiv, C. Delobel, D. S. Parker, R. Fagin. An equivalence between relational database dependencies and a fragment of propositional logic. J. ACM 28 (1981), 435-453.
[25] J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. DeWitt, J. Naughton. Relational databases for querying XML documents: limitations and opportunities. VLDB'99, pages 302-314.
[26] D. Suciu. Bounded fixpoints for complex objects. TCS 176 (1997), 283-328.
[27] Z. Tari, J. Stokes, S. Spaccapietra. Object normal forms and dependency constraints for object-oriented schemata. ACM TODS 22 (1997), 513-569.
[28] I. Tatarinov, Z. Ives, A. Halevy, D. Weld. Updating XML. In SIGMOD'01, pages 413-424.
[29] J. Van den Bussche. Simulation of the nested relational algebra by the flat relational algebra, with an application to the complexity of evaluating powerset algebra expressions. TCS 254(1-2): 363-377, 2001.
[30] W3C. XML-Data. W3C Note, Jan. 1998.
[31] W3C. XML Schema. W3C Working Draft, May 2001.
[32] W3C. XQuery 1.0: An XML Query Language. W3C