
Marcelo Arenas
Department of Computer Science
University of Toronto
marenas@cs.toronto.edu

Leonid Libkin*
Department of Computer Science
University of Toronto
libkin@cs.toronto.edu

Abstract

This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.

1 Introduction

The concepts of database design and normal forms are a key component of relational database technology. In this paper, we study design principles for XML data. XML has recently emerged as a new basic format for data exchange. Although many XML documents are views of relational data, the number of applications using native XML documents is increasing rapidly. Such applications may use native XML storage facilities [20], and update XML data [28]. Updates, as in relational databases, may cause anomalies if data is redundant. In the relational world, anomalies are avoided by using a well-designed database schema. XML has its version of schema too; most often it is DTDs (Document Type Definitions), and some other proposals exist or are under development [31, 30]. What would it mean then for such a schema to be well or poorly designed? Clearly, this question has arisen in practice: one can find companies offering help in "good DTD design." This help, however, comes in the form of consulting services rather than commercially available software, as there are no clear guidelines for producing well-designed XML.

* Research affiliation: Bell Laboratories.

Our goal is to find principles for good XML data design, and algorithms to produce such designs. We believe that it is important to do this research now, as a lot of data is being put on the web. Once massive web databases are created, it is very hard to change their organization; thus, there is a risk of having large amounts of widely accessible, but at the same time poorly organized legacy data.

Normalization is one of the most thoroughly researched subjects in database theory (a survey [4] produced many references more than 20 years ago), and cannot be reconstructed in a single paper in its entirety. Here we follow the standard treatment of one of the most common (if not the most common) normal forms, BCNF. It eliminates redundancies, and avoids the update anomalies they cause, by decomposing into relational subschemas in which every nontrivial functional dependency defines a key. Just to retrace this development in the XML context, we need the following:

a) Understanding of what a redundancy and an update anomaly is.

b) A definition and basic properties of functional dependencies (so far, most proposals for XML constraints concentrate on keys).

c) A definition of what "bad" functional dependencies are (those that cause redundancies and update anomalies).

d) An algorithm for converting an arbitrary DTD into one that does not admit such bad functional dependencies.

[Figure 1: Examples of XML documents. (a) A courses document with two course elements (@cno "csc200", title "Automata Theory"; @cno "mat100", title "Calculus I"). Each course has a taken_by child listing student elements with @sno, name and grade: st1 (Deere, A+) and st2 (Smith, B-) for csc200; st1 (Deere, A) and st3 (Smith, B+) for mat100. (b) The restructured document: student elements keep only @sno and grade, and two info elements under courses group number elements (@sno) by name: Deere with st1, and Smith with st2 and st3.]

Starting with point a), how does one identify bad designs? We have looked at a large number of DTDs and found two kinds of commonly present design problems. They are illustrated in the two examples below.

Example 1.1: Consider the following DTD that describes a part of a university database:

<!ELEMENT courses (course*)>
<!ELEMENT course (title, taken_by)>
<!ATTLIST course
    cno CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT taken_by (student*)>
<!ELEMENT student (name, grade)>
<!ATTLIST student
    sno CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#PCDATA)>

For every course, we store its number (cno), its title and the list of students taking the course. For each student taking a course, we store his/her number (sno), name, and the grade in the course.

An example of an XML document that conforms to this DTD is shown in Figure 1 (a). This document satisfies the following constraint: any two student elements with the same sno value must have the same name. This constraint (which looks very much like a functional dependency) causes the document to store redundant information: for example, the name Deere for student st1 is stored twice. And just as in relational databases, such redundancies can lead to update anomalies: for example, updating the name of st1 for only one course results in an inconsistent document, and removing the student from a course may result in removing that student from the document.

In order to eliminate redundant information, we use a technique similar to the relational one, and split the information about the name and the grade. Since we deal with just one XML document, we must do it by creating an extra element type, info, for student information, as shown below:

<!ELEMENT courses (course*, info*)>
<!ELEMENT course (title, taken_by)>
<!ATTLIST course
    cno CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT taken_by (student*)>
<!ELEMENT student (grade)>
<!ATTLIST student
    sno CDATA #REQUIRED>
<!ELEMENT grade (#PCDATA)>
<!ELEMENT info (number*, name)>
<!ELEMENT number EMPTY>
<!ATTLIST number
    sno CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>

Each info element has as children one name and a sequence of number elements, with sno as an attribute. Different students can have the same name, and we group all student numbers sno for each name under the same info element. A restructured document that conforms to this DTD is shown in Figure 1 (b). Note that st2 and st3 are put together because both students have the same name.
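The constraint and the redundancy it causes can be checked mechanically. The following is a minimal sketch, assuming the document of Figure 1 (a) is serialized as the XML string below (the serialization and the helper names `names_by_sno`, `constraint_holds` and `redundant_snos` are illustrative, not part of the paper):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed serialization of the document in Figure 1 (a).
DOC_A = """
<courses>
  <course cno="csc200">
    <title>Automata Theory</title>
    <taken_by>
      <student sno="st1"><name>Deere</name><grade>A+</grade></student>
      <student sno="st2"><name>Smith</name><grade>B-</grade></student>
    </taken_by>
  </course>
  <course cno="mat100">
    <title>Calculus I</title>
    <taken_by>
      <student sno="st1"><name>Deere</name><grade>A</grade></student>
      <student sno="st3"><name>Smith</name><grade>B+</grade></student>
    </taken_by>
  </course>
</courses>
"""

def names_by_sno(root):
    """Map each sno to the names stored for it, one entry per student element."""
    names = defaultdict(list)
    for student in root.iter("student"):
        names[student.get("sno")].append(student.findtext("name"))
    return dict(names)

def constraint_holds(names):
    """True if student elements with the same sno always carry the same name."""
    return all(len(set(stored)) == 1 for stored in names.values())

def redundant_snos(names):
    """Student numbers whose name is stored more than once."""
    return sorted(sno for sno, stored in names.items() if len(stored) > 1)

names = names_by_sno(ET.fromstring(DOC_A))
print(constraint_holds(names))   # the constraint holds on this document
print(redundant_snos(names))     # st1 is the only student stored twice
```

Changing one of the two Deere entries makes `constraint_holds` return False, which is the update anomaly described above.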

This example is reminiscent of the canonical example of bad relational design caused by non-key functional dependencies, and so is the modification of the schema. Some examples of redundancies are more closely related [...]

Example 1.2: Consider the following DTD, part of the DBLP database [8] for storing data about conferences.

<!ELEMENT db (conf*)>
<!ELEMENT conf (title, issue+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT issue (inproceedings+)>
<!ELEMENT inproceedings (author+,
    title, booktitle)>
<!ATTLIST inproceedings
    key ID #REQUIRED
    pages CDATA #REQUIRED
    year CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>

Each conference has a title, and one or more issues (which correspond to years when the conference was held). Papers are stored in inproceedings elements; the year of publication is one of its attributes.

Such a document satisfies the following constraint: any two inproceedings children of the same issue must have the same value of year. This too is similar to relational functional dependencies, but now we refer to the values (the year attribute) as well as the structure (children of the same issue). Moreover, we only talk about inproceedings nodes that are children of the same issue element. Thus, this functional dependency can be considered relative to each issue.

The functional dependency here leads to redundancy: year is stored multiple times for a conference. The natural solution to the problem in this case is not to create a new element for storing the year, but rather to restructure the document and make year an attribute of issue. That is, we change attribute lists as:

<!ATTLIST issue
    year CDATA #REQUIRED>
<!ATTLIST inproceedings
    key ID #REQUIRED
    pages CDATA #REQUIRED>
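At the document level, this change of attribute lists corresponds to lifting year from every inproceedings element to its parent issue. A minimal sketch, assuming a small made-up document that conforms to the DTD above (the sample data and the function name `move_year_to_issue` are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical document conforming to the conference DTD of this example.
DOC = """
<db>
  <conf>
    <title>Conference X</title>
    <issue>
      <inproceedings key="x1" pages="1-10" year="2001">
        <author>A. Author</author><title>Paper 1</title><booktitle>Conf X</booktitle>
      </inproceedings>
      <inproceedings key="x2" pages="11-20" year="2001">
        <author>B. Author</author><title>Paper 2</title><booktitle>Conf X</booktitle>
      </inproceedings>
    </issue>
  </conf>
</db>
"""

def move_year_to_issue(root):
    """Store year once per issue instead of once per inproceedings element."""
    for issue in root.iter("issue"):
        papers = issue.findall("inproceedings")
        years = {paper.get("year") for paper in papers}
        # The restructuring is only lossless if the constraint holds.
        if len(years) != 1 or None in years:
            raise ValueError("year is not constant within an issue")
        issue.set("year", years.pop())
        for paper in papers:
            paper.attrib.pop("year", None)
    return root

root = move_year_to_issue(ET.fromstring(DOC))
print(root.find("conf/issue").get("year"))
```

The check before the move reflects the point made in the text: the transformation is justified only because all inproceedings children of one issue agree on year.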

Our goal is to show how to detect anomalies of those kinds, and to transform documents in a lossless fashion into ones that do not suffer from those problems.

The first step towards that goal is to introduce functional dependencies (FDs) for XML documents. So far, most proposals for XML constraints deal with keys and foreign keys [5, 6, 31]. We introduce FDs for XML by considering a relational representation of documents and defining FDs on them. The relational representation is somewhat similar to the total unnesting of a nested relation [26, 29]; however, we have to deal with DTDs that may contain arbitrary regular expressions, and be recursive. Our representation via tree tuples, introduced in Section 3, may contain null values. In Section 4, XML FDs are introduced via FDs on incomplete relations [3, 21].

The next step is to define a normal form that disallows redundancy-causing FDs. We give it in Section 5, and show that our normal form, called XNF, generalizes BCNF and a nested normal form NNF [22, 23].

The last step then is to find an algorithm that converts any DTD into one in XNF. We do this in Section 6. On both examples shown earlier, the algorithm produces exactly the desired reconstruction of the DTD. The main algorithm uses implication of functional dependencies (although there is a version that does not use implication, but it may produce suboptimal results). In Section 7, we show that for a large class of DTDs, covering most DTDs that occur in practice, the implication problem is tractable (in fact, quadratic).

One of the reasons for the success of the normalization theory is its simplicity, at least for the commonly used normal forms such as BCNF, 3NF and 4NF. Hence, the normalization theory for XML should not be extremely complicated in order to be applicable. In particular, this was the reason we chose to use DTDs instead of more complex formalisms [31, 30]. This is in perfect analogy with the situation in the relational world: although SQL DDL is a rather complicated language with numerous features, BCNF decomposition uses a simple model of a set of attributes and a set of functional dependencies.

Related work For a survey of relational normalization, see [1, 4]. Normalization for nested relations and object-oriented databases is studied in [23, 22, 27]. Coding nested relations into flat ones, similar to our tree tuples, is done in [26, 29]. We use FDs and relational algebra queries over incomplete relations using the techniques from [3, 7, 15, 19, 21]. XML constraints (mostly keys) have been studied in [5, 6, 12]; these constraints do not use DTDs. XML constraints that take DTDs into account are studied in [11]. Finally, [2] considers normal forms for extended context-free grammars similar to the Greibach normal form for CFGs; these, however, do not necessarily guarantee good XML design.

2 Notations

Assume that we have the following disjoint sets: $El$ of element names, $Att$ of attribute names, $Str$ of possible values of string-valued attributes, and $Vert$ of node identifiers. All attribute names start with the symbol @, and these are the only ones starting with this symbol. We let S and $\bot$ (null) be reserved symbols not in any of those sets.

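Definition 1 A DTD (Document Type Definition) is defined to be $D = (E, A, P, R, r)$, where:

• $E \subseteq El$ is a finite set of element types.

• $A \subseteq Att$ is a finite set of attributes.

• $P$ is a mapping from $E$ to element type definitions: given $\tau \in E$, $P(\tau) = S$ or $P(\tau)$ is a regular expression $\alpha$ defined as follows:

    $\alpha ::= \epsilon \mid \tau' \mid \alpha | \alpha \mid \alpha , \alpha \mid \alpha^*$

  where $\epsilon$ is the empty sequence, $\tau' \in E$, and "|", "," and "*" denote union, concatenation, and the Kleene closure, respectively.

• $R$ is a mapping from $E$ to the powerset of $A$. If $@l \in R(\tau)$, we say that $@l$ is defined for $\tau$.

• $r \in E$ and is called the element type of the root. Without loss of generality, we assume that $r$ does not occur in $P(\tau)$ for any $\tau \in E$.

The symbols $\epsilon$ and S represent element type declarations EMPTY and #PCDATA, respectively.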

Given a DTD $D = (E, A, P, R, r)$, a string $w = w_1.\cdots.w_n$ is a path in $D$ if $w_1 = r$, $w_i$ is in the alphabet of $P(w_{i-1})$, for each $i \in [2, n-1]$, and $w_n$ is in the alphabet of $P(w_{n-1})$ or $w_n = @l$ for some $@l \in R(w_{n-1})$. We define $length(w)$ as $n$ and $last(w)$ as $w_n$. We let $paths(D)$ stand for the set of all paths in $D$ and $EPaths(D)$ for the set of all paths that end with an element type (rather than an attribute or S); that is, $EPaths(D) = \{p \in paths(D) \mid last(p) \in E\}$. A DTD is called recursive if $paths(D)$ is infinite.
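For a non-recursive DTD, paths(D) is finite and can be enumerated directly from the definition. The sketch below assumes a simplified encoding that is not part of the paper: `P` maps each element type to the alphabet of its content model (with "S" for #PCDATA), and `R` maps it to its attributes. It uses the DTD of Example 1.1.

```python
# Assumed encoding of the DTD of Example 1.1: only the alphabet of each
# regular expression matters for paths, so P maps to lists of symbols.
P = {
    "courses": ["course"],
    "course": ["title", "taken_by"],
    "title": ["S"],
    "taken_by": ["student"],
    "student": ["name", "grade"],
    "name": ["S"],
    "grade": ["S"],
}
R = {"course": ["@cno"], "student": ["@sno"]}

def paths(P, R, root):
    """Enumerate paths(D); terminates only if the DTD is non-recursive."""
    result = []
    stack = [(root,)]
    while stack:
        w = stack.pop()
        result.append(".".join(w))
        last = w[-1]
        if last == "S" or last.startswith("@"):
            continue  # attributes and S can only end a path
        for attr in R.get(last, []):
            stack.append(w + (attr,))
        for symbol in P.get(last, []):
            stack.append(w + (symbol,))
    return sorted(result)

def epaths(all_paths):
    """Paths whose last symbol is an element type."""
    def is_element(p):
        last = p.split(".")[-1]
        return last != "S" and not last.startswith("@")
    return [p for p in all_paths if is_element(p)]

all_paths = paths(P, R, "courses")
element_paths = epaths(all_paths)
print(len(all_paths), len(element_paths))
```

On a recursive DTD the loop would not terminate, which matches the remark that paths(D) is then infinite.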

Definition 2 An XML tree $T$ is defined to be a tree $(V, lab, ele, att, root)$, where

• $V \subseteq Vert$ is a finite set of vertices (nodes).

• $lab : V \to El$.

• $ele : V \to Str \cup V^*$.

• $att$ is a partial function $V \times Att \to Str$. For each $v \in V$, the set $\{@l \in Att \mid att(v, @l) \text{ is defined}\}$ is required to be finite.

• $root \in V$ is called the root of $T$.

The parent-child edge relation on $V$, $\{(v_1, v_2) \mid v_2 \text{ occurs in } ele(v_1)\}$, is required to form a rooted tree.

Notice that we do not allow mixed content in XML trees. The children of an element node can be either zero or more element nodes or one string.

Given an XML tree $T$, a string $w_1.\cdots.w_n$, with $w_1, \ldots, w_{n-1} \in El$ and $w_n \in El \cup Att \cup \{S\}$, is a path in $T$ if there are vertices $v_1, \ldots, v_{n-1}$ in $V$ such that:

• $v_1 = root$, $v_{i+1}$ is a child of $v_i$ ($1 \le i \le n-2$), and $lab(v_i) = w_i$ ($1 \le i \le n-1$).

• If $w_n \in El$, then there is a child $v_n$ of $v_{n-1}$ such that $lab(v_n) = w_n$. If $w_n = @l$, with $@l \in Att$, then $att(v_{n-1}, @l)$ is defined. If $w_n = S$, then $v_{n-1}$ has a child in $Str$.

We let $paths(T)$ stand for the set of paths in $T$.

We next give a standard definition of a tree conforming to a DTD ($T \models D$) as well as a weaker version of $T$ being compatible with $D$ ($T \lhd D$).

Definition 3 Given a DTD $D = (E, A, P, R, r)$ and an XML tree $T = (V, lab, ele, att, root)$, we say that $T$ conforms to $D$ ($T \models D$) if

• $lab$ is a mapping from $V$ to $E$.

• For each $v \in V$, if $P(lab(v)) = S$, then $ele(v) = [s]$, where $s \in Str$. Otherwise, if $ele(v) = [v_1, \ldots, v_n]$, then the string $lab(v_1) \cdots lab(v_n)$ must be in the regular language defined by $P(lab(v))$.

• $att$ is a partial function from $V \times A$ to $Str$ such that for any $v \in V$ and $@l \in A$, $att(v, @l)$ is defined iff $@l \in R(lab(v))$.

• $lab(root) = r$.

We say that $T$ is compatible with $D$ (written $T \lhd D$) iff $paths(T) \subseteq paths(D)$.

3 Tree Tuples

To extend the notions of functional dependencies to the XML setting, we represent XML trees as sets of tuples. While various mappings from XML to the relational model have been proposed [14, 25], the mapping that we use is of a different nature, as our goal is not to find a way of storing documents efficiently, but rather to find a correspondence between documents and relations that lends itself to a natural definition of functional dependency.

Various languages proposed for expressing XML integrity constraints such as keys [5, 6, 31] treat XML trees as unordered (for the purpose of defining the semantics of constraints): that is, the order of children of any given node is irrelevant as far as satisfaction of constraints is concerned. In XML trees, on the other hand, children of each node are ordered. We first define a notion of subsumption that disregards this ordering.

Given two XML trees $T_1 = (V_1, lab_1, ele_1, att_1, root_1)$ and $T_2 = (V_2, lab_2, ele_2, att_2, root_2)$, we say that $T_1$ is subsumed by $T_2$, written as $T_1 \preceq T_2$, if

• $V_1 \subseteq V_2$.

• $root_1 = root_2$.

• $lab_2|_{V_1} = lab_1$.

• $att_2|_{V_1 \times Att} = att_1$.

• For all $v \in V_1$, $ele_1(v)$ is a sublist of a permutation of $ele_2(v)$.

This relation is a pre-order, which gives rise to an equivalence relation: $T_1 \equiv T_2$ iff $T_1 \preceq T_2$ and $T_2 \preceq T_1$. That is, $T_1 \equiv T_2$ iff $T_1$ and $T_2$ are equal as unordered trees. We define $[T]$ to be the $\equiv$-equivalence class of $T$. We write $[T] \models D$ if $T_1 \models D$ for some $T_1 \in [T]$. It is easy to see that for any $T_1 \equiv T_2$, $paths(T_1) = paths(T_2)$; hence $T_1 \lhd D$ iff $T_2 \lhd D$. We shall also write $T_1 \prec T_2$ when $T_1 \preceq T_2$ and $T_2 \not\preceq T_1$.

Definition 4 (Tree tuples) Given a DTD $D = (E, A, P, R, r)$, a tree tuple $t$ in $D$ is a function from $paths(D)$ to $Vert \cup Str \cup \{\bot\}$ such that:

• For $p \in EPaths(D)$, $t(p) \in Vert \cup \{\bot\}$, and $t(r) \neq \bot$.

• For $p \in paths(D) - EPaths(D)$, $t(p) \in Str \cup \{\bot\}$.

• If $t(p_1) = t(p_2)$ and $t(p_1) \in Vert$, then $p_1 = p_2$.

• If $t(p_1) = \bot$ and $p_1$ is a prefix of $p_2$, then $t(p_2) = \bot$.

• $\{p \in paths(D) \mid t(p) \neq \bot\}$ is finite.

$\mathcal{T}(D)$ is defined to be the set of all tree tuples in $D$. For a tree tuple $t$ and a path $p$, we write $t.p$ for $t(p)$.

Example 3.1: Suppose that $D$ is the DTD shown in example 1.1 (a). Then a tree tuple in $D$ assigns values to each path in $paths(D)$ as is shown in Figure 2 (a).

We use nulls ($\bot$) in tree tuples because of the disjunction in DTDs. For example, let $D = (E, A, P, R, r)$, where $E = \{r, a, b\}$, $A = \emptyset$, $P(r) = (a|b)$, $P(a) = \epsilon$ and $P(b) = \epsilon$. Then $paths(D) = \{r, r.a, r.b\}$ but no tree tuple coming from an XML tree conforming to $D$ can assign non-null values to both $r.a$ and $r.b$.

If $D$ is a recursive DTD, then $paths(D)$ is infinite; however, only a finite number of values in a tree tuple are different from $\bot$. For each tree tuple $t$, its non-null values give rise to an XML tree as follows.

Definition 5 ($tree_D$) Given a DTD $D = (E, A, P, R, r)$ and a tree tuple $t \in \mathcal{T}(D)$, $tree_D(t)$ is defined to be an XML tree $(V, lab, ele, att, root)$, where $root = t.r$ and

• $V = \{v \in Vert \mid \exists p \in paths(D) \text{ such that } v = t.p\}$.

• If $v = t.p$, then $lab(v) = last(p)$.

• If $v = t.p$, then $ele(v)$ is defined to be the list containing $\{t.p' \mid t.p' \neq \bot \text{ and } p' = p.\tau,\ \tau \in E, \text{ or } p' = p.S\}$, ordered lexicographically.

• $att(v, @l) = t.p.@l$.

Example 3.2: The non-null values of the tree tuple $t$ shown in Figure 2 (a) give rise to the XML tree shown in Figure 2 (b).

Note that $tree_D(t)$ need not conform to the DTD $D$, but:

Proposition 1 If $t \in \mathcal{T}(D)$, then $tree_D(t) \lhd D$. □

We would like to describe XML trees in terms of the tuples they contain. For this, we need to select tuples containing the maximal amount of information. This is done via the usual notion of ordering on tuples (and relations) with nulls [7, 15, 16]. If we have two tree tuples $t_1, t_2$, we write $t_1 \sqsubseteq t_2$ if whenever $t_1.p$ is defined, then so is $t_2.p$, and $t_1.p \neq \bot$ implies $t_1.p = t_2.p$. As usual, $t_1 \sqsubset t_2$ means $t_1 \sqsubseteq t_2$ and $t_1 \neq t_2$. Given two sets of tree tuples, $X$ and $Y$, we write $X \sqsubseteq^\flat Y$ if $\forall t_1 \in X\ \exists t_2 \in Y\ t_1 \sqsubseteq t_2$.

Definition 6 ($tuples_D$) Given a DTD $D$ and an XML tree $T$ such that $T \lhd D$, $tuples_D(T)$ is defined to be the set of maximal, wrt $\sqsubseteq$, tree tuples $t$ such that $tree_D(t)$ is subsumed by $T$; that is:

    $\max_{\sqsubseteq} \{t \in \mathcal{T}(D) \mid tree_D(t) \preceq T\}.$

Observe that $T_1 \equiv T_2$ implies $tuples_D(T_1) = tuples_D(T_2)$. Hence, $tuples_D$ applies to equivalence classes: $tuples_D([T]) = tuples_D(T)$.
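One way to read Definition 6 operationally: a maximal tree tuple picks, at every node, one child for each element type that occurs among its children, and records all attributes and text along the way. The sketch below is an illustration under assumptions, not the paper's construction: tuples are dictionaries from paths to values, a missing key plays the role of $\bot$, vertex identifiers v0, v1, ... are generated on the fly, and the input is an assumed serialization of Figure 1 (a).

```python
import itertools
import xml.etree.ElementTree as ET

# Assumed serialization of the document in Figure 1 (a).
DOC = """
<courses>
  <course cno="csc200"><title>Automata Theory</title><taken_by>
    <student sno="st1"><name>Deere</name><grade>A+</grade></student>
    <student sno="st2"><name>Smith</name><grade>B-</grade></student>
  </taken_by></course>
  <course cno="mat100"><title>Calculus I</title><taken_by>
    <student sno="st1"><name>Deere</name><grade>A</grade></student>
    <student sno="st3"><name>Smith</name><grade>B+</grade></student>
  </taken_by></course>
</courses>
"""

def tuples_of(root):
    """Return the maximal tree tuples of a tree as dicts path -> value."""
    ids = {}

    def vertex(node):
        # Assign identifiers v0, v1, ... in order of first visit.
        return ids.setdefault(id(node), f"v{len(ids)}")

    def rec(node, path):
        base = {path: vertex(node)}
        for name, value in node.attrib.items():
            base[f"{path}.@{name}"] = value
        text = (node.text or "").strip()
        if text:
            base[f"{path}.S"] = text
        # Group children by label; a tuple uses one child per label.
        by_label = {}
        for child in node:
            by_label.setdefault(child.tag, []).append(child)
        alternatives = [
            [t for c in children for t in rec(c, f"{path}.{label}")]
            for label, children in by_label.items()
        ]
        result = []
        for combination in itertools.product(*alternatives):
            merged = dict(base)
            for part in combination:
                merged.update(part)
            result.append(merged)
        return result

    return rec(root, root.tag)

tuples = tuples_of(ET.fromstring(DOC))
print(len(tuples))
print(tuples[0]["courses.course.taken_by.student.name.S"])
```

With two courses and two students per course, the sketch produces four tuples, one per (course, student) pair; the first one agrees with the values listed in Figure 2 (a).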

Proposition 2 If $T \lhd D$, then $tuples_D(T)$ is a finite subset of $\mathcal{T}(D)$. Furthermore, $tuples_D(\cdot)$ is monotone: $T_1 \preceq T_2$ implies $tuples_D(T_1) \sqsubseteq^\flat tuples_D(T_2)$. □

Finally, we define the trees represented by a set of tuples $X$ as the minimal, with respect to $\preceq$, trees containing all tuples in $X$.

Definition 7 ($trees_D$) Given a DTD $D$ and a set of tree tuples $X \subseteq \mathcal{T}(D)$, $trees_D(X)$ is defined to be:

    $\min_{\preceq} \{T \mid T \lhd D \text{ and } \forall t \in X,\ tree_D(t) \preceq T\}.$

For $T \in trees_D(X)$ and $T' \equiv T$, $T' \in trees_D(X)$; thus $trees_D(X)$ is a union of equivalence classes.

The following shows that every XML document can be represented as a set of tree tuples, if we consider it as an unordered tree. That is, a tree $T$ can be reconstructed from $tuples_D(T)$, up to equivalence.

Theorem 1 Given a DTD $D$ and an XML tree $T$, if $T \lhd D$, then $trees_D(tuples_D(T)) = [T]$.

t(courses) = v0
t(courses.course) = v1
t(courses.course.@cno) = csc200
t(courses.course.title) = v2
t(courses.course.title.S) = Automata Theory
t(courses.course.taken_by) = v3
t(courses.course.taken_by.student) = v4
t(courses.course.taken_by.student.@sno) = st1
t(courses.course.taken_by.student.name) = v5
t(courses.course.taken_by.student.name.S) = Deere
t(courses.course.taken_by.student.grade) = v6
t(courses.course.taken_by.student.grade.S) = A+

(a) Values of t

[(b) $tree_D(t)$: the tree with vertices v0 to v6 carrying the values csc200, Automata Theory, st1, Deere and A+.]

Figure 2: Tree tuple t and its tree representation.

The converse does not hold, but can be partially recovered when $trees_D(X)$ is a single equivalence class. We say that $X \subseteq \mathcal{T}(D)$ is $D$-compatible if there is an XML tree $T$ such that $T \lhd D$ and $X \subseteq tuples_D(T)$.

Proposition 3 If $X \subseteq \mathcal{T}(D)$ is $D$-compatible, then (a) there is an XML tree $T$ such that $T \lhd D$ and $trees_D(X) = [T]$, and (b) $X \sqsubseteq^\flat tuples_D(trees_D(X))$.

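Theorem 1 and Proposition 3 are summarized in the diagram presented in the following figure. In this diagram, $X$ is a $D$-compatible set of tree tuples. The arrow between $X$ and $X'$ stands for the $\sqsubseteq^\flat$ ordering.

[Diagram: $trees_D$ maps $X$ to $[T]$; $tuples_D$ maps $[T]$ to $X'$; $trees_D$ maps $X'$ back to $[T]$; and $X \sqsubseteq^\flat X'$.]

4 Functional Dependencies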

We define functional dependencies for XML by using tree tuples. For a DTD $D$, a functional dependency (FD) over $D$ is an expression of the form $S_1 \to S_2$ where $S_1, S_2$ are finite non-empty subsets of $paths(D)$. The set of all FDs over $D$ is denoted by $FD(D)$.

For $S \subseteq paths(D)$, and $t, t' \in \mathcal{T}(D)$, $t.S = t'.S$ means $t.p = t'.p$ for all $p \in S$. Furthermore, $t.S \neq \bot$ means $t.p \neq \bot$ for all $p \in S$.

If $S_1 \to S_2 \in FD(D)$ and $T$ is an XML tree such that $T \lhd D$ and $S_1 \cup S_2 \subseteq paths(T)$, we say that $T$ satisfies $S_1 \to S_2$ (written $T \models S_1 \to S_2$) if for every $t_1, t_2 \in tuples_D(T)$, $t_1.S_1 = t_2.S_1$ and $t_1.S_1 \neq \bot$ imply $t_1.S_2 = t_2.S_2$. This extends to equivalence classes, since for any FD $\varphi$, and $T \equiv T'$, $T \models \varphi$ iff $T' \models \varphi$.

We write $T \models \Sigma$, for $\Sigma \subseteq FD(D)$, if $T \models \varphi$ for each $\varphi \in \Sigma$, and we write $T \models (D, \Sigma)$, if $T \models D$ and $T \models \Sigma$.

Example 4.1: Referring back to Example 1.1, we have the following FDs. @cno is a key of course:

    courses.course.@cno → courses.course. (FD1)

Another FD says that two distinct student subelements of the same course cannot have the same @sno:

    {courses.course, courses.course.taken_by.student.@sno} → courses.course.taken_by.student. (FD2)

Finally, to say that two student elements with the same @sno value must have the same name, we use

    courses.course.taken_by.student.@sno → courses.course.taken_by.student.name.S. (FD3)
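The satisfaction condition can be checked directly on a set of tree tuples. The sketch below assumes tuples are given as dictionaries from paths to values, with a missing key standing for $\bot$; the four sample tuples are a hand-written projection of the tuples of Figure 1 (a) onto the relevant paths, and the vertex identifiers in them are made up.

```python
COURSE = "courses.course"
STUDENT = "courses.course.taken_by.student"
SNO = STUDENT + ".@sno"
NAME_S = STUDENT + ".name.S"

# Hand-written projection of the four tree tuples of Figure 1 (a);
# the vertex identifiers are illustrative.
TUPLES = [
    {COURSE: "v1", STUDENT: "v4", SNO: "st1", NAME_S: "Deere"},
    {COURSE: "v1", STUDENT: "v7", SNO: "st2", NAME_S: "Smith"},
    {COURSE: "v10", STUDENT: "v13", SNO: "st1", NAME_S: "Deere"},
    {COURSE: "v10", STUDENT: "v16", SNO: "st3", NAME_S: "Smith"},
]

def satisfies(tuples, lhs, rhs):
    """Check S1 -> S2: agreement on non-null S1 forces agreement on S2."""
    for t1 in tuples:
        if any(t1.get(p) is None for p in lhs):
            continue  # t1.S1 contains a null, so the FD imposes nothing
        for t2 in tuples:
            if all(t1.get(p) == t2.get(p) for p in lhs):
                if any(t1.get(p) != t2.get(p) for p in rhs):
                    return False
    return True

print(satisfies(TUPLES, [COURSE, SNO], [STUDENT]))  # (FD2)
print(satisfies(TUPLES, [SNO], [NAME_S]))           # (FD3)
print(satisfies(TUPLES, [SNO], [STUDENT]))          # not an FD of the document
```

The last check fails because st1 occurs in two different student vertices: the student number determines the name string but not the student node, which is the pattern that later marks a design as not being in XNF.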

We offer a few remarks on our definition of FDs. First, using the tree tuples representation, it is easy to combine node and value equality: the former corresponds to equality between vertices and the latter to equality between strings. Moreover, keys naturally appear as a subclass of FDs, and relative constraints can also be encoded. Note that by defining the semantics of $FD(D)$ on $\mathcal{T}(D)$, we essentially define satisfaction of FDs on relations with null values, and our semantics is [...]

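Given a DTD $D$, a set $\Sigma \subseteq FD(D)$ and an FD $\varphi \in FD(D)$, we say that $(D, \Sigma)$ implies $\varphi$, written $(D, \Sigma) \vdash \varphi$, if for any tree $T$ with $T \models D$ and $T \models \Sigma$, it is the case that $T \models \varphi$. The set of all FDs implied by $(D, \Sigma)$ will be denoted by $(D, \Sigma)^+$.

An FD $\varphi$ is trivial if $(D, \emptyset) \vdash \varphi$. In relational databases, the only trivial FDs are $X \to Y$, with $Y \subseteq X$. Here, the DTD forces some more interesting trivial FDs. For instance, for each $p \in EPaths(D)$ and $p'$ a prefix of $p$, $(D, \emptyset) \vdash p \to p'$. Furthermore, for $p, p.@l \in paths(D)$, $(D, \emptyset) \vdash p \to p.@l$.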

5 XNF: An XML Normal Form

With the definitions of the previous section, we are ready to present the normal form that generalizes BCNF for XML documents.

Definition 8 Given a DTD $D$ and $\Sigma \subseteq FD(D)$, $(D, \Sigma)$ is in XML normal form (XNF) iff for every non-trivial FD $\varphi \in (D, \Sigma)^+$ of the form $S \to p.@l$ or $S \to p.S$, it is the case that $S \to p$ is in $(D, \Sigma)^+$.

The intuition is as follows. Suppose that $S \to p.@l$ is in $(D, \Sigma)^+$. If $T$ is an XML tree conforming to $D$ and satisfying $\Sigma$, then in $T$ for every set of values of the elements in $S$, we can find only one value of $p.@l$. Thus, for every set of values of $S$ we need to store the value of $p.@l$ only once; in other words, $S \to p$ must be implied by $(D, \Sigma)$.

In this definition, we impose the condition that $\varphi$ is a non-trivial FD. Indeed, the trivial FD $p.@l \to p.@l$ is always in $(D, \Sigma)^+$, but often $p.@l \to p \notin (D, \Sigma)^+$, which does not necessarily represent a bad design.

To show how XNF distinguishes good XML design from bad design, we revisit the examples from the introduction, and prove that XNF generalizes BCNF and NNF, a normal form for nested relations [22, 23].

Example 5.1: Consider the DTD from example 1.1 (a) whose FDs are (FD1), (FD2), (FD3) shown in the previous section. (FD3) associates a unique name with each student number, which is therefore redundant. The design is not in XNF, since it contains (FD3) but does not imply the FD

    courses.course.taken_by.student.@sno → courses.course.taken_by.student.name

To remedy this, we gave a revised DTD in example 1.1 (b). The idea was to create a new element info for storing information about students. That design satisfies FDs (FD1), (FD2) as well as

    courses.info.number.@sno → courses.info, [...]

Example 5.2: Suppose that $D$ is the DBLP DTD from example 1.2. Among the set $\Sigma$ of FDs satisfied by the documents are:

    db.conf.title.S → db.conf (FD4)
    db.conf.issue → db.conf.issue.inproceedings.@year (FD5)

For each issue of a conference, its year is stored in every article in that issue; thus, $(D, \Sigma)$ is not in XNF, since

    db.conf.issue → db.conf.issue.inproceedings

is not in $(D, \Sigma)^+$.

The solution we proposed in the introduction was to make year an attribute of issue. (FD5) is not valid in the revised specification, which can be easily verified to be in XNF. Note that we do not replace (FD5) by db.conf.issue → db.conf.issue.@year, since it is a trivial FD and thus is implied by the new DTD alone.

BCNF and XNF Relational databases can be easily mapped into XML documents. Given a schema $G(A_1, \ldots, A_n)$, a DTD $D_G$ has two element types db and $G$, $P(db) = G^*$, $P(G) = \epsilon$, and $R(G) = \{@A_1, \ldots, @A_n\}$. For a set $F$ of FDs over $G$, we define a set $\Sigma_F$ of FDs over $D_G$ that includes, for each $A_{i_1} \cdots A_{i_m} \to A_j$ in $F$, an FD $\{db.G.@A_{i_1}, \ldots, db.G.@A_{i_m}\} \to db.G.@A_j$, as well as $\{db.G.@A_1, \ldots, db.G.@A_n\} \to db.G$ (to avoid duplicates).

Example 5.3: A schema $G(A, B, C)$ can be coded by the following DTD:

<!ELEMENT db (G*)>
<!ELEMENT G EMPTY>
<!ATTLIST G
    A CDATA #REQUIRED
    B CDATA #REQUIRED
    C CDATA #REQUIRED>

In this schema, an FD $A \to B$ is translated into $db.G.@A \to db.G.@B$.

Proposition 4 $(G, F)$ is in BCNF iff $(D_G, \Sigma_F)$ is in XNF. □
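The relational side of this correspondence is easy to experiment with. The sketch below shows the translation of $F$ into $\Sigma_F$ together with the textbook BCNF test via attribute closure; it does not test XNF itself, and the function names and the representation of FDs as pairs of attribute lists are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(schema, fds):
    """Every non-trivial FD in fds must have a superkey on its left-hand side."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD
        if closure(lhs, fds) != set(schema):
            return False
    return True

def to_xml_fds(name, schema, fds):
    """Translate F over G(A1, ..., An) into Sigma_F over the DTD D_G."""
    def path(attribute):
        return f"db.{name}.@{attribute}"
    sigma = [({path(a) for a in lhs}, {path(a) for a in rhs}) for lhs, rhs in fds]
    # All attributes together determine the G element (no duplicates).
    sigma.append(({path(a) for a in schema}, {f"db.{name}"}))
    return sigma

schema = ["A", "B", "C"]
fds = [(["A"], ["B"])]
print(is_bcnf(schema, fds))
print(to_xml_fds("G", schema, fds))
```

For $G(A, B, C)$ with $A \to B$, the schema is not in BCNF because $A$ is not a key; by Proposition 4 the coded specification is then not in XNF either.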

NNF and XNF A nested schema is either a set of attributes $X$, or $X(G_1)^* \ldots (G_n)^*$, where the $G_i$'s are nested schemas. An example of a nested relation for the schema $H_1 = Country(H_2)^*$, $H_2 = State(H_3)^*$, $H_3 = City$ is shown in Figure 3 (a).

Country: United States
    State: Texas
        City: Houston
        City: Dallas
    State: Ohio
        City: Columbus
        City: Cleveland

(a) Nested relation H1

Country        State  City
United States  Texas  Houston
United States  Texas  Dallas
United States  Ohio   Columbus
United States  Ohio   Cleveland

(b) Complete unnesting of H1

Figure 3: Nested relation and its unnesting.

[...] nested schema $G = X(G_1)^* \ldots (G_n)^*$, we introduce an element type $G$ with $P(G) = G_1^*, \ldots, G_n^*$ and $R(G) = \{@A_1, \ldots, @A_n\}$, where $X = \{A_1, \ldots, A_n\}$; at the top level we have a new element type db with $P(db) = G^*$ and $R(db) = \emptyset$. In our example the DTD is:

<!ELEMENT db (H1*)>

<!ELEMENT H1 (H2*)>

<!ATTLIST H1 Country CDATA #REQUIRED>

<!ELEMENT H2 (H3*)>

<!ATTLIST H2 State CDATA #REQUIRED>

<!ELEMENT H3 EMPTY>

<!ATTLIST H3 City CDATA #REQUIRED>

The definition of FDs for nested relations uses the notion of complete unnesting. The complete unnesting of a nested relation from our example is shown in Figure 3 (b); in general, this notion is easily defined by induction. In our example, we have a valid FD State → Country, while the FD State → City does not hold.
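The inductive definition of complete unnesting can be sketched as a short recursive function. The encoding below is an assumption made for illustration: a nested relation is a list of dictionaries whose list-valued entries are the nested attributes. The data is the relation of Figure 3 (a).

```python
# Nested relation H1 of Figure 3 (a); list-valued entries are nested attributes.
H1 = [
    {"Country": "United States", "H2": [
        {"State": "Texas", "H3": [{"City": "Houston"}, {"City": "Dallas"}]},
        {"State": "Ohio", "H3": [{"City": "Columbus"}, {"City": "Cleveland"}]},
    ]},
]

def unnest(relation):
    """Complete unnesting: one flat row per combination of nested tuples."""
    rows = []
    for t in relation:
        atomic = {k: v for k, v in t.items() if not isinstance(v, list)}
        partial = [atomic]
        for value in t.values():
            if isinstance(value, list):
                # An empty nested relation contributes no rows at all here.
                partial = [dict(a, **b) for a in partial for b in unnest(value)]
        rows.extend(partial)
    return rows

def fd_holds(rows, lhs, rhs):
    """Classical FD check on a flat relation."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = unnest(H1)
print(len(rows))
print(fd_holds(rows, ["State"], ["Country"]))
print(fd_holds(rows, ["State"], ["City"]))
```

The four rows produced are those of Figure 3 (b), and the two checks reproduce the observation that State → Country holds while State → City does not.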

Normalization is usually considered for nested relations in the partition normal form (PNF) [1, 22, 23]. A nested relation $r$ over $X(G_1)^* \ldots (G_n)^*$ is in PNF if for any two tuples $t_1$, $t_2$ in $r$: (1) if $t_1.X = t_2.X$, then the nested relations $t_1.G_i$ and $t_2.G_i$ are equal, for every $i \in [1, n]$, and (2) each nested relation $t_1.G_i$ must be in PNF, for every $i \in [1, n]$. Note that PNF can be enforced by using FDs on the XML representation. In our example this is done as follows:

    db.H1.@Country → db.H1
    {db.H1, db.H1.H2.@State} → db.H1.H2
    {db.H1.H2, db.H1.H2.H3.@City} → db.H1.H2.H3

It turns out that one can define FDs over nested relations by using the XML representation. Let $U$ be a set of attributes, $G_1$ a nested relation schema over $U$ and $FD$ a set of functional dependencies over $G_1$. Assume that $G_1$ includes nested relation schemas $G_1, \ldots, G_n$ and a set of attributes $U' \subseteq U$. For each $G_i$ ($i \in [1, n]$), $path(G_i)$ is inductively defined as follows. If $G_i = G_1$, then $path(G_i) = db.G_1$. Otherwise, if $G_i$ is a nested attribute of $G_j$, then $path(G_i) = path(G_j).G_i$. Furthermore, if $A \in U'$ is an atomic attribute of $G_i$, then $path(A) = path(G_i).@A$. For instance, for the schema of the nested relation in Figure 3, $path(H_2) = db.H_1.H_2$ and $path(City) = db.H_1.H_2.H_3.@City$. We now define $\Sigma_{FD}$ as follows:

• For each FD $A_{i_1} \cdots A_{i_m} \to A_i \in FD$, $\{path(A_{i_1}), \ldots, path(A_{i_m})\} \to path(A_i)$ is in $\Sigma_{FD}$.

• For each $i \in [1, n]$, if $A_{j_1}, \ldots, A_{j_m}$ is the set of atomic attributes of $G_i$ and $G_i$ is a nested attribute of $G_j$, $\{path(G_j), path(A_{j_1}), \ldots, path(A_{j_m})\} \to path(G_i)$ is in $\Sigma_{FD}$. Furthermore, if $B_{j_1}, \ldots, B_{j_l}$ is the set of atomic attributes of $G_1$, then $\{path(B_{j_1}), \ldots, path(B_{j_l})\} \to path(G_1)$ is in $\Sigma_{FD}$.

Note that the last rule imposes the partition normal form.

A Nested Normal Form (NNF) for nested relations was proposed in [22, 23]. Here we use the presentation of [22] restricted to FDs only. Given a nested relational schema $G$ and a subschema $R$, for each atomic attribute $A$ of $R$ we define $ancestor(A)$ as the union of the atomic attributes of all the nested relation schemas mentioned in $path(R)$. For instance, $ancestor(State) = \{Country, State\}$. If $FD$ is a set of FDs over $G$, then we say that it is in NNF if for each non-trivial FD $X \to A$ ($A \in U$), if $X \to A \in (G, FD)^+$, then $X \to ancestor(A) \in (G, FD)^+$. As before, $(G, FD)^+$ stands for the set of all FDs implied by $(G, FD)$.

The result below says that a nested relational schema $(G, FD)$ is in NNF exactly when $(D_G, \Sigma_{FD})$, defined above, is in XNF.

Proposition 5 $(G, FD)$ is in NNF iff $(D_G, \Sigma_{FD})$ is in XNF. □

6 Normalizing XML Documents

We show how to transform a DTD $D$ and a set of FDs $\Sigma$ into a new specification $(D', \Sigma')$ that is in XNF and contains the same information. Throughout the section, we assume that the DTDs are non-recursive (the recursive case can be handled in a very similar fashion), and that all FDs are of the form $\{q, p_1.@l_1, \ldots, p_n.@l_n\} \to p$. That is, they contain at most one element path on the left-hand side. Note that all the FDs we have seen so far are of this form. While constraints of the form $\{q, q', \ldots\}$ are not forbidden, they appear to be quite unnatural, and can be easily eliminated by creating a new attribute $@l$ and splitting $\{q, q'\} \cup S \to p$ into $q'.@l \to q'$ and $\{q, q'.@l\} \cup S \to p$. Furthermore, we assume that paths do not contain the symbol S (since $p.S$ can always be replaced by a path of the form $p.@l$).

Given a DTD $D$ and a set of FDs $\Sigma$, a non-trivial FD $S \to p.@l$ is called anomalous, over $(D, \Sigma)$, if it violates XNF; that is, $S \to p.@l \in (D, \Sigma)^+$ but $S \to p \notin (D, \Sigma)^+$. A path on the right-hand side of an anomalous FD is called an anomalous path, and the set of all such paths is denoted by $AP(D, \Sigma)$.

The algorithm combines two basic ideas presented in the introduction: creating a new element type, and moving an attribute.

Moving attributes Let $D = (E, A, P, R, r)$ be a DTD, $p.@l \in paths(D)$, $q \in EPaths(D)$ and $@m$ be an attribute. The DTD $D[p.@l := q.@m]$ is constructed by moving the attribute $@l$ from the set of attributes of $last(p)$ to the set of attributes of $last(q)$, and changing its name to $@m$, as shown in the following figure.

[Figure: from the root $r$, path $p$ leads to $last(p)$ and path $q$ leads to $last(q)$; the attribute $@l$ of $last(p)$ becomes the attribute $@m$ of $last(q)$.]

Formally, $D[p.@l := q.@m]$ is $(E, A', P, R', r)$, where $A' = A \cup \{@m\}$, $R'(last(q)) = R(last(q)) \cup \{@m\}$, $R'(last(p)) = R(last(p)) - \{@l\}$ and $R'(\tau') = R(\tau')$ for each $\tau' \in E - \{last(q), last(p)\}$. This is the same kind of transformation we saw in moving the year attribute in the DBLP example.

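Given a set $\Sigma$ of FDs over $D$, the set $\Sigma[p.@l := q.@m]$ of FDs over $D[p.@l := q.@m]$ consists of all FDs $S_1 \to S_2 \in (D, \Sigma)^+$ with $S_1 \cup S_2 \subseteq paths(D[p.@l := q.@m])$.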
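On the DTD side, moving an attribute only touches the mapping $R$. The following sketch assumes $R$ is encoded as a dictionary from element types to sets of attribute names and that $last(p) \neq last(q)$; the function name `move_attribute` is illustrative, and the transformation of $\Sigma$, which requires FD implication, is not covered.

```python
def move_attribute(R, p, l, q, m):
    """Compute R' for D[p.@l := q.@m]; E, P and r stay unchanged."""
    last_p = p.split(".")[-1]
    last_q = q.split(".")[-1]
    if l not in R.get(last_p, set()):
        raise ValueError(f"{l} is not an attribute of {last_p}")
    new_R = {element: set(attrs) for element, attrs in R.items()}
    new_R[last_p].discard(l)                  # R'(last(p)) = R(last(p)) - {@l}
    new_R.setdefault(last_q, set()).add(m)    # R'(last(q)) = R(last(q)) + {@m}
    return new_R

# Attribute lists of the DBLP example.
R = {"issue": set(), "inproceedings": {"@key", "@pages", "@year"}}
new_R = move_attribute(
    R, "db.conf.issue.inproceedings", "@year", "db.conf.issue", "@year"
)
print(sorted(new_R["issue"]), sorted(new_R["inproceedings"]))
```

Applied to the DBLP example, the sketch yields the revised attribute lists given in the introduction: year on issue, and only key and pages on inproceedings.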

Creating new element types Let $D = (E, A, P, R, r)$ be a DTD, $S = \{q, p_1.@l_1, \ldots, p_n.@l_n\} \subseteq paths(D)$ such that $n \ge 1$ and $q \in EPaths(D)$. We construct a new DTD $D'$ by creating a new element type $\tau$ as a child of the last element of $q$, making $\tau_1, \ldots, \tau_n$ its children, $@l$ its attribute, and $@l_1, \ldots, @l_n$ attributes of $\tau_1, \ldots, \tau_n$. Furthermore, we remove $@l$ from the set of attributes of the last element of $p$, as shown in the following figure.

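[Figure: from the root $r$, paths $p$, $p_1, \ldots, p_n$ and $q$ lead to $last(p)$, $last(p_1), \ldots, last(p_n)$ and $last(q)$, with attributes $@l$, $@l_1, \ldots, @l_n$. The new element type $\tau$ is added below $last(q)$ with attribute $@l$ and children $\tau_1, \ldots, \tau_n$ carrying $@l_1, \ldots, @l_n$.]

Formally, if $\{\tau, \tau_1, \ldots, \tau_n\}$ are element types which are not in $E$, the new DTD, denoted by $D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$, is $(E', A, P', R', r)$, where $E' = E \cup \{\tau, \tau_1, \ldots, \tau_n\}$ and

1. $P'(last(q)) = P(last(q)), \tau^*$; $P'(\tau) = \tau_1^*, \ldots, \tau_n^*$; $P'(\tau_i) = \epsilon$, for each $i \in [1, n]$; and $P'(\tau') = P(\tau')$ for each $\tau' \in E - \{last(q)\}$.

2. $R'(\tau) = \{@l\}$, $R'(\tau_i) = \{@l_i\}$, for each $i \in [1, n]$, $R'(last(p)) = R(last(p)) - \{@l\}$ and $R'(\tau') = R(\tau')$ for each $\tau' \in E - \{last(p)\}$.

Given $D' = D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ and a set $\Sigma$ of FDs over $D$, we define a set $\Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ of FDs over $D'$ as the set that contains the following:

1. $S_1 \to S_2 \in (D, \Sigma)^+$ with $S_1 \cup S_2 \subseteq paths(D')$;

2. Each FD over $q$, $p_i$, $p_i.@l_i$ ($i \in [1, n]$) and $p.@l$ is transferred to $\tau$ and its children. That is, if $S_1 \cup S_2 \subseteq \{q, p_1, \ldots, p_n, p_1.@l_1, \ldots, p_n.@l_n, p.@l\}$ and $S_1 \to S_2 \in (D, \Sigma)^+$, then we include an FD obtained from $S_1 \to S_2$ by changing $p_i$ to $q.\tau.\tau_i$, $p_i.@l_i$ to $q.\tau.\tau_i.@l_i$, and $p.@l$ to $q.\tau.@l$;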


(1) If $(D, \Sigma)$ is in XNF then return $(D, \Sigma)$, otherwise go to step (2).

(2) If there is an anomalous FD $S \to p.@l$ with $q \in EPaths(D) \cap S$ and $q \to S \in (D, \Sigma)^+$, then:
      $D := D[p.@l := q.@m]$
      $\Sigma := \Sigma[p.@l := q.@m]$
    where $@m$ is fresh, and go to step (1).

(3) Choose a $(D, \Sigma)$-minimal anomalous FD $S \to p.@l$, where $S = \{q, p_1.@l_1, \ldots, p_n.@l_n\}$. Create fresh element types $\tau, \tau_1, \ldots, \tau_n$; set
      $D := D[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$
      $\Sigma := \Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$
    and go to step (1).

Figure 4: XNF decomposition algorithm.

3. $\{q, q.\tau.\tau_1.@l_1, \ldots, q.\tau.\tau_n.@l_n\} \to q.\tau$, and $\{q.\tau, q.\tau.\tau_i.@l_i\} \to q.\tau.\tau_i$ for $i \in [1, n]$.¹

This construction, when applied to the student example from the introduction, yields exactly the revised DTD, with $\tau$ being info, $@l$ being name, $\tau_1$ being number and $@l_1$ being sno.
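The "creating new element types" construction can be sketched as follows. The sketch makes several assumptions that are not in the text: productions are stored as lists of regex factors (so concatenation is list concatenation and $\epsilon$ is the empty list), the function name and signature are hypothetical, and the student example is simplified so that name is an attribute of student.

```python
def create_element_types(P, R, q_last, p_last, l, tau, children):
    """Build (P', R') for D[p.@l := q.tau[tau_1.@l_1, ..., tau_n.@l_n, @l]].

    P maps an element type to a list of regex factors (concatenation);
    R maps an element type to its attribute set;
    children lists the pairs (tau_i, @l_i).
    """
    fresh = [tau] + [t for t, _ in children]
    if any(t in P for t in fresh) or len(set(fresh)) != len(fresh):
        raise ValueError("new element types must be fresh and distinct")
    new_P = {t: list(factors) for t, factors in P.items()}
    new_R = {t: set(attrs) for t, attrs in R.items()}
    new_P[q_last].append(tau + "*")              # P'(last(q)) = P(last(q)), tau*
    new_P[tau] = [t + "*" for t, _ in children]  # P'(tau) = tau_1*, ..., tau_n*
    new_R[tau] = {l}                             # R'(tau) = {@l}
    for t, l_i in children:
        new_P[t] = []                            # P'(tau_i) = epsilon
        new_R[t] = {l_i}                         # R'(tau_i) = {@l_i}
    new_R[p_last].discard(l)                     # R'(last(p)) loses @l
    return new_P, new_R


# Simplified student example (assumption: name is modelled as an attribute).
P = {"courses": ["course*"], "course": ["title", "taken_by"], "title": [],
     "taken_by": ["student*"], "student": ["grade"], "grade": []}
R = {"courses": set(), "course": {"@cno"}, "title": set(),
     "taken_by": set(), "student": {"@sno", "@name"}, "grade": set()}
new_P, new_R = create_element_types(
    P, R, q_last="courses", p_last="student", l="@name",
    tau="info", children=[("number", "@sno")],
)
```

Note that `student` keeps `@sno`: the construction copies $@l_1$ onto the new type $\tau_1$ and removes only $@l$ from $last(p)$.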

We are not interested in applying this transformation to an arbitrary anomalous FD, but rather to a minimal one. In the relational context, a minimal FD is $X \to A$ such that $X' \not\to A$ for any $X' \subsetneq X$. In our case the definition is a bit more complex to account for paths used in FDs. We say that $\{q, p_1.@l_1, \ldots, p_n.@l_n\} \to p_0.@l_0$ is $(D, \Sigma)$-minimal if there is no anomalous FD $S' \to p_i.@l_i \in (D, \Sigma)^+$ such that $i \in [0, n]$ and $S'$ is a subset of $\{q, p_1, \ldots, p_n, p_0.@l_0, \ldots, p_n.@l_n\}$ such that $|S'| \leq n$ and $S'$ contains at most one element path.
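For intuition, the relational notion of a minimal FD mentioned above can be checked with the standard attribute-closure procedure. The sketch below covers only the relational case, not $(D, \Sigma)$-minimality; the function names and the encoding of FDs as (left-hand set, right-hand attribute) pairs are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_attr)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_minimal_fd(lhs, rhs, fds):
    """True iff lhs -> rhs is implied by fds and no proper subset of lhs implies rhs."""
    lhs = set(lhs)
    if rhs not in closure(lhs, fds):
        return False  # the FD itself is not implied
    if not lhs:
        return True   # the empty set has no proper subsets
    # Closure is monotone, so it suffices to drop one attribute at a time.
    return all(
        rhs not in closure(sub, fds)
        for sub in combinations(lhs, len(lhs) - 1)
    )


fds = [({"sno"}, "name")]
print(is_minimal_fd({"sno"}, "name", fds))         # True
print(is_minimal_fd({"sno", "cno"}, "name", fds))  # False: {sno} alone suffices
```

Checking only the subsets of size $|X| - 1$ is enough because any smaller subset that implies $A$ is contained in one of them.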

Proposition 6 Let $(D', \Sigma')$ be constructed from $(D, \Sigma)$ by using either the "moving attributes" construction, or the "creating new element types" construction applied to a $(D, \Sigma)$-minimal FD. Then $AP(D', \Sigma') \subsetneq AP(D, \Sigma)$. □

The algorithm The algorithm applies the two transformations until the schema is in XNF, as shown in Figure 4. It involves FD implication, that is, testing membership in $(D, \Sigma)^+$ (and consequently testing XNF and $(D, \Sigma)$-minimality), which will be described in Section 7. Since each step reduces the number of anomalous paths (Proposition 6), we obtain:

Theorem 2 The XNF decomposition algorithm terminates, and outputs a specification $(D, \Sigma)$ in XNF.
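The control flow of Figure 4 can be sketched as a loop over caller-supplied oracles. Everything named below (`xnf_decompose`, `anomalous_fds`, `can_move`, `move`, `is_minimal`, `split`, `max_steps`) is hypothetical: the sketch does not implement FD implication or the two transformations, it only shows how steps (1) to (3) fit together.

```python
def xnf_decompose(spec, anomalous_fds, can_move, move, is_minimal, split,
                  max_steps=10_000):
    """Apply steps (1)-(3) until no anomalous FD remains.

    anomalous_fds(spec) lists the anomalous FDs of spec; can_move and
    is_minimal test the side conditions of steps (2) and (3); move and
    split return the transformed specification.
    """
    trace = []
    for _ in range(max_steps):
        fds = anomalous_fds(spec)
        if not fds:                        # step (1): spec is in XNF
            return spec, trace
        fd = next((f for f in fds if can_move(spec, f)), None)
        if fd is not None:                 # step (2): move the attribute
            spec = move(spec, fd)
            trace.append(("move", fd))
            continue
        fd = next((f for f in fds if is_minimal(spec, f)), None)
        if fd is None:
            raise RuntimeError("no minimal anomalous FD found")
        spec = split(spec, fd)             # step (3): create new element types
        trace.append(("split", fd))
    raise RuntimeError("step budget exhausted")


# Toy run: the "specification" is just the set of FDs that are still anomalous.
toy = frozenset({"movable:year", "split:name"})
result, trace = xnf_decompose(
    toy,
    anomalous_fds=lambda s: sorted(s),
    can_move=lambda s, f: f.startswith("movable"),
    move=lambda s, f: s - {f},
    is_minimal=lambda s, f: True,
    split=lambda s, f: s - {f},
)
```

In the toy run each transformation removes one anomalous FD, which mimics the strict decrease of anomalous paths that the termination argument relies on.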

¹ If $\bot$ can be a value of $p.@l$ in $tuples_D(T)$, the definition must be modified slightly, by letting $P'(\tau)$ be $\tau_1^*, \ldots, \tau_n^*, (\tau' \mid \epsilon)$, where $\tau'$ is fresh, making $@l$ an attribute of $\tau'$, and modifying the definition of FDs accordingly.

still decompose into XNF, although the final result may not be as good as when using implication. A slight modification of the proof of Proposition 6 yields:

Proposition 7 Consider a simplification of the XNF decomposition algorithm which consists only of step (3) applied to FDs $S \to p.@l \in \Sigma$, and in which the definition of $\Sigma[p.@l := q.\tau[\tau_1.@l_1, \ldots, \tau_n.@l_n, @l]]$ is modified by using $\Sigma$ instead of $(D, \Sigma)^+$. Then such an algorithm always terminates and its result is in XNF.

Lossless Decompositions To prove that our transformations do not lose any information from the documents, we define the concept of lossless decompositions similarly to the relational notion of "generic dominance" from [18]. That notion requires the existence of two relational algebra queries that translate back and forth between two relational schemas. Adapting this definition poses two problems in our setting: first, no XML query language yet has the same "yardstick" status as relational algebra for relational databases, and second, our transformations generate new node ids, which cannot be described by generic queries.

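To deal with this, we use the relational representation via the $tuples_D(\cdot)$ operator, and say that $(D_2, \Sigma_2)$ is a lossless decomposition of $(D_1, \Sigma_1)$, written $(D_1, \Sigma_1) \leq_{lossless} (D_2, \Sigma_2)$, if there exist relational algebra queries $Q_1, Q_1', Q_2$ such that for any $T \models (D_1, \Sigma_1)$, there exists $T' \models (D_2, \Sigma_2)$ such that the diagram below commutes:

[Diagram: $T$ is mapped to $T'$; $tuples_{D_1}$ maps $T$ to $tuples_{D_1}(T)$ and $tuples_{D_2}$ maps $T'$ to $tuples_{D_2}(T')$; $Q_1$ and $Q_1'$ map between $tuples_{D_1}(T)$ and $Q_1(tuples_{D_1}(T))$; $Q_2$ maps $tuples_{D_2}(T')$ to $Q_1(tuples_{D_1}(T))$.]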

The goal of query $Q_2$ is to eliminate extra node ids that may occur in $T'$ but not in $T$; then $Q_1$ and $Q_1'$ go back and forth between $tuples_{D_1}(T)$ and the result of $Q_2$ on $tuples_{D_2}(T')$. As relations of the form $tuples_D(T)$ may contain nulls, we use the semantics of Codd tables [1, 19] for evaluating relational algebra queries on them.

Proposition 8 (a) The relation $\leq_{lossless}$ is transitive. (b) If $(D', \Sigma')$ is obtained from $(D, \Sigma)$ by using one of the transformations from the normalization algorithm, then $(D, \Sigma) \leq_{lossless} (D', \Sigma')$.

Thus, if $(D', \Sigma')$ is the output of the normalization algorithm on $(D, \Sigma)$, then $(D, \Sigma) \leq_{lossless} (D', \Sigma')$. Moreover, the transformations on the documents can be


In the previous section we saw that it is possible to losslessly convert a DTD into one in XNF. The algorithm used FD implication. We now show that for most classes of DTDs used in practice, this problem is tractable. We assume, without loss of generality, that all FDs have a single path on the right-hand side.

Typically, regular expressions used in DTDs are rather simple. We now formulate a criterion for simplicity that corresponds to the usual practice of writing regular expressions in DTDs. Given an alphabet $A$, a regular expression over $A$ is called trivial if it is of the form $s_1, \ldots, s_n$, where for each $s_i$ there is a letter $a_i \in A$ such that $s_i$ is either $a_i$ or $a_i?$ (which abbreviates $a_i \mid \epsilon$), or $a_i^+$ or $a_i^*$, and for $i \neq j$, $a_i \neq a_j$. We call a regular expression $s$ simple if there is a trivial regular expression $s'$ such that any word $w$ in the language denoted by $s$ is a permutation of a word in the language denoted by $s'$, and vice versa.

For example, $(a \mid b \mid c)^*$ is simple: $a^*, b^*, c^*$ is trivial, and every word in $(a \mid b \mid c)^*$ is a permutation of a word in $a^*, b^*, c^*$ and vice versa. A DTD is called simple if all productions in it use simple regular expressions over $E \cup \{\mathrm{S}\}$. Simple regular expressions are prevalent in DTDs. For instance, the Business Process Specification Schema of ebXML [10], a set of specifications to conduct business over the Internet, is a simple DTD. Part of this schema is shown in Figure 5.
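The permutation condition in the definition of simple expressions reduces to counting letters: a word is a permutation of some word in the language of a trivial expression exactly when it uses only the letters $a_i$ and each letter's number of occurrences respects its multiplicity. The sketch below illustrates this; the encoding of a trivial expression as a list of (letter, multiplicity) pairs and the function names are assumptions of the example.

```python
from collections import Counter

# Occurrence bounds per multiplicity: "1" exactly once, "?" at most once,
# "+" at least once, "*" any number of times (None means unbounded).
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def is_trivial(factors):
    """Check that the letters are pairwise distinct and multiplicities are known."""
    letters = [a for a, _ in factors]
    return (len(set(letters)) == len(letters)
            and all(m in BOUNDS for _, m in factors))


def permutation_in_language(word, factors):
    """True iff some permutation of word is in the language of the trivial
    expression given by factors (a list of (letter, multiplicity) pairs)."""
    counts = Counter(word)
    allowed = {a for a, _ in factors}
    if any(letter not in allowed for letter in counts):
        return False
    for a, m in factors:
        low, high = BOUNDS[m]
        n = counts[a]
        if n < low or (high is not None and n > high):
            return False
    return True


abc_star = [("a", "*"), ("b", "*"), ("c", "*")]  # the trivial expression a*, b*, c*
print(permutation_in_language("cabba", abc_star))  # True
print(permutation_in_language("cad", abc_star))    # False: d is not a letter of a*, b*, c*
```

Words may be strings of single-character letters, as above, or lists of element names such as `["logo", "title", "logo"]`.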

Theorem 3 The implication problem for FDs over simple DTDs is solvable in quadratic time.

In a simple DTD, disjunction can appear in expressions of the form $(a \mid \epsilon)$ or $(a \mid b)^*$, but a general disjunction $(a \mid b)$ is not allowed. We now show that the implication problem remains tractable if the number of such unrestricted disjunctions is small.

A regular expression $s$ over an alphabet $A$ is a simple disjunction if $s = \epsilon$, $s = a$, where $a \in A$, or $s = s_1 \mid s_2$, where $s_1$, $s_2$ are simple disjunctions over alphabets $A_1$, $A_2$ and $A_1 \cap A_2 = \emptyset$. A DTD $D = (E, A, P, R, r)$ is called disjunctive if for every $\tau \in E$, $P(\tau) = s_1, \ldots, s_m$, where each $s_i$ is either a simple regular expression or a simple disjunction over an alphabet $A_i$ ($i \in [1, m]$), and $A_i \cap A_j = \emptyset$ ($i, j \in [1, m]$ and $i \neq j$). This generalizes the concept of a simple DTD.

With each disjunctive DTD $D$, we associate a number $N_D$ that measures the complexity of unrestricted disjunctions in $D$. Formally, for a simple regular expression $s$, $N_s = 1$. If $s$ is a simple disjunction, then $N_s$ is the number of symbols $\mid$ in $s$ plus 1. If $P(\tau) = s_1, \ldots, s_n$, then $N_\tau$ is 1 if $s_1, \ldots, s_n$ is a simple regular expression, and $N_\tau = |\{p \in paths(D) \mid last(p) = \tau\}| \times N_{s_1} \times \cdots \times N_{s_n}$ otherwise. Finally, $N_D = \prod_{\tau \in E} N_\tau$.

Theorem 4 For a constant $k$, the FD implication problem for disjunctive DTDs $D$ with $N_D \leq k \log(|D|)$ is solvable in polynomial time. □
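As a small worked illustration of how $N_D$ is computed, consider a hypothetical disjunctive DTD (not taken from the text) with $E = \{r, a, b, c\}$, $P(r) = (a \mid b), c^*$ and $P(a) = P(b) = P(c) = \epsilon$. The expression $(a \mid b), c^*$ is not simple, since a trivial expression cannot force exactly one of $a$ and $b$ to occur, and the only path ending in $r$ is $r$ itself. The computation below assumes that $\epsilon$ counts as a simple regular expression.

```latex
\begin{align*}
N_{(a \mid b)} &= 1 + 1 = 2, \qquad N_{c^*} = 1,\\
N_r &= |\{p \in paths(D) \mid last(p) = r\}| \times N_{(a \mid b)} \times N_{c^*}
     = 1 \times 2 \times 1 = 2,\\
N_a &= N_b = N_c = 1,\\
N_D &= \prod_{\tau \in E} N_\tau = 2 \times 1 \times 1 \times 1 = 2.
\end{align*}
```

Each additional unrestricted disjunction multiplies $N_D$, which is why a bound on $N_D$ limits how many such disjunctions a DTD may contain.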

There are some classes of DTDs for which the implication problem is not tractable. One such class consists of arbitrary disjunctive DTDs. Another class is that of relational DTDs. We say that $D$ is a relational DTD if for each XML tree $T \models D$, if $X$ is a non-empty subset of $tuples_D(T)$, then $trees_D(X) \models D$.

This class contains regular expressions like the one below, from a DTD for Frequently Asked Questions [17]:

<!ELEMENT section (logo*, title, (qna+ | q+ | (p | div | section)+))>

There exist non-relational DTDs (for example, <!ELEMENT a (b,b)>). However:

Proposition 9 Every disjunctive DTD is relational.

Theorem 5 The FD implication problem over relational DTDs and over disjunctive DTDs is coNP-complete. □

Relational DTDs have the following useful property that lets us establish the complexity of testing XNF.

Proposition 10 Given a relational DTD $D$ and a set $\Sigma$ of FDs over $D$, $(D, \Sigma)$ is in XNF iff for each non-trivial FD of the form $S \to p.@l$ or $S \to p.\mathrm{S}$ in $\Sigma$, $S \to p \in (D, \Sigma)^+$. □

From this, we immediately derive:

Corollary 1 Testing if $(D, \Sigma)$ is in XNF can be done in cubic time for simple DTDs, and is coNP-complete for relational DTDs. □

8 Future Research

The decomposition algorithm can be improved in various ways, and we plan to work on making it more efficient. We also would like to find a complete classification of the complexity of the FD implication problem for various classes of DTDs.

As prevalent as BCNF is, it does not solve all the problems of relational schema design, and one cannot expect XNF to address all shortcomings of DTD design. We plan to work on extending XNF to more powerful normal forms, in particular by taking into account


ProcessSpecification | Package | BinaryCollaboration | BusinessTransaction |
    MultiPartyCollaboration)*)>
<!ELEMENT Include (Documentation*)>
<!ELEMENT BusinessDocument (ConditionExpression?, Documentation*)>
<!ELEMENT SubstitutionSet (DocumentSubstitution | AttributeSubstitution | Documentation)*>
<!ELEMENT BinaryCollaboration (Documentation*, InitiatingRole, RespondingRole, (Documentation |
    Start | Transition | Success | Failure | BusinessTransactionActivity | CollaborationActivity |
    Fork | Join)*)>
<!ELEMENT Transition (ConditionExpression?, Documentation*)>

Figure 5: Part of the Business Process Specification Schema of ebXML.

Acknowledgments Discussions with Michael Benedikt and Wenfei Fan were extremely helpful. The authors were supported in part by grants from the Natural Sciences and Engineering Research Council of Canada and from Bell University Laboratories.

References

[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

[2] J. Albert, D. Giammarresi, D. Wood. Normal form algorithms for extended context-free grammars. TCS 267 (2001), 35-47.

[3] P. Atzeni, N. Morfuni. Functional dependencies in relations with null values. Information Processing Letters 18(4): 233-238 (1984).

[4] C. Beeri, P. Bernstein, N. Goodman. A sophisticate's introduction to database normalization theory. VLDB'78, pages 113-124.

[5] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. In WWW'10, 2001.

[6] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Reasoning about keys for XML. In DBPL'01.

[7] P. Buneman, A. Jung, A. Ohori. Using powerdomains to generalize relational databases. Theoretical Computer Science 91 (1991), 23-55.

[8] DBLP. http://dblp.uni-trier.de/.

[9] W. F. Dowling and J. H. Gallier. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. JLP 1(3): 267-284 (1984).

[10] ebXML. Business Process Specification Schema v1.01. http://www.ebxml.org/specs/.

[11] W. Fan, L. Libkin. On XML integrity constraints in the presence of DTDs. In PODS'01, pages 114-125.

[12] W. Fan, J. Siméon. Integrity constraints for XML. PODS'00, pages 23-34.

[13] M. Fernandez, J. Siméon, P. Wadler. A semi-monad for semi-structured data. ICDT'01, pages 263-300.

[14] D. Florescu, D. Kossmann. Storing and querying XML data using an RDMBS. IEEE Data Eng. Bull. 22 (1999), 27-34.

[15] G. Grahne. The Problem of Incomplete Information in Relational Databases. Springer, Berlin, 1991.

[16] C. Gunter. Semantics of Programming Languages. The MIT Press, 1992.

[17] J. Higgins, R. Jelliffe. QAML Version 2.4. http://xml.ascc.net/resource/qaml-xml.dtd, 1999.

[18] R. Hull. Relative information capacity of simple relational database schemata. SIAM Journal on Computing 15(3): 856-886 (1986).

[19] T. Imielinski and W. Lipski. Incomplete information in relational databases. J. ACM 31 (1984), 761-791.

[20] C. Kanne, G. Moerkotte. Efficient storage of XML data. In ICDE'00, p. 198.

[21] M. Levene, G. Loizou. Axiomatisation of functional dependencies in incomplete relations. TCS 206(1-2): 283-300, 1998.

[22] W. Y. Mok, Y. K. Ng, D. Embley. A normal form for precisely characterizing redundancy in nested relations. ACM TODS 21 (1996), 77-106.

[23] Z. M. Özsoyoglu, L.-Y. Yuan. A new normal form for nested relations. ACM TODS 12(1): 111-136, 1987.

[24] Y. Sagiv, C. Delobel, D. S. Parker, R. Fagin. An equivalence between relational database dependencies and a fragment of propositional logic. J. ACM 28 (1981), 435-453.

[25] J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. DeWitt, J. Naughton. Relational databases for querying XML documents: limitations and opportunities. VLDB'99, pages 302-314.

[26] D. Suciu. Bounded fixpoints for complex objects. TCS 176 (1997), 283-328.

[27] Z. Tari, J. Stokes, S. Spaccapietra. Object normal forms and dependency constraints for object-oriented schemata. ACM TODS 22 (1997), 513-569.

[28] I. Tatarinov, Z. Ives, A. Halevy, D. Weld. Updating XML. In SIGMOD'01, pages 413-424.

[29] J. Van den Bussche. Simulation of the nested relational algebra by the flat relational algebra, with an application to the complexity of evaluating powerset algebra expressions. TCS 254(1-2): 363-377, 2001.

[30] W3C. XML-Data. W3C Note, Jan. 1998.

[31] W3C. XML Schema. W3C Working Draft, May 2001.

[32] W3C. XQuery 1.0: An XML Query Language. W3C
