On the Problem of Matching Database Schemas

(1)

On the Problem of Matching Database Schemas

Marco A. Casanova, Karin K. Breitman, Antonio L. Furtado, Vânia M.P. Vidal, and José A. F. de Macêdo

AbstractThis chapter introduces procedures to test strict satisfiability and decide logical implication for a family of database schemas calledextralite schemas with role hierarchies. Using the OWL jargon, extralite schemas support named classes, datatype and object properties, minCardinalities and maxCardinalities, InverseFunc- tionalProperties, class subset constraints, and class disjointness constraints. Extralite schemas with role hierarchies also support subset and disjointness constraints defined for datatype and object properties. Strict satisfiability imposes the additional restriction that the constraints of a schema must not force classes, datatypes, or object properties to be always empty, and is therefore more adequate than the traditional notion of satisfiability in the context of database design. The decision procedures outlined in the chapter are based on the satisfiability algorithm for Boolean formulas in conjunctive normal form with at most two literals per clause, and explore the structure of a set of constraints, captured as aconstraint graph. The chapter includes a proof that the procedures work in polynomial time and are consistent and complete for a subfamily of the schemas that impose a restriction on the role hierarchy.

1.1 Introduction

The problem of satisfiability is often taken for granted when designing database schemas, perhaps based on the implicit assumption that real data provides a consistent database state. However, this implicit assumption is unwarranted when the schema results from the integration of several data sources, as in a data warehouse or in a mediation environment. When we have to combine semantically heterogeneous data sources, we should expect conflicting data or, equivalently, mutually incon- sistent sets of integrity constraints. The same problem also occurs during schema redesign, when changes in some constraints might create conflicts with other parts of the database schema. Naturally, the satisfiability problem is aggravated when the

1

(2)

schema integration process has to deal with a large number of source schemas, or when the schema to be redesigned is complex.

We may repeat similar remarks for the problem of detecting redundancies in a schema, that is, the problem of detecting which constraints are logically implied by others. The situation is analogous if we replace the question of satisfiability by the question of logical implication.

A third similar, but more sophisticated problem is to automatically generate the constraints of a mediated schema from the sets of constraints of export schemas.

The constraints of the mediated schema are relevant for a correct understanding of what the semantics of the external schemas have in common.

With this motivation, we focus on the crucial challenge of selecting a sufficiently expressive family of schemas that is useful for defining real-world schemas and yet is tractable, i.e., for which there are practical procedures to test the satisfiability of a schema, to detect redundancies in a schema, and to combine the constraints of export schemas into a single set of mediated schema constraints. The intuitive metrics for expressiveness here is that the family of schemas should account for the commonly used conceptual constructs of OWL, UML, and the ER model. By a practical procedure, we mean a procedure that is polynomial on the size of the set of constraints of the schema.

As an answer to this challenge, we first introduce a family of schemas that we call extralite schemas with role hierarchies. Using the OWL jargon, this family supports named classes, datatype and object properties, minCardinalities and max- Cardinalities, InverseFunctionalProperties, which capture simple keys, class subset constraints, and class disjointness constraints. Extralite schemas with role hierarchies also support subset and disjointness constraints defined for datatype and object properties (formalized as atomic roles in Description Logics). We then introduce the subfamily ofrestricted extralite schemas with role hierarchies, which limits the interaction between role hierarchies and cardinality constraints.

Testing satisfiability for extralite schemas with role hierarchies turns out to be EXPTIME-hard, as a consequence of the results in [1]. However, for the restricted schemas, we show how to test strict satisfiability and logical implication in polynomial time. Strict satisfiability imposes the additional restriction that the constraints of a schema must not force classes or properties to be always empty, and is more adequate than the traditional notion of satisfiability in the context of database design.

The syntax and semantics of extralite schemas is that of Description Logics to facilitate the formal analysis of the problems we address. However, we depart from the tradition of Description Logics deduction services, which are mostly based on tableaux techniques [3]. The decision procedures outlined in the chapter are based on the satisfiability algorithm for Boolean formulas in conjunctive normal form with at most two literals per clause, described in [2]. The intuition is that the constraints of an extralite schema can be treated much in the same way as Boolean implica- tions. Furthermore, the implicational structure of the constraints can be completely captured as aconstraint graph. The results also depend on the notion of Herbrand interpretation for Description Logics.

(3)

The notion of constraint graph is the key to meet the challenge posed earlier. It permits expressiveness and decidability to be balanced, in the sense that it accounts for a useful family of constraints and yet leads to decision procedures, which are polynomial on the size of the set of constraints. This balancing is achieved by a careful analysis of how the constraints interact.

Constraint graphs can be used to help detect inconsistencies in a set of constraints and to suggest alternatives to fix the problem. They help solve the query contain- ment and related problems in the context of schema constraints [10]. They can also be used to compute the greatest lower bound of two sets of constraints, which is the basic step of a strategy to automatically generate the constraints of a mediated schema from the sets of constraints of the export schemas [8]. The appendix il- lustrates, with the help of examples, how to use constraint graphs to address such problems.

The main contributions of the chapter are the family of extralite schemas with role hierarchies, the procedures to test strict satisfiability and logical implication, which explore the structure of sets of constraints, captured as a constraint graph, and the concept of Herbrand interpretation for Description Logics. The results in the chapter indicate that the procedures are consistent and complete for restricted extralite schemas with role hierarchies, and work in polynomial time. These results extend those published in [8] for extralite schemas without role hierarchies.

There is a vast literature on the formal verification of database schemas and on the formalization of ER and UML diagrams. We single out just a few references here.

The problem of modeling conceptual schemas in DL is discussed in [4]. DL-Lite is used, for example, in [5, 6] to address schema integration and query answering.

A comprehensive survey of the DL-Lite family can be found in [1]. Techniques from Propositional Logic to support the specification of Boolean and multivalued dependencies were addressed in [9].

When compared with the DL-Lite family [1], extralite schemas with role hierarchies are a subsetDL Lite^{H N}_core with role disjunctions. The restricted schemas in turn are a subset ofDL Lite^{(H N}core ⁾with role disjunctions only, which limits the interaction between role inclusions and cardinality constraints. We emphasize that restricted extralite schemas are sufficiently expressive to capture the most familiar constructs of OWL, UML, and the ER model [4], and yet come equipped with useful decision procedures that explore the structure of sets of constraints.

The chapter is organized as follows. Section 1.2 reviews DL concepts and introduces the notion of extralite schemas with role hierarchies. Section 1.3 shows how to test strict satisfiability and logical implication for restricted extralite schemas with role hierarchies. It also outlines proofs for the major results of the chapter, whose details can be found in [7]. Section 1.4 contains examples of the concepts introduced in Sections 1.2 and 1.3, and briefly discusses two applications of the results of Section 1.3. Finally, Section 1.5 contains the conclusions.

(4)

1.2 A Class of Database Schemas

1.2.1 A Brief Review of Attributive Languages

We adopt a family ofattributive languages[3] defined as follows. AlanguageL in the family is characterised by analphabet A, consisting of a set ofatomic concepts, a set ofatomic roles, theuniversal concept and thebottom concept, denoted by>

and?, respectively, and theuniversal roleand thebottom role, also denoted by>

and?, respectively.

The set ofrole descriptionsofL is inductively defined as

• An atomic role and the universal and bottom roles are role descriptions

• If p is a role description, then the following expressions are role descriptions p : theinverseofp

¬p: thenegationofp

The set ofconcept descriptionsofL is inductively defined as

• An atomic concept and the universal and bottom concepts are concept descriptions

• Ifeis a concept description,pis a role description, and nis a positive integer, then the following expressions are concept descriptions

¬e: negation

9p: existential quantification (n p): at-most restriction ( n p): at-least restriction

An interpretation s for L consists of a nonempty set D^s, the domain of s, whose elements are calledindividuals, and aninterpretation function, also denoted s, where:

s(>) =D^sif>denotes the universal concept s(>) =D^s⇥D^sif>denotes the universal role

s(?) =?if?denotes the bottom concept or the bottom role s(A)✓D^sfor each atomic conceptAofA

s(P)✓D^s⇥D^sfor each atomic rolePofA

The function s is extended to role and concept descriptions ofL as follows (whereeis a concept description andpis a role description):

s(p ) =s(p) : the inverse of s(p)

s(¬p) =D^s⇥D^s s(p): the complement ofs(p)with respect toD^s⇥D^s s(¬e) =D^s s(e): the complement ofs(e)with respect toD^s

s(9p) ={I2D^s/(9J2D^s)/(I,J)2s(p)}: the set of individuals thats(p)relates to some individual

s( n p) ={I2D^s/| {J2D^s/(I,J)2s(p)} | n}: the set of individuals that s(p)relates to at leastndistinct individuals

(5)

s(n p) ={I2D^s/| {J2D^s/(I,J)2s(p)} |n}: the set of individuals that s(p)relates to at mostndistinct individuals

A formula ofL is an expression of the formuvv, called aninclusion, or of the formu|v, called adisjunction, or of the formu⌘v, called anequivalence, where bothuandvare concept descriptions or bothuandvare role descriptions ofL. We also say thatuvvis a concept inclusion iff bothuandvare concept descriptions, and thatuvvis a role inclusion iff bothuandvare role descriptions; and likewise for the other types of formulas.

An interpretation s for L satisfies uvv iff s(u)✓s(v), s satisfies u|v iff s(u)\s(v) =?, andssatisfies u⌘viffs(u) =s(v). A formula s is atautology iff any interpretation satisfiess. Two formulas aretautologically equivalentiff any interpretationsthat satisfies one formula also satisfies the other.

Given a set of formulasS, we say that an interpretationsis amodelofS iffs satisfies all formulas inS, denoteds|=S. We say thatS issatisfiableiff there is a model ofS. However, this notion of satisfiability is not entirely adequate in the context of database design since it allows the constraints of a schema to force atomic concepts or atomic roles to be always empty. Hence, we define that an interpretation sis a strict model ofSiffssatisfies all formulas inSands(C)6=?, for each atomic conceptC, ands(P)6=?, for each atomic roleP; we say thatSisstrictly satisfiable iff there is astrict modelforS. In addition, we say thatSlogically impliesa formula s, denotedS|=s, iff any model ofSsatisfiess.

1.2.2 Extralite Schemas with Role Hierarchies

Anextralite schemawithrole hierarchiesis a pairS= (A,S)such that

• A is an alphabet, called thevocabularyofS.

• Sis a set of formulas, called theconstraintsofS, which must be of one the forms (whereCandDare atomic concepts,PandQare atomic roles,pdenotesPor its inverseP , andkis a positive integer):

Domain Constraint:9PvC(the domain ofPis a subset ofC) Range Constraint:9P vC(the range ofPis a subset ofC)

minCardinality constraint:Cv( k p)(pmaps each individual inC to at leastkindividuals)

maxCardinality constraint:Cv(k p)(p maps each individual inCto at mostkindividuals)

Concept Subset Constraint:CvD(Cis a subset ofD)

Concept Disjointness Constraint:C|D(C and D are disjoint atomic concepts) Role Subset Constraint:PvQ(Pis a subset ofQ)

Role Disjointness Constraint:P|Q(PandQare disjoint atomic roles)

(6)

We say that a formula of one of the above forms is anextralite constraint, the concept subset and disjointness constraints ofSare theconcept hierarchyofS, and the role subset and disjointness constraints ofSare therole hierarchyofS.

Wenormalizea set of extralite constraints by rewriting:

9PvC as( 1P)vC 9P vC as( 1P )vC Cv(k P) asCv¬( k+1P) Cv(k P )asCv¬( k+1P ) C|D asCv¬D

P|Q asPv¬Q

The formula on the right-hand side is called thenormal formof the formula on the left-hand side. Observe that: a formula and its normal form are tautologically equivalent; the normal forms avoid the use of existential quantification and at-most restrictions; negated descriptions occur only on the right-hand side of the normal forms; inverse roles do not occur in role subset or role disjoint constraints.

Furthermore, weclosethe set of extralite constraints by also considering as an extralite constraint any inclusion of one of the forms

Cv ? ( m p)v ? ( m p)v( n q) Pv ? ( m p)v¬C ( m p)v¬( n q)

whereCis an atomic concept,Pis an atomic role,pandqboth are atomic roles or both are the inverse of atomic roles, andmandnare positive integers.

Finally, arestricted extralite schema with role hierarchiesis a schemaS= (A,S) that satisfies the following restriction:

Restriction. Role Hierarchy Restriction:IfS contains a role subset constraint of the form PvQ, thenS contains no maxCardinality constraints of the forms Cv (k Q)or Cv(k Q ), with k 1.

Note that the normalization process will rewrite the above constraints asCv

¬( k+1Q)andCv¬( k+1Q ), withk 1.

1.3 Testing Strict Satisfiability and Logical Implication

This section first introduces the notion of constraint graph. Then, it defines Her- brand interpretations for Description Logics. Finally, it states results that lead to simple polynomial procedures to test strict satisfiability and logical implication for restricted extralite schemas with role hierarchies.

1.3.1 Representation Graphs

Let S be a finite set of normalized extralite constraints and W be a finite set of extralite constraint expressions, that is, expression that may occur on the right- or

(7)

left-hand sides of a normalized constraint. The alphabet is understood as the (finite) set of atomic concepts and roles that occur inSandW.

We say that thecomplementof a non-negated descriptioncis¬c, and vice-versa.

We denote the complement of a descriptiondby ¯d. Proposition 1.1 states properties of descriptions that will be used in the rest of this section.

Proposition 1.1.Let e, f and g be concept or role descriptions, P and Q be atomic roles, and p be either P or P . Then, we have:

(i)( n p)v( m p)is a tautology, where0<m<n.

(ii) evf is tautologically equivalent tof¯ve.¯

(iii) IfS logically implies evf and f vg, thenSlogically implies evg.

(iv) IfS logically implies PvQ, thenS logically implies( k P)v( k Q)and ( k P )v( k Q ).

(v) IfSlogically implies( 1P)v¬( 1Q)or( 1P )v¬( 1Q ), thenS logically implies Pv¬Q.

(vi) IfS logically implies evf and ev¬f , thenSlogically implies ev ?. (vii) IfSlogically implies( 1P)v ?or( 1P )v ?, thenSlogically implies

Pv ?.

(viii) If S logically implies Pv ?, then S logically implies ( m P)v ?, ( m P )v ?,> v(n P)and> v(n P ), where m>0and n 0.

In the next definitions, we introduce graphs whose nodes are labeled with expressions or sets of expressions. Then, we use such graphs to create efficient procedures to test ifS is strictly satisfiable and to decide logical implication forS. Finally, it will become clear when we formulate Theorem 1.2 that the definitions must also consider an additional setW of constraint expressions.

To simplify the definitions, if a nodeKis labeled with an expressione, then ¯K denotes the node labeled with ¯e. We will also useK!Mto indicate that there is a path from a nodeKto a nodeM, andK9Mto indicate that no such path exists;

we will usee!f to denote that there is a path from a node labeled witheto a node labeled withf, ande9f to indicate that no such path exists.

Definition 1.1.The labeled graphg(S,W)that capturesSandW, where each node is labeled with an expression, is defined in four stages as follows:

Stage 1:

Initializeg(S,W)with the following nodes and arcs:

(i) For each atomic conceptC,g(S,W)has exactly one node labeled withC.

(ii) For each atomic roleP,g(S,W)has exactly one node labeled withP, one node labeled with( 1P), and one node labeled with( 1P ).

(iii) For each expressionethat occurs on the right- or left-hand side of an inclusion inS, or that occurs inW, other than those in (i) or (ii),g(S,W)has exactly one node labeled withe.

(iv) For each inclusionevf inS,g(S,W)has an arc(M,N),whereMandNare the nodes labeled witheandf, respectively.

(8)

Stage 2:

Until no new node or arc can be added tog(S,W), For each role inclusionPvQinS,

For each nodeK,

(i) ifKis labeled with( k P), for somek>0, then add a nodeLlabeled with ( k Q)and an arc(K,L), if no such node and arc exists;

(ii) ifKis labeled with( k P ), for somek>0, then add a nodeLlabeled with ( k Q )and an arc(K,L), if no such node and arc exists;

(iii) ifKis labeled with( k Q), for somek>0, then add a nodeLlabeled with ( k P)and an arc(L,K), if no such node and arc exists;

(iv) ifKis labeled with( k Q ), for somek>0, then add a nodeLlabeled with ( k P )and an arc(L,K), if no such node and arc exists.

Stage 3:

Until no new node or arc can be added tog(S,W),

(i) Ifg(S,W)has a node labeled with an expressione, then add a node labeled with ¯e, if no such node exists.

(ii) Ifg(S,W)has a node M labeled with ( m p) and a nodeN labeled with ( n p), wherepis eitherPorP and 0<m<n, then add an arc(N,M), if no such arc exists.

(iii) Ifg(S,W)has an arc(M,N), then add an arc(N,¯ M), if no such arc exists.¯ Stage 4:

Until no new node or arc can be added tog(S,W),

for each pair of nodesMandNsuch thatMandNare labeled with( 1P) and¬( 1Q), respectively, and there is a path fromMtoN

add arcs(K,L)and(L,¯ K), where¯ KandLare the nodes labeled withPand

¬Q, respectively, if no such arcs exists.

Note that Stage 2 corresponds to Proposition 1(iv), Stage 3(ii) to Proposition 1(i), Stage 3(iii) to Proposition 1(ii), and Stage 4 to Proposition 1(v).

Definition 1.2.Theconstraint graph that representsS andW is the labeled graph G(S,W), where each node is labeled with a set of expressions, defined fromg(S,W) by collapsing each clique ofg(S,W)into a single node labeled with the expressions that previously labeled the nodes in the clique. WhenW is the empty set, we simply writeG(S)and say thatG(S)is theconstraint graph that representsS.

Note that Definition 1.2 reflects Proposition 1(iii).

Definition 1.3.LetG(S,W)be the constraint graph that representsS andW. We say that a nodeKofG(S,W)is a?-node with level n, for a non-negative integern, iff one of the following conditions holds:

(i)Kis a?-node with level 0iff there are nodesMandN, not necessarily distinct fromK, and a positive expressionhsuch thatMandNare respectively labeled withhand¬h, andK!MandK!N.

(9)

(ii)Kis a?-node with level n+1iff

(a) There is a?-nodeMof leveln, distinct fromK, such thatK!M, andM is the?-node with the smallest level such thatK!M, or

(b)Kis labeled with a minCardinality constraint of the form( k P)or of the form( k P )and there is a?-nodeMof levelnsuch thatM is labeled withP, or

(c)Kis labeled with an atomic rolePand there is a?-nodeMof levelnsuch thatM is labeled with a minCardinality constraint of the form( 1P)or of the form( 1P ).

Note that cases (i) and (ii-a) of Definition 1.3 correspond to Proposition 1(vi), case (ii-b) to Proposition 1(viii), and case (ii-c) to Proposition 1(vii).

Definition 1.4.A nodeKis a?-node of G(S,W)iffKis a?-node with leveln, for some non-negative integern. A nodeKis a>-node ofG(S,W)iff ¯Kis a?-node.

To avoid repetitions, in what follows, letg(S,W)be the graph that capturesSand W andG(S,W)be the graph that representsSandW. Proposition 2 lists properties ofg(S,W)that directly reflect the structure of the set of constraintsS. Proposition 3 applies the results in Proposition 2 to obtain properties ofG(S,W)that are funda- mental to establish Lemma 1.1 and Theorems 1 and 2. Finally, Proposition 4 relates the structure ofG(S,W)with the logical consequences ofS.

Proposition 2:For any pair of nodesKandMofg(S,W):

(i) If there is a pathK!Ming(S,W)and ifMis labeled with a positive expression, thenKis labeled with a positive expression.

(ii) If there is a pathK!Ming(S,W)and ifKis labeled with a negative expression, thenMis labeled with a negative expression.

Proposition 3:

(i) G(S,W)is acyclic.

(ii) For any nodeKofG(S,W), for any expressione, we have thatelabelsKiff ¯e labels ¯K.

(iii) For any pair of nodesMandNofG(S,W), we have thatM!Niff ¯N!M.¯ (iv) For any nodeKofG(S,W), one of the following conditions holds:

(a) K is labeled only with atomic concepts or minCardinality constraints of the form( m p), wherepis eitherPorP andm 1, or

(b) Kis labeled only with atomic roles, or

(c) Kis labeled only with negated atomic concepts or negated minCardinality constraints of the form¬( m p), wherepis eitherPorP andm 1, or (d) Kis labeled only with negated atomic roles.

(v) For any pair of nodesKandMofG(S,W),

(a) If there is a pathK!M inG(S,W)and ifM is labeled with a positive expression, thenKis labeled only with positive expressions.

(10)

(b) If there is a pathK!M inG(S,W)and ifKis labeled with a negative expression, thenMis labeled only with negative expressions.

(vi) For any nodeKofG(S,W),

(a) IfKis a?-node, thenKis labeled only with atomic concepts or minCar- dinality constraints of the form ( m p), wherep is eitherPor P and m 1, orKis labeled only with atomic roles.

(b) If K is a>-node, thenK is labeled only with negated atomic concepts or negated minCardinality constraints of the form¬( m p), wherep is eitherPorP andm 1, orKis labeled only with negated atomic roles.

(vii) Assume that S has no inclusions of the formev¬( k P)or of the form ev¬( k P ). LetMbe the node labeled with¬( k P)(or with¬( k P )).

Then, for any nodeKofG(S,W), if there is a pathK!MinG(S,W), then Kis labeled only with negative concept expressions.

Proposition 4:

(i) For any pair of nodesMandNofG(S,W), for any pair of expressionseandf that labelMandN, respectively, ifM!NthenS|=evf.

(ii) For any nodeK ofG(S,W), for any pair of expressionseandf that labelK, S|=e⌘f.

(iii) For any nodeK ofG(S,W), for any expressionethat labelsK, ifK is a?- node, thenS|=ev ?.

(iv) For any nodeK ofG(S,W), for any expressionethat labelsK, ifK is a>- node, thenS|=> ve.

1.3.2 Herbrand Interpretations and Instance Labeling Functions

To prove the main results, we introduce in this section the notion of canonical Her- brand interpretation for a set of constraints. The definition mimics the analogous notion used in automated theorem proving strategies based on Resolution.

Definition 1.5.

(i) A setF of distinct function symbols is a set of Skolem function symbols for G(S,W)iffF associates:

(a)n distinct unary function symbols with each nodeN of G(S,W)labeled with( n P), denotedf1[N,P], . . . ,fn[N,P]for ease of reference;

(b)n distinct unary function symbols with each nodeN of G(S,W)labeled with( n P ), denotedg1[N,P], . . . ,gn[N,P]for ease of reference;

(c) a distinct constant with each node N ofG(S,W)labeled with an atomic concept or with( 1P), denotedc[N]for ease of reference.

(ii) The Herbrand UniverseD[F]forF is the set of first-order terms constructed using the function symbols inF. The terms inD[F]are calledindividuals.

(11)

In the next definition, recall that useQ!Pto indicate that there is a path from a nodeQto a nodePinG(S,W).

Definition 1.6.

(i) Aninstance labeling function for G(S,W)andD[F]is a functions⁰ that associates a set of individuals in D[F] to each node of G(S,W)labeled with concept expressions, and a set of pairs of individuals inD[F]to each node of G(S,W)labeled with role expressions.

(ii) LetNbe a node ofG(S,W)labeled with an atomic concept or with( k P).

Assume thatNis not a?-node. Then, the Skolem constantc[N]is aseed term ofN, andNis theseed nodeofc[N].

(iii) LetNP be the node ofG(S,W)labeled with the atomic roleP. Assume that NPis not a?-node. For each terma, for each nodeMlabeled with( m P), if a2s⁰(M)and there is no nodeKlabeled with( k Q)such thatmk,Q!P anda2s⁰(K), then

(a) the pair(a,fr[M,P](a))is called aseed pairofNPtriggered by a2s⁰(M), forr2[1,m],

(b) the termfr[M,P](a)is aseed termof the nodeLlabeled with( 1P ), and Lis called theseed nodeof fr[M,P](a), forr2[2,m], ifais of the form gi[J,P](b), for some nodeJand some termb, and forr2[1,m], otherwise.

(iv) LetNPbe the node ofG(S,W)labeled with the atomic roleP. Assume thatNP

is not a?-node. For each termb, for each nodeNlabeled with( n P ), if b2s⁰(N)and there is no nodeKlabeled with( k Q )such thatnk,Q!P andb2s⁰(K), then

(a) the pair(g_r[N,P](b),b)is called aseed pairofNPtriggered by b2s⁰(N), forr2[1,n], and

(b) the termgr[N,P](b)is aseed termof the nodeLlabeled with( 1P), and L is called theseed nodeofgr[N,P](b), for r2[2,n], ifb is of the form fi[J,P](a), for some nodeJand some terma, and forr2[1,n], otherwise.

Intuitively, the seed term of a nodeNwill play the role of a unique signature of N, and likewise for a seed pair of a nodeNP.

Definition 1.7.Acanonical instance labeling function for G(S,W)andD[F]is an instance labeling function that satisfies the following restrictions, for each nodeK ofG(S,W):

(a) Assume thatKis a concept expression node, and thatKis neither a?-node nor a>-node. Then,t2s⁰(K)ifftis a seed term of a nodeJand there is a path from JtoK.

(b) Assume thatKis a role expression node and is neither a?-node nor a>-node.

Then,(t,u)2s⁰(K)iff(t,u)is a seed pair of a nodeJand there is a path fromJ toK.

(c) Assume thatKis a?-node. Then,s⁰(K) =?.

(12)

(d) Assume thatK is a concept expression node and is a>-node. Then, s⁰(K) = D[F].

(e) Assume thatKis a role expression node and is a>-node. Then,s⁰(K) =D[F]⇥ D[F].

Proposition 5:Lets⁰ be canonical instance labeling function forG(S,W)and D[F]. Then

(i) For any pair of nodesMandNofG(S,W), ifM!Nthens⁰(M)✓s⁰(N).

(ii) For any pair of nodesM andNofG(S,W)that both are concept expression nodes or both are role expression nodes,s⁰(M)\s⁰(N)6=?iff there is a seed nodeKsuch thatK!MandK!N.

(iii) For any nodeNPofG(S,W)labeled with an atomic roleP, for any nodeMof G(S,W)labeled with( m P), for any termt2s⁰(M), eithers⁰(N_P)contains all seed pairs triggered by t2s⁰(M), or there are no seed pairs triggered by t2s⁰(M).

(iv) For any nodeNPofG(S,W)labeled with an atomic roleP, for any nodeNof G(S,W)labeled with( n P ), for any termt2s⁰(N), eithers⁰(NP)contains all seed pairs triggered byt2s⁰(N), or there are no seed pairs triggered by t2s⁰(N).

Recall that the alphabet is understood as the (finite) set of atomic concepts and roles that occur inS andW. Hence, in the context ofSandW, when we refer to an interpretation, we mean an interpretation for such alphabet.

Definition 1.8.Let s⁰ be a canonical instance labeling function for G(S,W) and D[F].The canonical Herbrand interpretation induced by s⁰is the interpretations defined as follows:

(a)D[F]is the domain ofs.

(b)s(C) =s⁰(M), for each atomic conceptC, whereM is the node ofG(S,W)labeled withC(there is just one such node).

(c)s(P) =s⁰(N), for each atomic roleP, whereN is the node ofG(S,W)labeled withP(again, there is just one such node).

1.3.3 Strict Satisfiability and Logical Implication for Extralite Schemas with Restricted Role Hierarchies

We now ready to prove the main results of the chapter that lead to efficient decision procedures to test strict satisfiability and logical implication for restricted extralite schemas with role hierarchies.

In what follows, letS be a finite set of normalized extralite constraints andW be a finite set of extralite constraint expressions. Let G(S,W)be the graph that representsS andW.

(13)

Lemma 1.1.Assume thatSsatisfies the role hierarchy restriction. Let s⁰be a canonical instance labeling function for G(S,W)andD[F]. Let s be the canonical Her- brand interpretation induced by s⁰. Then, we have:

(i) For each node N of G(S,W), for each positive expression e that labels N, s⁰(N) =s(e).

(ii) For each node N of G(S,W), for each negative expression¬e that labels N, s⁰(N)✓s(¬e).

Proof SketchLets⁰be a canonical instance labeling function forG(S,W)andD[F].

Letsbe the interpretation induced bys⁰.

(i)LetNbe a node ofG(S,W). Letebe a positive expression that labelsN.

First observe that N cannot be a>-node. By Prop. 3(vi-b), >-nodes are labeled only with negative expressions, which contradicts the assumption thateis a positive expression. Then, there are two cases to consider.

Case 1:Nis not a?-node.

We have to prove thats(e) =s⁰(N). By the restrictions on constraints and constraint expressions, sinceeis a positive expression, there are four cases to consider.

Case 1.1:eis an atomic conceptC.

By Def. 1.8(b), s(C) = s’(N).

Case 1.2:eis an atomic roleP.

By Def. 1.8(c), s(P) = s’(N).

Case 1.3:eis of the form( n P).

LetNPbe the node labeled withP. Then,NP is not a?-node. Assume otherwise.

Then, by Def. 1.3(ii-b) and Def. 1.4, the nodeLlabeled with( 1 P)would be a

?-node. But, by construction ofG(S,W), there is an arc fromN(the node labeled with( n P)) toL. Hence,Nwould be a?-node, contradicting the assumption of Case 1. Furthermore, sinceNPis labeled with the positive atomic roleP, by Prop.

3(vi-b),NPcannot be a>-node.

Then, sinceNPis neither a?-node nor a>-node, Def. 1.7(b) applies tos⁰(NP).

Recall thatNis the node labeled with( n P)and thatNis neither a?-node nor a>-node. We first prove that

(1) a2s⁰(N)implies thata2s(( n P))

Leta2s⁰(N). LetKbe the node labeled with( k P)such thata2s⁰(K)and kis the largest possible integer greater thann. Sincea2s⁰(K)andkis the largest possible, there arekpairs ins⁰(N_P)whose first element isa, by Prop. 5(iii). By Def.

1.8(c),s(P) =s⁰(N_P). Hence, by definition of minCardinality,a2s(( k P)). But again by definition of minCardinality,s(( k P))✓s(( n P)), sincenk, by the choice ofk. Therefore,a2s(( n P)).

We now prove that

(2) a2s(( n P))implies thata2s⁰(N)

SinceSsatisfies the role hierarchy restriction, there are two cases to consider.

Case 1.3.1:S defines no subroles forP.

(14)

Leta2s(( n P)). By definition of minCardinality, there must bendistinct pairs (a,b1), . . . ,(a,bn)ins(P)and, consequently, ins⁰(N_P), sinces(P) =s⁰(N_P), by Def.

1.8(c).

Recall that NP is neither a ?-node nor a >-node. Then, by Def. 1.7(b) and Def. 1.6(iii), possibly by reorderingb1, . . . ,bn, we then have that there are nodes L0,L1, . . .Lvsuch that

(3) (a,b1)is a seed pair ofNPof the form(g_i0[L₀,P](u),u), triggered byu2s⁰(L₀), whereL0is labeled with( l0P ), for somei02[1,l0]

or

(4) (a,b1)is a seed pair ofNPof the form(a,f1[L1,P](a)), triggered bya2s⁰(L1), whereL1is labeled with( l1P)

and

(5) (a,bj)is a seed pair ofNPof the form(a,fwj[Li,P](a)), triggered bya2s⁰(Li), whereLiis labeled with( li P),j2[(Âⁱ_r=1¹lr)+1,Âⁱr=1lr], withwj2[1,l_i]and i2[2,v]

Furthermore,li6=lj, fori,j2[2,v], withi6=j, since only one node is labeled with ( liP). We may therefore assume without loss of generality thatl1>l2> . . . >lv. But note that we then have thata2s⁰(L_i)anda2s⁰(L_j)andli>lj, for eachi,j2 [1,v], withi<j. But this contradicts the fact that(a,fwj[Lj,P](a))is a seed pair of NPtriggered bya2s⁰(Lj)since, by Def. 1.6(iii), there could be no nodeLilabeled with( liP)withli>ljanda2s⁰(Li). This means that there is just one node,L1, that satisfies (5).

We are now ready to show thata2s⁰(N).

Case 1.3.1.1:n=1.

Case 1.3.1.1.1:ais of the formgi0[L0,P](u).

Recall thatNPis not a?-node. Then, by Def. 1.6(iv),gi0[L0,P](u)is a seed term of the node labeled with( 1P), which must beN, sincen=1 and there is just one node labeled with( 1P). Therefore, sinceNis not a?-node or a>-node, by Def.

1.7(a),a2s⁰(N).

Case 1.3.1.1.2:a is not of the formgi0[L₀,P](u).

Then, by (4) and assumptions of the case, a2s⁰(L₁). Since,L1 is labeled with ( l1P)andNwith( 1P), eithern=l1=1 andN=L1, orl1>n=1 and(L₁,N) is an arc ofG(S,W), by definition ofG(S,W). Then,s⁰(L₁)✓s⁰(N), using Prop.

5(i), for the second alternative. Therefore,a2s⁰(N)as desired, sincea2s⁰(L₁).

Case 1.3.1.2:n>1.

We first show thatnl1. First observe that, by (5) andn>1,s⁰(NP)contains a seed pair(a,fwj[L1,P](a))triggered bya2s⁰(L1). Then, by Prop. 5(iii),s⁰(NP)contains all seed pairs triggered bya2s⁰(L1). In other words, we have thata2s(( n P)) and(a,b1), . . . ,(a,bn)2s⁰(NP)and(a,b1), . . . ,(a,bn)are triggered bya2s⁰(L1).

Therefore, either(a,b1), . . . ,(a,bn)are all pairs triggered bya2s⁰(L1), in which casen=l1, or(a,b1), . . . ,(a,bn),(a,bn+1), . . . ,(a,bl₁), in which casen<l1. Hence, we have thatnl1.

(15)

SinceL1is labeled with( l1P)andNwith( n P), withnl1, eithern=l1and N=L1, orl1>nand(L₁,N)is an arc ofG(S,W), by definition ofG(S,W). Then, s⁰(L₁)✓s⁰(N), using Prop. 5(i), for the second alternative. Therefore,a2s⁰(N)as desired, sincea2s⁰(L₁).

Therefore, we established that (2) holds. Hence, from (1) and (2),s⁰(N) =s((

n P)), as desired.

Case 1.3.2:Sdefines subroles forP.

SinceSsatisfies the role hierarchy restriction and defines subroles forP, thenShas no constraint of the formev¬( 1P)or of the formev¬( 1P ). The proof of this case is a variation of that of Case 1.3.1.

Case 1.4:eis of the form( n P ).

The proof of this case is entirely similar to that of Case 1.3.

Case 2:Nis a?-node.

We have to prove thats(e) =s⁰(N) =?. Again, by the restrictions on constraints and constraint expressions, sinceeis a positive expression, there are four cases to consider.

Case 2.1:eis an atomic conceptC.

Then, by Def. 1.8(b), we trivially have thats(C) =s⁰(N) =?. Case 2.2:Nis an atomic nodeP.

Then, by Def. 1.8(c), we trivially have thats(P) =s⁰(N) =?.

Case 2.3:eis a minCardinality constraint of the form( n p), wherepis eitherP orP and 1n.

We prove thats(( n p)) =?, using an argument similar to that in Case 1.3. LetNP

be the node labeled withP.

Case 2.1.2.1:NPis a?-node

Then, by Def. 1.7(c) and Def. 1.8(c),s(P) =s⁰(NP) =?. Hence,s(( n p)) =?. Case 2.1.2.2:NPis not a?-node.

By Prop. 3(vi-b),NPcannot be a>-node. Then, Def. 1.7(b) applies tos⁰(NP).

We proceed by contradiction. So, assume thats(( n p))6=?and leta2s((

n p)).

By definition of minCardinality and sinces(P) =s⁰(N_P), there must bendistinct pairs(a,b1), . . . ,(a,bn)ins⁰(N_P). Using an argument similar to that in Case 1.3, there are nodesL0andL1such that

(6) (a,b1)is a seed pair ofNPof the form(g_i0[L₀,P](u),u), triggered byu2s⁰(L₀), whereL0is labeled with( l0P ), for somei02[1,l0]

or

(7) (a,b1)is a seed pair ofNPof the form(a,f1[L1,P](a)), triggered bya2s⁰(L1), whereL1is labeled with( l1P)

and

(8) (a,bj)is a seed pair ofNPof the form(a,fwj[L1,P](a)), triggered bya2s⁰(L1), whereL1is labeled with( l1P), withj2[2,l1]

(16)

We are now ready to show that no sucha2s(( n p))exists. Recall thatn>1.

We first show that nl1. First observe that, by (8) and n>1, s⁰(N_P) contains a seed pair(a,fwj[L₁,P](a))triggered bya2s⁰(L₁). Then, by Prop. 5(iii),s⁰(N_P) contains all seed pairs triggered by a2s⁰(L₁). In other words, we have thata2 s(( n P))and(a,b1), . . . ,(a,b_n)2s⁰(N_P)and(a,b₁), . . . ,(a,bn)are triggered by a2s⁰(L₁). Therefore, either(a,b1), . . . ,(a,bn)are all pairs triggered bya2s⁰(L₁), in which casen=l1, or(a,b1), . . . ,(a,bn),(a,bn+1), . . . ,(a,bl₁), in which casen<

l1. Hence, we have thatnl1. SinceL1 is labeled with( l1 P)andN with( n P), with nl1, either n=l1 and N=L1, or l1>n and(L1,N)is an arc of G(S,W), by definition ofG(S,W). Then,s⁰(L1)✓s⁰(N), using Prop. 5(i), for the second alternative. Therefore,a2s⁰(N), sincea2s⁰(L1). But this is impossible, sinces⁰(N) =?.

Hence, we conclude thats(( n p)) =?.

Therefore, we have that, ifNis a?-node, thens⁰(N) =s(e) =?, for any positive expressionethat labelsN.

Therefore, we established, in all cases, that Lemma 1.1(i) holds.

(ii) LetN be a node ofG(S,W). Let¬e be a negative expression that labelsN.

First observe thatNcannot be a?-node. By Prop 3(vi-a),?-nodes are labeled only with positive expressions, which contradicts the assumption that¬eis a negative expression. Then, there are two cases to consider.

Case 1:Nis not a>-node.

We have to prove thats⁰(N)✓s(¬e).

Case 1.1:Nis a concept expression node.

Suppose, by contradiction, that there is a termtsuch thatt2s⁰(N)andt2/s(¬e).

Sincet2/s(¬e), we have thatt2s(e), by definition. LetMbe the node labeled withe. Hence, by Lemma 1.1(i),t2s⁰(M). That is,t2s⁰(M)\s⁰(N).

Note thatMandNare dual nodes sinceMis labeled witheandNis labeled with

¬e. Therefore, sinceNis neither a?-node nor a>-node,Mis also neither a>-node nor a?-node, by definition of>-node.

SinceSsatisfies the role hierarchy restriction, there are two cases to consider.

Case 1.1.1:¬eis not of the form¬( n P)or¬( n P ).

Then, by Def. 1.7(a),t2s⁰(N)ifft is a seed term of a nodeJand there is a path fromJtoK. Furthermore, by Prop. 5(ii), there is a seed nodeK such thatK!M andK!Nandt2s⁰(K). But this is impossible. We would have thatK!Mand K!N,M is labeled with e, andN is labeled with ¬e, which implies thatK is a

?-node. Hence, by Def. 1.7(c),s⁰(K) =?, which implies thatt2/s⁰(K). Therefore, we established that, for all termst, ift2s⁰(N)thent2s(¬e).

Case 1.1.2:¬eis of the form¬( n P)or¬( n P ).

Case 1.1.2.1:S defines no subroles forP.

Follows as in Case 1.1.1, again using Def. 1.7(a) and Prop. 5(ii).

Case 1.1.2.2:S defines subroles forP.

SinceS satisfies the role hierarchy restriction,S has no constraint of the formhv

¬( n P)or of the formhv¬( n P ). Then, by Prop. 3(vii), for any nodeK, if K!N, thenKis labeled only with negative concept expressions. Therefore, there

(17)

could be no seed nodeKsuch thatK!N. Hence, by Def. 1.7(a), there is no termt such thats⁰(N) =?, which contradicts the assumption thatt2s⁰(N).

Therefore, in all cases, we established that, for all termst, if t2s⁰(N)thent2 s(¬e).

Case 1.2:Nis a role expression node.

Follows likewise, using Prop. 5(ii) again and Def. 1.7(b).

Thus, in both cases, we established thats⁰(N)✓s(¬e), as desired.

Case 2:Nis a>-node.

Let ¯N be the dual node of N. SinceN is a>-node, we have that ¯N is a?-node.

Furthermore, since ¬elabels N,e labels ¯N. Sincee is a positive expression, by Lemma 1.1(i),s⁰(N) =¯ s(e) =?.

Case 2.1:Nis a concept expression node.

By Def. 1.7(d) and definition ofs(¬e), we haves⁰(N) =D[F] =s(¬e), which trivially impliess⁰(N)✓s(¬e).

Case 2.2:Nis a role expression node.

By Def. 1.7(e) and definition ofs(¬e), we then haves⁰(N) =D[F]⇥D[F] =s(¬e), which trivially impliess⁰(N)✓s(¬e).

Therefore, we established that, in all cases, Lemma 1.1(ii) holds.

We are now ready to state the first result of the chapter.

Theorem 1.1.Assume that S satisfies the role hierarchy restriction. Let s be the canonical Herbrand interpretation induced by a canonical instance labeling function for G(S,W)andD[F]. Then, we have

(i) s is a model ofS.

(ii) Let e be an atomic concept or a minCardinality constraint of the form( 1P).

Let N be the node of G(S,W)labeled with e. Then, N is a?-node iff s(e) =?. (iii) Let e be a minCardinality constraint of the form( k P), with k>1. Assume

that G(S,W)has a node labeled with e. Then, N is a?-node iff s(e) =?. (iv) Let P be an atomic role. Let N be the node of G(S,W)labeled with P. Then, N

is a?-node iff s(P) =?.

Proof SketchLetS be a set of normalized constraints andW be a set of constraint expressions. LetG(S,W)be the graph that representsS andW. LetF be a set of distinct function symbols and D[F] be the Herbrand Universe forF. Let s⁰ be a canonical instance labeling function forG(S,W)andD[F]andsbe the interpretation induced bys⁰.

(i)We prove thatssatisfies all constraints inS.

Letevf be a constraint inS. By the restrictions on the constraints inS,emust be positive andf can be positive or negative. Therefore, there are two cases to consider.

Case 1:eandf are both positive.

Then, by Lemma 1.1(i),s⁰(M) =s(e)ands⁰(N) =s(f), whereMandNare the nodes labeled witheandf, respectively. If M=N, then we trivially have thats⁰(M) = s⁰(N). So assume that M6=N. Sinceevf is inS andM6=N, there must be an arc(M,N)ofG(S,W). By Prop. 5(i), we then haves⁰(M)✓s⁰(N). Hence,s(e) = s⁰(M)✓s⁰(N) =s(f).

(18)

Case 2:eis positive andf is negative.

Then, by Lemma 1.1(i),s⁰(M) =s(e). and, by Lemma 1.1(ii),s⁰(N)✓s(f), where MandNare the nodes labeled witheandf, respectively. Since negative expressions do not occur on the left-hand side of constraints inS,eandf cannot label nodes that belong to the same clique in the original graph. Therefore, we have thatM6=N.

Sinceevf is inS andM6=N, there must be an arc(M,N)ofG(S,W). By Prop.

5(i), we then haves⁰(M)✓s⁰(N). Hence,s(e) =s⁰(M)✓s⁰(N)✓s(f).

Thus, in both cases,s(e)✓s(f). Therefore, for any constraintevf 2S, we have thats|=evf, which implies thatsis a model ofS.

(ii)Letebe an atomic concept or a minCardinality constraint of the form( 1P).

By Stage 1 of Def. 1.1, G(S,W)always has a nodeN labeled withe. Since eis positive, by Lemma 1.1(i),s(e) =s⁰(N).

Assume that N is a ?-node. Then, by Lemma 1.1(i) and Def. 1.7(c), s(e) = s⁰(N) =?.

Assume that N is not a ?-node. Note that N cannot be a>-node, sinceN is labeled with the positive expressione. Then,N is neither a?-node nor a>-node.

By Def. 1.6(ii) and Def. 1.7(a), the seed termc[N]ofN is such thatc[N]2s⁰(N).

Hence, trivially,s(e) =s⁰(N)6=?. (iii)(iv)Follows as for (ii).

Based on Theorem 1.1, we can then create a simple procedure to test strict satisfiability, which has polynomial time complexity on the size ofS:

SAT(S)

input:a setSof extralite constraints that satisfies the role hierarchy restriction.

output:“YES -Sis strictly satisfiable”

“NO -Sis not strictly satisfiable”

1) Normalize the constraints inS, creating a setS⁰. 2) Construct the constraint graphG(S⁰)that representsS⁰.

3) IfG(S⁰)has no?-node labeled with an atomic concept or an atomic role, thenreturn “YES -Sis strictly satisfiable”;

elsereturn “NO -Sis not strictly satisfiable”.

From Theorem 1.1, we can also prove that:

Theorem 1.2.Assume thatSsatisfies the role hierarchy restriction. Letsbe a normalized extralite constraint. Assume thats is of the form evf and letW ={e,f}. Then,S|=siff one of the following conditions holds:

(i) The node of G(S,W)labeled with e is a?-node; or (ii) The node of G(S,W)labeled with f is a>-node; or

(iii) There is a path in G(S,W)from the node labeled with e to the node labeled with f .

(19)

Proof SketchLetSbe a set of normalized constraints. Assume thatS satisfies the role hierarchy restriction. Letevf be a constraint andW ={e,f}. LetG(S,W)be the graph that representsS andW. Observe that, by construction, G(S,W)has a node labeled witheand a node labeled withf. LetMandNbe such nodes, respectively.

(()Follows directly from Prop. 4.

())We prove that, if the conditions of the theorem do not hold, thenS6|=evf. Sinceevf is a constraint, we have:

(1) eis either an atomic conceptC, an atomic rolePor a minCardinality of the form ( k p), wherepis eitherPorP , and

(2) f is either an atomic conceptC, a negated atomic concept¬D, an atomic roleP, a negated atomic roleQ, a minCardinality constraint of the form( k p), or a negated minCardinality constraint of the form¬( k p), wherepis eitherPor P .

Assume that the conditions of the theorem do not hold, that is:

(3) The nodeMlabeled witheis not a?-node; and (4) The nodeNlabeled withf is not a>-node; and (5) There is no path inG(S,W)fromMtoN.

To prove thatS6|=evf, it suffices to exhibit a modelrofSsuch thatr6|=evf. Recall thatr6|=evf iff (i) ifeandf are concept expressions, there is an individual tsuch thatt2r(e)andt2/r(f)or, equivalently, t2r(¬f); (ii) ifeandf are role expressions, there is a pair of individuals(t,u)such that(t,u)2r(e)and(t,u)2/r(f) or, equivalently,(t,u)2r(¬f);

Recall that, to simplify the notation,e!fdenotes that there is a path inG(S,W) from the node labeled witheto the node labeled withf, ande9f to indicate that no such path exists.

Sinceevf is a constraint,emust be non-negative andf can be negative or not.

Hence, there are two cases to consider.

Case 1:eandf are both positive.

Lets⁰be a canonical instance labeling function forG(S,W)andsbe the interpretation induced bys⁰. By Theorem 1.1,sis a model ofS. We show thats6|=evf. Case 1.1:Nis a?-node.

SinceNis a?-node, by Prop. 4(iii), we have thatS|=f v ?, which implies that s(f) =?, sincesis a model ofS.

By (1),eis either an atomic conceptC, an atomic rolePor a minCardinality of the form( k p), wherepis eitherPorP . By (3),Mis not a?-node. Hence, we have thats(e)6=?, by Theorem 1.1(ii), (iii) and (iv). Hence, we trivially have that s6|=evf.

Case 1.2:Nis not a?-node.

Observe thatMandNare neither a?-node nor a>-node. By assumption of the case and by (4),Nis neither a?-node nor a>-node. Now, by (3),M is not a?-node.

(20)

Furthermore, by Prop. 3(iv-b), sinceM is labeled with a positive expressione,M cannot be a>-node.

By Lemma 1.1(i), sinceeis positive by assumption, by Def. 1.6(ii), (iii) and (iv), and by Def. 1.7(a) and (b), sinceMis neither a?-node nor a>-node, we have (6) s⁰(M) =s(e). and there is a seed termc[M]2s⁰(M), ifM is a concept expres-

sion nodes⁰(M) =s(e). and there is a seed pair (t,u)2s⁰(M), ifM is a role expression node

By definition of canonical instance labeling function, we have:

(7) For each concept expression nodeKofG(S,W)that is neither a?-node nor a

>-node,c[M]2s⁰(K)iff there is a path fromMtoK

For each role expression node K of G(S,W)that is neither a ?-node nor a

>-node,(t,u)2s⁰(K)iff there is a path fromMtoK

By (5), we havee9f. Furthermore,Nis neither a?-node nor a>-node. Hence, by (7), we have:

(8) c[M]2/s⁰(N), ifNis a concept expression node (t,u)2/s⁰(N), ifNis a role expression node

Sincef is positive, by Lemma 1.1(i),s⁰(N) =s(f). Hence, we have (9) c[M]2/s(f), iff is a concept expression

(t,u)2/s(f), iff is a role expression

Therefore, by (6) and (9),s(e)6✓s(f), that is,s6|=evf, as desired.

Case 2:eis positive andf is negative.

Assume thatf is a negative expression of the form¬g, wheregis positive.

Case 2.1:e!g.

Lets⁰be a canonical instance labeling function forG(S,W)andsbe the interpretation induced bys⁰. By Theorem 1.1(i),sis a model ofS. We show thats6|=evf.

By Prop. 4(i) and (ii), and sincesis a model ofS, we have thats|=e⌘g, ifeand glabel the same node, ands|=evg, otherwise. Hence, we have thats6|=ev¬g.

Now, sincef is¬g, we haves6|=evf, as desired.

Case 2.2:e9g.

ConstructFas follows:

(10) F isS with two new constraints,HveandHvg, whereHis a new atomic concept, ifeandgare concept expressions, orHis a new atomic role, ifeand gare role expressions

Letr⁰ be a canonical instance labeling function for G(F,W)andr be the interpretation induced by r⁰. By Theorem 1.1(i),r is a model of F. We show that r6|=evf.

We first observe that

(11) There is no expressionhsuch thate!handg!¬hare paths inG(S,W)

(21)

By construction ofG(S,W),g!¬hiffh!¬g. Bute!handh!¬gimplies e!¬g, contradicting (5), sincef is¬g. Hence, (11) follows.

We now prove that

(12) There is no positive expressionh such thatH!h andH!¬h are paths in G(F,W)

Assume otherwise. Lethbe a positive expression such thatH!handH!¬h are paths inG(F,W).

Case 2.2.1:H!e!handH!g!¬hare paths inG(F,W).

Then,e!handg!¬hmust be paths inG(S,W), which contradicts (11).

Case 2.2.2:H!e!¬handH!g!hare paths inG(F,W).

Then,e!¬handg!hmust be paths inG(S,W). But, sinceg!hiff¬h!¬g, we havee!¬h!¬gis a path inG(S,W), which contradicts (5), recalling thatf is¬g.

Case 2.2.3:H!e!handH!e!¬hare paths inG(F,W).

Then,e!hande!¬hmust be paths inG(S,W), which contradicts (3), by definition of?-node.

Case 2.2.4:H!g!handH!g!¬hare paths inG(F,W).

Then,g!handg!¬hmust be paths inG(S,W). Now, observe that, since¬gis f, that is,f andgare complementary expressions,glabels ¯N, the dual node ofNin G(S,W). Then,g!handg!¬himplies that ¯Nis a?-node ofG(S,W), that is, Nis a>-node, which contradicts (4).

Hence, we established (12).

LetKbe the node ofG(F,W)labeled withH. Note that, by construction ofF, Kis labeled only withH. Then, by (12),Kis not a?-node.

By Theorem 1.1(i),ris a model ofF. Furthermore, by Theorem 1.1(ii) and (iv), and sinceKis not a?-node, we have

(13) r(H)6=?

SinceHveandHvgare inF, and sinceris a model ofF, we also have:

(14) r(H)✓r(e)andr(H)✓r(g)

Therefore, by (13) and (14) and sincef =¬g

(15) r(e)\r(g)6=?or, equivalently,r(e)6✓r(¬g)or, equivalently,r(e)6✓r(f)or, equivalently,r6|=evf

But sinceS✓F,ris also a model ofS. Therefore, for Case 2.2, we also exhibited a modelrofSsuch thatr6|=evf, as desired.

Therefore, in all cases, we exhibited a model ofSthat does not satisfyevf, as desired.

Based on Theorem 1.2, we can then create a simple procedure to test logical implication:

(22)

IMPLIES(S,evf)

input: a setSof constraints satisfies the role hierarchy restriction, and a constraintevf

output: “YES -S logically impliesevf ”

“NO -Sdoes not logically implyevf ” 1) Normalize the constraints inS, creating a setS⁰.

2) Normalizeevf, creating a normalized constrainte⁰vf⁰. 3) ConstructG(S⁰,{e⁰,f⁰}).

4) If the node of G(S⁰,{e⁰,f⁰}) labeled with e⁰ is a ?-node, or the node of G(S⁰,{e⁰,f⁰})labeled withf⁰ is a>-node, or there is a path in G(S⁰,{e⁰,f⁰}) from the node labeled withe⁰to the node labeled withf⁰,

thenreturn “YES -Slogically impliesevf”;

elsereturn “NO -Sdoes not logically implyevf”.

Note thatIMPLIEShas polynomial time complexity on the size ofS[{evf}.

1.4 Examples

1.4.1 Examples of Extralite Schemas

In this section, we introduce examples of concrete, albeit simple extralite schemas with role hierarchies to illustrate how to capture commonly used ER model and UML constructs as extralite constraints.

!"#$%&'(%% is a subset of '(%%

)"#*%(+&,-. is a subset of /%(+&,-.

!"#$%#&'#%#&$%(")*#+',)#*%*'-,).%#&'#%!"#$%&*0"1&%,*%'%*/0*$#%"1%*0"1&%')2%#&'#2

!"#$%&'(%%%,*%'%*/0*$#%"1%'(%%%2"%)"#%,345-%#&'#%)"#*%(+&,-.%,*%'%*/0*$#%"1%/%(3 +&,-.6% 7)% .$)$+'58% (")($4#% ,)(5/*,")*% 2"% )"#% ,345-% +"5$% ,)(5/*,")*8% '*% '5+$'2-% 2,*9 (/**$2%'#%#&$%$)2%"1%:$(#,")%;6<6%% %

!"#$%&'( )=% >,.6% ;?'@% *&"A*% #&$% BC% 2,'.+'3% "1% #&$% D&")$E"34')-;% *(&$3'8% ')2%

>,./+$%;?0@%1"+3'5,F$*%#&$%(")*#+',)#*8%1"55"A,).%#&$%*'3$%"+.'),F'#,")%'*%#&'#%,)%>,.6%

<?0@6%!"#$%#&'#=%

!"#$%&*0"1& and *0"1& are disjoint atomic concepts

!"#$%&'(%% and '(%% are disjoint atomic concepts

*%(+&,-. is an atomic role modeling a binary relationship from '(%% to

*0"1&

)"#*%(+&,-. is an atomic role modeling a binary relationship from !"#$%&3 '(%% to !"#$%&*0"1&

the constraints of the schema imply that *%(+&,-. and )"#*%(+&,-. are disjoint roles, by the disjunction-transfer rule introduced at the end of Section 2.1 (see also Example 3(b)).

duration number

location

*+,-(./#0-%BC%2,'.+'3%"1%#&$%D&")$E"34')-<%*(&$3'%?A,#&"/#%('+2,)'5,#,$*@6%

{disjoint}

%14)#&5% %*0"1&2

%14)#&5% %675$18%2

%,45(7$"1% %'(%%2

%,45(7$"1% %675$18%2

%%"+(7$"1% %!"#$%&'(%%2

%%"+(7$"1% %675$18%2

%/%(+&,-.% %'(%%2

%/%(+&,-.% %*0"1&2

%)"#*%(+&,-.% %!"#$%&'(%%%

%)"#*%(+&,-.% %!"#$%&*0"1&%

*0"1&% %? !9%14)#&5@%

'(%%% %? !9%,45(7$"1@%

!"#$%&'(%%% %? !9%%"+(7$"1@%

'(%%% %? !9%/%(+&,-.@%

!"#$%&'(%%% %? !9%)"#*%(+&,-.@%

:$;&,*0"1&%%

%%%%%%% %*0"1&%

!"#$%&*0"1&%%

%%%%%%% %*0"1&%

!"#$%&*0"1&%%

(((((1(:$;&,*0"1&2

!"#$%&'(%%%%

%%%%%%% %'(%%2 )"#*%(+&,-.%%

%%%%%%% %/%(+&,-.%

*+,-(./20-%>"+3'5%2$1,),#,")%"1%#&$%(")*#+',)#*%"1%#&$%D&")$E"34')-<%*(&$3'62

%

Call

MobileCall

placedBy

mobPlacedBy

Phone

FixedPhone MobilePhone

Fig. 1.1 ER diagram of the PhoneCompany1 schema (without cardinalities).

Example 1:Figure 1.1 shows the ER diagram of the PhoneCompany1 schema.

Figure 1.2 formalizes the constraints: the first column shows the domain and range constraints; the second column depicts the cardinality constraints; and the third column contains the subset and disjointness constraints.

The first column of Figure1.2 indicates that:

(23)

1 On the Problem of Matching Database Schemas 23 )"#*%(+&,-. is a subset of /%(+&,-.

!"#$%#&'#%#&$%(")*#+',)#*%*'-,).%#&'#%!"#$%&*0"1&%,*%'%*/0*$#%"1%*0"1&%')2%#&'#2

!"#$%&'(%%%,*%'%*/0*$#%"1%'(%%%2"%)"#%,345-%#&'#%)"#*%(+&,-.%,*%'%*/0*$#%"1%/%(3 +&,-.6% 7)% .$)$+'58% (")($4#% ,)(5/*,")*% 2"% )"#% ,345-% +"5$% ,)(5/*,")*8% '*% '5+$'2-% 2,*9 (/**$2%'#%#&$%$)2%"1%:$(#,")%;6<6%% %

!"#$%&'( )=% >,.6% ;?'@% *&"A*% #&$% BC% 2,'.+'3% "1% #&$% D&")$E"34')-;% *(&$3'8% ')2%

>,./+$%;?0@%1"+3'5,F$*%#&$%(")*#+',)#*8%1"55"A,).%#&$%*'3$%"+.'),F'#,")%'*%#&'#%,)%>,.6%

<?0@6%!"#$%#&'#=%

!"#$%&*0"1& and *0"1& are disjoint atomic concepts

!"#$%&'(%% and '(%% are disjoint atomic concepts

*%(+&,-. is an atomic role modeling a binary relationship from '(%% to

*0"1&

)"#*%(+&,-. is an atomic role modeling a binary relationship from !"#$%&3 '(%% to !"#$%&*0"1&

the constraints of the schema imply that *%(+&,-. and )"#*%(+&,-. are disjoint roles, by the disjunction-transfer rule introduced at the end of Section 2.1 (see also Example 3(b)).

duration number

location

*+,-(./#0-%BC%2,'.+'3%"1%#&$%D&")$E"34')-<%*(&$3'%?A,#&"/#%('+2,)'5,#,$*@6%

{disjoint}

%14)#&5% %*0"1&2

%14)#&5% %675$18%2

%,45(7$"1% %'(%%2

%,45(7$"1% %675$18%2

%%"+(7$"1% %!"#$%&'(%%2

%%"+(7$"1% %675$18%2

%/%(+&,-.% %'(%%2

%/%(+&,-.% %*0"1&2

%)"#*%(+&,-.% %!"#$%&'(%%%

%)"#*%(+&,-.% %!"#$%&*0"1&%

*0"1&% %? !9%14)#&5@%

'(%%% %? !9%,45(7$"1@%

!"#$%&'(%%% %?!9%%"+(7$"1@%

'(%%% %? !9%/%(+&,-.@%

!"#$%&'(%%% %?!9%)"#*%(+&,-.@%

:$;&,*0"1&%%

%%%%%%% %*0"1&%

!"#$%&*0"1&%%

%%%%%%% %*0"1&%

!"#$%&*0"1&%%

(((((1(:$;&,*0"1&2

!"#$%&'(%%%%

%%%%%%% %'(%%2 )"#*%(+&,-.%%

%%%%%%% %/%(+&,-.%

*+,-(./20-%>"+3'5%2$1,),#,")%"1%#&$%(")*#+',)#*%"1%#&$%D&")$E"34')-<%*(&$3'62

%

Call

MobileCall

placedBy

mobPlacedBy

Phone

FixedPhone MobilePhone

Fig. 1.2 Formal definition of the constraints of the PhoneCompany1 schema.

• numberis an atomic role modeling an attribute ofPhonewith rangeString

• durationis an atomic role modeling an attribute ofCallwith rangeString

• locationis an atomic role modeling an attribute ofCallwith rangeString

• placedBy is an atomic role modeling a binary relationship from Call to Phone

• mobPlacedByis an atomic role modeling a binary relationship fromMobile- Call toMobilePhone

The second column of Figure 1.2 shows the cardinalities of the PhoneCompany1 schema:

• numberhas maxCardinality and minCardinality both equal to 1 w.r.t.Phone

• durationhas maxCardinality and minCardinality both equal to 1 w.r.t.Call

• locationhas maxCardinality and minCardinality both equal to 1 w.r.t.Mo- bileCall

• placedByhas maxCardinality and minCardinality both equal to 1 w.r.t.Call

• (placedBy has unbounded maxCardinality and minCardinality equal to 0 w.r.t.Phone, which need not be explicitly declared)

• mobPlacedByhas maxCardinality and minCardinality both equal to 1 w.r.t.

MobileCall

• (mobPlacedBy has unbounded maxCardinality and minCardinality equal to 0 w.r.t.MobilePhone, which need not be explicitly declared)

The third column of Figure 1.2 indicates that

• MobilePhoneandFixedPhoneare subsets ofPhone

• MobilePhoneandFixedPhoneare disjoint

• MobileCallis a subset ofCall

• mobPlacedByis a subset ofplacedBy

Note that the constraints saying thatMobilePhoneis a subset ofPhoneand that MobileCall is a subset of Call do not imply that mobPlacedByis a subset ofplacedBy. In general, concept inclusions do not imply role inclusions, as already discussed at the end of Section 1.2.1.