A connection-based reasoner for ALC ontologies


DIMAS LUIZ DIOGO DE MELO FILHO

A CONNECTION-BASED REASONER FOR ALC ONTOLOGIES

Federal University of Pernambuco posgraduacao@cin.ufpe.br www.cin.ufpe.br/~posgraduacao

RECIFE 2015


DIMAS LUIZ DIOGO DE MELO FILHO

A CONNECTION-BASED REASONER FOR ALC ONTOLOGIES

A M.Sc. Dissertation presented to the Center for Informatics of Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Supervisor: Fred Freitas Co-Supervisor: Jens Otten

RECIFE 2015


Cataloging-in-Publication

Librarian Jefferson Luiz Alves Nazareno, CRB 4-1758

M568c Melo Filho, Dimas Luiz Diogo.

A connection-based reasoner for ALC ontologies / Dimas Luiz Diogo Melo Filho. – 2015.

74 leaves: fig., tab.

Supervisor: Frederico Luiz Gonçalves de Freitas.

Dissertation (Master's) – Federal University of Pernambuco. CIn. Computer Science, Recife, 2015.

Includes references and appendices.

1. Artificial intelligence. 2. Ontologies. 3. Description logics. I. Freitas, Frederico Luiz Gonçalves de (Supervisor). II. Title.


DIMAS LUIZ DIOGO DE MELO FILHO

A CONNECTION-BASED REASONER FOR ALC ONTOLOGIES

Dissertation presented to the Graduate Program in Computer Science of the Center for Informatics of the Federal University of Pernambuco, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Approved on: 14 September 2015.

EXAMINATION COMMITTEE

Prof. Dr. Frederico Luiz Gonçalves de Freitas (Supervisor)

Federal University of Pernambuco

Prof. Dr. Ruy José Guerra Barreto de Queiroz (Internal Examiner)

Federal University of Pernambuco

Prof. Dr. Ivan José Varzinczak (External Examiner)


Acknowledgements

I would like to express my appreciation and gratitude to my advisor, Professor Frederico de Freitas, who taught me and guided me in the right directions. I would also like to thank Dr. Jens Otten for expounding numerous aspects and intricacies of mathematics and logic. Thanks to Adriano Melo for the discussions and suggestions, which proved very useful throughout the research.

Words cannot express how grateful I am to my family, and especially to my beloved wife Renata, who inspires me and gives me her unconditional support through the journey of life.


Wir müssen wissen. Wir werden wissen! (We must know. We will know!)


Resumo

The Connection Method is a proof procedure based on the search for connections between complementary literals in different clauses of a matrix. Connection-based approaches have been used as the basis for building automated reasoners for several logics, such as modal logic, intuitionistic logic and first-order logic. In this context, this work presents Raccoon, a multi-platform automated reasoner for Description Logics, in particular for the ALC Description Language. The reasoner is based on the CM-ALC approach, an adaptation of the Connection Method for ALC. The reasoner implements normalization, uses additional optimization techniques and supports reasoning over OWL 2 ontologies. Experiments were conducted to evaluate the performance of the reasoner, comparing it with state-of-the-art reasoners on the dataset of the 2014 OWL Reasoner Evaluation Workshop. The results indicate that the reasoner performs competitively for ALC ontologies.

Keywords: Ontologies. Automated Reasoning. Connection Method. Connection Calculus. Description Logics. ALC.


Abstract

The Connection Method is a goal-oriented proof procedure based on the search for connections between complementary literals in different clauses of a matrix. Connection-based approaches have been successfully used as a foundation for automated reasoners for various logics, such as modal logic, intuitionistic logic and first-order logic. In this context, this work presents Raccoon, a multi-platform automated reasoner for Description Logics, particularly for the ALC Description Language. The reasoner is based on the CM-ALC approach, a variant of the Connection Method for ALC. The reasoner implements a type of normalization for DL, extends the method with different optimization techniques, and supports parsing and reasoning over OWL 2 ontologies. Experiments were carried out to evaluate the performance of the reasoner, comparing it with state-of-the-art reasoners on the dataset from the 2014 OWL Reasoner Evaluation Workshop. The experiments show that the reasoner performs competitively for ALC ontologies.

Keywords: Automatic Reasoning. Ontologies. Connection Method. Connection Calculus. Description Logics. ALC.


List of Figures

2.1 A DNF formula represented as a matrix.

2.2 All four possible paths through an example matrix.

2.3 Examples of partial paths for a matrix.

2.4 Example of connection

2.5 A connection in a path makes the path a tautology.

2.6 A non-complementary matrix.

2.7 A complementary matrix.

2.8 Connection Method proof as a tree

2.9 Axioms of the Connection Calculus

2.10 A proof in Connection Calculus

3.1 LDNF clauses in matricial form.

3.2 Clauses in matricial form, as handled by the reasoner.

3.3 Skolemized clauses in matricial form, as handled by the reasoner.

3.4 Rules of the Connection Calculus

3.5 Full example of normalization and reasoning in the Connection Method for the ALC description language (CM-ALC).

3.6 Ontology in OWL functional-style syntax.

3.7 DL representation of the OWL ontology.

3.8 LDNF representation of the OWL ontology.

3.9 Proof that the OWL ontology is inconsistent.

3.10 Raccoon internal representation in a UML class diagram.

3.11 Raccoon front end for OWL 2 functional-style syntax in a UML class diagram.

3.12 Raccoon back end for CM-ALC in a UML class diagram.

3.13 Pseudocode of the Raccoon reasoning.

4.1 CM-ALC vs CM-ALCp reasoning time per number of axioms.

4.2 CM-ALC vs Konclude reasoning time per number of axioms.

4.3 CM-ALC vs ELK reasoning time per number of axioms.

List of Tables

2.1 Semantics of Attributive Language (AL) expressions

2.2 Semantics of the Language Extensions

2.3 Semantics for the new expressions in EL++

2.4 ALC subset of OWL 2, supported by Raccoon

3.1 Rules for converting DL statements to DNF.

3.2 Example of normalization of a DL expression.

4.1 Reasoner versions and command lines used for consistency checking.

4.2 Comparison of the number of ontologies in which CM-ALC and CM-ALCp performed better.

4.3 Comparison of the number of ontologies in which CM-ALC and Konclude performed better.

4.4 Comparison of the number of ontologies in which CM-ALC and ELK performed better.

4.5 Comparison of the number of ontologies in which CM-ALC and FaCT++ performed better.

4.6 Comparison of the number of ontologies in which CM-ALC and Hermit performed better.

4.7 Overall comparison of the number of ontologies in which each reasoner performed better.

4.8 Number of AL ontologies reasoned in different time frames.

4.9 Number of Attributive Language with full Existential quantification (ALE) ontologies reasoned in different time frames.

4.10 Number of Attributive Language with general Complements (ALC) ontologies reasoned in different time frames.

5.1 Comparison of the Axioms used by the OWL profiles

List of Acronyms

AL Attributive Language

ALC Attributive Language with general Complements

ALE Attributive Language with full Existential quantification

API Application Programming Interface

ATP Automatic Theorem Proving

BNF Backus-Naur Form

CM Connection Method

CM-ALC Connection Method for the ALC description language

DL Description Logics

DNF Disjunctive Normal Form

FOL First-Order Logic

IRI Internationalized Resource Identifier

KB Knowledge Base

KR Knowledge Representation

LDNF Limited Disjunctive Normal Form

lhs Left-Hand-Side

OWL Web Ontology Language

rhs Right-Hand-Side

W3C World Wide Web Consortium

Contents

1 Introduction

1.1 Aims and Scope

1.2 Outline of the Dissertation

2 Theoretical Foundations

2.1 Language and Notation

2.1.1 Basic Notation

2.1.2 Language Syntax

2.1.3 Language Semantics

2.2 Connection Method

2.2.1 Connection Calculus

2.3 Description Logics

2.3.1 Concept Constructors

2.3.2 Knowledge Bases

2.3.2.1 TBox

2.3.2.2 ABox

2.3.3 Description Languages

2.3.3.1 AL

2.3.3.2 AL Extensions

2.3.3.3 ALC

2.3.3.4 EL

2.3.3.5 EL++

2.3.3.6 Other Languages

2.4 Web Ontology Language

2.4.1 OWL 2 Syntax

2.4.1.1 Individuals

2.4.1.2 Classes

2.4.1.3 Object Properties

2.4.1.4 Class Expressions

2.4.1.6 Object Property Axioms

2.4.1.7 Keys

2.4.2 Profiles

2.5 Related Work

2.5.1 Konclude

2.5.2 ELK

2.5.3 FaCT++

2.5.4 Hermit

3 Method and Development

3.1 CM-ALC

3.1.1 ALC Disjunctive Normal Form

3.1.2 Normalization of the Knowledge Base

3.1.3 Reasoning

3.1.3.1 Search Algorithm

3.1.4 Optimization and Reduction

3.1.5 Termination, Soundness and Completeness

3.2 Raccoon

3.2.1 Architecture

3.2.2 Reasoning Algorithm

3.2.3 pgen

4 Experimental Results

5 Conclusion

References

Appendix A: Comparison of the Main OWL Profiles

1 Introduction

Over the last two decades, there have been efforts to define standards for representing knowledge, driven by an emerging trend at the time: the Semantic Web (BERNERS-LEE et al., 2001). One of the resulting standards, the Web Ontology Language (OWL), is a semantic markup language for representing ontologies. It replaced older technologies that served the same purpose, such as DAML+OIL (BECHHOFER, 2009; MCGUINNESS; HARMELEN, 2004).

Ontologies define a common set of terms and rules that are used to describe and represent an area of knowledge. Ontologies are often expressed in terms of a logic-based language, to provide accurate, meaningful, detailed, consistent and sound assertions regarding the concepts and rules of the domain (GRUBER, 2009).

OWL is based on Description Logics (DL) (BECHHOFER, 2009), a set of formalisms for Knowledge Representation (KR) (BAADER; NUTT, 2010). Description Logics comprise a family of logic-based Description Languages, each providing a distinct set of constructors and varying in expressiveness (BAADER; NUTT, 2010).

Much of the value of DL, OWL, and other Knowledge Representation formalisms lies in the prospect of using Automatic Theorem Proving (ATP) to assist users by providing automatic inference, classification, consistency checking, and other related services.

There are numerous excellent Automated Theorem Provers for OWL, such as Konclude (STEIGMILLER; THORSTEN; GLIMM, 2014), Hermit (MOTIK; SHEARER; HORROCKS, 2009), FaCT++ (TSARKOV; HORROCKS, 2006), Pellet (SIRIN et al., 2007), RacerPro (HAARSLEV et al., 2012) and ELK (KAZAKOV; KLINOV, 2015). However, most of them are specialized variations of the Tableau proof procedure (HAARSLEV et al., 2012; MOTIK; SHEARER; HORROCKS, 2009; SIRIN et al., 2007; STEIGMILLER; THORSTEN; GLIMM, 2014), and none is based on the Connection Method (BIBEL, 1987). The Connection Method (CM) is a goal-oriented proof procedure driven by connections, i.e., pairs of complementary literals taken from two different clauses. A proof in CM consists of connecting all the literals of a start clause and, recursively, all the literals of each connected clause (BIBEL, 1987).

Successful implementations of variations of the Connection Method have been presented for first-order logic (OTTEN; BIBEL, 2003), intuitionistic logic (OTTEN, 2008), and modal logic (OTTEN, 2014). An approach based on the Connection Method for reasoning over a subset of Description Logics was presented in (FREITAS, 2011) and proven sound and complete. However, to date, no reasoner implements this approach or any other connection-based method for Description Logics.

1.1 Aims and Scope

Research Question: Is the Connection Method a feasible and practical approach for reasoning with Description Logics?

Aim of the Research: The aim of the research is to develop a specialized reasoner for OWL using the Connection Method approach, based on the research of (FREITAS, 2011), (OTTEN, 2010) and (LETZ; MAYR; GOLLER, 1994), and to compare its performance with state-of-the-art OWL reasoners.

Objectives: The objectives of this research are the following:

to develop a parser generator;

to develop a specialized reasoner for a subset of OWL using a Connection Method approach;

to implement optimizations for the reasoner;

to evaluate the performance of the reasoner;

to compare the reasoner with state-of-the-art reasoners on a standard dataset.

Out of Scope: The reasoner implemented in this work will not support full OWL DL; this work focuses on a specific subset of DL.

Statement of the Contributions: The main contributions of this work are:

A detailed description of the Connection Method for the ALC description language (CM-ALC), a specialized version of the Connection Method for a subset of DL.

An implementation of a variation of CM-ALC.

The development of a parser generator for C.

A parser for OWL 2, encoded in YAML.

An evaluation of the performance of CM-ALC compared with other state-of-the-art DL reasoners.


1.2 Outline of the Dissertation

This work is structured as follows. Chapter 2 establishes the theoretical foundations, presenting an overview of the Connection Method, Description Logics and OWL. Chapter 3 describes the Connection Method approach for Description Logics, as well as the framework and the implementation of the reasoner. Chapter 4 details the evaluation of the reasoner, including the experimental results. Finally, Chapter 5 presents the conclusions of the research and future work. Additional information regarding the OWL profiles can be found in Appendix A.

2 Theoretical Foundations

This chapter defines and provides examples of the Connection Method, Description Logics and the Web Ontology Language. It also establishes a standard notation for First-Order Logic and Description Logics, used throughout the rest of the text.

2.1 Language and Notation

Although natural languages evolved to serve the practical purposes of easing human communication of ideas and thoughts, they lack the precision and soundness necessary for logical representation, analysis and reasoning, as stated by (CHURCH, 1996). Hence, there is a need to formalize a standard notation and define a standard language to be used throughout the text.

2.1.1 Basic Notation

Definition 2.1.1 (Variable). A variable is a symbol which can assume various values (CHURCH, 1996). This work uses the lowercase symbols x, y, z, x1, y1, z1, x2, y2, z2, ... to represent variables.

Definition 2.1.2 (Function). A function is an operation which, when applied to a certain argument, always yields a specific value (CHURCH, 1996). This work uses the symbols f0, f1, f2, ... to represent functions.

A similar definition of function, from set theory, is given by (HRBACEK; JECH, 1999): a function is a binary relation F such that, for all a, b1 and b2, aFb1 and aFb2 imply b1 = b2.

Definition 2.1.3 (Function Arity). Each function f has a positive integer n associated with it, representing the number of arguments of the function, its arity, such that f is an n-place function symbol.

Definition 2.1.4 (Function Domain). The domain of a function f is the set dom(f) = {x | there exists y such that (x, y) ∈ f} (HRBACEK; JECH, 1999).

Definition 2.1.5 (Function Range). The range of a function f is the set ran(f) = {y | there exists x such that (x, y) ∈ f} (HRBACEK; JECH, 1999).
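As a small illustration of Definitions 2.1.4 and 2.1.5, together with the set-theoretic reading of functions given above, the following Python sketch represents a finite function extensionally as a set of (argument, value) pairs. The names `is_function`, `dom` and `ran` and the example relation are hypothetical, chosen only for illustration; a multi-argument function would use tuples as its arguments.

```python
# Illustrative sketch: a finite function as a set of (argument, value) pairs.

def is_function(relation):
    """True iff aFb1 and aFb2 imply b1 = b2 for every pair in the relation."""
    value_of = {}
    for a, b in relation:
        if a in value_of and value_of[a] != b:
            return False  # the same argument is mapped to two different values
        value_of[a] = b
    return True

def dom(f):
    """dom(f) = {x | there exists y such that (x, y) in f}"""
    return {x for x, _ in f}

def ran(f):
    """ran(f) = {y | there exists x such that (x, y) in f}"""
    return {y for _, y in f}

# Hypothetical example: squaring restricted to four arguments.
square = {(1, 1), (2, 4), (-2, 4), (3, 9)}
print(is_function(square))             # True
print(is_function({(1, 2), (1, 3)}))   # False: 1 is mapped to both 2 and 3
print(sorted(dom(square)))             # [-2, 1, 2, 3]
print(sorted(ran(square)))             # [1, 4, 9]
```

Note that two arguments may share a value (2 and -2 both map to 4); only one argument with two values violates the definition.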

Definition 2.1.6 (Constant). A constant represents a definite object, a proper name (CHURCH, 1996; HILBERT; ACKERMANN; LUCE, 1950). This research uses the symbols a, b, c, a1, b1, c1, a2, b2, c2, ... to denote constants.

Definition 2.1.7 (Truth-Value). There are two abstract objects called truth values, one of them being truth and the other falsehood (CHURCH, 1996). They are represented by T and F, respectively, as in (FITTING, 1996).

Definition 2.1.8 (Predicate). Predicates are functions that represent properties of objects. An assertion regarding a property of an object may represent truth or falsehood depending on the object (HILBERT; ACKERMANN; LUCE, 1950, p. 56-57). This work relies on the symbols P, Q, R, P1, Q1, R1, P2, Q2, R2, ... to represent predicates.

Definition 2.1.9 (Predicate Arity). Each predicate P has a positive integer n associated with it, representing the number of arguments of the predicate, its arity, such that P is an n-place predicate symbol (FITTING, 1996).

Definition 2.1.10 (Connectives). The symbol ¬ denotes the logical negation, ∧ denotes conjunction, ∨ represents a disjunction, → denotes an implication and ≡ the equivalence.

¬ is a unary connective, and ∧, ∨, →, ≡ are binary connectives.

Definition 2.1.11 (Quantifiers). The symbols ∀ and ∃ denote the universal quantifier and existential quantifier, respectively.

The syntax and semantics of the language are presented in the following subsection.

2.1.2 Language Syntax

The language used by this work is defined inductively, as follows. Let L denote the first-order language being described.

Definition 2.1.12 (Terms). A term of the language L is defined as follows (CHANG; KEISLER, 1990):

(1) All variables x, y, z, x1, y1, z1, x2, y2, z2, ... are terms.

(2) All constants a, b, c, a1, b1, c1, a2, b2, c2, ... are terms.

(3) If f is a function and t0, t1, ..., tk are terms, then f(t0, t1, ..., tk) is a term.

(4) A string of symbols is a term if and only if it can be shown to be a term by a finite number of applications of (1)-(3).

Definition 2.1.13 (Atoms). The set A of atoms, or atomic formulae, of the language L comprises all expressions of the form P(t0, t1, ..., tk), where P is a predicate symbol (P, Q, R, P1, Q1, R1, P2, Q2, R2, ...) and t0, t1, ..., tk are terms.

Considering the set of atoms established by Definition 2.1.13 it is possible to define the set of well-formed formulae of the language as follows.

Definition 2.1.14 (Well-Formed Formulae). The set of well-formed formulae of the language L is defined as follows:

(1) if ϕ is an atomic formula, then ϕ is a well-formed formula;

(2) if ϕ is a well-formed formula, then ¬(ϕ) is a well-formed formula;

(3) if ϕ and ψ are well-formed formulae and ◦ is a binary connective, then (ϕ ◦ ψ) is a well-formed formula;

(4) if ϕ is a well-formed formula and x is a variable, then ∀x(ϕ) and ∃x(ϕ) are well-formed formulae;

(5) a finite sequence of symbols is a well-formed formula if and only if it can be shown to be a well-formed formula by a finite number of applications of (1)-(4).

For convenience, parentheses may be omitted when opportune, without changing the meaning of the expression. The connectives are listed here from the loosest to the tightest binding: ≡, →, ∨, ∧, ¬. That is, when parentheses are omitted, the equivalence is the main connective and is split first, while the negation binds most tightly and applies only to the formula immediately following it. For instance, ((((¬(P1) ∧ Q1) ∨ R1) → P2) ≡ Q2) is the same as ¬P1 ∧ Q1 ∨ R1 → P2 ≡ Q2.
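To make the precedence convention concrete, the following Python sketch restores the omitted parentheses of a propositional formula with a small precedence-climbing parser. It rests on two assumptions that are not part of the text: the connectives are typed as ASCII stand-ins (`==` for ≡, `->` for →, `|` for ∨, `&` for ∧, `~` for ¬), and chains of the same binary connective are grouped to the left, since the associativity is not specified above.

```python
import re

# Binary connectives ordered from loosest to tightest binding (ASCII stand-ins).
# Negation "~" binds tighter than all of them.
BINARY_LEVELS = ["==", "->", "|", "&"]

def tokenize(text):
    return re.findall(r"==|->|[~&|()]|\w+", text)

def parse_level(tokens, level):
    """Parse connectives at `level` or tighter; equal connectives group to the left."""
    if level == len(BINARY_LEVELS):
        return parse_unary(tokens)
    left = parse_level(tokens, level + 1)
    while tokens and tokens[0] == BINARY_LEVELS[level]:
        op = tokens.pop(0)
        right = parse_level(tokens, level + 1)
        left = f"({left} {op} {right})"
    return left

def parse_unary(tokens):
    tok = tokens.pop(0)
    if tok == "~":
        return "~" + parse_unary(tokens)     # negation applies to the next unit only
    if tok == "(":
        inner = parse_level(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing parenthesis")
        return inner
    return tok  # an atom such as P1

def fully_parenthesize(text):
    tokens = tokenize(text)
    result = parse_level(tokens, 0)
    if tokens:
        raise ValueError(f"unexpected tokens: {tokens}")
    return result

print(fully_parenthesize("~P1 & Q1 | R1 -> P2 == Q2"))
# ((((~P1 & Q1) | R1) -> P2) == Q2)
```

The output has the same structure as the fully parenthesized formula in the text, with ¬(P1) printed as ~P1.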

2.1.3 Language Semantics

In order to define the meaning of a first-order language, this work resorts to models and interpretations as in (CHANG; KEISLER, 1990; MENDELSON, 2015; SMULLYAN, 1995).

Definition 2.1.15 (Model and Interpretation). According to (CHANG; KEISLER, 1990), a model A for L is a pair $A = \langle \Delta, I \rangle$, where:

(1) $\Delta$ is a nonempty set, the universe of A;

(2) I denotes an interpretation function mapping the symbols of L to appropriate relations, functions and constants in $\Delta$;

(3) each n-place predicate P corresponds to an n-place relation $P^I \subseteq \Delta^n$ on $\Delta$;

(4) each m-place function f corresponds to an m-place function $f^I: \Delta^m \to \Delta$ on $\Delta$;

(5) each constant symbol c corresponds to a constant $c^I \in \Delta$.

Thus, considering the concept of interpretations, the meaning of first-order logic formulae may be defined in terms of satisfiability as follows.

Definition 2.1.16 (Satisfiability). The satisfiability of a well-formed first-order formula F is defined inductively as follows (CHANG; KEISLER, 1990; SMULLYAN, 1995).

(1) An atomic formula (an n-place predicate) P(t1, ..., tn) is satisfiable under model A if and only if the interpretation I maps its terms t1, ..., tn to elements $c_1^I, \ldots, c_n^I \in \Delta$ such that the relation $P^I$ contains the n-tuple $\langle c_1^I, \ldots, c_n^I \rangle$.

(2) ¬ϕ is satisfiable under A if and only if ϕ is not satisfiable under model A.

(3) ϕ ∧ ψ is satisfiable under A if and only if both ϕ and ψ are satisfiable under model A.

(4) ϕ ∨ ψ is satisfiable under A if and only if at least one of ϕ and ψ is satisfiable under A.

(5) ϕ → ψ is satisfiable under A if and only if ϕ is not satisfiable or ψ is satisfiable under A.

(6) ϕ ≡ ψ is satisfiable under A if and only if ϕ and ψ are either both satisfiable or both not satisfiable under A.

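(7) ∀x(ϕ) is satisfiable under A if and only if $\varphi[x/k]$, the formula obtained from ϕ by replacing the free occurrences of x with k, is satisfiable for every $k \in \Delta$.

(8) ∃x(ϕ) is satisfiable under A if and only if $\varphi[x/k]$ is satisfiable for at least one $k \in \Delta$.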

Definition 2.1.17 (Validity). A well-formed first-order formula is valid if and only if it is satisfiable under every possible interpretation in every universe (SMULLYAN, 1995).
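Definitions 2.1.15 and 2.1.16 translate almost directly into a recursive evaluator over a finite model. The Python sketch below is an illustration under stated assumptions: formulas are encoded as nested tuples (a hypothetical encoding), function symbols are omitted for brevity, and instead of substituting elements into ϕ as in items (7) and (8), it extends a variable assignment `env`. Validity (Definition 2.1.17) quantifies over every interpretation, so it cannot be established by evaluating a single finite model like this one.

```python
# Hypothetical tuple encoding of formulas (function symbols omitted for brevity):
#   ("atom", "P", ("x", "a")), ("not", f), ("and", f, g), ("or", f, g),
#   ("imp", f, g), ("eqv", f, g), ("forall", "x", f), ("exists", "x", f)
# A model is a pair (delta, I): delta is the universe; I maps constant symbols to
# elements of delta and predicate symbols to sets of tuples over delta.

def term_value(term, I, env):
    """Variables are looked up in the assignment env, constants in I."""
    return env[term] if term in env else I[term]

def holds(f, model, env=None):
    """True iff formula f is satisfied under the model and the variable assignment env."""
    delta, I = model
    env = {} if env is None else env
    tag = f[0]
    if tag == "atom":
        _, pred, args = f
        return tuple(term_value(t, I, env) for t in args) in I[pred]
    if tag == "not":
        return not holds(f[1], model, env)
    if tag == "and":
        return holds(f[1], model, env) and holds(f[2], model, env)
    if tag == "or":
        return holds(f[1], model, env) or holds(f[2], model, env)
    if tag == "imp":
        return not holds(f[1], model, env) or holds(f[2], model, env)
    if tag == "eqv":
        return holds(f[1], model, env) == holds(f[2], model, env)
    if tag in ("forall", "exists"):
        _, var, body = f
        # Items (7) and (8): bind the variable to each element k of the universe.
        results = (holds(body, model, {**env, var: k}) for k in delta)
        return all(results) if tag == "forall" else any(results)
    raise ValueError(f"unknown connective: {tag!r}")

delta = {"john", "michael"}
I = {"john": "john", "michael": "michael",
     "Person": {("john",), ("michael",)},
     "hasChild": {("john", "michael")}}
model = (delta, I)

everyone_is_a_person = ("forall", "x", ("atom", "Person", ("x",)))
everyone_has_a_child = ("forall", "x", ("exists", "y", ("atom", "hasChild", ("x", "y"))))
print(holds(everyone_is_a_person, model))   # True
print(holds(everyone_has_a_child, model))   # False: michael has no child
```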

2.2 Connection Method

The Connection Method (CM) is a proof method developed simultaneously and independently by Bibel (BIBEL, 1981) and Andrews (ANDREWS, 1981). The method is goal-oriented: it traverses the clauses of the formula, closing every path with a connection between complementary literals. Bibel's procedure requires the formulae to be in Disjunctive Normal Form (DNF)¹, represented in matrix form (BIBEL, 1987).

Definition 2.2.1 (Disjunctive Normal Form). A formula in DNF is a set of conjunctive clauses joined by disjunctions, taking the form $C_0 \lor C_1 \lor \dots \lor C_n$, where each $C_i$ is a conjunctive clause of the form $L_{i,1} \land L_{i,2} \land \dots \land L_{i,m}$ and the $L_{i,j} \in C_i$ are literals (HILBERT; ACKERMANN; LUCE, 1950).

The DNF formula is usually represented in matrix form for the Connection Method, with each clause written as a column (BIBEL, 1987). Figure 2.1 shows an example of a DNF formula represented as a matrix.

DNF formula: (P ∧ Q) ∨ (¬Q ∧ R(x0)) ∨ (¬R(x1))

Matrix:

$$\begin{bmatrix} P & \neg Q & \neg R(x_1) \\ Q & R(x_0) & \end{bmatrix}$$

Figure 2.1: A DNF formula represented as a matrix.

Definition 2.2.2 (Path). A path is a sequence of literals obtained by visiting exactly one literal of each clause while traversing the matrix of the formula horizontally (BIBEL, 1987).

Figure 2.2 shows all four possible paths through the matrix of Figure 2.1.

Figure 2.2: All four possible paths through an example matrix.
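The paths of Figure 2.2 can also be enumerated mechanically: since a path chooses exactly one literal from each clause, the paths of a matrix are the Cartesian product of its clauses. In the Python sketch below, encoding literals as strings with `~` for negation is an assumption made for illustration; the number of paths is the product of the clause sizes, here 2 · 2 · 1 = 4.

```python
from itertools import product

# Matrix of Figure 2.1: each inner list is one clause (a column of the matrix).
# Literals are plain strings and "~" stands for negation (an assumption of this sketch).
matrix = [["P", "Q"], ["~Q", "R(x0)"], ["~R(x1)"]]

def paths(matrix):
    """All paths: exactly one literal chosen from each clause (Definition 2.2.2)."""
    return [list(choice) for choice in product(*matrix)]

for path in paths(matrix):
    print(path)
# ['P', '~Q', '~R(x1)']
# ['P', 'R(x0)', '~R(x1)']
# ['Q', '~Q', '~R(x1)']
# ['Q', 'R(x0)', '~R(x1)']
```

Because the number of paths grows multiplicatively with the number of clauses, proof procedures avoid enumerating them explicitly.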

Definition 2.2.3 (Partial-path). A partial path in a formula F is any subset of a path through F (BIBEL, 1987).

Figure 2.3 shows examples of partial paths for the matrix of the example in Figure 2.1.

¹ The reader may refer to (HILBERT; ACKERMANN; LUCE, 1950) for a detailed description of how …

Figure 2.3: Examples of partial paths for a matrix.

A connection may be defined as a set of complementary literals appearing on the proof path (BIBEL, 1983). More formally, a connection is defined by (BIBEL, 1987) as follows.

Definition 2.2.4 (Connection). Let π = L1, ..., Ln be a path of literals. A connection γ is a subset γ ⊆ π of the form γ = {Li, Lj}, where Li and Lj are complementary literals under some substitution σ, i.e., $\sigma(L_i) = \sigma(\overline{L_j})$, where $\overline{L}$ denotes the complement of the literal L.

For instance, considering the formula (P ∧ Q) ∨ (¬Q ∧ R(x0)) ∨ (¬R(x1)), and a path π = {Q, ¬Q, ¬R(x1)}, a connection can be defined as γ = {Q, ¬Q} ⊆ π, as shown in Figure 2.4 below.

Figure 2.4: Example of connection in a matrix.
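Definition 2.2.4 can be sketched in Python as follows. This is an illustrative sketch only: literals are encoded as hypothetical (negated, predicate, arguments) triples, arguments are restricted to variables and constants (no function terms, so no occurs check is needed), and, by assumption, names starting with `x` are variables. For R(x0) and ¬R(x1) the returned substitution maps x0 to x1, corresponding to the substitution σ = {x0/x1} used in the discussion of Figure 2.7.

```python
# Literals are (negated, predicate, args) triples; a name starting with "x" is a
# variable and anything else is a constant (both are assumptions of this sketch).

def is_var(term):
    return term.startswith("x")

def unify_args(args1, args2):
    """Most general unifier of two flat argument tuples, or None if none exists."""
    if len(args1) != len(args2):
        return None
    sigma = {}
    def resolve(t):
        while is_var(t) and t in sigma:
            t = sigma[t]
        return t
    for a, b in zip(args1, args2):
        a, b = resolve(a), resolve(b)
        if a == b:
            continue
        if is_var(a):
            sigma[a] = b
        elif is_var(b):
            sigma[b] = a
        else:
            return None  # two distinct constants clash
    return sigma

def connection(lit1, lit2):
    """Return a substitution making lit1 and lit2 complementary, or None."""
    neg1, pred1, args1 = lit1
    neg2, pred2, args2 = lit2
    if neg1 == neg2 or pred1 != pred2:
        return None
    return unify_args(args1, args2)

def connections_in_path(path):
    """All pairs {Li, Lj} of the path that form a connection, with their substitutions."""
    found = []
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            sigma = connection(path[i], path[j])
            if sigma is not None:          # {} is a valid (empty) substitution
                found.append((path[i], path[j], sigma))
    return found

# The path {Q, ~Q, ~R(x1)} from the example above.
Q, notQ, notR_x1 = (False, "Q", ()), (True, "Q", ()), (True, "R", ("x1",))
print(connections_in_path([Q, notQ, notR_x1]))
# [((False, 'Q', ()), (True, 'Q', ()), {})]
```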

Lemma 2.2.1. If a path contains a connection, then the path, read as the disjunction of its literals, is a tautology.

If a path π contains a connection w, then π contains at least two complementary literals {L, ¬L}. Since the formula is in DNF, every path represents a disjunction. As a disjunction containing a pair of complementary literals is a tautology, it follows that the path π is a tautology. A formal proof can be found in (BIBEL, 1987). Figure 2.5 illustrates this.

Definition 2.2.5 (Spanning Connections, Complementary Matrix). A set of connections W is spanning for a matrix F if and only if for each possible path π in F there is a connection w ∈ W such that w ⊆ π. A matrix containing a spanning set of connections is called a complementary matrix (BIBEL, 1987).


Figure 2.5: A connection in a path makes the path a tautology.

Theorem 2.2.1 (Valid Matrix). A formula represented by a matrix F is valid if and only if the matrix is complementary (BIBEL, 1987).

A model-based proof of Theorem 2.2.1, as well as proofs of soundness and completeness, are presented in (BIBEL, 1987). Considering Theorem 2.2.1, a proof in the Connection Method consists of finding a spanning set of connections W for the matrix F. Figure 2.6 shows an example of a matrix which is not complementary: the first path {P, ¬Q, ¬R(x1)} does not contain any connection.

Figure 2.6: A non-complementary matrix.

Figure 2.7 shows an example of a complementary matrix. The first path contains the connection {P, ¬P}. The second path contains the connection {R(x0), ¬R(x1)} with the substitution σ = {x0/x1}. The third path contains the connection {Q, ¬Q}. The fourth path contains the connection {R(x0), ¬R(x1)}, again with σ = {x0/x1}. Since every path contains a connection, the matrix is complementary.
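The complementarity test of Definition 2.2.5 can be approximated by brute force, as in the Python sketch below. It uses a hypothetical (negated, predicate, arguments) literal encoding, restricts arguments to variables and constants, and assumes that names starting with `x` are variables. Like the walk-through of Figure 2.7, it looks for a connection in each path independently; a complete first-order check would also require a single substitution compatible with all the chosen connections, which this sketch does not enforce. Enumerating every path is exponential in the number of clauses, which motivates the goal-oriented search of the connection calculus.

```python
from itertools import product

# Literals are (negated, predicate, arguments) triples; arguments are variables or
# constants only, and names starting with "x" are variables (assumptions of this sketch).

def resolve(term, sigma):
    while term in sigma:       # only variables are ever bound in sigma
        term = sigma[term]
    return term

def complementary(lit1, lit2):
    """True iff the literals have opposite polarity and their atoms unify."""
    (neg1, pred1, args1), (neg2, pred2, args2) = lit1, lit2
    if neg1 == neg2 or pred1 != pred2 or len(args1) != len(args2):
        return False
    sigma = {}
    for a, b in zip(args1, args2):
        a, b = resolve(a, sigma), resolve(b, sigma)
        if a == b:
            continue
        if a.startswith("x"):
            sigma[a] = b
        elif b.startswith("x"):
            sigma[b] = a
        else:
            return False  # two different constants cannot be unified
    return True

def is_complementary(matrix):
    """True iff every path contains a connection (checked path by path)."""
    for path in product(*matrix):
        if not any(complementary(a, b)
                   for i, a in enumerate(path) for b in path[i + 1:]):
            return False
    return True

P, notP = (False, "P", ()), (True, "P", ())
Q, notQ = (False, "Q", ()), (True, "Q", ())
R_x0, notR_x1 = (False, "R", ("x0",)), (True, "R", ("x1",))

figure_2_1 = [[P, Q], [notQ, R_x0], [notR_x1]]            # matrix of Figure 2.1
figure_2_7 = [[P, Q], [notQ, R_x0], [notR_x1], [notP]]    # formula of Figures 2.7 and 2.10
print(is_complementary(figure_2_1))  # False: path {P, ~Q, ~R(x1)} has no connection
print(is_complementary(figure_2_7))  # True
```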

The matrix and the set of paths can also be represented as a tree. However, unlike analytic tableaux, Bibel's procedure works with DNF, so the rules for constructing the tree work inversely: as shown in Figure 2.8, ∧ splits the path into multiple branches, while ∨ continues the same branch. Each connected branch (path) is a tautology; since all branches are tautologies, the formula itself is a tautology.


Figure 2.7: A complementary matrix.

2.2.1 Connection Calculus

The connection calculus is derived from the Connection Method. It consists of four inference rules, shown in Figure 2.9 (OTTEN, 2010).

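Figure 2.9: Axioms of the Connection Calculus (OTTEN, 2010).

Axiom (A): $$\frac{}{\{\},\ M,\ P}$$

Start (S): $$\frac{C_2,\ M,\ \{\}}{\varepsilon,\ M,\ \varepsilon}$$ where $C_2$ is a copy of $C_1 \in M$.

Reduction (R): $$\frac{C,\ M,\ P \cup \{L_2\}}{C \cup \{L_1\},\ M,\ P \cup \{L_2\}}$$ with $\sigma(L_1) = \sigma(\overline{L_2})$.

Extension (E): $$\frac{C_2 \setminus \{L_2\},\ M,\ P \cup \{L_1\} \qquad C,\ M,\ P}{C \cup \{L_1\},\ M,\ P}$$ where $C_2$ is a copy of $C_1 \in M$, $L_2 \in C_2$, and $\sigma(L_1) = \sigma(\overline{L_2})$.

Here $\overline{L}$ denotes the complement of the literal L.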

Definition 2.2.6 (Connection Calculus). The rules of the connection calculus are given in Figure 2.9. The words of the calculus are tuples (C, M, P), where C is the open subgoal clause, M is the set of clauses (the matrix), and P is the active path, which is a partial path as in Definition 2.2.3. C1 is a clause of M and C2 is a copy of C1, σ is a substitution, and {L1, L2} is a connection with $\sigma(L_1) = \sigma(\overline{L_2})$, where $\overline{L_2}$ is the complement of L2. The rules are applied bottom-up (OTTEN, 2010).

A proof in the connection calculus starts with the application of the start rule, to select the starting clause. It is followed by the repeated application of the reduction or extension rules.

The reduction rule closes a literal of the open subgoal C when a literal complementary to it (under the substitution σ) already occurs in the active path P. In this case, that literal is removed from C and the proof continues.

The extension rule connects a literal L1 of the open subgoal with a literal L2 of a copy C2 of a clause of the matrix, deriving two branches: (i) one with the remaining open subgoal C (the original subgoal minus L1) and the unchanged active path P; and (ii) another whose new open subgoal consists of the literals of C2 minus the connected literal L2, with L1 added to the active path. Both extension and reduction depend on a term substitution σ.

The ultimate goal of the calculus is to derive an axiom in each branch (OTTEN, 2010), which represents a tautology. Figure 2.10 depicts a full proof using the connection calculus. For the example, consider S(n) to be the start rule applied to the n-th clause; E(n) to be the extension rule applied to the n-th clause without substitution; E(n, σ = {...}) to be the extension rule applied to the n-th clause using the term substitution σ; and A to be the axiom rule. An equivalent proof using matrices is shown in Figure 2.7.

Formula: (P ∧ Q) ∨ (¬Q ∧ R(x0)) ∨ ¬R(x1) ∨ ¬P

Clauses (clause number: literals):

1: P, Q

2: ¬Q, R(x0)

3: ¬R(x1)

4: ¬P

Proof: [proof tree not recoverable]

Figure 2.10: A proof in Connection Calculus.

The reader may refer to (BIBEL, 1987; LETZ; MAYR; GOLLER, 1994; OTTEN, 2010) for a proof that the calculus is sound and complete.
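To illustrate the interplay of the start, reduction, extension and axiom rules, the following Python sketch implements a connection prover for ground (propositional) clauses. It is a simplified illustration, not the Raccoon implementation: there are no variables, so no substitutions or clause copies are needed; literals are strings with `~` for negation (an assumption); and it adds a regularity check (no literal may occur twice on the active path), a standard refinement not shown in Figure 2.9, which guarantees termination in the ground case. The first example is the formula of Figure 2.10 with R(x0) and R(x1) read as the ground atom R.

```python
# Ground (propositional) connection prover; literals are strings and "~" marks negation.

def complement(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def prove_clause(clause, matrix, path):
    """Prove the open subgoal `clause` given the active path `path` (the tuple C, M, P)."""
    if not clause:
        return True                        # Axiom: empty open subgoal
    lit, rest = clause[0], clause[1:]
    if lit in path:
        return False                       # regularity: lit is already on the path
    if complement(lit) in path:            # Reduction: the complement is on the path
        return prove_clause(rest, matrix, path)
    for other in matrix:                   # Extension: connect lit to another clause
        if complement(lit) in other:
            remaining = [l for l in other if l != complement(lit)]
            if (prove_clause(remaining, matrix, path | {lit})
                    and prove_clause(rest, matrix, path)):
                return True
    return False

def prove(matrix):
    """Start rule: try each clause as the start clause, with an empty active path."""
    return any(prove_clause(list(c), matrix, frozenset()) for c in matrix)

# Figure 2.10 with R(x0) and R(x1) read as the ground atom R.
print(prove([["P", "Q"], ["~Q", "R"], ["~R"], ["~P"]]))   # True
# Ground version of the matrix of Figure 2.1: not complementary.
print(prove([["P", "Q"], ["~Q", "R"], ["~R"]]))           # False
```

Handling first-order clauses such as R(x0) and ¬R(x1) would additionally require unification and fresh clause copies in the start and extension rules.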

2.3 Description Logics

Description Logics (DL) are a family of formalisms for Knowledge Representation (KR). They evolved as a logic-based approach to address the lack of formal semantics in non-logical approaches such as Semantic Networks (QUILLIAN, 1967) and Frames (MINSKY, 1975). Description Logics are mainly used to represent information and knowledge about an application domain, providing a formal foundation for the development of automated reasoners that assist users in various tasks (BAADER; NUTT, 2010).

Description Logics are based on the notions of concepts, roles, and individuals, the three basic building blocks of Description Logics (BAADER; NUTT, 2010). Concepts, roles and individuals may be defined in terms of a model² $A = \langle \Delta, I \rangle$, where $\Delta$ is the universe set and I is an interpretation function, as follows.

Definition 2.3.1 (Atomic Concept). An atomic concept A defines a class, or subset of objects, of the universe $\Delta$, such that the interpretation of the concept is $A^I \subseteq \Delta$.

² In description logics, the model is usually given as $A = \langle \Delta^I, \cdot^I \rangle$. The notation employed by this work differs slightly in order to align with the notation used in the definition of the semantics of first-order logic.

Definition 2.3.2 (Atomic Role). An atomic role R defines a binary relation between individuals of the universe $\Delta$, such that the interpretation of a role is $R^I \subseteq \Delta^2$.

Definition 2.3.3 (Atomic Symbol). An atomic symbol is an atomic concept or atomic role (BAADER; NUTT, 2010).

Definition 2.3.4 (Individual). An individual a represents an object of the universe $\Delta$, such that $a^I \in \Delta$.

For instance, considering the universe $\Delta$ of living beings, one may state that Person is a concept of $\Delta$; thus, $\mathit{Person}^I \subseteq \Delta$. In order to state that an element of the universe $\mathit{john}^I \in \Delta$ is a person, one may state Person(john). A role hasChild may be used to relate two individuals of the universe, such that $\mathit{hasChild}^I \subseteq \Delta^2$. Using this role, one might state that john has a child called michael, as follows: hasChild(john, michael). The knowledge base in (2.1) suffices to express this knowledge.

Person(john)

Person(michael)

hasChild(john, michael) (2.1)
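To make the model-theoretic reading of (2.1) concrete, the following Python sketch represents a finite interpretation with plain sets and checks whether it satisfies each assertion. The universe, the mapping of individual names and the helper functions (holds_concept_assertion, holds_role_assertion, is_model_of) are hypothetical illustrations, not part of any reasoner described in this work.

```python
# A finite interpretation <Delta, I>, sketched with plain Python sets.
# All names and data here are hypothetical example values.
delta = {"john", "michael", "rex"}

# I maps concept names to subsets of delta and role names to subsets of delta x delta.
interp = {
    "Person": {"john", "michael"},
    "hasChild": {("john", "michael")},
}

# I also maps individual names to elements of delta (here, the identity map).
ind = {"john": "john", "michael": "michael", "rex": "rex"}


def holds_concept_assertion(concept, a):
    """C(a) is satisfied iff a^I is an element of C^I."""
    return ind[a] in interp.get(concept, set())


def holds_role_assertion(role, a, b):
    """R(a, b) is satisfied iff (a^I, b^I) is an element of R^I."""
    return (ind[a], ind[b]) in interp.get(role, set())


def is_model_of(abox):
    """Return True if the interpretation satisfies every assertion in abox."""
    for assertion in abox:
        if len(assertion) == 2:  # concept assertion (C, a)
            ok = holds_concept_assertion(*assertion)
        else:  # role assertion (R, a, b)
            ok = holds_role_assertion(*assertion)
        if not ok:
            return False
    return True


kb_2_1 = [("Person", "john"), ("Person", "michael"), ("hasChild", "john", "michael")]
print(is_model_of(kb_2_1))  # True
```

An interpretation in which rex is also a Person would still be a model of (2.1): the knowledge base constrains the interpretation rather than fixing it.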

2.3.1 Concept Constructors

The atomic concepts and roles may be combined using concept constructors to form general concept descriptions. For instance, one may define a complex concept that contains all individuals which are at the same time Person and Female. This can be represented by the intersection constructor as follows.

$Person \sqcap Female$ (2.2)

As stated before, DL is a family of formalisms for KR, each of these containing their own set of constructors. Indeed, a description language is ultimately defined by the set of constructors it provides (BAADER; NUTT, 2010). This subsection presents the list of constructors used throughout this work, and their respective semantics. The reader may refer to (BAADER; NUTT, 2010) for a larger list of constructors.

Definition 2.3.5 (Intersection). The intersection constructor $A \sqcap B$ represents the set of individuals that are members of both A and B simultaneously. An intersection (also called conjunction) has the interpretation $A^I \cap B^I$.


Definition 2.3.6 (Union). The union constructor $A \sqcup B$ represents the set of individuals that are members of A, of B, or of both. A union (also called disjunction) has the interpretation $A^I \cup B^I$.

Consider, as an example, the concept defined by the disjunction of Male and Female. It may be represented as $Male \sqcup Female$.

Definition 2.3.7 (Negation). The negation $\neg A$ of a concept³ represents the complementary set of A, that is, the set of individuals which are not in A, such that its interpretation is $\Delta \setminus A^I$.

For instance, one may represent the set of everything that is not a person as $\neg Person$.

Definition 2.3.8 (Top, Bottom). The top concept $\top$ represents a concept which contains all individuals of the universe. Thus, its interpretation is $\Delta$. The bottom concept $\bot$ represents a concept which does not contain any individual. Hence, its interpretation is $\emptyset$.

Definition 2.3.9 (Value Restriction). The value restriction $\forall R.C$ defines the set of all individuals x such that all elements related to x through R belong to the class C. A value restriction has the interpretation $\{x \in \Delta \mid \forall y\,((x, y) \in R^I \rightarrow y \in C^I)\}$.

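For clarification, consider the definition $\forall hasPet.Fish$: it represents the set of all individuals that own only fish as pets. That is, if a person has a fish and a cat, this person does not belong to this concept. Note that an individual with no pets at all does belong to it, since the implication in Definition 2.3.9 then holds vacuously.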

Definition 2.3.10 (Existential Quantification). The existential quantification⁴ $\exists R.C$ defines the set of all individuals x such that there is at least one individual y of the class C related to x through R. The existential quantification has the interpretation $\{x \in \Delta \mid \exists y\,((x, y) \in R^I \wedge y \in C^I)\}$.

As an example, consider the definition $\exists hasPet.Cat$. It includes all individuals having at least one cat as a pet, including owners of dogs who also own at least one cat.
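The semantics of Definitions 2.3.5 to 2.3.10 can be evaluated directly over a finite interpretation. The Python sketch below computes the extension $C^I$ of a concept encoded as nested tuples; the tuple encoding, the function name ext and the example data are assumptions made for illustration only.

```python
def ext(c, delta, interp):
    """Return the extension C^I of concept c over the finite universe delta.

    Concepts are nested tuples (a hypothetical encoding):
    ("top",), ("bot",), ("atom", A), ("not", C), ("and", C, D),
    ("or", C, D), ("all", R, C), ("some", R, C).
    """
    op = c[0]
    if op == "top":
        return set(delta)
    if op == "bot":
        return set()
    if op == "atom":
        return set(interp.get(c[1], set()))
    if op == "not":
        return set(delta) - ext(c[1], delta, interp)
    if op == "and":
        return ext(c[1], delta, interp) & ext(c[2], delta, interp)
    if op == "or":
        return ext(c[1], delta, interp) | ext(c[2], delta, interp)
    if op in ("all", "some"):
        pairs = interp.get(c[1], set())
        filler = ext(c[2], delta, interp)
        result = set()
        for x in delta:
            successors = {y for (u, y) in pairs if u == x}
            if op == "all" and successors <= filler:  # every R-successor is in C
                result.add(x)
            if op == "some" and successors & filler:  # some R-successor is in C
                result.add(x)
        return result
    raise ValueError(f"unknown constructor: {op}")


delta = {"ann", "bob", "carl", "nemo", "tom"}
interp = {
    "Person": {"ann", "bob", "carl"},
    "Fish": {"nemo"},
    "Cat": {"tom"},
    "hasPet": {("ann", "nemo"), ("bob", "nemo"), ("bob", "tom")},
}

only_fish = ("and", ("atom", "Person"), ("all", "hasPet", ("atom", "Fish")))
some_cat = ("some", "hasPet", ("atom", "Cat"))
print(sorted(ext(only_fish, delta, interp)))  # ['ann', 'carl']
print(sorted(ext(some_cat, delta, interp)))   # ['bob']
```

As in the discussion of $\forall hasPet.Fish$, carl, who has no pets, satisfies the value restriction vacuously, while bob is excluded because one of his pets is a cat.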

Definition 2.3.11 (At-least Restriction). The at-least restriction $\geq n.R$ specifies the set of all elements x which are related to at least n other elements through R. It has the interpretation $\{x \in \Delta \mid |\{y \mid (x, y) \in R^I\}| \geq n\}$.

³ It is worth mentioning that, while the definition is valid for general concept descriptions, some description languages restrict the usage of the negation constructor to atomic concepts. These negations are referred to as atomic negations.

⁴ Note that some description languages restrict the usage of the existential quantification to the limited existential quantification $\exists R.\top$.


For instance, consider the definition $\geq 3.hasDog$. It represents the set of all individuals which own at least 3 dogs.

Definition 2.3.12 (At-most Restriction). The at-most restriction $\leq n.R$ defines the set of all individuals which are related to at most n other elements through R. The restriction has the interpretation $\{x \in \Delta \mid |\{y \mid (x, y) \in R^I\}| \leq n\}$.

Analogous to the previous example, consider the definition $\leq 3.hasDog$. It represents the set of all individuals which own at most 3 dogs.
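Both number restrictions only count R-successors, so they can be sketched in a few lines of Python over a finite interpretation. The function names and the example data below are hypothetical.

```python
def successors(x, role, interp):
    """Set of elements related to x through role in the interpretation."""
    return {y for (u, y) in interp.get(role, set()) if u == x}


def at_least(n, role, delta, interp):
    """Extension of the at-least restriction >= n.R."""
    return {x for x in delta if len(successors(x, role, interp)) >= n}


def at_most(n, role, delta, interp):
    """Extension of the at-most restriction <= n.R."""
    return {x for x in delta if len(successors(x, role, interp)) <= n}


delta = {"ann", "bob", "d1", "d2", "d3", "d4"}
interp = {"hasDog": {("ann", "d1"), ("ann", "d2"), ("ann", "d3"), ("bob", "d4")}}

print(sorted(at_least(3, "hasDog", delta, interp)))  # ['ann']
print(sorted(at_most(2, "hasDog", delta, interp)))   # ['bob', 'd1', 'd2', 'd3', 'd4']
```

Individuals with no dogs, such as d1, trivially satisfy $\leq 2.hasDog$, just as individuals without pets satisfy a value restriction.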

The constructors presented in this subsection are not used in isolation; they are combined to provide complex definitions for other concepts. These definitions form the TBox, as discussed in the next subsection.

2.3.2 Knowledge Bases

Knowledge Representation systems are concerned with building Knowledge Bases from a set of rules, not only to store knowledge and information, but also to discover new rules, infer concepts and assign relations automatically. A KR system provides intelligent tools to its end users, enabling them to perform complex queries and helping them identify inconsistencies by automating such tasks.

In description logics, a Knowledge Base has two components: the TBox and the ABox. Loosely speaking, the TBox comprises definitions and restrictions related to concepts and roles.⁵ The ABox consists exclusively of assertions about individuals, defining which concepts and roles apply to each individual (BAADER; NUTT, 2010). The examples in (2.1) all belong to the ABox, since they all make assertions about individuals.

2.3.2.1 TBox

The TBox of a Knowledge Base contains statements about how concepts and roles are related to each other. These statements are called terminological axioms (BAADER; NUTT, 2010). A formal definition is given below.

Definition 2.3.13 (Terminological Axioms). Let C, D be general concept descriptions (as in Section 2.3.1). Then the terminological axioms have the form $C \sqsubseteq D$ or $C \equiv D$. The semantics of the terminological axioms is given with respect to a model $A = \langle \Delta, I \rangle$, such that the interpretation of $C \sqsubseteq D$ is $C^I \subseteq D^I$, and the interpretation of $C \equiv D$ is $C^I = D^I$ (BAADER; NUTT, 2010).

⁵ Some definitions partition the TBox into a TBox and an RBox; for more information refer to


For instance, the inclusion $Assistant \sqsubseteq Person \sqcap \exists helps.Person$ states that every Assistant is a Person that helps a Person. However, it does not imply that every person that helps a person is an assistant.

When the Left-Hand-Side (lhs) of an equivalence is an atomic concept, the expression is called a definition. The main purpose of this kind of expression is to provide symbolic names for complex definitions (BAADER; NUTT, 2010).

Consider the general concept description $Woman \sqcap \exists hasChild.Person$, meaning the set of all individuals which are women and have at least one child who is a Person. It could be given a name as follows: $Mother \equiv Woman \sqcap \exists hasChild.Person$. Hence, the concept Mother contains an individual if and only if that individual is a Woman and has at least one hasChild relation to an individual which is a Person.
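Whether a finite interpretation is a model of a terminological axiom can be checked by comparing extensions, following Definition 2.3.13. The Python sketch below supports only atomic concepts, intersection and existential quantification, which is enough for the Mother and Assistant examples; because no negation or top concept is involved, the universe itself is not needed. The encoding and the example interpretation are assumptions for illustration.

```python
def ext(c, interp):
    """Extension of a concept built from "atom", "and" and "some" only."""
    op = c[0]
    if op == "atom":
        return set(interp.get(c[1], set()))
    if op == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    if op == "some":
        filler = ext(c[2], interp)
        return {x for (x, y) in interp.get(c[1], set()) if y in filler}
    raise ValueError(f"unsupported constructor: {op}")


def satisfies(axiom, interp):
    """C sub D holds iff C^I is a subset of D^I; C equiv D holds iff C^I = D^I."""
    kind, c, d = axiom
    if kind == "sub":
        return ext(c, interp) <= ext(d, interp)
    if kind == "equiv":
        return ext(c, interp) == ext(d, interp)
    raise ValueError(f"unknown axiom kind: {kind}")


interp = {
    "Woman": {"mary", "sue"},
    "Person": {"mary", "john", "sue"},
    "Mother": {"mary"},
    "Assistant": {"sue"},
    "hasChild": {("mary", "john")},
    "helps": {("sue", "john")},
}

mother_def = ("and", ("atom", "Woman"), ("some", "hasChild", ("atom", "Person")))
print(satisfies(("equiv", ("atom", "Mother"), mother_def), interp))  # True

helper = ("and", ("atom", "Person"), ("some", "helps", ("atom", "Person")))
print(satisfies(("sub", ("atom", "Assistant"), helper), interp))  # True
```

If Assistant were interpreted as the empty set, the inclusion would still hold while the corresponding equivalence would fail, which mirrors the remark that an inclusion does not force every person who helps a person to be an assistant.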

2.3.2.2 ABox

The ABox contains assertions stating that individuals belong to certain concepts and that pairs of individuals are related through specific roles (BAADER; NUTT, 2010).

Definition 2.3.14 (World Description Axioms). Let C denote an atomic concept, R an atomic role, and a, b individuals. Then the World Description Axioms have the form C(a) and R(a, b). The semantics of those axioms is given by the interpretations $(C(a))^I = a^I \in C^I$ and $(R(a, b))^I = (a^I, b^I) \in R^I$ (BAADER; NUTT, 2010).

For instance, the assertion Person(john) states that john is a Person, and the assertion hasChild(john, michael) asserts that John has a child called Michael.

2.3.3 Description Languages

Description Logics comprise a set of Description Languages, which in turn are distinguished by the set of concept constructors they provide. Since the languages are equipped with different constructors, they differ in expressiveness and in the computational complexity of reasoning: the greater their expressiveness, the higher their complexity (BAADER; NUTT, 2010). This subsection introduces the description languages pertinent to this work.

2.3.3.1 AL

One of the fundamental description languages is the Attributive Language (AL), which was introduced in (SCHMIDT-SCHAUSS; SMOLKA, 1991) as a minimal language that allows for checking the coherence of concept descriptions in linear time.

Definition 2.3.15 (AL concept descriptions). Let A and B denote atomic concepts, R an atomic role, and C and D general concept descriptions. Table 2.1 defines the syntax of concept descriptions of AL, summarizing their semantics (SCHMIDT-SCHAUSS; SMOLKA, 1991).

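Table 2.1: Semantics of AL expressions

Constructor | Syntax | Semantic Interpretation
Atomic Concept | $A$ | $A^I \subseteq \Delta^I$
Top | $\top$ | $\top^I = \Delta^I$
Bottom | $\bot$ | $\bot^I = \emptyset$
Atomic Negation | $\neg A$ | $(\neg A)^I = \Delta^I \setminus A^I$
Intersection | $C \sqcap D$ | $(C \sqcap D)^I = C^I \cap D^I$
Universal Restriction | $\forall R.C$ | $(\forall R.C)^I = \{x_0 \in \Delta^I \mid \forall x_1[((x_0, x_1) \in R^I) \supset (x_1 \in C^I)]\}$
Limited Existential Quantification | $\exists R.\top$ | $(\exists R.\top)^I = \{x_0 \in \Delta^I \mid \exists x_1[(x_0, x_1) \in R^I]\}$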

AL only provides a negation constructor for atomic concepts, and its existential quantification is limited to the top concept, i.e., to expressions of the form $\exists R.\top$.
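The syntactic restrictions of AL in Table 2.1 can be checked recursively. The Python sketch below uses a hypothetical tuple encoding of concepts and accepts a concept only if every negation is atomic and every existential quantification has $\top$ as its filler.

```python
def is_al(c):
    """Return True if concept c only uses the AL constructors of Table 2.1."""
    op = c[0]
    if op in ("atom", "top", "bot"):
        return True
    if op == "not":  # atomic negation only
        return c[1][0] == "atom"
    if op == "and":
        return is_al(c[1]) and is_al(c[2])
    if op == "all":  # universal restriction over any AL concept
        return is_al(c[2])
    if op == "some":  # limited existential quantification: filler must be top
        return c[2] == ("top",)
    return False  # e.g. "or" is not an AL constructor


person = ("atom", "Person")
print(is_al(("and", person, ("some", "hasChild", ("top",)))))  # True
print(is_al(("some", "hasChild", person)))                     # False
```

The second concept is rejected because its existential filler is an atomic concept rather than $\top$; it becomes admissible only in extensions such as ALE or ALC.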

2.3.3.2 AL Extensions

AL can be extended by including other constructors into the language (BAADER; NUTT, 2010; SCHMIDT-SCHAUSS; SMOLKA, 1991), increasing its expressiveness. It may be extended with Union (U), Full Existential Quantification (E), Complex Complements (C) and Numeric Constraints (N), as shown in Table 2.2.

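Table 2.2: Semantics of the Language Extensions

Name | Syntax | Semantic Interpretation
U | $C \sqcup D$ | $(C \sqcup D)^I = C^I \cup D^I$
E | $\exists R.C$ | $(\exists R.C)^I = \{x_0 \in \Delta^I \mid \exists x_1 \in \Delta^I[((x_0, x_1) \in R^I) \wedge (x_1 \in C^I)]\}$
C | $\neg C$ | $(\neg C)^I = \Delta^I \setminus C^I$
N | $\leq nR$ | $(\leq nR)^I = \{a \in \Delta^I \mid |\{b \in \Delta^I \mid (a, b) \in R^I\}| \leq n\}$
N | $\geq nR$ | $(\geq nR)^I = \{a \in \Delta^I \mid |\{b \in \Delta^I \mid (a, b) \in R^I\}| \geq n\}$

2.3.3.3 ALC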

The most influential language for this research is the Attributive Language with general Complements (ALC). ALC is obtained by allowing general complements of the form $\neg C$, instead of restricting the syntax to atomic complements as in AL (BAADER; NUTT, 2010; SCHMIDT-SCHAUSS; SMOLKA, 1991). The only difference between the general complement $\neg C$ and the atomic complement $\neg A$ is that the atomic complement can only be applied to atomic concepts, while the general complement may be applied to any general concept description.

Since the general complement can be applied to any concept, ALC can also express union as the complement of the intersection of the complements (De Morgan's law), as shown by equation (2.3) (BAADER; NUTT, 2010; SCHMIDT-SCHAUSS; SMOLKA, 1991).

$C \sqcup D = \neg(\neg C \sqcap \neg D)$ (2.3)

A full existential quantification can be obtained as well, by negating the value restriction as illustrated by equation (2.4) (BAADER; NUTT, 2010; SCHMIDT-SCHAUSS; SMOLKA, 1991).

∃R.C = ¬∀R.¬C (2.4)

In this sense, ALC has the same expressiveness as ALUE, the extension of AL which includes Union and Full Existential Quantification.
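Equations (2.3) and (2.4) can be read as rewrite rules that eliminate union and full existential quantification in favour of $\neg$, $\sqcap$ and $\forall$. The following Python sketch (hypothetical names, tuple encoding of concepts) applies those rewrites and checks, on one small interpretation, that each rewritten concept has the same extension as the original. A check on a single interpretation is an illustration, not a proof.

```python
def to_core(c):
    """Rewrite "or" via (2.3) and "some" via (2.4); keep other constructors."""
    op = c[0]
    if op in ("atom", "top", "bot"):
        return c
    if op == "not":
        return ("not", to_core(c[1]))
    if op == "and":
        return ("and", to_core(c[1]), to_core(c[2]))
    if op == "or":  # C or D  ==  not(not C and not D)   (2.3)
        return ("not", ("and", ("not", to_core(c[1])), ("not", to_core(c[2]))))
    if op == "all":
        return ("all", c[1], to_core(c[2]))
    if op == "some":  # some R.C  ==  not all R.(not C)    (2.4)
        return ("not", ("all", c[1], ("not", to_core(c[2]))))
    raise ValueError(f"unknown constructor: {op}")


def ext(c, delta, interp):
    """Extension of c in a finite interpretation (all ALC constructors)."""
    op = c[0]
    if op == "top":
        return set(delta)
    if op == "bot":
        return set()
    if op == "atom":
        return set(interp.get(c[1], set()))
    if op == "not":
        return set(delta) - ext(c[1], delta, interp)
    if op in ("and", "or"):
        left, right = ext(c[1], delta, interp), ext(c[2], delta, interp)
        return left & right if op == "and" else left | right
    if op in ("all", "some"):
        pairs = interp.get(c[1], set())
        filler = ext(c[2], delta, interp)
        succ = {x: {y for (u, y) in pairs if u == x} for x in delta}
        if op == "all":
            return {x for x in delta if succ[x] <= filler}
        return {x for x in delta if succ[x] & filler}
    raise ValueError(f"unknown constructor: {op}")


delta = {"a", "b", "c"}
interp = {"Male": {"a"}, "Female": {"b"}, "hasChild": {("a", "b"), ("c", "c")}}
samples = [
    ("or", ("atom", "Male"), ("atom", "Female")),
    ("some", "hasChild", ("atom", "Female")),
    ("some", "hasChild", ("or", ("atom", "Male"), ("atom", "Female"))),
]
print(all(ext(c, delta, interp) == ext(to_core(c), delta, interp) for c in samples))  # True
```

Because to_core only ever produces atoms, $\top$, $\bot$, $\neg$, $\sqcap$ and $\forall$, its output stays within AL extended with general complements, which is exactly the ALC syntax discussed above.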

2.3.3.4 EL

EL is a simple language which provides constructors for full existential quantification $\exists r.C$, conjunction $C \sqcap D$ and the top concept $\top$ (BAADER; BRANDT; LUTZ, 2008). It uses inclusions and equivalences to describe complex concepts in the TBox (BAADER; MORAWSKA, 2009).

2.3.3.5 EL++

EL++ extends EL by providing the bottom concept $\bot$, nominals (concepts containing a single individual, $\{a\}$), concrete domain values such as strings and integers, role inclusion and composition $R_1 \circ \dots \circ R_k \sqsubseteq R$, domain restrictions $dom(R) \sqsubseteq C$, and range restrictions $ran(R) \sqsubseteq C$ (BAADER; BRANDT; LUTZ, 2008). Table 2.3 provides the semantics for the new constructors.

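Table 2.3: Semantics for the new expressions in EL++

Constructor | Syntax | Semantics
Nominal | $\{a\}$ | $\{a\}^I = \{a^I\}$
Role Composition | $R_1 \circ \dots \circ R_k \sqsubseteq R$ | $R_1^I \circ \dots \circ R_k^I \subseteq R^I$
Domain Restriction | $dom(R) \sqsubseteq C$ | $R^I \subseteq C^I \times \Delta^I$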

2.3.3.6 Other Languages

There are other Description Languages, however they are out of the scope of this work. The reader may refer to (BAADER; NUTT, 2010) for more information on those languages.

2.4 Web Ontology Language

The Web Ontology Language (OWL) is a language for describing ontologies (HITZLER et al., 2012). It was conceived by World Wide Web Consortium (W3C) working groups⁶ from the need to define and represent advanced and expressive knowledge about domains. Its current version, OWL 2, is closely related to Description Logics, providing formal semantics, a well-defined syntax and a reasonable balance between expressiveness and efficient reasoning (ANTONIOU et al., 2012).

The formal semantic specification establishes the exact meaning of each language construct, preventing misinterpretations or diverging construals from the same set of data. As detailed in (ANTONIOU et al., 2012; HITZLER et al., 2012), OWL 2 provides constructs for defining classes (concepts), properties (roles) and instances (individuals). Classes can be related by subsumption, equivalence, disjointness, and combined using Boolean connectives. OWL 2 also provides means for determining instance equality, datatype restrictions, domain and range restrictions for roles and data, cardinality restrictions, role transitivity, symmetry, asymmetry, reflexivity, uniqueness and disjointness.

2.4.1 OWL 2 Syntax

There are various syntaxes for describing OWL ontologies (HITZLER et al., 2012), each with its benefits and drawbacks (ANTONIOU et al., 2012). Only one syntax is addressed by this work: the functional-style syntax. The functional-style syntax is a relatively concise syntax and structurally very similar to the formal specification of ontologies (ANTONIOU et al., 2012; MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). It is mainly used for specification purposes, for the implementation of Application Programming Interfaces (APIs) and reasoners (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). Other notable syntaxes include the Manchester Syntax (HORRIDGE; PATEL-SCHNEIDER, 2012), the XML syntax (MOTIK; PARSIA; PATEL-SCHNEIDER, 2012) and the Turtle syntax.

Expressions (2.5) and (2.6) show the definition of the concept of a Boy, in DL and OWL 2 respectively. The expressions describe a Boy as being a Male and a Child simultaneously.


$Boy \equiv Male \sqcap Child$ (2.5)

EquivalentClasses(:Boy ObjectIntersectionOf(:Male :Child)) (2.6)

2.4.1.1 Individuals

The OWL 2 Individuals are analogous to the DL individuals. However, in OWL they can be Named or Anonymous. Named individuals are defined by Internationalized Resource Identifiers (IRIs). Anonymous individuals are defined by a nodeId, which consists of an underscore followed by a colon and then by an internal name. The main difference between Named and Anonymous individuals is that anonymous individuals are visible and meaningful only within their ontology (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). For instance, the statement ObjectPropertyAssertion( :hasChild :John _:c1) means that John, which is a Named individual, has a child (internally called _:c1) which is an anonymous individual (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

2.4.1.2 Classes

Classes in OWL 2 are analogous to DL concepts. A class is also represented by an IRI (DÜRST; SUIGNARD, 2005; MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). The classes can be explicitly declared at the beginning of the ontology or implicitly declared within an expression or rule. The OWL 2 standard provides two built-in classes with predefined semantics:

1. owl:Thing representing the set of all individuals, as $\top$ in DL;
2. owl:Nothing representing the empty set, as $\bot$ in DL.

Individuals can be associated to Classes with the ClassAssertion statement, as follows: ClassAssertion(:Person :John), which is equivalent to the DL assertion Person(john).

2.4.1.3 Object Properties

Object Properties are equivalent to DL roles. They represent a relation between individuals (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). The OWL 2 specification provides two built-in Object Properties:

1. owl:topObjectProperty, which connects all possible pairs of individuals, i.e., its interpretation is $\Delta \times \Delta$;
2. owl:bottomObjectProperty, which does not connect any pair of individuals, as $\top \sqsubseteq \forall R_\bot.\bot$ in DL.

Individuals can be related to other individuals through object properties. This relation can be defined by using the ObjectPropertyAssertion statement as shown in (2.7), which is equivalent to the DL expression (2.8).

ObjectPropertyAssertion(:hasChild :John :Michael) (2.7)

hasChild(john, michael) (2.8)

2.4.1.4 Class Expressions

Class Expressions are constructed from classes and property expressions by formally specifying conditions on the aspects and properties of the individuals. The individuals satisfying these conditions are instances of the class expression. Class expressions are also called descriptions, and are the Description Logics equivalent of complex concepts (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). In OWL 2 it is possible to describe a class by enumerating all of its elements. This can be achieved by using the ObjectOneOf construct (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

OWL 2 provides the traditional Boolean constructors and, or, and not, in the form of the expressions ObjectIntersectionOf, ObjectUnionOf, and ObjectComplementOf. A restricted form of existential and universal quantifiers, similar to those of DL, is provided by ObjectSomeValuesFrom and ObjectAllValuesFrom, respectively (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). OWL 2 provides a syntactic shortcut named ObjectHasValue for denoting the set of all individuals which are related to another individual :a through an object property :ope. In this context, stating ObjectHasValue(:ope :a) is essentially the same as ObjectSomeValuesFrom( :ope ObjectOneOf(:a)) (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

Yet another syntactic shortcut, named ObjectHasSelf, denotes all individuals which are connected to themselves by a given object property. Thus, stating ObjectHasSelf(:likes) represents the set of individuals which like themselves (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

OWL 2 allows the definition of sets of individuals by cardinality restrictions on object properties. For instance, considering the object property :hasChild, it is possible to define the set of all individuals which have at least two male children by writing ObjectMinCardinality(2 :hasChild :Male). Likewise, it is possible to use ObjectMaxCardinality and ObjectExactCardinality to describe the sets of individuals which have at most two male children and exactly two male children, respectively (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

2.4.1.5 Class Expression Axioms

In OWL 2, relationships between class expressions can be established by using one of the four class expression axioms SubClassOf, EquivalentClasses, DisjointClasses, and DisjointUnion. These axioms can be used to describe subsumptions, equivalences, disjointness and a class composed by a set of disjoint classes, respectively (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012).

2.4.1.6 Object Property Axioms

The object property axioms can be used to describe relationships between object property expressions (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). The following object property axioms are provided by OWL 2:

1. SubObjectPropertyOf to state that all individuals related by an object property are also related by another object property, as in a property subsumption.

2. EquivalentObjectProperties to state that two object properties are equivalent, meaning that if two objects are related by one property they must be related by the other property as well, and vice versa.

3. DisjointObjectProperties to define that no two individuals can be related by both object properties simultaneously.

4. InverseObjectProperties to define that if two individuals are related by an object property :ope1, they have to be inversely related by another object property :ope2.

5. ObjectPropertyDomain to restrict the class of the individuals that belong to the domain of an object property.

6. ObjectPropertyRange to restrict the class of the individuals that belong to the range of an object property.

7. FunctionalObjectProperty to state that an object property behaves as a function, allowing only a single image for each element of the domain.

8. InverseFunctionalObjectProperty to state that each individual of the range of an object property can only be related to a single element of the domain.

9. ReflexiveObjectProperty to define that every individual is related to itself by the object property.

10. IrreflexiveObjectProperty to establish that no individual can be related to itself by the object property.

11. SymmetricObjectProperty to establish that if an individual x is related to y, y must also be related to x by the same object property.

12. AsymmetricObjectProperty to state that an object property is asymmetric, that is, if x is related to y, then y cannot be related to x by the same object property.

13. TransitiveObjectProperty to define that an object property is transitive. Thus, if x is related to y and y is related to z, then x is related to z.
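Several of the object property axioms above are conditions on a single binary relation, so they can be checked directly on a finite set of pairs. The following Python sketch is an illustration under that assumption; the function names are hypothetical and do not correspond to any OWL API.

```python
from collections import Counter


def is_reflexive(r, delta):
    """Every individual of the universe is related to itself."""
    return all((x, x) in r for x in delta)


def is_irreflexive(r):
    """No individual is related to itself."""
    return all(x != y for (x, y) in r)


def is_symmetric(r):
    """If x is related to y, then y is related to x."""
    return all((y, x) in r for (x, y) in r)


def is_asymmetric(r):
    """If x is related to y, then y is not related to x."""
    return all((y, x) not in r for (x, y) in r)


def is_transitive(r):
    """If x is related to y and y to z, then x is related to z."""
    return all((x, z) in r for (x, y) in r for (u, z) in r if u == y)


def is_functional(r):
    """Each element of the domain has at most one image."""
    return all(count <= 1 for count in Counter(x for (x, _) in r).values())


def is_inverse_functional(r):
    """Each element of the range has at most one pre-image."""
    return all(count <= 1 for count in Counter(y for (_, y) in r).values())


ancestor_of = {("ann", "bob"), ("bob", "carl"), ("ann", "carl")}
print(is_transitive(ancestor_of), is_asymmetric(ancestor_of))  # True True
print(is_symmetric(ancestor_of), is_functional(ancestor_of))    # False False
```

Here the characteristics are tested on one given relation; in a reasoner, the corresponding OWL 2 axioms instead constrain the interpretation of the property in every model of the ontology.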

2.4.1.7 Keys

A special axiom HasKey can be used to state that each named instance of a class expression is uniquely identified by a set of object property relations and data property expressions. Data property expressions allow concrete domain data such as strings and integers to be represented in OWL 2 (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012). However, they are not addressed in this work. The reader may refer to (MOTIK; PATEL-SCHNEIDER; PARSIA, 2012) for a detailed explanation on all definitions of the OWL 2 semantics.

2.4.2 Profiles

The OWL 2 specification designates subsets of the language, also called language profiles. The main profile, which includes all constructs of the language, is called OWL 2 DL. The others are subsets of OWL 2 DL, varying in expressiveness and restrictions. Appendix 5 contains detailed comparison tables of the profiles.

Nonetheless, the focus of this work is the ALC subset of OWL 2, as detailed in Table 2.4.

2.5 Related Work

This section briefly introduces and describes related reasoners for Description Logics.

Table 2.4: ALC subset of OWL 2, supported by Raccoon

OWL | DL | Description
Top and Bottom
owl:Thing | $\top$ | The top concept
owl:Nothing | $\bot$ | The bottom concept
Class Expressions
ObjectComplementOf(:C) | $\neg C$ | Complement of a concept
ObjectIntersectionOf(:C :D) | $C \sqcap D$ | Intersection of concepts
ObjectUnionOf(:C :D) | $C \sqcup D$ | Union of concepts
ObjectAllValuesFrom(:R :C) | $\forall R.C$ | Full Universal Restriction
ObjectSomeValuesFrom(:R :C) | $\exists R.C$ | Full Existential Quantification
Class Axioms
SubClassOf(:C :D) | $C \sqsubseteq D$ | Concept Subsumption
EquivalentClasses(:C :D) | $C \equiv D$ | Concept Equivalence
DisjointClasses(:C :D) | $C \sqsubseteq \neg D$ | Concept Disjointness
DisjointUnion(:C :D) | $C \sqsubseteq \neg D$, $C \sqcup D$ | Union of Disjoint Concepts
Assertion Axioms
ClassAssertion(:C :a) | $C(a)$ | Concept Assertion
ObjectPropertyAssertion(:R :a :b) | $R(a, b)$ | Role Assertion
NegativeObjectPropertyAssertion(:R :a :b) | $\neg R(a, b)$ | Negative Role Assertion
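Table 2.4 can be read as a translation from DL syntax to OWL 2 functional-style syntax. The Python sketch below implements that mapping for a hypothetical tuple encoding of concepts and axioms; it covers the rows of the table except DisjointUnion, and it only produces strings, without validating IRIs or declarations.

```python
def concept_to_owl(c):
    """Render a concept as an OWL 2 functional-style class expression."""
    op = c[0]
    if op == "top":
        return "owl:Thing"
    if op == "bot":
        return "owl:Nothing"
    if op == "atom":
        return ":" + c[1]
    if op == "not":
        return f"ObjectComplementOf({concept_to_owl(c[1])})"
    if op == "and":
        return f"ObjectIntersectionOf({concept_to_owl(c[1])} {concept_to_owl(c[2])})"
    if op == "or":
        return f"ObjectUnionOf({concept_to_owl(c[1])} {concept_to_owl(c[2])})"
    if op == "all":
        return f"ObjectAllValuesFrom(:{c[1]} {concept_to_owl(c[2])})"
    if op == "some":
        return f"ObjectSomeValuesFrom(:{c[1]} {concept_to_owl(c[2])})"
    raise ValueError(f"unknown constructor: {op}")


def axiom_to_owl(ax):
    """Render a TBox or ABox axiom following Table 2.4."""
    kind = ax[0]
    if kind in ("sub", "equiv", "disjoint"):
        name = {"sub": "SubClassOf", "equiv": "EquivalentClasses",
                "disjoint": "DisjointClasses"}[kind]
        return f"{name}({concept_to_owl(ax[1])} {concept_to_owl(ax[2])})"
    if kind == "concept_assertion":  # C(a)
        return f"ClassAssertion({concept_to_owl(ax[1])} :{ax[2]})"
    if kind == "role_assertion":  # R(a, b)
        return f"ObjectPropertyAssertion(:{ax[1]} :{ax[2]} :{ax[3]})"
    if kind == "negative_role_assertion":  # not R(a, b)
        return f"NegativeObjectPropertyAssertion(:{ax[1]} :{ax[2]} :{ax[3]})"
    raise ValueError(f"unknown axiom kind: {kind}")


boy = ("equiv", ("atom", "Boy"), ("and", ("atom", "Male"), ("atom", "Child")))
print(axiom_to_owl(boy))
# EquivalentClasses(:Boy ObjectIntersectionOf(:Male :Child))
```

Applied to the definition of Boy in (2.5), the sketch yields the functional-style axiom (2.6); role assertions such as (2.8) map to ObjectPropertyAssertion in the same way.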

2.5.1 Konclude

Konclude⁷ (STEIGMILLER; THORSTEN; GLIMM, 2014) is a multi-threaded, tableau-based reasoner for the SROIQV description language. It was implemented in C++ using the Qt framework⁸; hence it is cross-platform, running on Windows, Linux, OSX, Solaris, etc.

Konclude employs various optimizations such as: (i) absorption (HORROCKS; TOBIES, 2000), (ii) lexical normalization, (iii) lazy unfolding, (iv) semantic branching, (v) Boolean constraint propagation, (vi) anywhere blocking, (vii) dependency directed backtracking, (viii) caching of satisfiability status (HORROCKS, 2010). These optimizations are out of the scope of this work.

2.5.2 ELK

ELK⁹ (KAZAKOV; KLINOV, 2015) is a multi-threaded, saturation-based reasoner developed in Java specifically for the OWL EL profile.¹⁰ The saturation process is described by the authors of ELK in (KAZAKOV; KLINOV, 2015) and is out of the scope of this work.

⁷ http://www.derivo.de/produkte/konclude.html
⁸ https://www.qt.io/
⁹ https://code.google.com/p/elk-reasoner/
¹⁰ While the authors in (KAZAKOV; KLINOV, 2015) claim that ELK was built for the OWL EL profile, they mention that it is designed for the EL description language; however, the OWL 2 specification states that the OWL EL profile is based on the EL++ description language (MOTIK et al., 2012).

2.5.3 FaCT++

Just like Raccoon, FaCT++¹¹ was designed as a platform for experimenting with different reasoning algorithms and optimizations (TSARKOV; HORROCKS, 2006). FaCT++ is single-threaded and handles the SHOIQ description language using various optimizations: (i) lexical normalization, (ii) absorption, (iii) Told Cycle Elimination, (iv) Synonym Replacement, (v) Dependency Directed Backtracking, (vi) Boolean constraint propagation, (vii) Semantic Branching, (viii) Ordering Heuristics, (ix) definitional ordering, (x) Model Merging, (xi) Completely Defined Concepts, (xii) Clustering. All those optimizations are out of the scope of this work and are further detailed in (TSARKOV; HORROCKS, 2006).

2.5.4 Hermit

Hermit¹² is a single-threaded reasoner developed in Java for the SHOIQ+ description language (MOTIK; SHEARER; HORROCKS, 2009). It is based on the hypertableau calculus (BAUMGARTNER; FURBACH; NIEMELÄ, 1996). It employs the following optimizations: (i) dependency directed backtracking (TSARKOV; HORROCKS, 2006), (ii) reading classification relationships from concept labels, (iii) caching blocking labels (MOTIK; SHEARER; HORROCKS, 2009). Those optimizations are out of the scope of this work; for further information on them, refer to (TSARKOV; HORROCKS, 2006) and (MOTIK; SHEARER; HORROCKS, 2009).

¹¹ http://owl.man.ac.uk/factplusplus
¹² http://hermit-reasoner.com/


3 Method and Development

3.1 CM-ALC

This section presents CM-ALC, the method used by Raccoon for reasoning. The method is an adaptation of the method presented in (FREITAS, 2011) for reasoning over ALC ontologies using the Connection Method. Unlike (FREITAS, 2011), CM-ALC relies on regularity instead of a blocking rule to manage cycles, as explained later. The original Connection Method was proposed for First-Order Logic, and CM-ALC was defined in terms of a semantic mapping of DL constructs to their corresponding First-Order Logic (FOL) representation. Nevertheless, proving an FOL translation is much slower than reasoning directly with DL (TSARKOV; HORROCKS, 2003). Thus, instead of proving an FOL translation, CM-ALC works directly with DL, employing a Limited Disjunctive Normal Form of the expressions, which is explained in the next subsection.

3.1.1 ALC Disjunctive Normal Form

The conventional DNF comprises a disjunction of conjunctive clauses of the form $C_0 \vee C_1 \vee \dots \vee C_n$, where $C_i$ is the i-th conjunctive clause, assuming the form $L_{i,0} \wedge L_{i,1} \wedge \dots \wedge L_{i,k}$, where $L_{i,j}$ is the j-th literal of the i-th clause, negated or not as in Definition 2.2.1.

Obtaining DNF from DL formulae for a validational (direct) proof is straightforward. A validational proof consists of showing that $KB \rightarrow \alpha$ holds for all models. This is the same as showing that, for all models, either KB is false or α is true; hence the entire Knowledge Base KB is negated, while the hypothesis α is not (BAAZ; FERMÜLLER; SALZER, 2001). As a consequence, every conjunction in KB becomes a disjunction and vice versa. If an ALC clause $C_i$ has the form presented in statement (3.1), it is converted directly to DNF after the negation of the Knowledge Base KB, resulting in (3.2). A demonstration of this process is presented in Appendix 5.


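$L_{i,0} \sqcap L_{i,1} \sqcap \dots \sqcap L_{i,m} \sqsubseteq L_{i,m+1} \sqcup L_{i,m+2} \sqcup \dots \sqcup L_{i,m+n}$ (3.1)

$L_{i,0} \wedge L_{i,1} \wedge \dots \wedge L_{i,m} \wedge \neg L_{i,m+1} \wedge \neg L_{i,m+2} \wedge \dots \wedge \neg L_{i,m+n}$ (3.2)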
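The step from (3.1) to (3.2) is a propositional identity for any fixed individual: negating $\bigwedge lhs \rightarrow \bigvee rhs$ yields the conjunction of the lhs literals with the negated rhs literals. The Python sketch below builds (3.2) from (3.1), with literals encoded as hypothetical (name, polarity) pairs, and checks the identity by enumerating all truth assignments.

```python
from itertools import product


def negated_clause(lhs, rhs):
    """Literals of (3.2) for the axiom  L0 and ... and Lm  sub  Lm+1 or ... or Lm+n.

    A literal is a (name, positive) pair; the lhs literals are kept and the
    rhs literals are negated, mirroring the negation of the knowledge base.
    """
    return list(lhs) + [(name, not positive) for (name, positive) in rhs]


def value(literal, assignment):
    """Truth value of a literal under a propositional assignment."""
    name, positive = literal
    return assignment[name] if positive else not assignment[name]


def agrees_with_negation(lhs, rhs):
    """Check on every assignment that (3.2) is true exactly when (3.1) is false."""
    names = sorted({name for (name, _) in lhs + rhs})
    clause = negated_clause(lhs, rhs)
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        axiom = not all(value(l, a) for l in lhs) or any(value(l, a) for l in rhs)
        if all(value(l, a) for l in clause) == axiom:
            return False
    return True


# A and B  sub  C or (not D)
lhs, rhs = [("A", True), ("B", True)], [("C", True), ("D", False)]
print(negated_clause(lhs, rhs))
# [('A', True), ('B', True), ('C', False), ('D', True)]
print(agrees_with_negation(lhs, rhs))  # True
```

The resulting list is a single conjunctive clause, i.e., one disjunct of the DNF matrix, which is why no further transformation is needed for statements of the form (3.1).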

The same happens to a clause $C_i$ containing existential quantifiers on the lhs of the inclusion and universal restrictions on the Right-Hand-Side (rhs) of the inclusion. For instance, the subsumption statement in (3.3) is converted directly to (3.4) when the knowledge base is negated. A demonstration is presented in Appendix 5.

$\exists r_{i,0}.L_{i,1} \sqsubseteq \forall r_{i,2}.L_{i,3}$ (3.3)

$r_{i,0}(x, y) \wedge L_{i,1}(y) \wedge r_{i,2}(x, z) \wedge \neg L_{i,3}(z)$ (3.4)
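The same idea extends to quantified parts: after negation, each existential on the lhs and each universal restriction on the rhs introduces a fresh variable, as $y$ and $z$ do in (3.4). The Python sketch below produces the first-order literals of the negated axiom for statements that combine atomic literals, existentials on the lhs and universals on the rhs; the encoding of literals as (predicate, arguments, polarity) triples and the variable names y1, y2, ... are assumptions of this illustration.

```python
from itertools import count


def negated_literals(lhs_atoms, lhs_exists, rhs_atoms, rhs_forall, x="x"):
    """First-order literals of the negated axiom, over the free variable x.

    lhs_atoms / rhs_atoms: (concept, positive) pairs applied to x.
    lhs_exists: (role, (concept, positive)) pairs for  some role.concept  on the lhs.
    rhs_forall: (role, (concept, positive)) pairs for  all role.concept  on the rhs.
    Each quantifier gets a fresh variable y1, y2, ...
    """
    fresh = (f"y{i}" for i in count(1))
    literals = [(c, (x,), pos) for (c, pos) in lhs_atoms]
    for role, (c, pos) in lhs_exists:  # some r.L  ->  r(x, y) and L(y)
        y = next(fresh)
        literals += [(role, (x, y), True), (c, (y,), pos)]
    literals += [(c, (x,), not pos) for (c, pos) in rhs_atoms]
    for role, (c, pos) in rhs_forall:  # not (all s.L)  ->  s(x, y) and not L(y)
        y = next(fresh)
        literals += [(role, (x, y), True), (c, (y,), not pos)]
    return literals


# (3.3):  some r0.L1  sub  all r2.L3
for literal in negated_literals([], [("r0", ("L1", True))], [], [("r2", ("L3", True))]):
    print(literal)
# ('r0', ('x', 'y1'), True)
# ('L1', ('y1',), True)
# ('r2', ('x', 'y2'), True)
# ('L3', ('y2',), False)
```

Up to the renaming of y to y1 and z to y2, the output is exactly the conjunction (3.4).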

The general DL form which is directly converted to DNF upon the negation of the Knowledge Base is expressed in (3.5). If the DL statement is not in this form, it has to be normalized before the knowledge base negation takes place. The normalization process is discussed in the following subsection.

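$\Big(\bigsqcap_{j=0}^{n} L_{i,j}\Big) \sqcap \Big(\bigsqcap_{j=n+1}^{m} \exists r_{i,j}.L_{i,j}\Big) \sqsubseteq \Big(\bigsqcup_{j=m+1}^{w} L_{i,j}\Big) \sqcup \Big(\bigsqcup_{j=w+1}^{v} \forall s_{i,j}.L_{i,j}\Big)$ (3.5)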
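Form (3.5) is purely syntactic, so whether an inclusion needs normalization can be decided by flattening its two sides. The Python sketch below, with hypothetical names and a tuple encoding of concepts used for illustration, accepts $lhs \sqsubseteq rhs$ only if the lhs is a conjunction of literals and existentials over literals, and the rhs is a disjunction of literals and universals over literals, as stated in (3.5).

```python
def is_literal(c):
    """An atomic concept or the negation of an atomic concept."""
    return c[0] == "atom" or (c[0] == "not" and c[1][0] == "atom")


def flatten(c, op):
    """Split nested binary "and"/"or" nodes into a flat list of operands."""
    if c[0] == op:
        return flatten(c[1], op) + flatten(c[2], op)
    return [c]


def is_normal_form(lhs, rhs):
    """True if  lhs sub rhs  already has the shape of (3.5)."""
    lhs_ok = all(is_literal(p) or (p[0] == "some" and is_literal(p[2]))
                 for p in flatten(lhs, "and"))
    rhs_ok = all(is_literal(p) or (p[0] == "all" and is_literal(p[2]))
                 for p in flatten(rhs, "or"))
    return lhs_ok and rhs_ok


A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
print(is_normal_form(("some", "r", A), ("all", "s", B)))  # True, as (3.3)
print(is_normal_form(A, ("and", B, C)))                   # False, R3 applies
```

Statements rejected by this check are the ones that the normalization rules below must rewrite before the Knowledge Base is negated.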

3.1.2 Normalization of the Knowledge Base

In real ontologies, the knowledge base generally contains statements which are not in the form (3.5). These statements need to be normalized before the negation of the Knowledge Base takes place, in order to be converted to DNF. The normalization is defined by the rules in Table 3.1. When reading the rules, consider C as an arbitrary conjunction, D as a disjunction, A as an arbitrary concept, N as a new concept, and the ellipses as complex expressions. Appendix 5 explains why the rules are correct and why they do not change the meaning of the resulting ontology.

Definition 3.1.1 (Equivalence Normalization, R1). The rule R1 transforms an equivalence $C \equiv D$ into two simpler inclusions, $C \sqsubseteq D$ and $D \sqsubseteq C$.

Definition 3.1.2 (Left-Hand-Side Disjunction Normalization, R2). The rule R2 removes disjunctions from the lhs of an inclusion by splitting it into two clauses. If $C_0 \sqcup C_1 \sqsubseteq D$, then $C_0$ alone is a subset of $D$. The same holds for $C_1$, resulting in two clauses: $C_0 \sqsubseteq D$ and $C_1 \sqsubseteq D$. Note that $C_0$, $C_1$ and $D$ can be complex concepts; if this is the case, they are normalized subsequently.

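Table 3.1: Rules for converting DL statements to DNF.

Rule | Original Statement | Simplified Statements
R1 | $C \equiv D$ | $C \sqsubseteq D$; $D \sqsubseteq C$
R2 | $C_0 \sqcup C_1 \sqsubseteq D$ | $C_0 \sqsubseteq D$; $C_1 \sqsubseteq D$
R3 | $C \sqsubseteq D_0 \sqcap D_1$ | $C \sqsubseteq D_0$; $C \sqsubseteq D_1$
R4 | $(C_0 \sqcap \dots \sqcap C_n) \sqcap (D_0 \sqcup \dots \sqcup D_m) \sqsubseteq A$ | $C_0 \sqcap \dots \sqcap C_n \sqcap N \sqsubseteq A$; $D_0 \sqcup \dots \sqcup D_m \sqsubseteq N$
R5 | $A \sqsubseteq (D_0 \sqcup \dots \sqcup D_m) \sqcup (C_0 \sqcap \dots \sqcap C_n)$ | $A \sqsubseteq D_0 \sqcup \dots \sqcup D_m \sqcup N$; $N \sqsubseteq C_0 \sqcap \dots \sqcap C_n$
R6 | $C \sqsubseteq D \sqcup \exists r.A$ | $C \sqsubseteq D \sqcup \exists r.N$; $N \sqsubseteq A$
R7 | $C \sqsubseteq D \sqcup \forall r.A$ | $C \sqsubseteq D \sqcup \forall r.N$; $N \sqsubseteq A$
R8 | $C \sqcap \exists r.A \sqsubseteq D$ | $C \sqcap \exists r.N \sqsubseteq D$; $A \sqsubseteq N$
R9 | $C \sqcap \forall r.A \sqsubseteq D$ | $C \sqcap \forall r.N \sqsubseteq D$; $A \sqsubseteq N$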

Definition 3.1.3 (Right-Hand-Side Conjunction Normalization, R3). The rule R3 is analogous to R2 but for the rhs. Since the rhs is negated once the negation is applied to the Knowledge Base, a conjunction there becomes a disjunction, and vice versa. Thus, conjunctions need to be removed from the rhs. The procedure is similar to that of R2; however, since $D_0$ and $D_1$ appear on the rhs of the original clause, they have to be placed on the rhs of the resulting clauses as well.
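Rules R1 to R3 only split axioms, so they can be applied exhaustively with a work list. The Python sketch below is one possible implementation, assuming a tuple encoding of concepts and axioms chosen for illustration; it is not Raccoon's code, and it leaves the renaming rules R4 to R9 aside.

```python
def normalize_r1_r3(axiom):
    """Apply R1, R2 and R3 until none is applicable; return the inclusions."""
    pending, done = [axiom], []
    while pending:
        kind, c, d = pending.pop()
        if kind == "equiv":  # R1: C equiv D  ->  C sub D, D sub C
            pending += [("sub", c, d), ("sub", d, c)]
        elif c[0] == "or":  # R2: C0 or C1 sub D  ->  C0 sub D, C1 sub D
            pending += [("sub", c[1], d), ("sub", c[2], d)]
        elif d[0] == "and":  # R3: C sub D0 and D1  ->  C sub D0, C sub D1
            pending += [("sub", c, d[1]), ("sub", c, d[2])]
        else:
            done.append(("sub", c, d))
    return done


A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
for ax in normalize_r1_r3(("equiv", A, ("and", B, C))):
    print(ax)
```

For $A \equiv B \sqcap C$, R1 first yields two inclusions; R3 then splits $A \sqsubseteq B \sqcap C$, leaving $B \sqcap C \sqsubseteq A$, $A \sqsubseteq B$ and $A \sqsubseteq C$. Each rule strictly shrinks the concepts it splits, so the loop terminates.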

Definition 3.1.4 (Left-Hand-Side Impurity Normalization, R4). The rule R4 applies the notion of renaming to remove a disjunctive impurity $D_0 \sqcup \dots \sqcup D_m$ from the lhs of an inclusion. This is accomplished by creating a new literal N to replace the disjunction in the original clause. A new clause is created stating that N can be obtained from the disjunction $D_0 \sqcup \dots \sqcup D_m$, i.e., that the disjunction is a subset of N.

Definition 3.1.5 (Right-Hand-Side Impurity Normalization, R5). The rule R5 is analogous to R4, but for the rhs. The impurity in this case is a conjunction, because after the Knowledge Base is negated it becomes a disjunction and is no longer in the format stated in (3.5). This rule replaces the impure conjunction $C_0 \sqcap \dots \sqcap C_n$ by a new literal N, and adds a new clause stating that $N \sqsubseteq C_0 \sqcap \dots \sqcap C_n$.
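Rules R4 and R5 rename an impure sub-expression with a fresh concept N. The Python sketch below applies one of them once to an inclusion, under an illustrative tuple encoding; the fresh names N1, N2, ... are a hypothetical naming scheme, and a real implementation would also need to avoid clashes with concept names already present in the ontology.

```python
from itertools import count

_fresh_ids = count(1)


def fresh_concept():
    """Return a new atomic concept N1, N2, ... (hypothetical naming scheme)."""
    return ("atom", f"N{next(_fresh_ids)}")


def flatten(c, op):
    """Split nested binary "and"/"or" nodes into a flat list of operands."""
    if c[0] == op:
        return flatten(c[1], op) + flatten(c[2], op)
    return [c]


def rebuild(parts, op):
    """Fold a list of operands back into nested binary "and"/"or" nodes."""
    result = parts[0]
    for part in parts[1:]:
        result = (op, result, part)
    return result


def rename_impurity(axiom):
    """Apply R4 or R5 once to  lhs sub rhs; return the resulting inclusions."""
    _, lhs, rhs = axiom
    conjuncts = flatten(lhs, "and")
    for i, part in enumerate(conjuncts):
        if part[0] == "or" and len(conjuncts) > 1:  # R4: disjunction inside a lhs conjunction
            n = fresh_concept()
            conjuncts[i] = n
            return [("sub", rebuild(conjuncts, "and"), rhs), ("sub", part, n)]
    disjuncts = flatten(rhs, "or")
    for i, part in enumerate(disjuncts):
        if part[0] == "and" and len(disjuncts) > 1:  # R5: conjunction inside a rhs disjunction
            n = fresh_concept()
            disjuncts[i] = n
            return [("sub", lhs, rebuild(disjuncts, "or")), ("sub", n, part)]
    return [axiom]  # no impurity: R2, R3 or no rule applies


A, B, C, D = (("atom", name) for name in "ABCD")
for ax in rename_impurity(("sub", ("and", A, ("or", B, C)), D)):
    print(ax)
```

For $A \sqcap (B \sqcup C) \sqsubseteq D$, the sketch returns $A \sqcap N \sqsubseteq D$ and $B \sqcup C \sqsubseteq N$, the two statements prescribed by R4 in Table 3.1; the second one is then split further by R2.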

Definition 3.1.6 (Right-Hand-Side Complex Existential Normalization, R6). The rule R6 is suitable to normalize complex concept descriptions inside full existential quantifiers in the form $C \sqsubseteq D \sqcup \exists r.A$, where C, A are complex concept descriptions and D is a