Revision of Boolean Logical Models of Biological Regulatory Networks using Answer-Set Programming


DEPARTMENT OF COMPUTER SCIENCE

FREDERICO PINTO ALEIXO Bachelor in Computer Science

REVISION OF BOOLEAN LOGICAL MODELS OF BIOLOGICAL REGULATORY NETWORKS USING ANSWER-SET PROGRAMMING

MASTER IN COMPUTER SCIENCE

NOVA University Lisbon


Adviser: João Alexandre Carvalho Pinheiro Leite

Associate Professor, NOVA University Lisbon

Co-adviser: Matthias Knorr

Assistant Professor, NOVA University Lisbon


The NOVA School of Science and Technology and the NOVA University Lisbon have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.


Acknowledgements

I would like to start by thanking my supervisors, Professor João Leite and Professor Matthias Knorr. Your guidance, insights and readiness to assist were invaluable in these months. You made me feel like I could count on you to steer me in the right direction come what may, and you were consistently available to help whenever I reached out. I realize how much more arduous this journey could have been without such counsel, and am grateful that because of you it was not so.

A big thank you to FCT/UNL, especially to the Department of Computer Science, for the work they do in shaping their students. The knowledge, confidence, tenacity and sense of duty they have instilled in me for these past years are a big part of what made this work possible.

I’d like to thank my friends Afonso and Bruno, who have been with me for most of my time in this course and on whom I could always count. You were a constant reminder that nobody has to be alone in their undertakings, whatever they may be, and that it is by depending on others and letting others depend on us that we can achieve more than we could ever achieve by ourselves.

I’d also like to thank my friend Manel, for all the moments we shared during these past few months. You are proof that meaningful bonds can be forged where we would least expect them.

A special thanks to Rafa, Canas and Filipe who have stuck with me for all these years.

Hopefully, next year’s lunch will be nearer than half a country’s distance away.

To my friend Tiago, thank you for your support and for all the talks we had. Whatever the future holds, good or bad, I’ll gladly join you to spend yet another afternoon complaining about it. May we share many more such afternoons.

To Mr António, thank you for all you did for me and for the invaluable knowledge, wisdom and advice you imparted onto me at one of the most important stages of my life.

All that hard work did pay off. Here we finally are.

And of course, a huge thank you to my family, for providing me with everything I needed to be where I am today.

Thank you, and may you have great success in all your endeavours!


Abstract

Biological regulatory networks are one of the most prominent tools used to represent complex regulatory cellular processes. Creating computational models of these networks is key to better comprehending the corresponding cellular processes, as they allow for the reproduction of known behaviors, the testing of hypotheses, and the identification of predictions in silico. However, given that the process of constructing and revising such models is mainly a manual one, it is prone to error, and would therefore benefit from automation.

An attempt at solving this problem has already been made using a mixture of Answer Set Programming (ASP) and C++. The previous attempt automated the process of revising these models, by using ASP to verify whether a Boolean logical model of a biological regulatory network was consistent with a given set of experimental observations and, in case of inconsistencies, used C++ to implement an algorithm capable of searching for possible sets of repair operations to render the model consistent.

In our work we propose an alternative solution for this problem, a solution that fully leverages ASP which, being a declarative language tailored for this type of difficult search problem, has proven to be a great tool for both consistency checking and model repair. This is because ASP offers a more intuitive and elaboration-tolerant programming style, which facilitates understanding and modifying the code behind the model revision process. This, coupled with the powerful and exhaustively optimized solving capabilities provided by the state-of-the-art ASP system clingo, has shown that there is great potential in adopting a fully ASP-based approach to aid in the automation of the revision of Boolean logical models.

In this thesis we present the tool that we have developed to automate the process of revising Boolean logical models of Biological Regulatory Networks (BRNs), which uses ASP to search for inconsistencies and perform repairs on these models.

Keywords: Answer Set Programming, (A)Synchronous Dynamics, Boolean Logical Models, Model Revision, Regulatory Networks, Stable States


Biological regulatory networks are among the most prominent tools used to represent complex regulatory cellular processes. Creating computational models of these networks is fundamental to better understanding the corresponding cellular processes, as they make it possible to reproduce known behaviors, test hypotheses and identify predictions in silico. However, since the process of building and revising these models is mainly manual, it is error-prone and would therefore benefit from automation.

An attempt to solve this problem has already been made using a mixture of Answer Set Programming (ASP) and C++. That earlier attempt automated the revision of these models, using ASP to verify whether a Boolean logical model of a regulatory network is consistent with a given set of experimental observations and, when inconsistencies are found, using an algorithm developed in C++ that is capable of finding possible sets of repair operations to render the model consistent.

In our work, we propose an alternative solution to this problem, one that takes full advantage of ASP which, being a declarative language suited to this kind of difficult search problem, has proven to be an excellent tool for both consistency checking and model repair. This is because ASP offers a more intuitive and elaboration-tolerant programming style, which makes it easier to understand and modify the code behind the model revision process. This, together with the powerful and optimized search capabilities offered by the state-of-the-art ASP system clingo, has shown that there is great potential in adopting a fully ASP-based system to help automate the revision of these models.

In this thesis we present the tool we developed to automate the process of revising Boolean logical models of biological regulatory networks (BRN), which uses ASP to search for inconsistencies and perform repairs on these models.

Keywords: Answer Set Programming, (A)Synchronous Dynamics, Boolean Logical Models, Model Revision, Regulatory Networks, Stable States


Contents

List of Figures

List of Tables

List of Listings

Glossary

Acronyms

Symbols

1 Introduction

2 Background

2.1 Answer Set Programming

2.1.1 What is ASP?

2.1.2 Logic Programs

2.1.3 An Overview of Answer Sets

2.1.4 Additional ASP Constructs

2.1.5 Clingo

2.2 Biological Regulatory Networks

2.2.1 Different Modeling Types

2.2.2 Logical Models

2.2.3 Dynamics

2.3 Boolean Regulatory Functions

3 Related Work

3.1 Non-ASP-based approaches in Systems Biology

3.1.1 Network and Model Inference

3.1.2 Model Analysis

3.1.3 Model Revision

3.2 ASP in Systems Biology

3.2.1 Network and Model Inference

3.2.2 Model Analysis

3.2.3 Model Revision

3.3 Relevant Tools

4 Model Revision

4.1 Format of the Inputs

4.1.1 Encoding Boolean Logical Models

4.1.2 Encoding Observations

4.2 Consistency Checking

4.2.1 Overview

4.2.2 Encoding Stable State Consistency

4.2.3 Encoding Time Series Consistency

4.3 Repairing

4.3.1 Overview

4.3.2 Encoding Stable State Repairs

4.3.3 Encoding Time Series Repairs

5 Assessment

5.1 Instances

5.2 Experiments

5.2.1 Stable State Observations

5.2.2 Time-series Observations

5.3 Discussion

5.3.1 Results Analysis

5.3.2 Comparisons with ModRev

6 Conclusion

6.1 Future Work

6.1.1 Function Synthesis

6.1.2 Explainable Repairs

6.1.3 Additional Interaction Schemes

6.1.4 Additional Optimization Criteria

Bibliography

Appendices

A Results


List of Figures

2.1 Basic ideas behind ASP [24].

2.2 General process of automated ASP program solving.

2.3 Result of placing queens on a 4×4 board.

2.4 The process of constructing and revising models for regulatory systems.

2.5 Example of a regulatory graph.

2.6 Example of a Boolean logical model.

2.7 Example of an interaction in a generalised logical model.

2.8 Example of how regulatory functions are selected in the nodes of a Probabilistic Boolean Network (PBN).

2.9 Example of a Sign Consistency Model, where v₁, v₂ and v₃ are the observed nodes, with v₁ being an input node.

2.10 Boolean model for the State Transition Graph (STG) example.

2.11 STG of the model in Figure 2.10, considering a synchronous updating scheme. The left image has input node v₄ inactive, whereas the right image has that node active [32].

2.12 STG of the model in Figure 2.10, considering an asynchronous updating scheme. The left image has input node v₄ inactive, whereas the right image has that node active.

4.1 Boolean logical model defined using the standard notation.

5.1 Revision time of each family of corrupted models, under stable state observations.

5.2 Impact of the original regulator number in function repair times, under stable state observations (Th).

5.3 Impact of the original function node number in function repair times, under stable state observations (Th).

5.4 Impact of the regulator number in function repair times, under stable state observations.

5.5 Impact of the node number in function repair times, under stable state observations.

5.6 Impact of the various repairs applied to the inconsistent functions, under stable state observations.

5.7 Revision time of each family of corrupted models, under synchronous observations (5 experiments, 20 timesteps).

5.8 Impact of regulator number in function repair times, under synchronous observations (5 experiments, 20 timesteps).

5.9 Impact of node number in function repair times, under synchronous observations (5 experiments, 20 timesteps).

5.10 Impact of the various repairs applied to the inconsistent functions, under synchronous observations (5 experiments, 20 timesteps).

5.11 Revision time of each family of corrupted models, under asynchronous observations (5 experiments, 20 timesteps).

5.12 Impact of regulator number in function repair times, under asynchronous observations (5 experiments, 20 timesteps).

5.13 Impact of node number in function repair times, under asynchronous observations (5 experiments, 20 timesteps).

5.14 Impact of the various repairs applied to the inconsistent functions, under asynchronous observations (5 experiments, 20 timesteps).


List of Tables

2.1 Right-leaning enumeration.

2.2 Left-leaning enumeration.

3.1 Noteworthy non-ASP tools.

3.2 Noteworthy ASP tools.

5.1 Boolean models used for evaluation. Displayed is a model’s abbreviation (Abbr.), the model’s name (Model), its number of compounds (#C), number of compound interactions (#I), number of stable states (#SS), average number of regulators (Avg.Reg.), maximum number of regulators (Max.Reg.), and associated reference (Ref.).

5.2 The 24 different corruption configurations used for evaluation. Displayed is an ID for the configuration (Config. #), and the probability (0%-100%) of applying each of the four different corruption operations: (F)unction change, (E)dge sign flip, (R)emove a regulator, and (A)dd a regulator.

5.3 Revision process statistics under stable state observations.

5.4 Revision process times for each family of models, under stable state observations.

5.5 How function repair time varies with the number of regulators in the original function, under stable state observations.

5.6 How function repair time varies with the number of nodes in the original function, under stable state observations.

5.7 Number of times new regulators were added to functions, under stable state observations.

5.8 Number of times regulators were removed from functions, under stable state observations.

5.9 Number of times regulator signs were changed in functions, under stable state observations.

5.10 Number of times regulators were added to function nodes, under stable state observations.

5.11 Number of times regulators were removed from function nodes, under stable state observations.

5.12 Impact of the experiment size and number in revision process times, under synchronous observations (Th).

5.13 Revision process statistics, under synchronous observations (5 experiments, 20 timesteps).

5.14 Revision process times for each family of models, under synchronous observations (5 experiments, 20 timesteps).

5.15 Revision process times for each family of models, under asynchronous observations (5 experiments, 20 timesteps).

5.16 Revision process statistics, under asynchronous observations (5 experiments, 20 timesteps).

5.17 Comparison of model revision success and revision times between our approach and ModRev, for each family of models (for time series observations, comparisons were made using the complete set of observations, with 5 experiments and 20 timesteps).

A.1 Impact of the experiment size and number in revision process times, under synchronous observations (MCC).

A.2 Impact of the experiment size and number in revision process times, under synchronous observations (FY).

A.3 Impact of the experiment size and number in revision process times, under synchronous observations (TCR).

A.4 Impact of the experiment size and number in revision process times, under synchronous observations (SP).

A.5 How function repair time varies with the number of regulators in the original function, under synchronous observations (5 experiments, 20 timesteps).

A.6 How function repair time varies with the number of nodes in the original function, under synchronous observations (5 experiments, 20 timesteps).

A.7 Number of added/removed nodes from repaired functions, under synchronous observations (5 experiments, 20 timesteps).

A.8 Number of times new regulators were added to functions, under synchronous observations (5 experiments, 20 timesteps).

A.9 Number of times regulators were removed from functions, under synchronous observations (5 experiments, 20 timesteps).

A.10 Number of times regulator signs were changed in functions, under synchronous observations (5 experiments, 20 timesteps).

A.11 Number of times regulators were added to function nodes, under synchronous observations (5 experiments, 20 timesteps).

A.12 Number of times regulators were removed from function nodes, under synchronous observations (5 experiments, 20 timesteps).

A.13 Impact of the experiment size and number in revision process times, under asynchronous observations (MCC).

A.14 Impact of the experiment size and number in revision process times, under asynchronous observations (FY).

A.15 Impact of the experiment size and number in revision process times, under asynchronous observations (TCR).

A.16 Impact of the experiment size and number in revision process times, under asynchronous observations (SP).

A.17 Impact of the experiment size and number in revision process times, under asynchronous observations (Th).

A.18 How function repair time varies with the number of regulators in the original function, under asynchronous observations (5 experiments, 20 timesteps).

A.19 How function repair time varies with the number of nodes in the original function, under asynchronous observations (5 experiments, 20 timesteps).

A.20 Number of times new regulators were added to functions, under asynchronous observations (5 experiments, 20 timesteps).

A.21 Number of times regulators were removed from functions, under asynchronous observations (5 experiments, 20 timesteps).

A.22 Number of times regulator signs were changed in functions, under asynchronous observations (5 experiments, 20 timesteps).

A.23 Number of times regulators were added to function nodes, under asynchronous observations (5 experiments, 20 timesteps).

A.24 Number of times regulators were removed from function nodes, under asynchronous observations (5 experiments, 20 timesteps).

List of Listings

2.1 Defining the n×n board and placing n queens on it.

2.2 Result of running the encoding of Listing 2.1 on a 2×2 board.

2.3 Encoding of n-queens completed with the missing problem constraints.

2.4 Result of running the completed encoding of Listing 2.3 on a 4×4 board.

2.5 Final optimized encoding of n-queens.

4.1 Listing with the model in Figure 4.1 represented in ASP.

4.2 Listing for model 4.1’s synchronous observations with 111 as the initial state.

4.3 Listing representing model 4.1’s 011 stable state.

4.4 Listing for the stable state consistency encoding.

4.5 Listing for synchronous consistency.

4.6 Listing for asynchronous consistency.

4.7 Listing for model 4.1’s encoding, fixing some regulators.

4.8 Listing for repairs under stable observations.

4.9 Listing for repairs under synchronous observations.

4.10 Listing for repairs under asynchronous observations.


Glossary

attraction basin: The attraction basin of an attractor is the set of all network states that lead to that attractor.

hypergraph: A directed and signed hypergraph H = (V, E) is a generalization of a directed and signed graph G = (V, A), where V is the set of nodes and E the set of hyperedges. While edges in A connect pairs of nodes a, b ∈ V, hyperedges in E connect pairs of sets of nodes S, T ⊆ V.

implicant: A conjunction of literals P is an implicant of a Boolean function F, denoted P ≤ F, if P implies F (i.e., whenever P takes the value 1 so does F).

in silico: An in silico experiment is an experiment performed on a computer or by computer simulation.

model reduction: Model reduction is the act of decreasing the computational complexity of a model, while preserving some important properties of the original model.

prime implicant: A prime implicant of a function is an implicant which cannot be condensed any further with other terms in the expression.

systems biology: Systems biology is a scientific approach that combines the principles of engineering, mathematics, physics, and computer science with extensive experimental data, with the goal of analysing and modeling complex biological systems.

Acronyms

ASP Answer Set Programming

BCF Blake Canonical Form

BDD Binary Decision Diagrams

BN Boolean network(s)

BRN Biological Regulatory Network(s)

DNF Disjunctive Normal Form

ILP Integer Linear Programming

PBN Probabilistic Boolean Network

PKN Prior Knowledge Network(s)

SAT Boolean satisfiability problem

SCM Sign Consistency Model

SMT Satisfiability Modulo Theories

STG State Transition Graph

SVM Support Vector Machines


Symbols

B the set {0, 1}

Bⁿ the n-dimensional Cartesian product of the set B

V an n-dimensional set of variables representing the regulatory compounds of a network

1 Introduction

The field of systems biology has been flourishing for the last two decades. It draws on the principles of engineering, mathematics, physics, and computer science to model real-life biological systems, in an attempt to understand them through a holistic lens that yields insights about these systems that were simply not possible before [70]. One of the cornerstones of this field is the ability to model real-world biological systems accurately, with the ultimate goal of acquiring a better understanding of the complex processes that take place in cells, as doing so may enable new discoveries and theories about living organisms. By using Biological Regulatory Networks (BRNs) to represent such processes, we are able to computationally generate models that allow for the emulation of the patterns and behaviors exhibited by the real-world systems, as well as the testing of hypotheses and the identification of predictions in silico [36].

The creation of these models has already been extensively explored, as we will see in Chapter 3, with many distinct algorithms proposed, tried and tested, and tools developed that expedite this process [58, 63, 51, 66, 81]. Moreover, the analysis of regulatory models is also a well-explored topic, and there exist many alternatives that can assist and automate it, thus minimizing human error [48, 73, 14, 38, 2] (we shall discuss these and other approaches in greater detail when analysing the related work). However, there is still a great need for automation in model revision, as it relies too much on manual intervention, which makes the process cumbersome, slow and error-prone.

The process of revising models of regulatory networks is an essential one: as these models are updated or new experimental data is gathered, it is imperative to ensure that the models remain consistent with the newly obtained data. To that end, inconsistencies must be identified and corrected. This is often a manual process carried out by domain experts, but this approach is far from efficient. Because there can be a substantial number of different options when building or revising such models, we face a non-trivial combinatorial problem, much too difficult to be solved optimally through manual means alone. Moreover, the models themselves can generate vast amounts of dynamical behavior, which further complicates the task of revision, as this generated behavior must not deviate from the behavior observed in the data of the real-world system being modeled.

If tools are created that can automate model revision, then biologists could use them to achieve their goals faster, more accurately, and with a better allocation of human resources. As such, model revision is an area of systems biology that still has much ground to cover, and the development of new tools focused on automating its processes can greatly enhance the quantity and quality of the results obtained by biologists [64].

In our work we have produced one such tool, by using Answer Set Programming (ASP) to identify inconsistent Boolean logical models of biological networks and search for repairs that will render them consistent, in an automated manner. Because ASP is a form of declarative programming oriented towards difficult search problems, developing an ASP-based solution made it easier to understand and change the underlying mechanisms that govern how such models are verified and repaired. This is due not only to the declarative and intuitive nature of ASP encodings, but also to elaboration tolerance [47] being a part of its essence, which allows the rules that govern those mechanisms to be easily altered. Moreover, by leveraging the state-of-the-art ASP system clingo, it was possible to create efficient encodings for both consistency checking and model repair, thanks to the exhaustively optimized underlying architecture of this system.

As we will see in greater detail in Subsection 3.2.3, while several approaches to automate model revision have been proposed throughout the years, to our knowledge no solution has been proposed that manages to fully leverage ASP and the level of expression allowed by Boolean logical models to perform such revisions. Previous attempts at automating model revision either use less expressive modeling techniques [23, 50], run into scenarios where revisions cannot be accurately obtained [52], or do not consider all the possible Boolean functions when repairing inconsistent models [44]. An attempt to use ASP to automate model revision in Boolean logical models was previously made, which culminated in the creation of the ModRev [32] tool. There, while ASP was used to check the consistency of models, C++ was used to perform the repairs.

The approach we present here is a highly flexible, easy-to-understand and efficient way of revising Boolean logical models. It can work with three distinct types of observations (stable state, synchronous, and asynchronous), and always provides optimal repairs when dealing with inconsistent models. Moreover, it allows the optimization criteria used when repairing functions to be extensively customized, which further expands the utility of the tool. Because a declarative language was used to implement consistency checking and repairing, biologists should have a much easier time reading, checking the correctness of, and modifying this solution than they would with a more imperative, less transparent approach.

In order to grasp how ASP has already been employed in the field of systems biology to tackle the creation, analysis and revision of BRNs, we will explore in some detail the work done in each of these three branches of the field. We will start by looking at solutions that do not leverage ASP, in Section 3.1. Then, in Section 3.2, we will see how ASP was used to make up for some of the limitations of these solutions, and to tackle some issues more efficiently than previous approaches. Subsection 3.2.3 presents some of the past work on model revision using ASP, with a special focus on the shortcomings of these approaches.

Before doing so, we will start by discussing ASP and BRNs in detail in Chapter 2, as doing so is crucial to better comprehend the essential concepts and ideas surrounding these two cornerstones of the work we present. We then delve into the related work in Chapter 3 to see the impact that ASP has had in the realm of systems biology. We discuss the inner workings of our solution in Chapter 4, by going over the encodings used for consistency checking and repairs, and how they are brought together by Python to create a tool capable of performing model revision. Then, in Chapter 5, we test our solution and discuss the observed results, as well as compare it to ModRev. Lastly, in Chapter 6, we succinctly highlight the features of the developed tool and the main conclusions, and present some possible future directions in which this work could be taken.

2 Background

In the following sections we will review some of the essential concepts required to understand our problem. We will start by exploring Answer Set Programming (ASP) and seeing how we can use clingo to encode and solve a problem of some complexity, with varying degrees of optimization, to showcase how different encodings of the same problem can vary greatly in how efficiently they compute results, as well as to demonstrate clingo’s powerful modeling and solving capabilities. Then, we will explain what a Biological Regulatory Network (BRN) is, how we can model it and how we can leverage its dynamics to look for interesting properties and behaviors, as they play a crucial role in ensuring correct model revisions.

2.1 Answer Set Programming

2.1.1 What is ASP?

ASP is a form of declarative programming oriented towards difficult, primarily NP-hard, search problems [45]. It has become an attractive paradigm for knowledge representation and reasoning, thanks to its appealing combination of rich, yet simple modeling languages with powerful solving engines [27]. It is based on the stable model (or answer set) semantics of logic programming [29].

The basic idea behind ASP is to provide a specification of our problem by encoding it as a set of rules. Using that encoding, one or more answer sets are computed, which we then interpret in order to obtain our solution(s). Figure 2.1 illustrates this process.

[Diagram: problem → (modeling) → logic program → (computation) → answer set(s) → (interpretation) → solution(s)]

Figure 2.1: Basic ideas behind ASP [24].

In the next few sections we will cover how a logic program is defined, how it is then processed and solved, and lastly how we can use the ASP system clingo to encode one such program, using several modelling techniques which all produce the same answer sets but vary greatly in performance.


2.1.2 Logic Programs

A logic program can be defined as a finite set of rules, each of the form of the following rule r:

a₀ ← a₁, ..., a_m, not a_{m+1}, ..., not a_n.

where each a_i (0 ≤ i ≤ n, with 0 ≤ m ≤ n) is named an atom. An atom takes the form p(t₁, ..., t_k), where p is a predicate symbol of arity k, and t₁, ..., t_k are terms, built from the variables and constants of the (implicit) language of the program containing r. The atom to the left of the arrow (←) is the head of the rule, whereas the (possibly negated) atoms to the right of the arrow form the body of the rule. We call both an atom a and its negation, not a, literals.

We say that the head of r is a₀ (head(r) = a₀), and the body of r is the set of all literals that occur in its body, also known as body literals (body(r) = {a₁, ..., a_m, not a_{m+1}, ..., not a_n}).

The intuitive reading of rule r is that head(r) must be true if body(r) holds (i.e. head(r) must be true if every body literal in body(r) is true). The commas separating the expressions in the body are similar to the conjunction symbol ∧ (“and”) in logical formulas [27].

This formalism allows us to write rules whose body can never be true, which is especially useful when we do not want a given condition to happen. This type of rule is named an integrity constraint, and is written as such:

← a₁, ..., a_m, not a_{m+1}, ..., not a_n.

where the head of the rule is intentionally left blank, signifying that it is false (⊥). If the head is false then, for the rule to be satisfied, the body of the rule can never be true.

We can also leave the body of a rule blank, signifying that it is always true. If the body is always true (⊤), then so is the head of the rule. Such rules are called facts, and can be represented by a rule with no body (in which case the ← can simply be omitted):

a₀.

Lastly, we define as body⁺(r) the set of positive body literals of rule r, and as body⁻(r) the set of negative body literals. If body⁻(r_k) = ∅ holds for every rule r_k of a program, then we say that the program is positive.
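To make these definitions concrete, the following Python sketch shows one possible in-memory representation of ground rules. The class name `Rule`, its fields, and the three-rule example program are illustrative assumptions of ours and do not come from any ASP system; an empty head is encoded as `None`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """A ground rule: head <- pos_1, ..., pos_m, not neg_1, ..., not neg_k."""
    head: Optional[str]                  # None encodes an empty head (integrity constraint)
    pos: FrozenSet[str] = frozenset()    # body+(r): atoms occurring positively
    neg: FrozenSet[str] = frozenset()    # body-(r): atoms occurring under "not"

    def is_fact(self) -> bool:
        # A fact has a head and an empty body.
        return self.head is not None and not self.pos and not self.neg

    def is_constraint(self) -> bool:
        # An integrity constraint has no head.
        return self.head is None


def is_positive(program) -> bool:
    """A program is positive if no rule has a negative body literal."""
    return all(not rule.neg for rule in program)


# Hypothetical example:  a.   b <- a, not c.   <- b, c.
example = [
    Rule("a"),
    Rule("b", pos=frozenset({"a"}), neg=frozenset({"c"})),
    Rule(None, pos=frozenset({"b", "c"})),
]
```

Here only the second rule contains a negative literal, so the example program as a whole is not positive, whereas the sub-program consisting of the fact alone is.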

2.1.3 An Overview of Answer Sets

As we have seen in Section 2.1.1, ASP programs can have one or more answer sets (or stable models). To better comprehend what an answer set is, let us take a look at the following example. Consider the following logical formula:

q ∧ (q ∧ ¬r → s)

This formula has three (classical) models: {q, s}, {q, r} and {q, r, s} (in simple terms, a model of a formula is a set of atoms that make the formula evaluate to true).
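The three classical models can be checked by brute force over all truth assignments. The sketch below is only an illustration; the names `ATOMS`, `formula` and `models` are our own.

```python
from itertools import product

ATOMS = ("q", "r", "s")


def formula(q: bool, r: bool, s: bool) -> bool:
    # q AND ((q AND NOT r) -> s), where x -> y is rewritten as (not x) or y
    return q and ((not (q and not r)) or s)


# Collect, for every satisfying assignment, the set of atoms that are true.
models = [
    {atom for atom, value in zip(ATOMS, values) if value}
    for values in product([False, True], repeat=len(ATOMS))
    if formula(*values)
]
```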


Now, we will take our formula and turn it into a logic program:

q.
s ← q, not r.

The above program has one answer set: {q, s}. Informally, a set of atoms is an answer set of a logic program if it is one of the (classical) models of that program, and if all atoms in the set are justified by some rule in the program. The atoms in the answer set {q, s} are justified because, according to the first rule in the program, q must always be true. In the second rule, since we have no way of justifying r, we apply default negation (also known as negation as failure) and assume it to be false. If q is true and r is false, then the body of the second rule is true, and thus we can justify s.

While this example is not difficult to understand, more interesting programs will involve more complex predicates, which will usually accept one or more terms (predicates with arity greater than zero). Let us take a look at a more complex example using predicates of higher arities, in program P below:

q(1,2).
r(X) ← q(X, Y), not r(Y).

Here, q and r are predicates of arity two and one, respectively. In order to understand how we can obtain the answer set(s) of the above program P, let us first take a look at Gelfond and Lifschitz’s formal definition of stable model (answer set) [29].

Definition 2.1.1 (Stable Model). Let Π be a logic program. We assume that each rule containing variables is replaced by all its ground instances, so that all atoms in Π are ground. For any set M of atoms from Π, let Π_M be the program obtained from Π by deleting:

(i) each rule that has a negative literal not B in its body with B ∈ M;

(ii) all negative literals in the bodies of the remaining rules.

Clearly, Π_M is negation-free. If Π_M has at least one Herbrand model, then Π_M has a unique minimal Herbrand model. If the unique minimal Herbrand model coincides with M, then we say that M is a stable set of Π.

With this definition in mind, let us try to calculate the answer sets of the program above. First, we must ground it. Grounding is the process of replacing the variables in the program with ground terms. In our particular example, the two ground terms are 1 and 2. The result of this process would be:

q(1,2).
r(1) ← q(1,1), not r(1).
r(1) ← q(1,2), not r(2).
r(2) ← q(2,1), not r(1).
r(2) ← q(2,2), not r(2).
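The grounding step for this particular rule can be mimicked in a few lines of Python. This is a sketch only: the triple representation (head, positive body, negative body) and the names `CONSTANTS` and `ground_rule` are illustrative assumptions, and a real grounder such as gringo is considerably more sophisticated.

```python
from itertools import product

CONSTANTS = (1, 2)  # the ground terms occurring in program P


def ground_rule():
    """Instantiate r(X) <- q(X,Y), not r(Y) for every pair (X, Y) of constants."""
    return [
        (f"r({x})", [f"q({x},{y})"], [f"r({y})"])
        for x, y in product(CONSTANTS, repeat=2)
    ]


# The fact q(1,2) is already ground; the rule yields four ground instances.
ground_program = [("q(1,2)", [], [])] + ground_rule()

for head, pos, neg in ground_program:
    body = pos + [f"not {atom}" for atom in neg]
    print(head + (" <- " + ", ".join(body) if body else "") + ".")
```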

Then, we need to test with different sets M in order to find the stable models. Let us start by considering that M = {r(2)}. In this case, Π_M would be:

q(1,2).
r(1) ← q(1,1).
r(2) ← q(2,1).


The minimal Herbrand model of this program is {q(1,2)}. The Herbrand models of a program are the models whose universe is the set of ground terms of the language of that program, and whose object and function constants are interpreted in such a way that every ground term denotes itself [29]. For instance, in the above program each Herbrand model would contain a combination of the atoms q(1,2), r(1), q(1,1), r(2) and q(2,1). Of these atoms, the ones that occur in every single Herbrand model form the minimal Herbrand model. Since q(1,2) is the only atom that is always true and, therefore, must be included in every model of the program, we know that the minimal Herbrand model is {q(1,2)}. Since {q(1,2)} is different from M = {r(2)}, M is not stable and we must try with a different set M.

If we test with M = {q(1,2), r(1)}, we obtain the following Π_M:

q(1,2).
r(1) ← q(1,2).
r(2) ← q(2,2).

Here, the minimal Herbrand model is {q(1,2), r(1)}, and therefore {q(1,2), r(1)} is a stable model of the initial program P. It is, in fact, the only answer set of P.
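The manual procedure above (build the reduct Π_M, compute its minimal model, compare it with M) can be automated naively by trying every candidate set M. The following sketch does exactly that for the ground version of P; the function names are our own, and the exhaustive enumeration is only meant for tiny programs, not as a description of how an actual solver works.

```python
from itertools import chain, combinations

# Ground program P as (head, positive body, negative body) triples.
PROGRAM = [
    ("q(1,2)", (), ()),
    ("r(1)", ("q(1,1)",), ("r(1)",)),
    ("r(1)", ("q(1,2)",), ("r(2)",)),
    ("r(2)", ("q(2,1)",), ("r(1)",)),
    ("r(2)", ("q(2,2)",), ("r(2)",)),
]


def reduct(program, m):
    """Drop rules with 'not B' where B is in m; strip remaining negative literals."""
    return [(head, pos) for head, pos, neg in program if not set(neg) & m]


def least_model(positive_program):
    """Minimal model of a negation-free program, computed by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_program:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model


def stable_models(program):
    """Return every set M of atoms whose reduct has M as its minimal model."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in chain(pos, neg)})
    return [
        set(subset)
        for size in range(len(atoms) + 1)
        for subset in combinations(atoms, size)
        if least_model(reduct(program, set(subset))) == set(subset)
    ]
```

Calling `stable_models(PROGRAM)` tests all 64 subsets of the six ground atoms and confirms that {q(1,2), r(1)} is the only stable model.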

2.1.4 Additional ASP Constructs

Now that we have a grasp of some of the basics of ASP, it is opportune that we take a look at some more language constructs that will be useful in our encodings.

2.1.4.1 Choice Rules

One such construct is a choice rule. A choice rule is of the form

{a₁, ..., a_m} ← a_{m+1}, ..., a_n, not a_{n+1}, ..., not a_o.

where 0 ≤ m ≤ n ≤ o and each a_i is an atom for 1 ≤ i ≤ o.

A choice rule’s purpose is to express choices over subsets of atoms. If the body literals of the rule are satisfied, then any subset of its head atoms can be a part of the stable model.

For example, let us say we have the following program:

a.
{b} ← a.

In this program, we would have two stable models: {a}, and {a,b}.

2.1.4.2 Cardinality Constraints

Expanding upon choice rules, let us extend that concept to feature cardinality constraints.

Consider the following rule r:

l {a₁, ..., a_m} u ← a_{m+1}, ..., a_n, not a_{n+1}, ..., not a_o.

where 0 ≤ m ≤ n ≤ o, each a_i is an atom for 1 ≤ i ≤ o, and l and u are non-negative integers. We call the head of this rule a cardinality constraint.


Essentially, this is a choice rule whose selection is limited by the associated lower and upper bounds in the head. For instance, let us say we have the following program:

a.
1 {b, c} 2 ← a.

Here, we would have three stable models: {a,b}, {a,c} and {a,b,c}.
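The stable models of the two small programs above can be reproduced by enumerating the admissible subsets of the head atoms. The sketch below is a simplification that is valid here only because the body atom a is a fact, so the rule body always holds; `choice_candidates` is a hypothetical helper name.

```python
from itertools import combinations


def choice_candidates(atoms, lower=0, upper=None):
    """All subsets of `atoms` whose size lies between `lower` and `upper`."""
    upper = len(atoms) if upper is None else upper
    return [
        set(subset)
        for size in range(lower, upper + 1)
        for subset in combinations(atoms, size)
    ]


# a.  {b} <- a.        (plain choice rule: any subset of {b})
choice_models = [{"a"} | subset for subset in choice_candidates(["b"])]

# a.  1 {b, c} 2 <- a. (between one and two atoms out of {b, c})
cardinality_models = [{"a"} | subset for subset in choice_candidates(["b", "c"], 1, 2)]
```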

As a final remark, it should be noted that when cardinality constraints appear in the head of a rule (as in rule r), those rules can also be referred to simply as generators.

2.1.5 Clingo

As we can imagine, the process of grounding and then solving for answer sets can become very time consuming as we consider more complex programs. Thankfully, tools that automate this process exist and we can leverage them to solve ASP encodings in a much more timely manner. The general process of solving an ASP program using such tools is illustrated in Figure 2.2.

Figure 2.2: General process of automated ASP program solving. (Diagram: logic program(s) → grounder → variable-free program → solver → answer set(s).)

Logic programs are given as inputs to a grounder, which is responsible for performing the grounding process we have seen in Subsection 2.1.3. The resulting variable-free program is then fed to a solver, which will compute the answer sets of the program and output them as a result.

In this work, we will be using the state-of-the-art ASP system clingo [25]. This system is composed of a grounder, gringo [26], and a solver, clasp [28]. Besides providing a grounder and a solver, clingo also offers additional flexibility when it comes to the encoding of ASP programs, allowing for the usage of more complex constructs such as conditional literals and aggregates, which we will explore in more detail in the following subsection.

2.1.5.1 Modeling

Here, we will go over a possible ASP encoding for a problem of some complexity using clingo. We will start by building a not very efficient, but intuitive and correct solution, and then improve upon it by making it more computationally efficient, by employing some advanced modeling techniques. This will allow us to gauge how different ASP encodings of the same problem can make a great difference when it comes to computational efficiency, as well as showcase some of clingo’s more complex operations.

When modeling a problem using ASP, the basic approach consists of following a generate-and-test (also known as guess-and-check) methodology, which consists of a “generating” part responsible for providing solution candidates, and a “testing” part that is in charge of eliminating candidates that violate some requirements. We will be making use of this methodology as we model this problem.

The following example was adapted from the solution iterations for the n-queens problem, taken from the book Answer Set Solving in Practice [24]. In the n-queens problem, the goal is to place n queens on an n×n chess board, such that no two queens attack each other. A queen can attack another queen if they share the same row, column, or diagonal.

We will start by defining the board, consisting of n rows and n columns. Then, we will define a rule that will place n queens on this board. The resulting encoding can be found in Listing 2.1.

Listing 2.1: Defining the n×n board and placing n queens on it.

1 row(1..n).
2 col(1..n).
3
4 n {queen(I,J) : col(I), row(J)} n.

Let us start by analyzing the syntax used to accomplish this. Lines 1 and 2 represent our board, with the predicates row and col. Instead of specifying each individual row and column (row(1), col(1), ..., row(n), col(n)), clingo facilitates this process by allowing the usage of 1..n. This refers to an interval abbreviating distinct facts over the values from 1 to n. Using the notation from 2.1.2, it would be equivalent to having the facts ‘row(1) ←, ..., row(n) ←’ representing each row, and another n facts representing the columns.

Line 4 makes use of some of clingo’s more complex constructs. First, we have what is called a conditional literal, ‘queen(I, J) : col(I), row(J)’. The purpose of this construct is to govern the instantiation of the head literal ‘queen(I, J)’ through the literals ‘col(I), row(J)’. In practice, I and J are replaced during the grounding process by the values of the fact terms already defined in lines 1 and 2. For instance, ‘col(1), row(1)’ will instantiate ‘queen(1,1)’, and ‘col(1), row(2)’ will instantiate ‘queen(1,2)’, and so on.

Because we want to instantiate n queens only, we place this conditional literal in a cardinality constraint: ‘n {queen(I, J) : col(I), row(J)} n’. As we have seen in Subsection 2.1.4.2, cardinality constraints restrict the number of literals that are instantiated. In the general case, the value before the left curly bracket dictates the minimum number of literals to instantiate, and the value after the right curly bracket dictates the maximum. In our particular case, since we want exactly n queens, these values are the same on both sides.

In Listing 2.2 we can see the result obtained from running the encoding of Listing 2.1 on a 2×2 board.

Listing 2.2: Result of running the encoding of Listing 2.1 on a 2×2 board.

clingo queens.lp --const n=2 0
clingo version 4.5.4
Reading from queens.lp
Solving...
Answer: 1
row(1) row(2) col(1) col(2) queen(1,2) queen(2,2)
Answer: 2
row(1) row(2) col(1) col(2) queen(1,1) queen(2,2)
SATISFIABLE

Models : 6
Calls : 1
Time : 0.021s (Solving: 0.02s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.016s

As we can see, six stable models were found, which actually represent every single way two queens could be placed on a 2×2 board. Given our initial problem description, a correct encoding should not have found any models, since it is impossible to place two queens on a 2×2 board without them sharing a row, column or diagonal. So, we have to update our encoding with the constraints that prevent this from happening. The complete encoding can be found in Listing 2.3.
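The number of candidates produced by the unconstrained "generating" part can be double-checked outside of clingo: choosing n squares out of n² gives C(n², n) placements, which is 6 for n = 2. The following sketch, with the hypothetical helper `candidate_placements`, enumerates them explicitly.

```python
from itertools import combinations, product
from math import comb


def candidate_placements(n):
    """Every way of choosing n distinct squares on an n x n board (no constraints)."""
    squares = list(product(range(1, n + 1), repeat=2))
    return list(combinations(squares, n))


placements = candidate_placements(2)
print(len(placements), comb(2 * 2, 2))
```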

Listing 2.3: Encoding of n-queens completed with the missing problem constraints.

1 row(1..n).
2 col(1..n).
3
4 n {queen(I,J) : col(I), row(J)} n.
5
6 :- queen(I,J), queen(I,JJ), J != JJ.
7 :- queen(I,J), queen(II,J), I != II.
8 :- queen(I,J), queen(II,JJ), (I,J) != (II,JJ), I-J == II-JJ.
9 :- queen(I,J), queen(II,JJ), (I,J) != (II,JJ), I+J == II+JJ.
10
11 #show queen/2.

In this updated encoding, lines 6-9 are integrity constraints much like the ones we have seen in 2.1.2: they are rules in which the head is left blank (with the head here being to the left of :- instead of to the left of ←), and so their bodies can never be true.

Line 6 is responsible for ensuring that no two queens can be placed on the same column in different rows. Line 7 ensures that no two queens can be placed on the same row in different columns. Lastly, lines 8 and 9 are both related to the diagonal restriction. Line 8 guarantees that no queens can be placed on squares that are on the same left-leaning diagonal as another queen (squares sharing the value I−J), and line 9 analogously guarantees the same for the right-leaning diagonal (squares sharing the value I+J).

Finally, line 11 is a directive that advises the solver to project stable models onto instances of predicate queen with arity two. This makes the outputted result less cluttered, as only the predicate queen will be shown when the stable models are computed. The result of running this encoding on a 4×4 board can be seen in Listing 2.4 (2×2 and 3×3 boards yield no stable models, as it is impossible to place queens that comply with the problem’s restrictions on these boards).

Listing 2.4: Result of running the completed encoding of Listing 2.3 on a 4×4 board.

clingo queens.lp --const n=4 0
clingo version 4.5.4
Reading from queens.lp
Solving...
Answer: 1
queen(1,2) queen(2,4) queen(3,1) queen(4,3)
Answer: 2
queen(1,3) queen(2,1) queen(3,4) queen(4,2)
SATISFIABLE

Models : 2
Calls : 1
Time : 0.009s (Solving: 0.01s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s

As we can see, this complete encoding produces results as expected. Figure 2.3 displays what the two different queen placements would look like. Even though our solution is now correct, if we try to use higher values of n, such as n = 100, then even after 300 seconds we would not be able to obtain all the answer sets (for reference, calculating for n = 4 took about one hundredth of a second, as evidenced by Listing 2.4). This is because our encoding is not very efficient, and there are many optimizations we can apply that will make both the grounding and solving processes much faster.

Figure 2.3: Result of placing queens on a 4×4 board.
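The generate-and-test reading of Listing 2.3 can also be mirrored directly in Python, which gives an independent check of the two placements reported in Listing 2.4. This is a naive sketch for small n only; the function names `attacks` and `queens` are our own, and the approach bears no relation to how clingo actually searches.

```python
from itertools import combinations, product


def attacks(a, b):
    """Two queens attack each other if they share a line or a diagonal."""
    (i, j), (ii, jj) = a, b
    return i == ii or j == jj or i - j == ii - jj or i + j == ii + jj


def queens(n):
    squares = list(product(range(1, n + 1), repeat=2))
    solutions = []
    for placement in combinations(squares, n):              # "generate" part
        if not any(attacks(a, b)
                   for a, b in combinations(placement, 2)):  # "test" part
            solutions.append(placement)
    return solutions


for solution in queens(4):
    print(" ".join(f"queen({i},{j})" for i, j in solution))
```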

A first improvement can be obtained by eliminating symmetric rules that are obtained from the same constraint after the grounding process is complete. For instance, the integrity constraint ‘:- queen(I, J), queen(I, JJ), J != JJ’ in line 6 of Listing 2.3 will give rise to ground instances ‘:- queen(3,1), queen(3,2)’ and ‘:- queen(3,2), queen(3,1)’, both of which prevent the exact same placement of queens. We can remove this redundancy by replacing the inequality ‘J != JJ’ with ‘J < JJ’, not only in line 6 but also in the other integrity constraints. While this change almost halves the number of ground instances obtained from the four integrity constraints of lines 6-9, the resulting encoding still scales poorly and gives rise to over 1.5 million rules for the 100-queens problem.
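The effect of this symmetry breaking on line 6 alone can be illustrated by counting the admissible variable bindings. The sketch below simply counts triples (I, J, JJ) under each comparison; it is an idealized count under our own assumptions, since the number of rules actually emitted by a grounder also depends on its internal simplifications.

```python
def count_line6_instances(n, strict):
    """Count bindings of (I, J, JJ) over 1..n with J < JJ (strict) or J != JJ."""
    count = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for jj in range(1, n + 1):
                if (j < jj) if strict else (j != jj):
                    count += 1
    return count


for n in (4, 8):
    print(n, count_line6_instances(n, strict=False), count_line6_instances(n, strict=True))
```

For this constraint the count drops from n·n·(n−1) to n·n·(n−1)/2, i.e. it is exactly halved, while remaining cubic in n.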


A more thorough analysis of the integrity constraints reveals that all of them give rise to a cubic number of ground instances, which is to say that they produce O(n³) ground rules. We can drastically reduce this number by replacing the rule restricting placements in columns in line 6, ‘:- queen(I, J), queen(I, JJ), J < JJ’, with the rule ‘:- col(I), not 1 {queen(I, J)} 1’. This rule asserts that, for each column, there has to be one and exactly one queen. Since we will have one of these rules for each of the n columns, we will be producing O(n) rules (each of size O(n)) as opposed to O(n³) as before. Similarly, the rule restricting placements in rows in line 7 can also be replaced with the rule ‘:- row(J), not 1 {queen(I, J)} 1’. It should also be noted that these two rules already imply that there has to be one and exactly one queen per column and row, respectively. As such, we can remove the redundant cardinality constraint in line 4, keeping ‘{queen(I, J) : col(I), row(J)}’.

As for the integrity constraints regarding the diagonals in lines 8 and 9, they require a more ingenious solution. In order to make use of cardinality constraints to optimize them, we will need to adopt an enumeration scheme. The idea is to enumerate diagonals in two ways, once from the upper right to the lower left (for the right-leaning diagonals), and analogously from the upper left to the lower right (for the left-leaning diagonals).

Tables 2.1 and 2.2 illustrate this for n = 4.

      1  2  3  4
  1   1  2  3  4
  2   2  3  4  5
  3   3  4  5  6
  4   4  5  6  7

Table 2.1: Right-leaning enumeration.

      1  2  3  4
  1   4  3  2  1
  2   5  4  3  2
  3   6  5  4  3
  4   7  6  5  4

Table 2.2: Left-leaning enumeration.

A number in these tables indicates the respectively numbered diagonal. We can easily identify each right-leaning and left-leaning diagonal by making use of the equations D = I+J−1 and D = I−J+n, respectively. For example, the first equation allows us to determine that diagonal 2 is defined by positions (1,2) and (2,1), as can be seen in Table 2.1. With these equations, we can now replace the rule in line 8 with ‘:- D = 1..n*2-1, not {queen(I, J) : D == I-J+n} 1’, and the rule in line 9 with ‘:- D = 1..n*2-1, not {queen(I, J) : D == I+J-1} 1’.
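The two numbering equations can be checked against Tables 2.1 and 2.2 with a short computation. In the sketch below, i indexes the table rows and j the table columns; the function names are illustrative choices of ours.

```python
def right_leaning(i, j):
    # D = I + J - 1
    return i + j - 1


def left_leaning(i, j, n):
    # D = I - J + n
    return i - j + n


n = 4
table_2_1 = [[right_leaning(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
table_2_2 = [[left_leaning(i, j, n) for j in range(1, n + 1)] for i in range(1, n + 1)]

# Squares lying on right-leaning diagonal number 2.
diagonal_2 = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
              if right_leaning(i, j) == 2]
```

Both schemes produce numbers in the range 1..2n−1, which is why the rewritten constraints range over D = 1..n*2-1.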

In doing so, the 100-queens problem can now be solved in under a second, generating a mere 1494 rules as opposed to the over 1.5 million rules previously observed. Despite this, the grounding time still does not scale appropriately. If we were to try n = 500, then grounding would still take 25 seconds.

By making use of clingo’s --verbose option, which prints additional information during computation, we are able to pinpoint the source of the problem: the newly-replaced lines 8 and 9. It turns out that, during grounding, the tests ‘D == I-J+n’ and ‘D == I+J-1’ are repeatedly computed for the same values of I and J. To prevent this, we can precalculate both conditions by making use of two new rules: ‘d1(I, J, I-J+n) :- col(I), row(J)’ and ‘d2(I, J, I+J-1) :- col(I), row(J)’. We can now update the newly-replaced lines 8 and 9, and substitute the conditions that are being repeatedly computed in the conditional literals of these lines by ‘d1(I, J, D)’ and ‘d2(I, J, D)’, respectively. The result of this can be found in Listing 2.5, which presents the fully optimized version.

Listing 2.5: Final optimized encoding of n-queens.

1 row(1..n).
2 col(1..n).
3
4 {queen(I,J) : col(I), row(J)}.
5
6 :- col(I), not 1 {queen(I,J)} 1.
7 :- row(J), not 1 {queen(I,J)} 1.
8 :- D = 1..n*2-1, not {queen(I,J) : d1(I,J,D)} 1.
9 :- D = 1..n*2-1, not {queen(I,J) : d2(I,J,D)} 1.
10
11 d1(I,J,I-J+n) :- col(I), row(J).
12 d2(I,J,I+J-1) :- col(I), row(J).
13
14 #show queen/2.

While introducing lines 11 and 12 does result in a quadratic number of additional facts being computed when compared to the previous solution, because their computation is straightforward and exploits indexing techniques known from database systems, the grounding time is significantly reduced. Compared to the previous solution applied in the 500-queens problem, which would compute 3997 ground rules in 25 seconds, this new approach computes 503997 ground rules in just 3 seconds. This goes to show how factoring out relations as was done in lines 11 and 12 can actually accelerate grounding times, despite the fact that more rules are being generated. Testing with higher values of n, such as 1000, further displays the efficiency of this solution, with grounding time taking 19 seconds and solving time 52 seconds. The previous solution would take 195 seconds to ground this program, and 73 seconds to solve it, while the other iterations were unable to complete the grounding altogether under the 300 second timeout.
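The trade-off described here, namely computing a quadratic number of facts once so that each diagonal can afterwards be looked up directly, resembles building an index in ordinary code. The following Python analogy is only an illustration under our own assumptions (the names `build_diagonal_index` and `squares_by_scan` are hypothetical) and says nothing about gringo's internal data structures.

```python
from collections import defaultdict


def build_diagonal_index(n):
    """Precompute, once, which squares lie on each numbered diagonal."""
    d1 = defaultdict(list)   # analogue of d1(I,J,I-J+n)
    d2 = defaultdict(list)   # analogue of d2(I,J,I+J-1)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            d1[i - j + n].append((i, j))
            d2[i + j - 1].append((i, j))
    return d1, d2


def squares_by_scan(d, n):
    """Unindexed alternative: re-test every square for each diagonal d."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
            if d == i + j - 1]


d1, d2 = build_diagonal_index(4)
```

Both approaches yield the same squares per diagonal, but the scan repeats n² arithmetic tests for each of the 2n−1 diagonals, whereas the index is filled in a single pass over the board.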

To summarize, while it may not be challenging to create a working, correct encoding of a problem, optimizing it and making sure it is scalable is no easy task. By following a generate-and-test methodology, we start by generating all the solutions, and then use integrity constraints to test whether each generated solution is valid in light of our problem’s restrictions. This process helps us create a correct encoding. Then, to decrease the time it takes for these solutions to be computed, we can look for performance bottlenecks and optimize our encoding until an acceptable efficiency level is attained.

2.2 Biological Regulatory Networks

Now that we are somewhat familiarized with ASP, let us shift our focus to BRNs. A BRN is a set of biological compounds (be they proteins, genes, metabolites or other) that interact with each other and with other substances in a cell, representing the complex biological processes that take place in that environment.

Creating computational models of these networks is key to be able to reproduce existing observations, test hypotheses, and identify predictions in silico. To better comprehend how these networks are modeled and revised, Figure 2.4 displays the combined application of experimental and computational tools that play an essential role in that process.

By using the available, pre-existing biological knowledge about a particular regulatory system, an initial model is constructed that allows for the behavior of the system to be simulated for a variety of experimental conditions. Comparing the data from these simulations with the data gathered from performing real-world experiments in the system, we are able to assert whether our model’s results match the expected results (in other words, we are able to check the consistency of our model). If they do not, the model is revised and the process is repeated until an adequate model is obtained [36].

Figure 2.4: The process of constructing and revising models for regulatory systems.

A typical way of modeling a BRN is by making use of a regulatory graph.

Definition 2.2.1 (Regulatory graph). A regulatory graph is a directed graph G = (V, E), where V = {v₁, ..., v_n} is the set of n vertices (nodes) representing the regulatory compounds, and E = {(u, v, s) : u, v ∈ V, s ∈ {+, −}} is the set of signed edges representing the interactions between compounds.

The nodes of a regulatory graph represent the biological compounds, whereas the edges show the interactions between two given compounds. An edge with a positive (+) sign means a positive interaction (or activation), signifying that the compound at the tail of the edge activates the compound at the head of the edge (if our edge is (u, v, +), we say that u activates v). Conversely, an edge with a negative (−) sign means a negative interaction (or inhibition), and thus we say that the compound at the tail inhibits the compound at the head (if our edge is (u, v, −), then u inhibits v). For a more visual representation, let us take a look at Figure 2.5, which describes a regulatory graph where V = {v₁, v₂, v₃, v₄} and E = {(v₁, v₂, +), (v₁, v₃, −), (v₂, v₃, +), (v₂, v₄, +), (v₄, v₃, −)}.
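A regulatory graph of this kind maps naturally onto a set of signed edge triples. The sketch below encodes the example graph and retrieves the activators and inhibitors of a node; the identifiers `VERTICES`, `EDGES` and `regulators` are illustrative names of ours.

```python
VERTICES = {"v1", "v2", "v3", "v4"}

# Signed edges (u, v, s): u regulates v, with s = "+" (activation) or "-" (inhibition).
EDGES = {
    ("v1", "v2", "+"),
    ("v1", "v3", "-"),
    ("v2", "v3", "+"),
    ("v2", "v4", "+"),
    ("v4", "v3", "-"),
}


def regulators(target, sign):
    """Nodes with an edge of the given sign pointing at `target`."""
    return sorted(u for u, v, s in EDGES if v == target and s == sign)


print("activators of v3:", regulators("v3", "+"))
print("inhibitors of v3:", regulators("v3", "-"))
```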

Figure 2.5: Example of a regulatory graph.

Even though this regulatory graph allows us to see which compounds interact with others, its representation is not rich enough to help us understand exactly how the compounds are affected by those interactions. For instance, take node v₃. By analyzing the graph we can see that v₁ and v₄ inhibit v₃, and that v₂ activates v₃, but what is the exact combination that actually makes v₃ become active? Perhaps v₂ being active is enough, or perhaps it is insufficient and it is required that both v₂ be active and v₁ be inactive, or even that v₂ be active and all the inhibitors be inactive. As it stands, we are incapable of knowing which combination of behaviors produces an effect on v₃. In Section 2.2.2, we will go over some additions that can be made to our representation that can shed some light on this missing information. Before doing so, however, we will first take a look at the different formalisms used by the community to model a BRN.
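The ambiguity can be made tangible by writing the three hypotheses mentioned above as Boolean functions over the states of v₁, v₂ and v₄ and listing the states on which they disagree. This is an illustrative sketch; the function names are ours, and none of the three candidates is claimed to be the actual regulatory function of v₃.

```python
from itertools import product


def only_activator(v1, v2, v4):
    return v2                          # v2 being active is enough


def activator_without_v1(v1, v2, v4):
    return v2 and not v1               # v2 active and v1 inactive


def activator_without_inhibitors(v1, v2, v4):
    return v2 and not v1 and not v4    # v2 active and all inhibitors inactive


def disagreements(f, g):
    """States (v1, v2, v4) on which two candidate functions give different results."""
    return [state for state in product([False, True], repeat=3)
            if f(*state) != g(*state)]
```

All three functions are compatible with the signs in the regulatory graph, yet they predict different states for v₃, for instance whenever v₂ and v₄ are active and v₁ is inactive.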

2.2.1 Different Modeling Types

There exist two main model types one can choose from when modeling biological regulatory networks: quantitative, or qualitative. It should be noted that some of the literature divides modeling types not into two, but into three categories, with those being: continuous models, single-molecule level models, and logical (qualitative) models [37]. However, because continuous and single-molecule level models usually require far more quantitative information than qualitative models, and since our primary focus will be the latter, we have placed them both under the quantitative category.

2.2.1.1 Quantitative Type

Quantitative models are typically constructed with the aid of differential equations, thus requiring specific mathematical skills. Moreover, in order to achieve optimal results, they usually require a considerable amount of detailed experimental data (sometimes even complete data, that is, a data set in which each instance contains the values of all the variables in the network), collected over time. Two other important characteristics of these models are that they use real variables to represent the concentration of each biological compound, as well as equations to define the changes in concentration values over time. Some of the existing quantitative models are based on Bayesian Networks [20], Ordinary Differential Equations [37], Partial Differential Equations [36], Piecewise Linear Differential Equations [36], and Stochastic Master Equations [30]. As one can imagine, building such models is a rather intricate process, which may require data that is not always available. Our focus will be directed towards another, less demanding type of models: qualitative models.


2.2.1.2 Qualitative Type

Qualitative models help us deal with the lack of highly detailed data, since they offer simpler representations that consider a small, discrete number of (mostly Boolean) states describing the possible states of each compound (for example, active or inactive). Because of their low dependence on large amounts of detailed data, models of this type have been shown to be ideal for representing systems where the available information is usually incomplete or noisy.

One possible concern that may arise when we think about adopting simpler representations of complex, real-world processes, is whether such representations actually provide us with an acceptable level of accuracy and plausibility. In the case of qualitative models, because a compound tends to only affect another while above (or below) a certain concentration threshold, we are capable of using the discrete variables at our disposal to embody the different levels of concentration in our compounds. In doing so, we effectively manage to translate the real-world events under observation to a simpler, yet still useful and believable model for our purposes.

In the following subsection, we will review some of the most common types of qualitative (or logical) models we can use to model a BRN.

2.2.2 Logical Models

First introduced by Kauffman [31] and Thomas [77], logical models provide a simple way to define the state of a compound in a BRN, by treating the compound as a Boolean variable. Because Boolean variables can either be True (e.g. representing an active state) or False (e.g. representing an inactive state), we can map the behavior of our compounds to these variables in such a way that, if the concentration threshold defined for the compound to become active is reached, then we can represent it using a Boolean variable with the value True. By the same token, should the concentration be below that threshold, we can represent it as a variable with the value False.
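This mapping from concentrations to Boolean states amounts to a simple threshold test. The sketch below uses made-up concentration and threshold values purely for illustration; the function name `discretize` and all numbers are assumptions of ours.

```python
def discretize(concentration, threshold):
    """True (active) once the threshold is reached, False (inactive) below it."""
    return concentration >= threshold


# Hypothetical measurements and per-compound activation thresholds.
measurements = {"v1": 0.70, "v2": 0.20, "v3": 0.50}
thresholds = {"v1": 0.50, "v2": 0.50, "v3": 0.50}

state = {name: discretize(value, thresholds[name])
         for name, value in measurements.items()}
```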

2.2.2.1 Boolean Logical Models

As of now, we have not only established a way of representing the state of a compound, but in Section 2.2 we also saw how we could visualize the interactions between compounds via a regulatory graph. However, as we saw, we are lacking critical information regarding the specific combination of activators and inhibitors that produce a change in the state of the compound regulated by them (recall the example using node v₃ in Figure 2.5, in which we had no way of knowing what combination of states of v₁, v₂ and v₄ activate it). To mend this, Boolean logical models (also known as Boolean networks (BNs)) make use of regulatory functions for each compound, thus specifying what combination of regulators produces an effect on a given compound. Let us start by giving formal definitions of