Repairing - Revision of Boolean Logical Models of Biological Regulatory Networks using Answer-S

%there exists some implicant I_NO which is not inactive, then C has
%at least one active implicant at time T and so will be active at time T+1
active(E,T+1,C) :- function(C,I), not implicant_inactive(E,T,C,I_NO),
    experiment(E), term(C,I_NO,_), time(T).

%If a compound’s state has been changed from time T to T+1,
%identify that compound
changed_compound(E,T,C) :- curated_observation(E,T,C,S1),
    curated_observation(E,T+1,C,S2), S1 != S2.

%If a changed compound C is active but there is an
%observation stating it should be inactive,
%then the model is inconsistent
inconsistent(E,T,C,1) :- changed_compound(E,T-1,C),
    active(E,T,C), curated_observation(E,T,C,0), T > 0.

%If a changed compound C is inactive but there is an
%observation stating it should be active,
%then the model is inconsistent
inconsistent(E,T,C,0) :- changed_compound(E,T-1,C),
    not active(E,T,C), curated_observation(E,T,C,1), T > 0.

%Test
%In asynchronous mode, only one compound state
%may change at each time step
:- changed_compound(E,T,C1),
    changed_compound(E,T,C2), C1 != C2.

%Display
#show experiment/1.
#show curated_observation/4.
#show inconsistent/4.

%Optimize
%Optimize for the smallest number of inconsistent compounds (applicable for
%when we’re dealing with incomplete observations)
#minimize{1,E,T,C : inconsistent(E,T,C,_)}.

4.3 Repairing

There are several ways to repair a model, depending on which type of change to the inconsistent functions one chooses to prioritize in order to render them consistent:

• Add or remove a regulator;

• Change a regulator from activator to inhibitor, or vice versa;

• Change the format of the function (change the number of terms and/or which regulators go in which terms).

This creates a large variety of possible solutions for repairing functions. A model built by hand by a human expert may have taken considerable time and effort to create, so it is desirable to honor the model’s original design as much as possible by introducing as few changes as possible to its inconsistent functions when rendering them consistent. The more we alter a function, the more we change its behavior, and the further we deviate from the original function’s intended behavior. Whatever modifications are made to a function should therefore be kept to a minimum. In our work, we aimed to approach this problem as declaratively as possible, so that we could provide an easy-to-understand and flexible framework for model repairs. We accomplished this by minimizing the changes made to the original function with the following order of priority (from most important to minimize to least important to minimize):

1. Minimize the changes to the number of function terms;

2. Minimize the changes to the function’s regulators;

3. Minimize the changes to the signs of regulators;

4. Minimize the changes to the format of each term.

The reasoning behind this prioritization and the impact that it has on the obtained results will be discussed in detail in Chapter 5. For now, our focus will be to thoroughly explore how this repair process was implemented.
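The priority order listed above behaves like a lexicographic comparison: a lower-priority criterion only matters when all higher-priority criteria are tied. The following is a minimal sketch of that idea, not the thesis implementation. The `RepairCost` type, the candidate names and all cost values are hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple


class RepairCost(NamedTuple):
    """Hypothetical cost record; fields are ordered from most to least important."""
    term_changes: int       # criterion 1: change in the number of terms
    regulator_changes: int  # criterion 2: regulators added or removed
    sign_changes: int       # criterion 3: regulators whose sign flipped
    format_changes: int     # criterion 4: regulators moved between terms


# Made-up candidates, for illustration only.
candidates: Dict[str, RepairCost] = {
    "g1": RepairCost(0, 2, 0, 0),
    "g2": RepairCost(0, 1, 3, 4),
    "g3": RepairCost(1, 0, 0, 0),
}

# Named tuples compare field by field, which is exactly a lexicographic order.
best = min(candidates, key=lambda name: candidates[name])
print(best)
```

Here `g2` is selected even though it changes more signs and more term contents than `g1`, because it changes fewer regulators and both keep the original number of terms.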

Our search for consistent candidates always begins at the same number of terms as the original inconsistent function. If no candidates are found, we consider all candidates with a difference of one in the number of terms, then a difference of two, and so on until a consistent candidate is found. If two candidates are found at the same distance, they are compared and the best candidate is presented to the user. Algorithm 1 illustrates this process in greater detail.

Let us briefly go over it. In lines 1-3, we initialize the variables we will need. We set the current node variation to zero, since we begin our search by looking for functions that have the same number of nodes as the provided inconsistent function. Since we have no repaired function yet, we initialize it to None. Lastly, we obtain the node number of the inconsistent function and the maximum number of nodes to consider by calling the function getStartingNodeNumberAndMaxLimit. This maximum limit is discussed right after this explanation of the algorithm. With these variables initialized, we begin the search by using the function callClingoRepair to search, using clingo, for functions that have as many nodes as the starting node number (lines 5 and 6). If a function is found, it is returned and the algorithm terminates (lines 20 and 21). If no function is found, in lines 23-25 the node variation is incremented by one and, before going to the next iteration, we check whether we have exhausted the space of functions to explore (we will go over this part in greater detail shortly). If we have, we return None since no possible function exists; otherwise we move on to the next iteration and enter lines 8-18.

In lines 8-16, we search for functions that are at a distance of current_node_variation from the inconsistent function, in terms of nodes. Then, in line 18, we compare the results of that search and keep the best function according to the selected optimization criteria, if one exists, which is then returned in lines 20 and 21. If not, we go through lines 23-25 once more, and the cycle repeats until we either find a solution or reach the bounds set in line 24.
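The control flow just described can be sketched in a few lines of Python. This is an illustrative sketch only: the call to clingo and the comparison of two candidates are passed in as plain callables, and `fake_repair` and `prefer_fewer_terms` are hypothetical stand-ins invented here, not part of the thesis tool.

```python
from typing import Callable, Optional


def repair_function(
    starting_terms: int,
    max_terms: int,
    call_repair: Callable[[int], Optional[str]],
    pick_best: Callable[[Optional[str], Optional[str]], Optional[str]],
) -> Optional[str]:
    """Widen the search one term at a time around starting_terms."""
    variation = 0
    while True:
        if variation == 0:
            repaired = call_repair(starting_terms)
        else:
            high = starting_terms + variation
            low = starting_terms - variation
            # Only query term counts that lie inside the bounds.
            above = call_repair(high) if high <= max_terms else None
            below = call_repair(low) if low > 0 else None
            repaired = pick_best(below, above)
        if repaired is not None:
            return repaired
        variation += 1
        # Both directions exhausted: no consistent function exists.
        if starting_terms + variation > max_terms and starting_terms - variation <= 0:
            return None


# Hypothetical stand-ins, for illustration only.
def fake_repair(term_count: int) -> Optional[str]:
    """Pretend only 1-term and 5-term functions are consistent."""
    return f"function with {term_count} terms" if term_count in (1, 5) else None


def prefer_fewer_terms(below: Optional[str], above: Optional[str]) -> Optional[str]:
    """Made-up tie-break: keep the smaller candidate when both exist."""
    return below if below is not None else above


result = repair_function(3, 6, fake_repair, prefer_fewer_terms)
print(result)
```

Starting from 3 terms with a limit of 6, the sketch tries 3, then 4 and 2, then 5 and 1, and stops at the first distance where a candidate exists.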

Let us go over those bounds now. We determine that no solution exists after we reach two bounds simultaneously:

• A lower bound of zero, since we cannot look for functions that have no terms;

• An upper bound that varies depending on the number of unique positive observations, above which we will not find any solutions.

The lower bound is easy to understand, since we cannot search for functions that have no terms. The upper bound, however, requires some discussion. Whenever the compound whose function we are repairing is active in some observation, we must ensure that, in our solution, at least one of the terms is active, so that the function outputs 1 and is therefore able to replicate said observation. Likewise, whenever that compound is inactive in some observation, we must see to it that none of our terms is active, so that the function outputs 0. Notice that while we require all terms to be inactive in order for negative observations to be upheld, the same is not true for positive observations: to uphold a positive observation, it suffices that at least one term of the function be active.

Algorithm 1: Function repair algorithm

input: model, observations, interaction_mode, inconsistent_function
output: repaired_function or None

 1 let current_node_variation = 0;
 2 let repaired_function = None;
 3 let starting_node_number, max_node_limit = getStartingNodeNumberAndMaxLimit(model, observations, inconsistent_function);
 4 while True do
 5     if current_node_variation = 0 then
 6         repaired_function = callClingoRepair(model, observations, interaction_mode, inconsistent_function, starting_node_number)
 7     else
 8         let high_node_number = starting_node_number + current_node_variation;
 9         let repaired_function_above = None;
10         if high_node_number ≤ max_node_limit then
11             repaired_function_above = callClingoRepair(model, observations, interaction_mode, inconsistent_function, high_node_number)
12         end
13         let low_node_number = starting_node_number - current_node_variation;
14         let repaired_function_below = None;
15         if low_node_number > 0 then
16             repaired_function_below = callClingoRepair(model, observations, interaction_mode, inconsistent_function, low_node_number)
17         end
18         repaired_function = compareAndGetBestFunction(repaired_function_below, repaired_function_above)
19     end
20     if repaired_function then
21         return repaired_function
22     else
23         current_node_variation += 1;
24         if starting_node_number + current_node_variation > max_node_limit and starting_node_number - current_node_variation ≤ 0 then
25             return None
26         end
27     end
28 end

Oftentimes, one term can single-handedly cover multiple positive observations (i.e. allow the function to replicate those observations), and it can also happen that a single positive observation is covered by multiple distinct nodes. However, let us assume the worst-case scenario in which each observation is only covered by a single node, and each node only covers a single observation. In this scenario, we would need as many terms as positive observations in order to obtain a consistent function. This is the upper bound we are using. If we are searching for functions that have as many terms as there are positive observations and yet we find no candidates, then we know that no possible candidate exists, otherwise we would have found it already.

Instead of directly using the number of positive observations as our upper bound, however, we do some pre-processing to filter out duplicated positive observations (i.e. positive observations that describe the same model state, and that also come from the same previous model state). If we can ensure that one such observation is covered then, due to the deterministic nature of our problem, all other duplicate observations are also covered, since they would provide the same function inputs as the original. This pre-processing is especially helpful when we are dealing with time-series observations, as the number of observations at our disposal can be quite large due to the additional time component.
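The duplicate filtering behind the upper bound can be illustrated with a small Python sketch. It assumes, purely for illustration, that each observation is available as a pair of dictionaries (previous state, current state) mapping compound names to 0 or 1; the thesis does not specify this representation, and the function names below are hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]


def freeze(state: State) -> Tuple[Tuple[str, int], ...]:
    """Turn a state into a hashable, order-independent key."""
    return tuple(sorted(state.items()))


def max_term_limit(transitions: List[Tuple[State, State]], target: str) -> int:
    """Count distinct (previous state, current state) pairs where target is active."""
    unique = {
        (freeze(previous), freeze(current))
        for previous, current in transitions
        if current[target] == 1
    }
    return len(unique)


# Made-up observations for a compound d regulated by a and b.
transitions = [
    ({"a": 1, "b": 0, "d": 0}, {"a": 1, "b": 0, "d": 1}),
    ({"a": 1, "b": 0, "d": 0}, {"a": 1, "b": 0, "d": 1}),  # duplicate of the first
    ({"a": 0, "b": 1, "d": 1}, {"a": 0, "b": 1, "d": 1}),
    ({"a": 0, "b": 0, "d": 1}, {"a": 0, "b": 0, "d": 0}),  # negative observation
]
limit = max_term_limit(transitions, "d")
print(limit)
```

Three of the four observations are positive for `d`, but two of them are identical, so the worst-case number of terms is two rather than three.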

In our repair encodings, the solution (the repaired function) is represented as a set of nodes, with each node having one or more variables inside. Each node is a term of the repaired function, and each variable a regulator of that term. For example, say that via the repair process, the function F_d = (¬a ∧ b) ∨ c was obtained. In our repair encodings, this function would be encoded with the following atoms:

regulator_inhibitor(a).

regulator_activator(b).

regulator_activator(c).

node_regulator(1,a).

node_regulator(1,b).

node_regulator(2,c).

This representation is used simply because it is more convenient than the representation we have seen in Section 4.1.1, due to how the rest of the repair encodings are implemented (as we will see shortly). The final solution that is presented to the user is then converted into the usual format, so what they would see is:

regulates(a,d,1).

regulates(b,d,0).

regulates(c,d,0).

function(d,2).

term(d,1,a).

term(d,1,b).

term(d,2,c).
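The conversion between the two representations can be sketched as follows. This is a hedged illustration rather than the actual tool code: the function name `to_model_facts` and its dictionary inputs are hypothetical, and the sign encoding (0 for activator, 1 for inhibitor) is inferred from the example above, where the negated regulator a appears as regulates(a,d,1). The sketch also renumbers node IDs consecutively, which is an assumption.

```python
from typing import Dict, List, Set

SIGN_CODE = {"activator": 0, "inhibitor": 1}  # inferred from the example in the text


def to_model_facts(target: str, signs: Dict[str, str], nodes: Dict[int, Set[str]]) -> List[str]:
    """Render a node-based solution as regulates/function/term facts."""
    used = sorted({r for regulators in nodes.values() for r in regulators})
    facts = [f"regulates({r},{target},{SIGN_CODE[signs[r]]})." for r in used]
    facts.append(f"function({target},{len(nodes)}).")
    # Assumption: terms are renumbered 1..n in increasing order of node ID.
    for term_id, node_id in enumerate(sorted(nodes), start=1):
        for r in sorted(nodes[node_id]):
            facts.append(f"term({target},{term_id},{r}).")
    return facts


facts = to_model_facts(
    "d",
    {"a": "inhibitor", "b": "activator", "c": "activator"},
    {1: {"a", "b"}, 2: {"c"}},
)
print("\n".join(facts))
```

For the function F_d used in the text, the sketch prints the same seven facts that are shown to the user.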

When exploring the possible candidates for a given number of function terms, we want to find the consistent candidate that most closely resembles the original function.

The most important aspect we chose to consider here concerns the regulators present in that candidate: they should deviate as little as possible from the regulators present in the original function. To do this, we identify the regulators in the original function and in our solution, respectively, as follows:

original_regulator(C) :- regulates(C,compound,_).

present_regulator(C) :- node_regulator(N,C).


In the first rule, we label the regulators that are already defined in the model’s encoding as original regulators. Note that the second argument of the regulates predicate is a constant named compound. This compound refers to the compound whose function we are currently repairing. To clarify, our repair encodings repair only a single inconsistent function at a time, as opposed to the entire model in one go. This avoids an unnecessary increase not only in clingo’s grounding and solving time, but also in the number and complexity of our encoding’s rules. By focusing on repairing one function at a time, we can keep our encoding more readable and more efficient. If we considered all the inconsistent functions at once, we would have to include additional rules, and possibly even additional arguments in the predicates we use, which would increase clingo’s grounding and solving time, on top of making the encoding more difficult to understand. In this case, separating the problem and treating each inconsistent function separately seems like the wiser option. This is also why we have adopted a slightly different representation for the functions here. Note that repairing the model function by function, rather than considering all functions simultaneously, does not influence the quality of the obtained results in any way. Because we already have the observations that tell us exactly what the expected behavior is for each function, we do not need to know what each function looks like (in terms of regulators, their signs, and format) to determine how they should interact with one another, since these interactions are already implicitly represented in the observations.

The second rule merely looks at the regulators C that are present in the nodes N of our solution, and labels those compounds as being present. Our goal with these predicates is to be able to gauge which regulators of the original function are missing, and which were added in our solution:

missing_regulator(C) :- original_regulator(C), not present_regulator(C).

extra_regulator(C) :- not original_regulator(C), present_regulator(C).

An original regulator which is not present in the solution is labelled as a missing_regulator, while a regulator that is present but was not one of the original ones is labelled as an extra_regulator. With these predicates, we can tell clingo to minimize the number of missing and extra regulators, so that the regulators of the repaired function we obtain are as close as possible to those of the original function. This is accomplished via the following minimizations:

#minimize{1@3,C : missing_regulator(C)}.

#minimize{1@3,C : extra_regulator(C)}.

Note here that the priority level of each minimization is given by the integer after the ‘@’ symbol in each statement, while the integer before it is the weight contributed by each counted atom. The higher the priority level, the more important that minimization is. Here, both statements have the same priority since, in our second criterion, we do not make any distinction between adding regulators and removing regulators when minimizing the changes to the regulators, and so either of them is just as important as the other.
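The missing and extra regulators defined above are simply two set differences, which the following Python sketch makes explicit. The function name and the example data are hypothetical and only serve to illustrate the counting.

```python
from typing import Dict, Set, Tuple


def regulator_changes(original: Set[str], nodes: Dict[int, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, extra) regulators of a candidate relative to the original."""
    present: Set[str] = set().union(*nodes.values())
    missing = original - present  # original regulators absent from every node
    extra = present - original    # regulators that were not in the original
    return missing, extra


# Made-up example: the original function is regulated by a and b,
# while the candidate uses a in node 1 and c in node 2.
missing, extra = regulator_changes({"a", "b"}, {1: {"a"}, 2: {"c"}})
cost = len(missing) + len(extra)  # both kinds of change count equally
print(missing, extra, cost)
```

Adding the two counts mirrors the fact that both minimizations share the same priority level, so a removed regulator and an added regulator cost the same.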

Next, let us address how we chose to tackle the minimization of changes to the signs of regulators, as it is straightforward to understand and allows us to show how some of the prior knowledge we may have of a network can be used to our advantage when performing repairs. Besides observations, the prior knowledge we have of a model may include the fact that certain compounds are regulators of a certain other compound, or even that they are activators or inhibitors of that compound. This information can be written directly in a model’s encoding, and clingo will take it into account when performing repairs. For example, looking back at the model in Figure 4.1, let us focus on the function of compound v₁. Say that we knew for certain that v₂ was a regulator of v₁ and, moreover, that v₃ was an inhibitor of v₁. In this case, we could encode the model as shown in Listing 4.7 (with changes made to lines 16 and 17).

Listing 4.7: Encoding of the model in Figure 4.1, fixing some regulators.

 1 %Compounds
 2 compound(v1).
 3 compound(v2).
 4 compound(v3).
 5
 6 %Regulators of v1
 7 regulates(v2, v1, 1).
 8 regulates(v3, v1, 1).
 9
10 %Regulatory function of v1
11 function(v1, 1).
12 term(v1, 1, v2).
13 term(v1, 1, v3).
14
15 %Fixed regulators of v1
16 fixed(v2,v1).
17 fixed(v3,v1,1).
18
19 %Regulators of v2
20 regulates(v3, v2, 0).
21
22 %Regulatory function of v2
23 function(v2, 1).
24 term(v2, 1, v3).
25
26 %Regulators of v3
27 regulates(v2, v3, 0).
28
29 %Regulatory function of v3
30 function(v3, 1).
31 term(v3, 1, v2).


Now, lines 16 and 17 encode that information. Line 16 fixes v₂ as a regulator of v₁, but because we are not certain of its sign, that is all we can write. In line 17, however, because we know not only that v₃ is a regulator of v₁ but also that it must be an inhibitor, we can use a third argument to encode the type of regulator it is. So, v₃ will always be treated as an inhibitor if clingo were to repair v₁’s function, while v₂ can still be either an inhibitor or an activator. Aside from that, these two lines also tell us that both of these regulators must be present in the final solution. This restriction is encoded in the following way:

:- fixed_regulator(C), not node_regulator(_,C).

If a compound C is a fixed regulator but is not present in any of the nodes of our solution, then that candidate solution is discarded. Next, in order to determine the signs of the regulators for a given function, clingo can decide whether a compound should be an activator or an inhibitor.

This is done with the following rule:

1 {activator(C); inhibitor(C)} 1 :- compound(C), not fixed_activator(C), not fixed_inhibitor(C).

If a compound C has not been fixed as an activator or as an inhibitor, then it is up to clingo to decide which of the two the compound may be (as we will see, this choice does not imply that every compound will be a regulator of the function; it merely chooses the sign the compound would have if it were). Using the signs that are assigned to each compound, we can then determine which signs have changed by using the next two rules:

sign_changed(C) :- regulates(C,compound,0), inhibitor(C).

sign_changed(C) :- regulates(C,compound,1), activator(C).

In the first rule, if a regulator C is an activator in the original model but an inhibitor in our solution, then its sign has changed. Similarly, in the second rule, if C is an inhibitor in the original model but an activator in our solution, then its sign has also changed.

Finally, the optimization responsible for minimizing the changes to compound signs is the following one:

#minimize{1@2,C : sign_changed(C)}.
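The sign handling above can be mirrored in a short Python sketch. It is an illustration under stated assumptions: signs are stored in dictionaries keyed by regulator name, 0 denotes an activator and 1 an inhibitor (as in the two sign_changed rules), and the helper names `allowed_signs` and `sign_changes` are hypothetical. The example data follows Listing 4.7.

```python
from typing import Dict, Set

ACTIVATOR, INHIBITOR = 0, 1  # third argument of regulates, per the rules above


def allowed_signs(regulator: str, fixed_signs: Dict[str, int]) -> Set[int]:
    """A regulator with a fixed sign has one option; any other compound has two."""
    if regulator in fixed_signs:
        return {fixed_signs[regulator]}
    return {ACTIVATOR, INHIBITOR}


def sign_changes(original: Dict[str, int], chosen: Dict[str, int]) -> Set[str]:
    """Original regulators whose chosen sign differs from the original one."""
    return {r for r, sign in original.items() if chosen[r] != sign}


original = {"v2": INHIBITOR, "v3": INHIBITOR}  # regulates(v2,v1,1), regulates(v3,v1,1)
fixed = {"v3": INHIBITOR}                      # fixed(v3,v1,1)
chosen = {"v2": ACTIVATOR, "v3": INHIBITOR}    # one possible choice made by the solver
changed = sign_changes(original, chosen)
print(changed, len(changed))
```

The size of the returned set is the quantity that the minimization at priority level 2 keeps as small as possible.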

Let us now examine how we enforce the final optimization, regarding minimizing changes to the format of each original node. Firstly, we should begin by understanding the importance of this minimization. Suppose we have two functions that were obtained via the repair process:

F₁ = a ∨ b ∨ c

F₂ = (a ∧ b) ∨ (a ∧ c) ∨ (b ∧ c)

Both of these functions have the same number of terms, the same regulators, and even the same regulator signs. Yet, they differ in their format. Suppose that the original inconsistent function was:

F = a ∨ b

In this scenario, in order to minimize the changes made to the original function, we would like to choose F₁ over F₂. This is why this minimization was implemented. It is accomplished by mapping (some of) the nodes of our solution to (some of) the nodes of the original function. This is done as follows:

missing_node_regulator(ID,R) :- term(compound, ID, R), node_ID(ID), not node_regulator(ID, R).

extra_node_regulator(ID,R) :- node_regulator(ID, R), term(compound, ID, _), not term(compound, ID, R).

In the first rule, if a node ID in the original function has some regulator R which is absent from the node ID in our solution, then we say that node ID is missing regulator R. In the second rule, if a node in our solution shares the same ID as a node in the original function, but contains a regulator R which is not present in the original node, then we mark that node ID as having an extra regulator R. The optimizations that ensure that the format suffers as few changes as possible are as follows:

#minimize{1@1,N,C : missing_node_regulator(N,C)}.

#minimize{1@1,N,C : extra_node_regulator(N,C)}.

These two optimizations are minimizing the number of changes done to the format of each node that shares the same ID as one of the originals. To enforce that we always want to use the original function’s IDs (so that we can minimize as much as possible the changes we are doing to the function), we make use of the following integrity constraints:

:- node_number <= TERM_NO, function(compound, TERM_NO), node_ID(N), N > TERM_NO.

:- node_number > TERM_NO, function(compound, TERM_NO), term(compound,N,_), not node_ID(N).

The first constraint applies when we are searching for functions that have fewer terms than, or as many terms as, the original. When node_number is less than or equal to the highest original ID TERM_NO, if one of the nodes in our solution has an ID N greater than TERM_NO, then we discard the solution because we are not using as many of the original IDs as we could (recall that we have another integrity constraint that ensures that we always have node_number nodes in our solution; if we are searching for a function with no more nodes than the original and yet we use an ID that is not one of the original IDs, then at least one original ID is left unused unnecessarily). The second constraint is for when we are searching for more nodes than the original function had. In this case, we discard our solution whenever there is some node N in the original function which is not present in our solution (if we are searching for more nodes than the original, then all the nodes in the original should be a part of our solution).
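The per-node comparison and the two ID constraints discussed above can be checked by hand with a small Python sketch. It is illustrative only: functions are represented as dictionaries from node ID to the set of regulators in that node, original IDs are assumed to be 1..TERM_NO, and the helper names are hypothetical. The data reproduces the F, F₁ and F₂ example, with one possible ID assignment for F₂.

```python
from typing import Dict, Set

Function = Dict[int, Set[str]]


def format_changes(original: Function, candidate: Function) -> int:
    """Count missing and extra regulators over nodes whose ID exists in both."""
    changes = 0
    for node_id in original.keys() & candidate.keys():
        changes += len(original[node_id] - candidate[node_id])  # missing_node_regulator
        changes += len(candidate[node_id] - original[node_id])  # extra_node_regulator
    return changes


def uses_original_ids(original: Function, candidate: Function) -> bool:
    """Mirror the two integrity constraints on node IDs."""
    term_no = len(original)  # assumes original IDs are exactly 1..term_no
    if len(candidate) <= term_no:
        return all(node_id <= term_no for node_id in candidate)
    return original.keys() <= candidate.keys()


f: Function = {1: {"a"}, 2: {"b"}}                                # F  = a or b
f1: Function = {1: {"a"}, 2: {"b"}, 3: {"c"}}                     # F1 = a or b or c
f2: Function = {1: {"a", "c"}, 2: {"b", "c"}, 3: {"a", "b"}}      # F2, one ID assignment

print(format_changes(f, f1), format_changes(f, f2))
```

Under this ID assignment F₁ needs no changes to the original nodes while F₂ needs two, which is why F₁ is preferred. Node 3 has no counterpart in the original function and is therefore not compared.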

Now that we have seen how each of the minimizations works, let us get a better idea of how the node_regulator predicate is created. This is the predicate that represents the repaired function. As we have already mentioned, when searching for function repairs, we begin by searching for functions that have the same number of terms (or function nodes) as the original. So, a simple way to create node_regulator atoms would be to do something like this:

#const node_number = n.

node_ID(1..node_number).

1 {node_regulator(ID,C) : compound(C)} :- node_ID(ID).

In the first statement, we define a constant node_number that tells us how many nodes we are considering when searching for solutions. Then, in the second rule, we create a node_ID atom for each number up to node_number, so that we may then use them in the third rule to generate the node_regulator atoms (note that there is a cardinality constraint at the head of the third rule to ensure that each node_ID has at least one node_regulator, so that every term in our function has at least one variable in it). However, there is an issue with this approach. Suppose that our original inconsistent function had 4 terms, yet we were unable to find any consistent functions with that same number of terms. Next, we would have to look for functions with 3 terms and 5 terms. Our problem resides in the former case. Suppose that we were able to find a function with 3 terms, and that this function was obtainable by simply taking the first, second and fourth terms from the original function. Because our node_number is limited to 3 (with our node IDs being 1, 2, 3, while the original ones were 1, 2, 4), we would never consider the fourth term of the original function when performing the minimization of the node formats (recall that the rules responsible for this can only compare nodes that share the same ID). clingo would instead register unnecessary changes to the third function term, which could have been avoided if only we could consider all of the initial function terms in our repairs, and simply say that our solution comprises the nodes with IDs 1, 2 and 4. So how do we correct this? We do so in the following manner:

available_node_ID(1..TERM_NO) :- function(compound, TERM_NO).

available_node_ID(TERM_NO + 1..node_number) :- function(compound, TERM_NO), node_number > TERM_NO.

{node_regulator(N,C) : compound(C)} :- available_node_ID(N).

:- node_number != #count{N : node_regulator(N,R)}.

We define an available_node_ID predicate that always has as many IDs as there are terms in the original function. This is done in the first rule. In this manner, we will always be able to
