Nonself detection in immune-inspired models


Universidade de Aveiro, Departamento de Física

2010

André Machado Lindo

Portuguese title: Detecção de elementos estranhos em modelos inspirados em imunologia


Dissertation submitted to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Engineering Physics, carried out under the scientific supervision of Dr. Fernão Rodrigues Vístulo de Abreu, Assistant Professor at the Departamento de Física of the Universidade de Aveiro.

Examination committee

President: Prof. Dr. João Filipe Calapez de Albuquerque Veloso, Assistant Professor, Departamento de Física, Universidade de Aveiro

Supervisor: Prof. Dr. Fernão Rodrigues Vístulo de Abreu, Assistant Professor, Departamento de Física, Universidade de Aveiro

Examiner: Prof. Dr. Bruno Miguel Paz Mendes de Oliveira

Acknowledgements

To my parents, for raising me and supporting my academic studies.

To my project supervisor, Professor Fernão Abreu, for his intellect, for his guidance and for what I’ve learned from him.

To Bruno Faria, Patricia Mostardinha, and my other colleagues of the Bio-Inspired Physics Group, for the help and useful discussions.


Keywords: artificial immune systems, negative selection, cellular frustration, self/nonself discrimination.

Abstract

In this work an algorithm for nonself detection is presented, based on the Cellular Frustration mechanism. This mechanism presents a novel approach to the cellular interactions occurring in the adaptive immune system. The concept is that any nonself element will establish less frustrated interactions than the remaining elements of the system and can thus be detected through its anomalous behaviour. The proposed algorithm has advantages over the best-known artificial immune systems. Among them is the possibility of achieving perfect detection with a reduced number of detectors. In this thesis, the algorithm is compared with negative selection algorithms found in the literature.


“There is nothing worse for mortals than a wandering life.” Homer, Odyssey


List of Figures

1.1 Representation of surface elements of T cells and APCs
1.2 Different representations of self and nonself space
1.3 Schematics of string space and detector space and their subsets
1.4 The appearance of holes
1.5 Operation of the NSA using the exhaustive search algorithm
1.6 Results from table 1.2 considering the first self set ($N_S = 100$, self strings in $[0, 2^{\ldots} - 1]$) for different string spaces defined by the string length $l$
1.7 Results from table 1.2 (self strings in $[0, 2^{\ldots} - 1]$) and table 1.3 (self strings in $[0, 2^{\ldots} - 1]$), considering the self sets with $N_S = 100$ for the string space defined by $l = 12$
1.8 Results from table 1.3 considering the mean of 25 random self sets ($N_S = 100$, self strings in $[0, 2^{\ldots} - 1]$) for the string space defined by the string length $l = 12$
1.9 Increase in the size of the complete detector set for a failure probability of about 1%
2.1 Cellular Frustration in ABC system
2.2 Cellular Frustration in ABCD system
2.3 Circular frustrated system
2.4 Generalized model for nonself detection consisting of two cell classes: APCs and T cells
2.5 Features of the Cellular Frustration model
3.1 Example of the score calculation for the TWR and #rcb rules, showing the difference between the two calculations
4.1 Information structure of T cells and APCs used in the Cellular Frustration Algorithm
4.2 Example of the receptor for maximal frustration given the T cell ligand
4.3 Evolution of conjugate lifetimes during the education process using adaptive threshold education
4.4 Monitoring analysis
4.5 Plots showing a good detection with some orders of magnitude (a) and a non-detection (b)

List of Tables

1.1 Biological to computational correspondence of structures
1.2 Experimental results for different matching thresholds $r$, for two related self sets
1.3 Experimental results for different matching thresholds $r$, for 25 random self sets of size $N_S = 100$
2.1 Comparison between important features of the two immunological frameworks: negative selection and cellular frustration
3.1 Number of ILists for the TWR rule
3.2 Number of ILists for the #rcb rule
3.3 Singular toggles considered and the result of their application
4.1 Results for known self sets (also used in chapter 1, table 1.2)
4.2 Results for 10 random self sets for three different self sizes (standard deviation in brackets)

Introduction

It is impractical to find and patch every security hole in a large computer system. The need for more comprehensive approaches to security keeps growing because of the size of the potential space of malicious code. This space grows exponentially with the number of instructions and is therefore virtually infinite. Classical protection mechanisms mostly rely on detecting known signatures, or on a complete mapping of every potentially harmful signature (positive approach). On the one hand, this approach requires large databases to store all the known harmful signatures, and it is impractical to check every signature in such a large database. On the other hand, there is the problem of mimicry: malicious (viral) codes can be similar to known safe (non-viral) codes. Coping with mimicry would even require scanning the neighbourhood of each code to check very similar codes against the database. Approaches of this kind are also unprepared for novel types of intrusion, i.e., unknown codes.

These problems raise the question of whether negative approaches should be used instead of the positive approach currently used in computer antivirus software. Negative approaches use the known safe information (self) to detect completely unknown viral codes (nonself). In this case, the known information is much smaller in size and diversity. Immune-inspired algorithms are a pertinent solution in this context. The immune system provides robust, multi-layered protection of the human body against external agents (nonself). Artificial immune systems (AIS) draw inspiration from specific properties of the immune system such as recognition, learning and memory.

Stephanie Forrest and collaborators built approaches based on the so-called Negative Selection Algorithms (NSA). NSAs are immune-inspired algorithms for nonself detection in computer system data. These algorithms aim to generate the smallest set of detectors that covers the largest possible region of the nonself space. However, perfect nonself detection is unachievable with this kind of approach, and NSAs require an unreasonable number of detectors to obtain good detection performance. To reduce the size of the detector set, a detection rate that is considered safe (~0.9), but not perfect, is accepted.

A different approach is proposed in this thesis to address the same problem as NSAs. The mechanism, developed by the supervisor of this thesis, is based on the Cellular Frustration framework. Pathogenic interactions will be less frustrated than the interactions in the rest of the system and can therefore be detected through their anomalous behaviour. In the literature, we have outlined the possibility of using cellular frustration approaches to discuss immunological mechanisms. It has been explained, for instance, that tolerance can exist in a system of completely reactive cells. Conventionally, immunologists do not consider it possible for very reactive cells to achieve tolerance through cellular frustration. It has been discussed how to reconcile two apparently incompatible concepts: high reactivity against anything that was not previously in the system (nonself) and complete tolerance of everything that has already been presented (self).

The main contribution of this thesis is the formulation and validation of a Cellular Frustration algorithm (CFA) for self/nonself discrimination in general data sets. This work follows from the PhD work under development by Patricia Mostardinha. She recently showed, using the Interaction Lists (ILs) formalism, that it is possible to define a Cellular Frustration algorithm that performs perfect self/nonself discrimination on a set of arbitrarily diverse elements (this work is in preparation for publication). Important questions remained about how these results could be applied in practical settings such as those addressed by Forrest and collaborators, where an intruder is sought in a data set made of binary strings. Mostardinha's approach defines T cell receptors as Interaction Lists, and the similarity between ligands as ranks in these Interaction Lists. Furthermore, the number of detectors considered in this approach is of the size of the self set, in sharp contrast with the approach of Forrest and collaborators. The aim of this thesis is to define algorithms that use only strings to encode the information that Mostardinha stored in Interaction Lists. A huge compression of information takes place in this step. Consequently, it is nontrivial to determine whether such a class of algorithms can perform as well as predicted by Mostardinha's more theoretical arguments. We also want to compare the detection performance of the developed algorithm with NSA results found in the literature.

This thesis is organized as follows.

In chapter 1, Negative Selection algorithms (NSA) are presented, introducing the immunological concepts that inspired this kind of algorithms. The algorithmic approaches proposed by Forrest and collaborators for nonself detection are explored and results for diverse cases are obtained with numerical simulations.

In chapter 2, the Cellular Frustration framework is discussed. The mechanism is explained with some examples and different possible models. A new model for nonself detection is presented, providing the basis for the Cellular Frustration algorithm developed in chapter 4. Comparisons with the approaches by Forrest and collaborators are established. This approach led to the development of strategies based on very diverse binary coding rules to compress the information contained in Interaction Lists.

In chapter 3, sets of binary rules are explored, focusing on their diversity and their application in the Cellular Frustration algorithm developed in chapter 4.

In chapter 4, a Cellular Frustration algorithm is developed for nonself detection using binary representation. The algorithm’s features are explained and some improvements introduced. Results from numerical simulations are presented for diverse systems and compared with results of Negative Selection algorithms. This thesis ends with overall conclusions.


Chapter 1

Negative Selection Algorithms

Negative Selection algorithms (NSA) are immune-inspired algorithms developed for complex anomaly detection problems. This type of algorithm tries to mimic the process of self/nonself discrimination that results from the negative selection mechanism in the thymus.

This chapter focuses on NSAs developed by Forrest and collaborators [1-3]. First, I introduce some basic immunological concepts needed to understand negative selection in the immune system. Afterwards, I show how computational algorithms inspired by these ideas were developed. Important mathematical quantities used to measure the distance between two strings are defined. Results from numerical simulations are presented and detection performance is examined for several parameter choices. These results are important for the comparison with the algorithm developed in this thesis, which is introduced in chapter 4.

1.1 Basic Immunology

The human body is constantly subject to attacks from a huge diversity of micro-organisms such as bacteria, parasites, viruses, and fungi, known collectively as pathogens. These pathogens are a source of many disorders. For instance, pneumonia is caused by bacteria, AIDS and influenza are caused by viruses, and malaria is caused by parasites. Pathogens can be especially harmful because they replicate. A limited definition of the Immune System's "purpose" would be that it exists to protect the body from pathogens, while minimizing the damage inflicted on the body and maintaining its continued functioning. The two main problems that the Immune System faces are:

1. The detection of pathogens;


This thesis deals only with the problem of detecting pathogens (1.). This problem is often described as that of distinguishing “self” from “nonself”. That is, distinguishing elements that belong to the body (self) from pathogens (nonself) [4-6].

The Immune System is divided into two main structures: the innate immune system and the adaptive immune system. All vertebrates are born with the innate immune system. This system is formed by a set of cells that are able to recognize and bind to common molecular patterns, eliminating the associated pathogens. The innate immune system is inherited from ancestors and is mainly static. Cells from the innate immune system are also used to present molecular patterns to cells from the adaptive immune system, possibly initiating an adaptive immune response.

This thesis focuses on the adaptive immune system. This system is "antigen-specific", which means it adapts and "learns" to recognize and attack only specific kinds of pathogens. It also retains "memory" for future detections. Another interesting feature is that the primary response to a pathogenic invasion often becomes apparent only a few days after the infection. The adaptive immune system is made up of white blood cells called lymphocytes. These lymphocytes can be subdivided into two classes: B and T cells. They carry recognition units on their surface, called antibodies and receptors, respectively. The vast majority of researchers believe that it is through these recognition units that the adaptive immune system performs self/nonself discrimination.

Immature T cells are produced in the bone marrow. In order to perform self/nonself discrimination T cells must undergo a maturation process. This process occurs in the thymus gland in two main phases: positive selection and negative selection.

Antigen presenting cells (APCs) are cells from the innate immune system that display fragments of antigens to T cells. They present them on their surface through a major histocompatibility complex (MHC). These fragments are called epitopes and constitute the APCs’ ligand used as a “binding region” to T cell receptors.

In positive selection, T cells are selected depending on whether they are fully functional in recognizing MHC molecules. In this phase, non-functional T cells which are unable to bind to MHC molecules, are eliminated by a process called “death by neglect” [5]. This mechanism is also known as MHC restriction.

Negative selection is the mechanism used by the immune system to prevent lymphocytes from reacting against self cells (tolerance). The thymus allows thymocytes (immature T cells) to mature in an environment protected from contact with foreign antigens (blood-thymic barrier). During this process, antigen presenting cells present only self peptides through MHCs to T cells. T cells that bind with high affinity are eliminated by apoptosis. As a result, only T cells reacting strongly to nonself peptides remain. Negative selection thus makes it possible to distinguish self antigens from nonself antigens because, in principle, only those T cells that bind (strongly) to peptides that are not self peptides remain.

Only after the maturation process are T cells able to recognize different antigenic peptides presented by antigen presenting cells (APC).

Fig. 1.1 Representation of surface elements of T cells and APCs. The major histocompatibility complex (MHC) of the antigen presenting cell (APC) binds to the T cell receptor (TCR). The MHC contains peptide fragments from antigens found inside the cell, and is recognized by the T cell. (adapted from [7])

There are some key features of the immune system that are relevant for this thesis. Detection or recognition events occur through ligand-receptor interactions (APC-T cell), forming the TCR-MHC complex. These interactions are specific and affinity-based. Because recognition is specific, even though each type of receptor can recognize many different pathogens, there should be a sufficiently diverse repertoire of receptors to cover the whole pathogen space.

The immune system needs to detect and eliminate pathogens as quickly as possible, because pathogens replicate exponentially. The immune system incorporates mechanisms that "learn" and adapt to specific foreign proteins.

Pathogens should be eliminated but, on the other hand, self cells must be preserved in order to protect the body and maintain homeostasis. This is called tolerance. The immune system has mechanisms to avoid autoimmune responses by inducing tolerance to self (e.g., negative selection). Specialized T cells may be responsible for tolerance (e.g., T-helper and T-reg cells) as well as for memory.

1.2 Introduction to Negative Selection Algorithms

The Immune System is a remarkable and complex natural system: it provides both defence and maintenance of the body. Artificial Immune Systems (AIS), also known as immunological computation, are a field devoted to the study of computational models based on the principles of the biological Immune System. The broad use of biologically inspired Artificial Immune Systems is mainly due to the multi-layered defence mechanisms of the Immune System.

In particular, negative selection algorithms (NSA) are anomaly/change detection approaches in Artificial Immune Systems. NSAs try to replicate the negative selection process involved in the maturation of T cells in the thymus. The majority of the algorithms begin by reproducing the random production of T cells in the bone marrow. Then, as in the thymus, T cells that bind self peptides are eliminated. By negative selection, only T cells that bind nonself are kept. Only after this process are the selected mature T cells ready to detect pathogens. Table 1.1 lists the correspondence between the terminologies used in the two systems.

Table 1.1 Biological to computational correspondence of structures.

Immune System            Computer System
T cells, antibodies      detectors
APCs, antigens           presenters
epitopes, peptides       strings of data
pathogens                anomalous data, computer viruses
self antigens            self set
affinity measure         matching rule

The main strategy used by NSAs is to cover the nonself space with a suitable set of detectors. The operation of the majority of NSAs consists of two stages: censoring and monitoring [5]. The first stage comprises the detector generation. Strings are generated randomly and compared with a self set. Using a defined matching rule, candidate detectors that match any self string are discarded, and unmatched ones are kept.

The second stage consists of the detection of nonself. Detectors generated in the first stage are checked continuously against incoming strings. Using the matching rule, strings are classified as nonself if there is a match, and as self otherwise.

Fig. 1.2 Different representations of self and nonself space. For different anomaly detection problems, the domain space can have abstract representations and different complexity. a) Possible 2-dimensional representation (self in green). b) Example of different sized detectors (in green) showing an overlap, in the 2-dimensional representation (self with red crosses).


NSAs work through the definition of the complement set of the self set. The purpose is to discriminate self and nonself giving as an input the set of self samples only. These are called “one-class learning” algorithms. No prior knowledge of nonself is required.

Particular NSAs are characterized by the way detectors are represented, the matching rules used, and the mechanisms for generating detectors and discarding self-reactive candidate detectors.

A generic NSA is presented in pseudo-code as follows:

Algorithm 1.1: Generic Negative Selection Algorithm.

input: $S$ = set of self elements, $U$ = shape space, $S \subset U$
begin
    Generate a set $R$ of detectors, such that each fails to match any element in $S$.
    Monitor data continuously by matching the detectors in $R$ against incoming strings from $U$. If any detector matches, consider the string as nonself, otherwise as self.
end
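The two steps of Algorithm 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `generate_detectors` and `monitor` are hypothetical, the matching rule is passed in as a parameter, and `shares_prefix` is a toy rule chosen for readability, not a rule used in the thesis.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Tuple

MatchRule = Callable[[str, str], bool]


def generate_detectors(candidates: Iterable[str],
                       self_set: Iterable[str],
                       matches: MatchRule) -> List[str]:
    """Censoring: keep only the candidates that match no self element."""
    self_list = list(self_set)
    return [d for d in candidates
            if not any(matches(d, s) for s in self_list)]


def monitor(stream: Iterable[str],
            detectors: List[str],
            matches: MatchRule) -> Iterator[Tuple[str, str]]:
    """Monitoring: label each incoming string as self or nonself."""
    for x in stream:
        label = "nonself" if any(matches(d, x) for d in detectors) else "self"
        yield x, label


def shares_prefix(a: str, b: str) -> bool:
    """Hypothetical toy rule: two strings match if their first two bits agree."""
    return a[:2] == b[:2]


# All 3-bit strings serve as candidate detectors in this toy example.
universe = ["".join(bits) for bits in product("01", repeat=3)]
self_set = ["000", "001"]
detectors = generate_detectors(universe, self_set, shares_prefix)
print(list(monitor(["000", "110"], detectors, shares_prefix)))
# → [('000', 'self'), ('110', 'nonself')]
```

Passing the matching rule as an argument reflects the statement that particular NSAs differ mainly in their detector representation and matching rule.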

1.3 Detector Generating Algorithms

Here I will present algorithms proposed by Forrest and collaborators [1-3]. The first algorithm was proposed in 1994 [1] as one of the earliest artificial immune systems and consequently it received a lot of attention. Since then, many variations have been proposed, such as the linear time and the greedy algorithms [2, 3]. Here I will only discuss the original algorithm (known as the exhaustive search algorithm) and the linear time algorithm.

An important feature of these algorithms is how distances between strings are established, which is done through matching rules. Matching rules are crucial to the performance of the algorithm. Many matching rules can be defined depending on the application or motivation; examples include the Hamming distance, r-chunk and Rogers-Tanimoto rules. The matching rule used by Forrest and collaborators is the $r$-contiguous matching rule, also called "$r$-contiguous bits" (rcb). The suitability of the $r$-contiguous matching rule for quantifying the binding strength between antibodies and antigens is well accepted among theoretical immunologists [5, 6]. This matching rule works as an affinity threshold between the complementary immunological structures and has a natural representation in binary coding.

Given a matching threshold $r$, if two strings share $k < r$ contiguous matching bits, there is no match. Otherwise, if they share at least $r$ contiguous matching bits ($k \geq r$), a match takes place.

Definition. A bit string $s = s_1 s_2 \ldots s_l$ and a detector $d = d_1 d_2 \ldots d_l$ match under the $r$-contiguous rule if a position $p$ exists such that $s_i = d_i$ for $i = p, \ldots, p + r - 1$, with $p \leq l - r + 1$.
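The definition above translates directly into a single pass over the two strings. The sketch below assumes strings are written as Python `str` objects over the characters `0` and `1`; the function name `rcb_match` is an illustrative choice.

```python
def rcb_match(s: str, d: str, r: int) -> bool:
    """Return True if s and d agree in at least r contiguous positions."""
    if len(s) != len(d):
        raise ValueError("string and detector must have the same length")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(s, d):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


s = "0110100"
d = "1110111"
# s and d agree at positions 2 to 5 (1-based), a run of four bits.
print(rcb_match(s, d, 4))  # → True
print(rcb_match(s, d, 5))  # → False
```

Tracking the current run length avoids testing every window start $p$ separately, so the check costs one comparison per bit.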

The value of $r$ determines the detectors' degree of specificity: the smaller the value of $r$, the more general the detector (higher detection coverage). The $r$-contiguous matching rule relaxes the matching requirements by reducing the number of bits that have to be matched (not the whole string). It also enlarges the region covered by each detector.

A perfect match is rare for a reasonable string length $l$ and matching threshold $r$. The probability $P_M$ of two random strings of length $l$ matching under $r$-contiguous bits [1], for binary alphabets, depends on the matching threshold $r$ through the equation:

\[ P_M \approx 2^{-r}\left[\frac{l - r}{2} + 1\right] \tag{1.1} \]

To understand the relationships between the different string domains involved in this kind of algorithm [2, 3], consider the representation in figure 1.3. Let $U$ be the space of all possible strings and $U_D$ the space of all possible detectors. Given a set of self strings $S \subset U$, the purpose of the NSA is to find a detector set $R \subset U_D$ that matches as many of the nonself strings as possible ($N = U - S$) without matching any of the self strings in $S$. The detector strings in $R$ are chosen from the candidate detectors $C$, consisting of all strings that do not match any string in $S$. If the detector set $R$ contained all the candidate detectors $C$, it would be possible to match all the detectable nonself strings $N'$. However, a fraction of the total nonself strings $N$ is misclassified as self; these strings are known as holes ($H = N - N'$). Holes appear because no possible detector can match these strings. It should be stressed that holes are unavoidable in NSAs. The number of holes decreases if the matching threshold $r$ is increased, i.e., with smaller detection regions. However, this is achieved at the expense of an increasing number of necessary detectors.

Fig. 1.3 Schematics of string space ($U$) and detector space ($U_D$) and their subsets. String space is partitioned into self strings ($S$) and nonself ($N$). Candidate detectors ($C$) are those that do not match any string in $S$. The detector set $R$ is a subset chosen from $C$. Nonself strings can be subdivided into detectable nonself strings ($N'$) and holes ($H = N - N'$, not labeled). Detectable nonself strings can be detected ($D$) or not ($F = N' - D$, not labeled), depending on the choice of the detector set $R$. (from [3])
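For very short strings, the sets in this classification can be enumerated by brute force, which makes the origin of holes concrete. The sketch below is a minimal illustration under stated assumptions: detectors are taken from the same space as the strings, matching uses the r-contiguous bits rule, and the names `rcb_match` and `partition` are hypothetical.

```python
from itertools import product


def rcb_match(s: str, d: str, r: int) -> bool:
    """r-contiguous bits rule: at least r contiguous agreeing positions."""
    run = 0
    for a, b in zip(s, d):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def partition(self_set, l, r):
    """Enumerate nonself, candidate detectors, detectable nonself and holes."""
    universe = ["".join(bits) for bits in product("01", repeat=l)]
    self_strings = set(self_set)
    nonself = [u for u in universe if u not in self_strings]
    # Candidate detectors: strings that match no self string.
    candidates = [d for d in universe
                  if not any(rcb_match(d, s, r) for s in self_strings)]
    # Detectable nonself: matched by at least one candidate detector.
    detectable = [n for n in nonself
                  if any(rcb_match(c, n, r) for c in candidates)]
    detectable_set = set(detectable)
    holes = [n for n in nonself if n not in detectable_set]
    return nonself, candidates, detectable, holes


nonself, candidates, detectable, holes = partition(["010", "111"], l=3, r=2)
print(candidates)  # → ['000', '001', '100', '101']
print(holes)       # → ['011', '110']
```

In this toy case every detector that could match `011` must share either its prefix `01` with the self string `010` or its suffix `11` with the self string `111`, so it is censored; the same reasoning applies to `110`.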

Generally, only a subset of the candidate detectors is included in $R$, allowing the detection of only the nonself strings in $D$. False negatives are nonself strings which are not detected because the detector set is smaller than the set of all candidate detectors ($F = N' - D$). Holes are the leftover undetectable strings after all the possible detectors have been defined. False positives would be self strings misclassified as nonself. However, the negative selection algorithms of Forrest and collaborators do not allow this type of error: the number of false positives is always equal to 0.

Considering the classification represented in figure 1.3, the following relationships can be established [5, 6]:

\[
\begin{aligned}
U &= S \cup N, & S \cap N &= \emptyset, \\
N &= N' \cup H, & N' \cap H &= \emptyset, \\
N' &= D \cup F, & D \cap F &= \emptyset, \\
D &\subseteq N' \subset N, & R &\subseteq C.
\end{aligned}
\tag{1.2–1.9}
\]

To further analyse these algorithms, some notation and corresponding definitions are introduced:

$N_{R_0}$ = initial detector repertoire size (candidate detectors), before censoring
$N_R$ = detector repertoire size, after censoring
$N_S$ = number of self strings (size of the self set, $|S|$)
$P_M$ = probability that a randomly chosen string and detector match according to the matching rule
$f$ = probability of a random string not matching any of the $N_S$ self strings
$P_f$ = probability that a single random nonself string will not be matched by any detector in $R$, i.e., probability of non-detection
$N_H$ = number of holes

Fig. 1.4 The appearance of holes. Self space will be matched differently depending on the matching threshold $r$. Holes are created when the candidate detectors that match these strings are eliminated because they also match a self string. Holes are nonself strings that are treated as self, because no valid detector that matches them can be generated.

1.3.1 The Exhaustive Search Algorithm

An exhaustive search algorithm was first proposed by Stephanie Forrest and collaborators in 1994, and it was claimed to be analogous to the natural negative selection process in the thymus [1-4]. The algorithm is similar to Algorithm 1.1 presented above. Candidate detectors ($R_0$) are drawn randomly and checked against all the self strings, using the $r$-contiguous bits rule. Strings which fail to match the self set $S$ are kept as valid detectors. The detector set $R$ is obtained by repeating this process until the desired number of detectors is reached. Nonself detection is achieved by checking every given string against all the generated detectors (one by one). The several steps involved in this algorithm are illustrated in the following figure.

Fig. 1.5 Operation of the NSA using the exhaustive search algorithm. Candidate detectors that do not match self strings are collected into the detector set $R$. Then, by continuously matching strings with the detector set, positive matches are considered nonself detections. (adapted from [6])

The algorithm tests $N_{R_0}$ random candidate detectors, building a detector set of size $N_R$. The two sizes are related through the probability $f$ that a candidate matches no self string:

\[ N_R = N_{R_0} \cdot f \tag{1.10} \]

Here $f$ is:

\[ f = (1 - P_M)^{N_S} \tag{1.11} \]

where $P_M$ is the probability of matching and $N_S$ is the size of the self set.

For small $P_M$ and large $N_S$, $f$ can be approximated by:

\[ f \approx e^{-P_M N_S} \tag{1.12} \]

Similarly, for the probability of failure $P_f$:

\[ P_f = (1 - P_M)^{N_R} \tag{1.13} \]

which, for low $P_M$ and large $N_R$, can be approximated by:

\[ P_f \approx e^{-P_M N_R} \tag{1.14} \]

Using expressions (1.10) and (1.14), it is possible to find that:

\[ N_R = N_{R_0} \cdot f \approx \frac{-\ln P_f}{P_M} \tag{1.15} \]

Solving (1.15) with respect to $N_{R_0}$, and then using (1.12), it is easy to obtain:

\[ N_{R_0} = \frac{-\ln P_f}{P_M \cdot (1 - P_M)^{N_S}} \approx \frac{-\ln P_f}{P_M \cdot e^{-P_M N_S}} \tag{1.16} \]

With these formulas it is possible to calculate theoretically the size of the initial repertoire for a fixed failure probability. The formulas are approximations that assume independent detectors. However, when $P_M$ or $N_S$ increases, detectors become less independent, which means that more detectors overlap. This results in a higher failure probability than would be expected from (1.13). Other considerations concerning these formulas are thoroughly discussed in [1-3].
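As a worked example of these sizing formulas, the following sketch evaluates them numerically. It assumes the binary form of (1.1), $P_M \approx 2^{-r}[(l-r)/2 + 1]$, and the independent-detector approximation discussed above; the function names and the parameter values ($l = 12$, $r = 8$, $N_S = 100$, $P_f = 0.01$) are illustrative choices, not results from the thesis.

```python
import math


def match_probability(l: int, r: int) -> float:
    """Approximate P_M for binary strings under r-contiguous bits, eq. (1.1)."""
    return 2.0 ** (-r) * ((l - r) / 2 + 1)


def repertoire_sizes(l: int, r: int, n_self: int, p_fail: float):
    """Return (P_M, f, N_R, N_R0) for a target failure probability."""
    p_m = match_probability(l, r)
    f = (1 - p_m) ** n_self           # eq. (1.11): candidate survives censoring
    n_r = -math.log(p_fail) / p_m     # eq. (1.15): detectors needed after censoring
    n_r0 = n_r / f                    # eq. (1.16): candidates needed before censoring
    return p_m, f, n_r, n_r0


p_m, f, n_r, n_r0 = repertoire_sizes(l=12, r=8, n_self=100, p_fail=0.01)
print(f"P_M = {p_m:.5f}, f = {f:.3f}, N_R = {n_r:.0f}, N_R0 = {n_r0:.0f}")
```

For $l = 12$ and $r = 8$, equation (1.1) gives $P_M = 3/256 \approx 0.0117$. Because $f$ decays exponentially with $N_S$, the number of candidates $N_{R_0}$ that must be tested grows exponentially with the size of the self set.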

Regarding computation time, the algorithm depends exponentially on the size of the self set $N_S$, as is clear from (1.16). In fact, the time complexity of the algorithm is determined by the search for suitable detectors among all candidate detectors. In terms of memory, the algorithm only needs to store the $N_S$ self strings of length $l$.

time: $O\left(\dfrac{-\ln P_f}{P_M \cdot (1 - P_M)^{N_S}} \cdot N_S\right)$

space: $O(l \cdot N_S)$

This algorithm works with any matching rule. However, it is inefficient at generating detectors. The linear time and greedy algorithms were developed to be more time efficient.
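A minimal sketch of the exhaustive search procedure is given below, assuming binary strings and the r-contiguous bits rule. The helper names (`random_string`, `censor`, `classify`) and the parameter values are hypothetical and chosen only for illustration.

```python
import random


def rcb_match(s: str, d: str, r: int) -> bool:
    """r-contiguous bits rule: at least r contiguous agreeing positions."""
    run = 0
    for a, b in zip(s, d):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def random_string(l: int, rng: random.Random) -> str:
    """Draw a uniformly random binary string of length l."""
    return "".join(rng.choice("01") for _ in range(l))


def censor(self_set, l, r, n_candidates, rng):
    """Censoring: draw random candidates, keep those matching no self string."""
    kept = []
    for _ in range(n_candidates):
        cand = random_string(l, rng)
        if not any(rcb_match(cand, s, r) for s in self_set):
            kept.append(cand)
    return kept


def classify(x, detectors, r):
    """Monitoring: a string matched by any detector is labelled nonself."""
    return "nonself" if any(rcb_match(x, d, r) for d in detectors) else "self"


rng = random.Random(0)  # fixed seed so the run is repeatable
l, r = 12, 8
self_set = [random_string(l, rng) for _ in range(20)]
detectors = censor(self_set, l, r, 500, rng)
print(len(detectors), "of 500 candidates survived censoring")
print({classify(s, detectors, r) for s in self_set})  # → {'self'}
```

Because every surviving detector was checked against every self string, and the matching rule is symmetric, no self string can be labelled nonself. This is the zero false positive property of the algorithm. The fraction of surviving candidates is an empirical estimate of $f$.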

1.3.2 Linear Time Algorithm

The linear time detector generating algorithm was developed to improve the computational practicality of the exhaustive search approach. The approach of Forrest and collaborators counts candidate detectors that share common patterns (templates). This avoids selecting similar detectors, which is recurrent in random detector generation. The algorithm requires the storage of a matrix $C$ that contains information about all the possible ways two strings can match over $r$ contiguous bits. Furthermore, the matrix $C$ contains the number of all correlated strings that are unmatched by the self set $S$.

The linear time algorithm follows a two-phase scheme [2, 3, 5]. In Phase I, the counting recurrence of the candidate detector set is solved. In Phase II, a detector set is generated randomly using the enumeration created in Phase I. The following notation is used to describe the method:

$\hat{s}$ – the string $s$ stripped of its first bit.

$\hat{s} \cdot b$ – the string $\hat{s}$ with $b$ appended after the last bit, where $b \in \{0,1\}$.

$t_{i,s}$ – template with $r$ fully specified contiguous bits, given by the string $s$, starting at position $i$. The template is a string of size $l$ with $l - r$ blank symbols (e.g., ∗∗∗101∗).

$C_i[s]$ – number of right completions of $t_{i,s}$ unmatched by any string in $S$ (example: with $l = 7$, $r = 3$, $s = 110$ and $t_{i,s}$ = ∗∗110∗∗, one right completion of $t_{i,s}$ is ∗∗11010).

Phase I: Solving the counting recurrence. Here, the following question is answered: how many $l$-bit strings containing the template $t_{i,s}$ are not matched by any string in $S$? This question is answered recursively by calculating $C_i[s]$, which means evaluating all the possible template matchings from $i = l - r + 1$ down to 1. Starting from $C_{l-r+1}[s]$, there are no blanks to the right and so no right completions. Therefore, $C_{l-r+1}[s]$ is zero if the template $t_{l-r+1,s}$ is matched in $S$, and one otherwise. For all the other $C_i[s]$, the value is zero if $t_{i,s}$ is matched in $S$. Otherwise, the value is the sum over the previously computed right completions $\hat{s} \cdot 0$ and $\hat{s} \cdot 1$, that is, $C_{i+1}[\hat{s} \cdot 0] + C_{i+1}[\hat{s} \cdot 1]$. This way, $C_i[s]$ contains information about all the possible unmatched right completions.

Phase II: Generating strings unmatched by $S$. This phase runs in the opposite direction to Phase I. The detectors are generated by creating $l$-bit right completions of $s$ with the information stored in $C_i[s]$. Since $\sum_s C_1[s]$ is the size of the candidate detector set, each unique detector string can be numbered from 1 up to this total. After drawing a random number $k$ in this range, finding the $k$-th unmatched string amounts to finding successively the extensions $\hat{s} \cdot b$ with $b \in \{0,1\}$. This is achieved by jumping from interval to interval, narrowing down at each position the interval of counts that contains $k$.

Example 1.1 Let $l = 4$, $U = \{0 : 2^l - 1\}$, $r = 2$ and $S = \{s_1, s_2, s_3\}$, with $s_1 = \{0011\}$, $s_2 = \{0110\}$ and $s_3 = \{1100\}$. It is not difficult to verify that only one detector can be generated, namely {1001}. This implies that all bit strings that match the templates {10∗∗, ∗00∗, ∗∗01} are detectable as nonself ($N'$), which in this case corresponds to 8 strings. Conversely, all bit strings from the set $U \setminus \{S \cup N'\} = H$ = {0010, 0100, 0111, 1110, 1111} are not detectable and are considered holes. If an additional string such as $s_4 = \{0001\}$ were also part of the self set $S$, no detectors could be generated, so $R = \emptyset$ and no nonself strings could be detected.
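The numbers in Example 1.1 can be checked by brute force. The sketch below enumerates the whole 4-bit universe; the function names are hypothetical, and the enumeration is only feasible because the string space is tiny.

```python
from itertools import product

def rcb_match(a, b, r):
    """True if strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def analyse(self_set, l, r):
    """Return (complete detector set, detectable nonself strings, holes)."""
    universe = ["".join(bits) for bits in product("01", repeat=l)]
    detectors = [u for u in universe
                 if not any(rcb_match(u, s, r) for s in self_set)]
    nonself = [u for u in universe if u not in self_set]
    detectable = [u for u in nonself
                  if any(rcb_match(u, d, r) for d in detectors)]
    holes = [u for u in nonself if u not in detectable]
    return detectors, detectable, holes

detectors, detectable, holes = analyse(["0011", "0110", "1100"], l=4, r=2)
print(detectors)        # complete detector set
print(len(detectable))  # number of detectable nonself strings
print(holes)            # undetectable nonself strings
```

Adding the string 0001 to the self set and calling `analyse` again returns an empty detector set, as stated at the end of the example.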


A pseudo-code for the linear time algorithm is presented as follows.

Algorithm 1.2: Linear Time Algorithm.

input : $l$, $r$, $N_R \in \mathbb{N}$, where $1 \le r \le l$; self set $S \subset U$ with $N_S = |S|$; templates indexed by $s \in [0, 2^r - 1]$
output : detector set $R \subset U$

Phase I: Solving the counting recurrence
begin
    for $0 \le s \le 2^r - 1$
        $C_{l-r+1}[s]$ = 0 if $t_{l-r+1,s}$ is matched in $S$, 1 otherwise
    endfor
    for $i = (l - r), \dots, 1$
        for $0 \le s \le 2^r - 1$
            if $t_{i,s}$ matches with any bit string of $S$ then
                $C_i[s] = 0$
            else
                $C_i[s] = C_{i+1}[\hat{s} \cdot 0] + C_{i+1}[\hat{s} \cdot 1]$
            endif
        endfor
    endfor
end

Phase II: Generating strings unmatched by $S$
begin
    The size of the total candidate detector set is $\sum_s C_1[s]$, where $C_1[s]$ denotes the number of unmatched $l$-bit strings starting with the $r$-bit binary string $s$
    for each of the $N_R$ detectors
        draw randomly $k \in \{1, 2, \dots, \sum_s C_1[s]\}$
        First interval $]P, Q]$ is calculated by finding $s$ such that
            $P = \sum_{s' < s} C_1[s'] < k \le \sum_{s' \le s} C_1[s'] = Q$
        Determine the unmatched string by evaluating (each chosen bit $b$ extends the string, and $s$ becomes $\hat{s} \cdot b$)
        for $i = 2, \dots, (l - r + 1)$
            if $k \le (P + C_i[\hat{s} \cdot 0])$ then
                $P = P$
                $b = 0$
            else
                $P = (P + C_i[\hat{s} \cdot 0])$
                $b = 1$
            endif
        endfor
    endfor
end


The time and space complexities depend largely on the construction of the matrix $C$. In Phase I, the time is consumed essentially in filling those entries that match self strings ($O((l - r) \cdot N_S)$) and filling the rest of the matrix. In Phase II, time is spent finding the first $r$ bits of each detector and then completing it with the remaining bits ($O(l - r)$), for each of the $N_R$ detectors:

time: $O((l - r) \cdot N_S) + O((l - r) \cdot 2^r) + O(l \cdot N_R)$

space: $O((l - r) \cdot 2^r)$

Therefore, the linear time algorithm runs in time linear in the sizes of the self and detector sets ($N_S$ and $N_R$). However, given its strong dependence on $l$ and $r$ (through the $2^r$ term), using long strings and high matching thresholds is a problem both for time and space. The linear time algorithm is limited to the r-contiguous bits matching rule.
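The two phases of the linear time algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: strings are Python `str` objects over {0,1}, the counts are stored in nested dictionaries `C[i][s]`, and the function names are hypothetical.

```python
from itertools import product

def count_table(self_set, l, r):
    """Phase I: C[i][s] = number of right completions of template t_{i,s}
    that are unmatched by the self set (positions i are 1-based)."""
    last = l - r + 1
    # r-bit windows of the self strings at each starting position i
    windows = {i: {x[i - 1:i - 1 + r] for x in self_set}
               for i in range(1, last + 1)}
    templates = ["".join(bits) for bits in product("01", repeat=r)]
    C = {last: {s: 0 if s in windows[last] else 1 for s in templates}}
    for i in range(last - 1, 0, -1):
        C[i] = {}
        for s in templates:
            if s in windows[i]:
                C[i][s] = 0
            else:
                C[i][s] = C[i + 1][s[1:] + "0"] + C[i + 1][s[1:] + "1"]
    return C, templates

def kth_detector(C, templates, l, r, k):
    """Phase II: return the k-th (1-based) l-bit string unmatched by the self set."""
    if not 1 <= k <= sum(C[1].values()):
        raise ValueError("k is outside the range of candidate detectors")
    low = 0
    for s in templates:                  # locate the first r bits
        if k <= low + C[1][s]:
            break
        low += C[1][s]
    detector = s
    for i in range(2, l - r + 2):        # extend the string one bit at a time
        zero = s[1:] + "0"
        if k <= low + C[i][zero]:
            s = zero
        else:
            low += C[i][zero]
            s = s[1:] + "1"
        detector += s[-1]
    return detector

# Self set of Example 1.1 (l = 4, r = 2)
C, templates = count_table(["0011", "0110", "1100"], 4, 2)
print(sum(C[1].values()), kth_detector(C, templates, 4, 2, 1))
```

Drawing $k$ at random between 1 and the total count, and calling `kth_detector` for each draw, yields a randomly generated detector set without any censoring step.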

Generally, only a fraction of the candidate detectors is included in the detector set $R$, mostly because of overlapping. The greedy algorithm addresses this problem by first choosing detectors that are far apart, so that they match as many still-unmatched nonself strings as possible [2, 3, 5].

1.4 Analysis

In order to analyse the features of the general mechanism of negative selection proposed by Forrest and collaborators, experimental results were obtained by numerical simulations. These simulations were performed using the linear time algorithm for detector generation and an additional “exhaustive” matching algorithm for counting holes.

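The exact expression for the failure probability is:

$P_f = (N_N - N_{N'}) / N_N$ (1.17)

where $N_N$ denotes the number of nonself strings and $N_{N'}$ the number of nonself strings detected by the detector set. However, given that holes are undetectable, a lower bound for $P_f$ (for each $r$) is:

$P_f \ge N_H / N_N$ (1.18)

where $N_H$ is the number of holes.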

Results from the simulations are presented in tables 1.2 and 1.3.

Table 1.2 shows how the probability of failure and the number of holes evolve when the parameters $l$ and $r$ are changed for the same self set. Results from self sets with a different number of elements ($N_S$ = 100 and $N_S$ = 150) are presented. The self set with $N_S$ = 150 was expanded from the $N_S$ = 100 set by adding 50 extra strings. Strings were drawn from random numbers between 0 and $2^9 - 1$. These same self sets will be used in chapter 4 to test cellular frustration algorithms. In this way the performance of the different approaches under the same conditions can be compared.

In Table 1.3, results obtained from self sets randomly generated from strings with 12 bits are presented. In order to assess the statistical variability of the previous example, 25 self sets were tested, and averages and standard deviations are presented. In these results the dimension of the space was kept relatively small ($l \le 12$) because the whole set of detectors that could be defined was considered, and all possible strings in the universe were tested.

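Table 1.2 Experimental results for different matching thresholds $r$, for two related self sets. Strings of the self sets are previously defined within the range [0 : $2^9 - 1$]. The first three result columns refer to the self set with $N_S$ = 100 and the last three to the self set with $N_S$ = 150.

| $l$ | $r$ | $P_f$ | holes | $N_R$ | $P_f$ | holes | $N_R$ |
|---|---|---|---|---|---|---|---|
| 9 | 9 | 0.000 | 0 | $2^9$-100 | 0.000 | 0 | $2^9$-150 |
| 9 | 8 | 0.107 | 44 | 272 | 0.191 | 69 | 175 |
| 9 | 7 | 0.340 | 140 | 89 | 0.674 | 244 | 24 |
| 9 | 6 | 0.864 | 356 | 6 | 1.000 | $2^9$-150 | 0 |
| 9 | 5 | 1.000 | $2^9$-100 | 0 | 1.000 | $2^9$-150 | 0 |
| 10 | 10 | 0.000 | 0 | $2^{10}$-100 | 0.000 | 0 | $2^{10}$-150 |
| 10 | 9 | 0.030 | 28 | 752 | 0.053 | 46 | 620 |
| 10 | 8 | 0.083 | 77 | 459 | 0.184 | 161 | 255 |
| 10 | 7 | 0.285 | 263 | 147 | 0.674 | 589 | 32 |
| 10 | 6 | 0.827 | 764 | 12 | 1.000 | $2^{10}$-150 | 0 |
| 10 | 5 | 1.000 | $2^{10}$-100 | 0 | 1.000 | $2^{10}$-150 | 0 |
| 11 | 11 | 0.000 | 0 | $2^{11}$-100 | 0.000 | 0 | $2^{11}$-150 |
| 11 | 10 | 0.014 | 28 | 1776 | 0.024 | 46 | 1644 |
| 11 | 9 | 0.021 | 41 | 1408 | 0.033 | 63 | 1102 |
| 11 | 8 | 0.071 | 138 | 856 | 0.130 | 247 | 447 |
| 11 | 7 | 0.253 | 492 | 281 | 0.614 | 1166 | 56 |
| 11 | 6 | 0.737 | 1436 | 24 | 1.000 | $2^{11}$-150 | 0 |
| 12 | 12 | 0.000 | 0 | $2^{12}$-100 | 0.000 | 0 | $2^{12}$-150 |
| 12 | 11 | 0.007 | 28 | 3824 | 0.012 | 46 | 3692 |
| 12 | 10 | 0.010 | 41 | 3456 | 0.016 | 63 | 3150 |
| 12 | 9 | 0.023 | 90 | 2740 | 0.036 | 141 | 2108 |
| 12 | 8 | 0.063 | 253 | 1690 | 0.115 | 453 | 877 |
| 12 | 7 | 0.162 | 647 | 558 | 0.476 | 1878 | 112 |
| 12 | 6 | 0.684 | 2732 | 48 | 1.000 | $2^{12}$-150 | 0 |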

Table 1.3 Experimental results for different matching thresholds $r$, for 25 random self sets of size $N_S$ = 100 and string length $l$ = 12. Strings of the self sets are within the range [0 : $2^{12} - 1$]. Standard deviation values are in parentheses.

| $r$ | $P_f$ | holes | $N_R$ |
|---|---|---|---|
| 12 | 0.000 | 0 | $2^{12}$-100 |
| 11 | 0.002 (0.001) | 6.8 (2.9) | 3802.8 (2.9) |
| 10 | 0.006 (0.002) | 23.5 (7.0) | 3360.6 (15.8) |
| 9 | 0.020 (0.004) | 78.0 (16.6) | 2506.7 (41.5) |
| 8 | 0.064 (0.010) | 254.0 (38.8) | 1236.6 (52.4) |
| 7 | 0.281 (0.039) | 1121.8 (156.3) | 252.6 (57.1) |
| ≈ 6.6 | 0.460 | ≈ 1840 | ≈ 100 |
| 6 | 0.849 (0.072) | 3391.2 (288.0) | 6.9 (6.0) |
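The tabulated failure probabilities can be cross-checked against the hole counts. The sketch below assumes that the number of nonself strings is $2^l - N_S$; this is an observation that happens to reproduce the tabulated numbers, not a statement of how the values were computed. The function name is hypothetical and the rows are copied from Table 1.2.

```python
def failure_lower_bound(holes, l, n_self):
    """Fraction of nonself strings that are holes: holes / (2**l - n_self)."""
    return holes / (2 ** l - n_self)

# (l, r, N_S, holes, P_f reported in Table 1.2)
rows = [
    (9, 8, 100, 44, 0.107),
    (9, 7, 100, 140, 0.340),
    (9, 8, 150, 69, 0.191),
    (12, 7, 150, 1878, 0.476),
]
for l, r, n_self, holes, reported in rows:
    print(l, r, n_self, round(failure_lower_bound(holes, l, n_self), 3), reported)
```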


The previous results are very interesting. First, it is clear from Table 1.2 that increasing the size of the string space (increasing $l$) requires generating more detectors to cover the whole nonself space. However, this increases the number of detection errors, i.e. the number of holes. In the examples presented in Table 1.2, the self set was confined to $2^9$ different strings. As a result, even though the number of holes increases with $l$, the probability of failure decreases (in percentage). These observations are summarized in Figure 1.6 a) & b).

Fig. 1.6 Results from table 1.2 considering the first self set ($N_S$ = 100, $s \in [0, 2^9 - 1]$) for different string spaces defined by the string length $l$. Plots for: a) size of the complete detector set; b) number of holes created.

Comparing the results for the two self sets (table 1.2), it can be concluded that, for the same ($l$, $r$), the larger the self set, the larger the number of holes and the smaller the number of detectors required. This can be explained by the fact that when the number of self strings increases, the number of detectors discarded during the censoring process also increases. With a smaller number of available detectors, a larger number of holes is left, and hence $P_f$ increases.

Another analysis can be performed by comparing the results obtained using $N_S$ = 100 self strings with $l$ = 12 drawn from random numbers between 0 and $2^{12} - 1$ (Table 1.3) or from random numbers between 0 and $2^9 - 1$ (Table 1.2). Figure 1.7 b) shows a transition occurring when $r$ is varied. For larger $r$, higher detection performances can be obtained with the more disperse set of self strings. On the contrary, better results are obtained with the more confined self set when $r$ is small. However, the detection performances are poorer in this case. Consequently, we can conclude that if high detection performances are required, then clustering of data could be detrimental. This could be important in practical applications, because general data sets are very often composed of clustered points dispersed in a sparse space.


Fig. 1.7 Results from table 1.2 ($s \in [0, 2^9 - 1]$) and table 1.3 ($s \in [0, 2^{12} - 1]$), considering the self sets with $N_S$ = 100 for the string space defined by $l$ = 12. Plots for: a) size of the complete detector set; b) number of holes.

In Figure 1.8 it is shown how the number of holes decreases when more specific detectors are used ($r$ large), and how simultaneously the number of detectors that need to be considered to cover the whole nonself space grows. In that case, the probability of failure decreases with $r$ (Figure 1.8b)).

Fig. 1.8 Results from table 1.3 considering the mean of 25 random self sets ($N_S$ = 100, $s \in [0, 2^{12} - 1]$) for the string space defined by the string length $l$ = 12. Plots for: a) size of the complete detector set and the number of holes created; b) lower bound of the failure probability.

For a comparison with results obtained in chapter 4 with a Cellular Frustration algorithm, we estimated the number of holes when the number of detectors is the same as the number of self strings ($N_R = N_S = 100$). I applied a smoothing spline to the values of the number of detectors in table 1.3, and found that in this case $r$ is close to 6.6. To estimate the number of holes in this case, I applied an exponential fit and found that this number would be ~1840 (table 1.3, right).

Another important analysis concerns how the probability of failure changes when the space dimension increases. This is particularly important for practical applications because the number of self strings can easily be much smaller than the number of possible strings: $N_S \ll 2^l$. Using similar fitting methods as in the previous case, I estimated from table 1.2 the number of detectors required to achieve $P_f$ = 0.01, for each value of $l$ from 9 to 12. The results are presented in Figure 1.9 (blue dots). It is clear from this plot that the number of detectors increases exponentially with $l$. For instance, it is possible to extrapolate that for $l$ = 32, about $2.44 \cdot 10^9$ detectors ($N_R$) would be necessary to achieve $P_f$ = 0.01. This shows that this type of algorithm becomes impractical for space dimensions of this order.

A final remark should be made here. The latter analysis was done using the values obtained for the confined self set in Table 1.2. Hence, we should expect that the former estimates could be an overestimation of the actual number of detectors required. However, the exponential dependency is maintained and consequently the qualitative nature of the conclusion remains valid.

Another interesting remark is the observation that in the previous $l$ = 32 example, it would be necessary that on average each detector would detect only 2 different nonself strings ($2^{32} / N_R \approx 1.76$). This shows that to achieve failure probabilities of the order of 1%, the number of detectors is indeed too large.

Fig. 1.9 Increase in the size of the complete detector set for a failure probability of about 1%. Results calculated from the values of table 1.2 considering the first self set ($N_S$ = 100, $s \in [0, 2^9 - 1]$) for different string spaces defined by the string length $l$.

1.5 Final Remarks

The NSAs developed by Forrest and collaborators have an acceptable performance for the case in which the self size is small compared to the dimension of the nonself space, and for a low value of the matching threshold $r$ compared to the string length $l$. Even so, in the best case scenario, the number of detectors is at least 3 to 4 times larger than the size of the self set (as reported in [2, 3] using the greedy algorithm and for a $P_f \approx 0.1$). Smaller values of $P_f$ could also be achieved. However, the cost of such small failure probabilities is an unreasonable growth of the number of detectors. In this case, the size of the detector set becomes close to the size of the nonself space. In these algorithms, this leads to including many overlapping detectors. Another weakness of the NSAs is that the size of the complete detector set increases exponentially with the size of the string space $U$, parameterized by the string length $l$ (fig. 1.9).
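The order of magnitude quoted for the $l$ = 32 extrapolation can be checked with a short calculation. The sketch assumes the extrapolated detector count is $2.44 \cdot 10^9$, which is the value consistent with the ratio of about 1.76 strings per detector mentioned in the analysis; the function name is hypothetical.

```python
def strings_per_detector(l, n_detectors):
    """Average number of strings of the l-bit universe per detector."""
    return 2 ** l / n_detectors

# Assumed extrapolated detector count for l = 32 and P_f = 0.01.
n_r = 2.44e9
print(strings_per_detector(32, n_r))
```

A ratio this close to one means that the detector set is almost as large as the string space itself.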

The exhaustive search detector generation algorithm is very time demanding, given that the censoring process is completely random. The linear time and the greedy algorithms improve the detector generation phase, but they are still slow because all the possible templates with ($l - r + 1$) positions have to be matched.

The complete negative selection algorithm is still not efficient for two other reasons. First, in NSAs holes can only be counted a posteriori, and this has a high computational cost. Secondly, the monitoring phase is not efficient for large detector sets. Monitoring requires checking all the detectors one by one, which is very time consuming for detector sets of reasonable size. It should be noted that in most cases monitored strings do not correspond to intrusions. Consequently, the typical case requires testing all detectors, which can be burdensome.

The matching rules provide a framework for easily matching strings with reduced matching requirements. Regarding the r-contiguous bits matching rule (rcb), for a fixed string space $U$, there is a trade-off between the number of detectors $N_R$ and the failure probability $P_f$: for a higher matching threshold $r$, more detectors are necessary to cover the same nonself space; for a lower $r$, fewer detectors are necessary but more holes are created (higher $P_f$). This trade-off corresponds to tuning between the specificity and generality of the detector coverage. There is an "optimal" choice of parameters that produces an acceptable number of holes (low $P_f$) with a small detector set.
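The trade-off between the matching threshold and the coverage can be explored by brute force on a small string space. The sketch below is illustrative only: the string length (8 bits), the self set size (10) and the random seed are hypothetical choices, and the complete detector set is obtained by enumerating the whole universe.

```python
import random
from itertools import product

def rcb_match(a, b, r):
    """True if strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def sweep(self_set, l):
    """For each threshold r, return (r, size of complete detector set, holes)."""
    universe = ["".join(bits) for bits in product("01", repeat=l)]
    nonself = [u for u in universe if u not in self_set]
    results = []
    for r in range(1, l + 1):
        detectors = [u for u in nonself
                     if not any(rcb_match(u, s, r) for s in self_set)]
        holes = [u for u in nonself
                 if not any(rcb_match(u, d, r) for d in detectors)]
        results.append((r, len(detectors), len(holes)))
    return results

rng = random.Random(0)
self_strings = set()
while len(self_strings) < 10:
    self_strings.add(format(rng.getrandbits(8), "08b"))
for r, n_detectors, n_holes in sweep(self_strings, 8):
    print(r, n_detectors, n_holes)
```

The complete detector set can only grow as $r$ increases, and for $r = l$ every nonself string is its own detector, so no holes remain.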

There are also some questions raised about the immunological relevance of these algorithms. According to the underlying theory, T cells would not bind self cells after leaving the thymus. This, however, is not what is observed; indeed, a wide range of autoimmunity is observed. Nevertheless, the main purpose of Forrest and collaborators is not to raise hypotheses on immunological foundations.

Furthermore, an important problem in these approaches is that there is no homeostatic control of self. In NSAs holes are structurally maintained around self (fig. 1.4). Consequently, these structural faults could be a persistent problem that could be exploited by pathogens using molecular mimicry strategies. This means that if an intruder uses only self patterns or patterns in holes, then it could proliferate without control. To mitigate this problem, an extra mechanism for monitoring or counting holes would have to be considered.

Another issue concerns the problem of choosing the best value of $r$. It could be thought that the best strategy would be to consider detectors with different values of $r$. This, however, raises the problem of preferentially selecting detectors with low $r$, which cover larger regions of space. Otherwise the system would end up having plenty of detectors that cover regions already covered by lower-$r$ detectors. It would be necessary to have a mechanism for evaluating detector overlap, such as in the greedy algorithm. This, however, adds another level of computational complexity to the algorithm.


Negative selection algorithms can be classified between signature scanners and file-authentication programs [1]. Some authors have specifically tested the exhaustive search algorithm in real applications of network intrusion detection systems. These authors used high parameter values and also larger coding alphabets, reporting that in some cases it would take years to generate a certain number of detectors [8, 9]. These authors also encourage the use of other algorithms, alone or together with the NSA, such as positive selection, clonal selection or other matching rules.

There are many other variations of the NSAs [5] namely, introducing mutation in detector generation, using crossover methods to obtain the holes, using different types of representations (e.g., “negative databases”), making use of other immunological structures, and so on. The field is prone to variations, but qualitatively different results should not be expected.


Chapter 2

Cellular Frustration

The conceptual framework of Cellular Frustration was first developed by de Abreu et al. [10] as a new phenomenological understanding of the adaptive immune system. This mechanism presents an explanation for self/nonself discrimination taking into account “cellular decisions”. Frustration in T cell interactions with self APCs provides a basis for the regulation of the immune responses.

In this chapter, the Cellular Frustration mechanism is presented extending the immunological concepts introduced in the previous chapter. This mechanism is compared with approaches on the same problem by Forrest and collaborators. A model for nonself detection is described by using the concepts of Cellular Frustration. An algorithm based on this model is developed in chapter 4.

2.1 Introduction

The problem of self/nonself discrimination has been tackled by many different theories. The best known approaches include negative selection, idiotypic networks, clonal selection and the danger theory [4-6, 11, 12]. The common feature that some of these approaches share is that affinity is related to the complementarity of structures in ligand-receptor binding. They also share the assumption that reactions are essentially instantaneous. These features are "appealing" from a computational point of view for Artificial Immune System (AIS) algorithms, given that they are easy to implement.

However, immune reactions require time: either during recognition processes, or for effector functions to take place [11]. The complex immunological synapse that is established when an APC forms a conjugate with a T cell is a demonstration of those time-consuming processes. The immunological synapse involves complex signalling and intermediate conformational processes preceding T cell activation [13]. The kinetic proofreading models are good examples of multi-step signalling pathways required for very specific processes, like DNA replication [14] or T cell activation [7, 13, 15].

T cell activation requires that the immunological synapse is preserved for some long characteristic time. Only after T cell activation can effector functions take place. So, in order not to kill self cells, there should be a process to avoid T cell activation when T cells are bound to APCs carrying self antigens. This is necessary to maintain self in absolute tolerance. Likewise, APCs carrying nonself antigens should establish long lasting interactions with T cells, leading to a pathogenic recognition process and triggering an immune response against the pathogen. This would require extreme reactivity only against nonself cells: to increase the probability of long-lasting interactions or to increase reaction speed.

Cellular Frustration accomplishes these two apparently incompatible tasks: to react very specifically with nonself, while maintaining non-reactivity (tolerance) with self. These features are achieved by cellular frustrated systems exhibiting decision-like interactions.

Consider the following example with 3 different cell types (A, B, C). Suppose that each type has different affinities and that they can be described by an Interaction List (IList), such as the one in fig. 2.1a). ILists can be regarded as lists of "preferred interactions" with the highest preferences at the first positions (for each cell type). Cells display different affinities for each different cell type, i.e., they are more likely to react with some cell types than with others. If a cell is interacting with another one, their interaction can be interrupted if a different cell appears. In this sense, cells become frustrated if their immunological synapses are repeatedly interrupted to form new conjugates with another cell.

Within the Cellular Frustration framework it is assumed that cells perform decisions. These “decisions” are the output of optimization processes based on the avidity of cell contacts. Cell contacts are mediated by ligand-receptor interactions. This phenomenon has indeed been observed by Depoil et al. [16]. These experiments show that an immunological synapse can be changed towards a new APC, if this one provides a stronger stimulus.

The IList in figure 2.1a) shows how affinities are sensed by cells in a frustrated set. If cell A is prompted with one cell B and one cell C, cell A decides to interact with cell B. In the same way, cell B prefers to interact with cell C than with cell A, and so on. In this system no stable arrangement is achieved, given that affinities are not reciprocally matched. This leads to a frustrated interaction cycle where no conjugate of two cells remains bound for too long. This cycle is illustrated in fig. 2.1b). A cellular frustrated system typically exhibits a majority of short-lived interactions, because no stable configuration exists. The system of 3 cell types, with the configuration of the IList from fig. 2.1a), is maximally frustrated.
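The frustration cycle of the ABC system can be reproduced with a few lines of Python. The sketch assumes the IList described above (A prefers B over C, B prefers C over A, C prefers A over B) and one simple update rule that is an assumption of this illustration: a free cell always accepts a partner, and a bound cell leaves its partner whenever the free cell is ranked higher in its IList. The names `ILIST`, `prefers` and `step` are hypothetical.

```python
# IList of the ABC system: most preferred partner first.
ILIST = {"A": ["B", "C"], "B": ["C", "A"], "C": ["A", "B"]}

def prefers(cell, candidate, partner):
    """True if `cell` ranks `candidate` above its current `partner`."""
    return ILIST[cell].index(candidate) < ILIST[cell].index(partner)

def step(pair, free):
    """Return the new (pair, free cell) after the free cell destabilizes
    the conjugate, or None if the conjugate is stable."""
    x, y = pair
    if prefers(x, free, y):
        return (x, free), y
    if prefers(y, free, x):
        return (y, free), x
    return None

state = (("A", "B"), "C")
for _ in range(6):
    print(state)
    state = step(*state)
```

Under these assumptions the conjugates AB, BC and CA follow each other indefinitely, so no pair ever remains bound.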


Fig. 2.1 Cellular Frustration in ABC system. a) Interaction list for ABC system (A: B, C; B: C, A; C: A, B). b) ABC system exhibiting cellular frustration based on the IList of a). Conjugates are constantly destabilized by single cells, making no stable arrangement possible (frustration cycle).

Consider now what would happen if another cell type D is introduced in the previous ABC system. Assume that cell D is similar to cell type A in that it displays the same IList, i.e., it has the same receptors. However, it presents different ligands, and so the other cell types can distinguish them when confronted with the decision between interacting with A or D cells. This is featured in the IList of fig. 2.2a). In this system, cell A will remain maximally frustrated (as in the ABC system), given that its IList remains oppositely matched with cells B and C. However, the reactivity of other cells towards cell D is increased. This happens because cells B and C do not frustrate cell D as much. Cell D assumes intermediate positions in their ILists. Therefore, BD conjugates will have longer lifetimes, given that B has D in the first position of its IList (fig. 2.2a)).

The system with 4 cell types (ABCD) can provide the basis for a model of how the adaptive immune system is activated in the spleen and lymph nodes [11]. The three cell types A, B and C correspond to self APCs, T cells and regulatory T cells (T-reg), respectively. In this sense, the cell type D is similar to A but corresponds to pathogenic APCs, i.e., APCs presenting nonself peptides. A longer conjugate time of cells B with cells D triggers T cell activation (fig. 2.2b)). The rest of the system remains in a tolerant state, avoiding auto-immunity. Figure 2.2b) illustrates some of the possible interaction results from single cell destabilizations within the ABCD system.

A multi-step system based on the ABCD dynamics for T cell activation has also been shown to possess similarities with the T cell kinetic proofreading by McKeithan [15]. This kind of system has also shown other useful characteristics, namely tunable specificity controlled by the chemical species' concentrations [19]. However, this topic is not the subject of the present work.

Fig. 2.2 Cellular Frustration in ABCD system. a) Interaction list for ABCD system. b) Schematics of all cell conjugates of the ABCD system and two possible destabilization pathways using single cells, leading to conjugate BD (containing the pathogen). Conjugate AB can be destabilized by a D single cell. Therefore, conjugate BD is more stable than AB, producing longer lifetimes which lead to T cell activation. The interactions are based on the IList in a).

The Cellular Frustration mechanism was proposed as an explanation for these phenomena, breaking with all previous theories proposed on immunological self/nonself discrimination. Actually, frustration is a naturally occurring phenomenon in many different systems. Examples range from neural networks to spin glasses [17]. Even the very idea of Cellular Frustration was inspired by ecological mating dynamics [18].

In the Cellular Frustration mechanism, the notions of self and nonself are emergent properties of the system dynamics. In this framework, self is defined as the set of cells that keep short-lived interactions (frustrated dynamics). These dynamics depend on the ligands and receptors present in the system [10, 11]. In the same way, cells that keep long-lived interactions are considered nonself (non-frustrated dynamics). One of the major differences that clearly distinguishes this mechanism from others is that all elements are treated as similar, having the same rules and constants applied.

The Cellular Frustration framework relies on some assumptions that extend the immunological concepts introduced in chapter 1, namely:

Pathogen mimicry: APCs should all be similar, independently of whether they are displaying self or nonself antigens.

Cross-reactivity: each immune cell can potentially interact and react with a large variety of cells. It is also assumed that immune cells have high avidity. This means that cells prefer to be in a conjugate state rather than to be alone. Furthermore, cell interactions can be conflicting because cells' decisions are not necessarily matched by the other cells.

Cellular decisions: cells can only interact with one cell at a time. It is assumed that cells perform decision-like interactions, directing the immunological synapse towards the cell from which they sense higher affinity.

Effector functions: immune responses take place after T cell activation. T cell activation can only be triggered if cells interact for a sufficiently long time (a characteristic time).

At this point, it is possible to draw some differences with the approaches by Forrest and collaborators and their immunological motivation.

Table 2.1 Comparison between important features of the two immunological frameworks: negative selection and cellular frustration

Tolerance / Auto-immunity
Negative Selection (Forrest et al.): guaranteed when detectors match no self cell
Cellular Frustration (Abreu et al.): guaranteed if all interactions are short-lived

Cross-reactivity
Negative Selection (Forrest et al.): each detector has an affinity domain (limited)
Cellular Frustration (Abreu et al.): each detector can in principle interact with any other cell (global)

Self
everything that is presented in the maturation stage (before repertoire selection)

Nonself Recognition
Negative Selection (Forrest et al.): a string (peptide) that lies within a detector affinity domain (detector-dependent property)
Cellular Frustration (Abreu et al.): long-lived conjugate (emergent property)

2.2 N System and Maximal Frustration

Returning to the ABC system explained before, this kind of system can be generalized to $N$ cell types instead of the initial 3 cell types. In a similar way, we now consider structurally equal systems having only one cell type and $N$ different cell subtypes (denoted as clusters).

The following considerations assume only one cell per subtype each having different ILists. Thus, assuming that all cells play equivalent roles, their ILists for a maximally frustrated system are obtained by the following relationship:

( ) = ( + 1), = 1, … , − 1 (2.1)

with,

( ) = (2.2)

Here ( ) corresponds to the cell subtype ranked in the th position in the IList of cell subtype ; and [ ] represents an integer modulus . Systems with this particular arrangement of ILists are

called circular frustrated systems. These systems can be represented as illustrated in figure 2.3: the ILists are sequential and interconnected. ILists from consecutive cell subtypes are obtained with minimal shifts assured by the continuity of cell elements.

This kind of system was constructed to be maximally frustrated because each cell's IList is oppositely matched by the ILists of the other cells. For instance, if cell subtype $i$ puts cell subtype $k$ in the first position of its IList, then cell subtype $k$ has $i$ in the last position of its IList. Expression 2.3 states the generalization: if cell subtype $i$ has cell subtype $k$ ranked in the $j$th position of its IList, then $k$ ranks cell subtype $i$ in the $(N - j)$th position of its IList:

= ( − ), = ( )

= 1, … , , = 1, … , − 1, ( ) = (2.3)

As a result of these ILists, no conjugate is stable: there is always one cell within a conjugate that can be destabilized by one other cell of higher affinity (rank). These systems maximize frustration. The maximum lifetimes displayed by maximally frustrated dynamics are the shortest lifetimes possible, for each given system. However, the complexity of interactions in these systems originates a distribution of short lifetimes instead of a single lifetime [11].
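The opposite-ranking property can be illustrated with a small sketch. It assumes one particular closed form for the ILists, in which cell subtype $i$ ranks subtype $(i + j) \bmod N$ in its $j$th position; this form is an assumption of the example (it reproduces the ABC IList for $N = 3$ with A, B, C numbered 0, 1, 2), and all function names are hypothetical.

```python
def circular_ilists(n):
    """IList of cell i (0-based): cells i+1, i+2, ..., i+n-1 modulo n,
    most preferred first (assumed closed form)."""
    return {i: [(i + j) % n for j in range(1, n)] for i in range(n)}

def rank(ilists, cell, other):
    """1-based position of `other` in the IList of `cell`."""
    return ilists[cell].index(other) + 1

def satisfies_opposite_ranking(ilists):
    """Check that if i ranks k in position j, then k ranks i in position N - j."""
    n = len(ilists)
    for i in ilists:
        for k in ilists[i]:
            if rank(ilists, k, i) != n - rank(ilists, i, k):
                return False
    return True

def every_pair_unstable(ilists):
    """True if, in every pair, at least one cell ranks some third cell
    above its partner."""
    return all(rank(ilists, i, k) > 1 or rank(ilists, k, i) > 1
               for i in ilists for k in ilists[i])

print(circular_ilists(3))
for n in range(3, 7):
    lists = circular_ilists(n)
    print(n, satisfies_opposite_ranking(lists), every_pair_unstable(lists))
```

With only two subtypes each cell ranks the other first, so the instability check fails for $N = 2$; frustration requires at least three subtypes.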

Any new cell introduced in the system with an IList that does not follow the system's IList arrangement decreases the system's frustration. This does not imply that all cells of the system become less frustrated. A simple example makes this clear: suppose we introduce a new cell that is so little frustrated that it stays permanently bound to a single cell of the system. In that case, we would have a system composed of a permanent pair and the remaining cells, whose ILists follow fig. 2.3a). These other cells would therefore remain highly frustrated and maintain short-lived interactions.

To further understand the concept behind nonself discrimination in this kind of system, the concept of pathogen is clarified. Assume that the self system organizes itself according to some rule. This rule can be assigned by an education process or by a hierarchical system. Each cell can only see its own IList and not the system globally. A pathogen is an invader cell that does not belong to the system and is therefore nonself. The pathogen will try to find one of the system's ILists in order to be mistakenly identified as self.

This situation can be illustrated by the following analogy. Imagine a vault system that assigns a different password to every authorized user (self). Every user knows only their own password, and neither the other users' passwords nor how they were generated. An unauthorized user (nonself) who wants to access the vault has to find one password that the security system accepts as belonging to an authorized user. If the unauthorized user has no prior information about the rules of password assignment, they have to generate random passwords.
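The analogy can be quantified with a toy calculation. The sketch below rests on several assumptions that are not part of the text: each self "password" is a circular IList written as a full ordering of the labels 1 to N (own label last, closed form [i + n]_N), and the intruder draws one ordering uniformly at random. Under those assumptions, the success probability of a single blind guess is obtained by exhaustive enumeration, which is feasible only for small N.

```python
from fractions import Fraction
from itertools import permutations


def self_passwords(n_types):
    """Self ILists as full orderings of 1..N, own label in the last position."""
    def wrap(x):
        return (x - 1) % n_types + 1  # modulus mapped onto labels 1..N
    return {tuple(wrap(i + n) for n in range(1, n_types + 1))
            for i in range(1, n_types + 1)}


def random_guess_success(n_types):
    """Exact probability that one uniformly random ordering is a self IList."""
    passwords = self_passwords(n_types)
    guesses = list(permutations(range(1, n_types + 1)))
    hits = sum(guess in passwords for guess in guesses)
    return Fraction(hits, len(guesses))


if __name__ == "__main__":
    for n_types in range(3, 8):
        print(n_types, random_guess_success(n_types))
```

In this toy setting there are N valid orderings out of N! possible ones, so the probability of a lucky guess is 1/(N-1)! and falls off very quickly as the number of subtypes grows.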
