Counting model

In the document: a modeling approach (pages 81-84)

II.2 Modeling the distribution of recombination events

II.2.1 Counting model

Map functions such as Haldane’s and Kosambi’s, in equations II.3 and II.4, respectively, describe the relationship between the frequency of recombinants and the genetic length of chromosomes. However, following the advancements in our understanding of the molecular process of meiosis, mathematical models have emerged that address the double-strand break (DSB) process directly.

Counting models assume that DSBs are distributed randomly along chromosomes and that their number follows a Poisson distribution with parameter $y$ (the mean number of events). The impact of interference is measured by considering that any two consecutive COs should be separated on average by $m$ NCOs (Foss et al., 1993) (figure II.10). In this class of models, interference depends on genetic rather than physical length. While the strength of interference is indeed controlled by the density of CO events (genetic map), it can take a wide range of values at the physical scale in different organisms (Foss et al., 1993; Berchowitz and Copenhaver, 2010). In the model of Foss et al. (1993), the genetic sizes of all intervals are constant.

Figure II.10: Interference models. The left panel depicts the beam-film demonstration of the mechanical stress model proposed by Kleckner et al. (2004). It shows the beam (chromosomal axis; green), the film (chromatin fiber; gray), and flaws (CO precursors; black dots). Diagrams of the stress level are shown under each beam, with beam position on the x axis and stress level on the y axis. The center panel depicts the polymerization model proposed by King and Mortimer (1990). Chromatids are shown in green (parent 1) and yellow (parent 2). Small light blue circles represent recombination precursors, and CO designates are shown as larger circles marked with 'CO'. The interference polymer is shown as a large arrow emanating from CO sites, and CO precursors removed by the polymer are shown to the right, accompanied by a dashed arrow. The right panel depicts the counting model proposed by Foss et al. (1993). Chromatids are shown in green (parent 1) and yellow (parent 2). Small light blue circles represent recombination precursors, and CO designates are shown as larger circles marked with 'CO'. In this diagram, $m = 3$ and the intervening NCOs between COs are outlined in a red box. From Berchowitz and Copenhaver (2010).

The counting model can also generate map functions. The recombination frequency is generally defined as half the probability that the interval contains at least one CO, $R = \frac{1}{2} P(\#\mathrm{CO} \geq 1)$. The number $k$ of DSB events in an interval has the Poisson probability $P(\#\mathrm{DSB} = k, y) = \frac{y^k}{k!} e^{-y}$, and

$$P(\#\mathrm{DSB} > m) = 1 - \sum_{k=0}^{m} P(\#\mathrm{DSB} = k, y).$$

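The probability of at least one CO given the number of DSBs and $m$ is:

$$P(\#\mathrm{CO} \geq 1 \mid \#\mathrm{DSB} = k, m) = \begin{cases} \dfrac{k}{m+1}, & \text{if } k \leq m \\ 1, & \text{otherwise} \end{cases} \tag{II.9}$$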

Thus, the recombination frequency for a given value of the parameter $m$ is:

$$R = \frac{1}{2}\left[\sum_{k=0}^{m} \frac{k}{m+1}\, P(\#\mathrm{DSB} = k, y) + P(\#\mathrm{DSB} > m)\right] = \frac{1}{2}\left[1 - e^{-y} \sum_{k=0}^{m} \frac{y^k}{k!}\left(1 - \frac{k}{m+1}\right)\right] \tag{II.10}$$

In an interval of map length $g$ Morgans, the mean number of DSB events per tetrad is $y = 2(m+1)g$ (Foss and Stahl, 1995). Substituting this expression into equation II.10 establishes a relation between $R$ and $g$, leading to a new map function.
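The substitution described above can be checked numerically. The following is a minimal sketch, not code from the cited works: the function names `counting_map_function` and `haldane` are hypothetical, and the Haldane form $R = \frac{1}{2}(1 - e^{-2g})$ is assumed to correspond to equation II.3. It evaluates the map function for given $g$ and $m$.

```python
import math


def counting_map_function(g, m):
    """Recombination frequency R for a map length g (Morgans), counting model.

    Evaluates R = 1/2 * [1 - exp(-y) * sum_{k=0}^{m} y^k / k! * (1 - k / (m + 1))]
    with y = 2 * (m + 1) * g, the mean number of DSB events per tetrad.
    """
    y = 2 * (m + 1) * g
    total = sum(
        (y ** k / math.factorial(k)) * (1 - k / (m + 1))
        for k in range(m + 1)
    )
    return 0.5 * (1 - math.exp(-y) * total)


def haldane(g):
    """Haldane's map function (no interference), used here only for comparison."""
    return 0.5 * (1 - math.exp(-2 * g))


if __name__ == "__main__":
    for g in (0.05, 0.1, 0.25, 0.5, 1.0):
        print(g, haldane(g), counting_map_function(g, 0), counting_map_function(g, 4))
```

With $m = 0$ the sum reduces to a single term and the function coincides with Haldane's map function. For larger $m$, $R$ stays closer to $g$ over short intervals, as expected under interference.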

In addition to providing a map function, the counting model can also quantify interference along chromosomes. It is also called the chi-square model, because the inter-CO distance follows a $\chi^2$ distribution with $2(m+1)$ degrees of freedom (Broman and Weber, 2000). The $\chi^2$ distribution is a member of the $\Gamma$ distribution family. The $\Gamma$ distribution has two parameters: shape ($\nu$) and rate ($\lambda$). In the counting model, the shape parameter is the number of Poisson events (DSBs) needed to ensure a CO, $m+1$. The rate parameter is twice this number, $2(m+1)$, to account for tetrads. The density and cumulative distribution functions of inter-CO distances are:

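$$f(x \mid \lambda = 2(m+1), \nu = m+1) = \frac{\lambda^{\nu}}{\Gamma(\nu)}\, x^{\nu-1} e^{-\lambda x}$$

$$F(x \mid \lambda = 2(m+1), \nu = m+1) = \sum_{k=m+1}^{\infty} \frac{e^{-\lambda x} (\lambda x)^k}{k!} \tag{II.11}$$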
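As an illustration, the density and cumulative distribution of inter-CO distances can be evaluated directly. This is a sketch under the parameterization stated above ($\nu = m+1$, $\lambda = 2(m+1)$), and the function names are hypothetical. The infinite tail sum in the cumulative distribution is computed through its finite complement, $1 - \sum_{k=0}^{m} e^{-\lambda x}(\lambda x)^k / k!$, which is equal to it because the Poisson probabilities sum to one.

```python
import math


def inter_co_density(x, m):
    """Gamma density of the inter-CO distance x (Morgans) for a given m."""
    nu = m + 1           # shape: number of DSB events needed to ensure a CO
    lam = 2 * (m + 1)    # rate: twice the shape, to account for tetrads
    return lam ** nu / math.gamma(nu) * x ** (nu - 1) * math.exp(-lam * x)


def inter_co_cdf(x, m):
    """Cumulative distribution of the inter-CO distance x (Morgans).

    Computed as one minus the first m + 1 Poisson terms, which equals the
    tail sum starting at k = m + 1.
    """
    lam = 2 * (m + 1)
    head = sum(
        math.exp(-lam * x) * (lam * x) ** k / math.factorial(k)
        for k in range(m + 1)
    )
    return 1 - head


if __name__ == "__main__":
    for m in (0, 2, 4):
        print(m, inter_co_density(0.5, m), inter_co_cdf(0.5, m))
```

In this parameterization the mean inter-CO distance is $\nu / \lambda = 0.5$ Morgans for every $m$; increasing $m$ only narrows the distribution around that mean.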

When applied to data from Drosophila and Neurospora, with parameter $m$ taking values 4 and 2, respectively, the model fits the data well (Foss et al., 1993). However, in budding yeast many intervals separating successive COs were extremely short, leading to incorrect estimates of $m$ (Foss and Stahl, 1995). In many organisms, two types of COs have been identified: interfering and non-interfering (table I.1). The counting model was thus adapted to account for both types of COs, resulting in the two-pathway model (Housworth and Stahl, 2003).

The two-pathway model considers that a fraction $p$ of COs is not subject to interference ($m = 0$). The inter-CO distances for the interfering type are given by the $\Gamma$ distribution as in equation II.11, with parameters $\lambda = 2(1-p)(m+1)$ and $\nu = m+1$. Given the series of inter-CO distances $g_0, g_1, \ldots, g_n$ along a chromosome (where $g_0$ and $g_n$ are the distances from the start of the chromosome to the first CO and from the last CO to the end of the chromosome, respectively), the algorithm considers all $2^n$ possibilities to assign the $n$ COs to the two types. The distributions of $g_0$ and $g_n$ are calculated separately under the assumption of stationarity (the start and end of the chromosome do not influence the positions of the first and last COs, respectively). The inter-CO distances are further divided into two sets: $y_0, y_1, \ldots, y_j$ for non-interfering COs and $z_0, z_1, \ldots, z_k$ for interfering COs.

The relation between $g_i$, $y_i$, and $z_i$ is

$$\sum_{i=0}^{n} g_i = \sum_{i=0}^{j} y_i = \sum_{i=0}^{k} z_i = G,$$

where $G$ is the total genetic length of the chromosome. The probabilities of the inter-CO distances for the two types of COs are calculated separately, and the sum of their products over all $2^n$ possible divisions gives the probability of the observed $g_i$ sequence under the two-pathway model:

$$P(g_0, g_1, \ldots, g_n \mid p, m) = \sum_{(y_0, y_1, \ldots, y_j),\,(z_0, z_1, \ldots, z_k)} P(y_0, y_1, \ldots, y_j \mid p, 0)\, P(z_0, z_1, \ldots, z_k \mid 1-p, m) \tag{II.12}$$

The product of the above probabilities over a collection of meiotic products gives the likelihood of the model parameters, $p$ and $m$, given the data. By maximizing the likelihood function, parameters have been estimated in S. cerevisiae (Stahl et al., 2004), A. thaliana (Copenhaver et al., 2002; Lam et al., 2005), maize (Falque et al., 2009), and humans (Housworth and Stahl, 2003; Fledel-Alon et al., 2009).
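The enumeration over the $2^n$ assignments can be sketched as follows. This is an illustrative fragment only: the helper names `split_distances` and `enumerate_assignments` are hypothetical, CO positions are assumed to be given in Morgans from the start of the chromosome, and the probability terms of the likelihood (including the stationarity treatment of the end intervals) are deliberately left out. It only shows how each labeling of the COs yields one set of non-interfering distances $y$ and one set of interfering distances $z$, each summing to $G$.

```python
from itertools import product


def split_distances(positions, G):
    """Distances between consecutive COs, including both chromosome ends.

    With no CO of a given type, the result is the single interval [G].
    """
    points = [0.0] + sorted(positions) + [G]
    return [b - a for a, b in zip(points, points[1:])]


def enumerate_assignments(positions, G):
    """Yield (y, z) distance lists for each of the 2**n labelings of n COs.

    y holds the distances between non-interfering COs, z those between
    interfering COs; both lists sum to the chromosome length G.
    """
    positions = sorted(positions)
    for labels in product((False, True), repeat=len(positions)):
        non_interfering = [x for x, interferes in zip(positions, labels) if not interferes]
        interfering = [x for x, interferes in zip(positions, labels) if interferes]
        yield split_distances(non_interfering, G), split_distances(interfering, G)


if __name__ == "__main__":
    # Hypothetical example: three COs on a chromosome of 1.2 Morgans.
    for y, z in enumerate_assignments([0.2, 0.5, 0.9], 1.2):
        print(y, z)
```

Because the number of assignments doubles with each additional CO, this brute-force enumeration is only practical for the small numbers of COs typically observed per chromosome.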

Another improvement to the counting model consists in considering that $m$, the number of NCOs between successive COs, is not constant (Lange et al., 1997). In the Poisson-skip model, the random number of NCOs is chosen according to a Poisson distribution, $s_n$. The number of skipped events is random at each run. The $\chi^2$ model is a special case of the Poisson-skip model, with $s_n = 1$. The inter-CO distribution for the Poisson-skip model has a cumulative distribution of:

$$\sum_{m=0}^{\infty} s_m \sum_{k=m+1}^{\infty} \frac{e^{-\lambda x} (\lambda x)^k}{k!},$$

which for $s_m = 1$ gives the relation in equation II.11.
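The mixture structure of this cumulative distribution can be made concrete with a short sketch. The function names are hypothetical, the skip distribution is passed as a dictionary mapping $m$ to $s_m$ with finite support (an assumption made here so that the outer sum is finite), and $\lambda$ is treated as a free rate parameter.

```python
import math


def poisson_tail(mean, start):
    """P(N >= start) for N ~ Poisson(mean), via the complement of the head sum."""
    head = sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(start)
    )
    return 1 - head


def poisson_skip_cdf(x, lam, skip_probs):
    """Cumulative distribution of the inter-CO distance, Poisson-skip model.

    skip_probs maps each m (number of skipped events) to its probability s_m;
    the probabilities are assumed to sum to 1.
    """
    return sum(s * poisson_tail(lam * x, m + 1) for m, s in skip_probs.items())


if __name__ == "__main__":
    # All mass on m = 4 reproduces the chi-square (counting) model.
    print(poisson_skip_cdf(0.4, 10, {4: 1.0}))
    # A hypothetical mixture over m = 2 and m = 4.
    print(poisson_skip_cdf(0.4, 10, {2: 0.5, 4: 0.5}))
```

Placing all the probability on a single value of $m$ recovers the cumulative distribution of the counting model, which is the special case mentioned above.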

Given the small number of parameters to be estimated, counting models have been widely used to assess the strength of interference at the chromosome level. A more recent model, based on the $\chi^2$ model of Foss et al. (1993), integrates the condition of one obligatory CO per chromosome (Falque et al., 2007). The forced initial CO (FIC) model starts by choosing the position of the first, obligatory CO from a uniform distribution. Additional COs are generated towards each end of the chromosome according to the counting model. Data from mouse have been fitted with the FIC model, which yielded better estimates of the number of COs per chromosome than the standard counting model.
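The two steps of the FIC model described above can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the implementation of Falque et al. (2007): the function name `simulate_fic` is hypothetical, positions are in Morgans, and the distances between successive COs are assumed to be drawn from the $\Gamma$ distribution of equation II.11 (shape $m+1$, rate $2(m+1)$).

```python
import random


def simulate_fic(G, m, rng=None):
    """Simulate CO positions on a chromosome of genetic length G (Morgans).

    Assumption: inter-CO distances follow a gamma distribution with shape
    m + 1 and rate 2 * (m + 1), as in the counting model.
    """
    rng = rng or random.Random()
    shape = m + 1
    scale = 1.0 / (2 * (m + 1))  # random.gammavariate expects a scale (1 / rate)

    # Step 1: place the obligatory CO uniformly along the chromosome.
    first = rng.uniform(0.0, G)
    positions = [first]

    # Step 2: add COs towards the right end, then towards the left end.
    pos = first
    while True:
        pos += rng.gammavariate(shape, scale)
        if pos >= G:
            break
        positions.append(pos)

    pos = first
    while True:
        pos -= rng.gammavariate(shape, scale)
        if pos <= 0.0:
            break
        positions.append(pos)

    return sorted(positions)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(simulate_fic(1.5, 4, rng))
```

Every simulated chromosome carries at least one CO by construction, which is the property that distinguishes this scheme from the standard counting model.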

Counting models have the advantage of relying on easy to implement mathematical functions with few parameters. However, from a biological perspective, the model predicts an overall reduction in the number of COs and NCOs following a reduction in the number of DSBs (Berchowitz and Copenhaver, 2010). This is not the case for real data, as CO homeostasis ensures that CO rates are kept at high levels despite a decrease of DSB frequencies (Martini et al., 2006).
