Genomic evidence for gBGC

In the document a modeling approach. (pages 58-61)

I.3 Biased gene conversion (BGC)

I.3.3 Genomic evidence for gBGC

Under the gBGC model, AT→GC substitutions are favored, which increases the local GC-content of sequences and thus produces a strong correlation between recombination rate and GC-content (Meunier and Duret, 2004). Several genomic analyses support this prediction. Studies of single nucleotide polymorphisms (SNPs) in human populations reveal that AT→GC mutations segregate at higher frequencies than GC→AT mutations in noncoding regions, which is consistent with either gBGC or selection (Lercher et al., 2002). Furthermore, the association between SNPs and recombination indicates an increase in G and C frequencies close to CO hotspots (Spencer et al., 2006). Comparisons of human and chimpanzee sequences show that AT→GC substitutions tend to cluster in regions close to telomeres, which are characterized by high COR (Dreszer et al., 2007).

The pattern of substitutions in 1 Mb noncoding windows along the human genome is also indicative of a GC bias linked to high recombination rates (Duret and Arndt, 2008).

Moreover, strong correlations between COR and GC-content have been found in mammals, birds, turtle, nematode, Drosophila, paramecium, green alga and plants (reviewed in Duret and Galtier, 2009). Figure I.17 illustrates the strong correlations between these variables in human and chicken.
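As a minimal sketch of how such a correlation can be quantified, the snippet below computes the squared Pearson correlation (R2) between log-transformed crossover rate and GC-content. The function name `pearson_r2` and the per-chromosome values are hypothetical and chosen only for illustration; they are not data from Duret and Galtier (2009).

```python
import math


def pearson_r2(xs, ys):
    """Return the squared Pearson correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-chromosome values (illustration only, not real data):
# crossover rate in cM/Mb and GC-content as a fraction.
crossover_rate = [0.9, 1.1, 1.4, 2.0, 3.5]
gc_content = [0.39, 0.40, 0.41, 0.43, 0.46]

# Crossover rate is log-transformed before the regression.
log_cor = [math.log10(c) for c in crossover_rate]
r2 = pearson_r2(log_cor, gc_content)
print(f"R2 = {r2:.2f}")
```

A high R2 on its own does not indicate the direction of causality, which is why the substitution-pattern analyses cited above matter for the gBGC interpretation.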

[Figure I.17, image not reproduced. Panels A-C: human; panels D-E: chicken. Axes: chromosome length (Mb), crossover rate (cM/Mb), current GC and GC*. R2 values shown: 0.84, 0.35 and 0.66 (human); 0.82 and 0.81 (chicken).]

Figure I.17: Correlations between chromosome length, crossover rate and GC-content in human and chicken autosomes. The stationary GC-content (GC*) is the GC-content that a sequence would reach under a constant substitution pattern; it is a statistic summarizing the substitution matrix. Chromosome length and crossover rate are plotted on a log scale. Regression lines and Pearson’s correlation coefficients (R2) are indicated. From Duret and Galtier (2009).
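To make the definition of GC* concrete, the following sketch assumes the simplest two-state model, in which every AT site mutates to GC at rate u and every GC site mutates to AT at rate v. Under that assumption the equilibrium is GC* = u / (u + v). The function names and rate values are hypothetical, and the substitution matrix used for the figure may be richer than this two-parameter case.

```python
def stationary_gc(u, v):
    """Equilibrium GC-content for a two-state model.

    u: AT->GC substitution rate per AT site (assumed constant)
    v: GC->AT substitution rate per GC site (assumed constant)
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return u / (u + v)


def evolve_gc(gc0, u, v, steps):
    """Iterate the GC-content forward in discrete time under constant rates."""
    gc = gc0
    for _ in range(steps):
        # AT sites gained as GC minus GC sites lost to AT in one step.
        gc = gc + u * (1.0 - gc) - v * gc
    return gc


# Hypothetical rates: GC->AT is twice as frequent as AT->GC.
u, v = 0.001, 0.002
print(stationary_gc(u, v))          # equilibrium value, 1/3
print(evolve_gc(0.6, u, v, 10000))  # converges toward the same value
```

The starting GC-content does not affect the limit, which is why GC* summarizes the substitution pattern rather than the current composition.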

Non-allelic homologous recombination (NAHR) (section I.2.2) also contributes to the GC bias associated with gBGC. In mouse and human, multigene histone families that undergo frequent non-allelic recombination also have a higher GC-content than single-gene families that probably do not experience NAHR (Galtier, 2003). This result also holds for the Hsp70 gene family in human and mouse (Kudla et al., 2004), the multicopy gene HINTW in birds (Backström et al., 2005) and the Bex gene family in mammals (Zhang, 2008). An interesting example of the impact of recombination on nucleotide composition comes from the Fxy gene (Galtier and Duret, 2007) (figure I.18). This gene is located in the X-specific region in human, rat and Mus spretus. In mouse, Mus musculus, however, it was recently (less than 3 million years (Myr) ago) translocated such that it now partially overlaps the pseudoautosomal region (PAR) of chromosome X. The PAR is characterized by a high rate of recombination (Soriano et al., 1987), which has led to a rapid increase in the GC-content of the portion of Fxy that overlaps it. As predicted by the gBGC model, this increase results from a high substitution rate, with all 28 amino acid substitutions in M. musculus being caused by AT→GC nucleotide substitutions.

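[Figure I.18, image not reproduced. Branches are annotated with amino acid and synonymous substitution counts; synonymous counts (5’ end/3’ end): 12/20, 2/0 and 1/163.]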

Figure I.18: Evolutionary history of Fxy in mammals. The Fxy gene, 667 amino acids long, was translocated in M. musculus from an X-linked position to a new position, in which it overlaps the pseudoautosomal boundary (inset boxes). The time scale is given in millions of years. For each branch, the numbers of amino acid changes that have occurred in the 5’ and 3’ ends of the gene, respectively, are given. A strong increase in the amino acid substitution rate occurred in the M. musculus lineage, for the translocated fragment only. For comparison, the estimated numbers of synonymous substitutions in the Rattus, M. spretus and M. musculus branches are 12, 2 and 1 (respectively) for the 5’ end of the gene, and 20, 0 and 163 (respectively) for the 3’ end. From Galtier and Duret (2007).

In agreement with a relation between recombination and nucleotide composition, differences in COR are mirrored by differences in GC-content. The length of chromosomes has been found to explain differences in COR among species (section I.2.4.1). Under the obligate CO condition, each chromosome receives at least one CO per meiosis regardless of its size, so the COR is inversely proportional to the length of the chromosome. In chicken and zebra finch, species with a wide range of chromosome sizes, microchromosomes have high COR and are GC-rich, while longer chromosomes have lower COR and a correspondingly lower GC-content (figure I.17) (Chicken Genome Sequencing Project Consortium, 2004; Groenen et al., 2009; Backström et al., 2010). At the other extreme, the opossum has only 8 very long chromosomes, with a very small COR and low GC-content (Mikkelsen et al., 2007). Heterochiasmy also has an impact on the relation between COR and nucleotide composition. In humans, females have more COs than males; however, the male COR, rather than the female COR, is the better predictor of GC-content (Webster et al., 2005; Duret and Arndt, 2008). This puzzling result will be discussed in light of our results in chapter IV.
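The inverse relation between COR and chromosome length under the obligate CO condition can be illustrated with a short calculation. The sketch below relies on the standard genetic convention, not stated in this section, that one crossover per meiosis corresponds to a genetic map length of 50 cM. The function name and the chromosome lengths are hypothetical.

```python
CM_PER_CROSSOVER = 50.0  # one crossover per meiosis = 50 cM of genetic map


def min_crossover_rate(length_mb, crossovers_per_meiosis=1.0):
    """Crossover rate (cM/Mb) implied by a given number of COs per meiosis.

    With the default of one CO, this is the lower bound set by the
    obligate crossover for a chromosome of the given physical length.
    """
    if length_mb <= 0:
        raise ValueError("chromosome length must be positive")
    return CM_PER_CROSSOVER * crossovers_per_meiosis / length_mb


# Hypothetical chromosome lengths in Mb (illustration only).
for name, length_mb in [("microchromosome", 5.0), ("macrochromosome", 200.0)]:
    rate = min_crossover_rate(length_mb)
    print(f"{name}: {length_mb:.0f} Mb -> at least {rate:.2f} cM/Mb")
```

With these illustrative lengths, the small chromosome has a minimum rate 40 times higher than the large one, which matches the direction of the contrast between microchromosomes and long chromosomes described above.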

Despite this large body of evidence in favor of gBGC, several observations constitute exceptions to the model. COR correlates negatively with GC-content along chromosome 4 in Arabidopsis (Drouaud et al., 2006). A set of Y-linked, non-recombining genes has elevated GC-content (Eyre-Walker and Hurst, 2001). A strong correlation between recombination rate and GC-content is also found in yeast (Marsolier-Kergoat and Yeramian, 2009); in this species, however, the AT→GC substitution pattern is not correlated with recombination. This result has been interpreted as supporting the hypothesis that it is the high GC-content of sequences that promotes recombination, for instance because GC-rich regions may favor a chromatin structure that is open to the recombination machinery (Gerton et al., 2000; Petes, 2001; Blat et al., 2002; Petes and Merker, 2002).
