Abstract: Despite numerous published reports of quantitative trait loci (QTLs) for drought-related traits, practical applications of such QTLs in maize improvement are scarce. Identifying QTLs of sizeable effect that are expressed more or less uniformly in diverse genetic backgrounds across contrasting water regimes can significantly complement conventional drought tolerance breeding efforts. We evaluated three tropical bi-parental populations under water stress (WS) and well-watered (WW) regimes in Mexico, Kenya and Zimbabwe to identify stable genomic regions responsible for grain yield (GY) and anthesis-silking interval (ASI) across multiple environments and diverse genetic backgrounds. Across the three populations, on average, drought stress reduced GY by more than 50% and increased ASI by 3.2 days. We identified a total of 83 and 62 QTLs through individual environment analyses for GY and ASI, respectively. In each population, most QTLs consistently showed up in each water regime. Across the three populations, the phenotypic variance explained by individual QTLs ranged from 2.6 to 17.8% for GY and 1.7 to 17.8% for ASI under WS environments and from 5 to 19.5% for GY under WW environments. Meta-QTL analysis across the three populations and multiple environments identified seven genomic regions for GY and one for ASI, of which six
In the same physical interval as that of the constitutive mQTL_GY_1a (1.05/0.06 at 161.07–183.29 Mb), a number of earlier studies have reported QTL for GY and ASI, implying the significance of this region not only for WS conditions but also for optimal environments. Using RFLP markers in an F3 population of tropical maize, Ribaut et al. (1997) identified a QTL on 1.06 for GY across WW and WS environments. Tuberosa et al. (2002) reported an SSR, csu61b, located between 180.71 and 181.19 Mb on chromosome 1, to be strongly linked with GY and root traits under both stress and optimal water conditions. More recently, Messmer et al. (2009), evaluating the RILs of CML444 × Malawi, identified a cluster of QTL in bin 1.06 related to GY and other yield-contributing traits under drought as well as WW conditions in Mexican and African environments. A stable QTL for GY under WW conditions, based on five Brazilian environments, was detected in the physical interval of 91.46–185.02 Mb on chromosome 1 in yellow tropical maize germplasm (Lima et al. 2006). Similarly, Lu et al. (2010), using an F2:3 population, identified a QTL in bin 1.06 (164.55–195.05 Mb) for GY under WW conditions based on means across seven Asian environments. A recent meta-analysis involving 17 independent QTL mapping studies detected 3 strong genomic regions on chromosomes 1, 7 and 10, of which the mQTL region on chromosome 1 was delimited to the physical interval between 178.87 and 180.72 Mb in bins 1.05 and 1.06 (Li et al. 2010), reinforcing the evidence for the constitutive effect of this genomic region.
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn’s disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease-associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated with height and lipid levels assemble into significantly connected networks, but we do not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance.
Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
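The idea of testing whether a network is "more densely connected than chance expectation" can be illustrated with a minimal sketch. The abstract does not specify which permutation schemes were used, so the code below assumes the simplest possible null: random gene sets of the same size drawn from a background list. The function names, the toy gene labels, and the toy edge list are all hypothetical and serve illustration only.

```python
import random

def count_edges(genes, edges):
    """Count interactions whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def permutation_p_value(associated, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value for the observed edge count among `associated`.

    Null model (an assumption of this sketch): random gene sets of the
    same size sampled without replacement from `all_genes`.
    """
    rng = random.Random(seed)
    observed = count_edges(associated, edges)
    k = len(associated)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, k)
        if count_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy background of 50 hypothetical genes; g0..g4 form a fully connected group.
all_genes = [f"g{i}" for i in range(50)]
clique = all_genes[:5]
edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
edges += [("g10", "g11"), ("g20", "g21"), ("g30", "g31")]

observed, p_value = permutation_p_value(clique, all_genes, edges)
print(observed, p_value)
```

A random-set null ignores the fact that some proteins have many more known interactions than others; a degree-preserving permutation of the network would be a stricter alternative.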
This study provides the first BAC-based structural genomic information on the important cellulolytic fungus T. harzianum. The BAC library provided 12-fold coverage of the T. harzianum genome and permitted the rapid selection of genes and genomic regions associated with biomass conversion. We identified regions with a high concentration of CAZy genes in this fungus that were previously unknown. An analysis based on transcriptome data, together with the information provided by the BAC library, permitted the screening of candidate genes that may be related to the major cellulases and hemicellulases in T. harzianum. Transporter genes and transcription factors could play important roles in the expression of CAZymes, as these genes are frequently located in co-regulated genomic regions, flanking CAZy genes. We also analyzed the expression profiles of major cellulase and hemicellulase genes in this strain, revealing a potential synergistic relationship among these genes under the three different analyzed conditions. To our knowledge, this is the first study focusing on the genomic context of biomass degradation genes from T. harzianum, which is a promising strain for use in
environmental variation. Genotype x Year (GY) interaction is a common feature of quantitative traits, and has been a subject of great concern for breeding programs since it may modify important genetic parameters such as heritability and genetic correlation among traits (Kearsey and Pooni 1996). Through the use of molecular markers, the GY interaction can be further dissected into components of Marker x Year (MY) interaction. MY interaction has great importance in marker-assisted selection since it may distort the expected results of crop genetic improvement (Liu et al. 2006, Emrich et al. 2008, Backes and Østergård 2008). When GY is significant but MY is not, stable genomic regions underlying a given trait are detected, which become highly valuable for breeding programs (Kearsey and Pooni 1996).
al., 2005; Pacheco et al., 2005; Spínola et al., 2005a; Branco et al., 2006, 2008a, 2008b, 2008c) have been undertaken, with the aim of characterizing the genetic pool of the Azoreans. These studies report high genetic variability and heterogeneity in the Azorean population, as explained by the history of the settlement of the islands. Recently, Laan et al. (2005) proposed that the evaluation of DXS1225-DXS8082 haplotype diversity constitutes an efficient marker of population genetic history due to its low recombination rate. Therefore, in order to unravel possible differences between mainland Portuguese and Azoreans that were unobserved in previous works, we re-analysed the data published in Branco et al. (2008a) for the three Azorean geographical groups, as well as for the Azores archipelago and mainland Portugal. In addition, the present research, based on an analysis of Xq13.3, the non-recombining portion of the Y-chromosome (NRY) and HLA regions in São Miguel Islanders, also aims mainly at answering questions such as: What is the allelic distribution of HLA class I and II in this island population, and does it reveal the presence of a genetic structure? Does LD extent vary considerably between these three different genomic regions?
CM in the dog is very similar to a condition in humans called Chiari malformation I (CMI), with a reported frequency of 1 in 1280 [26,27]. Similarly to CM in dogs, a strong association exists between the size of the skull and the development of the disease in humans [28–30]. CMI usually results from a volume discrepancy between the posterior cranial fossa and the neural tissue residing within it, resulting in the displacement of the cerebellar tonsil through the foramen magnum. The etiology of CMI is thought to be multifactorial, involving genetic factors that remain largely undetermined. Recently, a whole genome linkage study conducted in humans affected with CMI identified multiple associated genomic regions, none of which was syntenic to the CM-associated regions detected in the dog in our study. Results from genetic studies of human CMI should be interpreted cautiously as they are complicated by clinical heterogeneity of the disease and its multifactorial etiology. Additional parallel genetic studies in larger cohorts of human patients and in the dog model are needed to further investigate candidate CM regions identified in both species. It is clear from genetic studies in both humans and dogs that CM or CMI has a complex genetic architecture that would
Identifying regions of the genome that have undergone recent positive selection is central to understanding the causes of evolutionary diversification. Nevertheless, developing efficient and statistically robust methods for distinguishing genomic regions under selection from the neutral background expectation remains extremely challenging, particularly under complex, non-equilibrium demographic scenarios. The rapid rise in frequency of a new favorable allele typically leads to reduced diversity in flanking regions, as linked neutral polymorphism accompanies the adaptive substitution in a phenomenon known as genetic hitchhiking. Many methods have been developed to detect such “selective sweep” signatures using genome-wide polymorphism data [2–4]. However, the distortions of the site-frequency spectrum (SFS) and/or extended linkage disequilibrium accompanying episodes of positive selection can be difficult to distinguish from those produced by neutral processes related to a specific demographic history. For example, coalescent trees produced by population bottlenecks or founder events may be indistinguishable from those generated by selection [5,6], and in general, bottlenecks can generate long haplotypes that mimic those observed in selective sweeps. Furthermore, population subdivision can produce counterintuitive and confounding effects [8,9]. Consequently,
We chose to focus on copy number amplifications, and identified 9 genomic regions of common amplification in an initial cohort of 80 urothelial carcinoma specimens, of which 73 gave reliable copy number information by MIP analysis. We then generated a set of MLPA probes to interrogate those nine regions in a validation cohort of 84 samples. We demonstrated that the performance of the MLPA analysis was robust on control blood DNAs, bladder cancer cell line DNAs, and frozen DNA samples from a subset (39) of the cancers initially evaluated by MIP analysis of FFPE DNA. We then performed the MLPA analysis on a separate validation cohort of 84 bladder cancer FFPE DNA samples. In the validation cohort, we found that all genomic regions showed evidence of amplification in two or more samples, with the highest levels of amplification seen for CCND1 and MDM2 (Table 6). The regions with the most frequent amplification were chromosome 1q23.3, E2F3-SOX4, and CCND1 (Table 6). Amplification was seen significantly more frequently in advanced stage tumors (Ta grade 3, or higher stage) than in early stage tumors (Ta grade 1 or 1–2).
In Europe, especially in Mediterranean areas, the sheep has been traditionally exploited as a dual-purpose species, with income from both meat and milk. Modernization of husbandry methods and the establishment of breeding schemes focused on milk production have led to the development of “dairy breeds.” This study investigated selective sweeps specifically related to dairy production in sheep by searching for regions commonly identified in different European dairy breeds. With this aim, genotypes from 44,545 SNP markers covering the sheep autosomes were analysed in both European dairy and non-dairy sheep breeds using two approaches: (i) identification of genomic regions showing extreme genetic differentiation between each dairy breed and a closely related non-dairy breed, and (ii) identification of regions with reduced variation (heterozygosity) in the dairy breeds using two methods. Regions detected in at least two breeds (breed pairs) by the two approaches (genetic differentiation and at least one of the heterozygosity-based analyses) were labeled as core candidate convergence regions and further investigated for candidate genes. Following this approach, six regions were detected. For some of them, strong candidate genes have been proposed (e.g. ABCG2, SPP1), whereas some other genes designated as candidates based on their association with sheep and cattle dairy traits (e.g. LALBA, DGAT1A) were not associated with a detectable sweep signal. Few of the identified regions were coincident with QTL previously reported in sheep, although many of them corresponded to orthologous regions in cattle where QTL for dairy traits have been identified. Due to the limited number of QTL studies reported in sheep compared with cattle, the results illustrate the potential value of selection mapping to identify genomic regions associated with dairy traits in sheep.
Models of evolution by genome rearrangements are prone to two types of flaws: One is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call “solid” the regions that are improbably broken by rearrangements and “fragile” the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.
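The contrast between the uniform and pseudouniform models described above can be made concrete with a short sketch. It shows only the two special cases named in the abstract (break probability proportional to fragile region length versus equal probability for every fragile region), not the full model with probabilities varying over time. The function names and the region lengths are hypothetical.

```python
def uniform_break_probs(lengths):
    """Uniform model on the nucleotide sequence: probability proportional to length."""
    total = sum(lengths)
    return [length / total for length in lengths]

def pseudouniform_break_probs(lengths):
    """Pseudouniform model: every fragile region is equally likely to break."""
    n = len(lengths)
    return [1 / n for _ in lengths]

# Hypothetical fragile region lengths, in bp.
fragile_lengths = [100, 300, 600]

uniform = uniform_break_probs(fragile_lengths)
pseudo = pseudouniform_break_probs(fragile_lengths)
print(uniform)  # the longest region receives the largest share
print(pseudo)   # all regions receive the same share
```

When region lengths differ widely, the two models assign very different probabilities to the same region, which is one way to see why distance estimates built on one model can fail on data simulated under the other.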
The majority of ZIKV sequences available were obtained using NGS (Barjas-Castro et al. 2016, Leguia et al. 2017, Metsky et al. 2017), and this methodology provides valuable information on viral diversity, being pivotal in the analysis of viral quasispecies (van Boheemen et al. 2017). However, this tool may be cost-effective in specialised core laboratories working with high-quality samples and bioinformatics support, a situation not commonly available in clinical and public health laboratories, especially in resource-constrained settings. The use of cell culture isolates obtained from small serum samples and the nested RT-PCR followed by Sanger sequencing presented here was a suitable low-cost methodology to sequence relevant regions of the ZIKV genome. Moreover, in the context of outbreaks, where high numbers of samples need to be processed quickly and accurately, these types of tailored strategies can significantly impact operations (Leguia et al. 2017).
The significant rise in drug-resistant strains of Mycobacterium tuberculosis has highlighted the need for new drug targets. Here, we present a novel method of defining genetic elements required for optimal growth, a key first step for identifying potential drug targets. Similar strategies in other bacterial pathogens have traditionally defined a set of essential protein-coding genes. Bacterial genomes, however, contain many other genetic elements, such as small RNAs and non-coding regulatory sequences. Protein-coding genes themselves also often encode more than one functional element, as in the case of multi-domain genes. Therefore, instead of assessing the quantitative requirement of whole genes, we parsed the genome into comprehensive sets of overlapping windows, unbiased by annotation, and scanned the entire genome for regions required for optimal growth. These required regions include whole genes, as expected; but we also discovered genes that contained both required and non-required domains, as well as non-protein-coding RNAs required for optimal growth. By expanding our search for required genetic elements, we show that Mycobacterium tuberculosis has a complex genome and discover potential drug targets beyond the more limited set of essential genes.
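Parsing a genome into overlapping windows independent of annotation can be sketched as follows. The abstract does not give window sizes or step lengths, so the values here are placeholders, and the function name is hypothetical; the sketch only shows how a sequence of a given length can be tiled so that every position is covered by at least one window.

```python
def overlapping_windows(genome_length, window, step):
    """Yield half-open (start, end) intervals tiling [0, genome_length).

    Consecutive windows overlap whenever step < window. The final
    window is clipped to the end of the genome.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < genome_length:
        end = min(start + window, genome_length)
        yield (start, end)
        if end == genome_length:
            break
        start += step

# Placeholder values: a 10 bp "genome", 4 bp windows, 2 bp step.
windows = list(overlapping_windows(10, window=4, step=2))
print(windows)
```

Running several window sizes over the same genome gives sets of windows at different resolutions, which is one way to detect required elements smaller or larger than annotated genes.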
Background. MicroRNAs (miRNAs) are short non-coding RNAs that regulate differentiation and development in many organisms and play an important role in cancer. Methodology/Principal Findings. Using a public database of mapped retroviral insertion sites from various mouse models of cancer we demonstrate that MLV-derived retroviral inserts are enriched in close proximity to mouse miRNA loci. Clustered inserts from cancer-associated regions (Common Integration Sites, CIS) have a higher association with miRNAs than non-clustered inserts. Ten CIS-associated miRNA loci containing 22 miRNAs are located within 10 kb of known CIS insertions. Only one CIS-associated miRNA locus overlaps a RefSeq protein-coding gene and six loci are located more than 10 kb from any RefSeq gene. CIS-associated miRNAs on average are more conserved in vertebrates than miRNAs associated with non-CIS inserts and their human homologs are also located in regions perturbed in cancer. In addition we show that miRNA genes are enriched around promoter and/or terminator regions of RefSeq genes in both mouse and human. Conclusions/Significance. We provide a list of ten miRNA loci potentially involved in the development of blood cancer or brain tumors. There is independent experimental support from other studies for the involvement of miRNAs from at least three CIS-associated miRNA loci in cancer development.
(light blue); PROT = protease (green); RT = reverse transcriptase (blue); RH = ribonuclease H (yellow); INT = integrase (purple). Numbers on the right are element lengths (in bp).
Figure 5: Schematic representation of some TE clusters (arrowed) found in four genomic regions of Passiflora edulis. Numbers on the right are genomic region lengths (in bp). Colors indicate different orders, as follows: red (LTR-RT), blue (LINE), orange (DIRS), and green (LARD).
Figure 6: LTR-RT elements of Passiflora edulis: a) Percentage of elements from Copia and
The T. equigenitalis genome contains six strain-specific regions (Regions 10–15; Figure 4 and Table S1) containing 91 unique CDSs; another 106 T. equigenitalis-specific CDSs are distributed over the T. equigenitalis genome (Table S1). Region 10 encodes five proteins, two of which are annotated as hemagglutinin, able to induce the agglutination of erythrocytes and thus to be involved in virulence in the phylogenetically related Bordetella genus. This region has therefore been classified as a hemagglutinin-related region (Table 2). Region 11 encodes six hypothetical proteins and five proteins potentially involved in transmembrane transport, including three ABC transporter-related proteins. It was thus classified as an ABC transporter-related region. Region 12 contains three putative efflux system transmembrane RND (Resistance-Nodulation-cell Division) proteins, previously determined as being involved in virulence and resistance to antimicrobial compounds. Region 13, composed of four hypothetical proteins and a protein containing a relaxase domain (pfam03432) potentially involved in the horizontal transfer of genetic information, has been classified as a region of unknown function. Region 14 is the longest T. equigenitalis-specific region, with 57 specific CDSs. It encodes a type IV secretion system (T4SS). These systems are membrane-associated transporter complexes used by various bacteria to deliver substrate molecules to a wide range of target
Figure 2. Venn diagram illustrating the number of putative
compelling is the break-up of co-adapted gene complexes. It is possible that in the first generation of hybrids, no problems arise because they all have the necessary complement of genes, but in the F2s the genes will have re-segregated and the problems begin. I used to suppose that finding the right balance between inbreeding and outbreeding, minimising the costs of both, is best achieved by careful mate choice. However, genomic imprinting achieves the goal of minimising at least one of the costs of outbreeding by ensuring that all the genes that are required for building an intricate structure such as the brain come from one parent. The more that this happens, the more it will add evolutionary momentum to imprint the important genes needed for that structure from the parent that had already acquired dominance. The linking process could involve the genes of either parent so long as those that work together are held together. The effects are likely to be especially great in the case of genes that are regulatory and expressed early in development. The young organism is particularly likely to be disrupted by the lack of co-ordination between regulatory genes, just as it is especially likely to be disrupted by environmental agents. The idea is, then, that genomic imprinting reduces the costs of outbreeding. Whether or not this conjecture is correct, the ways in which genomic imprinting has been co-opted for specific uses will attract theorists for years to come.
Polymerase chain reactions were carried out in a final volume of 20 μL containing 15 ng genomic DNA, 0.2 mM dNTPs, 1.5 mM MgCl2, 0.5 μM of each primer, 1X Taq buffer (Invitrogen, California, USA), and 1 U of Taq recombinant polymerase (Invitrogen). Samples were subjected to the following thermal profile: 5 min denaturing at 94 °C and five cycles of three steps: 1 min denaturing at 94 °C, 1 min annealing at 35 °C, and 1 min elongation at 72 °C; for the following 35 cycles, annealing temperature was elevated to 50 °C, with a final elongation step of 10 min at 72 °C.
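As a worked check of the thermal profile, the sketch below encodes the steps as data and sums the programmed hold times. It assumes that the 35 later cycles keep the same 1 min step durations and the same denaturing and elongation temperatures as the first five, changing only the annealing temperature; the text states the change in annealing temperature but not the other values explicitly. Ramp times between temperatures are not included. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float

def cycle(anneal_temp_c):
    """One three-step cycle; only the annealing temperature varies."""
    return [
        Step("denature", 94, 1),
        Step("anneal", anneal_temp_c, 1),
        Step("elongate", 72, 1),
    ]

def build_profile():
    profile = [Step("initial denature", 94, 5)]
    profile += cycle(35) * 5    # five low-stringency cycles
    profile += cycle(50) * 35   # assumed: same step times, annealing at 50 C
    profile.append(Step("final elongation", 72, 10))
    return profile

def total_minutes(profile):
    """Sum of programmed hold times, excluding ramping."""
    return sum(step.minutes for step in profile)

profile = build_profile()
print(len(profile), total_minutes(profile))  # 122 steps, 135 minutes of hold time
```

Under these assumptions the programmed hold time is 5 + (5 × 3) + (35 × 3) + 10 = 135 min.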
Besides the analysis of the candidate genes in the 7q33 affected region, it is also important to take into account the patients described in the DECIPHER database [19], with deletions and duplications that partially overlap the 7q33 affected region, summarized in Tables 4 and 5. Regarding the deletions, there are two patients (DECIPHER 280233 and 331287) with small inherited deletions affecting only the EXOC4 gene. Even though the submitters classified the deletion in patient 331287 as likely pathogenic, the phenotypic description of the transmitting progenitor is not provided. Additionally, we became aware of the existence of at least two more patients (unrelated, one with speech delay and the other with ID and hypotonia) carrying small deletions affecting only the EXOC4 gene that are inherited from presumably healthy parents (personal communication by Audrey Briand-Suleau, Cochin Hospital, Paris, France). Concerning the duplications, there are two DECIPHER patients (255520 and 251768) carrying duplications affecting EXOC4, inherited from normal parents. As mentioned before, in these cases, it is important to determine if the duplicated region is located in tandem or not, in order to fully understand the impact of the duplication on the expression of the contained genes. For this reason, the inherited duplications in DECIPHER cases 255520 and 251768 must be interpreted with caution. In the literature, there are few reports of duplications affecting the 7q33 cytoband [2, 13]. Although their size is significantly larger than that of the duplication in patients 5, 6, and 7, the patients with duplications in this region reported by Malmgren and colleagues appear to have a milder phenotype than those with the corresponding deletion. As for the report of Bartsch and colleagues, both reported patients have a very severe presentation, which might be due to the duplicated region being very large, encompassing the entire genomic region from 7q33 to the telomere.
The difference in size makes the cases reported in these two publications very difficult to compare with patients 5, 6, and 7.
The data deluge phenomenon is becoming a serious problem in most genomics centers, as can be seen from the growing number of fully sequenced and re-sequenced genomes from large-scale projects such as the 1000 Genomes Project 2, The Cancer Genome Atlas 3, The 10k Genomes 4, among many others. Moreover, the prizes that reward cheaper, faster, less error-prone and higher-throughput sequencing methodologies 5 help to intensify this scenario. To alleviate it, general-purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce the data as much as possible, for example for medium- and long-term storage. In fact, for several genomic sequences they attain results worse than 2 bits per base. To face this, a competition has been proposed for the achievement of better compression algorithms 6. A number of algorithms have been proposed for the compression of genomics data, but
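The 2 bits per base figure mentioned above is the cost of the trivial fixed-length encoding of a four-letter alphabet, which is why it serves as a natural baseline. The sketch below, an illustration rather than any tool referenced in the text, packs a sequence over A, C, G, T into bytes at four bases per byte. The function names are hypothetical, and the sketch assumes the input contains only those four uppercase symbols (no N or other ambiguity codes).

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Pack a string over A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads it first.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack(data, n_bases):
    """Inverse of pack; n_bases is needed to discard the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])

seq = "ACGTAC"
packed = pack(seq)
print(len(packed), unpack(packed, len(seq)))
```

A compressor that needs more than 2 bits per base on such data is doing worse than this fixed encoding, so useful genomic compressors must exploit repetition and statistical structure to go below it.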