Two “non-contiguous finished”, mercury-methylating, sulfate-reducing bacteria were used to develop the method described herein for finishing and closing genomes. The model organism for mercury (Hg) methylation, Desulfovibrio desulfuricans strain ND132, was isolated from the Chesapeake Bay and generates high levels of toxic methylmercury (MeHg) [16,17]. The ND132 genome was sequenced by the DOE Joint Genome Institute (JGI) to support a comparative genomics approach to identifying genes responsible for Hg methylation [18,19]. Sequence was generated using Illumina and Titanium 454 DNA technologies [20,21], which provided 122× and 40× coverage, respectively. Sequencing was followed by extensive standard and “high GC” finishing that produced a genome consisting of one scaffold and six contiguous segments (GenBank CP003220.1). The genome of D. africanus was sequenced with similar goals, and similar effort resulted in one scaffold and one contiguous segment separated by a single gap (GenBank CP003221.1). A process is presented herein to resolve such recalcitrant regions. The ability to sequence all nucleotides within genomes eliminates gaps that could contain open reading frames, ensuring accurate genome annotation. Faster and less laborious procedures for complete genome closure will allow for more confident comparative and functional genomic studies and can benefit many groups facing similar situations.
Many organisms present in vertebrate microbiomes are not culturable; therefore, high-throughput sequencing of 16S ribosomal RNA gene amplicons is often used for characterization of natural communities. However, 16S ribosomal RNA gene sequencing only provides information about identity and abundance of community members without considering the substantial additional information available in the bacterial genomes. Additionally, 16S rRNA gene methods cannot be used to simultaneously detect both bacteria and the non-bacterial
Figure 8. MBD-Fc separates human DNA from human-malarial DNA mixtures. (A) Graph of the percentage of 75 bp Illumina reads mapping to the Plasmodium falciparum reference sequence in the mixture before enrichment (Unenriched), after enrichment (Enriched) and the bound fraction following wash and elution as described above (Bound). (B) Evenness of coverage analysis metrics show enriched malaria reads (Enriched Plasmodium) in line with pure malaria DNA reads (Plasmodium Only). Unlike the unenriched input sample (Human + Plasmodium) showing 60% of the Plasmodium genome uncovered (zero depth), enriched sample (Enriched Plasmodium) showed even coverage of the genome with no regions lacking coverage. The amount of Plasmodium DNA retained in the pellet (Bound Human) following wash and elution was insignificant. (C) GC-content and bias analysis. No base bias was detected in the enriched sample. Average GC content of the enriched sample (Enriched Plasmodium) matches the pure malaria sample (Plasmodium Only) and is very close to the theoretical GC coverage (Theoretical).
Infectious disease outbreaks often involve isolation of the causative agent in multiple laboratories within a country or even in multiple countries. Early detection of outbreaks thus often requires rapid comparison of data from different laboratories. Next-generation sequencing shows great promise for improving the routine characterization of infectious disease agents in microbial laboratories, and sequencing data are attractive because they provide both high resolution and a standardized data format (the DNA sequence) that may be exchanged and compared between laboratories and over time. A number of different sequencing technologies are, however, available, and more are expected to become available in the future. Thus, systematic biases in SNP calling between platforms may be a problem, especially when, as is often the case in outbreak detection, it is necessary to identify clusters within highly similar strains.
DNA samples for metagenomics were prepared for 150 bp and 100 bp single-end sequencing using the Illumina GAIIx and HiSeq 2000 instruments (Illumina, San Diego, CA), respectively. Numerically coded aliquots of approximately 0.5–1 mg DNA per sample were used to create sequencing libraries. First, genomic DNA was fragmented using a Covaris™ S220 Sonicator (Covaris, Inc., Woburn, MA) to approximately 300 base pairs (bp). Fragmented DNA was used to synthesize indexed sequencing libraries using the TruSeq DNA Sample Prep Kit V2 (Illumina, Inc., San Diego, CA), according to the manufacturer’s recommended protocol. Cluster generation was performed on the cBot using the TruSeq PE Cluster Kit v3 – cBot – HS (Illumina). Libraries were sequenced with an Illumina HiSeq 2000 at the Nationwide Children’s Hospital (NCH) Biomedical Genomics Core (Columbus, Ohio) using the TruSeq SBS Kit v3 reagents (Illumina) for paired-end sequencing with read lengths of 100 bp (200 cycles) and at CosmosID with an Illumina GAIIx for 150 bp single reads using the TruSeq SBS Kit v5 reagents (Illumina). Primary analysis (image analysis and basecalling) was performed using HiSeq Control Software (HCS) version 18.104.22.168 and Real Time Analysis (RTA) version 1.13.48. Secondary analysis (demultiplexing) was performed using Illumina CASAVA Software v1.6. Post-processing of GAIIx reads was performed with RTA/SCS v22.214.171.124 and CASAVA 1.8.0 software. High-throughput sequencing reads were quality filtered using the fastq_quality_filter program provided with the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html) (v. 0.0.13). Only those reads with a quality score ≥17 for at least 80% of the read length (i.e., probability of correct base call ~98%) were retained. Ion Torrent (Life Technologies, NY) sequencing was also performed using amplicons specific to the V4 region of the 16S rRNA gene. Sequence reads are available under NCBI BioProject ID PRJNA231652.
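The read-retention criterion described above (a quality score of at least 17 over at least 80% of the read length) can be illustrated with a minimal Python sketch. This is not the FASTX-Toolkit implementation; the function names and the Phred+33 quality encoding are assumptions made for illustration only.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, min_quality=17, min_percent=80, offset=33):
    """Return True if at least min_percent of bases have quality >= min_quality."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

Under the Phred definition, a score of 17 corresponds to an error probability of about 0.02, which is where the roughly 98% probability of a correct base call comes from.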
The KEGG analysis indicates that a large number of differentially expressed genes participate in metabolic pathways, especially in lipid and xenobiotic/drug metabolism (Fig. 1C and S2 Table). Genes that target peroxisome proliferator-activated receptor (PPAR), proliferation and apoptosis signaling pathways were also significantly up-regulated (Fig. 1C and S2 Table). Furthermore, the mRNA expression levels of Rad51b, Rad51 and Rad51ap1 were significantly increased, indicating disturbances in homologous recombination (HR) in the liver of mice exposed to TCE (S2 Table). Among genes involved in the regulation of DNA methylation, only Uhrf1, which serves as a fidelity factor for the maintenance of the DNA methylation pattern, showed a significant expression change of more than two-fold (S2 Table). Three key genes responsible for DNA methylation or demethylation (Tet2, Dnmt3a and Dnmt3b) showed significant expression changes of less than two-fold (S2 Table).
Internal transcribed spacer (ITS) refers to the spacer DNA (non-coding DNA) situated between the small-subunit ribosomal RNA (rRNA) and large-subunit rRNA genes in the chromosome (Coleman, 2007). There are two ITSs in eukaryotes: ITS1 is located between the 18S and 5.8S rRNA genes, while ITS2 is between the 5.8S and 25S (in plants) rRNA genes.
400 µM of dGTP, 400 µM dATP, 400 µM dCTP, 800 µM dUTP, uracil DNA glycosylase (UDG)], and 5.5 µL of RNase-free distilled water (RNAse/DNAse free water system, Merck-Millipore, Darmstadt, Germany) were used. The qPCR reactions were conducted in an iQ5™ thermal cycler (Bio-Rad, Hercules, USA). Each reaction consisted of a denaturation cycle at 95 °C for 10 min, followed by 40 cycles composed of one step at 95 °C for 20 s and a combined annealing/extension step at 55 °C for one minute. Afterwards, a melting curve was built to check the specificity of the amplification products. For generating standard curves, 10-fold serial dilutions of standard controls from 10^-1 to 10^-5 were
Cases were selected from the 1,001 BP cases and 1,033 controls of European-American descent genotyped through the GAIN consortium by the Bipolar Genome Study (BiGS). All cases were interviewed with the Diagnostic Interview for Genetic Studies (DIGS) and best-estimate diagnoses were made by two research psychiatrists or PhD psychologists. Among the BP cases, we initially selected for sequencing the 189 subjects from the GAIN BP sample who had a lifetime history of mood-incongruent psychosis as previously defined. Briefly, subjects were classified as cases with mood-incongruent psychotic bipolar disorder if they had a lifetime history of running commentary auditory hallucinations, or passivity delusions such as delusions of being controlled, or delusions of thought insertion, withdrawal, or broadcasting. Subjects were also included if their psychotic symptoms during their most severe depression or mania were judged by the interviewer to be “inconsistent” with typical depressive or manic themes. Of the 189 subjects, one subject was sequenced in duplicate, and DNA for one subject was unavailable, leading to a final count of 188 subjects sequenced across the ERBB4 gene.
were selected for complete sequencing. After checking the quality of the sequences, 12 ambiguous sequences were removed from the final analysis. 641 complete mtDNA sequences were included in the final analysis for the present study and the results of 97 sequences published elsewhere [24,43,75]. Complete sequencing was done using 24 pairs of forward and reverse primers. Sequences were assembled and edited using SeqScape 2.5. Mutations were scored relative to the revised Cambridge Reference Sequence (rCRS). Deviations from the rCRS were confirmed by manual checking of their electropherograms. Phylogenetic relationships among the sequences were determined by median-joining network analysis with the help of Networking 4.1 software. Most parsimonious trees of the mtDNA haplogroups were reconstructed manually following a parsimony approach, and confirmed by the program Networking 4.1. The founder ages and time of TMRCA have been calculated as implemented in . The age of the founder mtDNA type yields a time estimate for its arrival in the continent. It includes the ancestral nodes that were shared by its variants in the tree. The ages of haplogroups M are estimated from 736 lineages based on a mutation rate of 1.26 ± 0.08 × 10^-8. The ages were also calculated manually with the Rho estimate, using a substitution rate for protein-coding synonymous changes of 3.5 × 10^-8. The variance of Rho was estimated for both methods. Nevertheless, because all ages were calculated without evidence to sustain the assumption of a molecular clock, the associated error values are only approximations. AMOVA was performed to evaluate the amount of genetic structure among the tribal population using Arlequin ver. 3.11.
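The Rho-based dating mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' procedure: the standard error uses the star-like phylogeny approximation sqrt(rho / n), the sequence length of 15,447 sites in the usage note is an assumed coding-region length, and all function names and mutation counts are hypothetical.

```python
import math


def rho_statistic(mutations_from_founder):
    """Rho is the mean number of mutations separating each lineage from the founder.

    The standard error uses the star-like approximation sqrt(rho / n),
    which is a simplifying assumption for this sketch.
    """
    n = len(mutations_from_founder)
    rho = sum(mutations_from_founder) / n
    return rho, math.sqrt(rho / n)


def years_per_mutation(rate_per_site_per_year, n_sites):
    """Expected time for one mutation to accumulate over n_sites."""
    return 1.0 / (rate_per_site_per_year * n_sites)


def founder_age(mutations_from_founder, rate_per_site_per_year, n_sites):
    """Convert rho and its standard error into years."""
    rho, se = rho_statistic(mutations_from_founder)
    scale = years_per_mutation(rate_per_site_per_year, n_sites)
    return rho * scale, se * scale
```

For example, `founder_age([1, 2, 3, 2], 1.26e-8, 15447)` scales a Rho of 2.0 by the assumed number of years per mutation; the input counts here are made up purely to show the call.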
Clinical samples were taken from teeth with pulp necrosis that were diagnosed by clinical and radiographic analyses, in addition to pulp sensibility tests. None of the selected patients presented acute periapical symptoms at the time of the appointment. Teeth were isolated using a rubber dam followed by complete asepsis, as previously described (Brito et al., 2007). Cleaning and shaping of the root canals were completed using ProTaper NiTi files (Dentsply Maillefer) in conjunction with 5.2% sodium hypochlorite. The clinical procedures were performed as follows. Briefly, the samples were collected immediately after root canal cleaning to characterize the mRNA expression profiles of cytokines, chemokines and T cell genes. After cleaning and drying, three paper points were introduced into the root canal, passing through the root apex (2 mm) for one minute. After withdrawal, the paper points were cut 4 mm from the tip and dropped into a microcentrifuge tube, and the samples were stored at -70 °C. Using this procedure, RNA was extracted from the periapical interstitial fluid. No endodontic dressing was inserted into the root canals. The coronal accesses of the teeth were restored with eugenol-based cement. Seven days later (Day 7), the teeth were opened and sampled to characterize the cytokine and T cell expression during the healing process (restrained root canal bacterial load). In teeth with multiple canals, the first (Day 0) and second (Day 7) samples were collected from the same canal. At this time, no teeth presented clinical signs or symptoms, and the root canals were filled by the lateral condensation technique.
conventional biological treatments. On the other hand, considerable attention has been paid to the development of new strategies for biological treatment, such as anaerobic reactors aimed at converting organic matter into biogas (Kennedy et al., 1991). For instance, the anaerobic batch reactor is simple to operate with respect to instrumentation, and it is also an important tool for fundamental research focusing on several important aspects of anaerobic biodegradation (Zaiat et al., 2001). Biomass retention within such a reactor results from the growth of microorganisms attached to a support material or from self-immobilization processes such as granules that remain in the interstices of the polyurethane matrices through the formation of biofilm (Varesche et al., 1997). It has been reported that the immobilization process enhances the thermostability of the system by providing an extra biomass buffer against a sudden change in temperature (van Lier, 1996). The largest problem regarding the applicability of thermophilic reactors is the availability of thermophilic anaerobic sludge for the reactor’s inoculation and start-up procedure (Ahring, 1994). Therefore, the choice of a mesophilic sludge with high microbial diversity is crucial to ensure the development of a thermophilic biomass. Chen (1983) evaluated the metabolic adaptation of mesophilic anaerobic sludge to thermophilic temperatures and found that only 9% of the total bacterial population in the mesophilic reactor were thermophiles and 1% were obligate thermophiles.
Our study presents several limitations. First, despite using several DNA extraction methods in parallel to detect as many microbial groups as possible, we cannot exclude biases in the proportions of microbiota members due to varying efficiencies of DNA extraction. This issue will require the analysis of samples spiked with a precise amount of a given species and/or the analysis of large numbers of samples known to contain a given amount of bacteria and fungi, as determined by another reference method, such as CFU determination. Such experiments will help to assess extraction efficiency for different microorganisms, and to improve the procedure. Second, uneven sequencing coverage resulting from varying GC content may have led to incorrect estimation of the abundance of some members of the microbiota, although the Illumina technology has been reported to be among the less biased in this respect. Third, the shortness of the reads may have caused some imprecise classifications, which may explain differences in the proportions of some genera between the three classification methods, e.g. Neisseria spp. and Haemophilus spp. in sputa 1 and 2, respectively. Fourth, our WGS procedure also sequenced DNA from dead cells, which may mask variations in microbiota composition. This can be avoided by the use of propidium monoazide (PMA), which only destroys the DNA of dead cells because this toxic compound is excluded from viable ones. Comparison with and without PMA treatment might prove useful in the future to understand CF microbiota evolution. Another limitation is that we did not look for reads from DNA viruses. Viruses may play an important role in CF lung disease, and we plan to analyse them in the future. Finally, the use of spontaneous sputa rather than bronchoalveolar lavage (BAL) might constitute a limitation because of possible contamination of sputa by upper-airway flora.
There is an ongoing debate with conflicting results about the sample to use in order to obtain the best representation of lung microbiota: spontaneous sputum, induced sputum, throat swabs, and/or BAL. BAL presents the major problem in that it generally cannot sample all the different
The ultimate goal of genome and metagenome analysis is the biological interpretation of the genome sequences in terms of biochemical capabilities of organisms and their niche-specific adaptations, including generation of testable hypotheses about their physiological characteristics. This process entails associating genes with functional roles which describe their enzymatic activities, involvement in various macromolecular interactions and regulatory processes. These functional roles are interpreted in relation to the functional context in which these genes operate and which can be represented by pathways, ontologies or other types of functional classifications. Several genome analysis resources, such as SEED, MicrobesOnline, PATRIC, KEGG, and MetaCyc, support biological interpretation of microbial genomes and/or metagenomes by integrating diverse data ranging from nucleotide and protein sequences to various catalogs of protein families and functional roles, to databases of chemical compounds and reactions. Most of these resources maintain computational pipelines that assign functional roles to genes and infer the presence of reactions and pathways. Some resources, such as MicroScope, also support manual data curation by domain experts. Due to the diversity of data models, annotation procedures and curation techniques employed by these resources, the results of analyzing the same genome or metagenome may vary greatly between resources. Finding an explanation for these discrepancies often represents a challenge for scientists due to the use of resource-specific object identifiers (e.g., resource-specific
Human cytomegalovirus (HCMV) is a ubiquitous virus that can cause serious sequelae in immunocompromised patients and in the developing fetus. The coding capacity of the 235 kbp genome is still incompletely understood, and there is a pressing need to characterize genomic contents in clinical isolates. In this study, a procedure for the high-throughput generation of full genome consensus sequences from clinical HCMV isolates is presented. This method relies on low-number passaging of clinical isolates on human fibroblasts, followed by digestion of cellular DNA and purification of viral DNA. After multiple displacement amplification, highly pure viral DNA is generated. These extracts are suitable for high-throughput next-generation sequencing and assembly of consensus sequences. Throughout a series of validation experiments, we showed that the workflow reproducibly generated consensus sequences representative of the virus population present in the original clinical material. Additionally, the performance of 454 GS FLX and/or Illumina Genome Analyzer datasets in consensus sequence deduction was evaluated. Based on assembly performance data, the Illumina Genome Analyzer was the platform of choice in the presented workflow. Analysis of the consensus sequences derived in this study confirmed the presence of gene-disrupting mutations in clinical HCMV isolates independent of in vitro passaging. These mutations were identified in genes RL5A, UL1, UL9, UL111A and UL150. In conclusion, the presented workflow provides opportunities for high-throughput characterization of complete HCMV genomes that could deliver new insights into HCMV coding capacity and genetic determinants of viral tropism and pathogenicity.
A large variety of organisms are capable of synthesizing hard matter in a process called biomineralization. The transformation of a genetic blueprint into minerals such as, for example, calcium phosphate in bones and calcium carbonate in eggs or seashells provides a mechanical support for organismic growth and protection against predators, respectively. Iron oxides formed by fishes and birds provide them with magnetic properties used for magnetoreception and orientation [2,3]. The biomineralization processes are remarkable for numerous reasons: organisms, contrary to engineers, have to form these biological materials with a limited subset of biologically available chemical elements and at physiological conditions. Still, these reduced means are not at the detriment of their function, which often surpasses man-made materials based on equivalent elements. Therefore, understanding how biomineralizing organisms process chemical elements based on their genetic program is of primary interest. However, the biological mechanisms behind biomineralization have remained unclear, partly because of limited genetic knowledge: model organisms are limited to a few unicellular organisms [5,6]. Therefore, the question has arisen of what genetic approach to use to get genetic information about the large majority of organisms that have remained intractable.
The Jinhua and Pietrain breeds are significantly different in carcass and meat quality phenotypes. As a Chinese domesticated breed, the Jinhua pig has a reputation for its thin skin, fine bones and tender meat, but has higher BFT and IMF content than other breeds. In contrast, the Pietrain type, which has a relatively lower BFT and IMF content, is famous for its high quality and high proportion of lean meat vs. fat content, and has been used worldwide as a sire breed. Observed differences in carcass and meat quality phenotypes are probably related to protein accretion and lipid synthesis (Renaudeau and Mourot, 2007). In this regard, it has been suggested that certain genes may play important physiological roles in fat-related pathways (Agarwal and Garg, 2006). Therefore, we hypothesized that fat-related genes that are at considerable linkage disequilibrium may produce distinct phenotypes in a Pietrain x Jinhua F2 (PJF2) population. In this study, we found that genotype and allele frequencies for two analyzed polymorphisms differed between the Chinese native pig and the European pig, indicating that the two types of breeds had been subject to different genetic selection regimes.
We also investigated intraspecific rDNA ITS polymorphisms of isolates from 17 Candida species and 1 non-Candida species (Lodderomyces elongisporus). Some authors have addressed the degree of variability within the rDNA ITS region among fungi and have also observed higher intraspecific diversity within the ITS1 region than within the ITS2 region [50,51]. Based on haplotype and network analysis, a high intraspecific variability of ITS sequences was observed in species that primarily represent endogenous sources of infection, including C. albicans, C. tropicalis, C. glabrata and C. lusitaniae. Species with high genetic diversity are most frequently human commensals, and this finding could explain the existence of additional genetic adaptation within normal microbiota with older evolutionary origins. Although the natural mode of reproduction of C. albicans is known to be clonal, other mechanisms that increase the genetic variability of this species could occur, including recombination [52,53]. Pfaller et al. have tested 47 C. lusitaniae isolates, obtaining 28 different karyotype profiles and 25 different types of restriction endonuclease analysis of genomic DNA (REAG) profiles. Our data confirm the great diversity among C. lusitaniae haplotypes, possibly indicating the existence of a non-clonal form of propagation. In contrast, less intraspecific variation was observed in the isolates of C. parapsilosis and C. orthopsilosis, in which the primary mode of infection is thought to be exogenous. Previous studies have shown a lower sequence variability of C. parapsilosis (sensu stricto) compared with C. orthopsilosis and C. metapsilosis isolates [56,57].
A number of key results of the study were derived from our automated pipeline. Correlation analysis against clinical annotation led to several significant findings. The mutation rate in each melanoma sample was determined and found to vary widely between tumours: melanomas arising in severely sun damaged skin have significantly higher mutation loads than non-severely sun damaged melanomas. BRAF/NRAS wild-type tumours were also found to have a higher average mutation rate compared to BRAF/NRAS mutant tumours. Furthermore, transition/transversion analysis led to a novel finding that tandem CC>TT/GG>AA mutations (UV damage signature) were more common in tumours arising in severely sun damaged skin and in BRAF/NRAS wild-type tumours. Pathway analysis suggested that potentially actionable mutations in wild-type tumours, including NF1, KIT and NOTCH1, were spread over various signalling pathways. Importantly, TREVA has been successful in the molecular subtyping of melanomas, which may direct novel therapeutic options for BRAF/NRAS wild-type patients.
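The transition/transversion analysis and the tandem CC to TT (and complementary GG to AA) signature referred to above can be illustrated with a minimal Python sketch. It is not the TREVA pipeline; the function names and the input representation (reference and alternate alleles as plain strings) are assumptions for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def is_uv_tandem(ref, alt):
    """True for tandem CC>TT or its reverse-strand equivalent GG>AA."""
    return (ref.upper(), alt.upper()) in {("CC", "TT"), ("GG", "AA")}


def ti_tv_ratio(substitutions):
    """Ratio of transitions to transversions; None if there are no transversions."""
    kinds = [classify_substitution(ref, alt) for ref, alt in substitutions]
    transversions = kinds.count("transversion")
    if transversions == 0:
        return None
    return kinds.count("transition") / transversions
```

Comparing the fraction of dinucleotide variants for which `is_uv_tandem` is true between tumour groups would be one simple way to quantify the signature described in the text.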
In this study, 68 rare exonic variants in 68 genes were identified. Of these genes, one gene (TMEM132B) was significantly differentially expressed in IA versus control tissue. Further studies are needed to confirm and explore the TMEM132B variant, as well as the possible contribution of the other 67 variants. Replication and/or meta-analysis with similar sequencing studies using larger sample sizes could be used to gather further evidence for specific genes on this list. Additionally, a subset of these variants, which can be prioritized through any of the methods discussed in this study, could be explored through functional studies in models where vascular phenotypes can be easily observed, such as zebrafish. Targeted gene editing, such as through the CRISPR-Cas system, could help test whether a given variant disrupts the normal functioning of the relevant gene and whether such a disruption leads to a phenotype of interest. Ultimately, such a model should also enable investigation of whether the disrupted phenotype can be rescued by reintroduction of the wild-type allele or interference with the variant allele. For comprehensive exploration of the variants identified in this study, multiple methods of experimental validation may be necessary. This study represents a necessary first step in the evaluation of the role of rare variants in a common complex disease. Further evaluation in other familial and sporadic samples, as well as multi-ethnic samples, will be essential.
The geographic distribution and taxonomy of lemurs are still under debate, and this is also true for mouse lemurs (Weisrock et al., 2010; Markolf et al., 2011). Moreover, several regions of Madagascar are known to harbour more than one mouse lemur species. As noted above, only two individuals from the Daraina region have been sequenced to date, and both have been assigned to the M. tavaratra species together with individuals from surrounding regions (Weisrock et al., 2010). In order to determine whether our samples belonged to the same clade or species, phylogenetic reconstructions were carried out. Four mtDNA sequences belonging to four mouse lemur species (M. tavaratra, M. mittermeieri, M. sambiranensis and M. simmonsi) downloaded from GenBank (NCBI) were used as the reference sequences. These four species were specifically selected because they are geographically distributed around and close to the Daraina region (Mittermeier et al., 2010; Weisrock et al., 2010). Loci cyt b and COII were concatenated for all the sampled individuals and for the GenBank sequences (see Annex, table 3 for details on GenBank sequences). The phylogenetic tree was drawn with the aim of finding out which group our samples fit best. Phylogenetic trees per locus were also constructed in order to avoid loss of information due to short sequences or missing sites (see Annex – figures 1-3).
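The concatenation of loci described above can be sketched as follows. This is an illustrative Python example rather than the procedure actually used; it assumes each locus is already aligned, and the function name, the input layout (locus name mapped to sample name mapped to aligned sequence) and the use of "N" for missing loci are all hypothetical choices.

```python
def concatenate_loci(alignments, locus_order, missing_char="N"):
    """Join per-locus aligned sequences into one sequence per sample.

    alignments: dict mapping locus name -> dict of sample name -> aligned sequence.
    Samples lacking a locus are padded with missing_char over that locus length.
    """
    lengths = {}
    for locus in locus_order:
        seq_lengths = {len(seq) for seq in alignments[locus].values()}
        if len(seq_lengths) != 1:
            raise ValueError(f"sequences for locus {locus!r} are not aligned")
        lengths[locus] = seq_lengths.pop()

    samples = sorted({name for locus in locus_order for name in alignments[locus]})
    concatenated = {}
    for name in samples:
        parts = [
            alignments[locus].get(name, missing_char * lengths[locus])
            for locus in locus_order
        ]
        concatenated[name] = "".join(parts)
    return concatenated
```

Padding missing loci keeps every concatenated sequence the same length, which is what most tree-building programs expect from an alignment.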