RNA-Seq transcriptome data for teleost species with ARTs, no clear pattern emerges. Whereas in some species sneaker males exhibit the most distinctive transcriptome (e.g. Lepomis macrochirus (Partridge et al., 2016); Tripterygion delaisi (Schunter et al., 2014); present study), in other species nest-holder males are the most differentiated phenotype (e.g. Symphodus ocellatus (Nugent et al., 2016), Thalassoma bifasciatum (Todd et al., 2018)). Interestingly, in the bluehead wrasse (Thalassoma bifasciatum) the authors show not only that the dominant terminal-phase males are the most differentiated phenotype at the forebrain level, but also that sneaker males make a greater gonadal investment, which entailed pervasive down-regulation of androgenesis genes (Todd et al., 2018). This is consistent with low androgen production in males lacking well-developed secondary sexual characters. Moreover, the lists of differentially expressed genes for functionally equivalent phenotypes (e.g. sneakers) across species do not share significant numbers of transcripts and genes, suggesting that ARTs may have evolved in different species through species-specific genetic architectures. This might be partially explained by distinct developmental windows and by differences in the shifts in behavioural traits among the species studied so far. For example, in Tripterygion delaisi, if the opportunity to acquire a nest arises during the breeding season, a sneaker male is capable of changing into the territorial phenotype, which involves changes in body colouration and behavioural repertoire (De Jonge and Videler, 1989; Wirtz, 1978). In contrast, in the peacock blenny this shift in behaviour and morphology takes longer and usually occurs between breeding seasons, with the appearance of an intermediate developmental male morph (i.e. transitional males) in which a trade-off between somatic growth and reproduction is present (Saraiva et al., 2010).
Given the amount of RNA-Seq data available from these studies, one way forward would be to do a comparative analysis, controlling for developmental factors, so that the shared genetic modules underlying the different behavioural phenotypes could be determined and more precise inferences made. Along this line, Renn et al. (2017) obtained gene expression signatures for cichlids of the tribe Ectodini from Lake Tanganyika, that suggest the existence of deep molecular homologies underlying the convergent or parallel evolution of monogamy in different cichlid lineages from this tribe, which clearly demonstrates the usefulness of this approach to understand evolutionary processes for plastic traits.
A better understanding of the molecular interactions between S. homoeocarpa and creeping bentgrass will be essential for the development of more sustainable and practical management strategies, including the use of plant defense activators and the development of cultivars with increased resistance to S. homoeocarpa. The introduction of next generation sequencing, also termed massively parallel sequencing (MPS), has enabled researchers to sequence the genomes and transcriptomes of organisms at a relatively low cost in return for a vast amount of data with quantitative properties [16,17,18,19]. In this paper, two MPS technologies were used to generate sequence data for RNA sequencing (RNA-Seq) analysis: Illumina’s sequencing-by-synthesis (SBS) and Roche’s 454 pyrosequencing. The 454 reads were used for the de novo assembly of S. homoeocarpa and creeping bentgrass transcriptome libraries. SBS reads were mapped to the 454 assemblies to calculate transcript levels from S. homoeocarpa and creeping bentgrass during dollar spot disease development. The objective of this study was to identify transcripts that may be important for fungal virulence and creeping bentgrass defense. The results of the analysis will be used to form testable hypotheses for future studies on dollar spot etiology and turfgrass defense mechanisms.
We randomly selected 20 false positive (FP) and several false negative (FN) splicing junctions for experimental validation by RT-PCR followed by Sanger sequencing (Table S2). False positive splicing junctions are those covered by at least 2 reads but with p-value > 0.01 in our statistical method. False negative splicing junctions are those covered by 1 read but with p-value ≤ 0.01. We chose splicing junctions with relatively higher RNA-seq read coverage for RT-PCR analysis. Of the 20 FPs tested, all were confirmed, yielding a validation rate of 100%. An example is shown in Figure S5. In contrast, false negatives are difficult to verify with RT-PCR/Sanger sequencing because those transcripts usually have extremely low expression levels. As an alternative, we analyzed the EST data in the public domain and found that at least 30% of the false negatives were confirmed (see Discussion). Two examples are shown in Figure S6.
Although useful, the microarray technique used in this study has some limitations compared with the newer method of large-scale RNA sequencing (RNA-Seq). These limitations include the need for prior knowledge of the sequences; the fact that gene expression is measured only in relative terms, through the intensity of light emitted in each condition of interest; susceptibility to cross-hybridization; low reproducibility of results between laboratories and across platforms; and inefficiency in the study of rare genes and isoforms (ESTEVES, 2007; FERREIRA FILHO, 2009; NEVES, 2010). The study of prenatal muscle development in pigs using data from the RNA-Seq technique is therefore of interest.
Abstract – In Brazil, sugarcane cultivation has expanded to areas with prolonged drought seasons, which constrains sugarcane production. In this study, the gene expression profiles of two sugarcane cultivars (Saccharum spp.), one tolerant and the other sensitive to water stress, were assessed by RNA-Seq. A de novo assembly of the leaf transcriptome was carried out to identify and analyze genes that are differentially expressed between the cultivars and involved in the molecular response to water stress. To this end, the two cultivars were planted in a greenhouse and, 60 days after planting, subjected to three controlled soil water potentials (control, moderate drought and severe water deficit) and evaluated at 30, 60 and 90 days after the treatments were applied. RNA-Seq generated over a billion sequences, which yielded a total of 177,509 and 185,153 transcript sequences for the tolerant and sensitive cultivars, respectively. The expressed transcripts were assembled using the Trinity program. These transcripts were aligned with Sorghum bicolor, Miscanthus giganteus and Arabidopsis thaliana sequences and with sugarcane sequences available in public databases. This analysis identified the set of sugarcane genes shared with other species, as well as novel transcripts not yet cataloged. The transcripts were annotated and categorized by Gene Ontology (GO) terms into three categories for the two cultivars: cellular component, molecular function and biological process. The terms "enzyme regulator" and "transcription regulator", which lie within the molecular function category, stand out among the terms associated with the genes differentially expressed between the contrasting cultivars, suggesting the importance of gene regulation for the studied cultivars.
This study found distinct molecular patterns in the two cultivars, which provide hypotheses on plant response to drought and important information for identifying genes involved in the drought tolerance response.
In this paper, we identify stably expressed genes from RNA-Seq data sets based on a numerical measure—the sum of three variance components estimated from a mixed-effect model. For microarray data, there have been many efforts to numerically find stably expressed genes by quantifying the variation of measured expression levels across a large number of microarray data sets. For example, Andersen, Jensen & Orntoft (2004) used a linear mixed model to estimate the between-group and within-group variances from expression profiles of microarray experiments, and then quantified expression stability by combining the two variance components using a Bayesian formulation. Czechowski et al. (2005) measured the expression stability of each gene using the coefficient of variation (CV). Genes with lower CVs are considered more stably expressed. By investigating 721 arrays under 323 conditions throughout development, Czechowski et al. (2005) suggested stably expressed (reference) genes under different experimental conditions for Arabidopsis. Stamova et al. (2009), Dekkers et al. (2012), Gur-Dedeoglu et al. (2009), and Frericks & Esser (2008) screened a large number of microarray data sets to identify stably expressed genes in human blood, Arabidopsis seed, breast tumor tissues, and mice respectively. Validation experiments (Czechowski et al., 2005; Dekkers et al., 2012; Huggett et al., 2005; Stamova et al., 2009) showed that these genes are more stably expressed than traditional house-keeping genes.
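As a rough illustration of the CV-based ranking described above, the following sketch computes a coefficient of variation per gene and orders genes from most to least stable. It is not the mixed-effect model used in this paper, and the gene names and expression values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def rank_by_stability(expression):
    """Sort gene names from lowest CV (most stable) to highest CV."""
    cvs = {gene: coefficient_of_variation(vals) for gene, vals in expression.items()}
    return sorted(cvs, key=cvs.get)


# Hypothetical normalized expression values across four samples.
toy_expression = {
    "geneA": [100.0, 102.0, 98.0, 100.0],
    "geneB": [10.0, 50.0, 90.0, 30.0],
    "geneC": [200.0, 220.0, 180.0, 200.0],
}

ranking = rank_by_stability(toy_expression)
print(ranking)
```

Because the CV divides by the mean, geneC ranks as less stable than geneA here even though both fluctuate symmetrically around their means: its fluctuations are proportionally larger.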
RNA-Seq (RNA sequencing) is the most widely used method to study transcription. In this methodology, a population of RNA is converted to cDNA fragments (library preparation) with adaptors attached to one or both ends, and then sequenced (Wang et al., 2009). Many other technologies also focus on transcription. With NET-Seq (Native Elongating Transcript Sequencing) it is possible to monitor transcription at nucleotide resolution by deep sequencing the 3′ ends of nascent transcripts (Churchman et al., 2011). In 3P-Seq (Poly(A)-Position Profiling by Sequencing), an RNA:DNA oligonucleotide is hybridized with thymines and ligated to the mRNA polyadenylated tail to prevent internal priming (Jan et al., 2011). The GRO-Seq (Global Run-On Sequencing) methodology is applied to map the position, amount, and orientation of transcriptionally engaged RNA polymerases using nuclear run-on RNA molecules (Core et al., 2008).
experiments with biological replicates, these methods were later modified to accommodate analysis of unreplicated experiments as well, but their performance relative to ASC, GFOLD and NOISeq remains unclear. We did not include two methods with high citations per year: Cuffdiff2 and DEGSeq, based on conclusions from recent method comparative analyses. For example, Cuffdiff2 was found to have very low precision when replicate size increased in the analysis of two large RNA-seq data sets from mouse and human (Seyednasrollah, Laiho & Elo, 2015). Furthermore, Zhang et al. (2014) showed that edgeR had slightly superior performance in the receiver operating characteristic curve compared to DESeq and Cuffdiff2. Another comparative study involving DESeq, DEGseq, edgeR, NBPSeq, TSPM and baySeq showed that DEGseq had the largest false positive rate among them (Guo et al., 2013).
Over the past decade, a number of different approaches for quantifying microRNAs have been described, including cDNA arrays [19–22], a modified Invader assay and real-time PCR measurement of miRNA precursors [24,25]. RNA-seq is a more effective method for identifying significant changes in the levels of microRNAs between healthy people and individuals with active disease and for identifying microRNAs with potential as biomarkers for TB diagnosis. Here, we identified and quantified the expression of a total of 904 microRNAs in serum from four experimental groups, namely individuals with active TB, individuals with latent TB infections, and healthy individuals, with or without prior BCG inoculation.
Black pepper (Piper nigrum) is a plant of biotechnological interest and a research target both for exploring its metabolism and for improvement against phytopathological problems, as well as for understanding the evolution of basal angiosperms, the ancestral group to which it belongs. Next-generation sequencing has given access to the genetic heritage of non-model plants, opening new biotechnological perspectives. The identification of non-homologous genes restricted to certain species, called taxonomically restricted genes (TRGs), is a primary biotechnological target, especially in species and groups that are divergent and ancestral. This study aims to establish a method for TRG identification from RNA-seq data and to validate the approach on a dataset for black pepper. The method consists of filtering the transcripts in several stages, so that annotated transcripts and false positives are removed and the remaining data without molecular information are classified as potential TRGs. The application of this approach to a black pepper transcriptome dataset (35,631 transcripts) resulted in 22,661 transcripts annotated by similarity. The transcripts that were not annotated in this first analysis were processed in the TRAPID tool, resulting in 12,895 transcripts not annotated. The evaluation of transcripts for false positive detection resulted in 245 true transcripts, which were analyzed for the presence of non-coding RNA, resulting in 204 unidentified transcripts. At the end of the method, 71 non-annotated transcripts with protein-coding regions remained, indicating potential TRGs. The characterization of these potential TRGs in black pepper can provide new information about the molecular mechanisms of this species and perhaps elucidate pathways for the establishment of disease-tolerant cultivars.
Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions.
and to identify the important genes that play key roles in wax synthesis in the Welsh onion. The high-throughput sequencing results will also reveal other pathways that are related to wax synthesis and will identify a larger number of polymorphic molecular markers (SSRs and SNPs), which are scarce in the Welsh onion. Based on the RNA-seq data, the expression patterns of closely related genes were investigated to illustrate the function of these genes in the wax synthesis pathway. This research will provide additional evidence of waxy gene expression in wax synthesis and can be used to develop methods for mapping waxy genes and other genes in the Welsh onion.
To further investigate the regulation of target gene expression by USP22, HeLa cells were infected with control lentivirus or lentivirus for USP22-specific shRNA expression. Western blot analysis indicated that infection with the lentivirus for USP22-specific shRNA reduced the relative levels of USP22 expression by nearly 75% (Figure 4A,B). RNA-seq analysis identified 1,390 mRNA transcripts whose relative levels were altered by at least 1.5-fold (Table S5). There were 907 down-regulated genes and 483 up-regulated ones in the USP22-silenced cells (Figure 4C). Further RT-PCR analysis revealed that the relative levels of MKK6, MMP15, WNT11 and RUNX3 mRNA transcripts, but not the control β-actin, were significantly reduced in the USP22-silenced cells compared with those in the control cells (Figure 4D).
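A fold-change cutoff such as the 1.5-fold threshold mentioned above can be applied along the following lines. This is only a sketch: the function name, transcript labels and expression values are hypothetical, and the text does not describe how the authors' own pipeline computed or filtered fold changes.

```python
from collections import Counter


def classify_fold_change(control, knockdown, threshold=1.5):
    """Label a transcript 'up', 'down' or 'unchanged' from its fold change."""
    if control <= 0 or knockdown <= 0:
        raise ValueError("expression values must be positive")
    ratio = knockdown / control
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical (control, knockdown) expression values per transcript.
toy_levels = {
    "t1": (10.0, 4.0),   # 0.4-fold
    "t2": (10.0, 20.0),  # 2.0-fold
    "t3": (10.0, 11.0),  # 1.1-fold
    "t4": (10.0, 15.0),  # exactly 1.5-fold
}

labels = {name: classify_fold_change(c, k) for name, (c, k) in toy_levels.items()}
summary = Counter(labels.values())
print(labels, summary)
```

The threshold is applied symmetrically, so a transcript counts as down-regulated when its level falls to 1/1.5 of the control or lower.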
At least one other RNA-Seq analysis method, baySeq, has used a Bayesian framework to estimate the likelihood of differential gene expression. There are several differences between the Bayesian methods implemented in ALDEx and those implemented in baySeq. Firstly, baySeq estimates the posterior likelihood of differential gene expression in the context of a negative binomial model. Secondly, baySeq assumes that genes with 0 reads are not expressed, while ALDEx uses the more general assumption that genes with 0 reads are either not expressed or are expressed below the threshold of detection. Thirdly, ALDEx normalizes the estimated proportion vector using the isometric log transformation while baySeq and all other existing methods do not. It is worth noting that a recent comparison of baySeq, DESeq and edgeR using a deeply sequenced and validated dataset showed that DESeq and edgeR were more discordant with each other than either was with baySeq, suggesting that baySeq exhibits a lower false positive rate than either of the other methods. Finally, baySeq can deal with more complex study designs than can ALDEx.
To illustrate our analyses, we used data from publicly accessible RNA-seq experiments, where we define an experiment as a set of samples. The majority of the work and analyses were performed using the following datasets: ENCODE (GSE35584), Marioni et al. (GSE11045), Xu et al. (GSE26109) and BrainSpan. The ENCODE dataset and the Marioni et al. dataset were downloaded from SRA and converted to fastq files using “fastq-dump” from the SRA Toolkit. Following quality control using the fastX tools, we mapped the reads using bowtie2 and estimated the RPKM and FPKM expression levels using RSEQtools and Cufflinks, respectively, with the GENCODE annotations file (version 18). We used the processed data provided in the Xu et al. supplement and the online BrainSpan data. For our assessment of publicly available data, we looked at those accessible via both Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). We used the URL from the GEO browser [http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&display=500&type=10&tax=9606&sort=type], filtering on human samples and those in SRA as of January 2015. Of the resulting 39K samples, we then filtered on samples belonging to only a single experiment (unique to one GSE ID, close to 1.4K experiments), and then calculated the sample size for each of these. For a final follow-up, we then obtained 83 RNA-seq experiments from the Gemma database that were processed using RSEM. These experiments varied in experimental design, sample size, read depth, sample type and conditions.
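The filtering step on samples unique to one GSE ID could be sketched as follows. The sample and series identifiers are hypothetical placeholders, and the input format (a mapping from each sample to the series that list it) is an assumption, not the format returned by the GEO browser.

```python
from collections import Counter


def experiment_sample_sizes(sample_to_series):
    """Count samples per series, using only samples that belong to one series."""
    sizes = Counter()
    for series_ids in sample_to_series.values():
        unique_ids = set(series_ids)
        if len(unique_ids) == 1:
            # The sample is unique to this series, so it counts towards its size.
            sizes[next(iter(unique_ids))] += 1
    return dict(sizes)


# Hypothetical mapping from sample accession to the series that include it.
toy_samples = {
    "GSM_1": ["GSE_A"],
    "GSM_2": ["GSE_A"],
    "GSM_3": ["GSE_A", "GSE_B"],  # shared between two series, so excluded
    "GSM_4": ["GSE_B"],
}

sizes = experiment_sample_sizes(toy_samples)
print(sizes)
```

Samples shared between series are dropped rather than assigned to one of them, which keeps each experiment's sample size unambiguous.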
Today there is no single best solution to these RNA-Seq assembly problems, but several software packages have been shown to generate contig sets in which most of the expressed transcripts are correctly reconstructed. Trinity (Grabherr et al., 2011) and Oases (Schulz et al., 2012) are good examples. The assembled contig sets produced by these packages often contain multiple copies of complete or partial transcripts and also chimeras. Chimeras are structural anomalies of a unique transcript (self-chimeras) or multiple transcripts (multi-transcript chimeras). They are called “cis” if the transcripts are in the same direction and “trans” if they are in opposite directions. Natural chimeric transcripts exist in some cancer tissues but are rare (Frenkel-Morgenstern et al., 2013). Yang & Smith (2013) have shown the tendency of de novo transcriptome assemblers to produce self-chimeric contigs. The prevalence of the phenomenon depends on the assembly parameters. Multi-transcript chimeras distort contig annotation. The functions of the transcripts merged in the same contig can be very different, and therefore the often-unique annotation given to such a chimeric contig does not reflect its content. Assemblies also include contigs corresponding to transcription or sequencing noise, a phenomenon often referred to as illegitimate transcription (Chelly et al., 1989). These contigs often have low coverage and are not found in the different replicates of the same condition.
As an alternative to microarrays, modern large-scale sequencing methods have emerged that allow the transcriptome to be sequenced in a single run (RNA-Seq). In general, RNA-Seq methodologies are characterized by the conversion of RNA into a library of cDNA fragments, so that each molecule can be sequenced by a next-generation sequencer (platform), generating short sequences (reads) ranging in size from 21 to 500 bp (WANG et al., 2009). Thus, after normalization, which is the standardization of the data obtained, the higher the count of reads corresponding to a given gene under a certain experimental condition (treatment), the higher its expression. Techniques such as messenger RNA (mRNA) sequencing on next-generation sequencing platforms (RNA-Seq) have therefore become an important tool, aiding the search for genes responsible for traits of interest. RNA-Seq makes it possible to obtain the transcriptome profile, serves as a starting point for the identification of novel and/or rare transcripts, and is a powerful tool for gene expression studies.
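To make the role of normalization concrete, the sketch below scales raw read counts to counts per million (CPM), one simple library-size normalization. CPM is chosen here only for illustration and is not named in the text; the gene names and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical raw read counts for two samples of different sequencing depth.
sample_a = {"geneX": 50, "geneY": 950}     # 1,000 reads in total
sample_b = {"geneX": 100, "geneY": 3900}   # 4,000 reads in total

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["geneX"], cpm_b["geneX"])
```

In this toy case geneX has more raw reads in sample B, yet its CPM is lower there because sample B was sequenced four times deeper. This is why read counts are compared only after normalization.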
expressed genes using three different aligners, a single quantification method (HT-Seq), and five differential expression methods by comparing the results to the ones obtained with microarrays. Other studies have assessed how different methods for identifying differentially expressed genes [13–15] perform when applied to the same input data sets. More recently, the RGASP consortium presented two broad evaluations of RNA-seq data analysis methods: one on the performance of several spliced alignment programs by focusing on the quality of the alignments; the second study evaluated computational methods for transcript reconstruction and quantification from RNA-seq data. In this work we address complementary yet fundamental questions: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are the expression levels inferred to the “true” expression levels?
RNA samples were sent to the Centre for Genomic Research at the University of Liverpool for further analysis. The total RNA was depleted of rRNA with the Ribo-Zero Low Input Gold Kit (Human/Mouse/Rat) from Epicentre using 500-1000 ng of starting material. The success of the depletions was assessed using Qubit and Bioanalyzer for each sample. RNA-Seq libraries were prepared from the enriched material using the Epicentre ScriptSeq v2 RNA-Seq Library Preparation Kit. The rRNA-depleted RNA was used as input and, following 15 cycles of amplification, libraries were purified using AMPure XP beads. Each library was quantified using Qubit and the size distribution assessed using the Agilent 2100 Bioanalyzer. These final libraries were pooled in equimolar amounts into 6 pools using the Qubit and Bioanalyzer data. The quantity and quality of each pool were assessed by Bioanalyzer and subsequently by qPCR using the Illumina Library Quantification Kit from Kapa on a Roche Light Cycler LC480II according to the manufacturer's instructions. Each pool of libraries was sequenced on one lane of the HiSeq 2500 at 2x50 bp paired-end sequencing in the rapid run mode.
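Equimolar pooling from a mass concentration (such as a Qubit reading) and a mean fragment size (such as a Bioanalyzer trace) is commonly worked out as in the sketch below. It assumes double-stranded DNA with an average mass of about 660 g/mol per base pair; the library names, concentrations, sizes and the 50 fmol target are hypothetical and not taken from the protocol above.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration in ng/µL to nM, assuming ~660 g/mol per bp."""
    if conc_ng_per_ul <= 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration and fragment size must be positive")
    return conc_ng_per_ul * 1_000_000 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_per_library):
    """Return the volume in µL of each library that contributes the same fmol."""
    # 1 nM equals 1 fmol/µL, so volume = target amount / molarity.
    return {
        name: fmol_per_library / library_molarity_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical (concentration in ng/µL, mean fragment size in bp) per library.
toy_libraries = {
    "lib1": (6.6, 400),  # about 25 nM
    "lib2": (3.3, 500),  # about 10 nM
}

volumes = equimolar_volumes(toy_libraries, fmol_per_library=50.0)
print(volumes)
```

Less concentrated libraries need proportionally larger volumes, so in this toy case lib2 contributes about 5 µL against about 2 µL for lib1.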
cDNA fragments between 35 and 300 base pairs (bp) in size are selected and amplified by PCR (a set of chemical reactions that multiplies the amount of genetic material available in a solution). These amplified cDNAs are then sequenced on a dedicated machine. After sequencing, the cDNA fragments are aligned to a reference genome (or transcriptome), if one exists. The expression level of each gene can then be calculated by counting how many cDNA fragments aligned to the gene in question (Nunes, 2014). Figure 1 illustrates how the abundance (expression) of a given gene is calculated. For information on how to plan RNA-Seq experiments appropriately, see Auer & Doerge (2010).