Toxicology Research


REVIEW

Cite this: DOI: 10.1039/c9tx00088g

Received 4th April 2019, Accepted 18th July 2019. rsc.li/toxicology-research

Methods for the analysis of transcriptome dynamics

Daniela F. Rodrigues,^a Vera M. Costa,^a Ricardo Silvestre,^b,c Maria L. Bastos^a and Félix Carvalho*^a

The transcriptome is the complete set of transcripts in a cell or tissue and includes ribosomal RNA (rRNA), messenger RNA (mRNA), transfer RNA (tRNA), and regulatory noncoding RNA. At steady-state, the transcriptome results from a compensatory variation of the transcription and decay rates that keeps the RNA concentration constant. RNA transcription constitutes the first stage in gene expression, and thus is a major and primary mode of gene expression control. Nevertheless, regulation of RNA decay is also a key factor in gene expression control, involving either selective RNA stabilization or enhanced degradation.

Transcriptome analysis allows the identification of gene expression alterations, providing new insights regarding the pathways and mechanisms involved in physiological and pathological processes. Upon perturbation of cell homeostasis, rapid changes in gene expression are required to adapt to new conditions.

Thus, to better understand the regulatory mechanisms associated with gene expression alterations, it is vital to acknowledge the relative contribution of RNA synthesis and decay to the transcriptome. In the toxicology field, the study of gene expression regulation mechanisms can help identify the early and mechanistically relevant cellular events associated with a particular response. This review aims to provide a critical comparison of the available methods used to analyze the contribution of RNA transcription and decay to gene expression dynamics. Notwithstanding, an integration of the data obtained is necessary to understand the full repercussions of gene transcription changes at a system level. Thus, a brief overview of the methods available for the integration and analysis of the data obtained from transcriptome analysis will also be provided.

Introduction

The transcriptome is defined as the total repertoire of RNA transcripts in a cell. RNA steady-state levels are a consequence of multiple processes within the cell, mainly regulated by RNA synthesis and degradation.¹ RNA transcription is the process by which RNA polymerase (RNAP) copies the genomic DNA into an RNA transcript. RNA transcription constitutes the first stage in gene expression, being a major and primary mode of gene expression control.² After transcription and RNA processing, newly synthesized RNA is exported to the cytoplasm to be translated or to perform other noncoding functions, and is degraded at the end of its useful life.³

At steady-state, the transcription and decay rates vary in a compensatory manner to maintain the RNA levels constant.^4,5 Nevertheless, perturbation of cell homeostasis requires changes in gene expression to adapt to new conditions, and analysis of steady-state levels alone hinders the distinction between the transcriptional and post-transcriptional strategies used for that adaptation.^6,7 The major constraint of current methods used to evaluate RNA steady-state levels is their poor temporal resolution for kinetic changes.^7,8 The analysis of the relative contribution of RNA synthesis and decay to the steady-state levels is critical for better understanding the regulatory mechanisms associated with gene expression. In fact, it is important to explore whether changes in RNA steady-state levels are a consequence of altered RNA synthesis, stability or both, and which of the two processes, transcription or degradation, contributes the most to shaping cellular RNA levels.¹ Evaluation of the influence of transcription and/or decay on RNA steady-state levels allows the characterization of the transcriptional and kinetic strategies used by different genes under given environmental conditions.⁹
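The interplay between synthesis and decay described above can be illustrated with a simple first-order kinetic model. This model is an assumption introduced here for illustration and is not taken from the cited studies: the RNA level R changes as dR/dt = k_s − k_d·R, so the steady-state level is k_s/k_d. The function names and rate values below are hypothetical.

```python
import math

def steady_state_level(synthesis_rate, decay_rate):
    """Steady-state RNA level for dR/dt = synthesis_rate - decay_rate * R."""
    return synthesis_rate / decay_rate

def rna_level(t, r0, synthesis_rate, decay_rate):
    """RNA level at time t, starting from r0, under constant rates."""
    r_ss = steady_state_level(synthesis_rate, decay_rate)
    return r_ss + (r0 - r_ss) * math.exp(-decay_rate * t)

# Hypothetical rates (arbitrary units): two strategies that double the
# steady-state level of a transcript starting at 100 units.
more_synthesis = steady_state_level(20.0, 0.1)   # transcription doubled
slower_decay = steady_state_level(10.0, 0.05)    # decay rate halved

# Both strategies reach the same endpoint, but the kinetics differ:
# the new level is approached faster when the decay rate stays high.
level_a = rna_level(10.0, 100.0, 20.0, 0.1)
level_b = rna_level(10.0, 100.0, 10.0, 0.05)
print(more_synthesis, slower_decay, level_a > level_b)
```

Under these assumptions, a single steady-state measurement cannot tell the two strategies apart, whereas a time course can, which is the motivation for measuring synthesis and decay separately.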

Toxicogenomics is defined as the study of altered gene expression involved in a toxicological response and provides a detailed snapshot of the biological system response to xenobiotic exposure.^10,11 Gene expression alterations may represent the early and mechanistically relevant cellular events that are associated with xenobiotic-induced toxicity.

^a UCIBIO, REQUIMTE, Laboratory of Toxicology, Faculty of Pharmacy, University of Porto, Rua Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal.

E-mail: [email protected], felixdc@ff.up.pt

^b Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal

^c ICVS/3B’s-PT Government Associate Laboratory, Braga/Guimarães, Campus de Gualtar, 4710-057 Braga, Portugal

Published on 26 July 2019.

Therefore, toxicogenomics can be applied to identify the mechanisms associated with xenobiotic toxicity and to discover biomarkers of effect related to those xenobiotics.¹² In particular, the assessment of the contribution of transcription and degradation to RNA steady-state levels can shed light on the mechanisms of gene expression regulation following exposure to different xenobiotics. In fact, in the toxicology field, transcriptome analysis has been used in a broad range of applications, from side effect prediction to drug repurposing.¹⁰ As an example, transcriptome analysis has been applied to characterize the expression of cytochrome P450 CYP1A1 and CYP1B1 in human tumors such as bladder, colorectal and endometrial tumors, uncovering a potential use of CYP1 enzymes as targets for cancer therapy.^13,14

Here, we provide a comparative discussion of the available methods for the analysis of transcription, decay and RNA steady-state levels. Nevertheless, to understand the mechanisms underlying changes in gene expression as a system-level response to different stimuli, integration and deeper analysis of the data are required. Thus, a brief overview of the methods available for the integration and analysis of the data obtained from transcriptome analysis will also be provided.

RNA steady-state levels quantification

Transcriptome analysis provides insights into the biological pathways and molecular mechanisms that regulate physiological and pathological conditions. RNA steady-state levels quantification is usually achieved by reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR), microarrays or RNA-sequencing (RNA-seq).

Reverse transcription quantitative real-time PCR

RT-qPCR allows the quantification of RNA levels by combining PCR amplification and detection of the amplification products in a single step.^15,16 RT-qPCR starts with the conversion of RNA to cDNA by reverse transcription (RT). Thereafter, cDNA is amplified, and due to the use of fluorescent reporter molecules, PCR amplification products emit fluorescent signals during each PCR cycle, which correlate initial template concentration with fluorescence intensity (Fig. 1).^15,16 RT-qPCR can be performed as a one-step reaction, where RT and PCR occur in a single tube or as a two-step reaction, when RT and PCR occur in separate tubes.¹⁵

RT-qPCR quantification is based on the cycle threshold, which is the number of PCR cycles needed to create a signal greater than the background fluorescence.¹⁵ Thus, the initial number of copies of cDNA is inversely correlated with the number of cycles required for accurate detection and quantification.^16,17 RT-qPCR enables the quantification of absolute and relative RNA levels. Absolute quantification can be performed through the use of DNA standards and a calibration curve.¹⁷ Nonetheless, most quantitative RNA data are not absolute, but relative. Relative quantification is based on the ratio between the amount of the gene under analysis and the expression of a reference gene.^17,18 Notwithstanding, the accuracy of the expression analysis depends on the choice of reference genes and normalization methods.¹⁹ While RT-qPCR has a large dynamic range (>10⁶ fold),²⁰ the amplification step poses a challenge for accurate qPCR quantification, since it requires that all samples amplify with equal efficiency.²¹ In a reaction with 100% amplification efficiency, the number of molecules doubles in each replication cycle, and a properly designed assay should present at least 90% amplification efficiency.^22,23 Additionally, RT-qPCR specificity depends on the fluorescent detection chemistries employed, which can be classified in two main groups: (1) double-stranded DNA intercalating molecules (e.g. SYBR Green), which detect specific and non-specific amplification products, and (2) fluorophore-labeled oligonucleotides [primer-probes (e.g. Scorpions), hydrolysis-probes (e.g. TaqMan), hybridization-probes (e.g. FRET, Molecular Beacons), and analogs of nucleic acids], which only detect specific amplification products [detection chemistries were thoroughly reviewed by Navarro et al.²⁴].
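The relationship between cycle threshold, amplification efficiency and relative quantification can be sketched in a few lines. The calculation below uses the widely applied comparative Ct (ΔΔCt) approach and the standard-curve slope formula for efficiency. Neither is prescribed by the text above, so both should be read as illustrative assumptions, and the function names and Ct values are hypothetical. The sketch also assumes that the target and reference genes amplify with the same efficiency.

```python
import math

def amplification_efficiency(slope):
    """Efficiency from the slope of a standard curve (Ct versus log10 of
    template amount). A slope near -3.32 corresponds to 100% efficiency,
    i.e. a doubling of molecules per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control,
                        efficiency=1.0):
    """Fold change of a target gene, normalized to a reference gene,
    in treated versus control samples (comparative Ct approach)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    # Each cycle multiplies the product by (1 + efficiency)
    return (1.0 + efficiency) ** (-delta_delta)

# Hypothetical Ct values: the target crosses the threshold two cycles
# earlier in treated cells, while the reference gene is unchanged.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)
```

Because the fold change depends exponentially on the Ct difference, small deviations in efficiency or an unstable reference gene translate into large quantification errors, which is why both are stressed above.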

Microarrays

Microarrays can be defined as a large group of gene-specific DNA fragments with known sequences that are arranged in an orderly fashion on a solid surface.¹⁶ Gene expression analysis by this method is based on the hybridization of labeled cDNA or cRNA samples and can be performed by examining each sample on a different microarray (single-color array) or by comparing the expression levels between a pair of samples in a single array (two-color array).²⁵ Thereafter, the array is scanned, and the quantification relies on the amount of signal (usually fluorescence) measured at each specific location. That signal correlates with the amount of RNA with complementary sequence in the sample being analyzed (Fig. 2).^26,27

Fig. 1 Key steps in RT-qPCR. Following RNA isolation, RNA is converted to cDNA by reverse transcription. Thereafter, the cDNA is amplified and, due to the use of fluorescent reporter molecules, PCR amplification products emit fluorescent signals during each PCR cycle. The number of cycles needed for the fluorescence signal to exceed the background inversely correlates with initial template concentration.

Microarrays provide relative data quantification, since the signal obtained may not truthfully represent the concentration of the target sequence hybridizing onto the array.²⁶ In fact, due to the kinetics of hybridization, the equilibrium favors no binding for targets present at low concentrations, while for targets at high concentrations the array will become saturated.²⁶ Therefore, microarrays have a small dynamic range, up to a few hundredfold, which makes it difficult to detect genes expressed at low or very high levels.²⁸ Moreover, the signal produced may not be proportional to the concentration of the transcripts, because the affinity of the probe also depends on the hybridization conditions.²⁹ Because it is hard to design arrays where related sequences do not bind to the same probe, cross-hybridization (signal produced by transcripts that share some sequence similarity with the probe set) and background (signal produced as a result of interaction with probes without significant sequence similarity) restrict the sensitivity and specificity for some genes and hinder the distinction between low expression and absence of expression.^26,29 Despite this major drawback, microarrays have a relatively low cost and well-defined protocols for hybridization and analysis.³⁰
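The limited dynamic range can be made concrete with a toy equilibrium-binding model. The sketch below assumes that probe occupancy follows a Langmuir-type isotherm on top of a constant background signal. This is a simplification chosen for illustration, not a model given in the text, and all parameter values are hypothetical.

```python
def probe_signal(concentration, k_d=1.0, s_max=1000.0, background=20.0):
    """Signal at one array spot: background plus saturable binding."""
    occupancy = concentration / (k_d + concentration)
    return background + s_max * occupancy

def observed_fold_change(c_low, c_high, **params):
    """Ratio of signals for two target concentrations."""
    return probe_signal(c_high, **params) / probe_signal(c_low, **params)

# A true 10-fold difference in target concentration, measured in three regimes
low_range = observed_fold_change(0.001, 0.01)    # signal close to background
mid_range = observed_fold_change(0.1, 1.0)       # responsive part of the curve
high_range = observed_fold_change(100.0, 1000.0) # spot close to saturation
print(low_range, mid_range, high_range)
```

In this toy model the same 10-fold change is compressed at both ends of the concentration scale, near background and near saturation, which mirrors the difficulty of quantifying genes expressed at very low or very high levels.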

RNA sequencing

RNA-sequencing (RNA-seq) enables the profiling and quantification of the entire transcriptome using high-throughput next generation sequencing technologies.^28,31 Generally, RNA-seq starts with RNA conversion (total, poly(A) selected or rRNA depleted) to a library of cDNA fragments, through either RNA fragmentation and RT or cDNA fragmentation. Thereafter, cDNA fragments are linked with adaptors (synthetic oligonucleotides of known sequences) and with or without amplification, each molecule is sequenced. The sequence reads are then compared to a reference genome or assembled to produce a genome-scale transcription map (Fig. 3).^28,31

The detection of transcripts by RNA-seq does not require previous knowledge of the sequence. RNA-seq provides absolute measurement of RNA levels, as the quantification is based on the number of reads aligned to each gene.^26,32,33 Notwithstanding, the sensitivity of RNA-seq is determined by the depth of sequencing, that is, the number of times that a sample is sequenced, and the cost of an RNA-seq experiment increases with the depth of sequencing.^34,35 Since rRNA represents ∼80% of the total RNA pool, rRNA depletion can be particularly important for RNA-seq experiments, as removal of rRNA reduces the number of reads needed to cover less abundant transcripts, leading to a lower cost per sample.^31,32 For differential gene expression analysis, 40 million reads should suffice to quantify transcripts with moderate to high abundance.³⁶ In fact, 80% of genes with more than ten fragments per kilobase of exon per million reads mapped (FPKM) can be accurately quantified using approximately 36 million reads.³⁵ However, for genes expressed at lower levels (i.e. with fewer than ten FPKM) the sequencing depth has to be increased to 80 million reads.³⁵
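The FPKM unit used above follows directly from its name: fragments are divided by the exon length in kilobases and by the number of mapped fragments in millions. A minimal sketch of that arithmetic is given below. The function names and the example gene (2 kb of exonic sequence) are hypothetical and chosen only to show how sequencing depth limits the evidence available for a gene.

```python
def fpkm(fragments_on_gene, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments_on_gene * 1e9 / (exon_length_bp * total_mapped_fragments)

def expected_fragments(fpkm_value, exon_length_bp, total_mapped_fragments):
    """Number of fragments expected on a gene at a given abundance and depth."""
    return fpkm_value * exon_length_bp * total_mapped_fragments / 1e9

# Hypothetical gene with 2 kb of exonic sequence
at_10_fpkm = expected_fragments(10.0, 2000, 36_000_000)  # moderate abundance
at_1_fpkm = expected_fragments(1.0, 2000, 36_000_000)    # low abundance
print(at_10_fpkm, at_1_fpkm)
```

At a fixed depth, a tenfold drop in abundance means a tenfold drop in the fragments supporting the estimate, so low-abundance genes require proportionally deeper sequencing to reach the same counting precision.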

Fig. 2 Schematic representation of the key steps of gene expression analysis in microarrays, using either a single-color array (A) or a two-color array (B) approach. After the hybridization of the labeled samples, the array is scanned. Quantification relies on the amount of signal at each specific location, which correlates with the amount of RNA with complementary sequence in the sample.


Moreover, to obtain coverage of rare and lowly expressed transcripts, the sequencing depth may have to rise up to 500 million reads.³⁷ Technical issues associated with the RNA-seq protocol, such as RT, fragmentation and amplification, may result in a non-uniform coverage of transcripts and in data analysis biases.^38–40 Additionally, RNA-seq experiments produce large and complex data sets, which pose a challenge for data analysis and storage.^33,41

Comparative discussion of the methods applied for the measurement of RNA steady state levels

Different methods can be applied for the quantification of RNA steady-state levels, each with its own strengths and limitations.^11,42 While RT-qPCR is very sensitive, its low throughput restricts its use for genome-wide studies.

Microarrays and RNA-seq present higher coverage, being more adequate to study system-level alterations. RNA-seq provides the highest gene coverage, as this methodology does not require previous knowledge of the sequence, making it the best method to perform a whole-scale screening analysis. Moreover, RNA-seq provides a direct measurement of RNA levels, with the sensitivity being limited only by the depth of sequencing.^26,34 Microarrays enable the expression analysis of hundreds to thousands of genes simultaneously.²⁵ However, the data provided by this method only reflect the group of genes previously selected for analysis. In a study performed by Zhao et al., comparing RNA-seq and microarrays in transcriptome profiling, the gene expression profiles provided by the two technologies were highly correlated.²⁹ Nevertheless, RNA-seq was better at detecting transcripts expressed at low levels, and demonstrated a broader dynamic range than microarrays.²⁹ This is explained by the avoidance of technical issues inherent to microarray technologies, namely cross-hybridization and the limitations associated with the kinetics of hybridization (as binding is not favored for targets at low concentrations, while for targets at high concentrations the array can become saturated).^28,29 Also, RNA-seq is able to detect closely related gene sequences, splice isoforms and RNA editing that may be missed due to cross-hybridization on microarrays.²⁶ Nevertheless, RNA-seq is more expensive than microarrays, data storage is more challenging, and analysis is more intricate.^29,41 Moreover, the sample preparation protocol in RNA-seq experiments may result in a non-uniform coverage of transcripts, which biases data analysis.^38–40 As with microarrays, RT-qPCR also requires selection of the target genes to be analyzed. Regarding gene coverage, high-throughput RT-qPCR methods have been developed that may come closer to the gene coverage displayed by microarrays. In fact, Schmittgen et al. reported the use of RT-qPCR configurations with a 384-well reaction block, with a gene coverage comparable to that of a low-density array.⁴³ Nevertheless, RT-qPCR has a better dynamic range than microarrays.^20,28 Despite the lower gene coverage of RT-qPCR compared with RNA-seq, Nonis et al. suggested a model to help choose between RT-qPCR and RNA-seq in terms of direct expenses and gene expression data requirements. At the time, they found that RNA-seq should be used instead of RT-qPCR only when the number of genes to analyze is higher than 258.⁴⁴ Notwithstanding, once the barriers to the widespread use of RNA-seq are overcome, it is expected that this technique will become the principal tool for transcriptome analysis.^26,28,29

RNA polymerase II transcription analysis

RNA transcription is the process by which an RNAP copies the genomic DNA into an RNA transcript. Transcription in eukaryotes relies on three different RNAPs.⁴⁵ Of the three polymerases, RNAP II is responsible for the transcription of non-coding RNAs and all protein-coding genes through mRNA production. Thus, transcription by RNAP II is a key step in gene expression and a fundamental process in a living cell.^16,46

The transcription process is divided into three main stages: initiation, elongation and termination.^47,48 RNAP II lies at the center of the transcription machinery, interacting with the general transcription factors for transcript initiation, breaking those interactions upon initiation and promoter clearance, and associating with another set of factors during elongation and termination.⁴⁹ Additionally, RNAP II recruits the factors that promote chromatin modification, RNA synthesis, processing, and export.^49,50 RNAP II is composed of 12 subunits (Rpb1–12), where ten of the subunits form a catalytic core and the remaining two subunits, Rpb4 and Rpb7, form a subcomplex located on the periphery of the enzyme, the Rpb4/7 heterodimer.^51–53 The largest subunit of RNAP II, Rpb1, has a regulatory role on RNAP II activity through the transcription cycle, due to the presence of a C-terminal domain (CTD).⁵⁴ The CTD extends from the core enzyme to form a tail-like structure, functioning as a landing pad that provides a binding site for various factors involved in RNA transcription and processing, such as transcription factors, chromatin modifiers and RNA processing enzymes.^55–57 The CTD is a unique feature of RNAP II and is composed of tandem heptad repeats with the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7.⁵⁸ The CTD undergoes extensive modification through phosphorylation, glycosylation, ubiquitination, and methylation.⁵⁵ Nevertheless, CTD phosphorylation is the most studied and best-characterized CTD modification.^55,57,59,60

Fig. 3 Key steps in RNA-seq. Following the conversion of RNA to a library of cDNA fragments, these fragments are linked to adaptors. Thereafter, with or without amplification, each molecule is sequenced. The sequence reads are then assembled to produce a genome-scale transcription map or compared to a reference genome.

CTD modifications and structural plasticity enable the CTD to serve as a binding platform for a variety of factors that regulate the transcription and maturation of nascent transcripts.^60,61 The ability of the CTD to be modified at each residue can generate a wide range of distinct combinations, which have been linked to a readable code, the CTD code,^62–64 that orchestrates the recruitment and interaction of factors with RNAP II.⁵⁹
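The size of this combinatorial space can be illustrated by counting states for a single heptad repeat. The sketch below is deliberately restricted, as a simplifying assumption, to on/off phosphorylation of the five hydroxyl-bearing residues of the consensus heptad (Tyr1, Ser2, Thr4, Ser5, Ser7). Other modifications and differences between repeats are ignored, and the names used are hypothetical.

```python
from itertools import product

# Consensus heptad and the residues that can carry a phosphate group
HEPTAD = ["Tyr1", "Ser2", "Pro3", "Thr4", "Ser5", "Pro6", "Ser7"]
PHOSPHO_SITES = ["Tyr1", "Ser2", "Thr4", "Ser5", "Ser7"]

def phospho_states(sites=PHOSPHO_SITES):
    """Enumerate every on/off phosphorylation pattern of one heptad repeat."""
    return [dict(zip(sites, pattern))
            for pattern in product([False, True], repeat=len(sites))]

states = phospho_states()
unmodified = [s for s in states if sum(s.values()) == 0]
single_marks = [s for s in states if sum(s.values()) == 1]
double_marks = [s for s in states if sum(s.values()) == 2]
print(len(states), len(unmodified), len(single_marks), len(double_marks))
```

Even under this restricted view, one repeat has 32 possible phosphorylation patterns, and the number grows multiplicatively across the tandem repeats and with each additional modification type.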

By determining the status and distribution of RNAP II across the genome, transcription analysis methods provide important information for interpreting the mechanisms involved in transcription regulation.^65,66 Moreover, the quantification of nascent transcripts allows the identification of genes that respond to specific signals.^65,67 For transcription analysis, a specific enrichment step is needed and can be achieved by the incorporation of uracil analogs in cells maintained in culture (metabolic labeling)⁶⁸ or in isolated nuclei (nuclear run-on),^6,69 by the identification of single-stranded DNA associated with RNAP II (permanganate footprinting),⁷⁰ by fractionation of chromatin-bound RNAP II (chromatin immunoprecipitation)⁷¹ or by immunoprecipitation or cell fractionation of RNAP II-bound RNA (native elongating transcript sequencing).^72,73 Notwithstanding, the specific analysis of RNAP II transcription requires either the use of antibodies targeting specific sequences of RNAP II in immunoprecipitation methods or the specific selection of RNAP II transcripts.

Additionally, these methodologies can provide a genome-wide assessment by analyzing the nascent RNA transcripts by high-throughput sequencing (RNA-seq) or, alternatively, specific transcripts can be pre-selected and analyzed with methods that have lower coverage (e.g. microarrays or RT-qPCR).

Metabolic labeling

Quantification of nascent RNA can be accomplished by labeling nascent RNA in cells maintained in culture with uracil analogs that afterwards allow the isolation of nascent RNA from the total RNA pool, in a method called metabolic labeling (Fig. 4). Metabolic labeling can also be applied for profiling RNA processing and degradation.^1,8,68,74 Nascent RNA labeling can be achieved with uracil analogs such as 4-thiouridine (4sU), 5-ethynyluridine (EU) and 5-bromouridine (Bru) [reviewed by Tani et al.⁷⁵]. Isolation of labeled RNA is then performed by different methodologies: 4sU and EU labeled RNA are biotinylated (by thiol-specific biotinylation or by copper-catalyzed cycloaddition reaction, i.e. click chemistry, respectively) and isolated by affinity separation with streptavidin, whereas Bru labeled RNA is isolated with anti-Bru antibodies.⁷⁵ When choosing the uracil analog for nascent RNA labeling, it is important to consider the time and concentration required for uracil analogs to be incorporated in whole cells and label nascent RNA, due to possible cytotoxic effects of the analogs.

In fact, exposure of A549 cells to concentrations above 150 μM of 4sU and EU for 48 h was shown to result in a cell viability decrease.⁷⁶ Nevertheless, these uracil analogs are rapidly incorporated in nascent RNA, and exposure to 4sU for short periods has minimal adverse effects on gene expression, RNA decay, protein stability and cell viability.⁷⁵ In fact, the labeling time should be kept to a minimum to limit side effects of uracil analogs, as exposure of U2OS cells to 100 μM 4sU for either 3 or 6 h caused inhibition of rRNA synthesis.⁷⁷ Additionally, incubation with 100 μM 4sU for 6 h resulted in the induction of a nuclear stress response with p53 induction and inhibition of proliferation.⁷⁷ Bru has less harmful effects than 4sU and EU, since at concentrations lower than 500 μM it does not interfere with cell viability in A549 cells after a 48 h exposure. Still, Bru requires longer incubation periods (at least 24 h) for effective incorporation.⁷⁶

Fig. 4 Key steps in metabolic labeling. Cells maintained in culture are exposed to uracil analogs for nascent RNA labeling. The incorporation of uracil analogs then allows the separation of the newly synthesized RNA from the total RNA pool.

To provide further information regarding the mechanisms of RNAP transcription regulation, modifications to metabolic labeling were developed and include the use of UV light⁷⁸ or the inhibition of transcription by 5,6-dichlorobenzimidazole 1-β-D-ribofuranoside (DRB)^3,79 prior to RNA labeling. In the BruUV-seq approach, UV light inflicts DNA lesions randomly through the genome, which blocks RNAP II transcription elongation.⁷⁸ While elongating RNAP II is stalled by UV-induced lesions, transcription initiation near active transcription start sites and enhancer elements is expected to continue, which results in an increase in RNA read intensity at these locations. Thus, BruUV-seq allows the identification of transcription start sites and active enhancer elements genome-wide in cultured cells.⁷⁸ Transcription inhibition by DRB (a reversible inhibitor of CDK9) blocks the progression of RNAP II from transcription start sites to the gene body, thereby causing an accumulation of RNAP II at transcription start sites. Removal of DRB results in transcription reinitiation in a relatively synchronized manner.³ Labeling of nascent RNA with uracil analogs occurs at the time of DRB removal or a few minutes before. RNA harvesting at several time points after DRB removal makes it possible to follow transcription progression within each individual gene at each time point and to quantify the elongation speed.³ Labeling nascent RNA after transcription blockage with DRB has been applied to determine the role of transcription elongation rates in gene expression and in the regulation of alternative splicing.⁷⁹ Additionally, the 4sUDRB-seq protocol reported by Fuchs et al. also provides an approach for genome-wide measurement of transcription elongation speeds and makes it possible to measure the contribution of transcription elongation speed and of the rate of RNAP II transition to active elongation to the overall rate of RNA production.^3,80 Moreover, the 4sUDRB-seq method allows the study of the impact of specific transcription elongation factors on transcription elongation and of the changes in transcription elongation that occur as cells adjust to changing conditions.³
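The way an elongation speed is obtained from such a time course can be sketched as a simple linear fit: the position of the transcription wavefront within a gene is recorded at each time point after DRB removal, and the slope of position versus time gives the speed. The code below is a minimal illustration and not the analysis pipeline of the cited protocols. The function name and the data points are hypothetical.

```python
def elongation_rate(times_min, front_positions_kb):
    """Least-squares slope of wavefront position (kb) versus time (min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_x = sum(front_positions_kb) / n
    numerator = sum((t - mean_t) * (x - mean_x)
                    for t, x in zip(times_min, front_positions_kb))
    denominator = sum((t - mean_t) ** 2 for t in times_min)
    return numerator / denominator

# Hypothetical time course for one gene: minutes after DRB removal and
# distance of the labeled-RNA wavefront from the transcription start site
times = [4, 8, 12, 16]
positions = [14.0, 30.0, 46.0, 62.0]
rate_kb_per_min = elongation_rate(times, positions)
print(rate_kb_per_min)
```

Using the slope rather than a single position divided by time makes the estimate insensitive to a constant delay between DRB removal and the restart of elongation.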

Nuclear run-on

Nuclear run-on (NRO) enables the quantification of nascent transcripts by labeling nascent RNA in isolated nuclei. The NRO protocol starts by pausing transcription, followed by the isolation of cell nuclei. This occurs at low temperatures to ensure that the nuclear membrane remains intact and that RNAP remains paused in the genome. The anionic detergent sarkosyl is often added to the NRO reaction to remove physical impediments that can block elongation, as it causes the disruption of protein–protein and protein–DNA interactions but does not affect transcriptionally engaged RNAP.⁶⁹ Nascent RNA labeling occurs by increasing the temperature and allowing transcriptionally engaged RNAP to continue in the presence of uracil analogs.^69,81 Following isolation of the labeled RNA transcripts, changes in abundance of nascent transcripts are directly correlated with transcriptional activity (Fig. 5).

NRO allows measurement of the contribution of the transcription initiation rate to gene expression regulation. Since transcription occurs in isolated nuclei, NRO transcripts reflect more specifically the primary rate of transcription rather than later maturation processes. In fact, NRO is highly robust for labile sequences with high turnover rates, since the results obtained with this method are largely independent of the effects of RNA stability.^82,83 Thus, this method is particularly useful to distinguish transcriptional from post-transcriptional gene regulatory mechanisms.⁶ However, NRO relies on the efficient restart of transcription under non-physiological conditions, which depends on the experimental procedure and on the transcriptional status of RNAP.⁸⁴ Notably, RNAP molecules that are arrested or backtracked due to a nucleosome barrier are unable to elongate in NRO, since the nascent RNA is misaligned with the polymerase active site, and will escape detection.⁸⁵ Additionally, the analysis of small amounts of cells can be difficult, since NRO requires a high number of cells (a minimum of 1 million isolated nuclei) and transcriptionally active nuclei.^6,81

Core et al. provided the global run-on sequencing (GRO-seq) protocol, an adaptation of NRO to map and quantify engaged RNAP genome-wide.⁶⁹ Nevertheless, the resolution of this methodology is in the order of tens of bases, and higher resolution is required to understand the molecular mechanisms of transcription regulation.⁶⁵ Precision nuclear run-on (PRO-seq), a GRO-seq based method, achieves base-pair resolution by using biotin-labeled nucleotide triphosphates that inhibit further incorporation of these labeled nucleotide triphosphates into nascent RNA.⁶⁵ Afterwards, 3′ end sequencing of nascent transcripts allows the identification of the last incorporated nucleotide triphosphate and reveals the precise location of the active site of RNAP during nascent RNA transcription.⁶⁵ Due to its base-pair resolution, PRO-seq helps to understand the mechanism of transcriptional elongation and promoter-proximal pausing, and to grasp how DNA sequences, nucleosomes or other DNA-binding factors affect RNAP transcription.⁶⁵ However, with the GRO-seq and PRO-seq approaches it is difficult to localize RNAP transcription initiation, since the nascent RNA might not be long enough to be aligned to the reference genome.⁶⁵ The sensitivity and specificity of the GRO-seq or PRO-seq approaches for detecting transcription initiation can be increased by an enrichment step and sequencing of 5′ capped RNAs (GRO-cap or PRO-cap).^65,86 GRO-cap and PRO-cap can be applied to detect enhancer transcripts, upstream antisense transcription or other unstable transcripts.⁶⁵

Chromatin immunoprecipitation

Chromatin immunoprecipitation (ChIP) is applied to map protein–DNA interactions across the entire genome. By applying antibodies targeting RNAP II, this technique provides a measure of the transcriptional activity of this polymerase through the estimation of the amount of RNAP II associated with a gene being transcribed.⁸⁷ ChIP involves protein–DNA crosslinking, after which cells are lysed, nuclei disrupted, and chromatin fragmented and solubilized. Protein–DNA complexes are then immunoprecipitated using antibodies against RNAP II. After removal of unspecifically bound chromatin, precipitated chromatin is eluted, crosslinks reversed, proteins digested, and DNA is purified and quantified (Fig. 6).^71,88

Fig. 5 Key steps in nuclear run-on. Nuclear run-on starts with a halt in the transcription process and nuclei isolation. The transcription process in isolated nuclei is then allowed to restart in the presence of uracil analogs to label nascent RNA. The uracil analogs enable the isolation of the newly synthesized RNA from the total RNA pool.

The crosslinking step, usually performed with formaldehyde, preserves physiologically relevant interactions in the cell. Still, it can also cause misidentification of RNAP II genomic regions, through the crosslinking of genomic regions not bound by RNAP II to RNAP II-bound regions.^65,87 In addition, a high number of cells is required, due to the loss of cells during the chromatin crosslinking step and the relatively low amount of recovered immunoprecipitated DNA.⁸⁸ Moreover, data obtained in ChIP experiments highly depend on the specificity of the antibody used and the degree of enrichment accomplished in the immunoprecipitation step.⁸⁹ In fact, the obtained data usually have high background levels due to unspecific DNA precipitation caused by antibody cross-reactivity or by the unspecific binding of DNA to the beads used for immunoprecipitation.⁷²
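When the immunoprecipitated DNA is quantified by qPCR, enrichment and background are commonly expressed as percent of input and as fold enrichment over a control immunoprecipitation. The sketch below shows that arithmetic. It is a common normalization convention rather than a procedure described in the text, it assumes 100% amplification efficiency, and the function names and Ct values are hypothetical.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Recovered DNA as a percentage of the starting chromatin.

    input_fraction is the share of chromatin set aside as input
    (0.01 means 1%); the input Ct is first adjusted to 100%.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)

def fold_over_control(ct_ip, ct_control_ip):
    """Enrichment of the specific IP over a control IP (e.g. non-immune IgG)."""
    return 2.0 ** (ct_control_ip - ct_ip)

# Hypothetical Ct values for one genomic region
recovery = percent_input(ct_ip=25.0, ct_input=25.0, input_fraction=0.01)
enrichment = fold_over_control(ct_ip=25.0, ct_control_ip=28.0)
print(recovery, enrichment)
```

Reporting the control immunoprecipitation alongside the specific signal makes the background contribution from antibody cross-reactivity or bead binding explicit.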

Through the use of antibodies that target RNAP II, this method determines the distribution and density of RNAP II along the genome, which is proportional to the amount of transcription taking place.^67,71,84,90 Transcriptional dynamics can be assessed with ChIP by mapping the global distribution of RNAP II.⁹⁰ Nevertheless, RNAP II presents a distinctive feature in its largest subunit, a CTD that has a regulatory role on RNAP II activity throughout the transcription cycle.⁵⁸ The CTD can be modified at each residue, producing a huge variety of distinct combinations that have been linked to a readable code, the CTD code.^59,62 Of the possible modifications, some phosphorylation marks have been associated with specific steps of the transcription process. Thus, monoclonal antibodies targeting phosphorylation marks of CTD residues are powerful tools to evaluate the transcriptional status of RNAP II.^59,64 These antibodies may react with an unphosphorylated CTD, with a single phospho-residue or with a double phosphorylation mark [reviewed by Heidemann et al.⁵⁹]. Still, ChIP data have to be interpreted with caution because the complete modification status of RNAP II at each step of the transcription cycle is unknown.⁶⁷ Likewise, it is not fully known how CTD modifications affect the affinity of these antibodies for their target epitope, and antibody binding can be influenced when accessibility to the epitope is prevented by adjacent modifications.^59,67 Furthermore, absence of signal can indicate either physical absence or epitope masking, since the signal strength depends exclusively on the number of accessible CTD marks and not on the number of modifications actually present in the CTD.⁵⁹ Moreover, ChIP resolution is usually limited by the size of the DNA fragments. Nevertheless, the use of exonuclease digestion to remove DNA not directly bound to the protein of interest, as in ChIP-exo⁹¹ or ChIP-nexus,⁹² leads to higher sensitivity and nucleotide resolution.^65,72,84 Finally, as bound DNA is analyzed rather than the nascent RNA, the transcriptional status of RNAP is not revealed. Thus, ChIP serves as indirect evidence of transcription.^82,84

Permanganate footprinting

Permanganate footprinting detects single-stranded DNA regions that occur in the transcription bubble of transcriptionally engaged RNAP.⁷⁰ The method is based on the hyperreactivity that thymine residues within the single-stranded DNA region display towards oxidation by potassium permanganate. It starts with the oxidation of single-stranded thymine by potassium permanganate. DNA is then extracted and purified, and the oxidized thymine residues are converted to strand breaks by piperidine cleavage. Cleaved DNA is extracted and purified, and the DNA breaks are mapped with ligation-mediated PCR (LM-PCR) (Fig. 7).⁷⁰

Permanganate footprinting reveals the location of RNAP II with a 12-nucleotide resolution, approximately the size of the transcription bubble.⁷⁰ Nonetheless, single-stranded DNA can arise from sources other than the transcription bubble, such as other DNA–RNA hybrids, DNA replication forks and intra-strand DNA hairpins. Thus, permanganate footprinting is a nonspecific method for the detection of the RNAP transcription bubble.^65,84 Moreover, permanganate footprinting is a low-throughput technique, as the readout involves LM-PCR on individual genes.⁶⁷ Nevertheless, to map the cleaved ends of single-stranded DNA at a genomic scale, Li et al. developed a method called permanganate–ChIP-seq, which combines the strength of permanganate footprinting in detecting single-stranded DNA associated with the RNAP II transcription bubble with the genomic scale of ChIP-seq.⁹³

Fig. 6 Key steps in ChIP. To preserve physiologically relevant interactions, protein and DNA are crosslinked, usually by treatment with formaldehyde. Thereafter, cells are lysed, nuclei disrupted, and chromatin fragmented and solubilized. Protein–DNA complexes are then immunoprecipitated using antibodies against RNAP II. Precipitated chromatin is eluted, crosslinks reversed, proteins digested, and DNA is purified and quantified. By applying antibodies targeting RNAP II, this method provides a measure of the transcriptional activity of this polymerase through the estimation of the amount of RNAP II associated with a gene being transcribed.

Native elongating transcript sequencing

Native elongating transcript sequencing (NET-seq) is a genome-wide approach that maps the 3′ end of nascent transcripts associated with RNAP by high-throughput sequencing to monitor transcription at nucleotide resolution.^72,73,94 The selection of nascent RNA associated with elongating RNAP can be achieved by cell fractionation of transcribing RNAP II, relying on the extraordinary stability of the RNAP II elongation complex, as described by Mayer et al. (human NET-seq),^72,95 or by capturing nascent RNA associated with different CTD phosphorylation states by immunoprecipitation, as reported by Nojima et al. (mammalian NET-seq).^73,96

The human NET-seq protocol, described by Mayer et al., starts with cell lysis and nuclei isolation.⁷² The risk of transcription run-on during cell fractionation is minimized by using α-amanitin, by performing all experiments on ice and by rapidly depleting the nucleotide pool through fast and efficient cell lysis.⁷² Isolation of the transcripts associated with RNAP II is based on the high stability of the DNA–RNA–RNAP ternary elongation complex in the presence of salts, detergents, and urea, which remove other chromatin-bound proteins. After removal of histone proteins, the chromatin-associated nascent RNA is purified, and the remaining DNA degraded. The 3′ ends of purified nascent RNA are converted into a cDNA sequencing library. High-throughput sequencing of the generated DNA library provides a quantitative measure of RNAP density with single-nucleotide resolution (Fig. 8).⁷²
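The final counting step can be illustrated with a minimal sketch. It assumes that reads have already been aligned and are given as hypothetical tuples (chromosome, start, end, strand) in 0-based half-open coordinates, with the strand referring to the nascent RNA; actual library orientation depends on the protocol and is not specified here. The function names are illustrative only.

```python
from collections import Counter


def three_prime_end(start, end, strand):
    """Return the 0-based genomic position of the RNA 3' end of one read.

    Assumes half-open [start, end) coordinates and that `strand` is the
    strand of the nascent RNA (an assumption; library protocols differ).
    """
    if strand == "+":
        return end - 1
    if strand == "-":
        return start
    raise ValueError(f"unknown strand: {strand!r}")


def rnap_density(reads):
    """Count 3' ends per (chromosome, strand, position)."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, strand, three_prime_end(start, end, strand))] += 1
    return counts


# Hypothetical aligned reads, for illustration only.
reads = [
    ("chr1", 100, 130, "+"),
    ("chr1", 105, 130, "+"),
    ("chr1", 200, 240, "-"),
]
density = rnap_density(reads)
```

Keeping the strand in the key is what preserves the strand-specific information of the signal; positions with high counts would correspond to a high RNAP density.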

Human NET-seq enables the study of the regulation of transcription directionality (sense and antisense), as NET-seq maps RNAP II in a DNA strand-specific manner.⁷² As NET-seq captures RNA as it is being produced, this method can detect unstable RNA species such as promoter upstream transcripts, upstream antisense RNA and enhancer-derived non-coding RNA.⁷² However, RNAP II in the preinitiation complex (PIC) or involved in transcription initiation cannot be mapped, since PIC formation occurs before nascent RNA production and, during transcription initiation, nascent RNA may be too short to be sequenced and aligned to the human reference genome.⁷² Moreover, the NET-seq signal can result not only from nascent RNA associated with elongating RNAP II but also from RNA processing intermediates.⁷²

Alternatively, by applying antibodies with different specificities for RNAP II, the mammalian NET-seq protocol enables genome-wide profiling of nascent RNA associated with specific RNAP II CTD phosphorylation states.⁷³ In the protocol described by Nojima et al., the chromatin is isolated and solubilized by micrococcal nuclease digestion. Thereafter, specific RNAP II complexes associated with nascent RNA are immunoprecipitated using antibodies with different affinities for RNAP II. The 3′ ends of nascent RNA associated with RNAP II are converted into a cDNA sequencing library that is sequenced and aligned to the human reference genome. Mammalian NET-seq allows the characterization of genome-wide transcript profiles based on CTD phosphorylation and coupled to RNA processing.⁷³ The mammalian NET-seq protocol would also make it possible to determine whether RNA processing factors associated with RNAP II are involved in RNAP II elongation and pausing.⁷³

Fig. 7 Key steps in permanganate footprinting. To detect single-stranded DNA regions that occur in the transcription bubble of transcriptionally engaged RNAP, potassium permanganate is applied to oxidize the thymine residues in single-stranded DNA. DNA is then extracted and purified. Oxidized thymine residues are converted to strand breaks by piperidine cleavage. Quantification of DNA breaks is then accomplished with ligation-mediated PCR (LM-PCR).

Fig. 8 Key steps in human NET-seq as described by Mayer et al.⁷² After cell lysis and nuclei isolation, transcripts associated with RNAP are separated based on the high stability of the DNA–RNA–RNAP ternary elongation complex. Following removal of the histone proteins, the chromatin-associated nascent RNA is purified, and the remaining DNA degraded. The 3′ ends of purified nascent RNA are converted into a cDNA sequencing library. High-throughput sequencing of the generated DNA library provides a quantitative measure of RNAP density with single-nucleotide resolution.

Comparative discussion of the methods for transcription analysis

Different methods can be applied to identify RNAP II and unravel RNA regulation mechanisms (Table 1). Eight major steps have been identified as rate limiting in the transcription cycle: (1) RNAP II access to the promoter; (2) PIC formation; (3) initiation; (4) promoter escape/clearance; (5) escape from pausing; (6) productive elongation; (7) termination; (8) recycling.⁶⁶

Methodologies that involve nascent RNA 3′ end sequencing do not provide information about the localization of RNAP II during transcription preinitiation or initiation, since PIC formation occurs before nascent RNA is produced. Moreover, during transcription initiation, nascent RNAs are too small to produce mappable sequencing reads, since read lengths of at least 18 nucleotides are needed to align to the human reference genome.⁷² Nevertheless, the sensitivity and specificity for detecting transcription initiation can be increased by sequencing the 5′ end cap in either the GRO-cap or PRO-cap approaches.^65,86 Additionally, BruUV-seq allows the genome-wide identification of transcription start sites in cultured cells.⁷⁸

Determination of the pattern of CTD changes that occur through the transcription cycle by ChIP is a valuable approach for the identification of the status and distribution of RNAP II. While RNAP II is unphosphorylated before transcription initiation, when incorporated into the PIC, RNAP II becomes phosphorylated at the Ser5 residues.⁹⁷ Ser5 phosphorylation (Ser5-P) levels remain high as RNAP II transcribes the first hundred nucleotides, but decay further downstream.⁶⁴ Progression to transcription elongation is marked by Ser2 phosphorylation (Ser2-P), and its levels rise from this point through the transcription process.^59,64,87,97 Thus, antibodies targeting RNAP II phosphorylation on residues Ser2 or Ser5 can be applied to discriminate between the elongation and initiation steps of transcription.⁹⁷ Notwithstanding, antibodies targeting other changes in the CTD can also be applied for transcription analysis.⁵⁹

Transition to productive elongation occurs with the formation of the transcription bubble and RNA extension. Permanganate footprinting allows the identification of single-stranded DNA arising from the transcription bubble by LM-PCR, with a 12-nucleotide resolution, approximately the size of a transcription bubble.⁷⁰ However, permanganate footprinting is a nonspecific method for the detection of the RNAP transcription bubble, as single-stranded DNA can arise from other sources.^65,84 Productive elongation, characterized by the lengthening of RNA transcripts associated with RNAP II, can be identified with NRO,^6,65 metabolic labeling^1,3 or NET-seq.⁷² While NRO assays reveal actively transcribed regions, the extensive manipulation involved limits the resolution, and the method depends on the efficient restart of transcription under non-physiological conditions.^69,94 Metabolic labeling evades the biases associated with nuclei isolation and transcription restart under non-physiological conditions. Nonetheless, the toxicity and the effective uptake of the uracil analogs can be of concern.^6,76,77 Human NET-seq avoids the requirement of restarting transcription under non-physiological conditions to label and purify nascent RNA. Additionally, it evades the potential biases from epitope masking and cross-reactivity associated with immunoprecipitation methods used for RNAP II purification.⁷² NET-seq is a powerful approach for the study of transcriptional pausing regulation, as prominent peaks of RNAP II density reveal pausing sites.⁷² While RNAP II recovering from transcriptional pausing cannot be detected by NRO, since the nascent RNA is misaligned with the polymerase active site, comparison of data obtained from NRO with those from NET-seq enables the identification of active, paused and pause-recovering RNAP II.^72,85 ChIP-exo and ChIP-nexus approaches still lack single-nucleotide resolution and DNA strand specificity, but they allow the location of RNAP II during transcription preinitiation and initiation to be determined, complementing data obtained from NET-seq and NRO.^72,91,92 ChIP gives more information regarding the transcriptional regulation of a gene than NRO;⁷¹ however, NRO (PRO-seq) provides higher sensitivity than ChIP, since a larger fraction of usable sequence reads is generated.⁶⁵ Nevertheless, as in ChIP bound DNA is analyzed rather than the nascent RNA, the transcriptional status of RNAP is not revealed.^84,94

Table 1 Transcription analysis methods (CTD – C-terminal domain, RNAP – RNA polymerase)

| Method | Analysis of RNAP activity | Strengths | Limitations |
| --- | --- | --- | --- |
| Nuclear run-on | Capture of labeled newly transcribed RNA from isolated nuclei | Highly robust for transcripts with high turnover rates | Bias associated with nuclei isolation; non-physiological restart of transcription |
| Metabolic labeling | Capture of labeled newly transcribed RNA from isolated nuclei | Simultaneous measurement of transcription, processing and decay | Toxicity and effective incorporation of uracil analogs |
| Permanganate footprinting | Detection of oxidized thymine residues from single-stranded DNA | Location of RNAP II with a 12-nucleotide resolution | Single-stranded DNA can also occur in other DNA–RNA hybrids and replication forks; based on DNA analysis |
| Chromatin immunoprecipitation | Immunoprecipitation of DNA associated with RNAP II | Determination of the stage of transcription by targeting different phosphomarks of the CTD | Interference with antibody binding; based on DNA analysis |
| Human native elongating transcript sequencing | Selection of RNA from the RNA–DNA–RNAP II ternary complex | Precise in vivo location of RNAP II complexes | 3′ end capture of RNA processing intermediates |

While most methods presented for RNAP II transcription analysis were adaptations of genome-wide measurements, the need for genome-wide analysis should be evaluated with regard to the purpose of the study and the number of genes and samples to be analyzed. In that respect, Roberts et al. provided a cost comparison between RNA-seq and RT-qPCR for the analysis of nascent transcription by NRO.⁶ The authors reported that the estimated cost of a GRO-seq analysis is $350 per sample, not including the cost of data analysis. In contrast, the approximate cost of RT-qPCR analysis is $10 per sample, with the possibility of being even lower in an optimized setting, allowing 145 samples and one reference gene to be run at the cost of one GRO-seq sample.⁶
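The arithmetic behind these figures can be checked with a short sketch. It uses only the values quoted above; how the 145 samples and the reference gene translate into individual reactions is not stated here, so the per-sample figure for the optimized setting is a direct division and should be read as an approximation.

```python
GRO_SEQ_COST = 350.0        # USD per sample, as quoted above
RT_QPCR_COST = 10.0         # USD per sample, as quoted above
OPTIMIZED_SAMPLE_COUNT = 145  # samples per GRO-seq sample cost, as quoted above

# Number of RT-qPCR samples affordable at the quoted price.
samples_at_quoted_price = GRO_SEQ_COST / RT_QPCR_COST

# Per-sample cost implied by the optimized setting (simple division).
implied_optimized_cost = GRO_SEQ_COST / OPTIMIZED_SAMPLE_COUNT

print(samples_at_quoted_price)           # 35.0
print(round(implied_optimized_cost, 2))  # 2.41
```

At $10 per sample, the cost of one GRO-seq sample covers 35 RT-qPCR samples; the figure of 145 samples therefore corresponds to the optimized setting, at roughly $2.41 per sample.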

RNA decay assessment

Regulation of RNA decay is a key factor in the control of gene expression, leading to either selective RNA stabilization or enhanced degradation towards an adaptation to new conditions.^75,98 Since RNA stability plays a key role in the regulation of transcript abundance, it is important to ascertain its contribution to transcriptome dynamics. Three strategies are usually employed for the assessment of RNA decay: (1) measurement of RNA decay rate after transcription inhibition; (2) in vivo pulse with uracil analogs and chase of labeled RNA; and (3) indirect determination of RNA decay rate after measurement of RNA steady-state levels and transcription rate.^4,98

Measurement of RNA decay rate after transcription inhibition

The principle of transcription blockage for assessing RNA decay lies in the fact that when transcription is inhibited, only degradation machineries act, and the RNA concentration is progressively reduced at a rate that directly depends on RNA half-life.⁴ The determination of RNA decay by this methodology is accomplished by measuring RNA concentration, with the methods applied for the quantification of RNA steady-state levels, over a time course after transcription blockage by chemical agents.^4,98,99 For transcription inhibition, chemical agents such as actinomycin D, 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB), α-amanitin, flavopiridol and triptolide can be applied [subject reviewed by Bensaude et al.⁹⁹].

Actinomycin D is a DNA intercalating agent causing a fast inhibition of transcription but with low selectivity; α-amanitin is highly selective for RNAP II but inhibition of transcription occurs at a slow rate; DRB and flavopiridol inhibit CDK9, resulting in a fast inhibition of transcription, but some genes escape transcription inhibition; and triptolide triggers rapid degradation of RNAP II, resulting in a fast and selective inhibition of transcription.⁹⁹ This strategy is an easy way to measure stability changes of endogenous RNAs, given that no construction and cellular transfection of exogenous genes is involved. The major disadvantage of using chemical agents to measure RNA decay is that they have a profound impact on cellular physiology and homeostasis, resulting in the induction of a stress response or cell death, with associated alterations in the expression of several genes.^75,98,99 In fact, transcription inhibition may enhance the stability of some RNAs, such as those of growth arrest and DNA damage (GADD) genes, which can result in data analysis biases.¹⁰⁰ Moreover, transcription blockage has been reported to cause p53 accumulation and to induce apoptosis.¹⁰¹ Although these disadvantages limit the usage of transcription inhibitors, this is still the most used methodology for measuring RNA decay rate, given its technical simplicity.
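The time-course principle described above can be illustrated with a minimal sketch. It assumes first-order decay after transcription blockage, C(t) = C0·exp(−k·t), which is a common simplification and not necessarily the model used in the cited studies; the function name and the synthetic data are hypothetical.

```python
import math


def fit_half_life(times, concentrations):
    """Least-squares fit of ln(C) = ln(C0) - k * t.

    Returns (k, half_life). Assumes first-order decay and strictly
    positive concentrations measured after transcription inhibition.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay constant is minus the slope of the log-linear fit
    return k, math.log(2) / k


# Synthetic, noise-free time course (hours) with a true half-life of 2 h.
times = [0, 1, 2, 4, 8]
conc = [100 * 0.5 ** (t / 2) for t in times]
k, t_half = fit_half_life(times, conc)
```

With real measurements, the quality of the fit depends on the number and spacing of the time points, and any inhibitor-induced stabilization would bias the estimate.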

Metabolic labeling–pulse with modified uracil analogs and chase of labeled RNA

Metabolic labeling can be applied for the determination of global transcript stability by assessing RNA decay rates using a pulse and chase strategy.¹⁰² In this strategy, the majority of cellular RNAs become labeled after exposure to uracil analogs. The mean age of newly transcribed RNA increases with the duration of the exposure to the uracil analogs.⁷ Thereafter, uracil analogs are removed and RNAs are chased over different time points, fractionated and quantified. Decay curves are then used to determine RNA half-life.⁴ Notwithstanding, this method can lead to the erroneous estimation of RNA degradation rates due to an incomplete chase caused by the internal recycling of uracil analogs from old labeled RNA. This pitfall may result in an overestimation of RNA half-life.^4,103,104

Assessment of RNA decay rate after measurement of steady-state levels and transcription rate

The combined measurement of RNA levels and transcription rate makes it possible to study the influence of the latter on total RNA amount and to determine RNA decay rates indirectly.⁹⁸ Any of the aforementioned techniques can be applied for the measurement of RNA transcription and steady-state levels. Decay rates are indirectly calculated after the determination of the transcription rate and RNA levels. This procedure assumes a steady-state situation for RNA levels, where the transcription rate and decay rate vary in a compensatory manner to maintain the RNA concentration constant.^7,98 However, under stress conditions, this is often not the case. To circumvent the limitations of assuming steady-state conditions for RNA decay determination, other mathematical formulas were developed to take into account non-steady-state conditions.^4,105 Nonetheless, mathematical corrections for non-steady-state conditions may amplify experimental errors,¹⁰⁵ and the impact of those errors must not be overlooked.
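The indirect calculation can be sketched under a simple first-order kinetic model, dC/dt = TR − k·C, where TR is the transcription rate, C the RNA level and k the decay constant. This model is an assumption made here for illustration; the corrections developed in the cited works may take a different form. The function names and numerical values are hypothetical.

```python
import math


def decay_rate_steady_state(transcription_rate, rna_level):
    """k = TR / C, valid only when dC/dt = 0 (steady state)."""
    return transcription_rate / rna_level


def decay_rate_non_steady_state(transcription_rate, rna_level, d_rna_dt):
    """k = (TR - dC/dt) / C, from dC/dt = TR - k * C."""
    return (transcription_rate - d_rna_dt) / rna_level


def half_life(k):
    """Half-life of a first-order decay process."""
    return math.log(2) / k


# Hypothetical values in arbitrary units: TR = 10, C = 50.
k_ss = decay_rate_steady_state(10.0, 50.0)
# Same measurements, but the RNA level is actually rising at 5 units per hour.
k_corrected = decay_rate_non_steady_state(10.0, 50.0, 5.0)
```

In this example the steady-state assumption gives k = 0.2, whereas accounting for the rising RNA level gives k = 0.1, so the half-life would be underestimated twofold if the accumulation were ignored. Because the corrected formula subtracts two measured quantities, measurement errors in either one propagate directly into k.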

Comparative discussion of the methods to assess RNA decay

The success of the experimental determination of RNA half-life is highly dependent on the number and choice of time points. The fewer and further apart the time points, the higher the odds of error in estimating the true half-lives, especially in the case of very short-lived or very long-lived RNAs.⁹⁸ Furthermore, analysis of certain genes has to be performed on synchronized cell populations, since cell cycle genes present different RNA half-lives depending on the stage of the cell cycle at which they are measured.⁹⁸

Comparison of RNA half-lives obtained by distinct techniques revealed that data sets obtained from different methods present poor or even no correlation.^5,106 The correlation between similar techniques is positive but not very strong, and is better before mathematical processing.^4,106 The metabolic labeling pulse and chase approach has been proposed as a promising strategy for resolving these discrepancies, since it allows the direct estimation of RNA degradation with minimal bias. Nonetheless, this approach may be challenging for highly induced genes.^1,105 Despite the poor overall statistical correlation, RNA stability ranks and the classification of RNA half-lives as short, medium, or long generally agree well among methods, allowing the classification of genes and the study of how cells use RNA stability to react to environmental signals.^1,4
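The distinction between agreement in absolute values and agreement in ranks can be illustrated with a short sketch that computes a Spearman rank correlation and assigns stability classes. The half-life values, the class cut-offs (2 h and 8 h) and all names are hypothetical and chosen only for illustration.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def stability_class(half_life_h, short=2.0, long=8.0):
    """Classify a half-life (hours) using hypothetical cut-offs."""
    if half_life_h < short:
        return "short"
    if half_life_h > long:
        return "long"
    return "medium"


# Hypothetical half-lives (hours) of the same five genes from two methods.
method_a = [0.5, 1.5, 4.0, 9.0, 20.0]
method_b = [1.0, 1.8, 3.0, 12.0, 15.0]
rho = spearman(method_a, method_b)
classes_a = [stability_class(h) for h in method_a]
classes_b = [stability_class(h) for h in method_b]
```

In this constructed example the absolute half-lives differ between the two methods, yet the gene ranking is identical and every gene falls into the same stability class.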

Integration of transcriptome analysis data

The transcriptome analysis workflow includes the experimental procedure and the subsequent analysis of the obtained data. Given the enormous amount of data generated by transcriptome analysis, data integration to extract meaningful information is essential to understand the mechanisms involved in the changes in gene expression in a system-level response. Different methods are available for this data integration, such as differential gene expression analysis, and more advanced methods that provide a deeper understanding of the biological context of the differentially expressed genes, such as pathway analysis, signature matching, biological networks and co-expression networks.^10,107 This integrative analysis allows a better understanding of the regulatory mechanisms associated with gene expression and of how transcription and/or degradation shape RNA levels. A brief outline of some of the methods available for the integration of transcriptome analysis data is provided here; for further information regarding the methods, algorithms and databases, more specific reviews on this matter are available.^10,11,107

Identification of differentially expressed transcripts can provide the number of genes involved in a particular response and the extent of their changes.^11,107 Differential gene expression is based on a statistically significant difference between conditions, where the number of copies of a transcript is either significantly increased or decreased relative to a control condition.^11,108 Differential gene expression analysis represents a good start for determining the transcriptional effect associated with diverse biological conditions and treatments. However, the identification of differentially expressed genes does not provide information on the specific biological response affected.¹¹ Moreover, in some circumstances, gene expression is not strongly affected, and measured transcriptional responses are too noisy to be informative.¹⁰⁹ Thus, additional analyses are required to help distinguish signal from noise and to integrate these data in order to understand the biological context of the differentially expressed genes.
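The two ingredients of a differential expression call, the size of the change and its statistical support, can be sketched as follows. The sketch computes a log2 fold change and a Welch t statistic for a single gene; it is not the procedure of any particular tool, it omits p-value calculation and multiple-testing correction, and the pseudocount, function names and expression values are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def welch_t(treated, control):
    """Welch t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical normalized expression values for one gene, three replicates each.
control = [6.0, 7.0, 8.0]
treated = [30.0, 31.0, 32.0]
lfc = log2_fold_change(treated, control)
t_stat = welch_t(treated, control)
```

Here the gene shows a log2 fold change of 2 (a fourfold increase after the pseudocount) with a large t statistic. In a genome-wide setting, the statistic would be computed per gene and corrected for multiple testing.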

In pathway analysis, interpretation of gene expression data is based on the individual function of genes and their role in cellular pathways, integrating the differentially expressed genes into biological functions.^10,107 Pathway analysis provides insight into the specific biological functions affected, rather than giving a long list of altered genes without any apparent connection.^11,107 In fact, even if a small expression change is not significant at the single-gene level, minor changes in several genes may be relevant in a pathway and have dramatic biological consequences.¹⁰⁷ While pathway analysis contextualizes the experimental findings with currently available biological insights, this also means that the method does not reveal novel mechanisms of action, as some prior understanding of the genes involved is required.¹⁰ Furthermore, this method tends to be biased towards the most studied genes or pathways, as they have more data entries.¹⁰ Because of these limitations, other methods are available to provide broader integration of gene expression data.
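One common way to integrate differentially expressed genes into pathways is an over-representation test, sketched below with a one-sided hypergeometric test. This is only one of several pathway analysis strategies and is not necessarily the one used in the cited works; the function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_de, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes measured in the experiment
    n_pathway:    background genes annotated to the pathway
    n_de:         differentially expressed genes
    n_overlap:    differentially expressed genes annotated to the pathway
    """
    upper = min(n_pathway, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 measured genes, 3 in the pathway, 3 differentially
# expressed, and all 3 of those belong to the pathway.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

A small p-value indicates that the pathway contains more differentially expressed genes than expected by chance. The bias noted above remains: well-annotated pathways are more likely to be detected.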

Gene expression signature matching compares a query set of gene expression changes against previously cataloged expression changes from a variety of biological conditions, retrieving the collection of signatures with the highest similarity to the query.¹¹⁰ In addition, compound signature matching approaches make it possible to predict potential toxicity by comparing compound-induced gene expression signatures with a pre-existing signature library.¹⁰ By using signature libraries, the similarity in the transcriptome responses is evaluated.¹¹ Notwithstanding, signature matching must be used with caution, since gene expression signatures may vary with differences in cell line response and in the concentrations and time points analyzed.¹¹¹
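The retrieval step can be illustrated with a minimal sketch that ranks library signatures by cosine similarity to a query signature over their shared genes. Cosine similarity is an assumption made here; published signature matching tools use a variety of scoring schemes. The gene names, compound names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_signatures(query, library):
    """Rank library signatures by similarity to the query.

    Signatures are dicts mapping gene name to log2 fold change; only genes
    present in both the query and the library signature are compared.
    """
    scores = []
    for name, signature in library.items():
        shared = sorted(set(query) & set(signature))
        if not shared:
            continue
        score = cosine_similarity(
            [query[g] for g in shared], [signature[g] for g in shared]
        )
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query and library, for illustration only.
query = {"g1": 2.0, "g2": -1.0, "g3": 0.5}
library = {
    "compound_A": {"g1": 1.8, "g2": -1.2, "g3": 0.4},
    "compound_B": {"g1": -2.0, "g2": 1.0, "g3": -0.5},
}
ranked = rank_signatures(query, library)
```

In this example compound_A ranks first, while compound_B, whose changes are exactly opposite to the query, receives a similarity of −1.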

Biological interaction networks, such as protein–protein interaction or signaling networks, can also be applied to elucidate the mechanisms of transcriptome regulation and provide a broader integration of data. Biological networks can be classified as directed or undirected. Directed biological networks are applied when the direction in which information flows from one node (protein, miRNA, gene) to another is known, and undirected networks are used when this information is unknown or has no clear meaning.¹⁰ Biological networks, especially directed signaling networks, make it possible to follow the cellular response after a treatment, starting from the induced changes in the transcriptome.¹⁰ Network biology tools make it possible to construct an outcome pathway from the gene expression profile and to establish a signaling network associated with a specific process.¹¹²
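Following information flow in a directed network can be sketched as a breadth-first traversal from a starting node. The edge list below is entirely hypothetical, and real network biology tools add edge weights, signs and curated annotations that are not modeled here.

```python
from collections import deque


def downstream_nodes(edges, source):
    """Return the nodes reachable from `source` in breadth-first order.

    `edges` is a list of (from_node, to_node) pairs of a directed network.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {source}
    queue = deque([source])
    reached = []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append(neighbor)
    return reached


# Hypothetical directed signaling network, for illustration only.
edges = [
    ("receptor", "kinase"),
    ("kinase", "TF"),
    ("TF", "geneA"),
    ("TF", "geneB"),
    ("other", "geneC"),
]
affected = downstream_nodes(edges, "receptor")
```

Because edge direction is respected, geneC is not reported as downstream of the receptor; in an undirected network this distinction would be lost.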

In a co-expression network, the entire transcriptome is used to determine gene function and mode of action, reducing large amounts of data to informative genes and to meaningful biological modules that are tightly associated with a specific biological process.^107,113 The co-expression network is represented as an undirected graph, where each node corresponds to a gene, and links are established between nodes with a significant co-expression relationship.¹⁰⁷ Co-expression network analysis methods rely on the assumption that highly correlated genes are biologically related. Nevertheless, the existence of correlation does not imply a causal relationship, and this must be considered when establishing modes of action.¹⁰
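The construction of such an undirected graph can be sketched by linking gene pairs whose expression profiles are strongly correlated across samples. The fixed correlation threshold used here stands in for a proper significance test and is an assumption, as are the gene names and expression values.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    `expression` maps each gene to its expression values across samples.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(expression)
```

Only geneA and geneB are linked in this example. The edge records that their profiles co-vary, not that one gene regulates the other.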

Conclusion

The study of the mechanisms of gene expression regulation is a fundamental issue. The analysis of the contribution of transcription and decay to RNA steady-state levels is crucial to understand the transcriptional and kinetic strategies used by different genes following perturbation of cell homeostasis. In the toxicology field, this study can help identify the early and mechanistically relevant cellular events associated with a particular response, which ultimately helps to identify mechanisms and biomarkers associated with cell responses for the prevention of adverse health effects. The selection of methods for transcriptome analysis mainly depends on the purpose of the study. Nevertheless, other factors such as the cost, sensitivity, gene coverage and the number of samples must also be considered. Additionally, it is important to recognize the strengths and limitations of each method before any selection is made, which was a specific aim of the present review. Finally, due to the enormous amount of data generated by transcriptome analysis, an integration of data to extract meaningful information is also very important to understand the mechanisms of changes in gene expression in a system-level response.

Conflicts of interest

There are no conflicts of interest to declare.

Acknowledgements

This work was supported by FEDER funds through the Operational Programme for Competitiveness Factors – COMPETE and by national funds by the FCT within the project PTDC-DTP-FTO-4973-2014 – POCI-01-0145-FEDER 016545.

VMC acknowledges Fundação da Ciência e Tecnologia (FCT) for her grant (SFRH/BPD/110001/2015).

References

1 M. Rabani, J. Z. Levin, L. Fan, X. Adiconis, R. Raychowdhury, M. Garber, A. Gnirke, C. Nusbaum, N. Hacohen, N. Friedman, I. Amit and A. Regev, Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells, Nat. Biotechnol., 2011, 29, 436–442.

2 L. J. Core, J. J. Waterfall, D. A. Gilchrist, D. C. Fargo, H. Kwak, K. Adelman and J. T. Lis, Defining the status of RNA polymerase at promoters, Cell Rep., 2012, 2, 1025–1035.

3 G. Fuchs, Y. Voichek, M. Rabani, S. Benjamin, S. Gilad, I. Amit and M. Oren, Simultaneous measurement of genome-wide transcription elongation speeds and rates of RNA polymerase II transition into active elongation with 4sUDRB-seq, Nat. Protoc., 2015, 10, 605–618.

4 J. E. Perez-Ortin, P. Alepuz, S. Chavez and M. Choder, Eukaryotic mRNA decay: methodologies, pathways, and links to other stages of gene expression, J. Mol. Biol., 2013, 425, 3750–3775.

5 M. Sun, B. Schwalb, D. Schulz, N. Pirkl, S. Etzold, L. Lariviere, K. C. Maier, M. Seizl, A. Tresch and P. Cramer, Comparative dynamic transcriptome analysis (cDTA) reveals mutual feedback between mRNA synthesis and degradation, Genome Res., 2012, 22, 1350–1359.

6 T. C. Roberts, J. R. Hart, M. U. Kaikkonen, M. S. Weinberg, P. K. Vogt and K. V. Morris, Quantification of nascent transcription by bromouridine immunocapture nuclear run-on RT-qPCR, Nat. Protoc., 2015, 10, 1198–1211.

7 L. Dolken, High resolution gene expression profiling of RNA synthesis, processing, and decay by metabolic label-