DNA methylation plays a central role in regulating many aspects of growth and development in mammals by regulating gene expression. The development of next generation sequencing technologies has paved the way for genome-wide, high resolution analysis of DNA methylation landscapes using methodology known as reduced representation bisulfite sequencing (RRBS). While RRBS has proven to be effective in understanding DNA methylation landscapes in humans, mice, and rats, to date, few studies have utilised this powerful method for investigating DNA methylation in agricultural animals. Here we describe the utilisation of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscles. RRBS analysis of ~1% of the genome from Longissimus dorsi muscles provided data of suitably high precision and accuracy for DNA methylation analysis, at all levels of resolution from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and other more commonly studied species. The RRBS data presented here highlight the complexity of epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in humans and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will contribute an additional layer of information to the genetic analyses currently being used to maximise production gains in these species.
The differential methylation we uncovered in endometriosis suggests that several different mechanisms are at work. Unlike in cancer cells, there was significant hypo- and hypermethylation in endometriotic cells, and these were distributed in a variety of patterns. The majority of differential methylation was observed intragenically and at sites distal to CGIs. While the gene-centric bias in the array may explain the over-representation of intragenic CpGs, the array was similarly biased in favor of island CpGs, suggesting that the increased incidence of differential methylation across shore and open sea regions is biologically relevant. Moreover, shore and open sea CpGs were much more likely to be negatively correlated with gene expression, particularly when they occurred near the TSS (such as the TSS200 and 1stExon groups). Recent work from several groups suggests that intragenic methylation, in particular near the first exon, is important for coordinating tissue-specific nucleosome positioning and gene expression [53,54]. Although the well-spaced coverage of the 450K beadchip makes it possible to detect many of these differences, larger data sets are needed to extrapolate unique methylation patterns that correlate with gene expression. We anticipate that our present data will continue to provide valuable insight into gene regulation in endometriosis as future studies decode the spatial and genomic context through which DNA methylation can affect transcription.
The genome can be considered a large file which is regulated in a meticulous way to minimize the damage caused by mutations. Most genes are present in identical amounts in all cells, i.e., one copy per haploid cell and two copies per diploid cell. Importantly, the level of gene expression, indicated by the number of mRNA copies, can vary widely, ranging from no expression to hundreds of copies. Additionally, the expression levels of the same gene may still vary and tend to reflect the cell response to microenvironmental stimuli (Berg et al. 2008). Many genes present in eukaryotic cells are considered housekeeping genes, since they are constitutively expressed at low levels in all cells, being essentially responsible for encoding metabolic enzymes or cellular components (Hartl 2014). The expression levels of the remaining genes differ according to the cell type or stage of the cell cycle, being regulated by the control of transcription (Hartl 2014). Different mechanisms are involved in the regulation of transcription, and several of them are certainly still unknown. Among these, mechanisms linked to epigenetics have been extremely relevant in this area.
Several lines of evidence suggest that cytokines are associated with animal and human aggression [78–80] and IL-6 was causally linked to aggression in mice by gene knockout evidence. Our analyses of males whose levels of aggression were followed for a 22 year period revealed an association between cytokine levels in plasma and chronic physical aggression: young adult men with a history of chronic physical aggression during childhood have lower baseline concentrations of two pro (IL-1α, IL-6 and IL-8) and two

Figure 4. DNA methylation differences between CPA (n = 8) and control (n = 12) groups in cytokine’s transcription factor STAT6 in T cells. An expanded view from the UCSC genome browser of the STAT6 locus (A), located on chromosome 12, is depicted (top panel). In both panels, the first track shows the average MeDIP probe log2-fold differences (scale top panel: −0.4 to 0.4, bottom panel: −1.0 to 1.0) between chronic physical aggressive (CPA) and control groups for T cells. In black are probes that are more methylated and in gray are those that are less methylated in the CPA group. Highlighted in blue are regions significantly differentially methylated between the groups. The second track shows Pearson’s correlation coefficient (scale top panel: −0.4 to 0.4, bottom panel: −0.6 to 0.6) calculated between MeDIP microarray probe intensities and the plasma IL-4 (first) and IL-10 (second) levels obtained from the same subject. These correlations did not reach significance after correcting for multiple testing in T cells. In red are probes whose methylation levels correlate positively with the cytokine level in plasma and in green are those that correlate negatively. In the top panel, the last track shows the average methylation level for all the subjects in T cells estimated from the microarray data (n = 20). The bottom panel shows the two regions close to the TSS of each STAT6 isoform where the CPA group is found significantly more methylated than the control group.
The regulatory elements from ENCODE identified in these regions (see methods) are shown in the additional tracks. First, shown with black lines, is the location of individual CpG sites. Second is the location of DNase hypersensitive clusters, where black indicates a strong signal and grey a weaker signal, from ChIP-seq data in 24 cell lines. Third is the location of transcription factors (TF) identified from ChIP-seq data in 24 cell lines, where black indicates strong occupancy and grey weaker occupancy. The letter next to the TF boxes identifies the cell line where it was found enriched (see Table S1 for the full legend). The last tracks show the level of enrichment of three histone marks determined by ChIP-seq assay (histone 3 lysine 4 tri- and mono-methylation as well as histone 3 lysine 27 acetylation) in two cell lines, GM12878 (pink) and K562. doi:10.1371/journal.pone.0071691.g004
chromosomes corresponded to CTAs. These genes are commonly expressed in testis and/or placenta, but infrequently expressed in non-germline normal tissues, and their expression is reactivated by 5-aza-dC. The second line of evidence was that a significant proportion of the candidate genes located in autosomal chromosomes contained a canonical 5′ CpG island (76%), whereas we estimated that only 53% of CCDS genes interrogated in the HU133 plus 2.0 microarray would contain one. In addition, the identification of DNA methylation in a high proportion of the genes analyzed in MCL cell lines (80%) strongly supports the effectiveness of our approach. Gene methylation events may occur during the establishment of cancer cell lines, but in some models there is a good relationship between the methylation events detected in cell lines and the methylation pattern of the primary tumours from which the cell lines were derived. We observed a good correlation between methylation events in cell lines and primary MCL since we found partial methylation in 80% of the genes analyzed in primary MCL. This association suggests that MCL cell lines are a good model to study epigenetic aspects of MCL. All these results confirm that our approach is robust and reliable, and suggest that we may be able to identify additional epigenetically regulated genes in the initial set of 252 methylated candidates. In addition, the pathway analysis showed that the top molecular and cellular functions overrepresented in our candidate gene list were cell death, cell cycle and cellular growth and proliferation. Recently, Leschenko et al. have reported the global methylation of a series of primary MCL using the HELP assay. The authors reported that gene hypomethylation was more frequent than hypermethylation in primary MCL. Furthermore, they identified a group of genes that were hypermethylated in primary MCL compared with naïve B-cells.
Our approach, based on the pharmacological reversion of DNA methylation, only allows us to identify hypermethylated genes, precluding the evaluation of hypomethylation in our series. When we compared our list of potentially hypermethylated genes with the genes described as hypermethylated in MCL by Leschenko et al., we observed that 28 genes from our study were also described as methylated in MCL by those authors.
the IGFBP3 and ZNT5 promoter regions was quantified (Table 1). CpG site 4 in ZNT5 was very highly methylated with little inter-individual variation (median methylation = 100%) and was excluded from further statistical analysis. CpG sites 5 and 6 in ZNT5 were dropped from further statistical analysis due to poor success rates (<80%) in both maternal and neonate samples. The remaining CpG sites (3 within IGF2, 5 within IGFBP3 and 2 within ZNT5) demonstrated high success rates in both maternal and neonate samples (Table 1) and were taken forward for analysis. In addition, there were modest to strong correlations (average rho: 0.64–0.82) between methylation levels at the CpG sites mapping to the IGF2 and the IGFBP3 loci in both mothers and infants (Table 1). Hence, mean methylation levels across these two loci were calculated and these means were also used in further analyses. Weaker correlations (rho<0.6) were found between methylation levels of the CpG sites measured within the ZNT5 locus; therefore, only methylation values for individual CpG sites were assessed further. Measurement of global DNA methylation was less successful in the maternal samples (74%) than in the neonate samples (90%). We nevertheless chose to retain this measure for analysis, but acknowledge that any findings must be viewed with caution.
The p-values and fold-changes (relative expression of Duroc to Pietrain allele of origin) for significant eQTL were input into the Ingenuity Pathways Analysis software (Ingenuity Systems, Redwood City, CA, USA) for further data mining of pathways subject to genetic control. Three gene networks were enriched for differentially expressed genes between alleles of alternative breed origin (Table 3). This suggests that the corresponding eQTL genes influence loin muscle tissue accretion via common metabolic pathways. One network associated with lipid metabolism includes two members of the cytochrome P450 4F family of genes which are involved in the metabolism of long chain fatty acids, and these two genes were overexpressed in animals carrying the Duroc allele. This network also contains several genes including aldo-keto reductase 7A2 (AKR7A2), thioredoxin domain containing 12 (TXNDC12) and translocase of inner mitochondrial membrane 44 (TIMM44) that have functions related to oxidative stress and which were overexpressed in animals carrying the Pietrain allele. A second network associated with the cell cycle and lipid metabolism includes three members of the glutathione S-transferase mu family as well as
Although it was unexpected to find that FRT-like sequences can originate from more than one location in LINE1 (Figure 6A), on average, FRT-like sequences are rarer in repeated genomic DNA than in unique DNA: repetitive DNA constitutes about 50% of the human genome but only about 10% of all FRT-like sequences are part of repeats. This observation is not surprising taking into account that the majority of repeated DNA elements are relatively short. Indeed, only infrequent full-length or near full-length units of LINEs or LTR transposons are long enough to have a high probability of occurrence of FRT-like sequences (Figure 5), while the majority of copies of LINEs are shorter than 1 kb and SINEs are only 100–400 bp long. Moreover, since different members of a family of repeated DNA elements are usually not identical due to random mutations, one repeated element might have an FRT-like sequence but another might not. Multiple identical copies of duplicated FRT-like sequences can, in theory, present a challenge for targeting unique FRT-like sequences of interest if the duplicated sequences have a high degree of homology to a chosen unique FRT-like site. In this scenario, a Flp variant evolved to recombine the unique FRT-like site could, to a certain degree, recombine the repeated FRT-like sequences. To prevent this and therefore minimize the off-target effects of the recombination system, the repeated FRT-like sequence should be used as one of the counter-selection targets during evolution of a Flp variant.
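The depletion of FRT-like sequences in repeats can be made concrete with a short calculation. The sketch below is illustrative only: it takes the two approximate figures quoted above (repeats make up about 50% of the genome and hold about 10% of FRT-like sequences) and computes the density of FRT-like sequences per unit of repetitive DNA relative to unique DNA. The function name `relative_density` is hypothetical.

```python
def relative_density(frac_sites_in_repeats: float, frac_genome_in_repeats: float) -> float:
    """Density of sites in repetitive DNA divided by density in unique DNA.

    Both arguments are fractions between 0 and 1 (exclusive).
    """
    # Sites per unit of repetitive genome.
    density_repeats = frac_sites_in_repeats / frac_genome_in_repeats
    # Sites per unit of unique (non-repetitive) genome.
    density_unique = (1.0 - frac_sites_in_repeats) / (1.0 - frac_genome_in_repeats)
    return density_repeats / density_unique


# Approximate figures from the text: 10% of FRT-like sites, 50% of the genome.
ratio = relative_density(0.10, 0.50)
print(f"FRT-like density in repeats relative to unique DNA: {ratio:.2f}")
```

With these rounded inputs the ratio is 1/9, i.e. FRT-like sequences are roughly nine times sparser in repetitive DNA than in unique DNA.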
This report will attempt to meet these challenges, at least in part. To achieve this, we first should define a good measure of gene-gene interaction. Despite current enthusiasm for investigation of gene-gene interactions, published results that document these interactions in humans are limited and the essential issue of how to define and detect gene-gene interactions remains unresolved. Over the last three decades, epidemiologists have debated intensely about how to define and measure interaction in epidemiologic studies [7,8,11–15]. The concept of gene-gene interactions is often used, but rarely specified with precision. In general, statistical gene-gene interaction is defined as departure from additive or multiplicative joint effects of the genetic risk factors. It is increasingly recognized that statistical interactions are scale dependent. In other words, how to define the effects of a risk factor and how to measure departure from the independence of effects will greatly affect assessment of gene-gene interaction. The most popular scale upon which risk factors are measured in case-control studies is the odds-ratio. The traditional odds-ratio is defined in terms of genotypes at two loci. Similar to two-locus association analysis where only genotype information at two loci is used, an odds-ratio defined by genotypes for testing interaction will not employ allelic association information. However, it is known that interaction between two loci will generate allelic associations in some circumstances. Since they do not use allelic association information between two loci, statistical methods based on the genotype-defined odds-ratio will have less power to detect interaction.
To overcome this limitation, we will define the odds-ratio in terms of a pseudohaplotype (which is defined as two alleles located on the same paternal or maternal chromosomes) for measuring interaction, and then we will investigate its properties and develop a statistic based on the pseudohaplotype-defined odds-ratio for testing interaction between two loci (either linked or unlinked).
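For orientation, the traditional genotype-based measure described above (departure from multiplicative joint effects on the odds-ratio scale) can be sketched as follows. This is not the pseudohaplotype statistic proposed in this report. It assumes a simplified binary coding at each locus (carrier = 1, non-carrier = 0), and the counts, function names and table layout are hypothetical.

```python
from typing import Dict, Tuple

# counts[(g1, g2)] = (number of cases, number of controls) for carrier status
# g1 at locus 1 and g2 at locus 2 (assumed binary coding: 0 or 1).
Counts = Dict[Tuple[int, int], Tuple[int, int]]


def odds_ratio(counts: Counts, genotype: Tuple[int, int]) -> float:
    """Odds ratio of a genotype class relative to the (0, 0) reference class."""
    cases, controls = counts[genotype]
    ref_cases, ref_controls = counts[(0, 0)]
    return (cases / controls) / (ref_cases / ref_controls)


def multiplicative_interaction(counts: Counts) -> float:
    """OR11 / (OR10 * OR01); equals 1 when joint effects are multiplicative."""
    or10 = odds_ratio(counts, (1, 0))
    or01 = odds_ratio(counts, (0, 1))
    or11 = odds_ratio(counts, (1, 1))
    return or11 / (or10 * or01)


# Hypothetical table with purely multiplicative effects (OR10 = 2, OR01 = 3, OR11 = 6).
no_interaction: Counts = {
    (0, 0): (100, 200),
    (1, 0): (60, 60),
    (0, 1): (90, 60),
    (1, 1): (120, 40),
}
# Same table, but the double carriers have twice the expected odds.
with_interaction: Counts = dict(no_interaction)
with_interaction[(1, 1)] = (240, 40)

print(multiplicative_interaction(no_interaction))
print(multiplicative_interaction(with_interaction))
```

A value near 1 indicates no departure from multiplicative joint effects; testing the same tables on an additive scale could give a different answer, which is the scale dependence noted above.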
Apc Min/+ adenomas displayed few changes in DNA methylation when compared with intestinal stem cells (2.7% of the canonical AIMS-Seq amplicons), and a large subset of them were shared with differentiated cells (in the case of hypermethylations), and the non-tumor tissue adjacent to the adenoma (in the case of hypomethylations). It has been previously reported that the dynamics of hypermethylation and hypomethylation are independent in human colorectal cancer [51,64,65]. In this context, we can postulate that hypomethylation would be an early phenomenon preceding the morphological emergence of the adenoma and consistent with a field effect in which the adjacent tissue already displays a large fraction of the molecular changes borne by the tumor. Moreover, hypermethylation in adenomas would recapitulate in part the ISC differentiation process and incorporate tumor-specific changes. The conservation of DNA methylation profiles may explain the retention of cell differentiation capacity in Apc Min/+ microadenomas. In fact, most Apc Min/+ adenomas maintain the crypt architecture and do not progress to carcinoma. The subsequent cell transformation would require additional changes consistent with the stepwise model of cancer progression in which genomic and epigenomic instability drive subsequent alterations.
It is not clear at present whether there is some sort of signal from an inactive gene that would result in silencing through DNA methylation. One attractive possibility is that chromatin structure may be informative to the methylation machinery. Lysine acetylation at the histone tails by histone acetyl transferases facilitates the access of transcription factors to the gene. Histone deacetylases (HDAC) reverse the process, reducing the transcription rate of the gene (27). It is worth mentioning at this point that mutations in the HDAC genes may result in cancer (1). This implies that the covalent modifications of the core histones are intimately associated with the transcriptional activity and could be read by the methylation machinery. Studies in Neurospora, Drosophila and other organisms have indicated a clear association between histone methylation and DNA methylation, as particularly shown by the demonstration that an inactivating mutation in the gene of a histone methyltransferase [with activity towards Lys9 of histone H3 (H3K9)] abolished genome methylation. In mammals and in yeast, Lys9 methylation in histone H3 is associated with transcriptionally repressed heterochromatin. If DNA methylation in mammals is proven
The AP2/ERF transcription factor family, one of the largest families unique to plants, performs a significant role in the regulation of growth and development, and in responses to biotic and abiotic stresses. Moso bamboo (Phyllostachys edulis) is a fast-growing non-timber forest species with the highest ecological, economic and social values of all bamboos in Asia. The draft genome of moso bamboo and the available genomes of other plants provide great opportunities to research global information on the AP2/ERF family in moso bamboo. In total, 116 AP2/ERF transcription factors were identified in moso bamboo. The phylogeny analyses indicated that the 116 AP2/ERF genes could be divided into three subfamilies: AP2, RAV and ERF; and the ERF subfamily genes were divided into 11 groups. The gene structures, exons/introns and conserved motifs of the PeAP2/ERF genes were analyzed. Analysis of the evolutionary patterns and divergence showed the PeAP2/ERF genes underwent a large-scale event around 15 million years ago (MYA) and the division time of AP2/ERF family genes between rice and moso bamboo was 15–23 MYA. We surveyed the putative promoter regions of the PeDREBs and showed that largely stress-related cis-elements existed in these genes. Further analysis of expression patterns of PeDREBs revealed that most were strongly induced by drought, low-temperature and/or high salinity stresses in roots and, in contrast, most PeDREB genes had negative functions in leaves under the same respective stresses. In this study there were two main interesting points: there were fewer members of the PeDREB subfamily in moso bamboo than in other plants and there were differences in DREB gene expression profiles between leaves and roots triggered in response to abiotic stress. The information produced from this study may be valuable in overcoming challenges in cultivating moso bamboo.
monooxygenases, well known for their roles in metabolism of fatty acids, steroids, and other lipophilic molecules (Denisov et al., 2005). The M. tuberculosis genome sequence revealed an unexpectedly high number of CYP450s (Cole et al., 2001). Among these, the second largest of the M. tuberculosis CYP450s is CYP128 (53,313 Da) encoded by cyp128 that is predicted to metabolize menaquinone as a step towards its sulfation (Holsclaw et al., 2008). The creation of genome-wide transposon libraries enabled the classification of CYP128 as a gene required for optimal growth of M. tuberculosis, and as upregulated in cell starvation (McLean et al., 2007). CYP124 encoded by cyp124 is found in pathogenic and nonpathogenic mycobacteria species, actinomycetes, and some proteobacteria, which suggests that it has an important catalytic activity (Ouellet et al., 2010). It is located adjacent to a three-gene operon containing a sulfotransferase (Sft3, Rv2267c) that catalyzes the PAPS-dependent sulfation at the ω-position of menaquinone MK-9 DH-2 (Holsclaw et al., 2008; Mougous et al., 2006). The biochemical characterization of CYP124 includes identifying a series of substrates consistent with ω-hydroxylase activity and, importantly, a marked preference for lipids containing methyl branching (Johnston et al., 2009). To date, gene disruption and gene deletion studies have shown that M. tuberculosis cyp128 is an essential gene for cell growth and viability (McLean et al., 2008). Cyp138 is induced at elevated temperatures (Stewart et al., 2002). Some studies have reinforced the view that M. tuberculosis P450s play important cellular roles and are most important in the pathogen’s response to environmental stimuli and immune/chemical abuse (McLean et al., 2007). The upregulation of the M. tuberculosis cytochrome P450 enzyme genes may be an adaptive response that allows survival under environmental change. The trigger for the induced transcrip-
Currently available methods to assess DNA damage include electrophoretic techniques such as pulsed-field gel electrophoresis (PFGE) or single-cell electrophoresis (Comet assay) for a global assessment of DNA fragmentation. Ligation-mediated polymerase chain reaction (LM-PCR) is also commonly used for quantitatively displaying DNA lesions in mammalian cells because it combines nucleotide-level resolution with the sensitivity of PCR but is limited by the use of sequence-specific primers. All of the aforementioned approaches suffer from one important limitation: they do not allow mapping of DNA strand breaks on a genome-wide scale and cannot identify new sensitive sites or hotspots harboring such break sites. Therefore, a reproducible method for the genome-wide mapping of DNA strand breaks would be useful to study their global distribution all at once and monitor any alteration in damage profile under different experimental conditions. Here, we provide a detailed description of a straightforward strategy, termed “damaged DNA immunoprecipitation” or “dDIP”. This method uses the immunoprecipitation of biotin-modified nucleotides added by the terminal deoxynucleotidyl transferase (TdT) at sites of DNA damage (see Figure 1). Although a similar approach has been used recently to map nuclear receptor-dependent tumor translocations, we describe for the first time its genome-wide application resulting from the development and optimization of this method by our group over the past three years. Because of its potential widespread use in genome research, we provide the important experimental details and key findings for the reliable capture and enrichment of damaged DNA sequences in the form of strand breaks.
All data were expressed as mean ± standard deviation. Statistical analysis was performed using Student’s t-test to compare two variables of microarray data. For example, the statistical significance of a microarray result was analyzed by fold change, and a difference with P < 0.01 was considered statistically significant. The false discovery rate was also calculated to correct the P value. The threshold value we used to screen differentially expressed lncRNAs and mRNAs was a fold change ≥ 2.0 (P < 0.01). Furthermore, the differential expression of each lncRNA between hepatoblastoma and the paired distant noncancerous tissues was analyzed using Student’s t-test with SPSS (Version 16.0 SPSS, Chicago, IL, USA). P < 0.05 was considered significant.
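The screening rule described above can be sketched in a few lines. The text does not say which false discovery rate procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, as are the function names, the example values and the choice to apply the P < 0.01 threshold to the unadjusted P value.

```python
from typing import List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def screen(records: List[Tuple[str, float, float]],
           min_fold_change: float = 2.0,
           max_p: float = 0.01) -> List[str]:
    """Names of transcripts with fold change >= min_fold_change and P < max_p.

    Each record is (name, fold_change, p_value); fold_change is assumed to be
    expressed as a ratio >= 1 regardless of direction.
    """
    return [name for name, fc, p in records if fc >= min_fold_change and p < max_p]


# Hypothetical example values, not data from the study.
records = [("lnc-A", 3.1, 0.005), ("lnc-B", 1.4, 0.001),
           ("lnc-C", 2.5, 0.030), ("lnc-D", 2.0, 0.009)]
print(screen(records))
print(benjamini_hochberg([p for _, _, p in records]))
```

Keeping the fold-change filter and the multiple-testing correction as separate functions makes it easy to swap in a different correction if the original analysis used one.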
This experiment serves as a case study of the effects of environmental heterogeneity on genome-wide variation in a simplified setting, demonstrating distinct differences between environmental heterogeneity in space versus time and between sites likely to be closely linked or not to sites under differential ecological selection (Figure 5). Yet this experiment is only a single test of environmental heterogeneity at one time point; different environments, different spatial or temporal scales of heterogeneity, or different organisms could yield different patterns because the results will depend on the genetic basis of adaptation and the nature of selection. A recent study in yeast suggests that antagonistic effects may be common. But other field studies in plants found that patterns of conditional neutrality are more common [45,46]. A recent study in Brassicaceae quantified the proportion of conditionally neutral and EAS QTLs across the genome and found that conditional neutrality is more common than EAS (8% vs. 3% of the genome). The ultimate challenge remains to determine how environmental variation affects patterns of diversity and quantitative genetic variation in nature in different organisms. Because the constraints of real systems make it difficult to cleanly test these effects in nature, experimental evolution provides a helpful step towards testing the key principles.
The LCR (nt 7008–124) of the HPV16 reference clone (sequence identical to that published in GenBank under the accession number NC_001526) was PCR-amplified with Pfu polymerase (Stratagene) and specific primers containing restriction enzyme recognition sequences for HindIII (5′-ATCAAGCTTGACCTAGATCAGTTTCCTTTAGGAC-3′) and BamHI (5′-ATCGGATCCTCCTGTGGGTCCTGAAACATTGCAG-3′). Restriction-digested PCR products were cloned into the pGLuc-Basic Vector (New England Biolabs), giving the pGLuc-LCR16 reporter vector construct. The constructs were verified by DNA sequencing. Site-specific modifications within the HPV16 LCR of the pGLuc-LCR16 reporter vector construct at the 7450-acCGaattCGgt-7461 site were generated using modified oligonucleotides purchased from Thermo Scientific. Modified oligonucleotides contained either mutated (5′-GCTTCAACTGAATTTGGTTGCATGC-3′), methylated (5′-GCTTCAAC M CGAATT M CGGTTGCATGC-3′) or unmethylated (5′-GCTTCAACCGAATTCGGTTGCATGC-3′) CpG dinucleotides at the site of interest and spanned 25 bp on either side of the site. PCR was carried out according to the protocol in the QuikChange site-directed mutagenesis kit (Stratagene) using 1 ng/reaction of the LCR HPV16 luciferase reporter as template. Following PCR, the reaction was treated with DpnI for one hour to ensure that the plasmid used as template was digested. Products were purified using the PCR purification kit (Qiagen). Concentrations of the product were determined by ethidium bromide gel electrophoresis quantification. These products were then used directly for transfection using the Fugene HD method (Roche Applied Science).
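Primer and oligonucleotide sequences like those above are easy to mistranscribe, so a quick sanity check can be useful. The sketch below transcribes the sequences from the text with spacing removed, confirms that each cloning primer contains the recognition site of its enzyme (HindIII, AAGCTT; BamHI, GGATCC) and counts CpG dinucleotides in the unmethylated and mutated oligonucleotides. The helper names are hypothetical and the check is not part of the published protocol.

```python
RECOGNITION_SITES = {"HindIII": "AAGCTT", "BamHI": "GGATCC"}

HINDIII_PRIMER = "ATCAAGCTTGACCTAGATCAGTTTCCTTTAGGAC"
BAMHI_PRIMER = "ATCGGATCCTCCTGTGGGTCCTGAAACATTGCAG"
UNMETHYLATED_OLIGO = "GCTTCAACCGAATTCGGTTGCATGC"
MUTATED_OLIGO = "GCTTCAACTGAATTTGGTTGCATGC"


def clean(seq: str) -> str:
    """Upper-case a sequence and drop anything that is not A, C, G or T."""
    return "".join(base for base in seq.upper() if base in "ACGT")


def has_site(primer: str, enzyme: str) -> bool:
    """True if the primer contains the recognition sequence of the enzyme."""
    return RECOGNITION_SITES[enzyme] in clean(primer)


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return clean(seq).count("CG")


for name, primer in (("HindIII", HINDIII_PRIMER), ("BamHI", BAMHI_PRIMER)):
    print(name, "site present:", has_site(primer, name))
print("CpGs, unmethylated oligo:", cpg_count(UNMETHYLATED_OLIGO))
print("CpGs, mutated oligo:", cpg_count(MUTATED_OLIGO))
```

The unmethylated oligonucleotide carries two CpGs and the mutated one none, consistent with both CG sites being changed to TG.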
Quantitative analysis of different NCAM transcripts revealed that deletion of the L1 5′ UTR (or SP) increased exon 9 skipping about 5-fold (Figure 6C, cf. lanes 3, 5 and 6), consistent with the RT-PCR experiments (Figure 6B). Deletion of the L1 5′ UTR resulted in a small change in the ratio between intron-containing and Fl transcripts, i.e. there were slightly more (about 2-fold) intron-containing transcripts per Fl transcript in the Fl construct experiment than in the ΔL1 experiment (cf. lanes 9 and 11). Similarly to ABCA (Figure 5A), potential activation of the L1 SP (or exonization of this region) by readthrough transcription initiated from the SV40 promoter was observed (Figure 6C, cf. lanes 16 and 17). However, in this case no activity of the L1 SP was detected when the SV40 promoter was deleted. The L1 SP also inhibited SV40 transcription, because its deletion increased SV40 promoter activity about 2-fold for Fl transcripts and about 8-fold for alternatively spliced transcripts (Figure 6C, cf. lanes 3 and 6). Properly spliced Fl and intron-containing transcripts were found mostly in the cytoplasmic fraction (Figure 6D). Our bioinformatic analysis suggested that cryptic polyadenylation in intron 9 may be due to the presence of L1. Indeed, a small increase in polyA-containing transcripts was detected for the Fl transfected construct (compared to ΔL1) in the RT-PCR experiment (data not shown). However, due to the very low abundance of these transcripts, we were unable to quantitate them by RPA. In conclusion, our analysis revealed that the L1 5′ UTR could strongly interfere with alternative splicing and, to a lesser extent, affect intron retention in NCAM.
Total RNA was isolated from porcine eye cells using the TRNzol reagent (TIANGEN, Beijing, China) according to the instructions of the manufacturer. The RNA was treated with DNase I (Fermentas) and reverse transcribed to cDNA using the BioRT cDNA first strand synthesis kit (Bioer Technology, Hangzhou, China). Quantitative real-time PCR was performed to determine the expression of H19 and IGF2. The primer sequences are listed in Table 1. Quantitative gene expression analysis was carried out according to the manufacturer’s instructions using the BIO-RAD iQ5 Multicolor Real-Time PCR Detection System with the BioEasy SYBR Green I Real Time PCR Kit (Bioer Technology, Hangzhou, China). The thermal cycling conditions were 95 °C for 3 min, followed by 40 cycles of denaturation at 95 °C for 10 s, annealing at 55 °C for 15 s, and extension at 72 °C for 30 s. The 2^(−ΔΔCT) formula was used to determine relative gene expression, which was normalized to the quantity of GAPDH mRNA. All experiments were repeated three times for each gene. Data are expressed as means ± S.E.M.
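As an illustration of the relative quantification step, the following sketch applies the 2^(−ΔΔCT) calculation to a target gene normalized to a reference gene such as GAPDH. The threshold cycle (CT) values are hypothetical, the function name is an assumption, and the formula presumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of the target gene in a sample relative to a control.

    Uses 2^(-ddCT), where dCT = CT(target) - CT(reference) within each
    condition and ddCT = dCT(sample) - dCT(control).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: target vs. GAPDH in a sample and in a control.
fold = relative_expression(ct_target_sample=24.0, ct_reference_sample=18.0,
                           ct_target_control=26.0, ct_reference_control=18.0)
print(f"Relative expression: {fold:.1f}-fold")
```

Here the target crosses threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a 4-fold higher relative expression.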
Genome-wide analysis revealed the existence of 78 full-length Dof genes, and multiple sequence alignment of the GmDof proteins showed strong conservation of four cysteine residues and the other amino-acid residues in the Dof domains. Phylogenetic analysis revealed that all GmDofs were clustered into nine distinct subgroups. The exon/intron structure and motif composition of the Dofs were highly conserved in each subfamily, indicating their functional conservation. The Dof genes were non-randomly distributed within and across 19 chromosomes, and a high proportion of GmDofs were preferentially retained duplicates located on duplicated blocks. Soybean-specific segmental duplications of the genome contributed significantly to the expansion of the soybean Dof gene family. The comparative phylogenetic analysis of soybean Dof proteins with Arabidopsis and rice Dof proteins revealed four Major Clusters of Orthologous Groups and nine well-supported clades. The global expression profile analysis provided insight into the soybean-specific functional divergence among members of the Dof gene family. A majority of GmDofs showed specific temporal and spatial expression patterns, based on RNA-seq data analyses. The expression patterns of duplicate genes were partially redundant or divergent. The cis-regulatory element analysis of the predicted Dof genes revealed differences in common cis-elements across these promoter regions including both their number and distance from the start codon. The results presented here provide information useful for the functional characterization of soybean gene families by combining phylogenetic analysis with global gene expression profiling.