
CONTENT ARE RELATED TO THE BACTERIAL GENOME EVOLUTION

In the document: OF GENOME REGULATION (pages 159-162)

Suslov V.V.1*, Safronova N.S.1, 2, Orlov Y.L.1, 2, Afonnikov D.A.1, 2

1Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia;

2Novosibirsk State University, Novosibirsk, Russia

e-mail: valya@bionet.nsc.ru

*Corresponding author

Key words: evolution, prokaryotic genomics, GC content, non-specific adaptation

Motivation and Aim: GC content, genome size, and other genome characteristics are integral genome features that limit the potential spectrum of licenses (habitats) available to prokaryotic organisms. Other specific and non-specific adaptations are also important for evolution. The best-studied non-specific adaptations are inducible, such as the stress response.

Traditionally, GC content is treated as a non-inducible, non-specific adaptation that maintains genome stability. On average, the more agents in a habitat that damage DNA or codon-anticodon interactions, the higher the GC content of a genome. In extreme environments this trend is lost, because other specific molecular mechanisms maintain genome stability and gene expression. However, Wu et al. [1], studying eubacteria, propose that GC content is a semi-neutral trait related to features of the replication/repair machinery in different prokaryotic taxa (dimeric combinations of DNA polymerase III alpha subunits: the dnaE1/dnaE2/dnaE3 groups).

Methods and Algorithms: We downloaded 1214 annotated and 1586 assembled complete genome sequences of prokaryotic organisms from the NCBI FTP site [2]. We computed the correlation between GC content and genome size for all genomes and for taxonomic groups: 3 archaeal taxa plus a joint archaeal group, and 19 bacterial taxa plus 1 joint eubacterial group.
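The correlation step described above can be sketched as follows. This is a minimal illustration, assuming Pearson's r (the abstract does not say which coefficient was used) and a hypothetical record layout of (taxon, genome size, GC fraction). The function names are illustrative and are not taken from the original work.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    return gc / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_by_group(records):
    """Compute r(genome size, GC) per group.

    records: iterable of (group, genome_size, gc_fraction) tuples
    (a hypothetical layout chosen for this sketch).
    """
    groups = {}
    for group, size, gc in records:
        groups.setdefault(group, []).append((size, gc))
    return {
        group: pearson_r([s for s, _ in rows], [g for _, g in rows])
        for group, rows in groups.items()
        if len(rows) >= 2
    }
```

Passing all genomes under a single group label gives the pooled coefficient, while passing taxon labels gives the per-taxon values, which mirrors the pooled and per-taxon comparison described in the abstract.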

Results: The correlation between genome size and GC content is positive for Archaea (104 species, r = 0.35), Eubacteria (1478 species, r = 0.59), and all Prokaryota (r = 0.46). The same trend persists when the genome sample is split by taxa, for all Archaea and for most Eubacteria, excluding Deinococcus-Thermus, Fusobacteria, and Planctomycetes. The trend for Eubacteria agrees with the Pol III classification in [1], with 3 exceptions. We argue that in Deinococcus-Thermus and Planctomycetes a specific GC content is maintained adaptively, in opposition to the replication/repair features. Comparing genome size, GC content, and preferred habitats, we show that genome size is larger in eurybiont species, whereas GC content is higher in complex ecosystems (such as soil), because extremophiles and highly specialized prokaryotes do not maintain high GC content.

In particular, only the ecological grouping "prokaryotes of soil ecosystems" is characterized, on average, by both high GC content and large genome size.

References:

1. H. Wu et al. (2012) On the molecular mechanism of GC content variation among eubacterial genomes, Biol Direct, 7: 2.

2. V.V. Suslov, D.A. Afonnikov, N.L. Podkolodny, Yu.L. Orlov (2013) Genome features and GC content in prokaryotic genomes in connection with environmental evolution, Paleontological Journal, 47(9): 1056-1060.

SSRFace: AN IDENTIFICATION AND SEARCH TOOL FOR GENOMIC AND TRANSCRIPTOMIC SSR

Tan H., Hou X.*

State Key Laboratory of Crop Genetics and Germplasm Enhancement, Key Laboratory of Biology and Germplasm Enhancement of Horticultural Crops in East China, College of Horticulture, Nanjing Agricultural University, Nanjing, P.R. China

e-mail: tanhuawei1991@163.com

*Corresponding author

Key words: SSR, identification, genomic, transcriptomic

Motivation and Aim: Simple sequence repeats (SSRs) have proven to be highly polymorphic, easily reproducible, co-dominant markers. The advent of the genomics age has produced increasingly massive amounts of genomic and transcriptomic DNA sequence data, while the existing SSR development tools remain scattered and inconvenient.

Results: We developed an identification tool for genomic and transcriptomic SSRs. After a user submits DNA sequences, the combined pipeline searches for SSRs and designs the relevant primers, especially for transcriptomic analysis. In addition, an SSR search interface covering over 20 plants was built in advance so that SSRs can be searched rapidly. For example, we identified 25682 SSRs in the ~94 Mb Arabidopsis thaliana genomic sequence and 3302 SSRs in its 35386 gene CDS sequences. We also identified 145842 and 4568 SSRs in the ~284 Mb Brassica rapa genomic sequence and its 41019 gene CDS sequences, respectively.

The SSR unit sizes in B. rapa CDS were extremely unevenly distributed: 177 (3.87%) were mononucleotide, 195 (4.27%) dinucleotide, 4155 (90.96%) tri-nucleotide, 3 tetra-, 1 penta-, and 37 hexa-nucleotide SSRs. In contrast, the B. rapa genomic sequence contained 100591 (68.97%) mononucleotide, 33887 (23.24%) dinucleotide, 10051 (6.89%) tri-nucleotide, 962 tetra-, 190 penta-, and 161 hexa-nucleotide SSRs. A similar distribution also exists in other plants.
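SSR identification and the unit-size tally reported above can be illustrated with a short regular-expression scan. This is only a sketch under stated assumptions: the minimum repeat thresholds in `MIN_REPEATS` are hypothetical values chosen for the example and are not SSRFace's settings, and the function names are illustrative rather than part of the tool.

```python
import re
from collections import Counter

# Hypothetical thresholds (assumption): minimum repeat count per unit length.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, unit, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, reps in sorted(min_repeats.items()):
        # One captured unit followed by at least (reps - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA", "ATAT"),
            # so a mononucleotide run is not also reported as a dinucleotide SSR.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            count = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, count))
    return sorted(hits)


def unit_size_counts(hits):
    """Count SSRs by unit length (1 = mono, 2 = di, ...)."""
    return Counter(len(unit) for _, _, unit, _ in hits)


def unit_size_shares(counts):
    """Convert counts per unit length into percentages of the total."""
    total = sum(counts.values())
    return {size: round(100 * n / total, 2) for size, n in counts.items()}
```

Applied to the B. rapa CDS counts given above, `unit_size_shares({1: 177, 2: 195, 3: 4155, 4: 3, 5: 1, 6: 37})` yields 3.87% mono-, 4.27% di-, and 90.96% tri-nucleotide repeats, matching the reported percentages.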

Conclusion: We designed an identification and search tool for genomic and transcriptomic SSR development. Plant genomic sequences contain many more SSRs than transcriptomic sequences. Interestingly, tri-nucleotide and mononucleotide repeats were the most abundant types found in plant CDS sequences and genomic sequences, respectively.

Availability: The SSRFace tool and plant SSRs are available at: http://nhccdata.njau.edu.cn/SSRFace

GRAPH ANALYSIS OF E. COLI TRANSCRIPTION REGULATION

Temlyakova E.A.*, Sorokin A.A.

Institute of Cell Biophysics RAS, Pushchino, Russia

e-mail: evgenia.teml@gmail.com

*Corresponding author

Key words: transcription regulation, graph analysis, E. coli, RegulonDB

Motivation and Aim: We constructed a graph database containing information about the Escherichia coli genome, proteome, metabolome, transcription and translation regulation, etc. All data were taken from well-known external databases (GenBank, UniProt, RegulonDB, etc.). To show the advantages of storing data in graph structures, we analyzed E. coli transcription regulation using a subgraph based on RegulonDB data.

Methods and Algorithms: Experimental and analytical data from RegulonDB [1] were used to create a graph structure that was stored in the Neo4j graph database [2]. The analysis was performed using the Cypher query language and Python.

Results: Based on RegulonDB data about E. coli transcription regulation, we constructed a colored attribute graph with 13 node types (operon, transcription unit, gene, promoter, TFBS, TF, etc.) and 8 relation types (contains, encodes, binds, initiates, etc.). Each node carries node-specific properties for analysis and a reference to the original data source. A few specific regulation patterns were revealed and discussed.
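A typed attribute graph of this kind can be illustrated in plain Python. The sketch below is not the authors' Neo4j schema: the node type names, the `regulates` relation, the edge directions, and the toy lac operon records are assumptions made for illustration only. In Cypher, a comparable traversal would be written as a single path pattern.

```python
from collections import defaultdict


class AttributeGraph:
    """Minimal typed graph: every node has a type and a property dictionary."""

    def __init__(self):
        self.nodes = {}                     # node id -> (node type, properties)
        self.out_edges = defaultdict(list)  # node id -> [(relation, target id)]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = (node_type, props)

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.out_edges[source].append((relation, target))

    def neighbors(self, source, relation, node_type=None):
        """Targets reachable from source via relation, optionally filtered by type."""
        return [
            target
            for rel, target in self.out_edges[source]
            if rel == relation
            and (node_type is None or self.nodes[target][0] == node_type)
        ]


def follow(graph, start_ids, relation, node_type):
    """One traversal step from several start nodes, keeping first-seen order."""
    found = []
    for start in start_ids:
        for target in graph.neighbors(start, relation, node_type):
            if target not in found:
                found.append(target)
    return found


def regulated_genes(graph, tf_id):
    """Hypothetical pattern: TF -binds-> TFBS -regulates-> promoter
    -initiates-> transcription unit -contains-> gene."""
    sites = follow(graph, [tf_id], "binds", "TFBS")
    promoters = follow(graph, sites, "regulates", "promoter")
    units = follow(graph, promoters, "initiates", "transcription_unit")
    return follow(graph, units, "contains", "gene")


def build_toy_graph():
    """Toy lac operon example (illustrative records, not actual RegulonDB entries)."""
    g = AttributeGraph()
    g.add_node("LacI", "TF", source="RegulonDB")
    g.add_node("lacO1", "TFBS", source="RegulonDB")
    g.add_node("lacZp", "promoter", source="RegulonDB")
    g.add_node("lacZYA", "transcription_unit", source="RegulonDB")
    for gene in ("lacZ", "lacY", "lacA"):
        g.add_node(gene, "gene", source="RegulonDB")
        g.add_edge("lacZYA", "contains", gene)
    g.add_edge("LacI", "binds", "lacO1")
    g.add_edge("lacO1", "regulates", "lacZp")
    g.add_edge("lacZp", "initiates", "lacZYA")
    return g
```

With this toy graph, `regulated_genes(build_toy_graph(), "LacI")` walks from the transcription factor to the genes of the transcription unit it controls. Keeping the `source` property on each node mirrors the abstract's point that every node retains a reference to its original data source.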

Conclusion: Graph representation of molecular biology data allows researchers to view the data at various scales, from the characteristics of a single object up to the whole network of a particular mechanism in an organism.

Availability: Available upon request.

Acknowledgements: This work was supported by RFBR grant 14-04-31793 mol_a.

References:

1. H. Salgado et al. (2012) RegulonDB (version 8.0): Omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more, Nucleic Acids Research, 41(D1): D203-D213.

2. Neo4j web-site (Neo Technology): http://www.neo4j.org/

RECONSTRUCTION OF ASSOCIATIVE GENE NETWORKS
