
Polish Platform of Medical Research (Polska Platforma Medyczna), https://ppm.edu.pl

Repository of Medical University of Gdańsk, https://ppm.gumed.edu.pl

Publication: Characterizing the Zika virus genome: a bioinformatics study

Authors: Nandy Ashesh, Dey Sumanta, Basak Subhash C., Bielińska-Wąż Dorota, Wąż Piotr

Published version DOI: http://dx.doi.org/10.2174/1573409912666160401115812

Publication address in Repository: https://ppm.gumed.edu.pl/info/article/GUM84aa1e22ccd24232b8681a107e7581f5/

Deposited in Repository on: 1 July 2020

Type of licence: Attribution CC BY

Cite this version: Nandy Ashesh, Dey Sumanta, Basak Subhash C., Bielińska-Wąż Dorota, Wąż Piotr: Characterizing the Zika virus genome: a bioinformatics study, Current Computer-Aided Drug Design, vol. 12, no. 2, 2016, pp. 87-97, DOI:10.2174/1573409912666160401115812


Current Computer-Aided Drug Design


ISSN: 1573-4099 eISSN: 1875-6697

Impact Factor:1.155

Current Computer-Aided Drug Design, 2016, 12, 87-97

Characterizing the Zika Virus Genome – A Bioinformatics Study


1875-6697/16 $58.00+.00 © 2016 Bentham Science Publishers

Ashesh Nandy*,1, Sumanta Dey1, Subhash C. Basak1,2, Dorota Bielińska-Wąż3 and Piotr Wąż4

1 Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India

2 University of Minnesota Duluth-Natural Resources Research Institute and Department of Chemistry and Biochemistry, University of Minnesota Duluth, 5013 Miller Trunk Highway, Duluth, MN 55811, USA

3 Department of Radiological Informatics and Statistics, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland

4 Department of Nuclear Medicine, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland

Abstract: Background: The recent epidemic of Zika virus infections in South and Latin America has raised serious concern about its ramifications for the population in the Americas and about the spread of the virus worldwide. Zika virus disease is a relatively new phenomenon for which sufficient and comprehensive data and investigative reports have not been available to date.

Objective: To carry out a bioinformatics study of the available Zika virus genomic sequences to characterize the virus.

Method: A 2D graphical representation method is used for visual rendering and to compute sequence parameters and descriptors of the African and Asian-American groups of Zika viruses in order to characterize the sequences. We also used MEGA5.2 and other software to compute various biological properties of interest, such as phylogenetic relationships, transition-transversion ratios, amino acid usage, codon usage bias and hydropathy index of the Zika genomes and virions.

Results: The phylogenetic relationships show that the African and Asian-American Zika virus genomes are grouped in two clades. The 2D plots of typical genomes of these types also show dramatic differences, while indicating that the gene sequences at the 5’-end coding regions for the structural proteins are rather strongly conserved. Among other characteristics, the transition/transversion ratio matrices for the sequences in each of the two clades show that, analogous to the dengue virus, the transition rates are about 10 to 15 times the transversion rates.

Conclusion: These findings are important for computer-assisted approaches towards surveillance of emerging Zika virus strains as well as in the design of drugs and vaccines to combat the growth and spread of the Zika virus.

Keywords: Zika virus, Zika virus phylogeny, Zika virus characterization, African and Asian-American clades, 2D graphical representation, amino acid changes, cladewise transition-transversion ratios, Zika sequence descriptors.

Received: February 28, 2016 Revised: March 21, 2016 Accepted: March 31, 2016

INTRODUCTION

The sudden emergence of an epidemic of Zika virus infections in South America has raised concerns about its virulence and transmission potential, especially in view of mass gatherings at carnivals in that part of the world, the planned Olympic Games in Rio, and even the Hajj in Saudi Arabia, which many citizens of South America plan to attend this year [1]. While the virus is not contagious and causes mild to no symptoms in the average male, it is suspected to cause microcephaly in newborn babies of mothers infected with the Zika virus [2], which has led Colombia, Ecuador, El Salvador and Jamaica to caution their women against pregnancy until more is known about it [3, 4]. Although no scientific causal links have yet been established between the reported cases of microcephaly and the Zika infections, considering the severity of the possible consequences of Zika virus infections, the World Health Organization (WHO), in a statement dated 1 February 2016 [5], declared the outbreak of microcephaly cases and other neurological disorders to constitute a Public Health Emergency of International Concern.

The Zika virus, a vector-borne pathogen, is spread by bites of the Aedes aegypti mosquito, which is widespread in the tropics, and to a lesser extent by Aedes albopictus, which ranges in the Americas up to the Great Lakes. The Aedes aegypti vector is expected to be even more active than normal and to spread further north due to the effects of climate change related to recent global warming [6, 7]; Brazil’s efforts at controlling the mosquito, which is also the vector for dengue, are yet to produce acceptable results [8], and this adds to the overall concern. Coming soon after the scare of Ebola virus, and with the background of the 2009 H1N1 pandemic, the Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) coronavirus epidemics, the near-pandemic of H5N1 bird flu, fatalities associated with H7N9 and a recent avian epidemic of H5N2 virus in the USA [9], the Zika virus has raised issues of containment of viral activity and of further zoonotic diseases that may arise.

*Address correspondence to this author at the Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India; Tel: +919433579452; E-mail: anandy43@yahoo.com

First isolated in 1947 from a rhesus monkey in the Zika forest of Uganda near Lake Victoria, the Zika virus was found in human hosts in Africa and Asia through serologic evidence. The first epidemic of Zika virus infections was recorded in Micronesia in 2007 [10]; there is documentary evidence of a total of 14 human infections prior to this period [10]. Subsequently the infection seems to have been spread to the western hemisphere by travelers returning from Senegal and elsewhere [11]. The detection of a very large number of microcephaly cases in newborns in Brazil in late 2015 raised suspicions of a link to Zika, and although there are questions about the procedures in the statistics [2], there is enough circumstantial evidence in the spread of the virus and the incidences of microcephaly, Guillain-Barré syndrome and other neurological disorders reported by Brazil and from experiences in French Polynesia [12] to have led the WHO to declare the Emergency mentioned above [5]. It is in relation to these incidents that serious concerns have been raised about the virulence and transmission potential of Zika virus, especially in view of the ease of international travel and mass gatherings at carnivals, Olympic Games, and other events where visitors from other countries may get infected and become involuntary carriers of the new disease to their countries around the globe [1].

Downloaded from Repository of Medical University of Gdańsk, 2023-07-02

The Zika virus (ZIKV) genome is a non-segmented, single-stranded, positive-sense RNA molecule around 10,800 bases long. The translation product of the genome consists of three structural proteins, capsid (C), pre-membrane (prM) and envelope (E), and seven non-structural proteins: NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5, in order from the 5’ to the 3’ end. The virus belongs to the family Flaviviridae, genus Flavivirus, which also includes the dengue, yellow fever, Japanese encephalitis, West Nile and chikungunya viruses. While the others in the above-mentioned list of flaviviruses are well researched, Zika, being a newly determined infector of human beings, had not attracted adequate scientific interest, and therefore not enough is known of its characteristics. Kuno and Chang [13] characterized the Uganda 1947 MR766 strain along with two other African flavivirus strains, Haddow et al. [14] compared African and Asian strains from Malaysia and Cambodia and showed that the Asian strains are homologous and distinct from the African strains of the Zika virus, and Faye et al. [15] characterized three Central African Republic strains.

To date, however, no paper appears to give an overall view of all the available African and Asian-American Zika viral strains. In the present work, we seek to characterize the Zika viral genome on an overall basis using all the genomes available to date. In addition to standard approaches of phylogenetic relationships, codon biases, transition-transversion ratios and so on, we also provide results on visualization of the base distributions in the Zika virus genomes through the recently developed techniques of 2-dimensional graphical representations and then characterize the genomes using mathematical sequence descriptors to quantify their mutual similarities and dissimilarities.

MATERIALS AND METHODS

We downloaded a total of 22 complete genome sequences of the Zika virus for characterization analysis. Eleven of them are from African countries, and the rest are from Asia, the Pacific and the Americas, as available in the GenBank database as of February 15, 2016. The list of the sequences is given in Table 1. While the sequences of 12 of these are fully known, 10 of the available genomes have 2 or more unidentified bases; the numbers of such ambiguous entries are given in the table.

The method we use to analyze the sequences has been explained in detail in our previous papers. We use the new technique of graphical representation [16] and numerical characterization of biomolecular sequences [17] to scan the ZIKV sequences for segments of interest. While many methods have been proposed (see Nandy et al. [18] for a review), we use the 2D graphical representation technique for visual clues and quick mathematical closure. In this representation the sequence is plotted base by base on a 2-dimensional rectangular grid, going one step in the negative x-direction for an adenine, one step in the positive y-direction for a cytosine, one step in the positive x-direction for a guanine and one step in the negative y-direction for a thymine. Starting from the origin, plotting the bases as points according to this algorithm generates a curve in the 2D space that is characterized by the distribution of bases along the sequence. Alternative axis identifications are, starting from the negative x-direction and going clockwise, A, T, G, C and A, G, C, T, the three systems exhausting all unique allocations.
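The plotting rule described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the names `STEPS` and `walk_2d` are hypothetical, U is treated as T for RNA input, and skipping ambiguous symbols such as N is an assumption of this sketch.

```python
# Step vectors for the A, C, G, T axis system described in the text:
# A -> negative x, C -> positive y, G -> positive x, T -> negative y.
STEPS = {"A": (-1, 0), "C": (0, 1), "G": (1, 0), "T": (0, -1)}


def walk_2d(sequence):
    """Return the (x, y) point reached after each recognised base.

    The walk starts at the origin, which is not itself included in the
    returned list. Ambiguous symbols (for example N) are skipped; that
    choice is an assumption of this sketch, not taken from the paper.
    """
    x, y = 0, 0
    points = []
    for base in sequence.upper().replace("U", "T"):
        if base not in STEPS:
            continue
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Five-base toy sequence; a real genome would give about 10,000 points.
    print(walk_2d("GGCAT"))
```

The alternative axis systems mentioned in the text would correspond to a different assignment of the four step vectors in `STEPS`.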

Examination of the graphs of many sequences of the same gene can give visual clues of conserved regions and specific characteristics of the sequences. For a numerical measure of the graph, we define averages of the x- and y-coordinates of the points representing all N bases, μx = Σxi/N and μy = Σyi/N, where i = 1, 2, …, N and N is the total number of bases in the sequence. The distance from the origin to this centre of mass, gR = √(μx² + μy²), is defined as the graph radius and serves as a sequence descriptor. The gR is found to be quite sensitive to any changes in the bases in the sequence, and thus an identical gR for two sequences generally implies that the sequences have the same distribution pattern of bases, except in some pathological cases [19].
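As a hedged illustration of these descriptors, the sketch below computes μx, μy and gR for a sequence under the A, C, G, T axis system. The function name `graph_radius` is hypothetical, and excluding ambiguous symbols from N is an assumption made here; the paper does not state how such bases were handled.

```python
import math

# A -> negative x, C -> positive y, G -> positive x, T -> negative y.
STEPS = {"A": (-1, 0), "C": (0, 1), "G": (1, 0), "T": (0, -1)}


def graph_radius(sequence):
    """Return (mu_x, mu_y, g_R) for the 2D walk of `sequence`.

    mu_x and mu_y are the means of the x- and y-coordinates of the
    plotted points; g_R is the distance of that centre of mass from
    the origin. Unrecognised symbols are skipped (an assumption).
    """
    x = y = 0
    sum_x = sum_y = 0
    n = 0
    for base in sequence.upper().replace("U", "T"):
        if base not in STEPS:
            continue
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        sum_x += x
        sum_y += y
        n += 1
    if n == 0:
        raise ValueError("sequence contains no recognised bases")
    mu_x = sum_x / n
    mu_y = sum_y / n
    return mu_x, mu_y, math.hypot(mu_x, mu_y)


if __name__ == "__main__":
    print(graph_radius("GGCAT"))
    # Consistency check against one tabulated row of Table 3 (Malaysia 1966):
    print(math.hypot(107.3623, -11.5883))
```

The last line only checks that the tabulated gR agrees with the tabulated μx and μy; it does not recompute them from the genome.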

One issue with this simple representation is that for some segments of a sequence the plot may retrace its path or reach the same point on the graph several times. To keep count of the number of retracements, Bielińska-Wąż et al. [20, 21] proposed a 2D-dynamic graphical representation model in which each visit to a point is counted: the first visit gives a count of 1, the second visit increases it to 2, the third to 3, and so on. These authors named the count “mass”, which enabled the conceptual step towards defining nth-order moments of inertia. Setting all counts to 1 reproduces the simple 2D graphical representation mentioned above.
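The visit-counting idea can be sketched as below. This is an illustrative reading of the description in the text, not the published 2D-dynamic model itself: the name `dynamic_masses` is hypothetical, and not counting the starting origin as a visit is an assumption of this sketch.

```python
from collections import Counter

# A -> negative x, C -> positive y, G -> positive x, T -> negative y.
STEPS = {"A": (-1, 0), "C": (0, 1), "G": (1, 0), "T": (0, -1)}


def dynamic_masses(sequence):
    """Map each visited (x, y) point to the number of times it is reached.

    The count plays the role of the "mass" of the point. The starting
    origin is not counted as a visit (assumption of this sketch).
    """
    x = y = 0
    masses = Counter()
    for base in sequence.upper().replace("U", "T"):
        if base not in STEPS:
            continue
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        masses[(x, y)] += 1
    return masses


if __name__ == "__main__":
    masses = dynamic_masses("GATC")
    print(dict(masses))
    # Dropping the counts leaves the set of points of the simple 2D plot.
    print(sorted(masses))
```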

To generate alignment results and phylogenetic trees, we used the Molecular Evolutionary Genetics Analysis software, MEGA 5.22 (www.megasoftware.net/mega4/mega.html). This software has also been used to derive the transition-transversion ratios and other results.

RESULTS AND DISCUSSION

Aligned sequences of the 22 ZIKV genomes were found to be around 79% conserved which is remarkable considering the normally high level of mutations in RNA viruses [22]. However, it is comparable to the sequences of dengue types 1 and 3 genomes, also of the flavivirus genus.

Phylogenetic relationships of the genomic sequences of ZIKV are shown in Fig. (1). It is evident that they easily fall into two main groups, one comprising sequences from the African continent and another with sequences from Asian, Pacific and American countries. The African lineage covers the period 1947 to 2001, with changes in sequences locally accumulating over time. The clade in general matches closely with the envelope+NS5 concatenated phylogenetic tree derived in Faye et al. [15]. Curiously, two Senegal isolates from 2001 appear to have closer relationships with the 1947 Ugandan genome than the earlier isolates from the same country; in the phylogenetic tree of Faye et al. [15], one of these, KF383118, is clustered with Senegal 1984. The other clade, with the sole exception of the Malaysia strain of 1966, dates from 2007 in Micronesia, related closely to a strain isolated in Cambodia in 2010, showing the progress of the spread of the virus to French Polynesia in 2013 and the Americas, especially Brazil, in 2015. The Asian lineage also matches the observation of Haddow et al. [14] that the Malaysian and the Cambodian strains had some common ancestor and that the Cambodian strain was probably related to one that had been circulating for some time. Our phylogenetic tree also shows that the Micronesian strain belongs squarely in the Asian group and therefore probably owes its origin to some human traveler or a stray mosquito carrying the virus that may have found its way from Asia to the islands, whose population had no immunity to it [10]. Similar arguments would prevail for the Brazilian-Suriname strains since they clearly belong to the Asian-American clade.

The difference in the basic structure of the two clades is reflected in several other attributes. The conserved bases cladewise, as computed in MEGA 5.22 after alignment, amount to 87.06% in the African clade and 92.6% in the non-African clade; combined, they show conserved bases amounting to 79.05% of total bases, as mentioned earlier. The high percentage of conserved bases in the non-African genomes possibly arises from the fact that these are of recent origin, dating over a few years only, whereas the African genomes have mutated relatively more due to the longer time span involved. However, the fact that the percentage of conserved bases is significantly less when aligned together implies that there have been different mutations in the two groups.
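To make the notion of a percentage of conserved bases concrete, the sketch below counts alignment columns in which every sequence carries the same base. It is a simplified stand-in and an assumption of this example: the function name `percent_conserved` is hypothetical, and MEGA's own treatment of gaps and ambiguous bases is not reproduced here.

```python
def percent_conserved(aligned):
    """Percentage of alignment columns that are identical in all rows.

    `aligned` is a list of equal-length, already aligned sequences.
    Columns consisting only of gap characters are not counted as
    conserved (an assumption of this sketch).
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    n_cols = lengths.pop()
    if n_cols == 0:
        raise ValueError("sequences are empty")
    conserved = sum(
        1
        for column in zip(*aligned)
        if len(set(column)) == 1 and column[0] != "-"
    )
    return 100.0 * conserved / n_cols


if __name__ == "__main__":
    # Toy alignment: the last column differs, so 3 of 4 columns are conserved.
    print(percent_conserved(["ACGT", "ACGA", "ACGT"]))
```

Applied to a subset of rows (one clade) and then to all rows together, a function of this kind shows why the combined figure can be lower than either cladewise figure: a column conserved within each clade may still differ between the clades.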

The transition/transversion ratio matrices for the sequences in each of the two clades (Table 2) show that the transition rates are about 10 to 15 times the transversion rates; the ratios for the genomes of another flavivirus, dengue types 1 and 3, are also similar (data not included here, but those for the envelope genes published in Dey and Nandy [23]). In contrast, the transition/transversion ratios for mammalian beta globin genes, and also for the neuraminidase and hemagglutinin gene sequences of the influenza A virus are around 2 to 3, more in line with earlier observations [24].

Tamura and Nei [25] observed, by considering nucleotide substitution patterns in the hypervariable segments of the human mitochondrial DNA control region, that the ratio of the observed number of transitions to transversions was about 15 and that the transitional rate between pyrimidines, i.e., between T and C, was much higher than that between purines (A and G), as also seen in the case of Drosophila.
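For readers unfamiliar with the terms, the sketch below classifies the differences between two aligned sequences as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions (purine to pyrimidine or the reverse). It is a raw pairwise count for illustration only; the rate matrices in Table 2 were estimated with MEGA, and the function name `count_ts_tv` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Positions where either sequence has a gap or an unrecognised symbol
    are ignored. U is treated as T.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    pairs = zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T"))
    for a, b in pairs:
        if a == b:
            continue
        known = PURINES | PYRIMIDINES
        if a not in known or b not in known:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    # A<->G and C<->T are transitions; G<->T is a transversion.
    print(count_ts_tv("AACCGT", "GATCTT"))
```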

Table 1. List of Zika virus genome sequences used.

| GenBank Locus ID | Zika Virus Genome (Country, Year) | Genome Length | Number of Unidentified Bases |
|---|---|---|---|
| KU321639 | Brazil 2015 | 10676 | |
| KU365777 | Brazil 2015 | 10662 | |
| KU365778 | Brazil 2015 | 10727 | |
| KU365779 | Brazil 2015 | 10662 | |
| KU365780 | Brazil 2015 | 10662 | |
| JN860885 | Cambodia 2010 | 10269 | 2 |
| KF993678 | Canada 2013 | 10141 | |
| KF383115 | Central African Republic 1968 | 10272 | 25 |
| KF268948 | Central African Republic 1976 | 10788 | |
| KF268949 | Central African Republic 1976-1980 | 10776 | |
| KF268950 | Central African Republic 1976-1980 | 10755 | 2 |
| KJ776791 | French Polynesia 2013 | 10617 | |
| HQ234499 | Malaysia 1966 | 10269 | 3 |
| EU545988 | Micronesia 2007 | 10272 | 2 |
| HQ234500 | Nigeria 1968 | 10251 | |
| KF383116 | Senegal 1968 | 10272 | 3 |
| HQ234501 | Senegal 1984 | 10269 | 1 |
| KF383117 | Senegal 1997 | 10272 | 15 |
| KF383118 | Senegal 2001 | 10272 | 2 |
| KF383119 | Senegal 2001 | 10272 | |
| KU312312 | Surinam 2015 | 10374 | |
| HQ234498 | Uganda 1947 | 10269 | |


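Table 2. Transition/transversion rate matrices for Zika genome sequences of African strains and of Asian-Pacific-American strains.

African Clade

| | A | T/U | C | G |
|---|---|---|---|---|
| A | - | 1.5 | 1.5 | 13.01 |
| T/U | 1.92 | - | 30.37 | 2.02 |
| C | 1.92 | 30.37 | - | 2.02 |
| G | 12.36 | 1.5 | 1.5 | - |

Asian-American Clade

| | A | T/U | C | G |
|---|---|---|---|---|
| A | - | 0.96 | 0.98 | 13.19 |
| T/U | 1.22 | - | 33 | 1.31 |
| C | 1.22 | 32.52 | - | 1.31 |
| G | 12.35 | 0.96 | 0.98 | - |

Asian-American Clade without Malaysia 1966

| | A | T/U | C | G |
|---|---|---|---|---|
| A | - | 0.97 | 0.99 | 13.62 |
| T/U | 1.24 | - | 32.54 | 1.32 |
| C | 1.24 | 32.03 | - | 1.32 |
| G | 12.77 | 0.97 | 0.99 | - |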

To understand the distribution of bases in the Zika genome sequences, we represent graphically two genome sequences (Fig. 2) from the Central African Republic 1976 and from Brazil 2015 in a 2D grid [16] with axes A, G, C, T starting from the negative x-direction to represent the adenines and reading clockwise to represent the guanine, cytosine and thymine. The two sequences are seen to follow an almost north-easterly direction with the non-African sequence deviating slightly from the African sequence about a third of the way up. Since this is a representative map of the RNA sequence of Zika polyprotein, it implies that the gene sequences at the 5’-end coding for the structural proteins are rather strongly conserved and that the first few non-structural proteins have noticeable differences between the African and the non-African representatives.

To visualize the base distribution differences in a different way, consider the representation on a 2D graph where the axes are A, C, G, T reading clockwise again from the negative x-direction for the adenine [16]. Fig. (3) shows the same two genomes plotted in this axes system. The differences are quite noticeable, with the Brazil genome sequence being elongated quite considerably along the positive x-direction (guanine) compared to the Central African Republic genome sequence, but much of the detail is lost due to overlaps of the paths. This is brought out very dramatically in the 2D-dynamic graph model [20, 21], where the numbers of re-entries are shown in different colors (Fig. 4); we show two other sequences in this instance, Nigeria 1968 and Suriname 2015, to reflect the general nature of the differences in base distributions in the genomes of the two clades. It is easy to see in this representation, where the axes have been magnified, that in certain specific parts of the genome, viz., the portions that appear as big blobs, there are many repetitive parts of the walks, a result of the loss of information consequent to representation in two dimensions of what actually requires higher dimensions, but that is outside the scope of this paper. What is important to note is that such re-entries imply for the coding sequences a larger variety of amino acids than the more linear regions connecting the blobs, which have a comparatively more uniform pattern of bases. Thus, we can expect a more complex mixture of bases in the blob regions, an observation that was the basis of a proposal several years ago to determine protein coding regions in newly sequenced DNAs [26].

Fig. (1). Phylogenetic tree of 22 Zika virus genome sequences.

Fig. (2). Comparing Zika genomes from Central African Republic 1976 with Brazil 2015. Axes AGCT as shown in inset.

Fig. (3). Comparing Zika genomes from Central African Republic 1976 with Brazil 2015. Axes ACGT as shown in inset.

Fig. (4). Zika genome sequence base distribution plot in 2D-dynamic model. (a) Nigeria 1968 Zika genome (b) Suriname 2015 Zika genome.

Mathematically, a number of different parametrization methods can be used to characterize the DNA/RNA sequences, but here we consider only some basic parameters. One such set considers a cumulative sum of the differences between the pyrimidines and the differences between the purines normalized over the length of the sequence, i.e., the μx and the μy mentioned earlier (Table 3); we note that the μy represents the difference between the co-ordinates of the cytosine points and the thymine points and the μx is the equivalent for adenine and guanine. A plot of μx vs. μy for all the genomes, Fig. (5), shows that the spread of the points is much less for the non-African genomes than for the African ones. This is possibly due to the comparatively shorter time these strains of the Zika virus have had to respond to mutational pressures; the African strains, on the other hand, with a longer time frame at their disposal, have mutated significantly more and so are much more spread out in the chart.

Table 3. Intra-purine, intra-pyrimidine differences in Zika genomes in terms of sequence parameters μx and μy and the graph radii gR.

| Clade | GenBank Locus ID | Country | μx | μy | gR |
|---|---|---|---|---|---|
| Asia-Pacific-American | HQ234499 | Malaysia 1966 | 107.3623 | -11.5883 | 107.9859 |
| Asia-Pacific-American | EU545988 | Micronesia 2007 | 96.0001 | -18.8349 | 97.83032 |
| Asia-Pacific-American | JN860885 | Cambodia 2010 | 92.43265 | -18.3415 | 94.23484 |
| Asia-Pacific-American | KF993678 | Canada 2013 | 83.82595 | 4.172172 | 83.92972 |
| Asia-Pacific-American | KJ776791 | Fr Polynesia 2013 | 90.7893 | -20.2422 | 93.0185 |
| Asia-Pacific-American | KU312312 | Surinam 2015 | 91.1846 | -14.7798 | 92.37464 |
| Asia-Pacific-American | KU321639 | Brazil 2015 | 89.45991 | -22.1577 | 92.16312 |
| Asia-Pacific-American | KU365777 | Brazil 2015 | 85.18224 | -18.7933 | 87.23073 |
| Asia-Pacific-American | KU365778 | Brazil 2015 | 82.96784 | -17.7628 | 84.84799 |
| Asia-Pacific-American | KU365779 | Brazil 2015 | 84.76243 | -18.9814 | 86.86175 |
| Asia-Pacific-American | KU365780 | Brazil 2015 | 85.12765 | -18.8863 | 87.19753 |
| Asia-Pacific-American | Average | | | | 91.60682 |
| Asia-Pacific-American | Standard Deviation | | | | 6.946441 |
| African | HQ234498 | Uganda 1947 | 75.171 | -17.6915 | 77.22479 |
| African | HQ234500 | Nigeria 1968 | 101.9132 | -25.7423 | 105.114 |
| African | KF383115 | CAR 1968 | 64.11018 | -31.6318 | 71.48906 |
| African | KF383116 | Senegal 1968 | 101.0587 | -7.82131 | 101.3609 |
| African | KF268948 | CAR 1976 | 68.06007 | -49.7933 | 84.32997 |
| African | KF268949 | CAR 1976-1980 | 58.26791 | -51.5162 | 77.77578 |
| African | KF268950 | CAR 1976-1980 | 66.89947 | -46.9652 | 81.73904 |
| African | HQ234501 | Senegal 1984 | 104.7397 | -8.87115 | 105.1147 |
| African | KF383117 | Senegal 1997 | 88.38218 | -45.3703 | 99.34723 |
| African | KF383118 | Senegal 2001 | 74.03739 | -23.81 | 77.7718 |
| African | KF383119 | Senegal 2001 | 79.34638 | -15.2449 | 80.79762 |
| African | Average | | | | 87.46045 |
| African | Standard Deviation | | | | 12.61858 |

In terms of protein sequences, two items, among several, are generally of interest in characterizing genetic and genomic sequences: amino acid usage and hydropathy index. Amino acid changes between the Zika viral sequences comprising the two clades can be traced from averages of the occurrences of the amino acids in the protein sequences of the genomes. The detailed results are given in Table 4, and a bar chart (Fig. 6) shows the deviations (in percentage terms) in amino acid usage in the non-African genomes compared to the African samples. It can be observed that in the non-African genomes, the usage of Asn, Asp and His has decreased substantially, whereas usage of Leu, Met, Pro, Thr and Tyr has increased, but not so dramatically. For the hydropathy indices of the genomes, Fig. (7), we find that those of the non-African clade show a slight tendency to increase over time, but those of the genomes from the African continent show no clear direction, perhaps again a consequence of many mutations.
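Amino acid usage of the kind tabulated in Table 4 can be computed from a protein sequence as in the sketch below. This is an illustration under stated assumptions rather than the authors' procedure: the function name `amino_acid_usage` is hypothetical, input is taken as a one-letter-code string, and symbols outside the 20 standard amino acids (such as stop markers or X) are ignored.

```python
from collections import Counter

# One-letter codes in the same order as the Table 4 columns:
# Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_usage(protein):
    """Return a dict mapping each amino acid to its usage in percent."""
    counts = Counter(res for res in protein.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    usage = amino_acid_usage("MKKL")
    print({aa: pct for aa, pct in usage.items() if pct > 0})
```

Averaging such dictionaries over the genomes of one clade gives one row of averages per clade, which can then be compared column by column.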

Fig. (5). Intra-purine, intra-pyrimidine differences in Zika genomes plotted in terms of sequence descriptors.

Table 4. Amino acid usage frequencies in Zika virus genomes cladewise.

Clade 1 Genomes from Asian-Pacific-American countries

Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Total

EU545988_

micronesia_20074.5383898043.9788622942.1137705945.0046627291.9583462858.0820640352.7665526892.7354678275.56419023910.009325462.7665526891.492073367.242772775.6574448248.4550823759.7295617037.6468759713.9166925715.2844264841.056885297 3217 HQ234499_

malaysia_19664.8166563083.7911746432.2063393415.003107522.0820385337.8931013052.7346177752.4860161595.3449347429.8819142322.8589185831.5226848977.4580484775.5313859548.9496581739.3225605977.7377252953.7290242395.469235551.180857676 3218 JN860885_

Cambodia_20104.6967340593.9813374812.1150855374.8833592532.0217729398.1804043552.7682737172.6438569215.59875583210.202177292.8304821151.4307931577.0606531885.660964238.4603421469.7045101097.651632973.7636080875.2877138411.057542768 3215 KF993678_

Canada_2013 4.8098082363.9610185482.137692554.9669915121.9176359647.9849104062.8292989632.4834957565.5957246159.9968563342.7664256521.3832128266.9789374415.6585979258.51933354310.028292997.953473753.6152153415.3756680291.03740962 3181 KJ776791_

French_

Polynesia_2013

4.8677884624.0264423082.1935096154.987980769 1.953125 8.0228365382.7644230772.5540865385.67908653810.066105772.6141826921.5024038467.1514423085.7091346158.4735576929.6454326927.7824519233.6959134625.2584134621.051682692 3328 KU312312_

Surinam_2015 4.8 4.0307692312.1538461544.9538461541.9384615388.0923076922.7692307692.5538461545.69230769210.153846152.6461538461.4461538467.1076923085.6615384628.3692307699.7538461547.9076923083.6615384625.2615384621.046153846 3250 KU321639_

Brazil_2015 4.8969841744.0310540462.239474474.9268438341.9110182148.0322484322.7172290242.5679307265.73305464310.032845632.5977903851.492982987.1364586445.7629143038.3308450289.734249037.7635114963.762317115.2851597491.045088086 3349 KU365777_

Brazil_2015 4.9356865094.0382889622.2434938684.9955130121.9144481017.9868381692.7520191442.5725396355.74334430210.110679032.6323661381.4956625797.1791803775.713431058.3457971889.6918935097.7176189053.619503445.2647322761.046963805 3343 KU365778_

Brazil_2015 4.8765982754.0142729712.2598870064.9658043411.9030627427.969075232.7356526912.6464466255.76865893510.05055012.6464466251.5165031227.1959559925.7091882258.355634859.7829319067.7311923883.6277133515.1739518291.070472792 3363 KU365779_

Brazil_2015 4.9072411734.0394973072.2441651714.997007781.9150209467.9892280072.7528426092.5733093965.74506283710.113704372.63315381.4961101147.1514063445.7151406348.3482944349.7247157397.7199281873.6505086775.2363853981.04727708 3342 KU365780_

Brazil_2015 4.9356865094.0382889622.2434938684.9955130121.9443613527.9868381692.7520191442.5725396355.74334430210.110679032.6323661381.4956625797.1791803775.713431058.3457971889.6619802577.7176189053.619503445.2647322761.046963805 3343 Average 4.8255975923.9937278862.195523474.9709663561.9508446928.0199865762.7583781462.5808668525.65531497110.066243952.6931671511.4794766647.1674298395.6811973888.4503248539.7072704267.7572474643.6965034715.2874506691.0624815883286.272727

Clade 2 Genomes from African countries

Genome Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Total
HQ234498_uganda_1947 5.09573811 4.539839407 4.261890056 5.58987029 1.822112415 10.1297097 5.3428042 2.748610253 4.817788758 7.689932057 1.852995676 3.211859172 5.497220506 4.663372452 9.419394688 9.203211859 5.435453984 3.613341569 3.829524398 1.235330451 3238
HQ234500_Nigeria_1968 4.955370883 4.678362573 4.185903355 5.632502308 1.939058172 10.55709449 5.201600492 2.585410896 4.462911665 7.109879963 1.908279471 3.293321022 5.66328101 5.047706987 9.356725146 9.079716836 5.355493998 4.001231148 3.724222838 1.261926747 3249
HQ234501_Senegal_1984 5.101413645 4.701905347 4.148740012 5.439459127 1.813153042 10.35648433 5.347264905 2.519975415 4.54824831 7.49846343 1.72095882 3.380454825 5.53165335 5.101413645 9.188690842 9.035033805 5.40872772 4.05654579 3.902888752 1.198524892 3254
KF268948_Central_African_Rep.1976 5.300146413 5.036603221 3.894582723 5.563689605 1.844802343 9.633967789 5.270863836 2.781844802 4.743777452 7.877013177 1.96193265 3.484626647 5.534407028 5.124450952 9.25329429 8.960468521 5.007320644 3.718887262 3.894582723 1.112737921 3415
KF268949_Central_African_Rep.1976-1980 5.395894428 5.073313783 3.870967742 5.630498534 1.935483871 9.501466276 5.366568915 2.697947214 4.809384164 7.771260997 2.023460411 3.519061584 5.454545455 5.073313783 9.032258065 9.061583578 5.161290323 3.636363636 3.988269795 0.997067449 3410
KF268950_Central_African_Rep.1976-1980 5.261610817 5.0558495 3.909465021 5.555555556 1.851851852 9.641387419 5.291005291 2.792475015 4.761904762 7.907113463 1.969429747 3.497942387 5.526161082 5.085243974 9.229864785 8.965314521 5.026455026 3.67430923 3.880070547 1.116990006 3402
KF383115_Central_African_Rep.1968 5.175085218 4.989154013 4.028509452 5.546947629 2.107220328 9.947319492 5.577936164 2.603036876 4.741245739 7.654167958 1.859312055 3.408738767 5.484970561 4.803222808 9.20359467 8.831732259 5.33002789 3.842578246 3.904555315 0.960644562 3227
KF383116_Senegal_1968 5.043050431 4.67404674 4.120541205 5.504305043 1.875768758 10.20910209 5.350553506 2.490774908 4.458794588 7.472324723 1.752767528 3.382533825 5.596555966 5.104551046 9.255842558 8.948339483 5.596555966 3.997539975 3.997539975 1.168511685 3252
KF383117_Senegal_1997 4.87053021 4.90135635 4.130702836 5.517879162 1.972872996 10.20345253 5.271270037 2.527743527 4.593094945 7.706535142 1.880394575 3.329223181 5.302096178 5.178791615 9.093711467 9.062885327 5.394574599 4.130702836 3.699136868 1.233045623 3244
KF383118_Senegal_2001 5.080049261 4.587438424 4.094827586 5.572660099 1.754926108 9.975369458 5.32635468 2.770935961 4.74137931 7.543103448 1.97044335 3.201970443 5.572660099 4.679802956 9.298029557 9.267241379 5.32635468 3.879310345 4.033251232 1.323891626 3248
KF383119_Senegal_2001 5.180388529 4.594511255 4.193647857 5.550416281 1.757631822 10.02158495 5.334566759 2.651865557 4.717853839 7.678075856 1.911810052 3.206907185 5.519580635 4.656182547 9.49737897 9.34320074 5.396238051 3.576934937 3.977798335 1.23342584 3243
Average 5.132661631 4.802943692 4.07634344 5.554889421 1.879534701 10.01608532 5.334617162 2.651874584 4.672398503 7.627988201 1.891980394 3.356058094 5.516648352 4.956186615 9.257162276 9.068975301 5.312590262 3.829794998 3.893803707 1.167463346 3289.272727

Excess/Deficit in average usage frequencies of non-African genomes compared to African genomes

Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr

Difference -0.30706404 -0.809215806 -1.88081997 -0.583923065 0.071309991 -1.996098744 -2.576239017 -0.071007732 0.982916468 2.438255744 0.801186757 -1.87658143 1.650781487 0.725010774 -0.806837423 0.638295125 2.444657202 -0.133291527 1.393646962 -0.104981758

Difference in % (6.36) (20.26) (85.67) (11.75) 3.66 (24.89) (93.40) (2.75) 17.38 24.22 29.75 (126.84) 23.03 12.76 (9.55) 6.58 31.51 (3.61) 26.36 (9.88)
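The two Difference rows can be reproduced from the clade averages. The following is a minimal sketch, assuming (as the tabulated values indicate) that the difference is the non-African average minus the African average and that the percentage is taken relative to the non-African average. The dictionary and function names are illustrative, and only five residues are copied from the Average rows.

```python
# Average usage frequencies (%) copied from the two "Average" rows of the table.
non_african = {"Ala": 4.825597592, "Asp": 2.19552347, "His": 2.758378146,
               "Leu": 10.06624395, "Asn": 1.479476664}
african = {"Ala": 5.132661631, "Asp": 4.07634344, "His": 5.334617162,
           "Leu": 7.627988201, "Asn": 3.356058094}


def excess_deficit(clade_a, clade_b):
    """Return {residue: (difference, difference as % of the clade_a value)}."""
    result = {}
    for residue, value_a in clade_a.items():
        diff = value_a - clade_b[residue]
        result[residue] = (diff, 100.0 * diff / value_a)
    return result


table = excess_deficit(non_african, african)
for residue, (diff, pct) in table.items():
    # Negative values correspond to the parenthesised entries in the table.
    print(f"{residue}: {diff:+.4f} ({pct:+.2f}%)")
```

Under these assumptions the sketch returns a deficit of about 85.67% for Asp and an excess of about 24.22% for Leu, in line with the tabulated row.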

Chart: Intra-purine, intra-pyrimidine differences in Zika genomes (Non-African vs. African).


The codon usage bias report (Table 5) for the genome sequences in the two clades reflects these observations. A chart of the codon usage bias, Fig. (8), shows that the two clades differ considerably in codon usage, with some codons used far more often in one clade than in the other; for example, UUG(L) has a count of 105.4 in the Asian-American clade against 43.4 in the African clade, whereas CAU(H) has 48.6 against 103.4.
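Per-codon counts of the kind reported in Table 5 can be tallied directly from a sequence. The sketch below is illustrative only: it assumes in-frame, non-overlapping counting on a single sequence, uses a made-up toy sequence rather than a Zika genome, and the function name `codon_counts` is hypothetical. The software and counting convention actually used for Table 5 are not described in this section.

```python
from collections import Counter


def codon_counts(sequence):
    """Count non-overlapping codons, reading from the first base (assumed frame)."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    usable = len(rna) - len(rna) % 3          # drop any trailing partial codon
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


# Toy sequence (not real data): AUG UUG CUG UUG UAA
toy = "AUGUUGCUGUUGUAA"
counts = codon_counts(toy)
for codon, n in sorted(counts.items()):
    print(codon, n)
```

Running the same function over each genome in a clade and averaging the per-codon counts would give two columns that can be compared codon by codon, which is the comparison Table 5 presents.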

SUMMARY

To sum up, we observed that the Zika virus closely parallels the dengue virus in several respects, such as transition-transversion ratios, the preferred vector for its spread, and genome structure. The genome structure is more convoluted, with a complex base distribution along the sequence, especially in the structural protein coding region and in parts of the non-structural gene sequences; this is clearly evident in some of the 2D graphical representation plots. We have also reported very noticeable differences between the viral sequences isolated in African countries and the more recent isolates from Asia, the Pacific region and the Americas. The codon usage and amino acid counts likewise show marked differences in the frequencies of some residues.

These aspects of the Zika virus genome suggest that qualitative changes have taken place between the older African sequences and the more recent variants circulating in the Asia-Pacific-America region. Since the outbreak on Yap Island in 2007 and the recent epidemic in Brazil in 2015 have led to serious infections in pregnant women and to reported cases of novel transmission through sexual contact, it is possible that some of the changes identified in this research are factors facilitating this pathogenicity. Given the propensity of humans to travel and the consequences of climate change, both of which are likely to aid the spread of the disease, these bioinformatic studies could form the basis of effective surveillance and closer monitoring of Zika virus genomes. Some of the present authors have suggested computer-assisted approaches to such surveillance and to the consequent design of drugs and vaccines [27] to combat the growth and spread of Zika and other zoonotic viruses; these may be considered for the benefit of the population at large.

Fig. (6). Differences in amino acid usage between African and non-African genomes (in %).

Fig. (7). Cladewise Hydropathy Index. Series: Asian-Pacific-American, African.


Table 5. Codon Usage Bias for the two clades.

Codon     Count (Asian-American Clade)   Count (African Clade)
UUU(F)    26.8     32.7
UUC(F)    37.3     29.1
UUA(L)    30.5     24.8
UUG(L)    105.4    43.4
CUU(L)    31.5     55.6
CUC(L)    30.5     43.5
CUA(L)    31.1     21.6
CUG(L)    101.8    62
AUU(I)    20.9     34.7
AUC(I)    35.2     34.4
AUA(I)    28.7     18.2
AUG(M)    88.5     62.3
GUU(V)    18.1     41.2
GUC(V)    20       17
GUA(V)    16.1     21.5
GUG(V)    67.3     46.3
UCU(S)    50.6     46.1
UCC(S)    42.9     36
UCA(S)    115.5    52.6
UCG(S)    34.6     11
CCU(P)    49       55.5
CCC(P)    60.5     47.2
CCA(P)    92       59.9
CCG(P)    34       18.9
ACU(T)    57.5     52.3
ACC(T)    41.5     46.6
ACA(T)    115.5    59.5
ACG(T)    40.5     16.2
GCU(A)    38.4     62.3
GCC(A)    35.7     42.3
GCA(A)    57.4     50.7
GCG(A)    27.2     13.6
UAU(Y)    14.5     23.1
UAC(Y)    20.4     15.3
UAA(*)    40.2     23.5
UAG(*)    59.6     39.4
CAU(H)    48.6     103.4
CAC(H)    42       72.1
CAA(Q)    72.9     89.7
CAG(Q)    113.8    73.4
AAU(N)    21.5     56
AAC(N)    27.2     54.5
AAA(K)    76.4     70.2
AAG(K)    109.5    83.5
GAU(D)    39.2     72.6
GAC(D)    33       61.4
GAA(E)    65.8     84.5
GAG(E)    97.5     98.2
UGU(C)    57.3     75.8
UGC(C)    74       82.3
UGA(*)    106.5    112.1
UGG(W)    173.7    128.1
CGU(R)    17.6     30.6
CGC(R)    20.9     26.6
CGA(R)    19.3     18.2
CGG(R)    35       33.4
AGU(S)    32.9     68
AGC(S)    42.5     84.5
AGA(R)    100      104.2
AGG(R)    84.8     91.5
GGU(G)    45.7     67.5
GGC(G)    43.8     74.9
GGA(G)    93.5     110.4
GGG(G)    80.5     76.5

Fig. (8). Codon usage bias in African and non-African genomes of the Zika virus.


CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

AN designed the research and wrote the paper, SD collected and subjected the data to software analysis and composed the 2D graphs, DBW designed the 2D-dynamic approach, PW collected and analyzed data for the 2D-dynamic graphs, SCB made critical suggestions, and all the authors reviewed and agreed with the manuscript.

REFERENCES

[1] Elachola, H.; Gozzer, E.; Zhuo, J.; Memish, Z.A. A crucial time for public health preparedness: Zika virus and the 2016 Olympics, Umrah, and Hajj. Lancet, 2016, 387, 630-632.

[2] Victora, C.G.; Schuler-Faccini, L.; Matijasevich, A.; Ribeiro, E.; Pessoa, A.; Barros, F.C. Microcephaly in Brazil: How to interpret reported numbers? Lancet, 2016, 387, 621-623.

[3] BBC. Zika virus triggers pregnancy delay calls. http://www.bbc.co.uk/news/world-latin-america-35388842/ (Accessed February 26, 2016).

[4] Roa, M. Zika virus outbreak: Reproductive health and rights in Latin America. Lancet, 2016, 387, 843.

[5] WHO. "WHO Director-General summarizes the outcome of the Emergency Committee regarding clusters of microcephaly and Guillain-Barré syndrome". World Health Organization. 1 February 2016. Retrieved 16th February 2016.

[6] The Intergovernmental Panel on Climate Change. Chapter 8 Human health 8.2.8 Vector-borne, rodent-borne and other infectious diseases. IPCC 4th Assessment Report: Climate Change 2007, 403-405.

[7] Shuman, E.K. Global climate change and infectious diseases. N. Engl. J. Med., 2010, 362, 1061-1063.

[8] Horton, R. Offline: Brazil—the unexpected opportunity that Zika presents. Lancet, 2016, 387, 633.

[9] Nandy, A.; Basak, S.C. Prognosis of possible reassortments in recent H5N2 epidemic influenza in USA: Implications for computer-assisted surveillance as well as drug/vaccine design. Curr. Comp.-Aided Drug Des., 2015, 11, 110-116.

[10] Duffy, M.R.; Chen, T.-H.; Hancock, W.T.; Powers, A.M.; Kool, J.L.; Lanciotti, R.S.; Pretrick, M.; Marfel, M.; Holzbauer, S.; Dubray, C.; Guillaumot, L.; Griggs, A.; Bel, M.; Lambert, A.J.; Laven, J.; Kosoy, O.; Panella, A.; Biggerstaff, B.J.; Fischer, M.; Hayes, E.B. Zika virus outbreak on Yap Island, Federated States of Micronesia. N. Engl. J. Med., 2009, 360, 2536-2543.

[11] Berthet, N.; Nakouné, E.; Kamgang, B.; Selekon, B.; Descorps-Declère, S.; Gessain, A.; Manuguerra, J.-C.; Kazanji, M. Molecular characterization of three Zika flaviviruses obtained from sylvatic mosquitoes in the Central African Republic. Vector Borne Zoonotic Dis., 2014, 14, 862-865.

[12] Heymann, D.L.; Hodgson, A.; Sall, A.A.; Freedman, D.O.; Staples, J.E.; Althabe, F.; Baruah, K.; Mahmud, G.; Kandun, N.; Vasconcelos, P.F.C.; Bino, S.; Menon, K.U. Zika virus and microcephaly: Why is this situation a PHEIC? Lancet, 2016, 387, 719-721.

[13] Kuno, G.; Chang, G.-J.J. Full-length sequencing and genomic characterization of Bagaza, Kedougou, and Zika viruses. Arch. Virol., 2007, 152, 687-696.

[14] Haddow, A.D.; Schuh, A.J.; Yasuda, C.Y.; Kasper, M.R.; Heang, V.; Huy, R.; Guzman, H.; Tesh, R.B.; Weaver, S.C. Genetic characterization of Zika virus strains: Geographic expansion of the Asian lineage. PLoS Negl. Trop. Dis., 2012, 6(2), e1477.

[15] Faye, O.; Freire, C.C.M.; Iamarino, A.; Faye, O.; de Oliveira, J.V.C.; Diallo, M.; Zanotto, P.M.A.; Sall, A.A. Molecular evolution of Zika virus during its emergence in the 20th century. PLoS Negl. Trop. Dis., 2014, 8(1), e2636.

[16] Nandy, A. A new graphical representation and analysis of DNA sequence structure: I. Methodology and application to Globin genes. Curr. Sci., 1994, 66, 309-314.

[17] Raychaudhury, C.; Nandy, A. Indexing scheme and similarity measures for macromolecular sequences. J. Chem. Inf. Comput. Sci., 1999, 39, 243-247.

[18] Nandy, A.; Harle, M.; Basak, S.C. Mathematical descriptors of DNA sequences: Development and applications. ARKIVOC 2006, 9, 211-238.

[19] Nandy, A.; Nandy, P. On the uniqueness of quantitative DNA difference descriptors in 2D graphical representation models. Chem. Phys. Lett., 2003, 368, 102-107.

[20] Bielińska-Wąż, D.; Clark, T.; Wąż, P.; Nowak, W.; Nandy, A. 2D-dynamic representation of DNA sequences. Chem. Phys. Lett., 2007, 442, 140-144.

[21] Bielińska-Wąż, D.; Nowak, W.; Wąż, P.; Nandy, A.; Clark, T. Distribution moments of 2D-graphs as descriptors of DNA sequences. Chem. Phys. Lett., 2007, 443, 408-413.

[22] Sanjuan, R.; Nebot, M.R.; Chirico, N.; Mansky, L.M.; Belshaw, R. Viral mutation rates. J. Virol., 2010, 84, 9733-9748.

[23] Dey, S.; Nandy, A. Phylogenetic and genetic analysis of envelope gene of the prevalent dengue serotypes in India in recent years. SciForum Mol2Net, 2015, 1 (Section B, No 016). Proceedings 1 http://sciforum.net/conference/mol2net-1

[24] Brown, W.M.; Prager, E.M.; Wang, A.; Wilson, A.C. Mitochondrial DNA sequences of primates: Tempo and mode of evolution. J. Mol. Evol., 1982, 18, 225-239.

[25] Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol., 1993, 10(3), 512-526.

[26] Nandy, A. Two dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences. Comput. Appl. Biosci., 1996, 12(1), 55-62.

[27] Basak, S.C.; Nandy, A. Computer-assisted approaches as decision support systems in the overall strategy of combating emerging diseases: Some comments regarding drug design, vaccinomics, and genomic surveillance of the Zika virus. Curr. Comp.-Aided Drug Des., 2016 (in press).
