A Bioinformatics approach to designing a Zika virus vaccine $

Sumanta Dey a,b, Ashesh Nandy a,*, Subhash C. Basak c, Papiya Nandy a, Sukhen Das a,b

a Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India

b Department of Physics, Jadavpur University, Jadavpur, Kolkata 700032, India

c University of Minnesota Duluth-Natural Resources Research Institute and Department of Chemistry and Biochemistry, University of Minnesota Duluth, 5013 Miller Trunk Highway, Duluth, MN 55811, USA

ARTICLE INFO

Article history:
Received 15 July 2016
Received in revised form 3 March 2017
Accepted 5 March 2017
Available online 10 March 2017

Keywords:
Zika virus
Zika vaccine
2D graphical representation
Alignment-free models
Epitope regions

ABSTRACT

The Zika virus infections have reached epidemic proportions in the Latin American countries causing severe birth defects and neurological disorders. While several organizations have begun research into design of prophylactic vaccines and therapeutic drugs, computer assisted methods with adequate data resources can be expected to assist in these measures to reduce lead times through bioinformatics approaches. Using 60 sequences of the Zika virus envelope protein available in the GenBank database, our analysis with numerical characterization techniques and several web based bioinformatics servers identified four peptide stretches on the Zika virus envelope protein that are well conserved and surface exposed and are predicted to have reasonable epitope binding efficiency. These peptides can be expected to form the basis for a nascent peptide vaccine which, enhanced by incorporation of suitable adjuvants, can elicit immune response against the Zika virus infections.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Zika virus infections have reached epidemic proportions in the New World, especially affecting pregnant women and leading to high levels of microcephaly in newborns (Victoria et al., 2016). The incidences of such cases and other neurological disorders such as Guillain-Barré syndrome coinciding with the spread of the Zika virus infections, the lack of preventive or therapeutic medications against the virus and the prospect of further expansion of the virus, have prompted the World Health Organization (WHO) to declare on 1st February 2016 these disorders as Public Health Emergency of International Concern (WHO, 2016a). Several countries have set up machinery to combat the effects of the Zika virus (ZIKV) through public measures and heightened public awareness (Elachola et al., 2016; Roa, 2016; Rasmussen et al., 2016) while the National Institute of Allergy and Infectious Diseases (NIAID) under the US National Institutes of Health (NIH) is responding to the Zika virus crisis through vaccine, treatment, and clinical trials research (Anon., 2016a) to alleviate the suffering.

The Zika virus had been isolated almost 70 years ago from a monkey in Uganda's Zika forest but remained out of frontline research interest although it had been detected in several African and Asian countries in the subsequent period. It is a vector-borne virus spread through the bites of the Aedes aegypti and Aedes albopictus mosquitoes, whose ranges have been increasing in recent times because of global warming (IPCC 4th, 2007; Shuman, 2010). The virus first attracted limited attention when a Zika epidemic erupted in Yap island in Micronesia in 2007 (Duffy et al., 2009) and again in French Polynesia in 2013 (Heymann et al., 2016). However, a sudden increase in cases of microcephaly in newborns detected in Latin American countries late last year catapulted the virus to world attention (WHO, 2016a). Lack of any drugs or vaccine against the new disease led to inadequate containment of the disease. Development of new drugs from laboratory bench to market takes years of effort and billions of dollars; development of vaccines in the traditional way costs only slightly less in time and effort. A more rapid response to the Zika virus epidemic would appear to be highly desirable (Basak and Nandy, 2016).

Traditional vaccines consist of attenuated, inactivated or subunit cultures, all of them cultivated from the virus. While the former types have chances of regression or cause genetic damage, subunits consist of individual chemically purified components which can evoke immune response but can be expensive and also have problems like side effects, possible leakage of infectious agents, etc. (Sobolev et al., 2005). A new strategy of developing synthetic peptide vaccines is taking shape (Moisa and Kolesanova, 2012) where the virus genome is scanned for the particular protein antigens that elicit immune response and these are selected for synthesis into a peptide vaccine. A more focused approach is to precisely locate the epitope regions within this antigen and present those for immune response (Purcell et al., 2007). Wide scale computer based approaches are essential to this purpose (Basak and Nandy, 2016). We have applied such methods to identify target regions in surface proteins of influenza virus and rotavirus (Ghosh et al., 2010, 2012; Sarkar et al., 2015). Initially we scan a library of sequences of the designated protein to determine segments that are unchanged or least changed among the various strains, then couple these with average solvent accessibility profiles of the sequences to select those segments that are most conserved and have highest solvent accessibility profile. We then determine the T-cell linear epitopes with acceptable binding affinity to human MHC class I and class II and finally select those peptides that pose no autoimmune threat, whereas for conformational epitopes we search for B-cell epitopes and try to find out whether our segments are also part of the conformational epitopes; in the process, the 3D crystal structure of the surface protein is utilized to ensure that the selected peptide regions are not covered against solvent accessibility due to neighboring proteins that together constitute the quaternary structure.

Abbreviations: MDPI, Multidisciplinary Digital Publishing Institute; DOAJ, Directory of Open Access Journals; HLA, Human Leukocyte Antigen; IEDB, Immune Epitope Database; ABCpred, artificial neural network based B-cell epitope prediction; NIAID, National Institute of Allergy and Infectious Diseases; NIH, National Institutes of Health; NCBI, National Centre for Biotechnology Information.

$ This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

* Corresponding author.

E-mail address: anandy43@yahoo.com (A. Nandy).

http://dx.doi.org/10.1016/j.compbiolchem.2017.03.002

1476-9271/© 2017 Elsevier Ltd. All rights reserved.

Contents lists available at ScienceDirect

Computational Biology and Chemistry

journal homepage: www.elsevier.com/locate/compbiolchem

The Zika virus has a positive-sense, non-segmented RNA genome about 10,700 bases long that codes for three structural proteins (the capsid, the pre-membrane/membrane and the envelope) and seven non-structural proteins referred to as NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5. The Zika virus belongs to the flavivirus family which includes Japanese encephalitis, West Nile, Yellow fever, and dengue viruses. Vaccines are available against most of these latter viruses and the World Health Organization has organized resources to develop an inactivated vaccine against the Zika virus (WHO, 2016b). In the meantime, the National Institutes of Health in the USA has started work on a Zika vaccine based on the West Nile virus vaccine (Anon., 2016d) and Bharat Biotech in India has claimed to have developed a Zika vaccine, Zikavac, that is ready for pre-clinical trials (Bagla, 2016).

Whereas peptide vaccines can be designed to be more focused, none of the above attempts to devise a Zika vaccine have claimed a peptide vaccine approach. We launched our project to devise a peptide vaccine against the Zika virus based on studies of the sequences of the Zika envelope (ZIKV-E) protein. The search for peptide targets in the Zika virus, however, suffered on two grounds: insufficient sequence data and lack of crystal structure at the time of our initial study, areas of concern we had touched upon in our recent comment and review (Nandy and Basak, 2016a,b). The recent publication of the structure of mature Zika virus (Sirohi et al., 2016) solved one issue; we improvised on the other and applied our methodology to arrive at a set of four possible peptide regions that could be utilized in designing a vaccine against the Zika virus. These are indicators only to the eventual peptide vaccine design; more refinement, selection of appropriate adjuvants, delivery systems and lab trials are essential steps that are yet to be undertaken.

2. Results and discussions

Since a major part of our downloaded sequences were partial cds (listed in Table S1), we first needed to understand which part of the full protein they represent. An alignment exercise through MEGA 5.22 software showed that all these protein segments were almost perfectly aligned (221 matches out of 251 amino acids), so we were assured that they all represented the same part of the protein; nucleotide matches were understandably less, 557/753. To determine which part of the whole protein this fragment represented, we utilized an alignment-free method by doing 2D graphical representation of the nucleotide sequences of the envelope segment from the DQ859059 Uganda MR766 1947 polyprotein gene, and one partial cds sequence, KF383016 Senegal 2001 (Fig. 1). (BLAST or MEGA approaches could have helped but alignments include gaps, etc. which we avoid with our alignment-free model.) From the visual clues we could determine an approximate location and referring to the sequence of the Uganda gene we determined that the partial sequence start point coincided with nucleotide number 393 of the Ugandan gene and ending at around nucleotide number 1140. A check with amino acid (aa) sequences showed that the partial cds protein sequences of the Senegal sample had an extra peptide segment, DIGHETD (at aa no. 25 of fragment; corresponding position in Ugandan amino acid sequence is 155; see Table 1 for schematic), otherwise the match was very good; the extra 21 bases and other mutational changes cause the slight differences between the two plots in the corresponding areas. As an additional check we did the same exercise with KU926310 Brazil 2016 envelope gene (figure not shown) and found that the gene fragments matched with the corresponding segment of the Brazil 2016 gene which had the 21 nucleotides that were absent in the Ugandan sequence.

Fig. 1. The 2D graphical representation of a complete Zika envelope gene and a partial cds. The part of the full gene (from base no. 391 to 1143) that most closely matches the partial cds is coloured red.

The ASA (average solvent accessibility) profiles were predicted from the SABLE server (Anon., 2016f); we smooth out the numbers by taking a running average over the numbers for 12 amino acids at a time. Getting the solvent accessibility profile of the protein sequence fragment, KF383016, by leaving out the extra 7 amino acids (peptide DIGHETD corresponding to aa nos. 155 to 161 of the complete Brazil envelope sequence obtained from KU926310), and overlaying on the ASA profile of the full gene, we found that the match was excellent (Fig. 2). The match of ASA profiles is expected when the protein sequences match; here we use it as an independent verification that the matches by the two methods indeed overlap. This verified the location of the fragment in the whole gene and also that the protein sequence was overall well conserved. Repeating the exercise with the Brazil 2016 (KU926310) protein and the same fragment KF383016, but this time without excluding any amino acids of the sequence, we found a similar match.
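The smoothing and the removal of the extra seven residues before overlaying can be sketched as follows. This is a minimal Python illustration, not the code used in the study: the function names `running_average` and `drop_insert` are ours, and we assume an unpadded window, since the text does not state how the 12-residue window is aligned with the residue numbers.

```python
def running_average(values, window=12):
    """Mean of each run of `window` consecutive values (no padding at the ends)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    smoothed = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        smoothed.append(total / window)
    return smoothed


def drop_insert(values, start, length):
    """Remove `length` entries beginning at 1-based position `start`."""
    return values[:start - 1] + values[start - 1 + length:]
```

For the KF383016 fragment one would call `drop_insert(asa_values, 25, 7)` (the DIGHETD segment starts at aa no. 25 of the fragment) and then smooth both profiles with `running_average(..., 12)` before overlaying them; `asa_values` here stands for whatever per-residue list the solvent accessibility server returns.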

To get better statistics, we restricted our analysis to the 251 aa long protein sequence available from the 35 fragment proteins, and extracted equivalent segments of 251 amino acids from the 25 full protein sequences making a total of 60 fragment sequences in all.

Our interest here was to determine conserved peptide segments first, since the Zika virus is an RNA virus and therefore highly variable, and then later determine which ones of these were surface situated. Using a graphical representation and numerical characterization technique described in the Materials and Methods section, we computed pR values for a stretch of 12 amino acids starting from position 1 of protein sequence 1, then for position number 2 and so on until the end of the sequence. Similar sliding window analyses were done for protein fragment 2, then protein fragment 3, and so on until the entire set of 60 protein fragments were completed. The pR values at each position were then scanned to see how many different values were obtained at each position. This gave a profile of 12-mer protein stretch variability: the smaller the number, the more conserved the stretch. Fig. 3 shows the data smoothed out to average over 12 pR variability values. The figure also shows the ASA profile. From the two profiles we can select amino acid stretches that show least variability with highest ASA values. The tentatively identified segments are marked by straight short green horizontal lines in Fig. 3. The identified peptides are listed in Table 2.
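The sliding-window variability count described above can be sketched in Python as follows. This is a hedged illustration only: `graph_radius` is our compact reading of the pR descriptor defined in the Materials and Methods section, without the adjustments of Sarkar et al. (2015), and the rounding to nine decimals is our own guard against floating-point noise when counting distinct values.

```python
import math


def graph_radius(peptide):
    """Unadjusted pR: 20D walk with one axis per amino acid,
    then the distance from the origin to the centre of mass."""
    position, totals = {}, {}
    for residue in peptide:
        position[residue] = position.get(residue, 0) + 1
        # Add the current point of the walk to the running coordinate sums.
        for aa, coordinate in position.items():
            totals[aa] = totals.get(aa, 0) + coordinate
    n = len(peptide)
    return math.sqrt(sum((t / n) ** 2 for t in totals.values()))


def variability_profile(sequences, window=12):
    """Number of distinct window descriptors at each start position."""
    length = min(len(s) for s in sequences)
    profile = []
    for start in range(length - window + 1):
        values = {round(graph_radius(s[start:start + window]), 9)
                  for s in sequences}
        profile.append(len(values))
    return profile
```

A value of 1 at a position means every sequence gives the same descriptor for that 12-mer. Note that this unadjusted form returns the same value for two windows that differ only by a consistent renaming of residues, so it should be read as a sketch of the counting step rather than a complete conservation measure.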

Availability of a crystal structure, 5IRE, of the Zika virus envelope protein provided an opportunity to inspect and fine tune the range of these surface exposed segments. The Zika virus envelope protein crystal (Sirohi et al., 2016) shows a trimeric structure, found in acidic endosomal environment, with three membrane proteins and three N-Acetyl-D-Glucosamine attached, one to each monomer; the mature virus particles contain 180 copies of the envelope protein in 90 dimer configurations (Dai et al., 2016). Three domains are identified in each protein comprising beta barrels (domain I), finger-like domain (domain II) and immunoglobulin-like domain (domain III) with the fusion loop buried by domains II and III of the neighboring protein monomers; Fig. 4 shows a schematic diagram of the ZIKV envelope dimer comprising three domains each for proteins A and B.

Considering the identified sequences and the folded structure of the proteins, some of the segments considered surface exposed by ASA analysis for one protein could conceivably get covered by a neighboring protein. Our investigation of the placement of the peptide segments identified by ASA and conserved sequence analysis on the protein fragment showed that all four regions we identified lie on the surface and only a small part of Region 3 is covered by a neighboring protein. Fig. 5 shows the four regions individually and all together, and also a view of the location of the protein fragment, our working region, on the full envelope protein for comparison.

Fig. 2. ASA profiles of full Zika envelope protein from DQ859059 (blue) overlaid with the ASA profile of envelope protein fragment KF383016 (red) in % plotted against amino acid numbers. The close match indicates the location of the fragment in the full gene has been correctly diagnosed. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1

Schematic diagram of part of the amino acid sequences of the three proteins showing the extra peptide segment, DIGHETD, in the Senegal fragment (nucleotide sequence KF383016) and Brazil (KU926310) protein compared to Ugandan sample (DQ859059).

AA numbers   150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
DQ859059      G   M   I   V   N   -   -   -   -   -   -   -   E   N   R   A
KF383016      G   M   I   V   N   D   I   G   H   E   T   D   E   N   R   A
KU926310      G   M   I   V   N   D   T   G   H   E   T   D   E   N   R   A

Table 2

Regions with low amino acid variability and high solvent accessibility selected as per our procedure. Amino acid numbers are given as per fragment and as per full protein.

Region   Fragment start   Fragment end   Full protein start   Full protein end   Peptide
I        43               50             173                  180                SPRAEATL
II       97               106            227                  236                AGADTGTPHW
III      180              190            310                  320                AAFTFSKVPAE
IV       215              223            345                  353                MAVDMQTLT
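The two numbering schemes in Table 2 differ by a constant offset, which can be checked with a short calculation. The sketch below is ours and rests on two assumptions: that the envelope gene is read in frame from its base no. 1, and that the matching stretch runs from base no. 391 to 1143 as stated in the Fig. 1 caption (the text quotes slightly different approximate end points).

```python
FRAGMENT_FIRST_BASE = 391    # first base of the matching stretch (Fig. 1 caption)
FRAGMENT_LAST_BASE = 1143    # last base of the matching stretch (Fig. 1 caption)


def codon_index(base_number):
    """1-based codon (amino acid) number containing a 1-based base number."""
    return (base_number - 1) // 3 + 1


first_residue = codon_index(FRAGMENT_FIRST_BASE)   # residue of the full protein
last_residue = codon_index(FRAGMENT_LAST_BASE)
offset = first_residue - 1                         # fragment aa 1 maps to first_residue


def to_full_protein(fragment_position):
    """Convert a 1-based fragment position to full-protein numbering."""
    return fragment_position + offset
```

Under these assumptions the fragment spans residues 131 to 381 of the envelope protein (251 amino acids, 753 bases), and adding the offset of 130 reproduces every start and end value in Table 2.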

The four regions identified are spread across all 3 domains of the protein monomer: our identified region 1 occurs in the beta barrel domain I, region 2 occurs in the finger-like domain II, while regions 3 and 4 lie in the immunoglobulin-like domain III. The segments around the glycosylation site Asn154 are highly variable (Sirohi et al., 2016; Dai et al., 2016; Faye et al., 2014); they were not picked up by our analysis that searched for most conserved peptide stretches in the fragment sequences. The four regions we identified are apparently highly conserved and probably relate to the functionality of the individual domains.

The next step was to search for existence of linear and conformational epitope regions within the identified segments so that we can in the current instance identify such epitopes and expect suitable adjuvants to enhance the immune response (Purcell et al., 2007). For this purpose we have used two web based tools for epitope prediction, viz., the IEDB (Immune Epitope Database and Analysis Resource) (Anon., 2016c; Vita et al., 2010), which has been used in several applications (see, e.g., Cantor et al., 2011; Singh et al., 2010), and the ABCpred server (Anon., 2016b; Saha and Raghava, 2006), which has also given good results in different applications (Gu et al., 2008; Altindis et al., 2009), to predict MHC class I and MHC class II T-cell epitopes, and also linear and discontinuous B-cell epitopes. In a large scale evaluation of the IEDB Analysis Resource the predictions for MHC I epitopes have been seen to have accuracies of 90%, while for MHC II this is less at 60 to 70% (Zhang et al., 2008). The ABCpred server predictions for B-cell epitopes are reported to have accuracies of 65.9% (Anon., 2016b).

We used the IEDB server to determine the binding affinities for Human Leukocyte Antigens (HLA) for the four identified regions on the Zika virus partial envelope sequence; the HLA alleles were chosen to provide wide coverage of the Indian population. All predictions were done using the IEDB recommended procedure; most of the results reported are consensus of the various measures used, but where only one result was available, that procedure is named (details in website (Anon., 2016c)). The list of the binding affinities for MHC Class II T-cell epitopes, with percentile rank where low rank implies higher binding affinity, is given in Tables 3A and 3B. The regions identified by our protein variability and ASA analysis are marked in red. The tables show that all these regions have good binding affinities with the HLA-DRB alleles (Table 3A), with the best binding affinities apparently in the second last segment. The HLA-DP/DQ allele binding scores are comparatively poor (Table 3B). The MHC Class I T-cell epitopes gave poor results and are not listed here.

We also computed the linear epitope potential by another method: Table 4 shows the linear epitopes predicted by IEDB Ellipro Analysis Tools in which the regions identified by our method are highlighted in red. Tables 3A, 3B and 4 all show that our identified regions have good epitope potential as predicted by different methods.

[Figure 3 plot. Title: Profiles of ASA and sequence variability of Zika envelope partial protein. Horizontal axis: aa numbers, 0 to 250. Vertical axis: 0 to 40.]

Fig. 3. Profiles of average solvent accessibility (blue) in % and amino acid sequence variability (red) in numbers of the 60 Zika envelope protein fragments plotted against amino acid numbers. The short horizontal green lines are identified segments of the sequences where the amino acids are most conserved and have highest solvent accessibility. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

[Figure 4 schematic. Labels: domains A I, A II, A III of protein A; domains B I, B II, B III of protein B; two fusion loops.]

Fig. 4. Schematic of Zika envelope protein dimer structure.

It is known that the West Nile virus epitopes are mostly non-linear and conformational in mice models (De Filette et al., 2014). As a member of the flavivirus family the Zika virus can be expected to parallel this model. However, in our case where we have mostly partial sequences, determining conformational epitopes could be problematic, but an improvised approach is applied here.

Fig. 5. Display in spacefill rendering of ZIKV envelope protein structure (5IRE) by PyMOL. The conserved surface exposed segments identified by comparison of protein variability and ASA profile analysis shown in Fig. 3 and given in Table 2 are highlighted. Peptide stretches shown are, in terms of peptide stretch position numbers, (a) 173–180 (in red), (b) 227–236 (mauve), (c) 310–320 (pink), (d) 345–353 (purple). Panel (e) shows the full structure with the template protein with all four peptides highlighted in these colors. Panel (f) shows the same protein with our protein fragment working region in cyan. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 3A

IEDB predictions of binding affinity for MHC-II of allele HLA-DRB. The binding affinity is considered high for low percentile rank.

allele           start   end   peptide           method        percentile_rank
HLA-DRB1*15:01   44      58    PRAEATLGGFGSLGL   Consensus     2.48
HLA-DRB1*13:01   44      58    PRAEATLGGFGSLGL   sturniolo     6.41
HLA-DRB1*04:01   44      58    PRAEATLGGFGSLGL   Consensus     11.54
HLA-DRB1*07:01   44      58    PRAEATLGGFGSLGL   Consensus     15.19
HLA-DRB1*01:01   44      58    PRAEATLGGFGSLGL   Consensus     39.51
HLA-DRB1*03:01   44      58    PRAEATLGGFGSLGL   Consensus     70.94
HLA-DRB1*10:01   44      58    PRAEATLGGFGSLGL   NetMHCIIpan   78.27
HLA-DRB1*13:01   97      111   AGADTGTPHWNNKEA   sturniolo     34.99
HLA-DRB1*04:01   97      111   AGADTGTPHWNNKEA   Consensus     60.57
HLA-DRB1*03:01   97      111   AGADTGTPHWNNKEA   Consensus     66.47
HLA-DRB1*15:01   97      111   AGADTGTPHWNNKEA   Consensus     80.18
HLA-DRB1*07:01   97      111   AGADTGTPHWNNKEA   Consensus     87.43
HLA-DRB1*01:01   97      111   AGADTGTPHWNNKEA   Consensus     98
HLA-DRB1*10:01   97      111   AGADTGTPHWNNKEA   NetMHCIIpan   98.59
HLA-DRB1*01:01   182     196   FTFSKVPAETLHGTV   Consensus     2.74
HLA-DRB1*07:01   182     196   FTFSKVPAETLHGTV   Consensus     6.12
HLA-DRB1*04:01   182     196   FTFSKVPAETLHGTV   Consensus     8.45
HLA-DRB1*10:01   182     196   FTFSKVPAETLHGTV   NetMHCIIpan   10.95
HLA-DRB1*13:01   182     196   FTFSKVPAETLHGTV   sturniolo     19.99
HLA-DRB1*03:01   182     196   FTFSKVPAETLHGTV   Consensus     21.54
HLA-DRB1*15:01   182     196   FTFSKVPAETLHGTV   Consensus     29.82
HLA-DRB1*13:01   215     229   MAVDMQTLTPVGRLI   sturniolo     3.23
HLA-DRB1*03:01   215     229   MAVDMQTLTPVGRLI   Consensus     4.77
HLA-DRB1*04:01   215     229   MAVDMQTLTPVGRLI   Consensus     9.84
HLA-DRB1*10:01   215     229   MAVDMQTLTPVGRLI   NetMHCIIpan   13.98
HLA-DRB1*07:01   215     229   MAVDMQTLTPVGRLI   Consensus     22.11
HLA-DRB1*01:01   215     229   MAVDMQTLTPVGRLI   Consensus     24.51
HLA-DRB1*15:01   215     229   MAVDMQTLTPVGRLI   Consensus     33.16

Table 3B

IEDB predictions of binding affinity for MHC-II of allele HLA-DP/DQA. The binding affinity is considered high for low percentile rank.

allele                      start   end   peptide           method      percentile_rank
HLA-DQA1*05:01/DQB1*03:01   43      57    SPRAEATLGGFGSLG   Consensus   20.48
HLA-DPA1*01/DPB1*04:01      43      57    SPRAEATLGGFGSLG   Consensus   54.76
HLA-DPA1*01:03/DPB1*02:01   43      57    SPRAEATLGGFGSLG   Consensus   59.48
HLA-DQA1*01:01/DQB1*05:01   43      57    SPRAEATLGGFGSLG   Consensus   65.55
HLA-DQA1*01:02/DQB1*06:02   43      57    SPRAEATLGGFGSLG   Consensus   70.71
HLA-DQA1*05:01/DQB1*02:01   43      57    SPRAEATLGGFGSLG   Consensus   78.28
HLA-DQA1*05:01/DQB1*03:01   97      111   AGADTGTPHWNNKEA   Consensus   36.05
HLA-DQA1*01:01/DQB1*05:01   97      111   AGADTGTPHWNNKEA   Consensus   84.75
HLA-DQA1*01:02/DQB1*06:02   97      111   AGADTGTPHWNNKEA   Consensus   87.39
HLA-DPA1*01:03/DPB1*02:01   97      111   AGADTGTPHWNNKEA   Consensus   88.95
HLA-DQA1*05:01/DQB1*02:01   97      111   AGADTGTPHWNNKEA   Consensus   92.22
HLA-DPA1*01/DPB1*04:01      97      111   AGADTGTPHWNNKEA   Consensus   92.49
HLA-DPA1*01:03/DPB1*02:01   180     194   AAFTFSKVPAETLHG   Consensus   14.2
HLA-DPA1*01/DPB1*04:01      180     194   AAFTFSKVPAETLHG   Consensus   15.94
HLA-DQA1*05:01/DQB1*02:01   180     194   AAFTFSKVPAETLHG   Consensus   21.84
HLA-DQA1*05:01/DQB1*03:01   180     194   AAFTFSKVPAETLHG   Consensus   28.32
HLA-DQA1*01:02/DQB1*06:02   180     194   AAFTFSKVPAETLHG   Consensus   38.31
HLA-DQA1*01:01/DQB1*05:01   180     194   AAFTFSKVPAETLHG   Consensus   52.51
HLA-DQA1*01:02/DQB1*06:02   215     229   MAVDMQTLTPVGRLI   Consensus   34.9
HLA-DPA1*01:03/DPB1*02:01   215     229   MAVDMQTLTPVGRLI   Consensus   40.75
HLA-DQA1*05:01/DQB1*02:01   215     229   MAVDMQTLTPVGRLI   Consensus   40.76
HLA-DPA1*01/DPB1*04:01      215     229   MAVDMQTLTPVGRLI   Consensus   44.04
HLA-DQA1*05:01/DQB1*03:01   215     229   MAVDMQTLTPVGRLI   Consensus   44.17
HLA-DQA1*01:01/DQB1*05:01   215     229   MAVDMQTLTPVGRLI   Consensus   45.67

To get estimates of the binding affinities for conformational epitopes, we performed an IEDB Ellipro analysis on the whole protein through the 5IRE Protein Data Bank sequence data to determine presence of discontinuous epitopes for B cells, and then short-listed only the results that matched in peptide segments those already identified for the 251 aa protein fragment sequences. The results are shown in Table 5; the parts of the sequences that match with our identified conserved segments are again marked in red. Fig. 6 shows the Zika envelope protein marked with the portions of the conformational epitopes that have overlaps with our identified regions. Overall, we found that the regions identified in Table 2 not only had good T-cell and B-cell affinities as shown in Tables 3 and 4, all of them had overlaps with discontinuous epitopes also (Table 5). Thus, the peptide segments we had identified from the set of 60 full and partial sequences of the Zika virus E-protein appear to hold reasonable potential to act as peptide vaccines.
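The short-listing step, keeping only discontinuous epitopes that share residues with the identified regions, amounts to an interval overlap test. The Python sketch below is illustrative only: the region boundaries are the full-protein positions from Table 2, while the function name `overlapping_regions` and the residue list in the usage note are hypothetical and are not taken from Table 5.

```python
# Full-protein start and end positions of the four regions (Table 2).
REGIONS = {"I": (173, 180), "II": (227, 236), "III": (310, 320), "IV": (345, 353)}


def overlapping_regions(epitope_residues, regions=REGIONS):
    """Map each region name to the epitope residues that fall inside it."""
    hits = {}
    for name, (start, end) in regions.items():
        shared = sorted(r for r in set(epitope_residues) if start <= r <= end)
        if shared:
            hits[name] = shared
    return hits
```

For a made-up discontinuous epitope with residue numbers `[170, 174, 175, 300, 346]`, the call `overlapping_regions([170, 174, 175, 300, 346])` reports residues 174 and 175 in region I and residue 346 in region IV; an epitope with no shared residues returns an empty dictionary and would be dropped from the short list.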

In this context, for additional support, we used the ABCpred server (Anon., 2016b) that predicts binding affinities specifically for B-cell epitopes with an accuracy of 65.9%. We find that three of the identified regions show strong binding affinities (Table 6), where again we have marked our identified peptides in red.

We checked all the four identified regions for auto-immune threat by doing protein–protein BLAST. No homology with any human protein segment was found.

In the search for immune response to the ZIKV infections, antibody-ZIKV complex through the fusion loop has been discussed by Dai et al. (2016). In this report we have worked with partial proteins and through analysis of many sequences have shown four possible antibody target regions. Dai et al. (2016) showed that the fusion loop on the Zika virus can bind to antibodies (see structure 5JHL in the Protein Data Bank), which neutralize circulating ZIKV-E in vitro and in mice. The four new conserved surface exposed sites that we have determined to have epitope binding efficiency augment the vaccine target sites and have the potential to be effective as more accessible candidates for peptide vaccines (data summarized in Table 7).

3. Materials and methods

We downloaded 43 sequences of the Zika virus envelope gene and protein from the NCBI GenBank, all of which, barring two, were partial sequences. We retained 32 partial gene sequences with 753 bases each (Table S1) and used the complete sequence to determine the exact location of the partial sequences. Availability of larger number of complete Zika envelope sequences would have helped our analysis much more; there are more African sequences in our database than Asian/American. However, the MEGA 5.22 alignment exercise has shown that the sequences of the segment we have in our database are quite closely aligned implying that for this set at least the conclusions we arrive at would be universally applicable.

Table 4

IEDB Ellipro predicted linear epitope(s) for the envelope protein monomer of Zika virus. The parts of the sequences that match with our identified conserved segments are marked in red. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)

Fig. 6. The Zika envelope protein and discontinuous epitopes. The portion coloured blue shows the fragment we are working with, our working region. The mauve, brown and yellow regions are parts of discontinuous epitopes that overlap the fragment protein. (The fourth region is not visible in this view.) (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For comparison of the partial sequences with the full sequences to determine the location, we used the 2D graphical representation method (Nandy, 1994) where a DNA or an RNA sequence is plotted in a Cartesian co-ordinate system by starting at the origin and plotting a point one step away in the negative x-direction for an adenine, one step in the positive y-direction for a cytosine, one step in the positive x-direction for a guanine and in the negative y-direction for a thymine/uracil. Following this algorithm in sequence over the entire DNA or RNA sequence traces out a curve in the 2D space that is reflective of the distribution of bases in the sequence. Such plots of gene sequences can be used for visual clues on similarities and dissimilarities in DNA/RNA sequences. Many other schemes have been proposed for such representations (Nandy et al., 2006), but we use the method just described being intuitively simple and adequate for the present purpose.
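The plotting rule just described can be written down in a few lines of Python. This is a minimal sketch under our own assumptions: the function name `walk_2d` is ours, and characters other than A, C, G, T and U are simply skipped, a choice the text does not specify.

```python
# Step directions follow the rule stated in the text.
STEPS = {
    "A": (-1, 0),   # adenine: one step in the negative x-direction
    "C": (0, 1),    # cytosine: one step in the positive y-direction
    "G": (1, 0),    # guanine: one step in the positive x-direction
    "T": (0, -1),   # thymine: one step in the negative y-direction
    "U": (0, -1),   # uracil is treated like thymine
}


def walk_2d(sequence):
    """Return the list of (x, y) points traced by a nucleotide sequence."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue  # assumption: ambiguous bases are skipped
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

The returned points can be passed to any plotting tool to draw the curve; two sequences with a similar distribution of bases trace similar curves, which is the visual clue used to place the partial cds on the full gene.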

In the case of protein sequences too there are many schemes for numerical characterization (Randic et al., 2011). We use a 20D method (Nandy et al., 2009) where the amino acids are each assigned an individual axis and the protein sequence graph is plotted using an algorithm as in the case of the nucleotides, except this time the sequence plot is entirely in the imaginary domain. However we can define for each axis a weighted average value of the co-ordinates, $\mu_x$, of the associated amino acid and compute a distance of the weighted centre of mass from the origin:

\[
\mu_1 = \sum_i x_{1i}/N; \quad \mu_2 = \sum_i x_{2i}/N; \quad \ldots; \quad \mu_{20} = \sum_i x_{20i}/N
\]

\[
p_R = \sqrt{\mu_1^2 + \mu_2^2 + \ldots + \mu_{20}^2}
\]

where $x_{ji}$ is the co-ordinate of the jth amino acid, j = 1, 2, ..., 20, i = 1, 2, 3, ..., N, N is the length of the protein sequence and $p_R$, written pR in the text, is the graph radius from the origin to the centre of mass. Using these measures with some adjustments as mentioned in an earlier work (Sarkar et al., 2015), we have found the protein graph descriptor, pR, to be quite useful in measuring similarities/dissimilarities between sequences since any change in the amino acid composition and distribution changes the pR. This is an alignment-free model and the comparisons are on actual sequence basis. Without any need to ascribe any meaning to the absolute values of the pR, we can use this property of the descriptor to track changes in any sequence in whole or part.
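As a worked illustration of these formulas, the following Python sketch computes pR for a protein string. It is our reading of the definition and omits the adjustments of Sarkar et al. (2015): we assume the walk moves one unit along an amino acid's axis each time that residue occurs, that the co-ordinate of every axis is averaged over all N steps, and that non-standard characters are skipped. The name `graph_radius` and the axis order are ours; the result does not depend on the order.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed axis order, one axis per amino acid


def graph_radius(protein):
    """pR: distance from the origin to the centre of mass of the 20D walk."""
    position = dict.fromkeys(AMINO_ACIDS, 0)   # current point of the walk
    totals = dict.fromkeys(AMINO_ACIDS, 0)     # running sum of each co-ordinate
    n = 0
    for residue in protein.upper():
        if residue not in position:
            continue  # assumption: non-standard residues are skipped
        position[residue] += 1                 # one step along this residue's axis
        n += 1
        for aa in AMINO_ACIDS:
            totals[aa] += position[aa]
    if n == 0:
        raise ValueError("no standard residues in sequence")
    # mu_j = totals[j] / N, then pR = sqrt(sum of mu_j squared)
    return math.sqrt(sum((t / n) ** 2 for t in totals.values()))
```

For the two-residue string "AA" the walk visits co-ordinates 1 and 2 on the A axis, so the mean is 1.5 and pR is 1.5. Rearranging residues generally changes the value, for example "AAC" and "ACA" give different results, which is the property used to track changes between sequence windows.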

Table 5

IEDB Ellipro predicted discontinuous epitope(s) for envelope protein monomer of Zika virus. The parts of the sequences that match with our identified conserved segments are marked in red. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)

Alignment-free techniques are relatively new in the biomolecular analysis field. While multiple alignment methods have won wide acclaim and use, comparison of a large number of sequences requires long times for multiple alignment results. Computations in alignment-free methods, on the other hand, where each biomolecular sequence is represented by a number, called a descriptor, that is unique to the sequence, are very fast and accurate (see Yu et al., 2013, for an example) irrespective of comparisons for tens or hundreds of sequences. The biodescriptors can also be computed for the whole sequence or parts thereof, which can be then used to compare sequence segments rapidly over hundreds of sequences; multiple alignment methods will enable residue by residue comparison, a very difficult task. Multiple alignment methods depend upon various models to introduce gaps; alignment-free methods are parameter free and have the potential to give new insights. In our analyses of various viruses (Ghosh et al., 2010, 2012; Sarkar et al., 2015) where we usually compare hundreds of sequences, we have adopted the alignment-free technique for sequence comparison purposes. Multiple alignment techniques are used to complement our analysis where relevant. In this paper where we deal with <100 sequences, we have used both techniques as appropriate.

The 3D structure of the Zika envelope protein, 5IRE, was downloaded from Protein Data Bank and viewed in Cn3D of GenBank and PyMOL (Anon., 2016e).

4. Conclusions

Our interest in this paper has been to identify peptide segments on surface proteins that can be considered for possible design of peptide vaccines against the Zika virus. The recent availability of 3D structural data on the Zika virus envelope protein has helped this search. Our analysis was based on a total of 60 Zika envelope sequences comprising 25 complete envelope sequences and 35 partial sequences. We have improvised upon the short, half-size segments of the envelope protein sequences available in the database and identified peptide regions that hold good potential as epitopes to act as starting points for laboratory work towards peptide vaccines against the Zika virus. As mentioned in the Introduction, in the absence of significant number of complete sequences for analysis, the peptides identified here are only a part of the full repertory of possible peptide targets, but could be taken as precursor to a multivalent peptide vaccine; more refinement, selection of appropriate adjuvants, delivery systems and lab trials are essential steps that will have to be undertaken to complete the process. We believe such bioinformatics approaches will assist in the process of eventual design of a rational peptide vaccine against the Zika virus which has emerged in epidemic form in Latin America since last year and has potential to spread and cause havoc across the world.

Author contributions

AN conceived and designed the research work and wrote the paper, SDey downloaded and analysed the data, SCB guided the work and contributed to the write-up of the paper, SDas gave valuable suggestions, PN worked in close collaboration.

Conflicts of interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors did not receive any grants for this work from any source. The authors would like to thank the anonymous referees for valuable suggestions that have improved the paper immensely.

Table 6

ABCpred determination of B-cell binding affinities. The highlighted segments feature in the peptides selected earlier. Note that high score indicates good binding affinity. (For interpretation of the references to colour in this table legend, the reader is referred to the web version of this article.)

Table 7

Summary listing of 4 identified peptide segments on the 251 aa protein segments and predictions of binding affinities for the epitopes. T-cell epitope columns give percentile ranks# for HLA-DRB and HLA-DQ/DP; B-cell epitope columns give overlap with discontinuous epitopes in Ellipro@ and the ABCpred score#.

ZIKV-E region   Start position   Peptide       HLA-DRB consensus best   HLA-DRB average*   HLA-DQ/DP consensus best   HLA-DQ/DP average*   Ellipro@   ABCpred score#
1               173              SPRAEATL      2.48                     32.04              20.48                      58.21                Y          0.83
2               227              AGADTGTPHW    60.57                    75.17              36.05                      80.3                 Y          0.75
3               310              AAFTFSKVPAE   2.74                     14.23              14.2                       28.52                Y          0.88
4               345              MAVDMQTLT     4.77                     15.94              34.9                       41.71                Y          -

The percentile ranks are as given in Tables 3A and 3B for MHC II binding; lower scores imply higher binding.

* The averages are computed from all ranks for each peptide.

@ "Y" in the case of discontinuous epitopes implies that the identified peptides form a part of the discontinuous epitopes.

# The B-cell scores (max. 1) were derived from ABCpred server; higher scores imply higher binding affinities.
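The "consensus best" and "average" columns of Table 7 can be reproduced from the rows of Table 3A. The Python sketch below does this for regions 1 and 2; it is our own illustration, with the hypothetical helper `summarize`, and it assumes that "consensus best" means the lowest percentile rank among rows whose method is Consensus, while the average runs over all rows for the peptide.

```python
# (method, percentile_rank) rows copied from Table 3A.
REGION_1 = [("Consensus", 2.48), ("sturniolo", 6.41), ("Consensus", 11.54),
            ("Consensus", 15.19), ("Consensus", 39.51), ("Consensus", 70.94),
            ("NetMHCIIpan", 78.27)]
REGION_2 = [("sturniolo", 34.99), ("Consensus", 60.57), ("Consensus", 66.47),
            ("Consensus", 80.18), ("Consensus", 87.43), ("Consensus", 98.0),
            ("NetMHCIIpan", 98.59)]


def summarize(rows):
    """Return (best Consensus rank, mean of all ranks) for one peptide."""
    ranks = [rank for _, rank in rows]
    consensus = [rank for method, rank in rows if method == "Consensus"]
    return min(consensus), sum(ranks) / len(ranks)
```

Under these assumptions `summarize(REGION_1)` gives a best consensus rank of 2.48 and a mean close to 32.05, and `summarize(REGION_2)` gives 60.57 and a mean close to 75.18, in line with the Table 7 entries of 32.04 and 75.17, which appear to be truncated rather than rounded.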
