Contents lists available atScienceDirect
Theoretical
Computer
Science
www.elsevier.com/locate/tcs
New
insights
on
neutral
binary
representations
for
evolutionary
optimization
Marisol
B. Correia
a,
b,
∗
aESGHT,UniversityoftheAlgarve,Faro,Portugal bCEG-IST,UniversidadedeLisboa,Lisbon,Portugala
r
t
i
c
l
e
i
n
f
o
a
b
s
t
r
a
c
t
Articlehistory: Received21May2015
Receivedinrevisedform7March2016 Accepted21May2016
Availableonline26May2016 CommunicatedbyI.Petre Keywords:
Redundantbinaryrepresentations Neutrality
Errorcontrolcodes Propertiesofrepresentations Evolutionaryalgorithms NK fitnesslandscapes
ThispaperstudiesafamilyofredundantbinaryrepresentationsNNg(,k),whicharebased onthemathematicalformulationoferrorcontrolcodes,inparticular,onlinearblockcodes, whichareusedtoaddredundancyandneutralitytotherepresentations.Theanalysisofthe propertiesofuniformity,connectivity,synonymity, localityand topologyofthe NNg(,k) representationsis presented,aswellas theway an(1+1)-ES canbemodeledusingMarkov chainsandappliedtoNK fitnesslandscapeswithadjacentneighborhood.
The results show that it is possible todesign synonymouslyredundant representations that allow an increase of the connectivity between phenotypes. For easy problems, synonymouslyNNg(,k)representations,withhighlocality,andwhereitisnotnecessary topresenthigh valuesofconnectivityarethemostsuitableforanefficient evolutionary search. On the contrary, for difficult problems, NNg(,k) representations with low locality,whichpresentconnectivitybetweenintermediatetohighandwithintermediate values ofsynonymity are thebest ones.Theseresults allow toconclude thatNNg(,k) representationswithbetterperformanceinNK fitnesslandscapeswithadjacent neighbor-hood do not exhibit extreme values of any of the properties commonly considered in the literature ofevolutionary computation. This conclusion is contrary to what one would expectwhen takinginto accountthe literature recommendations. Thismay help understandthecurrentdifficultytoformulateredundantrepresentations,whichareproven tobesuccessfulinevolutionarycomputation.
©2016ElsevierB.V.Allrightsreserved.
1. Introduction
Theinfluenceofredundancy andneutralityon the behavioroftheevolutionarysearch hasbeendiscussedinthe evo-lutionary computation (EC) field. Some studies consider that the capacity that random mutations can contribute to the improvementoftheevolutionarysearch canbeachievedusingredundantrepresentations
[1,2]
,whileothersalertthatthe additionofrandomredundancydoesnotappeartoimprovethatcapacity[3]
.TheneutraltheoryofmolecularevolutionproposedbytheJapanesescientistKimura
[4]
andtheexistenceofredundancy and neutrality in thegenetic code which appears in several waysare considered to be responsible for the evolutionary capacityoflivingbeings.Kimuraobservedthatinnaturethenumberofdifferentgenotypeswhichstorethegeneticmaterial*
Correspondenceto:ESGHT,UniversityoftheAlgarve,Faro,Portugal. E-mailaddress:mcorreia@ualg.pt.http://dx.doi.org/10.1016/j.tcs.2016.05.033 0304-3975/©2016ElsevierB.V.Allrightsreserved.
of an individual greatly exceeds the number of differentphenotypes, thus, the representation which describeshow the genotypesareassignedtothephenotypesmustberedundant,andneutralmutationsbecomepossible.
Amutationisneutralifitsapplicationtoagenotypedoesnotresultinachangeofthecorrespondingphenotype.Aslarge parts ofthegenotype havenoactual effecton thephenotype,evolution canuse themasa storeforgenetic information thatwas necessaryforsurvivalinthepast,andasaplaygroundfordevelopingnewpropertiesintheindividualthatcould become advantageous in the future [5]. This has motivated the proposals of redundant representations in evolutionary computation
[6–8]
.Thedynamicsofevolutionarychangeinbiologyandofsearchinevolutionaryoptimizationcanbevisualizedasa popu-lationclimbingthepeaksofafitnesslandscape
[9]
.Geneticchangessuchasmutationscanbevisualizedasmovementsona dimensionalgeographicalmapscaleduptoadimensionalitycorrespondingtothenumberofgenes.Somefitnesslandscapes maycontain ‘ridges’orlevelconnectedpathwaysofthesame fitness,whichareconnectedvia single potentialmutations andthatareconsideredneutralnetworks[10]
.There is a great interest in how redundantrepresentations andneutral search spaces influencethe behavior andthe evolvability ofevolutionary algorithms (EAs). Diverseredundant representations were usedand, withsome ofthese, the performance oftheEAs improvedandthe evolvability,whichisdefinedastheabilityofrandomvariations tosometimes produce improvements andasthegenomes abilityto produceadaptive variantswhen actedupon by thegenetic system
[11],isincreased
[2,1,5]
.Withothers,asexplainedin[12,3]
,addingredundancyappearedtoreduceperformancewithmost oftheproblemsapplied.Thelossofdiversityinthepopulation,thelargersizeofthegenotypicsearchspaceandthefact thatdifferentgenotypesthatrepresentthesamephenotypecompeteagainsteachother,aresomeofthereasonsstatedfor the deterioration ofEAsperformance. Onthe contrary,theability ofapopulation to adaptafterchanges (redundantand neutralrepresentationsallowapopulationtochangethegenotypewithoutchangingthephenotype)isoneofthereasons givenfortheincreasedperformanceofEAs.Theresearchexplainedinthispaperisanimprovementoftheresearchpresentedin
[13,14]
,wheretheeffectof redun-dancy andneutrality inthebehavior ofa simpleevolutionaryalgorithmusingtwo differentfamilies ofredundantbinary representations were studied. These families of representations proposed in [15–17], consist ofneutral binary represen-tations, denoted as NNg(,
k)
, based on the mathematical formulation of errorcontrol codes and of non-neutral binaryrepresentations, denotedasNonNNZ
(
,
k)
,basedonlineartransformationsandwhichdevelopment wasperformedincre-mentally, startingwitha non-codingredundantrepresentation,andincorporatingadifferentproperty ineachphase, first polygeny,followedbypleiotropy.
ThisresearchgivesnewinsightsintothewaytheenumerationofNNg
(,
k)
representationscanbedonethroughtheclo-sureofan initialcanonicalrepresentationunderaneighborhoodrelationship,aswell aspresentsastudyoftheproperties ofrepresentations,suchas,uniformity,connectivity,synonymity,localityandtopology,whicharecommonlyanalyzedinthe literatureofECfield,andwhicharealsostudiedforthefamilyofNNg
(,
k)
representations.Theperformanceoftheserepre-sentationsinNK fitnesslandscapeswithadjacentneighborhoodareanalyzedusingasimple(1+1)-ES (EvolutionaryStrategy)
[18,19],whichismodeledusingMarkovchains.Thiscanbedonebecause(1+1)-ES correspondstoastochastichillclimbing
[20],whereanindividualparentgeneratesanindividualchildandthebestofbothbecomestheparentinthenextiteration. Theoutlineofthispaperisasfollows.Section2explainshowredundancyandneutralityaremanifestedinbiology,how they areusedinECfield andthewaysasredundancy andneutralitycanbeincorporatedinredundantbinary representa-tions. Insection3somepropertiesofthistype ofrepresentationsareexposed.Theerrorcontrolcodesusedto definethe neutral binaryrepresentations arepresented insection 4.The familyofneutralbinary representationsNNg
(,
k)
arepre-sentedinsection5,includingthealgorithmsofcodinganddecodingandthedefinitionofcanonicalneutralrepresentation. Section 6explainsthedefinitionsandalgorithmsusedtoenumeratethedifferentneutralrepresentationsthatcanbe gen-erated.The propertiesexplainedinsection3are extendedtotheNNg
(,
k)
representationsandtherelationshipsbetweenthesepropertiesare exhibitedinsection7.A briefexplanationaboutthe NK fitnesslandscapesisgiveninsection 8.The waysasNNg
(,
k)
representationscanchangetheruggednessofNK fitnesslandscapesareanalyzedinsection9.Section10presents how Markovchains can beused to modelan (1+1)-ES,presenting thetransitionmatrix foreach type of repre-sentation andtheprobability of reaching theglobaloptimum ofthe NK fitness landscapes.Section 11 presentshow the performance ofan(1+1)-ES is consideredasthe expectedvalue toachievethebestsolutionandshowsan analysisofthe performance ofNNg
(,
k)
representations versus theproperties oftheserepresentations. Finally,section 12 discusses theconclusionsandpresentssomeguidelinesforfuturework.
2. Redundantbinaryrepresentations
CharlesDarwin’stheoryofnaturalselection
[21]
assumesthatnaturalselectionisthedrivingforceofevolutionandthat randomlyadvantageousmutationsarefixedduetonaturalselectionandcanbepropagatedfromgenerationtogeneration. Ontheother hand,theneutraltheoryofmolecularevolutionassumesthatthedrivingforceofmolecularevolutionisthe randomfixationofneutralmutationratherthanthefixationofadvantageousmutationsbynaturalselection[5]
.Inbiology,thegenotypecorresponds totheinformationstoredinchromosomes,allowing todescribe individualsatthe levelofgenes,whilethephenotypedescribestheappearanceoftheindividual.Inthegeneticcode,therearemanifestations of redundancy andneutrality at severallevels. Forexample,the genetic codeis considered redundantbecause thereare morecombinationsofnucleotidesthanaminoacidsformed,thisiswhatiscalledthedegeneracyofthegeneticcode
[22]
.Also, single secondary structure isgenerally representedby manydifferent sequences, indicatinga highly neutral redun-dantgenotype–phenotypemappingbecausemutations indifferentpositions inthesequencesdonot affectthesecondary structure
[23,24]
.Themappinggenotype–phenotypeorrepresentationusesthegenotypicinformationtobuildthephenotype. Representa-tionsareconsideredredundantifthenumberofgenotypesexceedsthenumberofphenotypes
[5]
.Thenotionofneutrality andneutralnetworkshaveattractedtheattentionofseveralresearchers,duetothepotentialthatneutralnetworkshavein establishingalternativewaysfortheevolutionofthepopulation,allowingtheimprovementofthequalitysearch.Thereisnotasingledefinitionofneutralityanditisimplicitlydefinedasthepresenceofneutralnetworksinthesearch space. Aneutralnetworkcan bedefinedasapath throughgenotypespacevia mutations,whichleave fitnessunchanged
[10], oraspoints inthesearch spacethat are connectedthrough neutralpoint-mutations,where thefitness isthesame forallthepointsinsuchnetwork
[7]
,orasasetofgenotypesconnectedby singlepointmutationsthatmap tothesame phenotype[2]
.Thelatterperspectiveistheoneusedinthisresearch.Whenusingabinaryrepresentation,wheregenotypesandphenotypes arecodedasstringswithlength
,agenotype– phenotypemapping fgofanon-redundantrepresentationisdefinedasfollows:
fg
: {
0,
1}
→ {
0,
1}
where:
∀(
x0, . . . ,x−1)∈ {
0,
1}
,
∀(
y0, . . . ,y−1)∈ {
0,
1}
,
fg(x0, . . . ,x−1)
=
fg(y0, . . . ,y−1)⇒ (
x0, . . . ,x−1)= (
y0, . . . ,y−1)A metric hasto be definedon the search space
when usingsearch algorithms. Based on the metric, the distance
d
(
xa,
xb)
betweentwoindividualsxa∈
andxb∈
describeshowsimilarthetwoindividualsare.Thelargerthedistance,themore differenttwo individualsare [5].In general, differentmetricscan be definedforthesame searchspace, where differentmetrics result indifferentdistances anddifferent measurements ofthe similarityof solutions.In thisway, two individualsareneighborsifthedistancebetweentwoindividualsisminimal.Forexample,whenusingtheHammingmetric
[25] for binary strings, theminimal distancebetweentwo individuals isd
=
1.Therefore, two individuals xa and xb areneighborsifthedistanced
(
xa,
xb)
=
1.A waytoproduce a redundantcode isto increase the numberof genotypes,allowing for each phenotypeto be rep-resentedby severalgenotypes. Tousea redundantrepresentation forsolving anoptimization problem, itis necessaryto define agenotype–phenotypemapping fg anda phenotype–fitnessmapping fp.Denoting
g asthegenotypicspaceand
p asthephenotypicspace,agenotype–phenotypemapping fg determineswhichphenotypesresultfromthedecodingof
eachgenotypesandcanbedefinedasfollows:
fg(xg)
:
g→
pxg
→
xp=
fg(xg)wherexg
∈
g denotesagenotypeandxp∈
p denotesthecorresponding phenotype.The genotype–phenotypemappingfgonlyallowstodeterminewhichgenotypesrepresentaparticularphenotype,butdoesnotgiveanyinformationaboutthe
similaritybetweenthesolutions.Thephenotype–fitnessmappingassignsafitnessvalue fp
(
xp)
toeachphenotypexp∈
p:fp
(
xp):
p→ R
Themetricsusedmaybedifferentinthegenotypicsearchspaceandinthephenotypicsearchspace.Themetricofthe phenotypicsearchspaceisdeterminedbytheproblemtobesolved,i.e.,dependsonthenatureofthedecisionvariables.For example,ifthe problemtobe optimizedistheNK fitnesslandscape,theHammingmetriccan beusedinthephenotypic search space. Onthe contrary, thegenotypic search spacemetricdependsnot only onthe space, butalso onthe search operatorthatisusedonthegenotypes.The searchoperatorandthemetricusedinthegenotypicsearch spacedetermine eachotherandcannotbechosenindependentlyofoneanother.
3. Propertiesofredundantbinaryrepresentations
Inorderto clarifysome ofthequestionsin theliteratureaboutunderwhichcircumstances whichtypesofredundant representationscanbebeneficialforEA,Rothlauf
[5]
identifiesthefollowingproperties:Uniformity Arepresentationisuniformlyredundantifallphenotypesarerepresentedbythesamenumberofgenotypes; Connectivity Arepresentationhashighconnectivityifthenumberofphenotypeswhichareaccessiblefromagiven
phe-notypebysinglebitmutationsishigh;
Synonymity Arepresentation issynonymouslyredundantif thegenotypes that are assignedto thesame phenotypeare similartoeachother;
To understand theproperties presented,it is importantto clarify thenotion of phenotypicneighborhood. For a non-redundant binary representation,wherein the phenotype is of length k and inthat the reachable phenotypes from any phenotype are all different, the connectivityis k. While inredundantrepresentations, the phenotypicneighborhood of a phenotype corresponds to thephenotypes that are reachable froma particularphenotype by single bitmutations of the genotypes thatrepresentit,connectivitycorrespondsto thenumberofdifferentphenotypes, whichconstitutethe pheno-typic neighborhoodofaphenotype.Forexample,foranuniformredundantrepresentation,wheregenotypeshavelength
andphenotypeshavelengthk,ifphenotypexp
=
fg(
xg)
isrepresentedbythefollowingm genotypes,{
xg1,
. . . ,
xgm}
andaseach genotypehas
neighbors,thenthesetofneighborsofthem genotypesis
{
xg11,
. . . ,
xg1,
. . . ,
xgm1,
. . . ,
xgm}
.As eachgenotyperepresentaphenotype,thesetofphenotypesofthissetis:
fg
(
xg11), . . . ,
fg(
xg1), . . . ,
fg(
xgm1), . . . ,
fg(
xgm)
Thenumberofdifferentelementsofthissetcorrespondstotheconnectivity.
Accordingto
[5]
,arepresentationissynonymouslyredundantifthegenotypicdistancesbetweenall xg∈
xgp aresmallforall differentxp.Inotherwords,ifforall phenotypesthesumoverthedistancesbetweenallgenotypes thatrepresent
thesamephenotypeisreasonablysmall,i.e.:
xp∈p 1 2xg1∈xpg xg2∈xpg d
(
xg1,
xg2)
(1)wherexg1
=
xg2,d(
xg1,
xg2)
representsthedistancebetweenthegenotypexg1 andgenotypexg2 andxp
g denotesthesetof
genotypesthatrepresentthephenotypexp.The 12 fractionarisesbecausetheHammingdistanceissymmetric.
Furthermore, thelocality ofa representationdescribeshow well thegenotypic neighborhoodcorresponds to the phe-notypic neighborhood.Thelocalityofarepresentationisapropertythatcharacterizes bothredundantandnon-redundant representations.Thelocalitydependsontherepresentationused fg,onthemetricdefinedin
gandonthemetricdefined
in
p.Rothlauf
[5]
proposesthefollowingexpressionforthelocalitydm:dm
=
d(xg1,xg2)=dg min d(
xp1,
xp2)
−
d p min (2)where d
(
xp1,
xp2)
denotes the distance betweenthe xp1 and xp2 phenotypes,d(
xg1,
xg2)
describes the distancebetween the corresponding genotypes xg1 and xg2,dg
min isthe minimumdistancebetweentwo neighboringgenotypesanddpmin
indicates theminimumdistancebetweentwo neighboringphenotypes.When theHammingmetricisusedinphenotypic andgenotypicspaces,dgmin
=
1 anddpmin=
1,respectively. Furthermore,dm=
0 indicatesthat forall genotypesthat areneighbors,theirphenotypesarealsoneighborsandtherepresentationisconsideredperfectaccordingtothelocality.When
dm is low, therepresentation hashigh locality andthis meansthat smallchanges in thegenotypic spacecorrespond to
smallchangesinthephenotypicspaceandvice-versa.
It is fairto say that theimportance and therelationships between theseproperties arenot yet fullyunderstood. For example,Rothlauf
[5]
statedthat usingneutralsearch spaces,wheretheconnectivitybetweenthephenotypes isstrongly increased by the use ofredundant representation,allows many differentphenotypes to be reachedin one single search step, however, using non-synonymously redundant representations results in random search which decreases the effi-ciency of EAs. The author claimed that when using synonymously redundant representations, the connectivity between the phenotypes is not increased. Thisconclusion is not consistent withthe results obtainedwith theneutral redundant representations presented in [14], ascan be verified in section 7.6of thispaper, where synonymouslyredundant repre-sentations allowincreasingthe connectivitybetweenphenotypes, whencomparedwiththecorresponding non-redundant representation.Next,theerrorcontrolcodes
[26,27]
areexplainedaswell asthe wayasthese codesareusedtoadd redundancyand neutralitytoabinaryrepresentation.4. Errorcontrolcodesusedinbinaryrepresentations
Inthebinarycase,thedivisionofthegenotypicspaceintosubspacesinwhichtheminimumHammingdistancebetween genotypes inthe samespaceis,atleast, two,wouldguarantee thatsingle bit mutationsproducegenotypes insubspaces other thanthat oftheoriginal genotype.Thisis anessential characteristicoferrorcontrol codesdesign,andresultsfrom thisareawouldallowtoobtainneutralbinaryrepresentationswiththefollowingcharacteristics:
1. Splittheredundantgenotypicspaceofcardinality2into2−k subspaces,eachwithcardinality2k,insuchawaythat asinglebitchangeinthegenotypecausesatransitionfromonesubspacetoanother;
2. Definebijectivemappingsbetweeneachgenotypicsubspaceandthephenotypicspace,insuchawaythatallgenotypes whichrepresentthesamephenotypeformaconnectedneutralnetwork.
Table 1
Generationofthecodepolynomialsusingtheg(x)=x3+x+1=1011 generatorpolynomial.
U U(x) x3U(x) C(x)=x3U(x)mod g(x) V 0000 0 0 0 0000000 0001 1 x3 x+1 0001011 0010 x x4 x2+x 0010110 0011 x+1 x4+x3 x2+1 0011101 0100 x2 x5 x2+x+1 0100111 0101 x2+1 x5+x3 x2 0101100 0110 x2+x x5+x4 1 0110001 0111 x2+x+1 x5+x4+x3 x 0111010 1000 x3 x6 x2+1 1000101 1001 x3+1 x6+x3 x2+x 1001110 1010 x3+x x6+x4 x+1 1010011 1011 x3+x+1 x6+x4+x3 0 1011000 1100 x3+x2 x6+x5 x 1100010 1101 x3+x2+1 x6+x5+x3 1 1101001 1110 x3+x2+x x6+x5+x4 x2 1110100 1111 x3+x2+x+1 x6+x5+x4+x3 x2+x+1 1111111
Theidea ofusingerrorcontrol codesistoadd redundancyto themessagetobetransmittedinawaythat allowsthe receivertodetectand,ifpossible,tocorrecterrorsthatmayhaveoccurredduringthetransmission.Amongthemanyerror controlcodesavailable, C
(,
k)
blockcodesaredefinedonabinaryalphabetwhichconsistsofasetof2k codewordswith fixedlength,that resultfromtheencoding ofthek bits ofthe datatobetransmitted,aftertheadditionof
−
k checkbitsandwherethecodewordsforma vectorsubspaceofdimensionk.Inparticular,linearblock codesmaybedefinedas follows
[26]
:Definition1(Linearblockcode).Ablockcodeoflength
and2kcodewordsiscalledalinear
(,
k)
codeifandonlyifits2k codewordsformak-dimensionalsubspaceofthevectorspaceofallthe−
tuples overthefieldG F(
2)
.Cycliccodes
[27]
areanimportantsubclassoflinearblockcodesandaregenerallydefinedusingthepolynomialnotation. Inthiscase,acycliccodeC(,
k)
canbesetasfollows:V
(
x)
=
uk−1x−1. . .
+
u0x−k+
c−k−1x−k−1+ . . . +
c0=
x−kU(
x)
+
C(
x)
(3)whereV
(
x)
correspondstothecodepolynomial,U(
x)
isthemessagepolynomialconsistingofthek bitsofthemessageto betransmittedandC(
x)
correspondstotheparity-checkpolynomial.Themessagepolynomial U(
x)
hasdegreeuptok−
1 andbecauseitisabinarycode,itscoefficientsare0 or1.Thecodepolynomial V(
x)
hasdegreeupto−
1.Acycliccodecanbedefinedusingthegeneratorpolynomial g
(
x)
whichhavetobeafactorofx+
1 andthedegreeofg
(
x)
isequalto−
k,thenumberofparity-checkbits.Asaconsequenceoftheseproperties,anycodepolynomialismultiple ofthe generatorpolynomial andthegenerator polynomialis alsoa codepolynomial.On theother hand,anypolynomial factorofx+
1 withdegree−
k couldbeusedasageneratorpolynomial,however,notallpolynomialsintheseconditions seemtoproducecodeswiththedesirablefeatures.Todefineanon-systematiccycliccode,thecodepolynomialsV(
x)
are obtainedmultiplyingthemessagepolynomialsU(
x)
bythegeneratorpolynomial g(
x)
,i.e.:V
(
x)
=
U(
x)
g(
x)
(4)Amorepopularapproachconsistsingeneratingthecodepolynomialsbymultiplying themessagepolynomialsbyx−k
anddividingtheresultsby g
(
x)
,whichallowsthecodetobegeneratedinsystematic form(when themessagepolynomial canbeobtainedbydiscardinggiven−
k componentsfromthecorrespondingcodepolynomial):V
(
x)
=
x−kU(
x)
+
x−kU(
x)
mod g(
x)
(5)Anexampleofthegenerationofthecodepolynomialsusingthegeneratorpolynomial g
(
x)
=
x3+
x+
1=
1011 isgiven inTable 1
.Using equation (4), it is possible to verify that only the code polynomialsare divisible by the generator polynomial. When the received polynomial, which corresponds to the word received after transmission, is divided by the generator polynomial,iftheremainderisnull,thereceivedpolynomialcorrespondstoacodepolynomial,otherwise,thatpolynomial doesnotcorrespondtoacodepolynomialandanerroroccurredduringtransmission.
Incycliccodes,thesyndromepolynomialcanbedefinedas:
Table 2
TableofsyndromepolynomialsanderrorpolynomialsofthecycliccodedefinedinTable 1.
e e(x) s S(x) 1000000 x6 101 x2+1 0100000 x5 111 x2+x+1 0010000 x4 110 x2+x 0001000 x3 011 x+1 0000100 x2 100 x2 0000010 x 010 x 0000001 1 001 1
where Z
(
x)
correspondstothereceivedpolynomial,whichmightnotbeequaltoanyofthecodepolynomialsV(
x)
ifsome errorsoccurredduringtransmission.Inthiscase:Z
(
x)
=
V(
x)
+
e(
x)
(7)wheree
(
x)
istheerrorpolynomial.Thesyndromepolynomialcanbewrittenas:S
(
x)
= [
V(
x)
+
e(
x)
]
mod g(
x)
(8)When Z
(
x)
corresponds to a code polynomial, the errorpolynomial isnull, i.e., e(
x)
=
0.As V(
x)
mod g(
x)
=
0, the syndromepolynomialisequaltotheremainderofthedivisionoftheerrorpolynomiale(
x)
by g(
x)
:S
(
x)
= [
V(
x)
+
e(
x)
]
mod g(
x)
=
V(
x)
mod g(
x)
+
e(
x)
mod g(
x)
=
e(
x)
mod g(
x)
(9)The syndromeofcycliccodesdependsonthe errorpatternanddoesnotdepend onthereceived polynomial Z
(
x)
.Asg
(
x)
hasdegreeof−
k,thedegreeofthesyndromeis,atmost,−
k−
1,andcanbedefinedas:S
(
x)
=
s−k−1x−k−1+
s−k−2x−k−2. . .
+
s1x+
s0 (10)Itisalsopossibletousea tableofsyndromepolynomialsanderrorpolynomialsfordecoding.
Table 2
showsthistable forthecycliccodepresentedinTable 1
.Usingthistable,thesyndromepolynomialhastobecalculatedusingequation(6)
, theerrorpolynomiale(
x)
iscalculatedfromthesyndromepolynomialandusingequation(7)
thecodepolynomialcan be determined.Providedthat g
(
x)
isa primitive(oreven merelyirreducible) polynomial,it iseasyto showthat thecodepolynomialV
(
x)
cannot be oftype xi (otherwise g(
x)
itselfwouldhaveto be ofthesametype andwouldnot be primitiveoreven irreducible). Dueto the linearity of the code,neither can the difference betweentwo code polynomialshave that form, whichimpliesthattheminimumdistancebetweendifferentcodepolynomialsmustbe,atleast,two.Thus, expression
(4)
provides a methodofmappingarbitraryphenotypesonto alinearclass C of codepolynomialsof degreeupto−
1.Furthermore,itsuggeststhatanother2−k−
1 classesofcodepolynomialsmaybedefinedasfollows:V
(
x)
=
U(
x)
g(
x)
+
zi(
x)
(11)foreverynon-zeropolynomialzi
(
x)
ofdegreeupto−
k−
1.Clearly,theseclassesarenolongerlinearbecausethesumoftwo polynomialsinclass Ci isnot inCi,butsincetheycan beobtainedfromthelinearclassC throughtheadditionofa
constant polynomial,theyarecalledthecosets of C .Moreover,changingasingle coefficientinapolynomial V
(
x)
∈
C willhavetheeffectofproducingamemberofoneofthecosetsCi.Thisisthepropertywhichisexploredinerrorcontrolcodes
andprovidesasuitablemechanismformakingthesetsofreachablephenotypesvaryfordifferentgenotypicrepresentations ofthesamephenotype.
5. NeutralbinaryrepresentationNNg
(,
k)
Ifagenotypeisseenasthecodepolynomial V
(
x)
,ofdegree−
1,ofalinearcodeC(,
k)
,generatedbythegenerator polynomial g(
x)
,andthephenotypeasthemessagepolynomial U(
x)
,ofdegreek−
1,expression(5)
allowstodefine the genotypebasedonthephenotype.Asgenotype V(
x)
belongstothe0 cosetdefinedbythegeneratorpolynomial g(
x)
,the representationofU(
x)
ineachofthe2−kcosetscanbedefinedbyselecting2−kcosetleaderszi
(
x)
:Vi(x
)
=
V(
x)
+
zi(x)
(12)ToallowthatthegenotypesVi
(
x)
,whichrepresentaU(
x)
,formaneutralconnectednetwork,itisenoughthatthecosetleaders zi
(
x)
definethemselves aconnectedgraphby linksresultingfromone bitchange.When U(
x)
corresponds tothe0 phenotype, V
(
x)
isalsoequaltozeroandthe2−k polynomialsVi
(
x)
correspondtothe2−k cosetleaderszi(
x)
.Inthiscontext,andforthisreason,thecosetleaderswillbedesignatedbythezeros oftherepresentationandtheneutralnetwork definedbythecosetleadersformsaneutralrepresentationoftheNNg
(,
k)
(NeutralNetwork)family:Definition2(Neutralrepresentation).Aneutralrepresentationinthe NNg
(,
k)
familyconsistsofamapping fromasetofbinarystringsoflength
(thegenotypicspace)tothesetofbinarystringsoflengthk (thephenotypicspace),withk
<
, whichischaracterizedbythegeneratorpolynomial, g(
x)
,ofdegree−
k,andbythesetof2−kgenotypes:R
= {
z0,z1, . . . ,z2−k−1},
zi∈
Ci,i=
0, . . . ,
2−k−
1whereeach Ci denotesthesubsetofallgenotypeswithsyndromei,andeach zi isconsideredthecosetleaderofCiandis
calledzero.Foreachpair
(
zi,
zj)
,eitherd(
zi,
zj)
=
1,orthereisasetofgenotypes{
w1,
. . . ,
wp}
⊂
R, p=
d(
zi,
zj)
−
1,suchthat:
d
(
zi,w1)=
1d
(
wp,zj)=
1d
(
wt,
wt+1)=
1,
∀
t,
1≤
t<
pThecodingofphenotypeU
(
x)
tothegenotypes Vi(
x)
,where0≤
i≤
2−k−
1,giventhegeneratorpolynomial g(
x)
andtheneutralrepresentation
{
z0,
. . .
,z2−k−1}
ofafamilyNNg(,
k)
,isexplainedinAlgorithm 1
: Algorithm1Coding.1: Input:U(x),g(x),{z0,. . . ,z2−k−1}∈NNg(,k)
2: Output:AllVi(x)
3: CalculateV0(x)=x−kU(x)+x−kU(x)mod g(x)whichcorrespondstothecodepolynomial(‘coset’0 withsyndrome0). 4: for i=0 to2−k−1 do
5: CalculateVi(x)whichresultsfromtheadditionofzero zi(x)withpolynomialV0(x) 6: end for
Theprocess ofdecoding fromgenotype V
(
x)
tophenotype U(
x)
,giventhegenerator polynomial g(
x)
andthe neutral representation{
z0,
. . .
,z2−k−1}
ofNNg(,
k)
,isperformedbyAlgorithm 2
:Algorithm2Decoding.
1: Input:V(x),g(x),{z0,. . . ,z2−k−1}∈NNg(,k)
2: Output:U(x)
3: DeterminethesyndromeS(x)=V(x)mod g(x)
4: AddV(x)withthezero zi(x)oftheneutralrepresentationwhichhassyndromeS(x)
5: CalculatethequotientofthedivisionofV(x)+zi(x)byx−ktoobtainthephenotypeU(x)anddismisstheremainder
As an example,it ispresented thedecoding ofgenotype V
(
x)
=
x5+
x4+
x2 or v=
0110100=
52, usingthe neutral representation{
0,
1,
9,
77,
4,
14,
6,
73}
ofthefamilyNNg(
7,
4)
,generatedbythegeneratorpolynomial g(
x)
=
x3+
x+
1.Thesyndromeis S
(
x)
=
V(
x)
mod g(
x)
=
x2+
1,andthezero withthesamesyndrome is zi(
x)
=
x3+
x2+
x.ThepolynomialadditionofV
(
x)
+
zi(
x)
= (
x5+
x4+
x2)
+ (
x3+
x2+
x)
=
x5+
x4+
x3+
x andthequotientofthedivisionofV(
x)
+
zi(
x)
byx−k allowstoobtainthephenotypeU
(
x)
=
x2+
x+
1.5.1. Equivalentrepresentations
Havinginmindthat thevalues ofthegenotypesare not evaluateddirectly, itisassumedthat thevalue ofeach gene (0or1) isnotinherentlyimportant.Inthissense,itispossibletoestablishan equivalencerelationshipbetweenthe rep-resentationdefined by R1
= {
z0,
z1,
. . . ,
z2−k−1}
∈
NNg(,
k)
and R2= {
z0+
c,
z1+
c,
. . . ,
z2−k−1+
c}
∈
NNg(,
k)
, foranyconstant c.In fact,foreach genotype xg1 thatrepresents xp
=
fg1(
xg1)
, thereisa corresponding xg2 genotype,such thatxp
=
fg2(
xg2)
,andanymutationofxg1 correspondstothesamephenotypethat thesamemutationapplied toxg2,which meansthatthephenotypicneighborhoodofthetworepresentationsareequal.Torepresent theset ofrepresentations whosezeros differ only by a constant c,it was decided to choosea canonical representation.
Definition3(Canonicalneutralrepresentation).Therepresentationdefinedbythesetofzeros R
= {
z0,
z1,
. . . ,
z2−k−1}
,where the index i corresponds to thesyndrome of each zi, is a canonical representationof NNg(,
k)
if andonly ifthe vector(
z0,
z1,
. . . ,
z2−k−1)
istheminimumlexicographicofthesetofvectorswhichcorrespondtoall equivalentrepresentations{
z0+
c,
z1+
c,
. . . ,
z2−k−1+
c}
, c∈ {
0,
1}
. The syndrome of zi+
c can be different from the syndrome of zi and thecomponentsofthevectorsareorderedbythesyndromeofzi
+
c.GivenarepresentationR
= {
z0,
z1,
. . . ,
z2−k−1}
,thecanonicalequivalentrepresentationcanbeeasilydeterminedamongthe2−krepresentationsconstructedasR
i
= {
z0+
zi,
z1+
zi,
. . . ,
z2−k−1+
zi}
,foreachzi∈
R.Forexample,Table 3
presentstherepresentationequivalentsto
{
0,
1,
9,
77,
4,
14,
6,
73}
andTable 4
showstheequivalentsrepresentationswithzerosTable 3 Equivalentrepresentationsto{0,1,9,77,4,14,6,73}∈NNg(7,4). {0,1,9,77,4,14,6,73} zi 0 {0,1,9,77,4,14,6,73} z0 1 {1,0,8,76,5,15,7,72} z1 9 {9,8,0,68,13,7,15,64} z2 77 {77,76,68,0,73,67,75,4} z3 4 {4,5,13,73,0,10,2,77} z4 14 {14,15,7,67,10,0,8,71} z5 6 {6,7,15,75,2,8,0,79} z6 73 {73,72,64,4,77,71,79,0} z7 Table 4 Equivalentrepresentationsto{0,1,9,77,4,14,6,73}∈ NNg(7,4)withzeros orderedbysyndrome.
Req1 {0,1,9,77,4,14,6,73} Req2 {0,1,76,8,15,5,72,7} Req3 {0,68,9,8,15,64,13,7} Req4 {0,68,76,77,4,75,67,73} Req5 {0,10,2,77,4,5,13,73} Req6 {0,10,71,8,15,14,67,7} Req7 {0,79,2,8,15,75,6,7} Req8 {0,79,71,77,4,64,72,73}
6. Enumerationofneutralrepresentations
Although each representationof NNg
(,
k)
has beendefinedby the choice oftheir set ofzeros,it isimportant not toforget that the elements in this set have restrictions of syndrome and connectivity. For this reason, it is important to considerhowrepresentationsofsuchafamilycanbegenerated.
The proposed approach is to define a neighborhood relationship between representations of the same family and to calculateasetofcanonicalrepresentationsastheclosure
[28]
ofaninitialcanonicalrepresentationunderthatrelationship. The closure of an initial canonical representation of a family of representations under the neighborhood relationship is obtainedbyreplacingonezero totherepresentation,untilnomorerepresentationscanbecreatedforthatfamily.Definition4(Neighboringneutralrepresentations).Twoneutralrepresentationsofa NNg
(,
k)
family definedby thesets ofzeros R1
= {
z1,0,
z1,1,
. . . ,
z1,2−k−1}
andR2= {
z2,0,
z2,1,
. . . ,
z2,2−k−1}
areconsideredneighboringifandonlyifthesets R1and R2 differbyoneelement,i.e.,
∃
i∈ {
0,
. . . ,
2−k−
1}
:∀
j∈ {
0,
. . . ,
2−k−
1}
, j=
i⇔
z1,j=
z2,j.It is importantto note that neighboringrepresentations ofa canonical representation willnot necessarily have tobe canonicalrepresentations.However,itispossibletoadoptadefinitionofalternativeneighborhood,appliedtothecanonical representationsset.Itshouldbenotedthattheadditionofthesameconstantc tothezeros oftwoneighboring representa-tionsresultsintotwootherrepresentations,eachoneneighboringofeachother.
Definition5 (Neighboringcanonicalneutralrepresentations). Twocanonical neutral representations of the NNg
(,
k)
familydefinedbythesetsofzeros R1
= {
z1,0,
z1,1,
. . . ,
z1,2−k−1}
andR2= {
z2,0,
z2,1,
. . . ,
z2,2−k−1}
areconsideredneighborsifandonlyifthereisathirdrepresentationR3,notnecessarilycanonical,thatisneighborsofR1 accordingto
Definition 4
,whichisequivalenttoR2.
Algorithm 3calculatestheneighboringrepresentationsofacanonicalneutralrepresentation.Oneneutralrepresentation ofNNg
(,
k)
willhaveamaximumof2−kneighboringrepresentations,wheresomeofthemcanbeequaltothe
represen-tationinquestionby symmetryissues.Itisimportanttonote thataneighboringrepresentationisvalidwhentheneutral networkisconnectedanditisonlyanalyzedafteritscanonicalequivalentrepresentationhasbeenfound.
The neighboringrepresentationsof
{
0,
1,
2,
3}
∈
NNg(
4,
2)
andtheequivalentcanonical representationsareindicatedin Table 5.6.1. ClosureofNNg
(,
k)
underNeighborhoodrelationshipThe determination of the canonical representations of the NNg
(,
k)
family can be performed on the basis of thecalculationoftheclosureofaninitialcanonicalrepresentationundertheNeighborhood relationshipbetweencanonical rep-resentation(Definition 5). Assumingthatthe canonicalrepresentations setisclosedundertheneighborhood relationship, thiscalculationwillenumerateall representations.Intheabsenceofademonstrationofthevalidity ofthisassumption,it ispossibletoconsiderthat,atleast,apartofthedesiredrepresentationswillbefound.
Algorithm3CalculateneighboringrepresentationsNeighRep.
1: Input:Neutralrepresentation{0,z1,. . . ,z2−k−1}∈NNg(,k)
2: Output:NeighboringrepresentationsofneutralrepresentationNeighRep 3: R= {z0,z1,. . . ,z2−k−1}
4: NeighboringrepresentationssetisemptyNeighRep= ∅ 5: while therearezeros zi∈R do
6: Calculateneighboringofzero zi,suchthatNeigh(zi)= {neigh1(zi),neigh2(zi),. . . ,neigh(zi)}
7: while thereareneighm∈Neigh(zi)do
8: Calculatesyndromes(neighm)
9: Searchthezero withthesamesyndromes(zj)=s(neighm)
10: Replacethezero foundzjwithneighm
11: if representationR isconnected then
12: DeterminethecanonicalrepresentationequivalenttoR 13: AddtoNeighRep 14: end if 15: end while 16: end while Table 5 Neighboringrepresentationsof{0,1,2,3}∈NNg(4,2)
andtheequivalentcanonicalrepresentations. R1 {0,1,2,4} ⇔ {0,1,2,4} R2 {0,8,2,3} ⇔ {0,1,2,10} R3 {0,1,5,3} ⇔ {0,1,2,4} R4 {9,1,2,3} ⇔ {0,1,2,10} R5 {0,6,2,3} ⇔ {0,1,2,4} R6 {0,1,2,10} ⇔ {0,1,2,10} R7 {7,1,2,3} ⇔ {0,1,2,4} R8 {0,1,11,3} ⇔ {0,1,2,10}
Thealgorithmstartswiththe
{
0,
1,
. . . ,
2−k−
1}
representation.ThisisacanonicalrepresentationofNNg(,
k)
,whateverand k areconsideredandcorrespondstotherepresentationswithnon-codingbits.Thus,thestartingsetis:
A
= {{
0,
z1, . . . ,z2−k−1}}
(13)ThesetD ofneighboringrepresentationsofeachrepresentationiscalculatedby
Algorithm 3
.Algorithm 4
performsthe calculationoftheclosure A undertheNeighborhood relationship:Algorithm4CalculatetheclosureA underNeighborhood relationship.
1: Input:A= {0,1,. . . ,2−k−1}∈NN g(,k)
2: Output:Closure A 3: Closure A=A
4: while therearerepresentationstoanalyzeinClosure A do
5: CalculateD,thesetofneighboringcanonicalrepresentationsoftherepresentationanalyzed 6: for eachneighboringrepresentationinD do
7: if neighboringrepresentationdoesnotbelongtoClosure A then 8: AddtoClosure A
9: end if
10: end for
11: end while
Table 6presentsthenumberofcanonicalneutralrepresentationsofNNg
(,
k)
foundfor0<
k≤
11,whenthenumberofredundantbitsisin0
<
−
k≤
4 andthegeneratorpolynomial g(
x)
hasdegree0<
−
k≤
4.Asobserved,the numberofneutralrepresentationsgrowsrapidlywithk andvery rapidlywiththenumberof redun-dantbits
−
k. Thishuge numbercreatesdifficulties notonly onexecutiontime butalso onthe storagecapacityofthe generatedrepresentations.Forexample,fora|
p|
=
22=
4 phenotypic space, if3 redundantbits areadded,266 neutralrepresentationsareobtained,while1 845 628arereachedwhen4 redundantbitsarejoined.
As thenumber ofneutralrepresentations obtainedforsome NNg
(,
k)
is very high, itwas decided to usea freewarelibrary, Oracle Berkeley DB that provides a high performance bases from embedded data and can be used with the C programminglanguage,C++,Java,Perl, Python,amongothers.Inthiscase, theClanguage wasused.Thisdatabaseallows youtomanipulatedataintheorderofterabytesinvarioussystemsincludingUNIXandMicrosoftWindows.
7. PropertiesofNNg
(,
k)
Toanalyzetherelationshipsbetweenthepropertiesoftherepresentationsandtheperformance ofNNg
(,
k)
appliedtoTable 6
NumberofneutralrepresentationsofNNg(,k).
x+1 x2+x+1 x3+x+1 x4+x+1 g(x) 1 2 4 19 1489 2 3 9 266 1 845 628 3 4 18 1828 4 5 32 8128 5 6 50 25 100 6 7 75 66 750 7 8 108 158 784 8 9 147 342 701 9 10 196 690 748 10 11 256 1 307 528 11 12 324 2 350 336 k
Fig. 1. Hypergraph representing a genotypic space with|g| =2=27.
Fig. 2. Hypergraph representing the genotypic space with=4 and the phenotypes of the neutral representation{0,1,2,4} ∈NNg(4,2). 7.1. Uniformity
The NNg
(,
k)
familyisconsidereduniformbecauseall phenotypesarerepresentedby thesamenumberofgenotypes.IneachrepresentationofNNg
(,
k)
,eachphenotypeisrepresentedby2−kgenotypes.7.2. Phenotypicneighborhoodandconnectivity
Using hypergraphsisa wayofvisualizingthephenotypicneighborhood ofa representation.
Fig. 1
displays agenotypic spacewithcardinality|
g|
=
2=
27,whereforthesakeofsimplicity,onlythe1100101 genotype()andits7neighbors(
#
)arepresented,whileforallothergenotypes,onlythefirst4bitsareindicated.Forexample,
Fig. 2
presentsahypergraphofthegenotypicspaceoftheneutralrepresentation{
0,
1,
2,
4}
∈
NNg(
4,
2)
.InTable 7
Genotypesandcorrespondingphenotypes[]ofneutralrepresentation{0,1,2,4}∈NNg(4,2).
syndrome 0 syndrome 1 syndrome 2 syndrome 3
0000 [00] 0001 [00] 0010 [00] 0100 [00]
0111 [01] 0110 [01] 0101 [01] 0011 [01]
1001 [10] 1000 [10] 1011 [10] 1101 [10]
1110 [11] 1111 [11] 1100 [11] 1010 [11]
Table 8
Neighboringgenotypesandcorrespondingphenotypesoftheneutralnetworkthatrepresents phenotype01 of{0,1,2,4}∈NNg(4,2). Neigh. gen. 1 2 3 4 0111 0110 [01] 0101 [01] 0011 [01] 1111 [11] 0110 0111 [01] 0100 [00] 0010 [00] 1110 [11] 0101 0100 [00] 0111 [01] 0001 [00] 1101 [10] 0011 0010 [00] 0001 [00] 0111 [01] 1011 [10] Table 9
MaximumconnectivityforeachNNg(,k).
x+1 x2+x+1 x3+x+1 x4+x+1 x5+x2+1 g(x) 1 1 1 1 1 1 2 2 3 3 3 (3) 3 3 5 7 (7) (7) 4 4 7 14 (15) (15) 5 5 9 18 (30) (31) 6 6 11 22 (43) (59) 7 7 13 26 (55) (98) 8 8 15 30 (63) (125) 9 9 17 34 (–) 10 10 19 38 (–) 11 11 21 42 (91) k
Table 7presentsthegenotypesandthecorrespondingphenotypes[]obtainedusingtheneutralrepresentationpresented in
Fig. 2
.Astherepresentationisuniform,each phenotypeisrepresentedby 2−k=
22=
4 genotypes,whichcorrespondstothenumberofzeros oftheneutralrepresentation.Furthermore,aseachphenotypeisrepresentedby aneutralnetwork with 2−k genotypes, each ofwhich has
neighboring genotypes, including, atleast, a genotype which decodes to the
samephenotype,thephenotypicneighborhoodofeachphenotypehas,atmost,2−k
(
−
1)
differentphenotypes.Theactualnumberofdifferentphenotypesinthisneighborhooddeterminestheconnectivityoftherepresentation,whichisthesame forallphenotypes definedbythe representation,although thereachedphenotypescan bedifferent. Ascan beconfirmed with
Table 8
,theconnectivityof{
0,
1,
2,
4}
∈
NNg(
4,
2)
is3,becauseitispossibletoreach3differentphenotypes(excludingitself). When comparing thisrepresentation witha non-redundant representation withphenotype length of2, in which eachphenotypecanonlyreach2 differentphenotypes,theneutralrepresentation
{
0,
1,
2,
4}
∈
NNg(
4,
2)
canreachalargernumberofdifferentphenotypes,inthiscase 3.
Inconclusion,thewayastheneutralnetworksoftheserepresentationsarestructuredallowedthat:
1. AllneutralrepresentationsNNg
(,
k)
haveaconnectivityhigherorequaltotheconnectivityofthecorrespondingnon-redundantrepresentation,i.e.,higherthanorequaltok;
2. The connectivity of anyrepresentation of NNg
(,
k)
does not vary with the phenotype andthe number ofdifferentreachablephenotypesisequalforallphenotypesofthesameneutralrepresentation; 3. Differentgenotypesofaneutralnetworkcanreachdifferentphenotypes.
Table 9presentsthemaximumconnectivityforeachNNg
(,
k)
,wherefortheneutralrepresentationswhosenumerationwasnotpossibleduetothelimitationsexplainedinsection5,alowerlimitforthemaximumconnectivitywasobtainedby generatingneutralrepresentationsfromtheneutralrepresentationwiththegreatestconnectivityfounduntilthemoment, andleavingthealgorithmrunninguntilnomoreincreasinginconnectivitywasdetectedforagivenperiodoftime. These valuesappearinside()in
Table 9
.Notethatthemaximumconnectivityappearstoincreaseexponentiallywiththenumber ofredundantbits−
k.Fig. 3. MDS of some NNg(7,4)representations.
Fig. 4. MDS of some NNg(14,11)representations.
7.3. Topology
As the NNg
(
7,
4)
representations proposed exhibit neutral networks, the topology is an appropriated property to beanalyzed.Inordertovisualizethetopology ofaneutralnetwork,atechnique calledMDS(Multi-DimensionalScaling)was used [30]. This technique consists in determining the positions of a set of points in the Euclidean space, such that the distances between them approximate a given measure of dissimilarity of the objects that are to be visualized. Neutral network topologiesmaybe visualizedinthisway,bytakingtheHammingdistancebetweenthegenotypesoftheneutral networksastheMDSdissimilaritymeasure.
Fig. 3
showsthetopologiesofsome neutralnetworksoftheNNg(
7,
4)
family.Whilethenon-codingredundantrepresentations,inwhichthegenotypicredundantbitsdonotaffectthephenotype,display thetopologyofacube,therepresentationswithmaximumconnectivity(14)intheNNg
(
7,
4)
familyexhibitaYtopology.Fig. 4 showsthetopologies ofsome neutralnetworksoftheNNg
(
14,
11)
representations,whichpresentsthe greatestconnectivity(42),while
Fig. 5
displaysthetopologiesofsomerepresentationsoftheNNg(
6,
2)
family.7.4. Synonymity
The calculationofthe synonymityfollowstheequivalence classesapproachproposed by [5],inwhich ifallgenotypes
xg
∈
xpFig. 5. MDS of some NNg(6,2)representations.
Table 10
MinimumandmaximumsynonymityforeachNNg(,k).
x+1 x2+x+1 x3+x+1 x4+x+1 g (x) 1 1 1.33—1.5 1.71—2.18 2.13—2.67 2 1 1.33—1.67 1.71—2.71 3 1 1.33—1.67 1.71—2.86 4 1 1.33—1.67 1.71—2.86 5 1 1.33—1.67 1.71—3 6 1 1.33—1.67 1.71—3 7 1 1.33—1.67 1.71—3 8 1 1.33—1.67 1.71—3 9 1 1.33—1.67 1.71—3 10 1 1.33—1.67 1.71—3 11 1 1.33—1.67 1.71—3 k
to thesameequivalence classissmall, then therepresentationis consideredsynonymouslyredundant. Thus, synonymity forNNg
(,
k)
representationscan be simplifiedusing expression(14)
because thedistances betweenallgenotypes oftheneutralnetworkthatrepresentagivenphenotypeareequalforallphenotypes:
Syn
=
2−k−1 i=0 2−k−1 j=0 d(
zi,zj)(
2−k)
× (
2−k−
1)
(14)where d
(
zi,
zj)
corresponds to the Hamming distance between zero zi and zero zj. This expression gives a normalizedmeasureofsynonymity,sincethetotalnumberofdistanceswereconsidered.Arepresentationisconsideredsynonymously redundantwhenthevalue of Syn islowandnon-synonymouslyredundantwhenthisvalueishigh. Asthesynonymityis
ameasureofhowclosethegenotypesthatrepresentthesamephenotypeare,thesynonymityisrelatedwiththeneutral networkstopology.
Table 10
showstheminimumandmaximumsynonymityforNNg(,
k)
.7.5. Locality
Asthelocality doesnot varywithdifferentphenotypesforNNg
(,
k)
,thismeasurecanbe calculatedonthebasisofasinglephenotype,leadingtoasimplificationofexpression
(2)
.Thus,localityfortheNNg(,
k)
familycanbecalculatedusingtheaveragedistancefromaphenotypetothecorrespondingneighboringphenotypes,includingthephenotypeinquestion.
Table 11
MinimumandmaximumlocalityforeachNNg(,k).
x+1 x2+x+1 x3+x+1 x4+x+1 g(x) 1 0.5 0.33—0.5 0.25—0.50 0.2—0.57 2 0.67—1.0 0.5—1.0 0.4—1.0 3 0.75—1.25 0.6—1.5 0.5—1.58 4 0.8—1.4 0.67—1.67 0.57—2.0 5 0.83—1.5 0.71—1.86 0.62—2.38 6 0.86—1.57 0.75—2.0 0.67—2.78 7 0.88—1.62 0.78—2.0 0.7—2.95 8 0.89—1.67 0.8—2.1 0.73—3.14 9 0.9—1.7 0.82—2.18 0.75—3.25 10 0.91—1.73 0.83—2.17 0.77—3.27 11 0.92—1.75 0.85—2.23 0.79—3.36 k
Fig. 6. Relationships between connectivity, synonymity and locality of NNg(14,11).
There areotherpropertiessuchasinnovationrate
[31]
,diameterofconnectedneutralnetworks, accessibilityof neigh-boringneutralnetworks[32]
andothersthatwillbeanalyzedinafuturework.7.6. RelationshipsbetweenpropertiesofNNg
(,
k)
Fig. 6presentsscattergraphsshowingtherelationshipbetweenconnectivity,localityandsynonymityoftheNNg
(
14,
11)
family, which corresponds tothe familyofrepresentations withthe highestvalue ofk thatwas completelyenumerated. Althoughthescattergraphsonlyshow thepropertiesfortheNNg
(
14,
11)
family,theresultsare similarfortheremainingfamiliesofNNg
(,
k)
.Thesescattergraphsshowthattherearerepresentationswithlowlocalitythatreachhighconnectivityvalues.Thechartattheright-topin
Fig. 6
contradictstheideapresentedin[5]
thatsynonymouslyredundantrepresentations do not allowthe connectivitybetweenthe phenotypesto increase incomparisonwith thecorresponding non-redundant representations. Thecircleinthischart showsthatthere isasetofsynonymouslyneutralrepresentations thathavehigh valuesofconnectivity. TheNNg(,
k)
familyhasavariety ofrepresentationswithdifferentpropertiesthatprovideanim-portantbasis forthestudyoftherelationshipsbetweenthemandwillallowtostudytowhatextentthesepropertiescan affecttheperformanceofanevolutionaryalgorithm.
8. NK fitnesslandscapes
Inordertostudytheeffectofredundancyandneutralityonevolutionaryalgorithmperformance,evolutionarystrategies, originally proposed by [18,19] for integer and continuous spaces, are used in this research for binary spaces and with constantmutationrate.Inparticular,anevolutionarystrategy(1+1)-ES isappliedtoNK fitnesslandscapes
[33]
withdifferent degreesofruggedness.Fig. 7. Hypergraph representing an instance of a NK(3,2)fitness landscape.
TheNK fitnesslandscapesproposedby
[33]
arebasedonthefitnesslandscapesproposedby[34]
.Thisauthorfoundthat whenfitness interactions existbetweengenes,the geneticcompositionofa populationcan evolveinto multipledomains ofattraction[35]
.Ontheotherhand,genescansuppressorenabletheeffectsofothergenes,andthisinteractioniscalled epistasis.Wrightestablishedaconceptuallinkbetweenthismicroscopicpropertyoftheorganisms(epistasis)anda macro-scopicpropertyoftheevolutionarydynamics(multipleattractorsofthepopulationintheareaofgenotypes),andusedthe analogywithalandscapewithmultiplepeakstoillustratethissituation.Fitnesslandscapesmaybedefinedasfollows[36]
:Definition6(Fitnesslandscape). A landscape isa triplet
(
S,
V,
f)
where S is a set ofpotential solutions that matchthe search space, V:
S→
2S is afunction that assigns toeach s∈
S aset ofneighbors V(
s)
⊂
S,defining theneighborhoodstructure,and f
:
S→ R
isafitnessfunctionthatassignsavalueoffitnesstoeachsolutions.The NK fitness landscapes allow toexplore the wayasepistasis can control theruggedness ofthe landscape.Tothis purpose, fitness functions family whose ruggedness could be shaped by a single parameter was developed. NK fitness
landscapesare definedasstochasticallygeneratedfitnessfunctionsofbitstrings withlength N andinteractions between groupsof K
+
1 bits.The fitnessofeach bitdependson its value andthevaluesofthe bitswithwhich itinteracts.For each instanceofthe landscape,fitness valuesassociated witheach bitarerealizationsofarandom variablewithuniform distributionbetween0.0and1.0,andthefitnessofeachchromosomeistheaverageofthebitsfitnessforall N loci.Theruggednessofthelandscapeis controlledby theparameter K andreachesits maximumwhen K
=
N−
1 andits minimumwhen K=
0.When K=
0,thereisnoepistasis,thelandscapehasauniquelocaloptimum(theglobaloptimum), andthefitnessofdifferentchromosomesishighlycorrelatedwiththeHammingdistancebetweenthem.When K=
N−
1, thenumberofinteractionsbetweenbitsismaximumandthefunctionispurelyrandom,sothelandscapehasmanylocal optima,andthefitnessofchromosomesisnotcorrelatedwiththeHammingdistancebetweenthem.The choice of the K bits is known to affect the computational complexity of the NK fitness landscape optimization problem.WhenthesebitsaretheK nearesttothelocusthatisbeingassessed,theproblemispolynomialinN,whereasit becomesNP-hardwhenK
>
1 andsuchbitsarerandomlychosen[37]
.Next the definitions of local optimum and global optimum in NK fitness landscapes are given as well as how the
NNg
(,
k)
representationscanchangetheruggednessoftheNK fitnesslandscapes.9. RuggednessofNK fitnesslandscapes
Alocaloptimum
[38]
correspondstoapointinthefitnesslandscapethathasnoneighborswithhigherfitness.Alocal optimum willbe theglobaloptimum ifitis thebest solutionfortheproblem. A definitionoflocaloptimum andglobal optimumisgiven[36]
:Definition7(Localoptimum).Givenafitnesslandscape
(
S,
V,
f)
,asolutions∗∈
S isalocaloptimumifandonlyif:∀
s∈
V(
s∗),
f(
s)
≤
f(
s∗)
Definition8(Globaloptimum).Givenafitnesslandscape
(
S,
V,
f)
,asolutions∗∈
S isalocaloptimumifandonlyif:∀
s∈
S,
f(
s)
≤
f(
s∗)
(15)AstheNK fitnesslandscapecanbegraduallytunedfromsmoothtorugged,itisagoodfitnessmodeltostudydifferent typesofrepresentations.Awaytocharacterizethenatureofthelandscapeistounderstandtheruggednessorsmoothness ofthe landscape basedon thenumber oflocal optima,also knownas modality [38],and thedistribution ofthose local optima.As itis known,a landscapeisinduced by theoperator whichisused todefine neighborhoods[39]. Inthiscase, as the landscape is defined over the binary space, the Hamming metric is used and the neighborhood relation can be representedbyahypergraph.Next,anexampleisgiventhatshowsthat NNg
(,
k)
representationscanchangethenumberoflocaloptimaoftheNK fitnesslandscapes,changinginthiswaythedifficultyoftheproblem.
Fig. 7 representsan instanceof aNK