j ou rn a l h om epa ge :w w w . i n t l . e l s e v i e r h e a l th . c o m / j o u r n a l s / c m p b
Cardiovascular
risk
analysis
by
means
of
pulse
morphology
and
clustering
methodologies
Vânia
G.
Almeida
a,∗,
J.
Borba
a,
H.
Catarina
Pereira
a,b,
Tânia
Pereira
a,
Carlos
Correia
a,
Mariano
Pêgo
c,
João
Cardoso
aaPhysicsDepartment,ElectronicsandInstrumentationGroup,UniversityofCoimbra,Portugal bIntelligentSensingAnywhere,Portugal
cCardiologyDepartment,HospitalandUniversityCoimbraCenter,Portugal
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:Received16September2013 Receivedinrevisedform 12June2014
Accepted17June2014
Keywords:
Arterialstiffness Pulsewaveanalysis Riskscores Clusteringanalysis
a
b
s
t
r
a
c
t
Thepurposeofthisstudywasthedevelopmentofaclusteringmethodologytodealwith arterialpressurewaveform(APW)parameterstobeusedinthecardiovascularrisk assess-ment.Onehundredsixteensubjectsweremonitoredanddividedintotwogroups.Thefirst one(23hypertensivesubjects)wasanalyzedusingAPWandbiochemicalparameters,while theremaining93healthysubjectswereonlyevaluatedthroughAPWparameters.The expec-tationmaximization(EM)andk-meansalgorithmswereusedintheclusteranalysis,and theriskscores(theFraminghamRiskScore(FRS),theSystematicCOronaryRiskEvaluation (SCORE)project,theAssessingcardiovascularriskusingScottishIntercollegiateGuidelines Network(ASSIGN)andthePROspectiveCardiovascularMünster(PROCAM)),commonlyused inclinicalpracticewereselectedtotheclusterriskvalidation.Theresultfromthe cluster-ingriskanalysisshowedaverysignificantcorrelationwithASSIGN(r=0.582,p<0.01)anda significantcorrelationwithFRS(r=0.458,p<0.05).Theresultsfromthecomparisonofboth groupsalsoallowedtoidentifytheclusterwithhighercardiovascularriskinthehealthy group.Theseresultsgivenewinsightstoexplorethismethodologyinfuturescoringtrials. ©2014ElsevierIrelandLtd.Allrightsreserved.
1.
Introduction
Theatheroscleroticcardiovasculardisease(CVD)isthemost commoncauseofdeathworldwide,resultingfromthe combi-nationofseveralriskfactors[1].Theinternationalguidelines
[2,3]considerthatindividuals withestablishedCVDshould bethefirstpriorityforpreventivemeasuresapplication.The concerninchangingthecurrenthealthcareparadigm,from reactivetowardspreventivecare,aimsatidentifyindividuals forrisk in early stages of disease development, and then,
∗ Correspondingauthor.Tel.:+351239410109.
E-mailaddress:vaniagalmeida@lei.fis.uc.pt(V.G.Almeida).
directmoreeffortsandattentiontotheriskfactors modifica-tion[4,5].Fortunately,thisisanemergenttendencythatcan beaddressedusingthetraditionalriskscores,butalsousing innovativepredictivealgorithms.
Duringthelastyearsmanyriskestimationsystemshave beendevelopedinordertoassistcliniciansintherisk assess-ment, and in the individual chances prediction, for CVD development. Themajor challenges ofthese tools are the capabilitiesto:(1)identifyhighriskindividuals,(2)weightthe individualeffectsofallriskfactors,(3)stratifyororganizewho needslifestyleadviceormedicaltherapy,andfinally(4)avoid
http://dx.doi.org/10.1016/j.cmpb.2014.06.010
overmedicalizationofindividualsatlowrisk[6].Takingthese challengesinto accountseveralriskfactors wereidentified, bytheir association with anincreased risk forCVD devel-opment. CVDrisk assessmenttoolsdiffer from each other ontheselectedriskfactors,thedisease forwhattheywere designed(CoronaryHeartDisease(CHD), heartfailure,etc.),
theselectedeventtype,theconsideredperiodoftime(long or short term) and the cohort location. The most popular areFraminghamRiskScore(FRS),PROspective CArdiovascu-larMünster (PROCAM), ASsessingcardiovascular riskusing ScottishIntercollegiateGuidelinesNetwork(ASSIGN)and Sys-tematicCOronaryRiskEvaluation(SCORE)project.
Thesetoolsareimportanttohelpphysiciansintheirdaily practice. However, its application in different populations remains a topic concerning attention. Theresearch needs to bedirected at refining the accuracyof prediction mod-elsand,mostimportantly,examiningwaysofturningthem into effective clinical tools. Several risk prediction models forcardiovasculardiseaseareavailabletodayandtheirhead toheadcomparisonandapplicationindifferentpopulations wouldbenefitfromstandardizedreportingandformal, con-sistent statistical comparisons. The work presented by [7]
reinforcesthisstatement.Thelimitationsofthecomparison ofdifferentmethodsareassociatedtomissinginformation, whichmakesdifficulttoreachrobustconclusionsaboutthe best model or the ranking of models’ performance. And, additionally moststudies did notstatisticallycompare the modelsthatwere examined.Theinclusionofstandardized reportingofdiscrimination,calibration, andreclassification metricswithformalstatisticalcomparisonswouldcontribute tothesuccessfulapplicationofdifferentriskscoresindistinct populations.
Thetrendsfortheriskoverestimationinlow-risk popu-lationsand underestimationinhigh-risk groupshavebeen successfullydemonstratedbyCooneyetal.[6].Itisknownthat anexaminationof5%SCOREcanequatetoa10–25%FRSrisk, dependingonwhichoftheseveralFRSfunctionsisselected
[3].Haqetal.[8]studiedseveralmethodsforriskestimation (FRS,PROCAM,Dundee,andBritishregionalheart-BRHS)and theresultsdemonstratedacloseagreementbetweenallthese, regardingaverageriskandshowedmoderateagreementfor estimationamongindividuals.Finally,itwasalsoconcluded thatFRSfunctionisacceptablyaccurateinnorthernEuropean populations.
Thearterialstiffnessmeasurementcurrentlyassumesan increasingrole inclinical assessment dueto its predictive valueincardiovascularevents inpatientswithvarious risk levels,suchasitwasdemonstratedbyseveralstudies[9–12]. Thereareseveraladvantagesofusingnon-invasivemethods overinvasivemeasurements,e.g.,thepotentialusein follow-upstrategiesinpopulationswithoutsymptomaticCVD,such aschildrenoryoungadults.Furthermore,non-invasivetools canbeessential tothe CVDassessmentinaddition tothe establishedriskfactorsinpopulationsathighriskaimingthe preventionof coronaryvascular diseases. Inferences about CVDprogressivedevelopmentcanbeassessedbythe anal-ysisofthemechanicalpropertiesofarteriesthroughavariety ofindicesbasedon thePulse WaveAnalysis(PWA) [13,14]. Theanalysis isbased on the identification of the key fea-turesinthe arterialpressure wave profile,suchassystolic
wavetransittime(SWTT),reflectedwavetransittime(RWTT) anddicroticnotch(evaluatedbyleftventricularejectiontime (LVET)), andcan includetime oramplitude considerations, aswellasvariabilitybasedparameters[15].Thewave reflec-tionsareoftenaddressed,intermsoftheaugmentationindex (AIx), which expresses the ratio of the “augmented pres-sure” assigned to the reflected wave towards each overall pulse.
Data mining techniques have attracted a great deal of attention duetotheirabilitytoextractimplicit and poten-tially useful information from large volumes of data [16]. TheirfeasibleimplementationinComputer-AidedDiagnosis (CAD)methodologieshasgivennewinsightsinthe develop-ment ofinnovative and effectivedecision supportsystems forCVDpremature riskassessment[15,17–19]. An interest-ing approachistheexplorationofdifferentclassifiers,asit was proposed by Jovic and Bogunovic [20]. The electrocar-diogram(ECG)classificationproblemwasaddressedusinga combinationofseveralfeaturesintheanalysisoftheHeart RateVariability(HRV).OtherapproachpresentedbyTsipouras et al. [18]was based on the development ofa fuzzy rule-baseddecisionsupportsystemforCADdiagnosis.Ontheother hand,multi-classifiersshouldperformbetterinsome situa-tions,overcomingerrorsfromsingleclassifieranalysis[21]. Theincorporatingofthepredictionoutcomeofeachoneof theindividualclassifierwassuggested,asawaytoreducethe classificationerrors[22].
Ontheotherhand,clusteringanalysisisanotherimportant branch of unsupervised learning that allows the arrange-ment ofobjects into groups(i.e., theclusters), wherein the objectsinthesameclusteraremoresimilar(inoneormore characteristics),thanthoseindifferentclusters[23].Thereis a wide varietyofclustering methodologiesavailable in lit-erature, essentiallyorganized in threegeneral classes [24]. Thethreetypesincludeparametricmodel-based, hierarchi-calandpartitioningalgorithms.Shahetal.[25]haveproved the usefulness and feasibility ofusing clustering risk fac-tors in the detection ofCVD in youth, by the comparison with the Patholobiological Determinants ofAtherosclerosis inYouth(PDAY)riskscore.Otherstudieshavealsoreferred the role of clustering methodologies for CVD assessment, such as the work developedby Haseena et al. [26], where a fuzzy C-mean clustered probabilistic neural network for ECGbeatsdiscriminationwasdescribed.Clustering method-ologieswerealsosuccessfulappliedinothermedicalfields, suchasintheidentificationofpatternsinbloodglucose mea-surementsandregularinsulindosestakenbeforemealtime
[27].
Ouraimisthedevelopmentofaclusteringmethodology todealwitharterialpressurewaveform(APW)based param-eterstocardiovascularriskassessment.Theevaluationwas performedthroughthestrengthoftherelationshipswith tra-ditional risk scores. In the current paper, Section2 details thesubjectsandmethodsusedduringdataanalysis, includ-ingaquickandup-to-dateliteraturesurveyonattemptsfor risk scores and clustering methods used. The results are presented and discussed in Sections3 and 4, respectively. Finally,inSection5someguidelinesforfurtherresearchare presented along with the mainconclusions of the current work.
2.
Methodology
2.1. Database
Thedata used in this study were obtained from 116 sub-jectsdivided into twogroups,asdepicted inFig.1(a).Data were collected withapprovalby theEthical Committeesof theCoimbraHospitalandUniversityCentre(CHUC),Portugal, withinformed consent.Hypertensionwasdiagnosedwhen systolic blood pressure (SBP)≥140mmHg and/or diastolic blood pressure (DBP)≥90mmHg, or if thepatient was tak-inganti-hypertensivemedication.Age,smoking habits,and familiar hypertensive history were recorded by structured questioninginaccordancewiththecriteriausedbyeachone ofthescores.Currentsmokingwasdefinedashavingsmoked thelast cigarettelessthan 1yearbefore.Diabetesmellitus wasconsidered forthose subjects that presentedafasting bloodsugarlevel>126mg/dL,orcurrentprescriptionofanoral hypoglycemicdrugorinsulin.
Twogroupsofsubjectswereanalyzed:GroupCandGroup H.TheGroupHinclusioncriteriaincludeddiagnosed hyper-tensionbyaclinicianandforGroupCyoungsubjectswithout anyknowncardiovascularcomplication.
• GroupHconsistsof23hypertensivesubjects,10menand 13women.Lipidicvaluesweremeasured:theserumtotal cholesterol(Total-CH),thehighdensitylipoprotein choles-terol(CH-HDL)andthetriglyceride(TGL)levels.Allsubjects were tested atthe same time of dayto avoid any diur-nalvariations.Additionally,theAPWparameterswerealso computed:SWTT,RWTT,SWA,LVETandAIx.
• GroupCconsistsof93youngandhealthysubjectsbetween 18 and 30 years. The APW parameters were computed: SWTT,RWTT,SWA,LVETandAIx.
2.2. PWAmeasurements
APWswererecordedatasamplingrateof1kHzwithasingle non-invasivePZprobedevelopedinapreviouswork[28],and showninFig.1(a)(maintext).Theprobeisheldinplacebya neckcollarspeciallydevelopedforcarotidmeasurements.The mechanicalinterfacebetweentheprobeheadandthesensing pointalsoplaysanimportantroleonthedataquality. The probeheadisbasedonamushroom-shapedPZsensorthat transmitsthedistensionassociatedtothepressurewavein suchawaythattransversalandsheareffectsaresuppressed andonlyradialappliedforcesareallowed.Accuracytestswere performedatabenchtest,where1.80%wasthemaximumroot meansquareerror(RMSE)introducedbytheelectroniccircuits andbythemechanicalinterface.
Somedataconcerningvalidationwasalsopublishedin[29]
comprisingthreesetsofrecordingsforthecarotidpressure waveformatleftandrightcarotidarteries,under standard-izedconditions,in20volunteersbythreetrainedoperators. Inter and intra-operator differences were calculated being goodindicators,similartootherdatareportedinliteraturefor commercialdevices[30].
Foreachsubjectatleastthreeconsecutivemeasurements wereperformed.APWparametersweretabulatedfortheset
ofpulses,consideringeachoneofthesubjects,aftertothe segmentation process.ThePWA parameters, schematically representedinFig.2,arerelatedtoamplitudeandtemporal characteristicsofAPW,namely:
• Reflectedwavetransittime(RWTT) isdetermined bythe timeintervalbetweenthefootofthecarotidpressure wave-formtothefirstinflectionpoint,whichcorrespondstothe footoftheglobalreflectedpressurewave.
• Reflectionwaveamplitude(RWA)ismeasuredatinflection pointinreferencetothenormalizedamplitude.
• Systolicwavetransittime(SWTT)isdefinedastheinterval betweenthewaveformfootandthesystolicpeak.
• Leftventricularejectiontime(LVET)ismeasuredbythetime intervalfromthefootofthewaveformtothedicroticnotch. • Augmentation index (AIx) is computed as the quotient betweenthereflectedwaveamplitudeandthepulse pres-sure(PP),expressedasapercentagevalue.Theequations usedforitscomputationweredefinedbyMurgoetal.[31], presentedbelow.ATypeAwaveformisdefinedwhenthe systolicpeakoccursinlatesystoleaftertheinflectionpoint andtheTypeCwhenthesystolicpeakprecedesthe inflec-tionpoint.
AIx=±SWA−RWA PP
2.3. Riskscoresselection
Four risk scores were considered along this work namely, FRS,PROCAM,ASSIGN andSCORE. TheFRS wasdeveloped from ageneralpopulationinFramingham,MA, USA,while the remaining scores derive from European studies. The largest isthe SCORE, sinceit consistsof≈205,000subjects from 12cohortstudiesfrom Europeancountries. Theother two,ASSIGNandPROCAM,were developedinScotlandand Germany, respectively. Ingeneral,these toolshavedistinct characteristicsassociated,differentriskfactors,specific dis-ease,eventtype,periodoftimeandcohortlocations.Table1
summarizesthemaincharacteristicsforeachone.
2.4. Statisticalanalysis
ThedatawereanalyzedusingtheSPSS16statisticalpackage (SPSSInc.,Chicago,IL,USA).Descriptivestatisticswere con-ductedtodescribethesamplecharacteristics.Thenormality results were assessed bythe Kolmogorov–Smirnov test. To determinetherelationshipsamongvariables,theSpearman rank-ordercorrelationtestwasusedduetothenon-normal natureofthedistribution. Afterwards,alsoduetothe non-normaldistributiontheKruskal–Wallistestwasconductedto clustercomparisons.Ap-value<0.05wasconsidered statisti-callysignificant.
2.5. Clusteringanalysis
Duringtheclusterassignments,thepulsesareindependently groupedaccordingtotheirclustersimilarities.Theadopted strategyforthedeterminationoftheidealnumberofclusters
Fig.1–(a)Firstly,biochemical(CH-HDL,highdensitylipoproteincholesterolandTotal-CH,totalcholesterol),pressurevalues (SBP,systolicbloodpressureandDBP,diastolicbloodpressure)andAPWparameters(RWTT,reflectedwavetransittime; RWA,reflectionwaveamplitude;SWTT,systolicwavetransittime;LVET,leftventricularejectiontimeandAIx,
augmentationindex)werecollected.(b)Fourriskscoreswerestudiedtouseasthereferencealongthiswork,namely:FRS, FraminghamRiskScore;PROCAM,PROspectiveCArdiovascularMünsterriskscore;ASSIGN,ASsessingcardiovascularrisk toScottishIntercollegiateGuidelinesNetworkandSCORE,SystematicCOronaryRiskEvaluation.(c)Theexpectation maximization(EM)andk-meansweretheselectedclusteringalgorithms,andtheKruskal–Wallistestwasusedtocompare groups.
consisted,firstly,intheselectionoftwoclusters(k=2)followed byatentativetospliteachoftheseclusters.Duringthe pro-cess,clusterscanalsobemergediftheyaresufficientlyclose or,ifthereistoomanypatternsandunusuallylargevariance. Themaximumlevelofsimilarityforeachsubjectoccurs whenallpulsesfitinonlyonecluster.Expectation maximi-zation(EM)andk-meansclusteringalgorithmsarethemost studied.Weka3.6.8frameworksoftwarewastheselectedtool touseduringtheclusteranalysis(Fig.1(c)).
2.5.1. k-Means
Thek-meansalgorithmisatypeofpartitionalclusteringthat continuouslyiteratesuntilaspecificcriterionfunction (usu-allythesquareerror)converges.Itacknowledgesthenumber ofdesiredclusterinputs(k)anddividesthesetofobjects(n) intokclusters.Theresultisahigherintra-clusterandlower inter-cluster similarity. The cluster similarity is measured considering the mean value of the objects contained in the cluster,which is,infact, the cluster centroid.k-Means
Fig.2–Arterialpressurewaveformsmeasurednon-invasivelyatcarotidartery.In(a)aTypeAcontourisrepresented, wherethepeaksystolicpressure(SP)occursinlatesystoleafteraninflectionpoint(Pi)thatresultedfromthereflection wave.IntheseconditionsAIxiscomputedasapositivevalue.In(b)aTypeCcontourisrepresented.NoticethatinaTypeC contourtheSPprecedesthePiandAIxiscomputedasanegativevalue.ThedicroticwaveisrepresentedbyDW.
Table1–Riskassessmenttools(10yearsterm).
Model Patients Countryoforigin Riskfactors
FRS 8491 USA Age,gender,Total-CH,CH-HDL,SBP,BPT,SMK
SCORE 205,178 Finland,Russia,Norway,Denmark,UK(England),UK
(Scotland),Sweden,Belgium,Germany,Italy,FranceandSpain
Age,gender,Total-CH,CH-HDL,SBP,andSMK
ASSIGN 13,297 Scotland Age,gender,FH,DB,SMK,SBP,Total-CHandCH-HDL
PROCAM 5389 Germany Age,CH-LDL,SMK,CH-HDL,SBP,PE,DB,andTGL
Total-CH,totalcholesterol;BPT,bloodpressuretreatment;FH,familyhistory;TGL,triglycerides;SMK,smokinghabits;DB,diabetes;PE,previous
event;SBP,systolicbloodpressure;CH-HDL,high-densitylipoprotein;CH-LDL,lowdensitylipoprotein.
clustering is relatively scalable and efficient in processing
large datasets. However, it cannot handle categorical
attributesanditisnotsuitablefordealingwithnon-convex
shapes.Also,itisquitesensitivetothepresenceofnoiseand
outliers.Itrequiresanefficientdatapre-processinganalysis
beforeitsapplication[23].
2.5.2. Expectationmaximization
Theexpectation maximization(EM) clustering algorithm is a complex probabilistic extension of the k-means method thatprimarilydiffersbythewayhowtheinitialgroupsare obtained.Insteadofassigningeachobjecttoacluster,with whichit ismostsimilar, EMassigns eachobject to a clus-teraccording toaweightthatrepresentstheprobability of membership.Inthismanner,therearenostrictboundaries betweenclusters,andnewmeansare determinedbasedon weightedmeasures[23].
3.
Results
3.1. Subjectcharacteristics
Theclinicalcharacteristicsofourstudypopulationareshown inTable2.ThemeanageofsubjectsinGroupHis58.10±11.64 years.Thebloodpressure valuesinthis groupareelevated (170.98±11.41mmHg and 100.29±9.05mmHg for SBP and DBP,respectively)relativelytonormalranges,130–139mmHg forSBPand85–89mmHgforDBP[3].TGLisalsoamarkerof increasedriskinthisgroupduetobehigherthanthe guide-linerecommendations(<150mg/dL)[3].Thesamesituationis verifiedforTotal-CHvalues, whichare alsosuperiortothe guidelinerecommendations(<190mg/dL).However,the CH-HDL is withinthe optimal limits, as it is higher than the minimumthresholdof40–45mg/dL.Finally,itisalso possi-bletoconcludethatthemeanbodymassindex(BMI)inGroup Hisalsosuperiortotheidealrecommendations(<25kg/m2)
[3].
Regarding Group C, the mean age is 21.19±2.28 years, and the pressure values are within normal ranges. Focus-ing on the APW differences between Group H and Group C,SWTT occurs later in the first group, where the arrival timeassumesthevalueof224.08±52.86ms,whileinGroup C, this is at 163.45±60.64ms. LVET arrival time is quite similar for both groups: 305.40±47.46ms in Group H and 281.99±53.06ms in Group C. The RWTT occurs earlier in GroupH(115.83±27.93ms)atlower amplitude (0.72±0.14), inoppositiontothevaluesinGroupC(159.29±43.47ms,at theamplitudeof0.87±0.09).GroupHisalsocharacterizedby
higherpositiveAIxvalues(25.69±17.14%),inoppositiontothe GroupC(0.35±15.69%).
3.2. Clusteringanalysis
TheclusteranalysiswasperformedforGroupCandGroupH, independently.Differentnomenclatureswereadoptedtohelp understandingthedataathand,numericlabels(1,2,...)for theGroupC andalphabetlabels(A,B,...)fortheGroupH. After,theclusteranalysis,theclusterswerecomparedusing theKruskal–Wallistest.
3.2.1. GroupC
Thefirstapproachconsistedintheselectionofthebest clus-teringalgorithmtodealwiththe setoffeaturesattest,as wellastheidealnumberofclusterstothegroup character-ization. Fig.3displaysRWTT and SWTTplotfor2-clusters analysisusing EM(a) andk-means(b)algorithms.Red and bluecolouredpointsrepresentthepulselabels(blue=Cluster 1,red=Cluster2).Categoricalfeatures(gender,smoker)were notconsideredduringtheanalysis,sincek-meansanalysisis notabletodealwiththesekindattributes.
ThefigureshowsthattheEMplothasanunsatisfactory divisionbetweentheclusters,withsomepointsfromCluster 2identifiedasbeingintheCluster1area.Visually,theresults fromthek-meansclusteringhaveamoreefficientseparation, asthedatasetispartitionedintwohomogeneousriskgroups. ForbothEMandk-means,theCluster1representsthepool ofhealthiersubjectswhencomparedtotheCluster2,sinceit representsthecaseswherethereflectionwavearrivesafterto thesystolicpeak.TheanalysisofCluster2alsoindicatesthe presenceofanothersub-cluster.Takingthisintoaccount,the
k-meansmethodwasusedtoexplorethedistributionofathird cluster,sinceitperformedwell,comparativelytotheEM,inthe twoclusterdistribution.Theobtainedresultsfora3cluster distributionarepresentedinFig.4.Thedetailedinformation aboutclustersdistributionispresentedinTable3.TheCluster 3(greenhomogeneouszone)ismostlyrepresentativeofType CAPWpulses,whereRWTT>SWTT.ThemeanAIxvalueof theclustercentroidisnegative(−11.2%),thusbeing consid-eredthelowerriskcluster.Cluster2predominantlyconsists ofTypeBpulses,withSWTT>RWTTandAIx>0,beingthus consideredtheintermediateriskgroup.Cluster1pulses rep-resentsthelesshomogeneousgroup,withsomepointsalso scatteredacrosstheCluster2area.Thesepulsesaremainly APWTypeApulses,withsomepunctualTypeBpulses.This groupevidencesahighercardiovascularriskcomparativelyto theCluster2andCluster3.
Table2–Characteristicsofsubjects.
Variable GroupH(mean±SD) GroupC(mean±SD) Units
Numberofsubjects 23 93 – Gender(M/F) 12/11 31/62 – Age 58.10±11.64 21.19±2.28 years SBP 170.98±11.41 108.26±11.88 mmHg DBP 100.29±9.05 69.54±7.64 mmHg BMI 27.87±5.23 21.62±2.63 kg/m2 Total-CH 205.10±25.42 – mg/dL CH-HDL 65.70±28.80 – mg/dL TGL 155.25±28.71 – mg/dL SWTT 224.08±52.86 163.45±60.64 ms RWTT 115.83±27.93 159.29±43.47 ms RWA 0.72±0.14 0.87±0.09 a.u.a AIx 25.69±17.14 0.35±15.69 % LVET 305.40±47.46 281.99±53.06 ms
SBP,systolicbloodpressure;DBP,diastolicbloodpressure;BMI,bodymassindex;CH-HDL,highdensitylipoproteincholesterol;Total-CH,total cholesterol;TGL,triglycerides;RWTT,reflectedwavetransittime;RWA,reflectionwaveamplitude;SWTT,systolicwavetransittime;LVET,left ventricularejectiontime;AIx,augmentationindex.
a Arbitraryamplitudeunits.
3.2.2. GroupH
Thek-meanswas alsousedinthe GroupHanalysis. Fig.5
depictsthe2-clusterdistribution,whereClusterAandCluster Bwerethelabelsadopted.
Fig.3–ClustersperformanceobtainedforEM(top)and
k-means(bottom)algorithmsusingtwoclusters,Cluster
1=blue,Cluster2=red.(Forinterpretationofthereferences tocolourinthisfigurelegend,thereaderisreferredtothe webversionofthearticle.)
Thecharacteristicsofeachoftheclustersarepresented in Table 4. Itcan be verified that the clusters do not dif-fer significantly according tothe SBP and DBP values.The mostsignificant parameters(exceptAPWparameters)were theCH-HDLandtheTGLvalues.ClusterAischaracterizedby higher TGLvaluesandlower CH-HDLlevels, characteristics ofsubjectsatrisk.Fromthewaveformanalysis,lowerRWTT, higher SWTTand,consequently,higherpositiveAIxvalues wereobtained.Thesuitablenumberofclusterswas consid-eredtobetwoasmaybeconfirmedbyvisualinspection.
Duringtheevaluation,thepulseinstanceswere indepen-dently groupedaccordingtotheirclustersimilarities,being themaximumlevelofsimilarityachieved,foreach subject, whenall thepulsesfitinasinglecluster.So,foreach sub-jectavaluerepresentingtheclusterrisk(aspercentage)was computed and used inthe correlationwithother available parameters(includingthestudiedscores).Sinceonlytwo clus-tersarestudied,atthispoint,thevaluesofClusterA(%)are symmetrictothevaluesofClusterB(%).TheClusterA(%) correlationsarepresentedinTable5.
Fig.4–RWTTandSWTTplotafterk-meansclusteringfor usingthreeclusters.Blue=Cluster1,red=Cluster2and Green=Cluster3.
Fig.5–Scatterplotsfor:(a)RWTTandRWA,(b)LVETandSWTT,wherethegreyandblackmarkersdenoteClusterAand ClusterB,respectively.
Table3–Averagevaluesforthe3-clustersgroups.
Attributes Cluster 1 2 3 Pulses 458 1550 2463 Age(year) 21.6 21.0 21.7 Weight(kg) 63.0 55.4 63.2 Height(m) 1.7 1.6 1.7 BMI(kg/m2) 21.3 20.8 21.8 SBP(mmHg) 109.9 106.1 108.6 DBP(mmHg) 69.6 70.3 68.9 HR(bpm) 72.8 67.5 72.9 SWTT(ms) 172.7 234.4 117.1 RWTT(ms) 103.0 143.2 179.9 LVET(ms) 240.3 306.4 274.4 SWA(a.u.a) 1.0 1.0 1.0 RWA(a.u.a) 0.8 0.9 0.9 DWA(a.u.a) 0.8 0.8 0.7 AIx(%) 21.0 12.6 −11.2
a Arbitraryamplitudeunits.
ThelipidicandtheBPvaluespresentlowsignificancewith theClusterA(%).Ontheotherhand,significantcorrelation valueswereobtainedbetweenClusterA(%)andPWA param-eters.Additionally, it wasobtainedasignificantcorrelation betweenClusterA(%)and theriskscores.Inthis casewas observedaverysignificantcorrelationwithASSIGN(r=0.582,
p<0.01),asignificantcorrelationwithFRS(r=0.458,p<0.05)
Table4–ClusterdistributionsforsubjectsinGroupH.
Attributes ClusterA ClusterB
SBP(mmHg) 169.68±10.38 172.56±13.59 DBP(mmHg) 101.22±7.03 102.00±7.14 Total-CH(mg/dL) 206.50±26.16 204.01±13.69 CH-HDL(mg/dL) 59.35±25.07 66.33±23.31 TGL(mg/dL) 160.41±44.95 142.94±40.96 RWTT(ms) 99.90±23.62 140.32±42.76 RWA(a.u.a) 0.60±0.11 0.90±0.06 SWTT(ms) 257.34±35.76 185.34±40.70 LVET(ms) 326.24±39.43 272.16±44.17 AIx(%) 40.01±11.15 6.2±9.67 Clusteredinstances 723(52%) 664(48%)
a Arbitraryamplitudeunits.
and,weakercorrelationswithSCORE(r=0.275,p=0.241)and PROCAM(r=0.391,p=0.088).Theseresultsaregoodindicators fortheuseofthismethodologyasatoolforthecardiovascular riskassessment.
Fig.6showstheClusterA(%)distribution(consideringa thresholdofmorethan50%)foreach ofthescores. Target-ing databyCluster A(threshold >50%),obtainedamedian riskbytheSCOREfunctionof2.0%peryear,FRSfunctionof 13.0%peryear,ASSIGNfunctionof16.0%peryearandbythe PROCAMfunctionof20.7%peryear.FortheFRS,ASSIGNand PROCAM scores, these valuescorrespond to the borderline betweenhighriskandanintermediaterisk[32,33].However, for SCORE, the observedmedian value correspondsto low risk,beingSCOREthelesscorrelatedfunctionwiththecluster results.
3.2.3. Comparisonofgroups
Theevaluationwasperformedbythecomparisonofthe clus-tersinGroupH(ClusterAandClusterB)withclustersinGroup C (Cluster1, Cluster2and Cluster3). TheClusterAis the cluster athigher riskinthe hypertensive groups,as previ-ouslydiscussed.Since,thereisnomedicalinformationabout the riskassociatedto GroupC (wehaveonlyinformations
Table5–Spearman’scorrelationcoefficientsobtained forClusterA(%). Attributes ClusterA(%) SBP −0.131 DBP −0.080 Total-CH 0.011 CH-HDL −0.177 TGL 0.017 AIx 0.921** SWTT 0.649** RWTT −0.632** RWA −0.933** LVET 0.574** FRS 0.458* SCORE 0.275 ASSIGN 0.582** PROCAM 0.391 ∗ Significantlevelatp<0.05. ∗∗ Significantlevelatp<0.01.
Table6–ComparisonofclustersfromGroupH(ClusterAandClusterB)andGroupC(Cluster1,Cluster2andCluster3) usingKruskal–Wallistest,measuredthrough2value.
GroupH GroupC Parameters
RWTT RWA SWTT RWTT LVET AIx
ClusterA Cluster1 38.66** 402.42** 526.65** 38.66** 369.23** 402.40** Cluster2 1149.82** 1399.96** 248.87** 1149.82** 154.81** 1399.97** Cluster3 1454.05** 1545.50** 1619.16** 1454.05** 706.65** 1638.37** ClusterB Cluster1 536.38** 458.25** 0.02 536.38** 4.79 538.34** Cluster2 265.77** 3.16** 892.34** 265.77** 598.03** 49.97** Cluster3 593.88** 4.06** 998.62** 593.88** 60.22** 1158.33** ∗∗ Significantlevelatp<0.01.
SCORE FRS ASSIGN PROCAM
0 5 10 15 20 25 30 35
Identification by Cluster A distribution (>50%)
CHD risk (%)
Fig.6–BoxplotsofCHDriskdistributionforSCORE,FRES, ASSIGNandPROCAM.Thehorizontallinesrepresentthe medians,theboxesrepresenttheinterquartileranges(50% ofthedistribution)andthewhiskersrepresenttherangeof valuesobtainedforsubjectsfromGroupH.
concerningtheAPWparametrizations),eachoneofclusters from GroupC was comparedto theGroup H clusters. The comparisonwasperformedusingthechi-squared(2)value,
obtainedfromtheKruskal–Wallistest,asshowninTable6. Therearesignificantdifferencesforthemajorityof clus-tersinGroupHandGroupC,asexpectedduetothedistinct populationcharacteristicsofeachgroup.However,some sim-ilaritieswerefoundbetweentheClusterB(thescoreatlower riskinGroupH)and theCluster 1(belonging totheGroup C),namely:forLVET(2=4.79,p<0.01)andSWTT(2=0.02,
p<0.01)measures.FromtheanalysisofTable3,itispossible toconcludethatthisistheclusterathigherriskinGroupC. However,it isnotpossibletoconcludethattherisk associ-atediseffectivelyriskassociatedtothedevelopmentofCVD. Thisconclusionisonlypossiblefromthecomparisonwiththe clustersofthehypertensivegroup(GroupH).And,the simi-laritiesareevidentfortheclusterinanalysis(Cluster1)and thecluster atlower riskinthehypertensivegroup (Cluster B),leadingtoassumethatthesubjectsbelongingtothe Clus-ter1needmedicaladvice,andthattheymaybedeveloping CVD.Itwould notbetrustable, ifthe similaritiesoccur for
thegroupofsubjectsinadvancedstageofdisease(ClusterA). Thisconclusionsupportsthatispossibletoscreenahealthy population(usingonlywaveformparameters)concerningthe cardiovascularriskusingclusteringmethodologies.
4.
Discussion
Inthispaperanewapproachtomorphologicalpulseanalysis ispresented,andaninnovativemethodologyto cardiovascu-larriskassessment,takingasreferencetheCHDriskscores, isapplied.
Theinformationthatisextractedfromtheclustering anal-ysiscan becrucialtofully understand ofthedata, mainly whenthereisno,orlittle,availableinformation.Itwas ver-ified,that Total-CHand TGLvaluesare intrinsicallyrelated totheAPWvariables,andthananincreaseoftheselevelsis associatedtotheRWTTdecreaseandSWTTincrease.Onthe otherhand,thehigherCH-HDLvaluesareassociatedtothe increaseofRWTTanddecreaseofSWTTvalues.Significant correlationsfortheclusteroutputwiththeASSIGN(r=0.582,
p<0.01)andwithFRS(r=0.458,p<0.05)werealsoverified. Thismethodisparticularly interesting,consideringthat thisapproachcanavoidtherequirementsoftheclassification procedureswhichoftenrequirecostlylabellingofalargesetof patterns,suchasbiochemicalanalysisusedbytraditionalrisk scores,asdemonstratedontheclusteranalysisofsubjectsin GroupC.ItwaspossibletoidentifythesimilaritieswithGroup Hclusters,usingonlymorphologicalparameters.The possi-bilitytoemulatetheperformanceofthetraditionalriskscores by the use ofonlynon-invasive parameters, which canbe easilyobtainedusingaPWAtechnique,isanefficient alterna-tiveapproachtostudylargedatasets.Inthissense,clustering analysisisaneasy methodforlargepopulationsscreening producingvaluableknowledgeforposteriorprioritizingpeople forpharmacologicalmeasures.
Thehealthcarepublicandprivatesystemfacesthe chal-lengeofanincreasinglyageingpopulationandtheescalation ofmedicalcosts. Therefore,thedevelopmentoftechniques that allow for earlier identification of subjects at risk, by the incorporation of personalized, predictive and pre-ventive methodologies, could bring an interesting impact comparatively to the traditional cardiovascular tools used forriskprediction.Itisevidenttheforeseeableimpactthat an accurate, non-invasive and easy-to-use instrument for haemodynamic condition assessmentcould impart on the
diagnosis and follow-up of the CVD. However, a similar methodologycouldbeappliedtoothermeasurements(e.g., invasivepressures,electrocardiogram)usedintraditional clin-icalpathofcardiovascularpatients.
5.
Conclusions
Thispaperdemonstratestheutilityofclusteringtechniques in risk scoring when applied to a medical dataset. When comparedwiththetraditionalriskscores,clustering method-ologiesshowedgoodperformancewithsignificantcorrelation values.Thisisasimple,yetreliabletool,thatcanbeusedin scoringtrials.
As this approach needs somedevelopments, a compu-tationaltoolto integrate theseresults withother machine learningtechniquessuchasclassificationalgorithms,is cur-rently being developed [15]. A larger sample will also be studiedin future trials forthe stratification bymedication use,age,diabetesduration,and/orgender.Giventhese consid-erations,thisapplicationcanbeaninterestingmethodology tobeusedinfurtherclinicalstudiestopulsemorphological analysis. Itcan alsobepotentially useful in other medical applications,suchasintheanaesthesiologyroom,wherethe numberofparameters anddevices are high,asatoolthat incorporatesseveralimportantinformations.
Acknowledgements
The authors acknowledge the support of the Fundac¸ ˜ao paraaCi ˆencia eTecnologia throughthescholarship (Grant number SFRH/BD/61356/2009) and project (Grant number PTDC/SAU-BEB/100659/2008).Thisprojectwasalsofundedby UE/FEDERthroughCOMPETE-ProgramaOperacionalFactores deCompetitividade.TheauthorsalsothanktoISA-Intelligent SensingAnywhereandHospitalandUniversityCoimbra Cen-ter(C.H.U.C.)forthecollaborationinclinicaltrials.
r
e
f
e
r
e
n
c
e
s
[1] W.H.Organization,TheGlobalBurdenofDisease:2004 Update,WorldHealthOrganization,2004.
[2] R.E.W.Kavey,S.R.Daniels,R.M.Lauer,D.L.A.Atkins,L.L. Hayman,K.Taubert,Americanheartassociationguidelines forprimarypreventionofatheroscleroticcardiovascular diseasebeginninginchildhood,Circulation107(11)(2003) 1562–1566.
[3] J.Perk,G.DeBacker,H.Gohlke,I.Graham,Z.Reiner,M. Verschuren,C.Albus,P.Benlian,G.Boysen,R.Cifkova,C. Deaton,S.Ebrahim,M.Fisher,G.Germano,R.Hobbs,A. Hoes,S.Karadeniz,A.Mezzani,E.Prescott,L.Ryden,M. Scherer,M.Syvanne,W.J.ScholteopReimer,C.Vrints,D. Wood,J.L.Zamorano,F.Zannad,Europeanguidelineson cardiovasculardiseasepreventioninclinicalpractice (version2012).ThefifthjointtaskforceoftheEuropean societyofcardiologyandothersocietiesoncardiovascular diseasepreventioninclinicalpractice(constitutedby representativesofninesocietiesandbyinvitedexperts). DevelopedwiththespecialcontributionoftheEuropean associationforcardiovascularprevention&rehabilitation (EACPR),Eur.HeartJ.33(13)(2012)1635–1701.
[4] A.Beswick,P.Brindle,Riskscoringintheassessmentof cardiovascularrisk,Curr.Opin.Lipidol.17(2006)375–386.
[5] M.D.Whitfield,M.Gillett,M.Holmes,E.Ogden,Predicting theimpactofpopulationlevelriskreductionin
cardio-vasculardiseaseandstrokeonacutehospital admissionratesovera5yearperiod–apilotstudy,Public Health120(12)(2006)1140–1148.
[6] M.T.Cooney,A.L.Dudina,I.M.Graham,Valueand limitationsofexistingscoresfortheassessmentof cardiovascularrisk:areviewforclinicians,J.Am.Coll. Cardiol.54(14)(2009)1209–1227.
[7] G.C.M.Siontis,I.Tzoulaki,K.C.Siontis,J.P.A.Ioannidis, Comparisonsofestablishedriskpredictionmodelsfor cardiovasculardisease:systematicreview,BMJ344(2012),
http://dx.doi.org/10.1136/bmj.e3318.
[8] I.U.Haq,L.E.Ramsay,W.Yeo,P.R.Jackson,E.J.Wallis,Isthe FraminghamriskfunctionvalidfornorthernEuropean populations?Acomparisonofmethodsforestimating absolutecoronaryriskinhighriskmen,HeartVessels81 (1999)40–46.
[9] G.M.London,A.P.Guerin,Influenceofarterialpulseand reflectedwavesonbloodpressureandcardiacfunction,Am. HeartJ.138(3)(1999)220–223.
[10] J.D.Cameron,B.P.McGrath,A.M.Dart,Useofradialartery applanationtonometryandageneralizedtransferfunction todetermineaorticpressureaugmentationinsubjectswith treatedhypertension,J.Am.Coll.Cardiol.32(5)(1998) 1214–1220.
[11] S.Laurent,J.Cockcroft,L.V.Bortel,P.Boutouyrie,C. Giannattasio,D.Hayoz,B.Pannier,C.Vlachopoulos,I. Wilkinson,H.Struijker-Boudier,Expertconsensusdocument onarterialstiffnessmethodologicalissuesandclinical applications,Eur.HeartJ.27(21)(2006)2588–2605.
[12] I.Ikonomidis,S.Tzortzis,T.Papaioannou,A.Protogerou,K. Stamatelopoulos,C.Papamichael,N.Zakopoulos,J.Lekakis, Incrementalvalueofarterialwavereflectionsinthe determinationofleftventriculardiastolicdysfunctionin untreatedpatientswithessentialhypertension,J.Hum. Hypertens.22(10)(2008)687–698.
[13] A.P.Avolio,M.Butlin,A.Walsh,Arterialbloodpressure measurementandpulsewaveanalysis–theirrolein enhancingcardiovascularassessment,Physiol.Meas.31(1) (2010)R1–R47.
[14] B.Hametner,S.Wassertheurer,J.Kropf,C.Mayer,A. Holzinger,B.Eber,T.Weber,Wavereflectionquantification basedonpressurewaveformsalone–methods,comparison, andclinicalcovariates,Comput.MethodsProgramsBiomed. 109(3)(2013)250–259.
[15] V.Almeida,J.Vieira,P.Santos,T.Pereira,H.Pereira,C. Correia,M.Pego,J.Cardoso,Machinelearningtechniquesfor arterialpressurewaveformanalysis,J.Personal.Med.3(2) (2013)82–101.
[16] I.Yoo,P.Alafaireet,M.Marinov,K.PenaHernandez,R. Gopidi,J.Chang,L.Hua,Datamininginhealthcareand biomedicine:asurveyoftheliterature,J.Med.Syst.36(4) (2012)2431–2448.
[17] K.B.Wagholikar,V.Sundararajan,A.W.Deshpande,Modeling paradigmsformedicaldiagnosticdecisionsupport:asurvey andfuturedirections,J.Med.Syst.36(5)(2012)3029– 3049.
[18] M.G.Tsipouras,T.P.Exarchos,D.I.Fotiadis,A.Kotsia,K. Vakalis,K.K.Naka,L.K.Michalis,Automateddiagnosisof coronaryarterydiseasebasedondataminingandfuzzy modeling,IEEETrans.Inform.Technol.Biomed.12(4)(2008) 447–458.
[19] S.Paredes,T.Rocha,P.deCarvalho,J.Henriques,M.Harris,J. Morais,Longtermcardiovascularriskmodels’combination, Comput.MethodsProgramBiomed.101(3)(2011)231–242.
[20] A.Jovic,N.Bogunovic,Electrocardiogramanalysisusinga combinationofstatistical,geometric,andnonlinearheart ratevariabilityfeatures,Artif.Intell.Med.51(2011)175–186.
[21] L.I.Kuncheva,J.C.Bezdek,R.P.Duin,Decisiontemplatesfor multipleclassifierfusion:anexperimentalcomparison, PatternRecogn.Lett.34(2)(2001)299–314.
[22] D.Ruta,B.Gabrys,Anoverviewofclassifierfusionmethods, Comput.Inform.Syst.7(2000)1–10.
[23] J.Han,M.Kamber,DataMining.ConceptsandTechniques, Elsevier,SanFrancisco,CA,2006.
[24] V.Melnykov,G.Shen,Clusteringthroughempirical likelihoodratio,Comput.Stat.DataAnal.62(2013)1–10.
[25] A.S.Shah,L.M.Dolan,Z.Gao,T.R.Kimball,E.M.Urbina, Clusteringofriskfactors:asimplemethodofdetecting cardiovasculardiseaseinyouth,Pediatrics127(2)(2011) e312–e318.
[26] H.H.Haseena,A.T.Mathew,J.K.Paul,Fuzzyclustered probabilisticandmultilayeredfeedforwardneural networksforelectrocardiogramarrhythmiaclassification,J. Med.Syst.35(2)(2011)179–188.
[27] J.Namayanja,V.P.Janeja,Anassessmentofpatientbehavior overtime-periods:acasestudyofmanagingtype2diabetes throughbloodglucosereadingsandinsulindoses,J.Med. Syst.36(S1)(2012)65–80.
[28] V.G.Almeida,H.C.Pereira,T.Pereira,E.Figueiras,E.Borges, J.M.R.Cardoso,C.Correia,Piezoelectricprobeforpressure waveformestimationinflexibletubesanditsapplicationsto thecardiovascularsystem,Sens.ActuatorsA169(2011) 217–226.
[29] V.Almeida,H.Pereira,T.Pereira,L.Ferreira,C.Correia,J. Cardoso,Assessmentofthepulsewavevariabilityforanew non-invasivedevice,in:Y.-T.Zhang(Ed.),TheInternational ConferenceonHealthInformatics,IFMBEProceedings,vol. 42,SpringerInternationalPublishing,Vilamoura,Portugal, 2014,pp.240–243.
[30] M.Frimodt-Moller,A.H.Nielsen,A.L.Kamper,S. Strandgaard,Reproducibilityofpulse-waveanalysisand pulse-wavevelocitydeterminationinchronickidney disease,Nephrol.Dial.Transpl.23(2)(2008)594–600.
[31] J.Murgo,N.Westerhof,J.P.Giolma,S.Altobelli,Aorticinput impedanceinnormalman:relationshiptopressurewave forms,circulation,Circulation62(1980)105–116.
[32] M.Versteylen,I.Joosen,L.Shaw,J.Narula,L.Hofstra, ComparisonofFramingham,PROCAM,SCORE,anddiamond forestertopredictcoronaryatherosclerosisand
cardiovascularevents,J.Nucl.Cardiol.18(5)(2011)904–911.