
Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Tensor-based anomaly detection: An interdisciplinary survey

Hadi Fanaee-T a,*, João Gama b

a Laboratory of Artificial Intelligence and Decision Support / INESC TEC and FCUP / University of Porto, Rua Dr. Roberto Frias, Porto 4200-465, Portugal

b Laboratory of Artificial Intelligence and Decision Support / INESC TEC and FEP / University of Porto, Rua Dr. Roberto Frias, Porto 4200-465, Portugal

Article info

Article history: Received 12 October 2015; Revised 18 January 2016; Accepted 20 January 2016; Available online 8 February 2016

Keywords: Anomaly detection; Tensor analysis; Multiway data; Tensor decomposition; Tensorial learning

Abstract

Traditional spectral-based methods such as PCA are popular for anomaly detection in a variety of problems and domains. However, if data includes tensor (multiway) structure (e.g. space-time-measurements), some meaningful anomalies may remain invisible with these methods. Although tensor-based anomaly detection (TAD) has been applied within a variety of disciplines over the last twenty years, it is not yet recognized as a formal category in anomaly detection. This survey aims to highlight the potential of tensor-based techniques as a novel approach for detection and identification of abnormalities and failures. We survey the interdisciplinary works in which TAD is reported and characterize the learning strategies, methods and applications; extract the important open issues in TAD and provide the corresponding existing solutions according to the state-of-the-art.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Patterns in data that do not conform to expected behavior are called anomalies, and the process of detecting such patterns is known as anomaly detection [1]. Anomaly detection is an essential component of many safety, monitoring and surveillance systems, because it uncovers significant and critical facts about a system's behavior, which helps prevent further escalation and losses. Plenty of methods have been developed during the last two decades for anomaly detection in different domains, the majority of which are covered in the survey paper [1]. One group of methods mentioned in that survey is spectral methods. These approaches attempt to project high-dimensional data onto a lower-dimensional subspace in which anomalies can be identified more easily. The main assumption of these techniques is that normal and abnormal instances appear significantly different in the projected subspace [1]. However, many real-world applications deal with data that has a tensor (multiway) structure, which unfortunately is widely ignored. In such circumstances, anomalies may remain invisible to matrix-based spectral methods. Besides, ignoring the tensor structure in data can lead to incorrect results. As an example, some real case studies in which matrix-based solutions failed and tensor-based solutions proved superior are listed in Table 1, which illustrates how much tensors are needed for anomaly detection.

* Corresponding author. Tel.: +351 222 094 000.

E-mail address: [email protected] (H. Fanaee-T).

Although the authors of [1] discuss matrix methods in their survey, they exclude tensors and their applications in anomaly detection. Yet over the last twenty years, since the work of Nomikos and MacGregor [12], research related to tensor-based anomaly detection (TAD) has been growing exponentially. Furthermore, many methods have been developed in multiple disciplines, from chemometrics and environmental monitoring to signal processing and data mining. Despite the popularity of this research area (though under different terminologies), no comprehensive survey on TAD is yet available. The most probable reason is that TAD has a wide scope and spans different research fields.

Our main objective in this survey is to bridge the gap between two popular research areas: anomaly detection and tensors. We study the literature from all major disciplines where tensors are frequently applied and classify the contributions related to TAD based on factors such as applications, learning types, methods and evaluation metrics. Moreover, we identify and classify the important issues and proposed solutions in TAD research.

We follow a motivational strategy in this survey, in the sense that we do not limit ourselves to introducing only techniques that are already applied for anomaly detection. Rather, we include methods that are used in closely related applications, such as classification, regression and forecasting, that may show great potential for anomaly detection. Therefore, this survey can be regarded as a comprehensive complement to Section 9 of [1], and from the tensor point of view it can be considered a focused complement to the applications of tensors in data mining, i.e. the survey paper [13]. Our assumption is that the reader is familiar with basic concepts in anomaly detection and tensor decomposition (or tensorial learning). For this reason, we omit explanation of the straightforward concepts related to tensor decomposition, anomaly detection and spectral-based anomaly detection. Instead, we refer the reader to the recent surveys on anomaly detection [1] and tensor decomposition [13,14], which adequately cover the technical material essential for understanding the current review.

http://dx.doi.org/10.1016/j.knosys.2016.01.027

0950-7051/© 2016 Elsevier B.V. All rights reserved.

Table 1

Some empirical evidence in the literature indicating the superiority of tensor-based solutions over matrix solutions.

| Study | Tensor method | Matrix method | Matrix method's reported problem |
|---|---|---|---|
| [2] | Tucker3 | PCA | Difficult interpretation of score plots |
| [3] | PARAFAC | PCA | Difficult interpretation of score plots |
| [4] | Non-negative Multiway PCA | PCA | Lower classification accuracy |
| [5] | Incremental tensor subspace learning | PCA | Lower tracking performance |
| [6] | Multiway PCA | PCA | Higher error rate in damage detection |
| [7] | Multiway PCA | PCA | Lower recognition accuracy |
| [8] | HOSVD | SVD | Higher prediction error |
| [9] | Tucker3 | SVD | SVD fails on modeling tensor structured data |
| [10] | PARAFAC | PCA | Loss of multiway linkages plus over-fitting |
| [11] | PARAFAC | PCA | PCA fails to identify the right variance |

Table 2

Tensor-based anomaly detection examples.

| Domain | Typical tensor | Application | Ref. |
|---|---|---|---|
| Process control | Batch × Measurements × Time | Detection of faulty batches | [12] |
| Environment | Variables × Site × Time | Detection of spatiotemporal source of pollution | [18] |
| Video surveillance | ImgRow × ImgCol × Time | Abnormal event/objects discovery | [19] |
| Network security | OriginIP × DestIP × Time | Abnormal traffic discovery | [16] |
| Social networks | Person × Person × Time | Event detection | [20] |
| Text-based systems | Actor × Keyword × Time | Event detection | [21] |
| Neuroscience | Frequency × Channel × Time | Seizure recognition | [22] |
| Remote sensing | ImgRow × ImgCol × Wavelength | Target detection | [23] |
| Sensors | Measurements × Location × Time | Anomaly detection | [15] |
| Transportation | Origin × Destination × Time | Detection of urban traffic problems | [24] |
| Metallurgy Eng. | Coils × PSD × Frequency | Fault detection in hot strip mill | [25] |
| Civil structures | Location × Time × Frequency | Detection of damages in civil structures | [26] |
| Mechanical systems | Experiment × Sensor × Time | Damage detection in aircraft wing flap | [6] |
| Power systems | Experiments × Variables × Time | Detection of voltage sags | [27] |
| Medical diagnosis | Medication × Patient × Diagnosis | Heart failure prediction | [28] |
| Epidemiology | Space × Time × Indicators | Disease outbreak prediction | [29] |
| Seismology | Location × Time × Frequency | Predicting earthquake ground motion | [30] |
| Criminology | Lng × Lat × Time × Indicators | Crime occurrence forecasting | [31] |

The article is organized as follows. In Section 2, we introduce the history of TAD and its applications. Section 3 presents learning methods for TAD. Section 4 discusses the techniques for tensor decomposition. Section 5 outlines the issues in TAD along with the corresponding solutions. In Section 6 we discuss the evaluation metrics used in TAD and introduce the available software for tensor analysis. Section 7 concludes the survey.

2. History and applications

A tensor is a geometric object used in mathematics and physics to extend concepts such as scalars, vectors and matrices to higher dimensions. The word "tensor" originates from the Latin tendere, "to stretch", and first appeared in anatomy in the seventeenth century to denote the stretching of a muscle. It was later used in the mid-nineteenth century by William Hamilton to describe some concepts in quaternion algebra. Tensor calculus, which comes closer to the word's current meaning, was introduced in 1900 by the Italian mathematician Gregorio Ricci-Curbastro and his doctoral student Tullio Levi-Civita. In 1915, tensors were used by Albert Einstein in the general theory of relativity to explain the geometric and causal structure of space-time and to define concepts such as distance, volume, curvature, angle, future and past. The first principles of tensor decomposition [14] were founded by the American mathematician Frank Hitchcock in 1927. The complex and multiway structure of human behavior was probably the first motivation for the use of tensors in data analysis. Psychologists such as Raymond Cattell, Ledyard Tucker and Richard Harshman were pioneers in extending tensor decomposition applications in psychology during the three decades from the 1940s to the 1970s. In 1981, tensor decomposition was introduced by Appellof and Davidson to the chemometrics community. The first applications of tensors in anomaly detection appeared in this community almost a decade later. The work of Nomikos and MacGregor [12] on multi-way batch monitoring pioneered the use of tensor (multiway) methods in monitoring and fault detection problems. The modern application of tensors in anomaly detection appeared a decade ago in a series of articles by Jimeng Sun and colleagues [15–17], who made a major contribution to the growth of TAD research. Nowadays, TAD is applied in a wider range of areas, including environmental monitoring, video surveillance, network security, social networks, text-based systems, neuroscience, remote sensing, engineering and other domains. In the following, some of these applications are discussed in more detail (see Table 2 for a summary).

2.1. Process control

As mentioned earlier, the first footprint of tensor (multiway) methods can be seen in the monitoring of batch processes. The common objective in operating batch processes is to achieve high-quality, value-added products at competitive prices. The goal of batch process analysis is to understand the major sources of batch-to-batch variation [12], to detect faulty batches in real time and to use this to improve the operation policies.

Tensors are very popular monitoring techniques in the production of chemicals and other manufacturing applications. Examples are polymerization processes [32–35], semiconductor etching processes [36–38], manufacturing of pharmaceutical materials [39,40], wastewater treatment [41], bioprocesses [42], fed-batch fermentation processes [40,43–45], nuclear waste storage tank monitoring [46] and the wine-making process [47].

In the majority of these applications, the typical tensor is a third-order tensor of I (batch) × J (measurement) × K (time), which is usually unfolded in the batch or time mode. Therefore, usually a matrix of I × JK or J × IK is processed, which is called the batch-wise or time-wise unfolded matrix, respectively. The main goal of tensor-based batch processing is to identify the abnormal batches or time instants.
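The two unfoldings described above can be sketched in NumPy as follows. This is a minimal illustration, not taken from the surveyed works: the function names, the toy tensor and the column ordering inside each unfolded matrix are assumptions (several orderings are in use in the literature), and the J × IK matrix is labeled "time-wise" only to follow the naming used in the text.

```python
import numpy as np

def unfold_batch_wise(X):
    """I x J x K tensor -> I x (J*K) matrix, one row per batch."""
    I, J, K = X.shape
    return X.reshape(I, J * K)

def unfold_time_wise(X):
    """I x J x K tensor -> J x (I*K) matrix, one row per measurement."""
    I, J, K = X.shape
    # Bring the measurement mode to the front, then flatten batch and time.
    return np.transpose(X, (1, 0, 2)).reshape(J, I * K)

# Toy tensor (hypothetical data): 2 batches, 3 measurements, 4 time points.
X = np.arange(24).reshape(2, 3, 4)
batch_matrix = unfold_batch_wise(X)   # shape (2, 12)
time_matrix = unfold_time_wise(X)     # shape (3, 8)
```

With this layout, entry (i, j, k) of the tensor lands in column j*K + k of the batch-wise matrix and in column i*K + k of the other one, so a matrix method such as PCA can then be applied to either view.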

2.2. Environmental monitoring

Thanks to recent advances in sensor technologies, it is feasible to analyze tens of ecological parameters across different locations and times. The need for tensor analysis has emerged in this domain mainly due to the spatiotemporal variations in such data. Identifying the locations or time periods related to abnormal measurements is the main goal of this application. Tensors have recently been applied in water quality monitoring [2,48–52], air pollution control [18,53] and monitoring of soil quality [54,55].

The multi-way data in these applications follows a general scheme of variables × sampling site × sampling time, where the first dimension normally includes the chemical (e.g. oxygen rate), physical (e.g. temperature) and biological (e.g. faecal coliforms) parameters measured by the sensors.

2.3. Video surveillance

Identification of abnormal time instants in video surveillance footage is of great interest in public security for the prevention of terrorism and crime. Tensors are natural data models for video data and can therefore provide a more accurate framework for abnormal activity discovery. Video data can be represented as a 4D tensor of RGB color × image row × image column × time or a 3D tensor of image row × image column × time. The most relevant works that use a tensor model for anomaly detection are [19,56], which apply TAD to video surveillance cameras. The authors of [57] also model 3D video as a tensor for human action recognition. A tensor-based approach is proposed in [58] for real-time tracking of moving points from infrared image sequences. Some other works [5,59,60] also use tensors for object tracking in video data, and these works are versatile enough to be adapted for anomaly detection purposes. Methods such as [61,62] exploit tensors for crowd density estimation and motion recognition, respectively, which can be useful for anomaly detection as well.

2.4. Network security

Computer-based systems are at risk from various attacks and malicious activities. Anomaly detection in these networks has long been the center of attention for many researchers. Tensors are powerful tools for anomaly detection in these networks, because a tensor can easily model the dynamics of traffic matrices, which requires the extra dimension of time. Moreover, in network security applications it is very difficult to obtain labels for abnormal situations. Usually only the history of normal operation is available. Therefore, semi-supervised and unsupervised tools such as tensor decomposition can be adequate.

There is no unique tensor data structure for the analysis of network data. The majority of works use the origin × destination × time scheme. This format is used for analyzing a wide range of network data such as TCP/IP networks, emails, phone calls, IP-TV and the World Wide Web (WWW). For instance, in TCP/IP networks, the two most popular models are SourceIP × TargetIP × Time [16,63–65,65,66] and SourceIP × TargetIP × Port × Time [16,20]. In email or phone call networks, the tensor models are constructed in the scheme of Sender × Recipient × Time [16,63,67–69]. Another type of work models the interaction of users with the system. Examples are IP × URL × User × Time [70] and Users × URL × Time [71] in web-access log data and User × TV Program × Time in IP-TV systems [72]. Anomaly detection in Internet networks is also addressed in [73]. The authors propose a method based on tensor decomposition for finding the source of disturbances originating in the network elements of a large Internet network. A third-order tensor model of VP × AS × time is introduced, where VP denotes the vantage point and AS refers to a network element. The built model is then used to track large earthquakes that occurred during the network activity.
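As a rough illustration of the origin × destination × time scheme, the sketch below bins a list of flow records into a count tensor. The record layout (source, target, timestamp in seconds), the function name `build_traffic_tensor`, the addresses and the fixed-width time bins are all assumptions made for this example; the surveyed works may weight or aggregate traffic differently.

```python
import numpy as np

def build_traffic_tensor(records, n_time_bins, bin_width):
    """Count flows per (source, target, time bin).

    records: list of (source, target, timestamp) tuples (hypothetical layout).
    Returns the count tensor plus the source and target labels of its axes.
    """
    sources = sorted({r[0] for r in records})
    targets = sorted({r[1] for r in records})
    src_index = {s: i for i, s in enumerate(sources)}
    dst_index = {d: j for j, d in enumerate(targets)}
    X = np.zeros((len(sources), len(targets), n_time_bins))
    for src, dst, t in records:
        k = int(t // bin_width)
        if 0 <= k < n_time_bins:        # ignore records outside the window
            X[src_index[src], dst_index[dst], k] += 1
    return X, sources, targets

# Made-up records: (SourceIP, TargetIP, seconds since start of capture).
records = [
    ("10.0.0.1", "10.0.0.9", 5.0),
    ("10.0.0.1", "10.0.0.9", 12.0),
    ("10.0.0.2", "10.0.0.9", 61.0),
    ("10.0.0.1", "10.0.0.8", 130.0),
]
X, sources, targets = build_traffic_tensor(records, n_time_bins=3, bin_width=60.0)
```

Each frontal slice `X[:, :, k]` is then the traffic matrix of one time bin, which is the object that a tensor decomposition models jointly over time.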

2.5. Social networks

Social networks are a special case of general networks where the nodes are mostly live agents (e.g. humans) and the edges show the interactions of these agents. Tensors are normally used for the detection of anomalous people, links and communities, which is achieved by taking into account their long-term behavior over time. The general tensor model for this task is person × person × time. One popular tensor-based practice is the analysis of anomalies in electronic discussion network data sets such as ENRON [16,17,67–69,74,75]. Tensors are also used in the analysis of Facebook data [20], phone calls [63], location-based social networks (user × location × time) [76] and physical social networks such as face-to-face contacts of individuals [77]. Apart from the traditional model, [78] proposed new tensor models such as nodes × measures × time and communities × measures × time for dynamic social networks, where measures such as betweenness and degree closeness are computed from the social network in each snapshot.
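A nodes × measures × time tensor of the kind described above can be assembled from a sequence of adjacency-matrix snapshots, as in the following sketch. To keep the example short, it uses node degree and the local clustering coefficient as stand-in measures; these are an assumption of this example and are not the measures used in [78]. The function names and the two toy snapshots are likewise hypothetical.

```python
import numpy as np

def node_measures(adj):
    """Return an (n_nodes, 2) array: degree and local clustering coefficient.

    adj is a symmetric 0/1 adjacency matrix without self-loops.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    triangles = np.diag(adj @ adj @ adj) / 2.0      # triangles through each node
    possible = degree * (degree - 1) / 2.0          # neighbour pairs per node
    clustering = np.divide(triangles, possible,
                           out=np.zeros_like(triangles), where=possible > 0)
    return np.column_stack([degree, clustering])

def snapshots_to_tensor(snapshots):
    """Stack per-snapshot measures into a nodes x measures x time tensor."""
    return np.stack([node_measures(a) for a in snapshots], axis=2)

# Snapshot 1: triangle 0-1-2 with node 3 isolated. Snapshot 2: path 0-1-2-3.
snap1 = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])
snap2 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
tensor = snapshots_to_tensor([snap1, snap2])        # shape (4, 2, 2)
```

Other measures can be added as extra columns in `node_measures` without changing the rest of the construction.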

2.6. Text-based systems

Tensors are used for modeling user/topic evolution in text-based systems. The constructed models are later applied to event and anomaly detection or co-clustering. The general tensor model for textual data analysis is user × keyword × time. Such a model is used for anomaly detection in Twitter data [21] and for the analysis of chat rooms [9] and bibliographic data (author × keyword × time) [16,17,68]. Tensor-based topic modeling techniques such as [71] also show potential for text-based event detection.

2.7. Neuroscience

The brain is a complex system that produces a rich source of multiway data, because every activity in the brain is managed by different regions of the brain during a specific period of time. Therefore, brain data is inherently spatiotemporal. The two well-known tools for capturing the brain's activities in a machine-readable format are Electroencephalography (EEG) and Functional Magnetic Resonance Imaging (fMRI). The data generated by these tools is analyzed via tensor models to detect abnormal activities or patterns in the brain. For instance, tensors are used to find the regions responsible for generating the abnormal neural activity that results in the initial seizure discharge [22,79]. The information obtained from this analysis is very helpful for the success of epilepsy surgery. In a different application, tensors are used for mental workload monitoring of operators in safety-critical applications (e.g. controlling an Unmanned Air Vehicle (UAV) [80]).


The general tensor model for EEG data is frequency × channel × time [22,79–82]. If measurements are recorded across different subjects or conditions, extra dimensions can be added to the simple model. These kinds of higher-order data structures are mostly used for classification purposes. For instance, in [83,84], multi-subject EEG data is modeled as a fourth-order tensor of frequency × channel × time × subject. Likewise, EEG data is modeled as a fifth-order tensor of frequency × channel × time × subject × condition in [82]. Note that tensor models do not operate directly on raw EEG signals; instead, a preprocessing step (usually via wavelet transformation) is required to transform the raw EEG signals into EEG tensors [82].
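The preprocessing step mentioned above can be sketched as follows. The example assumes a complex Morlet wavelet, a fixed number of cycles per wavelet and direct convolution; none of these choices come from the cited works, which may use other wavelet families or implementations. The names `morlet_power` and `eeg_to_tensor`, the sampling rate and the synthetic sinusoidal "channels" are hypothetical.

```python
import numpy as np

def morlet_power(signal, freqs, fs, n_cycles=5.0):
    """Wavelet power of one channel: returns a (len(freqs), len(signal)) array."""
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)            # temporal width in seconds
        t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        # Assumes the wavelet is shorter than the signal, so "same" keeps its length.
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power

def eeg_to_tensor(eeg, freqs, fs):
    """channels x time array -> frequency x channel x time tensor."""
    per_channel = [morlet_power(channel, freqs, fs) for channel in eeg]
    return np.stack(per_channel, axis=1)

# Synthetic recording: 3 channels, 2 s at 128 Hz, one pure tone per channel.
fs = 128
t = np.arange(0, 2.0, 1.0 / fs)
eeg = np.vstack([np.sin(2 * np.pi * 8 * t),
                 np.sin(2 * np.pi * 4 * t),
                 np.sin(2 * np.pi * 16 * t)])
freqs = [4.0, 8.0, 16.0]
tensor = eeg_to_tensor(eeg, freqs, fs)                    # shape (3, 3, 256)
```

Adding a subject or condition axis then amounts to stacking several such tensors along a new dimension.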

Tensors are also applied to fMRI data analysis. fMRI images can be used to detect brain regions that have been damaged by various neurodegenerative diseases such as Alzheimer's and Parkinson's. A typical fMRI scan image may contain 64 × 64 × 14 voxels (the 3D equivalent of pixels) sampled at consecutive time instants, producing a single voxel × time matrix. Multiple scans of a given subject generate a higher-order tensor of voxel × time × runs, which is commonly used in fMRI data analysis [10]. Scans can also be performed for multiple subjects, resulting in voxel × time × subjects [10]. The tensor model can have extra dimensions such as trials (e.g. rest, finger tapping, etc.), resulting in a fourth-order tensor of voxel × time × trials × runs [85].
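The construction of a voxel × time × runs tensor can be sketched with plain array reshaping. The data below is random and purely illustrative; the number of runs and time points, the variable names and the row-major voxel ordering are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_time = 3, 5

# One run: a 64 x 64 x 14 volume recorded at n_time instants (synthetic values).
runs = [rng.normal(size=(64, 64, 14, n_time)) for _ in range(n_runs)]

def run_to_matrix(volume_series):
    """Flatten the three spatial axes: (x, y, z, time) -> (voxel, time)."""
    return volume_series.reshape(-1, volume_series.shape[-1])

# Stack the per-run matrices along a new third axis: voxel x time x runs.
tensor = np.stack([run_to_matrix(r) for r in runs], axis=2)
```

Replacing the list of runs by a list of subjects gives the voxel × time × subjects variant in the same way.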

2.8. Remote sensing

Nowadays, with the aid of hyperspectral imaging technology, we are able to capture images across a range of spectra. We can create multiple images of a scene or object using light from different parts of the spectrum. These hyperspectral images can be used for target and object detection, for identifying materials from long distances and, of course, for detecting anomalies. The simplest tensor model used for hyperspectral images is a third-order tensor of spatial row × spatial column × wavelength, which is used for target detection and classification [23,86,87] or for space object material identification [88]. A more advanced tensor model is used in [89], whose authors add two extra dimensions to the hyperspectral tensor. The new model, called the multi-feature tensor representation, is a fifth-order tensor of spatial row × spatial column × wavelength × scale × direction, where scale and direction are the parameters of the Gabor function, chosen as constant numbers. The Gabor function is a popular technique for texture representation and discrimination in image processing.

The other dimension that can be added to the simple model is time. The majority of remote sensing techniques are based on the assumption that the spectral signature of objects is persistent and uniform over time, which might not be true. Therefore, a new model called the multi-temporal hyperspectral tensor, denoted by spatial row × spatial column × wavelength × time, is proposed in [90]. This model is obtained by combining multiple hyperspectral images acquired at different time instants. It is considered a new-generation model of soft sensors in the remote sensing community.

2.9. Sensors

One of the potential applications of tensors is anomaly detection in sensor networks, which use the same tensor model as environmental monitoring but differ in the speed at which the sensors gather data, and are mostly used in real-time monitoring. Sensor networks are modeled as a third-order tensor of measurements × space × time in [15,17,91]. In some other circumstances, sensors may gather information from people. The scheme of the tensor in this case is persons × measurements × time. For instance, in [92] six measurements are gathered from 20 people during a period of 255 h in an office environment. Then, via tensor decomposition, some meaningful events are detected, which have been linked to regular events such as lunch breaks, general meetings or a monthly seminar.

2.10. Engineering

Tensor decomposition has been used in civil engineering [26,93] for the detection of abnormal changes in the vibration response of structures. Different sensors are deployed in different parts of a structure and their vibration responses are measured over a period of time. The tensor model is therefore represented as space × time × frequency.

An application of tensors in metallurgical engineering can be seen in [25], where tensor decomposition is used for fault detection in a hot strip mill, specifically for damage on the surface of coils. The data generated by the ASIS (automatic surface inspection system) is modeled as a third-order tensor of Coils × PSD × frequencies, where the PSD (power spectrum densities) and frequencies are obtained via autoregressive processes of several signals modeled by the Fast Fourier transform (FFT).

An example from the mechanical engineering domain can be found in [6], where tensors are applied to detect damage in sensitive artefacts such as aircraft wing flaps. The main problem with aircraft wing flaps is barely visible impacts on their surface. To deal with this problem, the authors propose a new multiway model for damage detection by monitoring multiple sensors. They suggest a tensor scheme of experiment × sensor × time for the analysis task.

The electrical engineering community has also used tensors for voltage sag detection in power distribution networks [27]. A tensor model of experiments × variables × time is proposed, which is later unfolded time-wise to detect sag points.

Robotics engineers have also used tensors for the prediction of falls in walking robots [94]. Inspired by tensor-based batch process monitoring, they model the non-linear trajectory of walking robots and suggest a third-order tensor of trajectory slices × scaled state variables (e.g. position, angle) × time for fault detection.

2.11. Transportation systems

Traffic data (the Origin × Destination matrix) is frequently used for traffic planning and management in intelligent transportation systems. Tensor decomposition has been applied to the Origin × Destination × Time tensor to discover spatiotemporal traffic structure [24,95], which has important applications in urban planning and traffic jam control. Sometimes the collected data might also be abnormal due to failures in the collection process and recording systems. This problem, known as outlier recovery, is addressed in [96] with tensors. Tensors are also used for the prediction of missing values in traffic tensors (known as tensor completion) [97].

2.12. Medical applications

Tensors are exploited for the analysis of electronic medical records. In [98] a change detection system is developed for pain management decision making. A collection of medical forms completed at various treatment and recovery stages is modeled as a sixth-order tensor of initial pain × initial infusion × sex × surgery site × pain × month, and based on that some interesting change patterns are detected. Tensor decomposition is also applied to electronic health records (EHR) for the prediction of heart failure [28]. A tensor model of Medication × Patient × Diagnosis is used for this purpose. Tensors are also used in bioinformatics for modeling micro-array gene expression tensors (gene × sample × time) that can be used for diagnosing diseases [99]. Tensor decomposition has recently been exploited in epidemiology for detecting and spotting disease outbreaks [29,100]. A third-order tensor of Space × Time × Indicators is suggested for the monitoring task.

Table 3

Existing and potential learning techniques for tensor-based anomaly detection.

| Model | Category | Examples |
|---|---|---|
| Supervised | Dimensionality reduction based | Categorical target [26,93] |
| | | Numerical target [30] |
| | Classification based | Support tensor machines [89] |
| | | Supervised tensor learning [111] |
| | | Tensor least square [112] |
| | | Multilinear discriminant analysis [114] |
| | | Factorization machines [115] |
| | | Tensor subspace learning [116] |
| | Regression based | Multiway PLS (N-PLS) [108] |
| | | Tensor ridge regression [121] |
| | | Support tensor regression [121] |
| | | H-MOTE [123] |
| | | Tensor regression [122] |
| | Time series based | Multilinear dynamical systems [124] |
| | | Greedy low-rank tensor learning [125] |
| | | Tensor hidden Markov model [126] |
| | | Tensor time series models [127,128] |
| | | Tensor singular spectrum analysis [129] |
| | | TriMine [71] |
| Semi-supervised | Monitoring of decomposition statistics (SPE, T², etc.) | [18,25,35,39,40,44,45,94,118,119,130–132,134–136] |
| | Eigenspace based | [29,100] |
| Unsupervised | Analysis of score plots | 1D [32,76,77] |
| | | 2D [18,37,64,83] |
| | | 3D [64,100] |
| | Latent factors time series | [48,67] |
| | Multivariate SPC on multiple latent factors | [95] |
| | Streaming residuals | Dynamic tensor analysis [16,59,98,101] |
| | | Window-based tensor analysis [15] |
| | | Spatio-temporal tensor streams [91] |
| | Histogram based | [133] |

2.13. Other applications

Many other applications of tensor-based methods have appeared in recent years, in particular during the last five years, that are inherently different from the traditional applications of tensors. In [11] spectral changes of substrates and products are monitored in real time by modeling the temporal evolution of enzyme activity with a third-order tensor of wavenumber × time × activity. Tensor analysis is also applied to the analysis of proteins. In [101] the authors use tensor analysis to model the deviations of contacts between residues and their environment with respect to each other (i.e. relative behavior) as well as with respect to time (i.e. temporal behavior). The tensor model used in this work follows the scheme contact matrix × time, where the contact matrix A_ij(t) represents the normalized number of heavy atoms in residue i coming into contact with the heavy atoms in residue j at time t.

A dynamic pattern of international trade and the asymmetric relations between countries is studied in [69], which can potentially be applied to anomaly detection (e.g. economic crises).

Tensor decomposition has applications in seismology. A third-order tensor of space × time × frequency is built in [30] for the prediction of ground motion after earthquakes. Time-frequency components are obtained by transforming the acceleration records of earthquake ground motions with the continuous wavelet transform.

Tensor decompositions are used for analysing climate tensors of climate indicator × grid × time [102–104], which makes them capable techniques for the prediction of climate change.

Tensors are used for crime forecasting [31]. A fourth-order tensor of longitude × latitude × time × measurements is used for this purpose, where measurements refer to criminal activities and related indicators such as residential burglaries, construction permits, motor vehicle larceny, offender data, etc.

One of the recently emerged topics in anomaly detection is acoustic anomaly detection, in which several acoustic sensors are monitored for event detection. Acoustic anomaly detection can be used, for instance, in the safety monitoring of nuclear power plants [105]. Unfortunately, although tensor decomposition shows great potential, it has not yet been used for this purpose. There are, however, works that model voice data as a third-order tensor of rate × scale × frequency [106,107] or rate × time × frequency [4]. These tensor models might be used for acoustic anomaly detection.

3. Tensor-based anomaly detection: existing and potential methods

Tensor methods are better known for unsupervised and semi-supervised learning. However, in recent years, many supervised tensor learning methods and tensor time series models have been developed. Some of these recent techniques have not yet been used for anomaly detection, but might prove useful for this purpose. Table 3 summarizes these methods with the corresponding references. In the following, these strategies are described in more detail.

3.1. Supervised models

Perhaps the first footprint of tensors in supervised anomaly detection can be found in Multiway PLS models [108]. The second important role of tensors is in dimensionality reduction for classification problems. Nowadays, more supervised tensor-based learning methods are being developed. Some of these techniques, in spite of their potential for anomaly detection, have not yet been applied to it. The goal of this section is to provide a structured list of existing and potential approaches.

3.1.1. Tensor decomposition for dimensionality reduction

In this category of supervised models, tensor decomposition is used as a dimensionality reduction tool for feature extraction (a more advanced alternative to matrix-based dimensionality reduction solutions such as PCA). Depending on the target value, methods can be grouped into two categories.

In the first group of methods [26,93,109], it is assumed that we have two sets, train and test, where the train set contains normal samples. Tensor decomposition is applied to the normal tensor as a dimensionality reduction tool. Then, one of the factor matrices (usually time) is fed to a regular classifier (e.g. k-nearest neighbors or SVM) to build a model from the normal samples.

The goal is to predict the labels of the observations in the test set. Therefore, the model built from the train set is used to predict the label (normal or abnormal) of each observation in the test factor matrix. For instance, in [26,93], a PARAFAC decomposition with k components is applied to the space × time × frequency tensor corresponding to the normal samples, and a k-NN classifier is then trained on the derived time factor matrix (the features are the latent variables). The built model is then used for classification of time points in the arriving data. In other related work, a combination of PARAFAC and a self-organizing map (SOM) is used [90] for the classification of signatures of multitemporal hyperspectral images.
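The pipeline described above can be sketched roughly as follows, under several assumptions that are not taken from [26,93]: PARAFAC is fitted with a plain alternating least squares loop written for this example, arriving time slices are mapped to the latent space by least squares against the fitted space and frequency factors, and the k-NN step is replaced by a simple one-class variant (mean distance to the k nearest normal time points, with the largest leave-one-out training score as threshold). All function names, the synthetic data and the injected anomaly are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I x R) and (J x R) -> (I*J x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def unfold(X, mode):
    """Mode-n unfolding; the remaining modes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_als(X, rank, n_iter=200, seed=0):
    """Plain alternating least squares for a third-order PARAFAC model."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for mode in (0, 2, 1):  # the time mode (1) is updated last
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            factors[mode] = unfold(X, mode) @ np.linalg.pinv(kr).T
    return factors

def project_time(X, space_f, freq_f):
    """Least-squares latent coordinates of every time slice of X."""
    return unfold(X, 1) @ np.linalg.pinv(khatri_rao(space_f, freq_f)).T

def knn_score(train, queries, k, exclude_self=False):
    """Mean Euclidean distance from each query to its k nearest training rows."""
    d = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=2)
    d = np.sort(d, axis=1)
    start = 1 if exclude_self else 0    # skip the zero self-distance if needed
    return d[:, start:start + k].mean(axis=1)

# Synthetic space x time x frequency data of rank 2 (illustrative only).
rng = np.random.default_rng(1)
S, C = rng.random((5, 2)), rng.random((6, 2))
T_normal = rng.standard_normal((40, 2))
T_new = rng.standard_normal((11, 2))
T_new[-1] = [8.0, -8.0]                 # last arriving time point is an injected anomaly

def make_tensor(T):
    noise = 0.01 * rng.standard_normal((5, len(T), 6))
    return np.einsum("sr,tr,fr->stf", S, T, C) + noise

X_normal, X_new = make_tensor(T_normal), make_tensor(T_new)

space_f, time_f, freq_f = parafac_als(X_normal, rank=2)
threshold = knn_score(time_f, time_f, k=3, exclude_self=True).max()
scores = knn_score(time_f, project_time(X_new, space_f, freq_f), k=3)
is_anomaly = scores > threshold
```

Because the time mode is updated last, the rows of `time_f` are exactly the least-squares coordinates that `project_time` would return for the training slices, so training and arriving time points live in the same latent space. A labeled k-NN or SVM classifier could be substituted for the distance score when abnormal examples are available.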

The second group of methods [30] follows the same procedure as the former, but instead of binary labels (abnormal/normal) a numeric target is given for prediction. Therefore, categorical classifiers are replaced with regression models. Targets can be single or multiple variables. For instance, in [30] the authors propose to train a GRNN (generalized regression neural network) on the tensor subspace latent variables for the prediction of multiple seismological variables. They used this method for prediction purposes. This kind of approach can be easily extended to anomaly detection, although a further step is required. For instance, the difference between the predicted and actual values can be used along with a threshold to detect anomalies.
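The additional thresholding step can be sketched as below. The rule used here (flag a point when its absolute prediction error exceeds a multiple of the standard deviation of the residuals seen on normal training data) is one common choice and an assumption of this example, as are the function name and the numbers; the regression model itself is left out and only its predictions are used.

```python
import numpy as np

def residual_anomalies(y_true, y_pred, train_residuals, n_sigmas=3.0):
    """Flag points whose absolute prediction error exceeds the threshold.

    The threshold is n_sigmas times the standard deviation of the residuals
    observed on normal training data (an assumed rule, not from the paper).
    """
    threshold = n_sigmas * np.std(train_residuals)
    residuals = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return residuals > threshold, threshold

# Made-up residuals from the training phase and predictions for new points.
train_residuals = np.array([0.1, -0.2, 0.05, 0.15, -0.1])
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 4.2, 4.05])
flags, threshold = residual_anomalies(y_true, y_pred, train_residuals)
```

For multiple target variables, the absolute error can be replaced by a norm of the residual vector before comparing against the threshold.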

Note that tensor decomposition is not necessarily used as a dimensionality reduction tool in classification tasks. Rather, it can serve various other tasks such as case-based reasoning [6] and clustering [72,110].

3.1.2. Tensor classifiers

Tensor classifiers adapt regular classifiers to tensorial data. In these methods, a tensor-based classifier is trained directly on the data and the built model is then used for prediction. A binary tensor classifier has great potential for anomaly detection in multiway data. A good example of this category is the method presented by Zhang et al. [89], where SVM (support vector machines) is extended to STM (support tensor machines). The new tensorial classifier is trained directly on the tensorial data of specific objects and the built model is then used for target detection. In another work [111], a general framework called Supervised Tensor Learning (STL) is proposed that adapts many conventional machine learning methods to take higher-order tensors as inputs. This model is successfully tested on binary classification problems, which can be very useful for anomaly detection. In [112], in addition to another version of STM, a new method called Tensor Least Square (TLS) is presented, which is an extension of the least squares classifier. A new type of STM is also presented in [113], which is applied to gait and action recognition.

Multilinear discriminant analysis (MDA) [114], an extension of linear discriminant analysis (LDA) for tensor data, is also proposed for tensor-based image classification. Factorization machines [115] is another method for tensor-based classification that extends SVM for tensors using PARAFAC, motivated by the difficulty SVM has in collaborative filtering problems. Tensor classifiers are also known as supervised multilinear subspace learning in the image processing community. The recent survey paper [116] covers the majority of advances in tensor subspace learning.

3.1.3. Tensor regression

The first tensor regression models emerged in the 1980s from the chemometrics community under the traditional name of N-PLS or multiway PLS [117]. In these techniques, which are widely used for anomaly detection [33–35,50,108,118–120], a model is built based on the relationship of the input tensor (X) to some quality measurements (Y). That model is then used for predicting the quality measurements of new tensors. Deviations of predicted target variables from the normal reference are interpreted as abnormal behavior.

Apart from the traditional multiway regression models, some novel techniques have been recently developed in different research communities. One is [121], which proposes two tensor regression models called tensor ridge regression (TRR) and support tensor regression (STR) that respectively extend vector regression models such as ridge regression (RR) and support vector regression via some properties of the PARAFAC model. The authors apply these methods to facial data for human-age estimation and head/body-pose prediction. These methods can be quite interesting for a couple of problems in TAD.

Another tensor regression model is proposed in [122], motivated by problems in brain imaging where an observed binary diagnosis status (Y) needs to be modeled based on fMRI images as an input tensor (X). The proposed tensor model is used to identify regions of interest in brains that are relevant to a clinical response, with applications to the detection of brain diseases, including Attention Deficit Hyperactivity Disorder and Alzheimer's disease.

Moreover, Zhu et al. [123] propose a tensor-based regression algorithm called H-MOTE that is capable of incorporating background knowledge into the model. This model is used for prediction of wafer quality in semiconductor manufacturing.

3.1.4. Tensor forecasting

Tensor forecasting is an extension of vector time series models for multiway time series. The procedure for anomaly detection is the same as in univariate ones. A model is built for the tensor time series and then, based on that model, future tensors are predicted. At the subsequent moment, if the observed tensor differs considerably from the predicted tensor, it is marked as an anomaly. Different methods have been developed for tensor forecasting. In [124] a model called Multilinear Dynamical Systems (MDS) is proposed, which is a tensorial extension of the linear dynamical system (LDS). Detection of market collapse and climate change are introduced as applications of this methodology. Another tensor forecasting method, named Greedy Low-rank Tensor Learning, is proposed in [125] and applied for forecasting tensor time series such as climate tensors.

Some time series analysis tools are also extended for tensors. For instance, a tensor-based Hidden Markov Model (HMM) approach is proposed in [126] and is used for fault detection and prediction. Some ideas in time series analysis, such as weighting and averaging, are also extended for tensor analysis in [127,128]. The tensor version of singular spectrum analysis (SSA) is also presented in [129], replacing SVD with PARAFAC in regular SSA, and is applied for non-stationary source separation of seizure signals.

An innovative approach called TriMine [71] is also proposed for tensor forecasting in the context of topic modeling. In the proposed methodology, a training tensor is decomposed with a regular tensor decomposition and then, based on the obtained time factor matrix, the next factor matrix is predicted at different scales.


Fig. 1. Left) Tucker3 decomposition: the third-order tensor X is decomposed into a smaller core tensor and three factor matrices. Right) CP/PARAFAC decomposition: a third-order tensor X is decomposed into three factor matrices.

Later, the newly predicted time factors are multiplied by the other two dimensions to construct the tensor forecast for both the short term and the long term. This approach seems promising for multi-scale anomaly detection and prediction.

3.2. Semi-supervised models

Semi-supervised methods fall into two groups. The first group originated in online fault detection from batch processes, where a training tensor model corresponding to the normal operation condition (NOC) is usually constructed. Then, arriving data is monitored to detect deviations from the NOC model using statistics such as the squared prediction error (SPE) or the Hotelling T2 chart [12]. Examples of this category are explained in [12,44,45,118,130–132]. There exists another group of methods that, instead of the above statistics, monitor the angle between eigenvectors or the eigenvalue magnitudes in the test set in comparison with the training set. Examples of this category are explained in [29,100].

As semi-supervised models require less human intervention, they are more desirable compared to supervised methods. In many applications such as process control or network security, labeling data for each time instant is infeasible. Therefore, this model presents superior flexibility and simplicity.

3.3. Unsupervised models

Tensors are better known for unsupervised learning in problems such as co-clustering and anomaly detection. In this section, the popular unsupervised methods are described.

3.3.1. Score plot-based

The most traditional use of tensor decomposition in anomaly detection is with score plots obtained from the decomposition that are analyzed manually or automatically for anomaly detection or clustering. Score plots can be 1D (only one factor) [32,76,77], 2D [18,37,64,83] and 3D [64,100]. If the latent factor is time, some factors might be presented as a multivariate time series [48,67]. Sometimes this multivariate time series may also be monitored automatically with multivariate SPC methods such as Hotelling T2 [95].

3.3.2. Streaming decomposition error-based

This group of methods is composed of those streaming decomposition methods that operate on data incrementally without the requirement for a training set. They monitor the decomposition reconstruction error for each tensor at each time instant. An anomalous time instant is one whose corresponding reconstruction error goes beyond a pre-defined threshold (e.g. twice the standard deviation of the errors so far). Examples are given in [5,16,17,56,59,91].
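As a rough illustration of this thresholding rule, the sketch below flags time instants from a sequence of reconstruction errors. It is not taken from any of the cited works: the function name `flag_streaming_errors`, the warm-up length, and the interpretation of the limit as "mean plus k standard deviations of the errors seen so far" are assumptions, and the errors themselves are assumed to come from some streaming decomposition that is not shown.

```python
import numpy as np

def flag_streaming_errors(errors, k=2.0, warmup=5):
    """Flag time instants whose reconstruction error exceeds a running limit.

    Assumption: limit = mean + k * std of all errors observed before time t.
    The first `warmup` instants are never flagged (too little history).
    """
    errors = np.asarray(errors, dtype=float)
    flags = np.zeros(errors.shape[0], dtype=bool)
    for t in range(warmup, errors.shape[0]):
        history = errors[:t]                      # errors seen so far
        limit = history.mean() + k * history.std()
        flags[t] = errors[t] > limit
    return flags

# Toy error sequence with one spike at index 6.
errors = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 4.0, 1.0]
flags = flag_streaming_errors(errors)
```

Note that in this simple version a flagged error still enters the history and inflates later limits; excluding flagged instants from the history is an equally plausible design choice.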

3.3.3. Histogram-based

Fanaee and Gama [133] proposed an efficient multi-aspect streaming tensor analysis approach called MASTA based on online histograms. In this approach, the whole tensor is vectorized and is simultaneously segmented into slices in each mode. Then the distribution of each slice is compared against that of the vectorized tensor using a standard metric such as the Earth Mover's Distance. The underlying logic is that tensor information is distributed over slices in each mode. By matching slices with the reference distribution, similar slices can be identified as well as anomalous slices.

4. Tensor decomposition

Traditional data analysis techniques, such as PCA, clustering, regression, etc. are only able to model two-dimensional data and they do not consider the interaction between more than two dimensions. However, in several real-world phenomena, there is a mutual relationship between more than two dimensions, in particular when the time dimension is added to the problem. Tensor (multi-way) data analysis considers all mutual dependencies between the different dimensions and provides a compact representation of the original data in lower dimensional spaces. The most common multi-way analysis techniques are Tucker [137] and CP/PARAFAC [138,139], which are generalized versions of PCA or, more specifically, Singular Value Decomposition (SVD) for higher-order tensors.

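Among the many types of tensor decomposition approaches, Tucker and PARAFAC/CP models are the most used ones. Tucker decomposition approximates a large tensor by the product of a smaller tensor with predetermined dimensions (called the core tensor) and factor matrices in each dimension (see Fig. 1, left). Formally, the problem can be defined as an optimization problem [140]: given a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, find a core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}$ with pre-defined integers $r_i$, $1 \le r_i \le n_i$ for $i = 1, 2, \ldots, d$, and factor matrices $A^{(i)}$ that optimize

$$\min \left\| \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_d A^{(d)} \right\| \qquad (1)$$

subject to

$$\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}, \qquad A^{(i)} \in \mathbb{R}^{n_i \times r_i}, \qquad (A^{(i)})^{T} A^{(i)} = I, \qquad i = 1, 2, \ldots, d.$$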

In the above model, $d$ represents the order of the tensor (e.g. for a three-dimensional tensor, $d = 3$) and $r_1, r_2, \ldots, r_d$ are model input parameters (core size). The simplest algorithm for finding the matrices $A^{(i)}$ and $\mathcal{G}$ is a method called higher-order SVD (HOSVD) [141], where the tensor is first unfolded into matrices over all its modes (e.g. unfolding an I × J × K tensor into I × JK, J × IK and K × IJ matrices) and then SVD is independently performed on each matrix (e.g. the I × JK matrix). A more sophisticated approach is higher-order orthogonal iteration (HOOI) [142], which uses alternating optimization to find better projection matrices iteratively. In the HOOI algorithm, HOSVD can be used for a better estimation of the initial elements of $A^{(i)}$ and $\mathcal{G}$.
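The HOSVD procedure described above (unfold along each mode, take the leading left singular vectors, then project to obtain the core) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the reference implementation of [141]: the helper names `unfold`, `mode_product`, `hosvd` and `reconstruct` are hypothetical, and the column ordering of the unfolding follows NumPy's default layout rather than any particular textbook convention.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the mode to the last axis
    result = moved @ matrix.T               # contract it with the matrix columns
    return np.moveaxis(result, -1, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one SVD per unfolding, then project to get the core."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])            # orthonormal columns, shape (n_mode, r)
    core = tensor
    for mode, a in enumerate(factors):
        core = mode_product(core, a.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the approximation G x_1 A(1) x_2 A(2) ... x_d A(d)."""
    out = core
    for mode, a in enumerate(factors):
        out = mode_product(out, a, mode)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
core, factors = hosvd(X, ranks=(2, 3, 2))
X_hat = reconstruct(core, factors)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

The resulting factors could serve as the starting point of an iterative scheme such as HOOI, which is not shown here.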

PARAFAC/CP is also a special case of the Tucker model in which the core tensor is super-diagonal. Therefore, obtaining (1) for PARAFAC/CP is straightforward. Although there exist other kinds of decomposition models, the algorithmic details of these kinds of approaches are out of the scope of this survey. However, the interested


Table 4

Methods for tensor decomposition and applications to anomaly detection.

| Family | Method | Anomaly detection example |
|---|---|---|
| Tucker | Multiway PCA (Tucker1) [46] | [25,27,39,44,47,94,110,134,135,147–149] |
| | GTucker2 [150] | [150] |
| | Tucker3 [137] | [2,3,9,48,49,53,72] |
| | Non-negative Tucker [151,152] | [24,83] |
| | HOSVD [141] | [8,29,61,153] |
| PARAFAC | PARAFAC [138] | [3,10,11,26,30,49,63,73,76,80,81,90,154] |
| | Non-negative PARAFAC [143] | [20,67,77,84,144] |
| | PARAFAC2 [145] | [37] |
| | Dynamic PARAFAC [35] | [35] |
| | CP-APR [146] | [70] |
| | DEDICOM [155] | [69,156] |
| Bayesian | EM-based (pTucker [157], ETF [158], InfTucker [75]) | [75,92,158] |
| | MAP-based (ARD [159], FBCP [160]) | |
| | Gibbs sampling (Multi-HD [161], BTA [162], BPTF [163], TriMine [71], MGP-CP [164], sp-PARAFAC [165]) | [71] |
| LPP | TLPP [166,167] | [40,58] |
| | TGLPP [168] | [168] |
| ICA | Tucker1-based (MICA [45], MKICA [169], FS-MKICA [132]) | [45,132,169] |
| | Tucker3-based [104] | [104] |
| | PARAFAC-based [85] | |

readers are referred to [13,142] for more technical details about these models.

In the following we list the six main categories of methods for tensor analysis that can potentially be used for anomaly detection: PARAFAC-based, Tucker-based, DEDICOM-based, Bayesian, Locality Preserving Projection (LPP) based and ICA-based. Table 4 summarizes the existing techniques.

4.1. PARAFAC-based

4.1.1. PARAFAC

PARAFAC/CP has been the most used decomposition model. The reason is probably its similarity to PCA in implementation and interpretation: like PCA, PARAFAC requires only one user input, which is the number of components. The PARAFAC model is applied in a wide range of anomaly detection tasks in various domains. For examples see [3,11,26,49,73,81].

4.1.2. Non-negative PARAFAC

One of the important issues in tensor decomposition is that elements in factor matrices can take negative values. In some domains (e.g. fMRI tensors), these negative scores cannot be justified with our physical knowledge. This might not be a problem when we want to work directly on the eigenspace, but it might be a constraint when we want to perform our analysis on the obtained components. This problem is mostly motivated by the chemometrics and neuroscience communities, where the output of tensor decomposition needs to be interpreted by a specialist. The PARAFAC model with a non-negativity constraint is called non-negative PARAFAC or non-negative tensor factorization (NTF), which was presented for the first time in [143]. Nowadays, NTF has become remarkably popular due to its meaningful and physical interpretation, especially in manual score plot-based anomaly detection [20,67,77,84,144].

4.1.3. PARAFAC2

In some specific circumstances, as occur in batch monitoring, a tensor with uneven-length slices appears. For instance, in batch monitoring with a tensor of batch × measurement × time, the matrix measurement × time can be of different length for each batch due to the different elapsed time of each batch. PARAFAC2 [145], which is an extension of PARAFAC, provides a solution for such problems. It is used in [37] for fault detection from batch tensors with unequal time axes, and its superiority over regular PARAFAC and Tucker is shown.

4.1.4. Dynamic PARAFAC

A procedure called DPARAFAC (dynamic parallel factor analysis) is introduced in [35] for online fault detection in process monitoring. This methodology includes two phases: learning and detection. In the learning phase, we are given the data of the normal operation condition (NOC). Each slice of the NOC tensor (matrix measurement × time) is segmented into different equal-length windows along the time axis. Then all the segments together form a new tensor (measurement × window × time). PARAFAC is then applied on this tensor for each batch and loadings are obtained. The average of the factor matrices for each window is obtained over all batches. Later, some statistics such as T2 and control limits are computed for each time point. In the detection phase, when a new batch of data arrives, it is arranged following the previous procedure and is then projected onto the previous under-control subspace to assess its degree of abnormality.

4.1.5. Poisson tensor factorization

Poisson tensor factorization (PTF) [146], also known as CANDECOMP/PARAFAC Alternating Poisson Regression (CP-APR), uses a new fitting algorithm based on Kullback–Leibler (KL) divergence instead of the common ALS fitting algorithm in PARAFAC. The idea of such approaches is that count data can be better described by a Poisson distribution than by a Gaussian distribution. This model is suggested for anomaly detection from count data [70].

4.2. Tucker-based

4.2.1. Tucker1

Tucker1 or Multiway PCA (MPCA) is the first tensor model used for TAD in many applications [25,27,39,44,47,94,110,134,135,147–149]. Tucker1 is used when variance is only important in one dimension. Therefore, the tensor is usually unfolded along one dimension and then regular PCA is applied to the unfolded data. For instance, in batch monitoring, the Tucker1 model is used on batch-wise or time-wise unfolded matrices.
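A minimal sketch of batch-wise MPCA monitoring is given below, assuming a tensor of normal operation data arranged as batch × measurement × time. It combines the unfolding described here with the SPE and Hotelling T2 statistics mentioned in Section 3.2. The function names `fit_mpca` and `monitor_batch`, the synthetic data and the absence of control limits are simplifications chosen for the example, not details from the cited works.

```python
import numpy as np

def fit_mpca(noc_batches, n_components):
    """Fit PCA on the batch-wise unfolding of a (batch, measurement, time) tensor."""
    n_batches = noc_batches.shape[0]
    unfolded = noc_batches.reshape(n_batches, -1)     # each batch becomes one row
    mean = unfolded.mean(axis=0)
    _, s, vt = np.linalg.svd(unfolded - mean, full_matrices=False)
    loadings = vt[:n_components].T                    # (measurement*time, components)
    score_var = s[:n_components] ** 2 / (n_batches - 1)
    return mean, loadings, score_var

def monitor_batch(batch, mean, loadings, score_var):
    """Return (SPE, T2) of one new batch with respect to the fitted model."""
    x = batch.reshape(-1) - mean
    scores = loadings.T @ x
    residual = x - loadings @ scores
    spe = float(residual @ residual)                  # squared prediction error
    t2 = float(np.sum(scores ** 2 / score_var))       # Hotelling T2 on the scores
    return spe, t2

# Synthetic data (assumption): two latent batch patterns plus small noise.
rng = np.random.default_rng(0)
pattern = rng.standard_normal((2, 3, 10))
weights = rng.standard_normal((31, 2))
batches = np.tensordot(weights, pattern, axes=1) + 0.1 * rng.standard_normal((31, 3, 10))
noc, new_ok = batches[:30], batches[30]
new_fault = new_ok.copy()
new_fault[1] += 5.0                                   # constant offset on one measurement

model = fit_mpca(noc, n_components=2)
spe_ok, t2_ok = monitor_batch(new_ok, *model)
spe_fault, t2_fault = monitor_batch(new_fault, *model)
```

In an actual monitoring scheme the SPE and T2 values would be compared against control limits estimated from the normal operation data.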


4.2.2. GTucker2

The Tucker2 model is rarely used for anomaly detection. Only very recently a generalized version of Tucker2 called GTucker2 was proposed [150] for fault detection from tensors with unequal slice lengths. GTucker2 is closely related to PARAFAC2, in that PARAFAC2 can be viewed as a constrained version of GTucker2. In [150] the superiority of GTucker2 over Tucker1, PARAFAC, Tucker3 and PARAFAC2 is shown on this specific problem.

4.2.3. Tucker3

The other model, which promises more flexibility, is known as Tucker3. This model, as presented in the previous section, is normally used when there are multiway variations in all modes [2,3,9,48,49,53,72]. For instance, for water quality tensors, we are interested in discovering abnormal locations, time instants and measurements that are more correlated to anomalies. Therefore, Tucker3 is the preferred model [2].

4.2.4. Non-negative Tucker

There are some extensions of NTF for Tucker decomposition, called non-negative Tucker decomposition [151,152]. Non-negative Tucker decomposition has been used for modeling EEG tensors [24,83], performing better than NMF (non-negative matrix factorization) in some circumstances.

4.2.5. HOSVD

Higher-order singular value decomposition (HOSVD) is a generalization of SVD for higher-order tensors. HOSVD can be viewed as a special case of the Tucker3 model in which ALS optimization is not performed; rather, the tensor is unfolded across different modes and then regular SVD is applied on the unfolded matrices. Therefore, HOSVD does not provide the best approximation of a tensor; it is rather used as an initialization step in Tucker3 for reducing the number of iterations in the ALS procedure [13].

4.3. ICA-based

Independent component analysis (ICA) is a popular method for decomposing a multivariate signal into additive subcomponents. The basic assumption in ICA is that the subcomponents are independent, non-Gaussian signals. Extensions of ICA for tensors are available for Tucker1 (MPCA) [45,132,169], Tucker3 [104], and PARAFAC [85]. All these methods except the last one are applied for anomaly detection.

4.4. DEDICOM-based

DEDICOM (DEcomposition into DIrectional COMponents) [145,155] is a generalization of PARAFAC2 for discovering asymmetric relationships between two modes that refer to the same type of object (e.g. transactional data). This model has been found to be effective in temporal analysis of social networks [69,156]. Therefore, it can be used for event detection goals in similar scenarios.

4.5. Bayesian methods

Traditional tensor decompositions are unable to handle issues such as missing values, outliers, noise and different data types. Recently, probabilistic methods have started to be taken into consideration due to their flexibility and less restrictive assumptions. They have been successfully applied to anomaly detection problems [71,75,92,158], and the number of their applications is expected to increase in the near future, especially since the majority of these approaches can estimate the tensor rank during the decomposition process.

Bayesian approaches can be divided into three categories based on the statistical inference used. The first group is based on the expectation maximization (EM) algorithm, including pTucker [157], Exponential Family Tensor Factorization (ETF) [158] and Infinite Tucker (InfTucker) [75]. The second group exploits maximum a posteriori (MAP) estimation, such as Automatic Relevance Determination (ARD) [159] and Fully Bayesian CP Factorization (FBCP) [160]. Finally, the third category uses Gibbs sampling as an inference engine. Examples are Multi-HD [161], Bayesian tensor analysis (BTA) [162], Bayesian Probabilistic Tensor Factorization (BPTF) [163], TriMine [71], multiplicative gamma process based CP decomposition (MGP-CP) [164] and sp-PARAFAC [165].

4.6. Locality preserving based methods

Tensor decomposition methods such as Tucker and PARAFAC do not consider the intrinsic local geometric structure of tensors. A recent group of techniques has been developed for dealing with this problem on the basis of locality preserving projections (LPP). It has been shown in [40,170] that LPP-based approaches have better performance than conventional PCA-based methods, which preserve only the global Euclidean structure. LPP-based approaches are more attractive when two dimensions of tensors are in a pairwise relationship (e.g. image data).

The most popular method in this family is Tensor Locality Preserving Projection (TLPP) [166,167], which is applied to detection problems [40,58]. A more sophisticated version of TLPP, called Tensor Global-Local Preserving Projections (TGLPP), has been proposed very recently and applied to the fault detection problem in batch processes [168]. It is able to capture both global and local structures of tensors simultaneously.

4.7. Tensor rank estimation

The quality of the tensor model has a direct relationship with correct model selection. Although estimation of tensor rank is an NP-hard problem [185], in the majority of cases an optimal low-rank approximation is desirable. In the majority of works discussed in this survey, it is assumed that the number of components is known in advance via knowledge of the underlying phenomena. However, this might not be the case in many applications. Some approaches have been developed for estimating the optimal number of components for both tensor decomposition approaches. Some of these approaches are listed in the subsections below (see Table 5 for a summary).

4.7.1. Cumulative sum of the percentage of eigenvalues or explained variance

This is the most basic method for choosing the number of components. It is mostly used for MPCA (Tucker1) models. The number of principal components is chosen based on the cumulative percentage of eigenvalues or the cumulative percentage of the explained variance. If the cumulative percentage of the first k components is over a threshold (e.g. 75%), k is selected as the adequate number of components. For instance, [43] uses the eigenvalue criterion and [33,47,171,172] use cumulative variance for anomaly detection in process batch tensors.
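The cumulative-percentage rule can be illustrated with a few lines of NumPy. The function name `n_components_by_cumulative_variance` and the example eigenvalues are hypothetical, and treating "over a threshold" as "greater than or equal to" is an assumption of this sketch.

```python
import numpy as np

def n_components_by_cumulative_variance(eigenvalues, threshold=0.75):
    """Smallest k whose leading k eigenvalues reach `threshold` of the total."""
    eig = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    cumulative = np.cumsum(eig) / eig.sum()
    reached = np.flatnonzero(cumulative >= threshold)
    # Fall back to all components if rounding keeps the last ratio below threshold.
    return int(reached[0] + 1) if reached.size else int(eig.size)

# Cumulative shares are 0.5, 0.8, 0.9, 0.95, 1.0, so two components reach 75%.
k = n_components_by_cumulative_variance([5.0, 3.0, 1.0, 0.5, 0.5], threshold=0.75)
```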

Sometimes, instead of a threshold cut point, the broken stick rule [173] is used. This approach assumes that the percentage of explained variance (or eigenvalues) of random data, when divided randomly amongst p components, follows a broken-stick distribution

$$G_k = \frac{1}{p} \sum_{i=k}^{p} \frac{1}{i}.$$

Therefore, the k-th principal component is valuable if its value is greater than $G_k$ (i.e. a random PC). This rule is used for model order estimation of Tucker1 for anomaly detection [33,134].
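The broken stick rule can be sketched as follows. The function names are hypothetical, and stopping at the first component that fails the comparison (rather than testing every component independently) is an assumption of this example.

```python
import numpy as np

def broken_stick(p):
    """Expected shares G_k = (1/p) * sum_{i=k}^{p} 1/i for k = 1..p."""
    inverse = 1.0 / np.arange(1, p + 1)
    # Reverse cumulative sum gives the tail sums over i = k..p.
    return np.cumsum(inverse[::-1])[::-1] / p

def n_components_by_broken_stick(eigenvalues):
    """Count leading components whose variance share exceeds the random share."""
    eig = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    shares = eig / eig.sum()
    keep = 0
    for share, expected in zip(shares, broken_stick(eig.size)):
        if share <= expected:        # assumption: stop at the first failure
            break
        keep += 1
    return keep

# Shares 0.60, 0.25, 0.10, 0.05 against G = 0.521, 0.271, 0.146, 0.063.
kept = n_components_by_broken_stick([6.0, 2.5, 1.0, 0.5])
```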


Table 5

Methods for tensor rank estimation.

| Method | Common use | Fast | Auto | Application to anomaly detection |
|---|---|---|---|---|
| Cumulative sum of percentage of eigenvalues [43] | Tucker1 | No | Yes | [43] |
| Cumulative sum of explained variance [33] | Tucker1 | No | Yes | [33,47,171,172] |
| Broken stick rule [173] | Tucker1 | No | Yes | [33,134] |
| Cross-validation [174] | Tucker1/Tucker3/PARAFAC | No | Yes | [33,35,45,135,175] |
| CORCONDIA [176] | PARAFAC | No | Yes | [3,18,22,26,49,77,154] |
| DIFFIT [177] | Tucker3 | No | Yes | [83,84] |
| FastDIFFIT [178] | Tucker3 | Yes | Yes | [95] |
| Multiway scree plot [179] | Tucker3 | No | No | [2,37,49,51,53,72] |
| Split-half analysis [180] | PARAFAC | No | Yes | [3,49] |
| Maximum block improvement [140] | Tucker3 | Yes+ | Yes | [95] |
| Convex hull [181] | Generic | No | Yes | Not yet applied for anomaly detection but used for tensor rank estimation [159] |
| Akaike's information criterion (AIC) [182] | Generic | No | Yes | Not yet applied for anomaly detection but used for tensor rank estimation [159] |
| Bayesian information criterion (BIC) [183] | Generic | No | Yes | Not yet applied for anomaly detection but used for tensor rank estimation [159] |
| Automatic relevance determination (ARD) [159] | Generic | Yes | Yes | Not yet applied for anomaly detection but used for tensor rank estimation [159] |
| Genetic algorithm [184] | Tucker3 | No | Yes | Used for noise removal [184] |

4.7.2. Cross-validation

A popular method for finding the adequate model order in component analysis is cross-validation [174]. This technique is applied to the fault detection problem in [33,35,45,135] for estimating the number of components in the MPCA model, and its extension to Tucker3 and PARAFAC models is presented in [175]. The basic idea of cross-validation is to leave out a single data element [175], a slice [186] or a random half of a slice [187] at a time, perform the tensor decomposition and then compute the Predictive Residual Error Sum of Squares

$$\mathrm{PRESS} = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \left( \tilde{X}^{PQR}_{ijk} - X_{ijk} \right)^{2}$$

for the elements not included in the model building. Finally, the PRESS values for each combination of components (p, q, r) are summed over all eliminated parts to compute $\mathrm{PRESS}_{pqr}$. The (p, q, r) that gives the minimum PRESS is considered a good model dimension. More sophisticated cross-validation approaches have been developed based on the W-statistic [175], which uses an F-test strategy to determine whether an additional component is worth adding or not.

4.7.3. CORCONDIA

Core consistency test (also known as CORCONDIA) [176] is a heuristic method used for the determination of the number of components in the PARAFAC model. It is widely applied in anomaly detection from tensors [3,18,22,26,49,77,154]. Assuming P as the number of components in the PARAFAC model, CORCONDIA checks the superdiagonality of a Tucker3 model with a core size of (P, P, P). If all elements in the core tensor except those with the same indices (i = j = k) become zero, it concludes that the PARAFAC model fits perfectly. The procedure is as follows. First, the core consistency criterion is defined as the percentage similarity of the Tucker3 core to the superdiagonal array T of ones. PARAFAC is then fitted for a series of models from P = 1 to F, computing the core consistency for each of these models. The last model in this series whose corresponding Tucker3 core is similar to T is considered to have the adequate number of components.
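The core consistency value for one candidate model can be sketched as below, given PARAFAC factor matrices that have already been fitted by some other routine (the fitting itself is not shown). The formula used, 100 · (1 − Σ(g − t)² / P), is the commonly used definition and is stated here as an assumption, since the text above describes the criterion only verbally. The function name `core_consistency` is hypothetical, and the factor matrices are assumed to have full column rank with the component weights absorbed into them.

```python
import numpy as np

def core_consistency(tensor, A, B, C):
    """Core consistency (in percent) of a three-way PARAFAC model with factors A, B, C."""
    P = A.shape[1]
    # Least-squares Tucker3 core for the fixed PARAFAC factors (via pseudo-inverses).
    core = np.einsum('ijk,pi,qj,rk->pqr', tensor,
                     np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    # Superdiagonal array T of ones.
    target = np.zeros((P, P, P))
    idx = np.arange(P)
    target[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((core - target) ** 2) / P)

# An exactly trilinear rank-2 tensor built from known factors.
rng = np.random.default_rng(1)
A, B, C = rng.standard_normal((4, 2)), rng.standard_normal((5, 2)), rng.standard_normal((6, 2))
X = np.einsum('ip,jp,kp->ijk', A, B, C)
cc = core_consistency(X, A, B, C)
```

For an exactly trilinear tensor and its true factors the value is close to 100; values far below that suggest the candidate number of components is too large.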

4.7.4. DIFFIT

DIFFIT (Difference in Fit) [177] is a residual-based heuristic procedure used for the estimation of the number of components in a Tucker model. It computes the Tucker decomposition for all sensible combinations of components (i, j, k) and computes the model fit for each potential model as

$$\mathrm{Fit}(m) = 1 - \frac{\| \mathcal{X} - \tilde{\mathcal{X}} \|_F}{\| \mathcal{X} \|_F},$$

where $\| \cdot \|_F$ is the Frobenius norm and $m = i + j + k$. Then the DIF for the m-th model is computed as $\mathrm{DIF}(m) = \mathrm{Fit}(m) - \mathrm{Fit}(m-1)$ and accordingly DIFFIT is computed as $\mathrm{DIFFIT}(m) = \mathrm{DIF}(m) / \mathrm{DIF}(m+1)$. The model with the largest DIFFIT value is chosen as the most adequate model. DIFFIT has been used for estimating the tensor model dimension in EEG tensors [83,84]. DIFFIT requires computing the Tucker fit for all combinations of components, which is very time-consuming. A faster version of DIFFIT (so-called Fast-DIFFIT), which requires performing only a single computation of the Tucker decomposition, is proposed in [178]. The authors of [178] provide some evidence that this approach can be as sufficient as the exact solution. Fast-DIFFIT is tested for anomaly detection purposes in [95].
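The DIF and DIFFIT quantities can be computed from a list of fit values as sketched below. The sketch assumes that one fit value per total complexity m is already available (the Tucker fitting is not shown) and that the fits increase strictly with m, so that no division by zero occurs. The function name `diffit_select` and the example fit values are hypothetical.

```python
import numpy as np

def diffit_select(fits, m_start):
    """Select the complexity m with the largest DIFFIT value.

    fits[t] is Fit(m_start + t) for consecutive values of m.
    Assumption: fits are strictly increasing.
    """
    fits = np.asarray(fits, dtype=float)
    dif = np.diff(fits)              # dif[t]    = DIF(m_start + t + 1)
    diffit = dif[:-1] / dif[1:]      # diffit[t] = DIFFIT(m_start + t + 1)
    best_m = m_start + 1 + int(np.argmax(diffit))
    return best_m, diffit

# Fit values for m = 3, ..., 7: the gain levels off after m = 5.
best_m, diffit_values = diffit_select([0.40, 0.70, 0.90, 0.92, 0.93], m_start=3)
```

DIFFIT is undefined for the smallest and the largest m in the list, since each value needs both the preceding and the following fit.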

4.7.5. Multiway scree plot

The multiway scree plot [188] projects Tucker3 models onto the convex hull. The most adequate model is the one on the convex hull with less complexity and better fit. This method is used in [2,37,49,51,53,72] for tensor-based monitoring and anomaly detection.

4.7.6. Split-half analysis

This technique was primarily introduced by Harshman and De Sarbo [180] for PARAFAC. The procedure splits the tensor into two (or more) parts, and a model with the same number of components is built for each part. The assumption of this method is that if the model is valid, the models on the separate parts should remain stable. A criterion called split-half stability coefficients is defined and if its value is lower than a threshold (e.g. 0.1), the model is considered stable. However, the main requirement for this method is that the tensor must be splittable [188], which is restrictive for non-stochastic systems. A limited number of works such as [3,49] use this technique to ascertain the number of components in tensors with application to anomaly detection. An extension of this method was later proposed by Kiers and Mechelen [189].

4.7.7. Other methods

Some other approaches proposed for tensor rank estimation, although not yet used in anomaly detection applications, can nonetheless be very useful to the area. These methods include convex hull [181], Akaike's information criterion (AIC) [182], Bayesian information criterion (BIC) [183] and Automatic relevance determination (ARD) [159]. These four approaches are implemented for multiway models and compared in [159], which concludes that ARD is superior to the other three. Bayesian-based tensor decompositions may also be a good solution for tensor rank estimation since they automatically find the tensor rank in their inference procedure [160,164]. In [184] a different approach named GAHNTD is proposed based on the genetic algorithm for finding the optimal Tucker lower rank, but no comparison is performed against other known approaches. Brockmeier et al. [190] proposed a greedy approach that builds the tensor
