Publicações do PESC On Reducing the Complexity of Matrix Clocks

(1)

L ucia M. A. Drummond

UniversidadeFederalFluminense

InstitutodeComputac~ao

RuaPassoda Patria,156

24210-240Niteroi- RJ,Brazil

[email protected]

Valmir C.Barbosa

UniversidadeFederaldo Rio deJaneiro

ProgramadeEngenhariadeSistemaseComputac~ao,COPPE

CaixaPostal 68511

21945-970Rio deJaneiro- RJ, Brazil

[email protected]

Abstract

Matrixclocksareageneralizationofthenotionofvectorclocksthatallowsthelocalrepresentation

of causal precedenceto reach intoan asynchronousdistributed computation'spast with depth x,

where x 1 is an integer. Maintaining matrix clocks correctly in a system of n nodes requires

thateverymessagebeaccompaniedbyO(n x

) numbers, whichre ectsan exponentialdependency

of thecomplexityof matrixclocksupon thedesireddepthx. Weintroducea noveltype ofmatrix

clock,one thatrequiresonlynx numbers tobe attachedtoeachmessagewhilemaintainingwhat

for many applications may be the most signicant portion of the information that the original

matrixclockcarries. Inordertoillustratethenew clock'sapplicability,we demonstrateitsusein

themonitoringof certainresource-sharingcomputations.

Keywords: Vector clocks,matrix clocks.

1. Introduction and background

WeconsideranundirectedgraphGonnnodes. EachnodeinGstandsforacomputationalprocess

andundirectededgesinGrepresentthepossibilitiesforbidirectionalpoint-to-pointcommunication

between pairs of processes. A distributed computation carried out by the distributed system

representedbyG canbe viewedasa setof events occurringatthe various nodes. An event isthe

sendingof amessagebya nodetoanothernodetowhichitisdirectlyconnectedinG(aneighbor

(2)

particularcomputationathand, andare leftunspecied).

Thestandardframework foranalyzingsuchasystemisthepartialorder, oftendenotedby,

that formalizestheusual \happened-before"notionof distributedcomputing [1, 10]. Thispartial

order is the transitive closure of the more elementary relation to which the ordered pair (v;v 0

)

of events belongs if eitherv and v 0

are consecutive events at a same nodein G or v and v 0

are,

respectively,thesending andreceivingofa messagebetween neighborsinG.

Atnodei,we letlocaltimebe assessedbythenumbert

i

ofeventsthatalreadyoccurred. We

interchangeablyadopt thenotationst

i =time i (v) andv=event i (t i

)toindicatethat v isthet

i th

event occurringat nodei, provided t

i

1. Anotherimportant partialorder on the set of events

is therelationthat givesthepredecessorat nodej 6=iof an event v 0

occurringatnodei. We say

thatan event vis sucha predecessor,denotedbyv=pred

j (v

0

),ifv istheevent occurringatnode

j such that v v 0

for which time

j

(v) is greatest. If no such v exists, then pred

j (v 0 ) is undened andtime j pred j (v 0 ) is assumedtobezero.

Therelationpred

j

allowsthedenitionofvectorclocks[8,9,12],asfollows. Thevectorclock

of nodeiattime t

i

(thatis,followingtheoccurrenceof t

i

eventsatnodei),denotedbyV i

(t

i ),is

a vectorwhose jthcomponent,for1jn, iseitherequalto0 (if t

i =0)orgivenby V i j (t i )= ( t i ; ifj=i; time j pred j event i (t i ) ; ifj6=i (1) (if t i 1). In otherwords, V i j (t i

) is eitherthe current time at nodei, ifj =i, or is the time at

nodej that resultsfrom theoccurrence atthat nodeof the predecessorof the t

i

thevent of node

i, otherwise. If no such event exists(i.e., t

i =0), thenV i j (t i )=0.

Vectorclocksevolvefollowingtwo simplerules:

Upon sending a message to node k, node i attaches V i

(t

i

) to the message, where t

i is

assumedtoalreadyaccount forthe messagethatis beingsent.

Upon receiving a message from node k with attached vector clock V k , node i sets V i i (t i ) to t i

(which is assumed to already re ect the reception of the message) and V i j (t i ) to max V i j (t i 1);V k j ,forj6=i.

It is a simplemattertoverifythat theserules do indeedmaintainvectorclocksconsistentlywith

theirdenition[8, 9,12]. Under theserulesorvariationsthereof,vectorclockshave provenuseful

ina varietyof distributedalgorithmstodetectsometypes ofglobal predicates[7, 9].

One naturalgeneralizationof the notion of vector clocks is the notion of matrix clocks [14,

16]. Foran integerx1,thex-dimensionalmatrix clockof nodei attimet

i ,denotedbyM i (t i ), has O(n x ) components. For 1 j 1 ;:::;j x n, component M i j1;:::;jx (t i ) of M i (t i ) is only dened for i=j 1 == j x andfori6=j 1 6=6= j x . Asin the denitionof V i (t i ), M i j1;:::;jx (t i )=0 if t i =0. Fort i

1,on the otherhand,wehave

M i j 1 ;:::;j x (t i )= ( t i ; ifi=j 1 ==j x ; time jx pred j x :::pred j 1 event i (t i ) ; ifi6=j 1 6=6=j x , (2)

(3)

1 x 1 i

node i, then the predecessor at node j

2

of that predecessor, and so on through node j

x

, whose

local time after the occurrence of the last predecessorin the chain is assignedto M i j 1 ;:::;j x (t i ). It

is straightforwardto see that, for x= 1, thisdenition is equivalentto the denition of a vector

clockin(1). Similarly,themaintenanceof matrix clocks followsrules entirely analogoustothose

used tomaintainvectorclocks[9].

Whilethejthcomponentofthevectorclockfollowingtheoccurrenceofeventv 0

atnodei6=j

gives the time resulting atnode j from the occurrence of pred

j (v

0

), the analogous interpretation

that exists for matrix clocks requires the introduction of additional notation. Specically, the

denition of a set of events encompassing the possible x-fold compositions of the relation pred

j

with itself, denoted by Pred (x)

j

, is necessary. If an event v occurs at node j, then we say that

v2Pred (x) j (v 0 ) foran event v 0

occurringat nodeiifone ofthe followingholds:

x=1,j6=i,andv=pred

j (v

0

).

x>1andthereexists k6=isuch that anevent v occursatnodek forwhichv=pred

k (v 0 ) andv2Pred (x 1) j (v).

Note that this denition requires j 6= k, in addition to k 6= i, for v 2 Pred (x) j (v 0 ) to hold when x=2. For x > 1 and j 1 6= 6= j x

= j, this denition allows the following interpretation of

entry j

1 ;:::;j

x

of the matrix clock that follows the occurrence of event v 0

at node i, that is, of

M i j 1 ;:::;j x 1 ;j time i (v 0 )

. It givesthetimeresultingatnodej from theoccurrenceofan event that

isintheset Pred (x)

j (v

0

). Ofcourse,thenumberof possibilitiesforsuchan event isO(n x 1

)inthe

worstcase, owingtotheseveralpossiblecombinationsof j

1 ;:::;j

x 1 .

Interestingly, applications that do require the full capabilities of matrix clocks may be still

unknown. Infact,itseemsasimplemattertoarguethataslightlymoresophisticateduseofvector

clocksor simply the useof two-dimensional matrixclocks suÆcesto tackle someof the problems

that have been oered as possible applicationsof multidimensional matrix clocks [9, 13]. What

we do in thispaper is todemonstrate howthe distributed monitoringof certain resource-sharing

computationscanbenetfromtheuseofmatrixclocksandthat,atleastforsuchcomputations,it

ispossibletoemploymuchlesscomplexmatrixclocks(thatis,matrixclockswithmanyfewer

com-ponents),whichnonethelessretainthe abilitytoreachintothe computation'spast with arbitrary

depth.

The key to this reduction in complexity is the use of one single event in place of the set

Pred (x)

j (v

0

). As we argueinSection3, followinga briefdiscussion of theresource-sharingcontext

inSection2, thissimplicationleadstomatrix clocks of sizenx, thereforeconsiderablyless

com-plex than the O(n x

)-size originalmatrix clocks. Concluding remarks follow in Section4, after a

(4)

Theresource-sharingcomputationweconsiderisoneof theclassicalsolutionstotheparadigmatic

DiningPhilosophersProblem(DPP)[6]ingeneralizedform. Inthiscase,G'sedgesetisconstructed

from a given set of resources and from subsets of that set, one for each node, indicating which

resourcescanbe everneededbythat node. Thisconstructionplacesan edgebetweennodes iand

jifthesetsofresourcesevertobeneededbyiandjhaveanonemptyintersection. Noticethatthis

constructionisconsonantwiththeinterpretationofedgesasbidirectionalcommunicationchannels,

becausewhatit doesis todeploy edgesbetweenevery pairof nodesthat mayever competefor a

sameresourceandmust thereforebeabletocommunicatewith each othertoresolve con icts.

InDPP,thecomputationcarriedoutbyanodemakes itcycleendlesslythroughthreestates,

which are identiedwith the conditionsof being idle, beingin the process of acquiringexclusive

accesstotheresourcesitneeds,andusingthoseresourcesfora niteperiodof time. Whileinthe

idlestate,the nodestartsacquiringexclusiveaccesstoresourceswhentheneedarisestocompute

on sharedresources. It isarequirementof DPPthatthenodemustacquireexclusive accesstoall

the resourcesit shares with allits neighbors,so it suÆcesfor the nodeto acquirea token object

it shares with each of its neighbors (the \fork," as it is called), each object representing all the

resourcesit shares with that particular neighbor. When in possession of all forks, the node may

startusingthe sharedresources[1, 5].

The process of collecting forks from neighbors follows a protocol based on the sending of

requestmessages bythenodethatneedsthe forksandthesending ofthe forksthemselvesbythe

nodesthathavethem. Morethanoneprotocolexists,eachimplementingadierentruletoensure

the absence of deadlocks andlockouts during the computation. The solution we consider inthis

sectionisbased on relative prioritiesassignedto nodes. Another prominent solutionis alsobased

on theassignment ofrelativepriorities,buttothe resourcesinsteadof tothenodes[11,15].

The priority scheme of interest to us is based on the graph-theoretic concept of an acyclic

orientationofG,whichisanassignmentofdirectionstotheedgesofGinsuchawaythatdirected

cycles are not formed. Such an acyclic orientation is then a partial order of the nodes of G,

and is as such suitable to indicate,given a pair of neighbors, which of the two has priorityover

the other. Most of the details of this priorityschemeare not relevant to our present discussion,

but before stating its essentials we do mention that the lockout-freedom requirement leads the

acyclic orientation to be changed dynamically (so that relative prioritiesare never xed), which

in turn leads to a rich dynamics in the set of all the acyclic orientations of G and to important

concurrency-relatedissues[2{4].

Onceapriorityschemeisavailableovertheset ofnodes,whatmatterstousishowit isused

inthefork-collectingprotocol. Whenarequestforforkarrivesatnodej fromnodei,jsendsithe

fork theyshareifj iseitheridleorisalsocollectingforks butdoesnothave priorityoveri. Ifj is

alsocollecting forks and haspriorityover i, orif j is usingshared resources, then the sending of

the fork toi ispostponedto untilj has nishedusing theshared resources. Note that two types

(5)

andultimatelytosenditheforkisintheworstcasen 1[1,3,5]. Thereasonforthisissimple: j

iswaitingfor afork from anothernode,whichmayinturnbe waitingforafork from yetanother

node,andsoon. Becausethepriorityschemeisbuilton acyclic orientationsofG, such achain of

waitsdoes necessarilyendandisn 1 nodeslong intheworstcase.

Whethersuchlongwaitsoccurornotisofcoursedependentuponthedetailsofeachexecution

oftheresource-sharingcomputation. Butiftheydooccur,onepossibilityforreducingtheaverage

wait is to increasethe availabilityof certaincritical resourcesso that G becomesless dense[1, 3,

4]. Perhaps another possibility would be to ne-tune some of the characteristics of each node's

participationin the overall computation,such as the durationof its idle period,for example. In

anyevent,the abilitytolocallydetect longwaits(aglobal property,since itrelates tothenotion

of timeinfully asynchronousdistributed systems[1]) iscrucial.

ToseehowthisrelatestotheformalismofSection1,supposeourresource-sharingcomputation

consists of the exchange of fork-bearing messages only (that is, request messages and all other

messages involved in the computation, such as those used to update acyclic orientations, are

ignored). For distinctnodes i and j, and for an event v 0

occurring atnode i at the reception of

a fork, the set Pred (x)

j (v

0

) for x 1 is either empty or only contains events that correspondto

thesendingof forksbynodej. Now supposewesystematicallyinvestigate thesetsPred (x)

j (v

0

)for

every appropriatej by lettingx increase from 1through n 1. Supposealsothat, foreachj,we

record the rst x that is found such that Pred (x)

j (v

0

) contains the sending of a fork as response

to the reception of a request message without any wait for fork collection on the part of j. The

greatest such value to be recorded, say x

, has special signicance: it means that the eventual

receptionof a fork bynodeithroughv 0

is theresultofan x

-longchain ofwaits.

The matrix clock that we introduce in Section3 is capable of conveying this informationto

nodeisuccinctly,solongasthereexistsenoughredundancyinthesetsPred (x)

j (v

0

) thatattaching

only nX integers toforks suÆces, where X is a threshold inthe interval[1;n 2] beyondwhich

wait chains are consideredtoo long. It so happensthat such redundancy clearlyexists: theonly

eventsinPred (x)

j (v

0

) thatmatterarethosecorrespondingtothesendingofforkswithoutanywait

forfork collection. Detectinganyoneof themsuÆces, sowe mayaswellsettleforthelatest,that

is,onesingleeventfrom the wholeset. Wereturn tothisinSection4.

3. A simpler matrix clock

For x 1, the new matrix clock we introduce is an x n matrix. For node i at time t

i (i.e.,

following the occurrence of t

i

events at node i), it is denoted by C i (t i ). For 1 y x and 1 j n, component C i y;j (t i ) of C i (t i ) is dened as follows. If t i =0, thenC i y;j (t i ) =0, asfor

vectorclocksandthe matrixclocksof Section1. Ift

i 1, thenwe have C i y;j (t i )= ( t i ; ify=1andj =i; max i6=j16=6=jy=j n time jy pred j y :::pred j 1 event i (t i ) o ; ify>1orj 6=i. (3)

(6)

(1). Thus,therstrowofC i

(t

i

) isthevectorclockV i

(t

i

). Fory>1,thedenitionin(3)implies,

accordingtotheinterpretationof matrixclocksthat follows ourdenitionin(2), that

C i y;j (t i )=max time j (v) v2Pred (y) j (v 0 ) ; (4) where v 0 is thet i

thevent occurringat nodei. Whatthismeansis that,of allthe O(n y 1

) events

thatmayexistinPred (y)

j (v

0

),onlyone(theonetohave occurredlatestatnodej) makesittothe

matrix clock C i

(t

i

). Note also that (3) implies the equality in (4) for y = 1 as well, so long as

j6=i. In thiscase,Pred (y) j (v 0 ) isthe singleton pred j (v 0 ) .

Now,weknowfromthedenitionofPred (y)

j

that,fory>1,v2Pred (y)

j (v

0

)ifandonlyifthere

exists k6=isuchthat an event v occurs atnodek forwhich v=pred

k (v 0 ) andv 2Pred (y 1) j (v).

Thus,ifweassumethatt

k

=time

k

(v)andconsiderthematrixclocksresultingfromtheoccurrence

of v andof v 0 ,thatis,C k (t k ) andC i (t i

),respectively,thenit is aconsequenceof (4)that

C i y;j (t i )C k y 1;j (t k );

where the concomitant occurrence of j = k and y = 2 is automatically precluded. Letting

K i

(t

i

;y;j) be theset comprisingevery appropriatek,we then obtain

C i y;j (t i )= max k2K i (t i ;y;j) C k y 1;j (t k ); ifK i (t i ;y;j) 6=;; 0; ifK i (t i ;y;j) =; (5) fory>1.

By(5),wearethenleftwiththefollowingtworulesfortheevolutionofournewmatrixclocks:

Upon sending a message to node k, node i attaches C i

(t

i

) to the message, where t

i is

assumedtoalreadyaccount forthe messagethatis beingsent.

Upon receivinga message from nodek withattachedmatrixclockC k ,nodei setsC i 1;i (t i ) to t i

(which is assumed to already re ect the reception of the message) and C i y;j (t i ) to max C i y;j (t i 1);C k y 1;j

, for1<y x or j6=i, provided y >2 orj 6=k. Fory =2 and

j =k,C i

y;j (t

i

) retains thevalueof C i

y;j (t

i 1).

Accordingtotheserules,every messagecarriesan attachment ofnx integers.

4. Discussion and concluding remarks

We are then in position to return to the problem posed in Section 2, namely the problem of

monitoringexecutionsof the solutiontoDPP thatemploys apartial orderon G'sset of nodesto

establish priorities. As we discussed in that section, the goalis to allow nodes to detect locally,

uponreceivingafork,whetherthedeliveryofthatforkistheresultofachainofforkdeliveriesthat

startedtoofarbackinthepast. In theaÆrmative case,thewait sincethefork wasrequestedwill

have beentoolong, intermsof the usualnotionsof time complexityinasynchronousdistributed

(7)

Morespecically,ifv istheeventcorrespondingtothereceptionof aforkatnodei,thenthe

goalistobeabletodetecttheexistenceofaneventinPred (y)

j (v

0

)thatcorrespondstothesending

ofa forkbynodej eitherimmediatelyuponthereceptionof therequestforthatfork or,ifnodej

wasusingsharedresourceswhentherequestarrived,immediatelyuponnishing. Here1y X

andj isany node,provided y =1 andj =ido notoccur inconjunction. The valueof X issuch

that1Xn 2,andischosenasa thresholdtore ect themaximumallowablechainofwaits.

Asa consequence,the setsPred (y)

j (v

0

) mustincludefork-relatedeventsonly.

ThenewmatrixclocksintroducedinSection3canbeusedforthisdetectionwithonlyminimal

adaptation. The key ingredientsare:

The onlymessages sent by nodei to be taggedwith the matrix clockC i

are forks. If the

sending ofafork bynodeidoesnotdependon furtherforkcollectionbyi,thenC i

isreset

tozerobeforeitis attachedto thefork. Matrix clocks areXn, andallfurther handling

of themfollows therules giveninSection3.

Uponreceivingaforkwithattachedmatrixclock,andhavingupdatedC i

accordingly,node

i looks for componentsof C i

that containthe valuezero. Ifnone exists,then a wait chain

that is consideredtoo longhas beenfound.

Thisstrategyre ectsthegeneralmethodofSection3whenappliedtoeventsthatrelatetothe ow

offorksonly. Whenevertherequestforaforkisreceivedandtheforkcanbesentwithouttheneed

foranyforkstobereceivedbythenodeinquestion,zeroesgetinsertedintothe matrixclockand

are sent alongwith the fork. The sending of this fork mayunleash the sending of forks by other

nodes in a chain of events, and alongthe way those zeroes may get replacedby greaterintegers,

re ecting the increasinglength of thewait chain rootedat the originalnode. The reception of a

forkwhosematrixclockhasno zeroesisthen anindicationthatsuchreceptionispartofchainsof

lengthatleast X.

In summary, we have in this paper introduced a novel notion of matrix clocks. Similarlyto

thematrix clocks originallyintroducedin[14,16] andwhosedenitionappearsin(2),ourmatrix

clockhasthepotentialof re ectingcausaldependenciesinthe owof messagesthatstretchasfar

as depth x into the past. Unlike thoseoriginal matrix clocks, however, ours increases with x as

nx,whileintheoriginalcasethegrowthisaccordingtoan O(n x

) function,that is,exponentially.

Wehaveillustratedtheapplicabilityofthenewmatrixclockswithanexamplefromtheareaof

resource-sharingproblems. Whatwehavedemonstratedisameansofcollectinginformationlocally

during the resource-sharing computation so that exceedingly long global waits can be detected,

possiblyindicatingtheneed foroverallsystem re-structuring.

Acknowledgments

Theauthorsacknowledgepartialsupportfrom CNPq, CAPES,thePRONEX initiativeofBrazil's

(8)

1. V.C. Barbosa,An Introduction to Distributed Algorithms, The MITPress, Cambridge,MA,

1996.

2. V. C. Barbosa, An Atlas of Edge-Reversal Dynamics, Chapman & Hall/CRC, London, UK,

2000.

3. V. C. Barbosa,\Thecombinatorics of resource sharing," in R. Corr^eaet alii (Eds.), Models

forParallel and Distributed Computation: Theory, Algorithmic Techniques and Applications,

KluwerAcademic Publishers,Dordrecht,The Netherlands, toappear.

4. V. C. Barbosaand E. Gafni,\Concurrency in heavilyloaded neighborhood-constrained

sys-tems,"ACM Trans. on Programming Languages and Systems 11(1989),562{584.

5. K.M.ChandyandJ.Misra,\Thedrinkingphilosophersproblem,"ACM Trans. onProgr

am-ming Languages and Systems 6(1984),632{646.

6. E. W. Dijkstra, \Hierarchical ordering of sequential processes," Acta Informatica 1 (1971),

115{138.

7. L.M.A.DrummondandV.C.Barbosa,\Distributedbreakpointdetectioninmessge-passing

programs,"J. ofParallel and Distributed Computing 39 (1996),153{167.

8. L. J. Fidge, \Timestamp in message passing systemsthat preserves partial ordering," Proc.

ofthe 11thAustralian Computing Conference,56{66,1988.

9. V. K. Garg, Principles of Distributed Systems, Kluwer Academic Publishers, Boston, MA,

1996.

10. L.Lamport,\Time,clocks,andtheorderingofeventsinadistributedsystem,"Comm. ofthe

ACM 21 (1978),558{565.

11. N. A. Lynch, \Upper bounds for static resource allocation in a distributed system," J. of

Computer andSystem Sciences 23 (1981),254{278.

12. F. Mattern, \Virtual time and global states in distributed systems," in M. Cosnard et alii

(Eds.),Parallel and Distributed Algorithms: Proc. of the Int. Workshop on Parallel and

Dis-tributed Algorithms,215{226, North-Holland,Amsterdam, The Netherlands,1989.

13. M. Raynal, \Illustrating the use of vector clocks in property detection: an example and a

counter-example," in P. Amestoy et alii (Eds.), Euro-Par'99|Parallel Processing, 806{814,

LectureNotesinComputerScience1685,Springer-Verlag,Berlin,Germany,1999.

14. S.K.SarinandL.Lynch,\Discardingobsoleteinformationinareplicateddatabasesystem,"

IEEETrans. onSoftwareEngineering SE-13 (1987),39{46.

15. J.L.WelchandN.A.Lynch,\Amodulardrinkingphilosophersalgorithm,"Distributed

(9)

problems,"Proc.ofthe 3rdAnnualACMSymposiumonPrinciplesof DistributedComputing,