L ucia M. A. Drummond
UniversidadeFederalFluminense
InstitutodeComputac~ao
RuaPassoda Patria,156
24210-240Niteroi- RJ,Brazil
Valmir C.Barbosa
UniversidadeFederaldo Rio deJaneiro
ProgramadeEngenhariadeSistemaseComputac~ao,COPPE
CaixaPostal 68511
21945-970Rio deJaneiro- RJ, Brazil
Abstract
Matrixclocksareageneralizationofthenotionofvectorclocksthatallowsthelocalrepresentation
of causal precedenceto reach intoan asynchronousdistributed computation'spast with depth x,
where x 1 is an integer. Maintaining matrix clocks correctly in a system of n nodes requires
thateverymessagebeaccompaniedbyO(n x
) numbers, whichre ectsan exponentialdependency
of thecomplexityof matrixclocksupon thedesireddepthx. Weintroducea noveltype ofmatrix
clock,one thatrequiresonlynx numbers tobe attachedtoeachmessagewhilemaintainingwhat
for many applications may be the most signicant portion of the information that the original
matrixclockcarries. Inordertoillustratethenew clock'sapplicability,we demonstrateitsusein
themonitoringof certainresource-sharingcomputations.
Keywords: Vector clocks,matrix clocks.
1. Introduction and background
WeconsideranundirectedgraphGonnnodes. EachnodeinGstandsforacomputationalprocess
andundirectededgesinGrepresentthepossibilitiesforbidirectionalpoint-to-pointcommunication
between pairs of processes. A distributed computation carried out by the distributed system
representedbyG canbe viewedasa setof events occurringatthe various nodes. An event isthe
sendingof amessagebya nodetoanothernodetowhichitisdirectlyconnectedinG(aneighbor
particularcomputationathand, andare leftunspecied).
Thestandardframework foranalyzingsuchasystemisthepartialorder, oftendenotedby,
that formalizestheusual \happened-before"notionof distributedcomputing [1, 10]. Thispartial
order is the transitive closure of the more elementary relation to which the ordered pair (v;v 0
)
of events belongs if eitherv and v 0
are consecutive events at a same nodein G or v and v 0
are,
respectively,thesending andreceivingofa messagebetween neighborsinG.
Atnodei,we letlocaltimebe assessedbythenumbert
i
ofeventsthatalreadyoccurred. We
interchangeablyadopt thenotationst
i =time i (v) andv=event i (t i
)toindicatethat v isthet
i th
event occurringat nodei, provided t
i
1. Anotherimportant partialorder on the set of events
is therelationthat givesthepredecessorat nodej 6=iof an event v 0
occurringatnodei. We say
thatan event vis sucha predecessor,denotedbyv=pred
j (v
0
),ifv istheevent occurringatnode
j such that v v 0
for which time
j
(v) is greatest. If no such v exists, then pred
j (v 0 ) is undened andtime j pred j (v 0 ) is assumedtobezero.
Therelationpred
j
allowsthedenitionofvectorclocks[8,9,12],asfollows. Thevectorclock
of nodeiattime t
i
(thatis,followingtheoccurrenceof t
i
eventsatnodei),denotedbyV i
(t
i ),is
a vectorwhose jthcomponent,for1jn, iseitherequalto0 (if t
i =0)orgivenby V i j (t i )= ( t i ; ifj=i; time j pred j event i (t i ) ; ifj6=i (1) (if t i 1). In otherwords, V i j (t i
) is eitherthe current time at nodei, ifj =i, or is the time at
nodej that resultsfrom theoccurrence atthat nodeof the predecessorof the t
i
thevent of node
i, otherwise. If no such event exists(i.e., t
i =0), thenV i j (t i )=0.
Vectorclocksevolvefollowingtwo simplerules:
Upon sending a message to node k, node i attaches V i
(t
i
) to the message, where t
i is
assumedtoalreadyaccount forthe messagethatis beingsent.
Upon receiving a message from node k with attached vector clock V k , node i sets V i i (t i ) to t i
(which is assumed to already re ect the reception of the message) and V i j (t i ) to max V i j (t i 1);V k j ,forj6=i.
It is a simplemattertoverifythat theserules do indeedmaintainvectorclocksconsistentlywith
theirdenition[8, 9,12]. Under theserulesorvariationsthereof,vectorclockshave provenuseful
ina varietyof distributedalgorithmstodetectsometypes ofglobal predicates[7, 9].
One naturalgeneralizationof the notion of vector clocks is the notion of matrix clocks [14,
16]. Foran integerx1,thex-dimensionalmatrix clockof nodei attimet
i ,denotedbyM i (t i ), has O(n x ) components. For 1 j 1 ;:::;j x n, component M i j1;:::;jx (t i ) of M i (t i ) is only dened for i=j 1 == j x andfori6=j 1 6=6= j x . Asin the denitionof V i (t i ), M i j1;:::;jx (t i )=0 if t i =0. Fort i
1,on the otherhand,wehave
M i j 1 ;:::;j x (t i )= ( t i ; ifi=j 1 ==j x ; time jx pred j x :::pred j 1 event i (t i ) ; ifi6=j 1 6=6=j x , (2)
1 x 1 i
node i, then the predecessor at node j
2
of that predecessor, and so on through node j
x
, whose
local time after the occurrence of the last predecessorin the chain is assignedto M i j 1 ;:::;j x (t i ). It
is straightforwardto see that, for x= 1, thisdenition is equivalentto the denition of a vector
clockin(1). Similarly,themaintenanceof matrix clocks followsrules entirely analogoustothose
used tomaintainvectorclocks[9].
Whilethejthcomponentofthevectorclockfollowingtheoccurrenceofeventv 0
atnodei6=j
gives the time resulting atnode j from the occurrence of pred
j (v
0
), the analogous interpretation
that exists for matrix clocks requires the introduction of additional notation. Specically, the
denition of a set of events encompassing the possible x-fold compositions of the relation pred
j
with itself, denoted by Pred (x)
j
, is necessary. If an event v occurs at node j, then we say that
v2Pred (x) j (v 0 ) foran event v 0
occurringat nodeiifone ofthe followingholds:
x=1,j6=i,andv=pred
j (v
0
).
x>1andthereexists k6=isuch that anevent v occursatnodek forwhichv=pred
k (v 0 ) andv2Pred (x 1) j (v).
Note that this denition requires j 6= k, in addition to k 6= i, for v 2 Pred (x) j (v 0 ) to hold when x=2. For x > 1 and j 1 6= 6= j x
= j, this denition allows the following interpretation of
entry j
1 ;:::;j
x
of the matrix clock that follows the occurrence of event v 0
at node i, that is, of
M i j 1 ;:::;j x 1 ;j time i (v 0 )
. It givesthetimeresultingatnodej from theoccurrenceofan event that
isintheset Pred (x)
j (v
0
). Ofcourse,thenumberof possibilitiesforsuchan event isO(n x 1
)inthe
worstcase, owingtotheseveralpossiblecombinationsof j
1 ;:::;j
x 1 .
Interestingly, applications that do require the full capabilities of matrix clocks may be still
unknown. Infact,itseemsasimplemattertoarguethataslightlymoresophisticateduseofvector
clocksor simply the useof two-dimensional matrixclocks suÆcesto tackle someof the problems
that have been oered as possible applicationsof multidimensional matrix clocks [9, 13]. What
we do in thispaper is todemonstrate howthe distributed monitoringof certain resource-sharing
computationscanbenetfromtheuseofmatrixclocksandthat,atleastforsuchcomputations,it
ispossibletoemploymuchlesscomplexmatrixclocks(thatis,matrixclockswithmanyfewer
com-ponents),whichnonethelessretainthe abilitytoreachintothe computation'spast with arbitrary
depth.
The key to this reduction in complexity is the use of one single event in place of the set
Pred (x)
j (v
0
). As we argueinSection3, followinga briefdiscussion of theresource-sharingcontext
inSection2, thissimplicationleadstomatrix clocks of sizenx, thereforeconsiderablyless
com-plex than the O(n x
)-size originalmatrix clocks. Concluding remarks follow in Section4, after a
Theresource-sharingcomputationweconsiderisoneof theclassicalsolutionstotheparadigmatic
DiningPhilosophersProblem(DPP)[6]ingeneralizedform. Inthiscase,G'sedgesetisconstructed
from a given set of resources and from subsets of that set, one for each node, indicating which
resourcescanbe everneededbythat node. Thisconstructionplacesan edgebetweennodes iand
jifthesetsofresourcesevertobeneededbyiandjhaveanonemptyintersection. Noticethatthis
constructionisconsonantwiththeinterpretationofedgesasbidirectionalcommunicationchannels,
becausewhatit doesis todeploy edgesbetweenevery pairof nodesthat mayever competefor a
sameresourceandmust thereforebeabletocommunicatewith each othertoresolve con icts.
InDPP,thecomputationcarriedoutbyanodemakes itcycleendlesslythroughthreestates,
which are identiedwith the conditionsof being idle, beingin the process of acquiringexclusive
accesstotheresourcesitneeds,andusingthoseresourcesfora niteperiodof time. Whileinthe
idlestate,the nodestartsacquiringexclusiveaccesstoresourceswhentheneedarisestocompute
on sharedresources. It isarequirementof DPPthatthenodemustacquireexclusive accesstoall
the resourcesit shares with allits neighbors,so it suÆcesfor the nodeto acquirea token object
it shares with each of its neighbors (the \fork," as it is called), each object representing all the
resourcesit shares with that particular neighbor. When in possession of all forks, the node may
startusingthe sharedresources[1, 5].
The process of collecting forks from neighbors follows a protocol based on the sending of
requestmessages bythenodethatneedsthe forksandthesending ofthe forksthemselvesbythe
nodesthathavethem. Morethanoneprotocolexists,eachimplementingadierentruletoensure
the absence of deadlocks andlockouts during the computation. The solution we consider inthis
sectionisbased on relative prioritiesassignedto nodes. Another prominent solutionis alsobased
on theassignment ofrelativepriorities,buttothe resourcesinsteadof tothenodes[11,15].
The priority scheme of interest to us is based on the graph-theoretic concept of an acyclic
orientationofG,whichisanassignmentofdirectionstotheedgesofGinsuchawaythatdirected
cycles are not formed. Such an acyclic orientation is then a partial order of the nodes of G,
and is as such suitable to indicate,given a pair of neighbors, which of the two has priorityover
the other. Most of the details of this priorityschemeare not relevant to our present discussion,
but before stating its essentials we do mention that the lockout-freedom requirement leads the
acyclic orientation to be changed dynamically (so that relative prioritiesare never xed), which
in turn leads to a rich dynamics in the set of all the acyclic orientations of G and to important
concurrency-relatedissues[2{4].
Onceapriorityschemeisavailableovertheset ofnodes,whatmatterstousishowit isused
inthefork-collectingprotocol. Whenarequestforforkarrivesatnodej fromnodei,jsendsithe
fork theyshareifj iseitheridleorisalsocollectingforks butdoesnothave priorityoveri. Ifj is
alsocollecting forks and haspriorityover i, orif j is usingshared resources, then the sending of
the fork toi ispostponedto untilj has nishedusing theshared resources. Note that two types
andultimatelytosenditheforkisintheworstcasen 1[1,3,5]. Thereasonforthisissimple: j
iswaitingfor afork from anothernode,whichmayinturnbe waitingforafork from yetanother
node,andsoon. Becausethepriorityschemeisbuilton acyclic orientationsofG, such achain of
waitsdoes necessarilyendandisn 1 nodeslong intheworstcase.
Whethersuchlongwaitsoccurornotisofcoursedependentuponthedetailsofeachexecution
oftheresource-sharingcomputation. Butiftheydooccur,onepossibilityforreducingtheaverage
wait is to increasethe availabilityof certaincritical resourcesso that G becomesless dense[1, 3,
4]. Perhaps another possibility would be to ne-tune some of the characteristics of each node's
participationin the overall computation,such as the durationof its idle period,for example. In
anyevent,the abilitytolocallydetect longwaits(aglobal property,since itrelates tothenotion
of timeinfully asynchronousdistributed systems[1]) iscrucial.
ToseehowthisrelatestotheformalismofSection1,supposeourresource-sharingcomputation
consists of the exchange of fork-bearing messages only (that is, request messages and all other
messages involved in the computation, such as those used to update acyclic orientations, are
ignored). For distinctnodes i and j, and for an event v 0
occurring atnode i at the reception of
a fork, the set Pred (x)
j (v
0
) for x 1 is either empty or only contains events that correspondto
thesendingof forksbynodej. Now supposewesystematicallyinvestigate thesetsPred (x)
j (v
0
)for
every appropriatej by lettingx increase from 1through n 1. Supposealsothat, foreachj,we
record the rst x that is found such that Pred (x)
j (v
0
) contains the sending of a fork as response
to the reception of a request message without any wait for fork collection on the part of j. The
greatest such value to be recorded, say x
, has special signicance: it means that the eventual
receptionof a fork bynodeithroughv 0
is theresultofan x
-longchain ofwaits.
The matrix clock that we introduce in Section3 is capable of conveying this informationto
nodeisuccinctly,solongasthereexistsenoughredundancyinthesetsPred (x)
j (v
0
) thatattaching
only nX integers toforks suÆces, where X is a threshold inthe interval[1;n 2] beyondwhich
wait chains are consideredtoo long. It so happensthat such redundancy clearlyexists: theonly
eventsinPred (x)
j (v
0
) thatmatterarethosecorrespondingtothesendingofforkswithoutanywait
forfork collection. Detectinganyoneof themsuÆces, sowe mayaswellsettleforthelatest,that
is,onesingleeventfrom the wholeset. Wereturn tothisinSection4.
3. A simpler matrix clock
For x 1, the new matrix clock we introduce is an x n matrix. For node i at time t
i (i.e.,
following the occurrence of t
i
events at node i), it is denoted by C i (t i ). For 1 y x and 1 j n, component C i y;j (t i ) of C i (t i ) is dened as follows. If t i =0, thenC i y;j (t i ) =0, asfor
vectorclocksandthe matrixclocksof Section1. Ift
i 1, thenwe have C i y;j (t i )= ( t i ; ify=1andj =i; max i6=j16=6=jy=j n time jy pred j y :::pred j 1 event i (t i ) o ; ify>1orj 6=i. (3)
(1). Thus,therstrowofC i
(t
i
) isthevectorclockV i
(t
i
). Fory>1,thedenitionin(3)implies,
accordingtotheinterpretationof matrixclocksthat follows ourdenitionin(2), that
C i y;j (t i )=max time j (v) v2Pred (y) j (v 0 ) ; (4) where v 0 is thet i
thevent occurringat nodei. Whatthismeansis that,of allthe O(n y 1
) events
thatmayexistinPred (y)
j (v
0
),onlyone(theonetohave occurredlatestatnodej) makesittothe
matrix clock C i
(t
i
). Note also that (3) implies the equality in (4) for y = 1 as well, so long as
j6=i. In thiscase,Pred (y) j (v 0 ) isthe singleton pred j (v 0 ) .
Now,weknowfromthedenitionofPred (y)
j
that,fory>1,v2Pred (y)
j (v
0
)ifandonlyifthere
exists k6=isuchthat an event v occurs atnodek forwhich v=pred
k (v 0 ) andv 2Pred (y 1) j (v).
Thus,ifweassumethatt
k
=time
k
(v)andconsiderthematrixclocksresultingfromtheoccurrence
of v andof v 0 ,thatis,C k (t k ) andC i (t i
),respectively,thenit is aconsequenceof (4)that
C i y;j (t i )C k y 1;j (t k );
where the concomitant occurrence of j = k and y = 2 is automatically precluded. Letting
K i
(t
i
;y;j) be theset comprisingevery appropriatek,we then obtain
C i y;j (t i )= max k2K i (t i ;y;j) C k y 1;j (t k ); ifK i (t i ;y;j) 6=;; 0; ifK i (t i ;y;j) =; (5) fory>1.
By(5),wearethenleftwiththefollowingtworulesfortheevolutionofournewmatrixclocks:
Upon sending a message to node k, node i attaches C i
(t
i
) to the message, where t
i is
assumedtoalreadyaccount forthe messagethatis beingsent.
Upon receivinga message from nodek withattachedmatrixclockC k ,nodei setsC i 1;i (t i ) to t i
(which is assumed to already re ect the reception of the message) and C i y;j (t i ) to max C i y;j (t i 1);C k y 1;j
, for1<y x or j6=i, provided y >2 orj 6=k. Fory =2 and
j =k,C i
y;j (t
i
) retains thevalueof C i
y;j (t
i 1).
Accordingtotheserules,every messagecarriesan attachment ofnx integers.
4. Discussion and concluding remarks
We are then in position to return to the problem posed in Section 2, namely the problem of
monitoringexecutionsof the solutiontoDPP thatemploys apartial orderon G'sset of nodesto
establish priorities. As we discussed in that section, the goalis to allow nodes to detect locally,
uponreceivingafork,whetherthedeliveryofthatforkistheresultofachainofforkdeliveriesthat
startedtoofarbackinthepast. In theaÆrmative case,thewait sincethefork wasrequestedwill
have beentoolong, intermsof the usualnotionsof time complexityinasynchronousdistributed
Morespecically,ifv istheeventcorrespondingtothereceptionof aforkatnodei,thenthe
goalistobeabletodetecttheexistenceofaneventinPred (y)
j (v
0
)thatcorrespondstothesending
ofa forkbynodej eitherimmediatelyuponthereceptionof therequestforthatfork or,ifnodej
wasusingsharedresourceswhentherequestarrived,immediatelyuponnishing. Here1y X
andj isany node,provided y =1 andj =ido notoccur inconjunction. The valueof X issuch
that1Xn 2,andischosenasa thresholdtore ect themaximumallowablechainofwaits.
Asa consequence,the setsPred (y)
j (v
0
) mustincludefork-relatedeventsonly.
ThenewmatrixclocksintroducedinSection3canbeusedforthisdetectionwithonlyminimal
adaptation. The key ingredientsare:
The onlymessages sent by nodei to be taggedwith the matrix clockC i
are forks. If the
sending ofafork bynodeidoesnotdependon furtherforkcollectionbyi,thenC i
isreset
tozerobeforeitis attachedto thefork. Matrix clocks areXn, andallfurther handling
of themfollows therules giveninSection3.
Uponreceivingaforkwithattachedmatrixclock,andhavingupdatedC i
accordingly,node
i looks for componentsof C i
that containthe valuezero. Ifnone exists,then a wait chain
that is consideredtoo longhas beenfound.
Thisstrategyre ectsthegeneralmethodofSection3whenappliedtoeventsthatrelatetothe ow
offorksonly. Whenevertherequestforaforkisreceivedandtheforkcanbesentwithouttheneed
foranyforkstobereceivedbythenodeinquestion,zeroesgetinsertedintothe matrixclockand
are sent alongwith the fork. The sending of this fork mayunleash the sending of forks by other
nodes in a chain of events, and alongthe way those zeroes may get replacedby greaterintegers,
re ecting the increasinglength of thewait chain rootedat the originalnode. The reception of a
forkwhosematrixclockhasno zeroesisthen anindicationthatsuchreceptionispartofchainsof
lengthatleast X.
In summary, we have in this paper introduced a novel notion of matrix clocks. Similarlyto
thematrix clocks originallyintroducedin[14,16] andwhosedenitionappearsin(2),ourmatrix
clockhasthepotentialof re ectingcausaldependenciesinthe owof messagesthatstretchasfar
as depth x into the past. Unlike thoseoriginal matrix clocks, however, ours increases with x as
nx,whileintheoriginalcasethegrowthisaccordingtoan O(n x
) function,that is,exponentially.
Wehaveillustratedtheapplicabilityofthenewmatrixclockswithanexamplefromtheareaof
resource-sharingproblems. Whatwehavedemonstratedisameansofcollectinginformationlocally
during the resource-sharing computation so that exceedingly long global waits can be detected,
possiblyindicatingtheneed foroverallsystem re-structuring.
Acknowledgments
Theauthorsacknowledgepartialsupportfrom CNPq, CAPES,thePRONEX initiativeofBrazil's
1. V.C. Barbosa,An Introduction to Distributed Algorithms, The MITPress, Cambridge,MA,
1996.
2. V. C. Barbosa, An Atlas of Edge-Reversal Dynamics, Chapman & Hall/CRC, London, UK,
2000.
3. V. C. Barbosa,\Thecombinatorics of resource sharing," in R. Corr^eaet alii (Eds.), Models
forParallel and Distributed Computation: Theory, Algorithmic Techniques and Applications,
KluwerAcademic Publishers,Dordrecht,The Netherlands, toappear.
4. V. C. Barbosaand E. Gafni,\Concurrency in heavilyloaded neighborhood-constrained
sys-tems,"ACM Trans. on Programming Languages and Systems 11(1989),562{584.
5. K.M.ChandyandJ.Misra,\Thedrinkingphilosophersproblem,"ACM Trans. onProgr
am-ming Languages and Systems 6(1984),632{646.
6. E. W. Dijkstra, \Hierarchical ordering of sequential processes," Acta Informatica 1 (1971),
115{138.
7. L.M.A.DrummondandV.C.Barbosa,\Distributedbreakpointdetectioninmessge-passing
programs,"J. ofParallel and Distributed Computing 39 (1996),153{167.
8. L. J. Fidge, \Timestamp in message passing systemsthat preserves partial ordering," Proc.
ofthe 11thAustralian Computing Conference,56{66,1988.
9. V. K. Garg, Principles of Distributed Systems, Kluwer Academic Publishers, Boston, MA,
1996.
10. L.Lamport,\Time,clocks,andtheorderingofeventsinadistributedsystem,"Comm. ofthe
ACM 21 (1978),558{565.
11. N. A. Lynch, \Upper bounds for static resource allocation in a distributed system," J. of
Computer andSystem Sciences 23 (1981),254{278.
12. F. Mattern, \Virtual time and global states in distributed systems," in M. Cosnard et alii
(Eds.),Parallel and Distributed Algorithms: Proc. of the Int. Workshop on Parallel and
Dis-tributed Algorithms,215{226, North-Holland,Amsterdam, The Netherlands,1989.
13. M. Raynal, \Illustrating the use of vector clocks in property detection: an example and a
counter-example," in P. Amestoy et alii (Eds.), Euro-Par'99|Parallel Processing, 806{814,
LectureNotesinComputerScience1685,Springer-Verlag,Berlin,Germany,1999.
14. S.K.SarinandL.Lynch,\Discardingobsoleteinformationinareplicateddatabasesystem,"
IEEETrans. onSoftwareEngineering SE-13 (1987),39{46.
15. J.L.WelchandN.A.Lynch,\Amodulardrinkingphilosophersalgorithm,"Distributed
problems,"Proc.ofthe 3rdAnnualACMSymposiumonPrinciplesof DistributedComputing,