Tarjan’s reducibility test algorithm

Estimation of gas costs

4.2 Loop analysis

4.2.1 Tarjan’s reducibility test algorithm

Tarjan’s algorithm is described in pseudocode in Algorithm 2, where ND(v) is the number of descendants of v in T. In the description of the algorithm and in the subsequent analysis we drop the references to the data structure in the functions makeset, find, and union, since it will always be clear from the context. Initially, the data structure is empty; during the first for loop of Algorithm 2 the structure

$$S = \bigcup_{v \in V} \{(\{v\}, v)\}$$

is built. After that, it is modified only by the functions find and union, which are assumed to operate on the current structure in each call.

In the original description of the algorithm, Tarjan also considers tree edges in line 14. Tree edges are the edges of G that are also edges of T. Since they can also be classified as forward edges, we do not distinguish between the two.
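To make line 1 and lines 2–5 of Algorithm 2 concrete, the following Python sketch builds a DFST with a recursive depth-first search, numbers the vertices in preorder from 1, computes ND(v) and classifies every edge with the ancestor test of Lemma 2 below. It is only an illustration under our own assumptions, not the implementation used in this work: the graph is a dictionary of successor lists, every vertex is reachable from the entry node, ND(v) counts the descendants of v excluding v itself, and tree edges are reported as forward edges, as in the text. The names `dfst` and `classify_edges` are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def dfst(succ: Graph, entry: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Preorder numbers ord(v), starting at 1, and ND(v) for a DFST rooted at entry.

    Assumption: ND(v) counts the descendants of v in T excluding v itself.
    """
    order: Dict[str, int] = {}
    nd: Dict[str, int] = {}

    def visit(v: str) -> None:
        order[v] = len(order) + 1
        for w in succ.get(v, []):
            if w not in order:  # (v, w) becomes a tree edge
                visit(w)
        # The vertices numbered after v so far are exactly its descendants.
        nd[v] = len(order) - order[v]

    visit(entry)
    return order, nd


def classify_edges(succ: Graph, entry: str) -> Dict[Tuple[str, str], str]:
    """Label each edge (u, v) as 'cycle', 'forward' (tree edges included) or 'cross'."""
    order, nd = dfst(succ, entry)

    def in_subtree(a: str, b: str) -> bool:  # path a -> b in T, as in Lemma 2
        return order[a] <= order[b] <= order[a] + nd[a]

    kinds: Dict[Tuple[str, str], str] = {}
    for u in order:
        for v in succ.get(u, []):
            if in_subtree(v, u):
                kinds[(u, v)] = "cycle"  # v -> u in T: the edge returns to an ancestor
            elif in_subtree(u, v):
                kinds[(u, v)] = "forward"
            else:
                kinds[(u, v)] = "cross"
    return kinds


g = {"e": ["a", "c"], "a": ["b"], "b": ["a"], "c": ["b"]}
print(classify_edges(g, "e"))
```

In this example (b, a) is a cycle edge and (c, b) is a cross edge, since neither b nor c is an ancestor of the other in T.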

Proof of correctness

We will now prove that this algorithm is correct. We follow the structure of the paper where the algorithm was presented [17] and write the proofs in more detail. To do so, we need some auxiliary results. The following lemma allows us to check whether there is a path between two vertices in T using the function ND:

Lemma 2. For every v, w ∈ V, there exists a path v → w in T if and only if ord(v) ≤ ord(w) ≤ ord(v) + ND(v).

Proof. We start by noticing that the result holds trivially if |V| = 1. Assume now that |V| > 1. Suppose that ord(v) ≤ ord(w) ≤ ord(v) + ND(v). Then w is first visited after the first visit to v. Between the first and the last visits to v, only descendants of v were visited, and every descendant of v was visited. This means that the value of count when v is pushed to the stack for the last time by Algorithm 1 is ord(v) + ND(v). Then w is in the subtree rooted at v, which means that there is a path from v to w in T.

Suppose now that there exists v → w in T. Then w is a descendant of v and, so, w belongs to the subtree rooted at v, which means that w will be visited between the first and the last times that v is pushed to the stack. The value of count when v is pushed to the stack for the last time is ord(v) + ND(v). Therefore, ord(v) ≤ ord(w) ≤ ord(v) + ND(v).

Algorithm 2 Tarjan’s algorithm

Input: control flow graph G = (V, E) such that |V| = n
 1: construct a DFST T of G, obtaining the preorder number ord(v) of each vertex v, from 1 to n, and calculating ND(v) for each vertex v
 2: for v such that ord(v) = 1 until ord(v) = n do
 3:   make lists of cycle edges, forward edges and cross edges that enter v
 4:   makeset(v)
 5: end for
 6: for w such that ord(w) = n step −1 until ord(w) = 1 do
 7:   P = ∅
 8:   for each cycle edge (u, w) do
 9:     add find(u) to P
10:   end for
11:   Q = P
12:   while Q ≠ ∅ do
13:     select a vertex x from Q and delete it from Q
14:     for each forward edge or cross edge (y, x) do
15:       y′ = find(y)
16:       if ord(w) > ord(y′) or ord(w) + ND(w) ≤ ord(y′) then
17:         return false
18:       end if
19:       if y′ ∉ P and y′ ≠ w then
20:         add y′ to P and to Q
21:       end if
22:     end for
23:   end while
24:   for x ∈ P do
25:     union(x, w)
26:   end for
27: end for
28: return true
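The following Python function is a minimal sketch of Algorithm 2, written so that its comments point to the corresponding lines of the pseudocode. Several details are our own assumptions rather than part of the original description: the graph is given as successor lists with every vertex reachable from the entry node; ND(v) excludes v itself, so the test in line 16 is written as the negation of the ancestor condition of Lemma 2; self-loops are skipped, since the text assumes they are removed beforehand; and the disjoint-set structure uses path compression only, with union(x, w) making w the representative, so that find returns the vertex into which a set was collapsed. The function name `is_reducible_tarjan` is hypothetical.

```python
from typing import Dict, List


def is_reducible_tarjan(succ: Dict[str, List[str]], entry: str) -> bool:
    # Line 1: DFST by recursive DFS; ord(v) from 1, ND(v) excluding v (assumption).
    order: Dict[str, int] = {}
    nd: Dict[str, int] = {}

    def visit(v: str) -> None:
        order[v] = len(order) + 1
        for w in succ.get(v, []):
            if w not in order:
                visit(w)
        nd[v] = len(order) - order[v]

    visit(entry)

    def in_subtree(v: str, w: str) -> bool:  # Lemma 2: is there a path v -> w in T?
        return order[v] <= order[w] <= order[v] + nd[v]

    # Lines 2-5: lists of entering edges for each vertex, and makeset(v).
    cycle_in: Dict[str, List[str]] = {v: [] for v in order}
    other_in: Dict[str, List[str]] = {v: [] for v in order}  # forward, tree and cross
    for u in order:
        for v in succ.get(u, []):
            if u == v:
                continue  # self-loops are assumed to be removed beforehand
            (cycle_in if in_subtree(v, u) else other_in)[v].append(u)
    parent = {v: v for v in order}

    def find(v: str) -> str:
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:  # path compression
            parent[v], v = root, parent[v]
        return root

    # Lines 6-27: vertices in decreasing preorder number.
    for w in sorted(order, key=order.__getitem__, reverse=True):
        P = {find(u) for u in cycle_in[w]}  # lines 7-10
        Q = list(P)  # line 11
        while Q:  # lines 12-23
            x = Q.pop()
            for y in other_in[x]:
                y1 = find(y)
                if not in_subtree(w, y1):  # line 16: y1 is not in the subtree of w
                    return False
                if y1 not in P and y1 != w:
                    P.add(y1)
                    Q.append(y1)
        for x in P:  # lines 24-26: union(x, w), keeping w as the representative
            parent[x] = w
    return True


print(is_reducible_tarjan({"e": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}, "e"))  # True
print(is_reducible_tarjan({"e": ["b", "c"], "b": ["c"], "c": ["b"]}, "e"))  # False
```

The second graph is an instance of the family (*) with e = a: the loop formed by b and c can be entered both at b and at c.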

The following results, which lead to Theorem 1, are due to Hecht and Ullman; up to that point we follow the proofs in [9], after which we return to Tarjan’s proof. We should note that the definition of reducibility used by Hecht and Ullman also considers the transformation T1: if (v, v) is an edge of G, remove it from G. With this definition, a graph is reducible if the successive application of T1 and T2, in any order, results in a graph with a single vertex. We will assume that, if G has edges of the form (v, v), they are removed before any other operation. Edges of the form (v, v) that are created by applications of T2 are ignored by Tarjan’s algorithm when the vertices of P are merged into w. Therefore, the two definitions can be seen as equivalent, and the results obtained in both papers are compatible.
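For comparison with Tarjan’s algorithm, the Hecht and Ullman definition can be checked directly by applying T1 and T2 until neither applies. The sketch below assumes the usual definition of T2 (a vertex other than the entry node with a single predecessor is merged into that predecessor), merges parallel edges by storing successor sets, and assumes every vertex is reachable from the entry node. It relies on the order of the transformations not affecting the result, and it is far slower than Algorithm 2, so it is meant only as a reference check. The function name `is_reducible_t1_t2` is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def is_reducible_t1_t2(succ: Dict[str, Iterable[str]], entry: str) -> bool:
    # Successor sets: parallel edges created by T2 merge automatically.
    g: Dict[str, Set[str]] = {v: set(ws) for v, ws in succ.items()}
    for ws in list(g.values()):
        for w in ws:
            g.setdefault(w, set())

    while True:
        # T1: remove every self-loop (v, v).
        for v in g:
            g[v].discard(v)
        # T2 (assumed usual definition): find w != entry with exactly one predecessor.
        preds: Dict[str, Set[str]] = {v: set() for v in g}
        for v, ws in g.items():
            for w in ws:
                preds[w].add(v)
        target: Optional[str] = next(
            (w for w in g if w != entry and len(preds[w]) == 1), None
        )
        if target is None:
            break
        (v,) = preds[target]
        g[v] |= g.pop(target)  # v inherits the successors of the collapsed vertex
        g[v].discard(target)  # the edge (v, target) disappears with the merge
    return len(g) == 1


print(is_reducible_t1_t2({"e": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}, "e"))  # True
print(is_reducible_t1_t2({"e": ["b", "c"], "b": ["c"], "c": ["b"]}, "e"))  # False
```

A self-loop created when a vertex is merged into a predecessor that it also jumps back to is removed by T1 in the next round, which mirrors the way Algorithm 2 ignores such edges.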

An important characterization of reducibility is given by a particular type of irreducible graph. The family of graphs represented in Figure 4.2 is known in the literature as (*). Here e is the entry node and can be the same node as a. The wavy arrows represent disjoint paths, that is, paths that do not share nodes except possibly the first or the last. Formally, two paths u_1, ..., u_k and v_1, ..., v_l are disjoint if u_i ≠ v_j for every (i, j) ∈ {1, ..., k} × {1, ..., l} \ {(1, 1), (1, l), (k, 1), (k, l)}.
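The formal definition of disjoint paths translates directly into code. In this small sketch, which is our own illustration, a path is given as the sequence of its vertices and the indices are 1-based to match the definition, so that only the four endpoint pairs are allowed to coincide. The function name `disjoint` is hypothetical.

```python
from typing import Sequence


def disjoint(u: Sequence[str], v: Sequence[str]) -> bool:
    """Paths u_1..u_k and v_1..v_l are disjoint if u_i != v_j for
    every (i, j) outside {(1, 1), (1, l), (k, 1), (k, l)}."""
    k, l = len(u), len(v)
    allowed = {(1, 1), (1, l), (k, 1), (k, l)}
    return all(
        u[i - 1] != v[j - 1]
        for i in range(1, k + 1)
        for j in range(1, l + 1)
        if (i, j) not in allowed
    )


# Paths a -> x -> b and a -> y -> c share only their first vertex.
print(disjoint(["a", "x", "b"], ["a", "y", "c"]))  # True
# Paths a -> x -> b and a -> x -> c also share the inner vertex x.
print(disjoint(["a", "x", "b"], ["a", "x", "c"]))  # False
```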

[Figure: entry node e with a path e → a, and disjoint paths (wavy arrows) a → b, a → c, b → c and c → b.]

Figure 4.2: Irreducible graph (*)

Proposition 1. G is irreducible if and only if it contains a graph of the family (*) as a subgraph.

Proof. (⇐) Assume that G has a subgraph of the family (*). The proof is by induction on the number of vertices, n, which is at least 3.

Base: n = 3. Then G is the graph shown in Figure 4.1a (where e = a), or the same graph with the edge (1, 2), or (2, 3), or both. No transformation can be applied to any of those graphs, so they are all irreducible.

Induction hypothesis: if G is a graph with n − 1 vertices, where n ≥ 4, with (*) as a subgraph, then G is irreducible.

Step: let G be a graph with n ≥ 4 vertices containing (*) as a subgraph, and suppose that G is reducible.

Then there is a sequence of T1 and T2 transformations that results in a graph with 1 vertex. At some point, the transformed graph G′ has n − 1 vertices. G′ still contains (*) as a subgraph, since the final edges of the paths a → b, a → c, b → c and c → b cannot be contracted by T2. Then, by the induction hypothesis, G′ is irreducible, which is a contradiction.

(⇒) Suppose now that G is irreducible. The proof is, again, by induction on the number of vertices n.

Base: n = 3. Let e be the entry node and let a, b be the other two vertices of G. The vertex e has no entering edges, and there exist paths e → a and e → b. It is easy to verify that, if every vertex has at most one entering edge, the graph is reducible. If a and b have two entering edges each, then it is not possible to apply T2 and G is irreducible; (*) is obviously a subgraph of this graph. If one vertex, a, has two entering edges and the other, b, has only one, we can apply T2 and b is collapsed into a or into e. The resulting graph is always reducible.

Induction hypothesis: if n ≥ 4 and the irreducible graph G has k < n vertices, then (*) is a subgraph of G.

Step: let G be an irreducible graph with n ≥ 4 vertices. If T2 can be applied to a vertex of G, the resulting graph G′ is an irreducible graph with n − 1 vertices, so by the induction hypothesis (*) is a subgraph of G′ and, therefore, of G. Assume now that it is not possible to apply T2 to any vertex of G. Let T be a DFST of G and let y be the rightmost child of the entry node e in T. Since T2 cannot be applied, y has at least two edges entering it in G: one from e and another from a vertex x ≠ e. (x, y) is not a forward edge: x ↛ y in T, since y is a child of the root. Suppose, for a contradiction, that (x, y) is a cross edge; then ord(y) < ord(x). Consequently, since y is the rightmost child of the root, x is a descendant of y, and thus there is y → x in T. Yet this cannot hold by definition of cross edge. Then (x, y) is a cycle edge, and so there is y → x in T.

Consider the subgraph H of G induced by the vertices x, y, and all vertices z such that there exists a path z → x in G that does not contain y. If e ∈ H, then y does not dominate x. There exist an edge (e, y), a path e → x without y, an edge (x, y) and a path y → x. Noticing that the paths are disjoint and that the edges of T are contained in the edges of G, we conclude that G has a subgraph of the family (*).

If e ∉ H, then y dominates x. Furthermore, it dominates every vertex of H. Let z ∈ H \ {x}. If y did not dominate z, there would exist a path e → z not containing y and, so, a path e → x not containing y, which is a contradiction. Then H is a directed graph with entry node y and k < n vertices, so by the induction hypothesis it contains (*), and so does G.

This property, although useful to prove other results about reducible flow graphs, does not provide a direct algorithm to find out whether a graph is reducible, since determining whether a graph contains a given subgraph is, in general, an NP-complete problem.

Theorem 1. G is reducible if and only if, for every v, w ∈ V such that (v, w) is a cycle edge, w dominates v in G.

Proof. (⇒) Assume that G is reducible and that (v, w) is a cycle edge of G. If v = w or if w = e, then w dominates v. Let us now assume that v ≠ w and w ≠ e, and suppose that w does not dominate v. Then there is a path e → v that does not contain w. Let p be this path, let q be a path e → w, and let r be the path w → v that exists because (v, w) is a cycle edge. We can assume that q and r do not share edges.

Let p ∩ q be the set of vertices that are both in p and in q. This set is not empty, since e ∈ p ∩ q. Let a be the vertex of p ∩ q that is “closest” to w: there exists a → w containing k vertices and, if a′ ∈ p ∩ q, then all paths a′ → w have k or more vertices. Similarly, let p ∩ r be the set of vertices that are both in p and in r, and let b be the vertex of p ∩ r that is “closest” to w: there exists w → b containing k′ vertices and, if b′ ∈ p ∩ r, then all paths w → b′ have k′ or more vertices. Then a, b, and w correspond to a, b, and c of the irreducible graph (*) shown in Figure 4.2. The paths a → b and a → w are disjoint by definition of a, and the paths a → b and w → b are disjoint by definition of b. Then all paths are disjoint and (*) is a subgraph of G, so, by Proposition 1, G is irreducible, a contradiction.

(⇐) Assume that G is irreducible. Then, by Proposition 1, G has a graph of the family (*) as a subgraph. Let e, a, b and c be as in Figure 4.2, and let d be the vertex such that (d, b) is the last edge of the path c → b. There exists a DFST T of G such that the paths a → b, b → c and c → d are in T.

Then (d, b) is a cycle edge, since there is b → d in T, and b does not dominate d: the concatenation of the paths e → a, a → c and c → d yields a path from the entry point to d that does not contain b.
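Theorem 1 also yields a simple, if inefficient, reducibility test: classify the edges with a DFST and, for each cycle edge (v, w), check whether v becomes unreachable from the entry node once w is removed, which is exactly the statement that w dominates v. The sketch below is our own illustration; it uses this naive dominance check, roughly O(|E| · (|V| + |E|)) time, rather than a dedicated dominator algorithm, and it assumes successor lists, reachability of every vertex from the entry node, and ND(v) excluding v. All function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def preorder(succ: Graph, entry: str):
    """ord(v) from 1 and ND(v) (descendants excluding v) for a DFST."""
    order: Dict[str, int] = {}
    nd: Dict[str, int] = {}

    def visit(v: str) -> None:
        order[v] = len(order) + 1
        for w in succ.get(v, []):
            if w not in order:
                visit(w)
        nd[v] = len(order) - order[v]

    visit(entry)
    return order, nd


def reachable_avoiding(succ: Graph, entry: str, banned: str) -> Set[str]:
    """Vertices reachable from entry by paths that do not contain banned."""
    if entry == banned:
        return set()
    seen, stack = {entry}, [entry]
    while stack:
        for w in succ.get(stack.pop(), []):
            if w != banned and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def dominates(succ: Graph, entry: str, w: str, v: str) -> bool:
    """w dominates v iff every path from entry to v contains w."""
    return v == w or v not in reachable_avoiding(succ, entry, w)


def is_reducible_theorem1(succ: Graph, entry: str) -> bool:
    order, nd = preorder(succ, entry)
    for v in order:
        for w in succ.get(v, []):
            # (v, w) is a cycle edge iff there is a path w -> v in T (Lemma 2).
            is_cycle_edge = order[w] <= order[v] <= order[w] + nd[w]
            if is_cycle_edge and not dominates(succ, entry, w, v):
                return False
    return True


print(is_reducible_theorem1({"e": ["a", "c"], "a": ["b"], "b": ["a"], "c": ["b"]}, "e"))  # False
```

In the example, (b, a) is a cycle edge but the path e → c → b avoids a, so a does not dominate b.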

Lemma 3. G is reducible if and only if, for all w and for all v ∈ P(w), there exists w → v in T.

Proof. (⇐) Assume that G is not reducible. Then, by Theorem 1, there exists a cycle edge (v, w) in G such that w does not dominate v, which means that there is a path e → v in G that does not contain w.

Since (v, w) is a cycle edge, v ∈ C(w). By definition of P(w), we then have e ∈ P(w) (note that w ≠ e, since the entry node dominates every vertex), and, since e is the root of T, w ↛ e in T.

(⇒) Suppose now that G is reducible and that there exist w and v ∈ P(w) such that there is no path from w to v in T. Then the path e → v in T does not contain w. Since v ∈ P(w), there exists z ∈ C(w) such that there is a path v → z in G that does not contain w. Combining both paths, there exists a path e → z in G that does not contain w, which means that w does not dominate z; but (z, w) is a cycle edge, which contradicts Theorem 1.
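Lemma 3 gives another characterization that can be checked directly. The sketch below reads the definitions used in the proofs as follows, which is our assumption: C(w) is the set of sources of cycle edges entering w, and P(w) is the set of vertices with a path in G to some vertex of C(w) that does not contain w, computed here by a backward search from C(w) that never enters w. It then checks that every vertex of P(w) lies in the subtree of w in T, using Lemma 2 with ND(v) excluding v. The name `lemma3_holds` is hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def lemma3_holds(succ: Graph, entry: str) -> bool:
    # DFST preorder numbers and ND(v), excluding v itself (assumption).
    order: Dict[str, int] = {}
    nd: Dict[str, int] = {}

    def visit(v: str) -> None:
        order[v] = len(order) + 1
        for x in succ.get(v, []):
            if x not in order:
                visit(x)
        nd[v] = len(order) - order[v]

    visit(entry)

    def in_subtree(a: str, b: str) -> bool:  # path a -> b in T (Lemma 2)
        return order[a] <= order[b] <= order[a] + nd[a]

    preds: Dict[str, List[str]] = {v: [] for v in order}
    for u in order:
        for v in succ.get(u, []):
            preds[v].append(u)

    for w in order:
        # C(w): sources v of cycle edges (v, w), i.e. w -> v in T; self-loops ignored.
        c_w = [v for v in preds[w] if v != w and in_subtree(w, v)]
        # P(w): backward search from C(w) that never passes through w.
        p_w: Set[str] = set(c_w)
        stack = list(c_w)
        while stack:
            for y in preds[stack.pop()]:
                if y != w and y not in p_w:
                    p_w.add(y)
                    stack.append(y)
        if not all(in_subtree(w, v) for v in p_w):
            return False  # a jump into the middle of the loop headed at w
    return True


print(lemma3_holds({"e": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}, "e"))  # True
```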

Consider the set of vertices with entering cycle edges. In the following three lemmas, w denotes the element of that set with the highest value of ord(w). Assume that, for all v ∈ P(w), there is a path w → v in T. This condition means that there are no jumps into the middle of the loop headed at w.

Consider the graph G′ obtained from G by collapsing all vertices of P(w) into w.
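A minimal sketch of the construction of G′, assuming successor lists and a precomputed set P standing for P(w): every vertex of P is renamed w, parallel edges are merged, and self-loops such as (w, w) are dropped, since, as the proof of Lemma 4 below observes, they are not proper edges of G′. The function name `collapse` is hypothetical.

```python
from typing import Dict, Iterable, Set


def collapse(succ: Dict[str, Iterable[str]], w: str, P: Set[str]) -> Dict[str, Set[str]]:
    """Return G' obtained by collapsing every vertex of P into w."""

    def rep(v: str) -> str:
        return w if v in P else v

    out: Dict[str, Set[str]] = {}
    for v, ws in succ.items():
        src = rep(v)
        targets = out.setdefault(src, set())
        for x in ws:
            dst = rep(x)
            if dst != src:  # drop self-loops such as (w, w)
                targets.add(dst)
    return out


# Loop a <-> b headed at a, with P(a) = {"b"}.
print(collapse({"e": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}, "a", {"b"}))
# {'e': {'a'}, 'a': {'c'}, 'c': set()}
```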

Lemma 4. Every edge (v′, w′) of the graph G′ corresponds to an edge (v, w′) of the graph G, where v is such that there exists v′ → v in T.

Proof. If (x′, y′) is an edge of G′, either (x′, y′) was an edge of G, in which case the result trivially holds, or it results from the contraction of a vertex of P(w) into w, so either x′ = w or y′ = w.

If y′ = w, then the edge (x′, w) of G′ is the result of the contraction of an edge (x, y) of G where y ∈ P(w), so there exists z ∈ C(w) such that there is a path y → z that does not contain w. Then x ∈ P(w), by combining (x, y) with y → z, so x was contracted to w and x′ = w. Hence (x′, y′) = (w, w) is not a proper edge of G′.

If x′ = w, then the edge (w, y′) of G′ is the result of the contraction of an edge (x, y) of G where y ∉ P(w): if y ∈ P(w), then y′ = w and the edge (w, y′) would not be considered. So y′ = y. Since x was contracted to w, x ∈ P(w) and, by the previous assumption, we have w → x in T.

Let T′ be the subgraph of G′ whose vertices are those of G′ and whose edges correspond to the edges of T.

Lemma 5. (T′, e) is a DFST of G′. Cycle edges of G′ correspond to cycle edges of G, forward edges of G′ correspond to forward edges or cross edges of G, and cross edges of G′ correspond to cross edges of G.

Proof. We start by noticing that e was not collapsed, since it has no entering edges. T′ is a tree: the correspondence between edges ensures that there is one and only one path from e to any vertex v ≠ e of G′.

Let (v′, w′) be a cycle edge of G′. Then there is w′ → v′ in T′. Since T′ ⊆ T, this is also a path of T.

By Lemma 4, (v′, w′) corresponds to an edge (v, w′) of G such that there is v′ → v in T. Combining both paths, we conclude that there exists a path w′ → v in T, so (v, w′) is a cycle edge.

Let (v′, w′) be a forward edge of G′: v′ → w′ is in T′. By Lemma 4, this edge corresponds to an edge (v, w′) of G such that v′ → v is in T. We want to show that (v, w′) is a forward edge or a cross edge or, equivalently, that it is not a cycle edge. Assume, for a contradiction, that it is: there exists a path w′ → v in T. Then we can combine these paths: v′ → w′ → v. If v′ = v, this situation is impossible in T. If v′ ≠ v, the edge (v, w′) was replaced by (v′, w′) during the construction of G′, which means that v′ = w, by the proof of the previous lemma. Therefore w′ is a vertex with an entering cycle edge that is visited after w in the traversal of T, so ord(w) < ord(w′), which contradicts the definition of w.

Let (v′, w′) be a cross edge of G′: w′ ↛ v′ in T′, v′ ↛ w′ in T′ and ord(w′) < ord(v′). Let v be the source of the edge given by Lemma 4, and recall that v′ → v is in T. Suppose that there exists v → w′ in T. Then there is also v′ → w′ in T, which is a contradiction. Suppose now that there exists w′ → v in T. Then, since every vertex other than e has indegree 1 in T, either w′ is an ancestor of v′ or v′ is an ancestor of w′, that is, either w′ → v′ is in T or v′ → w′ is in T, but both options are impossible.

Lemma 6. For any vertex x such that ord(x) < ord(w), let P′(x) and C′(x) be defined in G′ relative to T′ as P(x) and C(x) were defined in G relative to T. Then x → y in T′ for all y ∈ P′(x) if and only if x → y in T for all y ∈ P(x).

Proof. (⇐) Assume, for the contrapositive, that there exist x, y ∈ V such that y ∈ P′(x) and x ↛ y in T′. Then it also holds that x ↛ y in T. Since y ∈ P′(x), there exists z′ ∈ C′(x) such that there is a path y → z′ in G′ not containing x. By Lemma 4 and Lemma 5, the cycle edge (z′, x) of G′ corresponds to a cycle edge (z, x) of G. Moreover, there exists a path y → z in G containing edges from the path y → z′ and possibly some deleted edges with vertices in P(w) ∪ {w}. We are assuming that there is w → v in T for every v ∈ P(w), so, by Lemma 2, this path cannot contain x, since ord(x) < ord(w). So y ∈ P(x).

(⇒) Assume again, for the contrapositive, that there exist x, y ∈ V such that y ∈ P(x) and x ↛ y in T.

• If y ∉ P(w), then y ∈ P′(x) and x ↛ y in T′;

• If y ∈ P(w), then it was collapsed to w and, recalling the correspondence between the edges before and after the transformation of the graph, x ↛ w in T′. Since y ∈ P(x), w ∈ P′(x).

Let us now return to the algorithm. Every vertex v of G is analysed once inside the for loop in lines 6–27. Notice that, if v is not the target of a cycle edge, the algorithm skips v and starts analysing the next vertex, since Q is empty when the while loop starts at line 12. The proof of the following proposition is only sketched in [17]; here we present the full proof.

If w is the vertex of G with entering cycle edges with the highest value of ord, then a contraction of G is the operation that consists of collapsing all vertices of P(w) into w. Let G^(0) = G and let G^(k) be the result of applying a contraction to G^(k−1). Also, let T^(k) be the DFST of G^(k) and, for any vertex w of G^(k), let P^(k)(w) be defined relative to G^(k), T^(k) as P(w) was defined relative to G, T. As before, we may write G′ instead of G^(1).

Proposition 2. Algorithm 2 returns true if and only if G is reducible.

Proof. Let {w_1, ..., w_l} be the set of vertices of G with entering cycle edges, numbered such that ord(w_1) > · · · > ord(w_l).

(⇐) We will prove that, if G is reducible, then the algorithm returns true, by induction on the number l of vertices with entering cycle edges.

Base: if G does not have cycle edges, then P is empty in every iteration of the for loop in lines 6–27, so Q is also empty and no vertex is collapsed. The algorithm therefore scans every vertex without modifying the graph, eventually reaching line 28, where it returns true. (Such a graph is indeed reducible: by Theorem 1, since the condition on cycle edges holds vacuously.)

Step: we start by observing that the set of vertices of G^(k) with entering cycle edges is {w_i : i ≥ k + 1}, for all k ≤ l. By Lemma 4 and Lemma 5, which are applicable since G is reducible, every cycle edge of G^(k) corresponds to a cycle edge of G, so the set of vertices with entering cycle edges of G^(k) corresponds to the set of vertices with entering cycle edges of G. For i ≤ k, find(w_i) does not have entering cycle edges, since they were all collapsed in the previous steps. Finally, w_j is a vertex of G^(k) for all j ≥ k + 1, that is, find(w_j) = w_j: assume that w_j was previously collapsed; then w_j ∈ P^(i−1)(w_i) for some i ≤ k, so there is w_i → w_j in T^(i−1), and also in T, since w_i and w_j had not yet been collapsed; but this contradicts ord(w_i) > ord(w_j).

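We will show that, if G is reducible, then G′ is reducible. By Lemma 3, for every x ∈ V and every v ∈ P(x), there is x → v in T. By Lemma 6, for every x ∈ V with ord(x) < ord(w_1) and every v ∈ P′(x), there is x → v in T′. Since every vertex x of G′ with entering cycle edges satisfies ord(x) < ord(w_1), and P′(x) is empty for the remaining vertices, for every x ∈ V′ and every v ∈ P′(x) there is x → v in T′, so G′ is reducible by Lemma 3.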

Let us now assume that the first k < l contractions were already made, so that the current graph is G^(k). We will show that the next iteration of the for loop that changes the graph (w = w_{k+1}) transforms G^(k) into G^(k+1).

During the iteration where w = w_{k+1}, Q ⊆ P ⊆ P^(k)(w_{k+1}). The first inclusion is trivial. For the second inclusion, notice that: (u, w) is a cycle edge of G if and only if (find(u), w) is a cycle edge of G^(k), by Lemma 4 and Lemma 5; for every x ∈ V, find(x) → x in G; (y, x) is a forward, tree or cross edge if and only if (find(y), x) is a forward, tree or cross edge; and w_{k+1} ∉ P. A vertex z′ = find(z) is added to P if (z′, w_{k+1}) is a cycle edge or if (z′, x), for some x ∈ Q ⊆ P, is an edge of G^(k). Thus, for every z ∈ P there exists some u′ such that (u′, w_{k+1}) is a cycle edge of G^(k) and there is a path z → u′ in G^(k) that does not contain w_{k+1}.

After the while loop, P = P^(k)(w_{k+1}). If z ∈ P^(k)(w_{k+1}), then there exists u such that (u, w_{k+1}) is a cycle edge in G^(k) and there is a path z → u in G^(k) that does not contain w_{k+1}; moreover, u ∈ P by construction of P.

Line 14 builds, backwards, every path of G^(k) ending in u that does not contain w_{k+1}. Notice that in line 14 the vertex x has no entering cycle edges in G^(k), since ord(x) > ord(w_{k+1}) was verified before adding x to P. In particular, z is added to P at some point during the execution, so, at the end of the while loop, z ∈ P.

So the last for loop, in lines 24–26, collapses every vertex of P^(k)(w_{k+1}) into w_{k+1}, and the result is G^(k+1).

We conclude that, if G is a reducible graph with l vertices with entering cycle edges, the algorithm successively transforms G into reducible graphs, reducing the number of vertices with entering cycle edges by 1 in each of these iterations, until G^(l) has no cycle edges, and the program returns true.

(⇒) Assume that G is an irreducible graph. Then, by Lemma 3, there exists k such that there is v ∈ P(w_k) with w_k ↛ v in T. Since, for all v, there is find(v) → v in T, this means that w_k ↛ find(v) in T, considering the function find applied to any of the graphs G^(j), j ≤ l. By Lemma 2, ord(w_k) > ord(find(v)) or ord(w_k) + ND(w_k) ≤ ord(find(v)). In iteration k, the algorithm eventually analyses all elements of P^(k−1)(w_k), by the results obtained in the proof of the previous implication, so it will scan v and return false.

Complexity analysis

Although Tarjan stated in his 1974 paper ([17]) that the time complexity of this algorithm was O(|E| log* |E|), where log* is the iterated logarithm, this bound was later improved thanks to results that the same author obtained for the complexity of the operations of the disjoint-set data structure. We will now show that, for n elements, a sequence of m find, union, and makeset operations, of which n are makeset, has worst-case running time O(m α(n)), where α is related to the inverse of the Ackermann function.
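For reference, the disjoint-set forest for which the O(m α(n)) bound is usually proved combines union by rank with path compression. The Python sketch below assumes these two heuristics; the `label` map is our own addition, so that find returns the vertex into which a set was collapsed, as Algorithm 2 requires from union(x, w), even when union by rank keeps the root of the tree of x. The class and attribute names are hypothetical.

```python
from typing import Dict, Hashable


class DisjointSet:
    """Disjoint-set forest with union by rank and path compression.

    The label map is an assumption made for Algorithm 2: find returns the
    name of a set, and union(x, w) names the merged set after w.
    """

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.rank: Dict[Hashable, int] = {}
        self.label: Dict[Hashable, Hashable] = {}  # root -> name of its set

    def makeset(self, v: Hashable) -> None:
        self.parent[v] = v
        self.rank[v] = 0
        self.label[v] = v

    def _root(self, v: Hashable) -> Hashable:
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:  # path compression
            self.parent[v], v = root, self.parent[v]
        return root

    def find(self, v: Hashable) -> Hashable:
        return self.label[self._root(v)]

    def union(self, x: Hashable, w: Hashable) -> None:
        rx, rw = self._root(x), self._root(w)
        if rx == rw:
            return
        name = self.label[rw]
        if self.rank[rx] < self.rank[rw]:
            rx, rw = rw, rx  # rx is now the root of higher (or equal) rank
        self.parent[rw] = rx  # union by rank
        if self.rank[rx] == self.rank[rw]:
            self.rank[rx] += 1
        self.label[rx] = name  # the merged set keeps the name of w's set


ds = DisjointSet()
for v in "abcd":
    ds.makeset(v)
ds.union("b", "a")
ds.union("c", "a")
print(ds.find("c"), ds.find("d"))  # a d
```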

The Ackermann function was defined in 1928 by Wilhelm Ackermann ([1]), and it was one of the first examples of a computable function that is not primitive recursive. Since then, several functions that share this property have been constructed for various purposes. These functions grow faster than any fixed tower of exponentials, which means that their inverses grow very slowly: for every practical input, they are always below 5. Our proof and, therefore, our version of the Ackermann function follow [4].
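To see how slowly the inverse grows, the sketch below uses one common variant of the Ackermann function from the union-find literature, A_0(j) = j + 1 and A_k(j) = A_{k−1}^{(j+1)}(j) for k ≥ 1 (the (j + 1)-fold iteration of A_{k−1}), with α(n) = min{k : A_k(1) ≥ n}. This variant is our assumption and may differ in details from the one in [4]. Values are capped at n to keep the computation small, which is safe because A_k is increasing and A_k(j) > j. The function names are hypothetical.

```python
def ackermann_capped(k: int, j: int, cap: int) -> int:
    """min(A_k(j), cap) for the variant A_0(j) = j + 1, A_k(j) = A_{k-1}^{(j+1)}(j)."""
    if j >= cap:
        return cap  # A_k(j) > j >= cap
    if k == 0:
        return min(j + 1, cap)
    if k == 1:
        return min(2 * j + 1, cap)  # closed form: A_1(j) = 2j + 1
    x = j
    for _ in range(j + 1):  # apply A_{k-1} exactly j + 1 times
        x = ackermann_capped(k - 1, x, cap)
        if x >= cap:
            return cap  # further applications only increase the value
    return x


def alpha(n: int) -> int:
    """alpha(n) = min{k : A_k(1) >= n}."""
    k = 0
    while ackermann_capped(k, 1, n) < n:
        k += 1
    return k


print([ackermann_capped(k, 1, 10**6) for k in range(4)])  # [2, 3, 7, 2047]
print([alpha(n) for n in (3, 7, 8, 2047, 2048, 10**80)])  # [1, 2, 3, 3, 4, 4]
```

Even for n = 10^80, a rough estimate of the number of atoms in the observable universe, this α(n) is only 4.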

We will use amortized analysis to prove the desired time bound. Amortized analysis was first used in the 1980s to prove upper bounds on the time complexity of several algorithms, and it was described in more detail in 1985 by Tarjan in [18]. It is a technique that allows us to obtain tighter bounds on the execution time of an algorithm by noticing that the worst-case running time of a sequence of operations may be smaller than the sum of the worst-case running times of each operation. One of the ways to perform amortized analysis is to use a potential function. If 𝒟 represents all possible states of the data structure used by the algorithm during the execution, then any Φ : 𝒟 → ℕ is a potential function.

The choice of Φ, for each algorithm, strongly depends on the data structure and on the operations that are performed. Having chosen Φ, the amortized time of an operation, relative to Φ, is defined as a = t + Φ(D′) − Φ(D), where t is the actual time of the operation and D, D′ ∈ 𝒟 are the states of the data structure before and after the operation, respectively. Therefore, a sequence of k operations has amortized time

$$\sum_{i=1}^{k} a_i = \sum_{i=1}^{k} \left( t_i + \Phi(D_i) - \Phi(D_{i-1}) \right) = \sum_{i=1}^{k} t_i + \Phi(D_k) - \Phi(D_0),$$

where D_0 and D_k are the initial and the final states of the data structure and a_i, t_i are the amortized and actual times of the i-th operation. If Φ(D_0) ≤ Φ(D_k), then the amortized time is an upper bound on the actual time of the sequence of operations.
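As a quick numerical illustration of the telescoping sum, the snippet below computes amortized times from made-up actual times and potentials (they do not come from any data structure in this work) and checks that their sum equals the total actual time plus Φ(D_k) − Φ(D_0). The function name `amortized_costs` is hypothetical.

```python
from typing import List


def amortized_costs(actual: List[int], potentials: List[int]) -> List[int]:
    """a_i = t_i + Phi(D_i) - Phi(D_{i-1}); potentials[0] is Phi(D_0)."""
    if len(potentials) != len(actual) + 1:
        raise ValueError("need one potential per state D_0, ..., D_k")
    return [t + potentials[i + 1] - potentials[i] for i, t in enumerate(actual)]


# Made-up values, only to illustrate the telescoping identity.
t = [3, 1, 4, 1]
phi = [0, 2, 1, 3, 3]
a = amortized_costs(t, phi)
print(a, sum(a), sum(t) + phi[-1] - phi[0])  # [5, 0, 6, 1] 12 12
```

Here Φ(D_0) = 0 ≤ 3 = Φ(D_4), so the total amortized time, 12, bounds the total actual time, 9.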

As already described, the disjoint-set data structure models subsets of a given set. For

Source document: Formal analysis and gas estimation for Ethereum smart contracts, Mathematics and Applications (pages 55–70)