Filtering primitive
5.1 Efficient subgraph enumeration
5.1.1 Depth-first subgraph enumeration
In a naïve depth-first subgraph exploration, subgraphs are generated recursively, taking the neighborhood of the current subgraph as the set of possible extensions. This simple approach has a major issue: the same subgraph may be generated and verified several times, because an extension unit can be connected to the current subgraph at multiple sites.
As a result, the naïve approach spends extra work processing redundant subgraphs. We therefore propose a modified depth-first subgraph exploration that combines two phases: a subgraph expansion phase followed by an in-depth exploration phase. New subgraphs are generated by computing the possible extensions of the current subgraph (computeExtensions, CE), immediately extending the subgraph with one of them, in depth, until the target enumeration depth is reached (extend, EX), and then backtracking to consume the remaining extensions.
The expansion phase reduces the redundant work of the naïve approach, because the valid extensions of a subgraph are unique and are generated once per subgraph, to be consumed afterwards. Specifically, the algorithm starts with an extension unit (e.g., a vertex for induced subgraphs) representing the root of the enumeration tree, generates the valid extensions (vertices) connected to this root, adds the first valid extension to the current subgraph, and continues processing with this larger subgraph composed of two vertices. The algorithm repeats this process up to a certain enumeration depth, determined by the size of the subgraphs of interest, for as long as computed extensions remain unexplored.
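The redundancy of the naïve approach can be made concrete with a minimal Python sketch. The graph ADJ and the helper names naive_dfs and count_naive_visits are hypothetical and chosen only for illustration; the sketch extends each subgraph with any neighboring vertex, without any uniqueness rule, and counts how often each vertex set is produced.

```python
from collections import Counter

# Hypothetical example graph as adjacency sets (edges assumed for illustration).
ADJ = {
    0: {1, 4},
    1: {0, 2, 3, 5},
    2: {1},
    3: {1},
    4: {0},
    5: {1},
}

def naive_dfs(adj, subgraph, k, visits):
    """Extend `subgraph` with any neighboring vertex until it has k vertices."""
    if len(subgraph) == k:
        visits[frozenset(subgraph)] += 1
        return
    # Neighborhood of the current subgraph, with no uniqueness rule applied.
    neighborhood = set().union(*(adj[v] for v in subgraph)) - set(subgraph)
    for v in sorted(neighborhood):
        naive_dfs(adj, subgraph + [v], k, visits)

def count_naive_visits(adj, k):
    """Run the naive exploration from every vertex and count generated sets."""
    visits = Counter()
    for root in sorted(adj):
        naive_dfs(adj, [root], k, visits)
    return visits

visits = count_naive_visits(ADJ, 3)
for sub, times in sorted(visits.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(sub), "generated", times, "times")
```

On this assumed graph, each of the 7 connected three-vertex subgraphs is generated 4 times, so the naive exploration processes 28 subgraphs where 7 would suffice.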
Figure 5.3 presents an example of this two-phase method. In this example, we consider the enumeration of induced subgraphs with three vertices, starting from enumeration root v0. The execution starts with root v0 (step 0), computes the valid extensions for subgraph {v0} (step 1), consumes extension v1 and extends the current subgraph, obtaining a new subgraph {v0, v1} (step 2), computes the extensions of subgraph {v0, v1} (step 3), consumes the first extension v2, yielding the first subgraph with three vertices {v0, v1, v2} (step 4), backtracks to subgraph {v0, v1} (step 5), consumes the next extension v3 from step 3, yielding the second subgraph with three vertices {v0, v1, v3} (step 6), and so on.
Figure 5.3: Depth-first exploration for subgraph enumeration. In this example, the target is subgraphs with three vertices.
[Figure: input graph with vertices v0 to v5, annotated with the steps listed below.]
Step 0: subgraph {v0}
Step 1 (CE): extensions {v1, v4}
Step 2 (EX): subgraph {v0, v1}
Step 3 (CE): extensions {v2, v3, v4, v5}
Step 4 (EX): subgraph {v0, v1, v2}
Step 5: backtrack
Step 6 (EX): subgraph {v0, v1, v3}
Step 7: backtrack
· · ·
Source: Made by the author.
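The steps of Figure 5.3 can be reproduced with a short Python sketch. Two things are assumptions here: the edge set of ADJ, which is only chosen to be consistent with the extension sets listed in the figure, and the validity rule in compute_extensions, which is a commonly used incremental canonicality check for vertex-induced subgraphs (the text above does not state which rule defines a valid extension). The function names are hypothetical.

```python
# Hypothetical graph consistent with the extension sets shown in Figure 5.3.
ADJ = {
    0: {1, 4},
    1: {0, 2, 3, 5},
    2: {1},
    3: {1},
    4: {0},
    5: {1},
}

def compute_extensions(adj, subgraph):
    """CE: valid extensions of an ordered subgraph (assumed canonical rule)."""
    root = subgraph[0]
    candidates = set().union(*(adj[u] for u in subgraph)) - set(subgraph)
    valid = []
    for v in sorted(candidates):
        if v < root:
            continue  # the root must be the smallest vertex id
        # Position of the first subgraph vertex adjacent to v.
        first = next(i for i, u in enumerate(subgraph) if v in adj[u])
        # Every vertex added after that position must have a smaller id than v.
        if all(u < v for u in subgraph[first + 1:]):
            valid.append(v)
    return valid

def enumerate_from(adj, subgraph, k, trace, found):
    """Two-phase exploration: compute extensions once, then consume them."""
    if len(subgraph) == k:
        found.append(tuple(subgraph))
        return
    extensions = compute_extensions(adj, subgraph)       # CE phase
    trace.append(("CE", tuple(subgraph), tuple(extensions)))
    for v in extensions:
        subgraph.append(v)                               # EX: grow in place
        trace.append(("EX", tuple(subgraph)))
        enumerate_from(adj, subgraph, k, trace, found)
        subgraph.pop()                                   # backtrack: shrink
        trace.append(("backtrack", tuple(subgraph)))

trace, found = [], []
enumerate_from(ADJ, [0], 3, trace, found)
for entry in trace[:7]:
    print(entry)
print(found)
```

The first seven trace entries correspond to steps 1 to 7 of the figure. Under the assumed rule, the subgraph {v0, v1, v4} is produced only through the order v0, v1, v4, since v1 is not a valid extension of {v0, v4}.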
To make this process transparent and extensible, we propose a new data structure specifically designed to represent this two-phase in-depth enumeration method. We call this structure a subgraph enumerator because it represents a checkpoint of the subgraph enumeration process. Figure 5.4 shows the structure of a subgraph enumerator.
Figure 5.4: Subgraph enumerator abstraction.
subgraph-enumerator {
    currentSubgraph;      // current subgraph
    computeExtensions();  // compute the valid extensions of the current subgraph
    extend();             // consume next extension
    next();               // returns the subgraph enumerator in the next enumeration depth
}
Source: Made by the author.
Each subgraph enumerator is identified by a current subgraph under extension. Extension candidates of this subgraph are generated with computeExtensions(), i.e., it implements the first phase of subgraph expansion (CE). In case the current subgraph is empty, this function generates the set of initial extension units of the input graph. For example, for the edge-oriented extension type (TE) it generates the set of edges of the input graph; for the vertex-oriented extension type (TV) it generates the set of vertices of the input graph; and for the pattern-oriented extension type (TP(Ä)) it also generates the set of vertices of the input graph. Subgraph enumerators work as hierarchical work queues, where a computeExtensions() call produces the work items (i.e., extension units) based on the current subgraph and an extend() call consumes a work item from this queue.
Consuming an extension via extend() generates a new state in the subgraph enumerator of the next enumeration level (next()), which shares the current subgraph extended by the consumed extension unit. This strategy allows the enumeration engine to maintain an in-place current subgraph that grows whenever it includes new extension units and that shrinks whenever it backtracks.
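A minimal Python sketch of such an enumerator, for the vertex-oriented case only, is given below. It is an illustration under stated assumptions, not the implementation described in this work: the method names follow Figure 5.4 in snake_case, the fields _base, _extensions and _child are hypothetical, and the validity rule in _is_valid is an assumed incremental canonicality check.

```python
from collections import deque

class SubgraphEnumerator:
    """One enumeration level; all levels share one in-place subgraph list."""

    def __init__(self, adj, subgraph=None):
        self.adj = adj
        self.current_subgraph = [] if subgraph is None else subgraph
        self._base = len(self.current_subgraph)  # size before this level extends
        self._extensions = deque()               # work queue of extension units
        self._child = None                       # enumerator of the next level

    def _is_valid(self, sub, v):
        # Assumed canonical rule: root is the smallest id, and every vertex
        # added after v's first neighbor in `sub` has a smaller id than v.
        if v < sub[0]:
            return False
        first = next(i for i, u in enumerate(sub) if v in self.adj[u])
        return all(u < v for u in sub[first + 1:])

    def compute_extensions(self):
        """CE: fill the work queue from the current subgraph."""
        sub = self.current_subgraph
        self._base = len(sub)
        if not sub:
            # Empty subgraph: the initial extension units are all vertices.
            self._extensions = deque(sorted(self.adj))
            return
        candidates = set().union(*(self.adj[u] for u in sub)) - set(sub)
        self._extensions = deque(
            v for v in sorted(candidates) if self._is_valid(sub, v))

    def extend(self):
        """EX: backtrack to this level, then consume the next work item."""
        del self.current_subgraph[self._base:]   # shrink (backtrack)
        if not self._extensions:
            return False
        self.current_subgraph.append(self._extensions.popleft())  # grow
        return True

    def next(self):
        """Return the (reused) enumerator of the next enumeration level."""
        if self._child is None:
            self._child = SubgraphEnumerator(self.adj, self.current_subgraph)
        return self._child

def enumerate_k(se, k, out):
    """Collect all subgraphs with k vertices reachable from enumerator `se`."""
    if len(se.current_subgraph) == k:
        out.append(tuple(se.current_subgraph))
        return
    se.compute_extensions()
    while se.extend():
        enumerate_k(se.next(), k, out)

# Hypothetical input graph as adjacency sets.
ADJ = {0: {1, 4}, 1: {0, 2, 3, 5}, 2: {1}, 3: {1}, 4: {0}, 5: {1}}
result = []
enumerate_k(SubgraphEnumerator(ADJ), 3, result)
print(result)
```

In this sketch each level keeps only its own queue of pending extensions, while the subgraph itself is a single list shared by all levels; extend() truncates the list back to the size recorded by compute_extensions() before appending, which is how the in-place subgraph shrinks on backtracking.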
Algorithm 6 describes this two-phase depth-first subgraph enumeration method. Its input is an application step, i.e., a sequence (array) of primitives to be executed (Definition 19). The algorithm starts by creating an empty subgraph enumerator, which is passed as a parameter to the function process (lines 1-2). This function (lines 3-13) applies the primitives to the subgraph enumerators recursively, where idx is the position of the current primitive in the step array; the first primitive, for example, has index zero. In case of extension (lines 6-9), the algorithm invokes the function recursively for each possible extension of the current subgraph. In case of filtering (lines 10-11), we only call the process function on the next primitive if the subgraph passes the filter. Finally, the algorithm handles the aggregation according to the user's reduction function (lines 12-13), which marks the end of the recursive call.
Algorithm 6 dfs-processing(step)
1: se ← create-subgraph-enumerator()
2: process(se, step, 0)
3: function process(se, step, idx)
4:     P ← step[idx]
5:     S ← se.currentSubgraph
6:     if is-extension(P) then ▷ E(T, M)
7:         se.computeExtensions()
8:         while se.extend() do
9:             process(se.next(), step, idx + 1)
10:     else if is-filter(P) and filter(S) then ▷ F(p)
11:         process(se.next(), step, idx + 1)
12:     else if is-aggregation(P) then ▷ A(g, h, r)
13:         aggregate(key(S), value(S))
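Algorithm 6 can be sketched in Python as follows. Several choices are assumptions made for illustration: a step is encoded as a list of hypothetical (kind, function) pairs, the reduction is a plain sum kept in a Counter, the enumerator covers only vertex-oriented extensions, and its validity rule is an assumed incremental canonicality check.

```python
from collections import Counter, deque

class SubgraphEnumerator:
    """Compact vertex-oriented enumerator (hypothetical, for illustration)."""

    def __init__(self, adj, subgraph=None):
        self.adj = adj
        self.current_subgraph = [] if subgraph is None else subgraph
        self._base, self._extensions, self._child = 0, deque(), None

    def compute_extensions(self):
        sub, adj = self.current_subgraph, self.adj
        self._base = len(sub)
        if not sub:
            self._extensions = deque(sorted(adj))  # initial extension units
            return
        def valid(v):  # assumed canonical rule, not taken from the source
            first = next(i for i, u in enumerate(sub) if v in adj[u])
            return v > sub[0] and all(u < v for u in sub[first + 1:])
        candidates = set().union(*(adj[u] for u in sub)) - set(sub)
        self._extensions = deque(v for v in sorted(candidates) if valid(v))

    def extend(self):
        del self.current_subgraph[self._base:]     # backtrack in place
        if not self._extensions:
            return False
        self.current_subgraph.append(self._extensions.popleft())
        return True

    def next(self):
        if self._child is None:
            self._child = SubgraphEnumerator(self.adj, self.current_subgraph)
        return self._child

def dfs_processing(adj, step):
    """Apply the primitives of one step depth-first; returns the aggregates."""
    results = Counter()

    def process(se, idx):
        kind, fn = step[idx]
        s = se.current_subgraph
        if kind == "extend":                        # E(T, M)
            se.compute_extensions()
            while se.extend():
                process(se.next(), idx + 1)
        elif kind == "filter":                      # F(p)
            if fn(s, adj):
                process(se.next(), idx + 1)
        elif kind == "aggregate":                   # A(g, h, r), r = sum here
            key, value = fn(s, adj)
            results[key] += value

    process(SubgraphEnumerator(adj), 0)
    return results

def edge_count(sub, adj):
    """Number of edges of the subgraph induced by the vertices in `sub`."""
    return sum(1 for i, u in enumerate(sub) for w in sub[i + 1:] if w in adj[u])

# Hypothetical input graph containing one triangle {1, 2, 3}.
ADJ = {0: {1, 4}, 1: {0, 2, 3, 5}, 2: {1, 3}, 3: {1, 2}, 4: {0}, 5: {1}}
EXTEND = ("extend", None)
triangle_step = [EXTEND, EXTEND, EXTEND,
                 ("filter", lambda s, adj: edge_count(s, adj) == 3),
                 ("aggregate", lambda s, adj: ("triangles", 1))]
shape_step = [EXTEND, EXTEND, EXTEND,
              ("aggregate", lambda s, adj: (edge_count(s, adj), 1))]

triangles = dfs_processing(ADJ, triangle_step)
shapes = dfs_processing(ADJ, shape_step)
print(dict(triangles), dict(shapes))
```

The first step counts triangles by filtering three-vertex subgraphs with three edges; the second groups all three-vertex induced subgraphs by their number of edges. In both cases no subgraph is stored beyond the single shared list held by the enumerators.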
The main advantage of this method is a reduced space cost for application step processing. Specifically, we only need to maintain one subgraph enumerator per enumeration level at a time. Considering induced subgraphs, each execution thread has to maintain at most O(kn) words representing the computed extensions at each enumeration depth, where n is the number of vertices in the input graph and k is the target size for subgraphs. In practice, k is usually orders of magnitude smaller than n and, for sparse graphs (which is usually the case for real-world datasets), n is an overestimation, since most vertices have low degree (|N(v)| ≪ n for v ∈ V(G)).
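The bound above can be spelled out as a short derivation for vertex-induced subgraphs. The notation is an assumption introduced only for this sketch: X_d is the extension set stored at depth d, S_d is the current subgraph with d vertices, and Δ is the maximum degree of G.

```latex
% Assumed notation: X_d = extensions stored at depth d, S_d = current subgraph
% with d vertices, \Delta = maximum degree of G.
\begin{align*}
|X_0| &\le n, \qquad
|X_d| \le \min\Bigl(n,\ \sum_{v \in S_d} |N(v)|\Bigr) \le \min(n,\ d\,\Delta)
  \quad (1 \le d \le k-1), \\
\sum_{d=0}^{k-1} |X_d| &\le n + (k-1)\,n = kn, \\
\sum_{d=0}^{k-1} |X_d| &\le n + \Delta \sum_{d=1}^{k-1} d
  = n + \frac{\Delta\,k(k-1)}{2}.
\end{align*}
```

The second line gives the O(kn) bound per thread; the third shows why it is loose when degrees are small, since the per-depth term then grows with dΔ rather than with n.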
This is not true for breadth-first approaches, where all subgraphs of a certain size must be materialized for the subsequent level. We highlight that this in-depth enumeration method targets the processing of single application steps. Indeed, Algorithm 6 assumes as input a single step represented by its array of sequential primitives. A remaining question is how to model multi-step applications, an issue that we address next in Section 5.1.2.