Performance Evaluation - Fast Hierarchical Graph Clustering

For a final evaluation of the methods, we measured the execution time needed for each for the datasets used.

The following results were obtained on an Intel(R) Xeon(R) CPU E7 - 4830 @2.13GHz computer, with 64GB of RAM, running Linux, version 4.4.29.

LLP Louvain Faststep

Airports 2.21 0.431 5.76

Amazon 32.76 36.00 949.8

Youtube 54.09 68.99 2443.03

Web-google 116.24 59.53 4138.67

Wiki 271.32 234.65 18270.34

Table 6: Execution time (in seconds) of each algorithm, for each dataset. Rows sorted by number of edges in each dataset.

LLP and Louvain have very efficient implementations and it is easily observable that they have similar execution times. It is also interesting to observe that there is no clear “winner” between them: LLP is faster in two datasets, Louvain method is faster in the other three.

Faststep was much slower. Its execution time can be justified by our approach to the problem. Faststep allows to set both the minimum number of iterations and final error we want to obtain. Until the defined

criteria are met, the algorithm keeps iterating over the matrix. As we have seen in the previous tests, the results were far from good, and we chose to set a very low error value, trying to obtain the best results possible. It could also be argued that we could have compromised the quality of the clusterings to improve the running time. However, in our point of view, the most important contribution of this work is to identify if Faststep could or could not be used for these problems in study, i.e., check if the quality of its results is similar to quality the other tested algorithms. In that way, much less importance is given to the running time of the algorithm. In a situation where the quality of the results was better, a greater effort could have been put in improving the overall performance of our Faststep version.

5 Final Remarks

In this work, we proposed an adaptation of Faststep for the problem of hierarchical clustering. We compared its results with two state-of-the-art algorithms, LLP and Louvain method. We tested how these three algorithms could be used for graph reordering and compression of networks using the Webgraph framework. Finally, we implemented a tool which allows the retrieval of graph clusterings and reorderings and the use of these clusterings or reorderings in the compression of the graph.

Due to the fact that communities could not be retrieved directly using the method provided by the authors, we did a possible approach to this problem. Our method tries to pick one community at a time, in contrast to other algorithms that propagate labels or merge clusters incrementally. This has a important impact on overlapping nodes. Overlapping nodes are usually hubs and tend to have many valid communities. In this situation, they cannot be placed on more than one community, and so Faststep places them on the first that is found. The other methods can somehow balance this effect, as communities are constructed incrementally and can gather the more important nodes to themselves. This approach can lead to a lot of variations between the communities found by Faststep and the communities found by another algorithm. Also, the cut-off added to the method to decide whether a node belongs or not to the community introduces a lot of error. As said previously, the simultaneous construction of the communities allows them to control each other; Faststep, on the other side, uses a much more crude approach which can place too much nodes in the cluster, but also too few. The method can have a reasonable performance when dealing with difficult graphs, especially when communities are heterogeneous, as the boundaries between communities are not well defined and the error is acceptable. The results obtained on graph compression through reordering prove this idea. However, when dealing with graphs with clear clusterings, even ones that are perfectly bounded, the implemented cut-off is incapable of accessing this information and introduces a lot of error.

This details of the method obtained worse results than we could expected in the beginning of the work.

It was specially inaccurate for artificial communities, which don’t have the features that characterize real networks. When applied to real networks, the clusterings obtained were very different from the ones provided by LLP or Louvain method. Despite these results, it managed to obtain reasonable ones in graph reordering/compression.

The performance of Faststep could benefit a lot with an improved method to decide if a node belongs or not to a cluster. However, we understood the limitations of the algorithm. In fact, Faststep was designed to solve a different problem, matrix factorization. As such, its results, although interesting for the comparison of the two problems, cannot be expected to be better than the ones obtained by algorithms specifically proposed for this matter.

We believe that our study of Faststep as a clustering, hierarchical clustering and reordering algorithm allowed to explore the possibilities of the method. Even though there are other approaches to this problem, we consider that there is not much that can be done after this, as the results are a long way from those of LLP and Louvain method, and any possible improvement would not, in our point of view, be enough to make it competitive in both quality and execution time.

However, we think that Faststep could be improved in a lot of ways, which could perhaps lead to much better results. These improvements were mentioned in this work and are directly related to the results in tests performed. A proper technique of defining which nodes belong or not to a community, which works for all types of networks would be the most urgent. Currently, Faststep works with a fixed number of communities. Extending the algorithm to allow it to search or refine the number of communities in the network could possible extend the usage of algorithm to other interesting problems. Another useful improvement would be increasing the efficiency of the algorithm, as it takes much a longer time to run that other algorithms with similar time and space complexities.

In our opinion, the results with Louvain method to obtain a reordering of the graph seem promising and, as previously suggested, they could benefit if the hierarchical structure of the graph, obtained internally in the maximization of the modularity, was used in the reordering algorithm. We could also understand that the clusterings obtained by LLP have, in general, a higher granularity than those obtained by Louvain method, which presents an opportunity to study how their reorderings could be used together for an improved result.

References

[1] M. Araujo, P. Ribeiro, and C. Faloutsos, “Faststep: Scalable boolean matrix decomposition,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 461–473, Springer Interna- tional Publishing, 2016.

[2] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,”Physical review E, vol. 69, no. 2, p. 026113, 2004.

[3] S. Fortunato, “Community detection in graphs,”Physics reports, vol. 486, no. 3, pp. 75–174, 2010.

[4] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008.

[5] P. Boldi, M. Rosa, M. Santini, and S. Vigna, “Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks,” 2010.

[6] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.

[7] R. Albert and A.-L. Barab´asi, “Statistical mechanics of complex networks,”Rev. Mod. Phys., vol. 74, pp. 47–97, Jan 2002.

[8] P. J. M. Mason A. Porter, Jukka-Pekka Onnela, “Communities in networks,” 2009.

[9] P. Boldi and S. Vigna, “The webgraph framework i: Compression techniques,” inProceedings of the 13th International Conference on World Wide Web, WWW ’04, (New York, NY, USA), pp. 595–602, ACM, 2004.

[10] M. Araujo, S. G¨unnemann, G. Mateos, and C. Faloutsos, “Beyond blocks: Hyperbolic community detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 50–65, Springer Berlin Heidelberg, 2014.

[11] G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,”Journal of the Society for Industrial and Applied Mathematics Series B Numerical Analysis, vol. 2, no. 2, pp. 205–224, 1965.

[12] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,”

Nature, vol. 401, no. 6755, pp. 788–791, 1999.

[13] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,”Phys. Rev. E, vol. 76, p. 036106, Sep 2007.

[14] P. Ronhovde and Z. Nussinov, “Local resolution-limit-free potts model for community detection,”

2008.

[15] A. Noack and R. Rotta, “Multi-level algorithms for modularity clustering,” CoRR, vol. abs/0812.4073, 2008.

[16] K. Wakita and T. Tsurumi, “Finding community structure in mega-scale social networks,” CoRR, vol. abs/cs/0702048, 2007.

[17] L. Danon, A. D´ıaz-Guilera, and A. Arenas, “The effect of size heterogeneity on community identifi- cation in complex networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2006, no. 11, p. P11010, 2006.

[18] S. Fortunato and M. Barth´elemy, “Resolution limit in community detection,” Proceedings of the National Academy of Sciences, vol. 104, no. 1, pp. 36–41, 2007.

[19] U. Brandes, “A faster algorithm for betweenness centrality,”The Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163–177, 2001.

[20] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.

[21] “Stanford large network dataset collection.”https://snap.stanford.edu/data/index.html, 2017.

[Online; accessed 2-January-2017].

[22] S. Fortunato, “Benchmark graphs to test community detection algorithms.” https://sites.

google.com/site/santofortunato/inthepress2, 2017. [Online; accessed 6-January-2017].

[23] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identifi- cation,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, no. 09, p. P09008, 2005.

[24] A. Lancichinetti, S. Fortunato, and J. Kert´esz, “Detecting the overlapping and hierarchical community structure in complex networks,”New Journal of Physics, vol. 11, no. 3, p. 033015, 2009.

[25] A. F. McDaid, D. Greene, and N. Hurley, “Normalized mutual information to evaluate overlapping community finding algorithms,” Oct. 2011.

[26] “Stanford large network dataset collection, youtube dataset.”https://snap.stanford.edu/data/

com-Youtube.html, 2012. [Online; accessed 2-July-2017].

[27] M. M. Duarte, “Fast hierarchical graph clustering tool on github.”https://github.com/michel94/

fhgc-tool, 2017. [Online; created 13-October-2017].

[28] “Openflights: Airport and airline data.” https://openflights.org/data.html, 2017. [Online;

accessed 7-July-2017].

[29] “Stanford large network dataset collection, amazon dataset.”https://snap.stanford.edu/data/

com-Amazon.html, 2012. [Online; accessed 2-July-2017].

[30] “Stanford large network dataset collection, google web graph dataset.” https://snap.stanford.

edu/data/web-Google.html, 2009. [Online; accessed 2-July-2017].

[31] “Stanford large network dataset collection, wikipedia network of top categories dataset.” https:

//snap.stanford.edu/data/wiki-topcats.html, 2009. [Online; accessed 2-July-2017].

No documento Fast Hierarchical Graph Clustering (páginas 52-61)