Abstract

A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm provides basically an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with a Markov graph G1. Flow is then alternatingly expanded and contracted, leading to a row of Markov Graphs G(i). Flow expansion corresponds with taking the kth power of a stochastic matrix, where k ∈ IN. Flow contraction corresponds with a parametrized operator $\Gamma$r, r > 0, which maps the set of (column) stochastic matrices onto itself. The image $\Gamma$rM is obtained by raising each entry in M to the rth power and rescaling each column to have sum 1 again. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The invariant limits of the process are easily derived and in practice the process converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph G. Overlap is limited to cases where the input graph has a symmetric structure inducing it. The contraction and expansion parameters of the MCL process influence the granularity of the output. The algorithm is space and time efficient and lends itself to drastic scaling. This report describes the MCL algorithm and process, convergence towards equilibrium states, interpretation of the states as clusterings, and implementation and scalability. The algorithm is introduced by first considering several related proposals towards graph clustering, of both combinatorial and probabilistic nature.

Links and resources

Tags

community

  • @jullybobble
  • @lillejul
@jullybobble's tags highlighted