Skip to main content

Community detection algorithm

About the code

The code is an implementation of the algorithm described in [1]. It provides an efficient way to perform community detection on complex networks using modularity maximization. It also returns the effect size of the partition found as a z-score computed using an Erdős-Rényi null model. The algorithm consists of carefully controlled iterations of leading eigenvalue bisection, followed by several tuning steps, including a local Kernighan-Lin-like fine-tuning, a global final-tuning and an agglomeration step.

To download the code, click here.

If you use this code for your research, I kindly ask you to cite Ref. 1 in your publications.

For questions, requests or notifications, please write to C dot I dot del-Genio at warwick dot ac dot uk.


Usage

The code consists of the files ComDet.c, ComDet.h and Comm.h. To use it, just include Comm.h in your code, and compile ComDet.c with your other source files, remembering to use the option -lm as it needs to link to the math library.

The code defines a new data structure, called comstruct. Its members are

  • double modularity
  • double zscore
  • int *community

Before starting partitioning the network, the algorithm needs to be initialized. If the network is stored as an adjacency matrix, this is done by invoking the function modinitmat. The prototype of this function is
void modinitmat(int **adj, const int nodes)
where adj contains the adjacency matrix, and nodes is the number of nodes in the network. If, instead, the network is stored as an adjacency list, such as those produced by the graph sampling algorithm, then the function to invoke is modinitlist. Its prototype is
void modinitlist(int **list, const int *sequence, const int nodes)
where list contains the adjacency list, sequence is an array of integers containing the degree sequence of the network, and nodes is the number of nodes.

Once the method is initialized, to detect the community structure, invoke the function modularity. The prototype of this function is
comstruct modularity(double (*rng)(void), const double tol, const double tolpm, const double taccpt)
The function returns a partition of the network that aims to maximize modularity, using the user-specified random number generator rng. The random number generator must be a function taking no input parameters and returning a double precision floating point number between 0 and 1. The generator must be already seeded. This leaves the user the choice of the generator to use. The variables tol, tolpm and taccpt allow to specify tolerances for the various steps of the method. In particular, tol is the tolerance used for internal comparisons in the refining steps, tolpm is the one used in finding the leading eigenvalue of the modularity matrix, and taccpt is the one used in the main cycle to decide whether to accept a refinement of the partitioning. So, given a network, a set of tolerances and a choice of random number generator, the user creates a partitioning by declaring some variable P of type comstruct, and then assigning it the return value of modularity:
P = modularity(rng,tol,tolpm,taccpt);

After the assignment, P.modularity is the modularity of the partitioning, P.zscore is its z-score, and P.community contains the community labels of each node. A new partitioning of the same network can be obtained invoking modularity again. Please note that the label list of a previous partitioning is destroyed with further calls to the sampler, even if the sample is assigned to a different variable. Thus, for instance, the lines
P = gsam(rng);
Q = gsam(rng);
will result in the same community label list stored in P.community and in Q.community.

After finishing working with a given network, the memory used should be cleaned by invoking modclean(). Afterwards, the sampler is ready to be initialized again with another network.

Included are also two simple proofs of concept (POC). Both proofs of concept illustrate how to run the code on a network stored as adjacency matrix or adjacency list, with the second POC including an external call to the freely available Graphviz software (www.graphviz.org), which, if installed on your computer, produces a visualization of the networks with the communities separated by distance and distinguished by colour, for increased ease of understanding, using a tuned force-driven spring model.
To compile the POCs, use the commands
gcc -lm -o poc poc.c ComDet.c
gcc -lm -o poc2 poc2.c ComDet.c

For questions, requests or notifications, please write to C dot I dot del-Genio at warwick dot ac dot uk.

References

[1] Treviño III, Nyberg, Del Genio and Bassler, J. Stat. Mech. - Theory E. (2015) P02003


Release information

Current version
6.3: Bug fixed: in networks with a very sparse largest component and a smaller, very dense component of similar size, the code could underestimate the memory needed.
6.2: Initial release.

License and Copyright

Copyright 2012-2014 Charo I. Del Genio
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.