# Community detection algorithm

## About the code

The code is an implementation of the algorithm described in [1]. It provides an efficient way to perform community detection on complex networks using modularity maximization. It also returns the effect size of the partition found as a *z*-score computed using an Erdős-Rényi null model. The algorithm consists of carefully controlled iterations of leading eigenvalue bisection, followed by several tuning steps, including a local Kernighan-Lin-like fine-tuning, a global final-tuning and an agglomeration step.

To download the code, click here.

If you use this code for your research, I kindly ask you to cite Ref. 1 in your publications.

For questions, requests or notifications, please write to C dot I dot del-Genio at warwick dot ac dot uk.

## Usage

The code consists of the files `ComDet.c`, `ComDet.h` and `Comm.h`. To use it, just include `Comm.h` in your code, and compile `ComDet.c` with your other source files, remembering to use the option -lm as it needs to link to the math library.

The code defines a new data structure, called `comstruct`. Its members are

`double modularity``double zscore``int *community`

Before starting partitioning the network, the algorithm needs to be initialized. If the network is stored as an adjacency matrix, this is done by invoking the function `modinitmat`. The prototype of this function is

`void modinitmat(int **adj, const int nodes)
`where

`adj`contains the adjacency matrix, and

`nodes`is the number of nodes in the network. If, instead, the network is stored as an adjacency list, such as those produced by the graph sampling algorithm, then the function to invoke is

`modinitlist`. Its prototype is

`void modinitlist(int **list, const int *sequence, const int nodes)`

where

`list`contains the adjacency list,

`sequence`is an array of integers containing the degree sequence of the network, and

`nodes`is the number of nodes.

Once the method is initialized, to detect the community structure, invoke the function `modularity`. The prototype of this function is

`comstruct modularity(double (*rng)(void), const double tol, const double tolpm, const double taccpt)
`The function returns a partition of the network that aims to maximize modularity, using the user-specified random number generator

`rng`. The random number generator must be a function taking no input parameters and returning a double precision floating point number between 0 and 1. The generator must be already seeded. This leaves the user the choice of the generator to use. The variables

`tol`,

`tolpm`and

`taccpt`allow to specify tolerances for the various steps of the method. In particular,

`tol`is the tolerance used for internal comparisons in the refining steps,

`tolpm`is the one used in finding the leading eigenvalue of the modularity matrix, and

`taccpt`is the one used in the main cycle to decide whether to accept a refinement of the partitioning. So, given a network, a set of tolerances and a choice of random number generator, the user creates a partitioning by declaring some variable

`P`of type

`comstruct`, and then assigning it the return value of

`modularity`:

`P = modularity(rng,tol,tolpm,taccpt);`

After the assignment, `P.modularity` is the modularity of the partitioning, `P.zscore` is its *z*-score, and `P.community` contains the community labels of each node. A new partitioning of the same network can be obtained invoking `modularity` again. Please note that the label list of a previous partitioning is destroyed with further calls to the sampler, **even if the sample is assigned to a different variable**. Thus, for instance, the lines

`P = gsam(rng);
Q = gsam(rng);
`will result in the same community label list stored in

`P.community`and in

`Q.community`.

After finishing working with a given network, the memory used should be cleaned by invoking `modclean()`. Afterwards, the sampler is ready to be initialized again with another network.

Included are also two simple proofs of concept (POC). Both proofs of concept illustrate how to run the code on a network stored as adjacency matrix or adjacency list, with the second POC including an external call to the freely available Graphviz software (www.graphviz.org), which, if installed on your computer, produces a visualization of the networks with the communities separated by distance and distinguished by colour, for increased ease of understanding, using a tuned force-driven spring model.

To compile the POCs, use the commands

`gcc -lm -o poc poc.c ComDet.c
gcc -lm -o poc2 poc2.c ComDet.c
`

For questions, requests or notifications, please write to C dot I dot del-Genio at warwick dot ac dot uk.

## References

[1] Treviño III, Nyberg, Del Genio and Bassler, J. Stat. Mech. - Theory E. (2015) P02003

## Release information

Current version

** 6.3**: Bug fixed: in networks with a very sparse largest component and a smaller, very dense component of similar size, the code could underestimate the memory needed.

**6.2**: Initial release.

## License and Copyright

Copyright 2012-2014 Charo I. Del Genio

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.