Modularity of graph: why are probabilities of self-loop included?

Question

I'm trying to understand the Newman Modularity (doi:10.1073/pnas.0601602103) by investigating its calculation on the Wikipedia example . My question is why are probabilities of self-loops included in the Q formula while the adjacency matrix A has no self-loops? i.e., given the standard definition of modularity Q...

$Q=\frac{1}{2m}\Sigma_{ij} (A-P_{ij})\delta(s_{i},s_{j})$

where A is the adjacency matrix and $P_{ij}$ is probability of a link between i and j, and $\delta()=1$ if entries are in the same community. While $A_{ii}=0$, this is not true for P, where $P_{ii}=\frac{k_{i}k_{i}}{2m} \neq0$, $k_i = \Sigma_{j}a_{ij}$ is the degree of i.

It would seem to me that probabilities of self loops should NOT be included. A-P leads to negative entries for the community of i. Here is R code to show how self-loop probabilities need to be included for agreement with both wikipedia example and the same calculation in the 'igraph' package function 'modularity'

# modularity function, input matrix A and membership vector mem
Q <- function(A,mem){ diag(A) <- rep(0,ncol(A)) 
m <- sum(A)/2 # number of links
k <- colSums(A) #degree
kk <- outer(k,k,"*") 
memmem <- outer(mem,mem,"==")*1 # diag is 1!
Pij <- kk/(2*m) # probability of link
1/(2*m)*(sum((A-Pij)*memmem))}

# Make a graph from Wikipedia entry on Modularity (Q=0.4895)
Aw <- matrix(c(0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,
0,1,0,0,1,0,0,0),10,10)
membership <- c(1,1,1,2,2,2,3,3,3,1)
Q(Aw, membership) # = 0.4895833

library(igraph) # check in igraph package
modularity(graph.adjacency(Aw),membership) # = 0.4895833

I think that the entries $P_{ii}$ are needed to fulfill the constraint $\Sigma_{j} p_{ij} = \Sigma_{j} a_{ij} = k_i$ , i.e, the expectation of i's degree is its degree, but if $p_{ii}\neq 0$, than the probabilities are not distributed across the other nodes $i\neq j$.

score 3 · Answer 1 · answered Jan 26 '14 at 13:55

I will answer my own question, after stumbling upon Squartini and Garlaschelli 2011 article "Analytical maximum-likelihood method to detect patterns in real networks", in ArXiv. Yes, the expected probability of a link between two nodes in a binary graph, as calculated in the classic Modularity score, includes self-loops, and yes this is not ideal. However, it is known to be an approximation, $\langle p_{ij} \rangle = \frac{k_i \cdot k_j}{2m}$. Squartini and Garleschelli also state that the approximation becomes particularly probablematic in scale-free networks that have high density.

score 0 · Answer 2 · answered Oct 13 '15 at 01:01

Self-loops always have positive contribution to modularity. However if the difference between the self-loops and the other edges is high, the detection algorithm would not be able to detect big communities and instead detects lots of singular communities.

Modularity of graph: why are probabilities of self-loop included?

2 Answers2