I'm trying to understand the Newman Modularity (doi:10.1073/pnas.0601602103) by investigating its calculation on the Wikipedia example . My question is why are probabilities of self-loops included in the Q formula while the adjacency matrix A has no self-loops? i.e., given the standard definition of modularity Q...
$Q=\frac{1}{2m}\Sigma_{ij} (A-P_{ij})\delta(s_{i},s_{j})$
where A is the adjacency matrix and $P_{ij}$ is probability of a link between i and j, and $\delta()=1$ if entries are in the same community. While $A_{ii}=0$, this is not true for P, where $P_{ii}=\frac{k_{i}k_{i}}{2m} \neq0$, $k_i = \Sigma_{j}a_{ij}$ is the degree of i.
It would seem to me that probabilities of self loops should NOT be included. A-P leads to negative entries for the community of i. Here is R code to show how self-loop probabilities need to be included for agreement with both wikipedia example and the same calculation in the 'igraph' package function 'modularity'
# modularity function, input matrix A and membership vector mem
Q <- function(A,mem){ diag(A) <- rep(0,ncol(A))
m <- sum(A)/2 # number of links
k <- colSums(A) #degree
kk <- outer(k,k,"*")
memmem <- outer(mem,mem,"==")*1 # diag is 1!
Pij <- kk/(2*m) # probability of link
1/(2*m)*(sum((A-Pij)*memmem))}
# Make a graph from Wikipedia entry on Modularity (Q=0.4895)
Aw <- matrix(c(0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,
0,1,0,0,1,0,0,0),10,10)
membership <- c(1,1,1,2,2,2,3,3,3,1)
Q(Aw, membership) # = 0.4895833
library(igraph) # check in igraph package
modularity(graph.adjacency(Aw),membership) # = 0.4895833
I think that the entries $P_{ii}$ are needed to fulfill the constraint $\Sigma_{j} p_{ij} = \Sigma_{j} a_{ij} = k_i$ , i.e, the expectation of i's degree is its degree, but if $p_{ii}\neq 0$, than the probabilities are not distributed across the other nodes $i\neq j$.