9

What is the expected global clustering coefficient $\mathbb{E}[C_{GC}]$ for the Erdős–Rényi random graph (ER-graph) $\mathcal{G}(n,p)$ (expectation is over the ensemble of all ER-graphs) as $n \rightarrow \infty$ and $p$ fixed?

The global clustering coefficient $C_{GC}$ is defined as

$C_{GC}={\frac {3\times {\mbox{number of triangles}}}{{\mbox{number of connected triplets of vertices}}}}={\frac {{\mbox{number of closed triplets}}}{{\mbox{number of connected triplets of vertices}}}}$.

A connected triplet is defined to be a connected subgraph consisting of three vertices and two edges. A closed triplet is a connected triplet that induces a triangle.

While it is easy to see that the expected mean local clustering coefficient is $p$ (see next section), the expected global clustering coefficient is not identically $p$ for any $n$.

For example, for $n=3$, $C_{GC} = 1$ only when all edges are present (with probability $p^3$) and is otherwise zero (with probability $1-p^3$). Hence the $\mathbb{E}[E_{GC}] = p^3$ when $n=3$.

Computationally, I have found that $\mathbb{E}[C_{GC}]\approx p$ for large $n$.

Is there a way to prove that $\mathbb{E}[C_{GC}]= p$ as $n\rightarrow\infty$?

My current theory is to use Chebyshev's inequality on this, but I haven't tried it out yet.

Expected local clustering coefficient = p

In contrast, it is easy to see that the expected local clustering coefficient $\mathbb{E}[C_i]$ for any node $i$ is $p$.

The local clustering coefficient $C_i$ of node $i$ (for an undirected network) is defined as

$C_i = \frac{\text{number of triangles that contain $i$}}{k_i (k_i-1)/2}$, where $k_i$ is the degree of $i$.

In other words, it is the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them.

We have $\mathbb{E}[C_i]=p$ in an ER-graph because the probability for an edge between any neighbours of the node is $p$, independent for any other edge. (For an alternative answer involving more algebra, see here).

Hence the expected mean local clustering coefficient $\mathbb{E}[\sum_i C_i]$ is $p$ for any $n$.

Fabian Ying
  • 245
  • 2
  • 7
  • For the case $n = 3$, what is the clustering coefficient of, say, the graph with one edge? It has $0$ connected triplets of vertices, and $C_{GC}$ is not well-defined. If you restrict the probability space of graphs to those with at least one connected triplet, I do believe that the triangle comes up with probability $p$ when $n = 3$ (I didn't check formally though). – Manuel Lafond Feb 08 '18 at 16:10
  • Conventionally, you say that any graph with no connected triplets has a global clustering coefficient of 1. Even if you restrict the space of graphs to those with at least one connected triplet, you have for $n=3$ an expected global clustering coefficient of $p^3 / (p^3 + 3(1-p)p^2)$, which is not $p^2$. – Fabian Ying Feb 08 '18 at 19:52

1 Answers1

9

If there are $3 \binom n3 p^3$ triangles in expectation, and $3 \binom n3 p^2$ connected triples, the global clustering coefficient should approach their ratio (which is $p$).

Of course, naively taking their ratios doesn't work: $\mathbb E[\frac {X}Y]$ is not the same thing as $\frac{\mathbb E[X]}{\mathbb E[Y]}$. This is one of the main challenges in dealing with the expected value of a ratio. Instead, we'll show that both quantities are concentrated around their mean, and proceed that way.

Let $X$ denote the number of triangles in $\mathcal G(n,p)$. It's easy to see if we properly define triangles that $\mathbb E[X] = 3\binom n3 p^3$, which for consistency with connected triplets I want to define as $3\binom n3$ choices of a potential path $P_3$, and a $p^3$ chance that both edges of the path and the edge that makes it a triangle are present.

Moreover, the number of triangles is $3n$-Lipschitz in the edges of the graph (changing one edge changes the number of triangles by at most $3n$) so by McDiarmid's inequality $$ \Pr[|X - \mathbb E[X]| \ge n^{2.5}] \le 2 \exp \left(-\frac{2n^5}{\binom n2 (3n)^2}\right) \le 2 e^{-4n/9}. $$ If we let $Y$ be the number of connected triplets, then the expected number of them is $3\binom n3 p^2$: there are $3\binom n3$ potential copies of $P_3$ in the graph, and each is realized with a probability of $p^2$. This is actually $2n$-Lipschitz in the edges of the graph (each edge is part of at most $2n$ paths on $3$ vertices) but we'll round that up to $3n$-Lipschitz so that the same bound applies.

As a result, with probability $1 - 4 e^{-4n/9}$, both values are within $n^{2.5}$ of their expected value, and so the ratio $\frac{X}{Y}$ satisfies $$ \frac{3\binom n3 p^3 - n^{2.5}}{3\binom n3 p^2 + n^{2.5}} \le \frac{X}{Y} \le \frac{3\binom n3 p^3 + n^{2.5}}{3\binom n3 p^2 - n^{2.5}} $$ and therefore $\frac XY = p + O(n^{-0.5})$ in these cases. The remaining cases occur with probability $4e^{-4n/9}$, and the ratio is between $0$ and $1$ even then, so they can affect the expected value by at most $4e^{-4n/9}$ as well. Therefore $\mathbb E[\frac XY] = p + O(n^{-0.5})$, which approaches $p$ as $n \to \infty$.

Misha Lavrov
  • 129,298
  • 10
  • 116
  • 227
  • Thank you very much! This is a correct approach. I was not clear in the definition of triplets in the original post (I have now edited that). Any triangle counts as three triplets, instead of one, so the expected number of triplets is $\binom{n}{3}p^2$, yielding an expected global clustering coefficient of $p$ as $n\rightarrow \infty$. If you wish to edit that, I'll mark it as accepted! – Fabian Ying Feb 12 '18 at 08:30
  • 1
    I should have thought about it that way in the first place. Anyway, the expectations should be right now. – Misha Lavrov Feb 12 '18 at 14:48
  • Cheers for the help. I've marked it as accepted. – Fabian Ying Feb 12 '18 at 14:51
  • @MishaLavrov Thank you for such a thorough answer (and for pointing it out in another post). Could you please explain how the -0.5 comes in? I don’t see how it follows from the inequality at the end, or what the exact correction term should be. And does rounding everything up to 3$n$-Lipschitz introduce error? – user2561523 Apr 14 '22 at 23:00
  • 1
    @user2561523 (1) We get the $n^{-0.5}$ by comparing the powers of $n$ in $\binom n3$ and $n^{2.5}$. Write $\frac{3\binom n3 p^3 - n^{2.5}}{3\binom n3 p^2 + n^{2.5}}$ as $p \cdot \frac{1-n^{2.5}/(3\binom n3p^3)}{1+n^{2.5}/(3\binom n3p^2)}$ and then we see that this is $p \cdot \frac{1 + O(n^{-0.5})}{1 + O(n^{-0.5})}$, from which the final form follows. – Misha Lavrov Apr 14 '22 at 23:11
  • 1
    (2) Rounding up to $3n$-Lipschitz means we get a weaker bound than we could have. I just didn't want to deal with two mostly-identical-but-slightly-different error terms. But if something is $2n$-Lipschitz, it is also $3n$-Lipschitz, so we're allowed to do this. – Misha Lavrov Apr 14 '22 at 23:11
  • @MishaLavrov Ah, so you are just using the highest power in ${n \choose 3}$. Thanks! Since the $p$ can't be factored out, I take it there is no explicit expression that can be obtained for the clustering coefficient for all $n$? – user2561523 Apr 14 '22 at 23:39
  • 1
    @user2561523 If there is an explicit expression, this is not the method to find it. I do not expect that there is one. Most quantities related to random graphs don't have nice closed-form expressions, just nice asymptotic ones. – Misha Lavrov Apr 14 '22 at 23:41
  • Thanks for all of your help / insight. – user2561523 Apr 14 '22 at 23:43