If you can follow the mathematics of cluster sampling, just work through an explanation of the variance of the estimated total under cluster sampling. See, e.g., p. 174 of the 2nd edition of Lohr's *Sampling: Design and Analysis*. To find it, open Amazon's "Look Inside" preview and search for "icc". The first hit, on p. 174, gives the ANOVA table for cluster sampling in the balanced case, i.e., with equal cluster sizes. The relevant formula (5.7), which the preview does not show, is
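$$
\mathbf{V}(\hat t_{\rm cluster}) = N^2\left(1-\frac nN\right)\frac{M \, {\rm MSB}}{n}
$$
Here $N$ is the number of clusters in the population, $n$ is the number of clusters sampled, $M$ is the common cluster size, and ${\rm MSB}$ is the between-cluster mean square from that ANOVA table.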
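The R sketch below implements formula (5.7). It assumes one-stage cluster sampling of $n$ clusters with equal cluster sizes. The function names and the toy population `1:12` in four clusters of three are made up for illustration. As a cross-check, the function computes the variance from MSB and also from the variance of the cluster totals, $S_t^2$. In the balanced case $S_t^2 = M\,{\rm MSB}$, so the two results should agree.

```r
# Hypothetical helper: variance of the estimated total when an SRS of n
# clusters is drawn (equal cluster sizes assumed), computed via MSB.
var_total_cluster = function(y, cluster, n) {
  cluster = as.factor(cluster)
  N = nlevels(cluster)                  # number of clusters in the population
  M = length(y) / N                     # common cluster size (balanced case)
  MSB = anova(lm(y ~ cluster))["cluster", "Mean Sq"]  # between-cluster mean square
  N^2 * (1 - n/N) * M * MSB / n
}

# Same quantity written with S_t^2, the variance of the cluster totals
var_total_cluster_totals = function(y, cluster, n) {
  totals = as.vector(tapply(y, cluster, sum))
  N = length(totals)
  N^2 * (1 - n/N) * var(totals) / n
}

y = 1:12
cluster = rep(1:4, each=3)
var_total_cluster(y, cluster, n=2)         # 540
var_total_cluster_totals(y, cluster, n=2)  # 540
```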
One can construct artificial populations (or rather cluster structures for them) for which ${\rm ICC}<0$, so that the cluster sample is more efficient than SRS. For instance, the population clustered as $\{ \{1, 6, 8 \}, \{3, 5, 7\}, \{2, 4, 9 \} \}$ has this odd property:
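```r
y = c(1, 6, 8, 3, 5, 7, 2, 4, 9)  # the population, listed cluster by cluster
i = rep(1:3, each=3)              # cluster labels: 1 1 1 2 2 2 3 3 3
anova(lm(y~as.factor(i)))         # between-cluster (MSB) and within-cluster mean squares
```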
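The next sketch confirms that the ICC of this clustering is negative, using the same ANOVA table. It assumes the ANOVA-based definition for equal cluster sizes, ${\rm ICC} = 1 - \frac{M}{M-1}\frac{\rm SSW}{\rm SSTO}$. The variable names are illustrative. Because ${\rm SSW} \le {\rm SSTO}$, the ICC can never fall below $-1/(M-1)$. This clustering reaches that bound: $-0.5$ for $M=3$.

```r
y = c(1, 6, 8, 3, 5, 7, 2, 4, 9)
i = rep(1:3, each=3)
M = 3                                   # common cluster size
tab = anova(lm(y ~ as.factor(i)))
SSW  = tab["Residuals", "Sum Sq"]       # within-cluster sum of squares
SSTO = sum(tab[, "Sum Sq"])             # total sum of squares (between + within)
ICC = 1 - M/(M-1) * SSW/SSTO
ICC                                     # -0.5, the minimum -1/(M-1)
```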
So this population (or rather the way it has been clustered) produces ${\rm MSB}=0$. Hence the variance of the estimated total from a cluster sample of $n=1$ cluster is 0. By contrast, the variance of the estimated total from an SRS of the same size, 3 elements, is non-zero according to the SRS formula you will also find on Amazon:
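```r
N = length(y)  # here N and n count elements (9 and 3), not clusters
n = 3          # SRS of 3 elements, the size of one cluster
V_SRS = N*N*(1-n/N)*var(y)/n
V_SRS          # 135
```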
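Because the population is tiny, both numbers can be checked by brute force. The sketch below lists every possible sample under each design and computes the exact variance of the usual unbiased estimator of the total over those samples. There are 3 possible one-cluster samples and `choose(9, 3)` = 84 possible SRSs of 3 elements. The helper name `exact_var` is illustrative. It divides by the number of samples rather than by (number of samples − 1) because every sample is equally likely.

```r
y = c(1, 6, 8, 3, 5, 7, 2, 4, 9)
i = rep(1:3, each=3)

# Cluster design: pick 1 of the 3 clusters; estimated total = 3 * cluster total
t_cluster = 3 * tapply(y, i, sum)            # 45 45 45
# SRS design: pick 3 of the 9 elements; estimated total = 9 * sample mean
t_srs = apply(combn(9, 3), 2, function(idx) 9 * mean(y[idx]))

# Exact design variance: average squared deviation over all equally likely samples
exact_var = function(t) mean((t - mean(t))^2)
exact_var(t_cluster)   # 0
exact_var(t_srs)       # 135, matching V_SRS from the formula
```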
The trick is that every cluster mean equals 5, the population mean. More precisely, every cluster total equals 15. Totals are what matter here because the formula involves the variance between cluster totals, and the distinction would matter in an unbalanced design, where cluster sizes differ. Either way, there is indeed no variability between clusters.
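The following derivation shows why MSB is the quantity to watch in the balanced case. Let $S^2 = {\rm SSTO}/(NM-1)$ be the element-level population variance. An SRS of $nM$ elements from the $NM$ in the population carries the same factor $N^2(1-n/N)$ as the cluster formula, so the two variances differ only through ${\rm MSB}$ versus $S^2$. The second equality below follows from ${\rm SSB} = {\rm SSTO} - {\rm SSW}$, assuming the ANOVA-based definition ${\rm ICC} = 1 - \frac{M}{M-1}\frac{\rm SSW}{\rm SSTO}$:

$$
\mathbf{V}(\hat t_{\rm SRS}) = (NM)^2\left(1-\frac{nM}{NM}\right)\frac{S^2}{nM} = N^2\left(1-\frac nN\right)\frac{M\,S^2}{n},
$$
$$
\frac{\mathbf{V}(\hat t_{\rm cluster})}{\mathbf{V}(\hat t_{\rm SRS})} = \frac{\rm MSB}{S^2} = \frac{NM-1}{M(N-1)}\Bigl[1+(M-1)\,{\rm ICC}\Bigr].
$$

Hence the cluster sample is more efficient exactly when ${\rm ICC} < -1/(NM-1)$, which is essentially ${\rm ICC}<0$ for a large population. In the example above, ${\rm MSB}=0$, so the bracket must vanish, which means ${\rm ICC} = -1/(M-1) = -1/2$.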
I would suggest that you work through both the derivation of the cluster variance formula and the above computation, step by step, to see how they work. Then try to come up with two different cluster structures for the above $y$ such that the cluster sample (i) is less efficient than SRS (easy), and (ii) has non-zero MSB, unlike my example above, but is still more efficient than SRS (difficult).
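To check candidate answers, here is a small R helper. The name `cluster_vs_srs` is hypothetical. For a given assignment of the values to equal-size clusters, it compares a sample of one cluster with an SRS of the same number of elements. In this balanced setting the factors $N^2(1-n/N)$ cancel, so the reported ratio is just MSB divided by `var(y)`. A ratio below 1 means the cluster sample is more efficient.

```r
# Hypothetical helper: compare a one-cluster sample with an SRS of the same
# number of elements, for a given clustering of y (equal sizes required).
cluster_vs_srs = function(y, cluster) {
  cluster = as.factor(cluster)
  sizes = table(cluster)
  stopifnot(length(unique(sizes)) == 1)   # formula (5.7) assumes equal cluster sizes
  N = nlevels(cluster); M = sizes[[1]]; n = 1
  MSB = anova(lm(y ~ cluster))["cluster", "Mean Sq"]
  V_cluster = N^2 * (1 - n/N) * M * MSB / n
  V_SRS = (N*M)^2 * (1 - n/N) * var(y) / (n*M)   # SRS of n*M elements out of N*M
  c(MSB = MSB, V_cluster = V_cluster, V_SRS = V_SRS, ratio = V_cluster / V_SRS)
}

# The clustering from the answer: MSB and ratio are 0 up to floating-point error
y = c(1, 6, 8, 3, 5, 7, 2, 4, 9)
cluster_vs_srs(y, rep(1:3, each=3))
```

To try your own cluster structures, reorder `y` while keeping `rep(1:3, each=3)` as the labels.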