The rule of thumb for choosing the best $k$ in $k$-means clustering suggests
$$ k \sim \sqrt{n/2} $$
where $n$ is the number of points to cluster. I'd like to know where this comes from and what its (heuristic) justification is. I can't find any good sources on it.
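For concreteness, here is a minimal Python sketch of what the rule gives for a few values of $n$. The function name `rule_of_thumb_k` is hypothetical, and rounding to the nearest integer (with a floor of 1) is my own assumption, since the rule itself only says $k \sim \sqrt{n/2}$.

```python
import math

def rule_of_thumb_k(n: int) -> int:
    """Hypothetical helper: k ~ sqrt(n/2), rounded to the nearest integer.

    The rounding and the lower bound of 1 are assumptions; the rule of
    thumb only states an approximate relationship.
    """
    if n < 1:
        raise ValueError("n must be a positive number of points")
    return max(1, round(math.sqrt(n / 2)))

if __name__ == "__main__":
    for n in (50, 200, 10_000):
        k = rule_of_thumb_k(n)
        # Average cluster size n/k, which the rule puts near sqrt(2n).
        print(f"n={n}: k={k}, about {n / k:.1f} points per cluster")
```

Note that the rule implies an average of $n/k \approx \sqrt{2n}$ points per cluster, so both the number of clusters and their typical size grow like $\sqrt{n}$.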
The only references I can find are a comment on ResearchGate and this review, which doesn't explain it either.