I've seen a couple of talks by non-statisticians where they seem to reinvent correlation measures using mutual information rather than regression (or equivalent or closely related statistical tests).
I take it there's a good reason statisticians don't take this approach. My layman's understanding is that estimators of entropy / mutual information tend to be problematic and unstable. I assume statistical power suffers as a result; the speakers try to get around this by claiming that they're not working in a parametric testing framework. Usually this kind of work doesn't bother with power calculations, or even confidence/credible intervals.
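To make the instability concrete, here's a toy sketch of my own (not from any of the talks) showing what I mean: the naive plug-in (histogram) estimator of mutual information is biased upward even when the variables are independent, so a positive MI estimate by itself isn't evidence of association without some null calibration. The bin count and sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_mi(x, y, bins=20):
    """Naive plug-in MI estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y
    mask = pxy > 0                             # avoid log(0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

for n in (100, 1000, 10000):
    # X and Y are independent here, so the true MI is exactly 0,
    # yet the plug-in estimate is systematically positive at small n.
    mis = [plugin_mi(rng.normal(size=n), rng.normal(size=n)) for _ in range(200)]
    print(f"n={n:6d}  mean plug-in MI = {np.mean(mis):.3f} nats (true value: 0)")
```

The bias shrinks as n grows, which is exactly why I'm asking whether very large datasets blunt the objection.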
But to take a devil's advocate position, is slow convergence that big a deal when datasets are extremely large? Also, sometimes these methods seem to "work" in the sense that the associations are validated by follow-up studies. What's the best critique of using mutual information as a measure of association, and why isn't it widely used in statistical practice?
edit: Also, are there any good papers that cover these issues?