The Pearson test is popular because it's simple to compute - it's amenable to hand calculation even without a calculator (or, historically, even without log tables) - and yet it generally has good power compared to alternatives; that simplicity means it continues to be taught in the most basic subjects. It might be argued that there's an element of technological inertia in the choice, but actually I think the Pearson chi-squared is still an easily defensible choice in a wide range of situations.
Since the G-test is derived from a likelihood ratio test, the Neyman-Pearson lemma would suggest that it should tend to have more power in large samples, but in practice the Pearson chi-squared test has similar power in large samples. (Asymptotically the two should be equivalent in the Pitman sense - there's some brief discussion of the various kinds of asymptotics below - but here I just mean what you tend to see in large samples with a small effect size at typical significance levels, without worrying about a particular sequence of tests by which $n\to\infty$.)
On the other hand, in small samples, the set of available significance levels has more impact than asymptotic power; I don't think there's usually a big difference, but in some situations one or the other may have an advantage*.
* But in that case the neat trick of combining the two may do even better - that is, using one statistic to break ties in the other (non-equivalent) test when samples are small. This increases the set of available significance levels, and so improves power by letting the type I error rate come closer to the desired significance level without resorting to something as unappetizing as randomized tests. (In tests of independence for tables larger than 2x2, the same idea also works with the rxc version of the Fisher exact test.)
Both the Pearson and G-test may be placed within the family of (Cressie-Read) power-divergence statistics (Cressie and Read, 1984 [1]), by setting $\lambda=1$ and $\lambda=0$ respectively; this family includes several other previously defined statistics, such as the Neyman statistic ($\lambda=-2$) and the Freeman-Tukey statistic ($\lambda=-\frac12$), among others. In that context - considering several criteria - Cressie and Read suggested that the statistic with $\lambda=\frac23$ is a good compromise choice.
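Writing $O_i$ for the observed counts and $E_i$ for the expected counts under the null, one way to write the family (with the $\lambda=0$ and $\lambda=-1$ cases defined by taking limits) is

$$\frac{2}{\lambda(\lambda+1)}\sum_i O_i\left[\left(\frac{O_i}{E_i}\right)^{\lambda}-1\right],$$

so that $\lambda=1$ reduces to the Pearson statistic $\sum_i (O_i-E_i)^2/E_i$, while the limit as $\lambda\to 0$ gives the G-statistic $2\sum_i O_i\log(O_i/E_i)$.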
The efficiency issue is worth a brief mention; each definition compares the ratio of sample sizes under two tests. Loosely, Pitman efficiency considers a sequence of tests with fixed level $\alpha$ where the sample sizes achieve the same power over a sequence of ever-smaller effect sizes, while Bahadur efficiency holds the effect size fixed and considers a sequence of decreasing significance levels. (Hodges-Lehmann efficiency holds the significance level and effect size constant and lets the type II error rate decrease toward 0.)
Outside of some statisticians, it doesn't seem common for users of statistics to consider using different significance levels at different sample sizes; in that sense, the sort of behavior we would tend to see if a sequence of increasing sample sizes were available is to hold the significance level constant (for all that other choices might be wiser; they can be difficult to calculate). In any case, Pitman efficiency is the one most often used.
On this topic, P. Groeneboom and J. Oosterhoff (1981) [2] mention (in their abstract):
the asymptotic efficiency in the sense of Bahadur often turns out to be quite an unsatisfactory measure of the relative performance of two tests when the sample sizes are moderate or small.
On the removed paragraph from Wikipedia: it's complete nonsense and it was rightly removed. Likelihood ratio tests were not invented until decades after Pearson's paper on the chi-squared test, so the awkwardness of computing the likelihood ratio statistic in a pre-calculator era was in no sense a consideration for Pearson - the concept of a likelihood ratio test simply didn't exist. Pearson's actual considerations are reasonably clear from his original paper. As I see it, he takes the form of the statistic directly from the term (aside from the $-\frac12$) in the exponent of the multivariate normal approximation to the multinomial distribution.
If I were writing the same thing now, I'd characterize it as the (squared) Mahalanobis distance of the observed counts from the values expected under the null.
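As a quick numerical illustration of that characterization - a minimal sketch in R, assuming a simple multinomial goodness-of-fit setup with made-up counts, and using `MASS::ginv` for a generalized inverse since the multinomial covariance matrix is singular:

```r
library(MASS)   # for ginv(), a Moore-Penrose generalized inverse

obs <- c(43, 52, 25)                  # hypothetical observed counts
p0  <- c(0.4, 0.4, 0.2)               # null probabilities
n   <- sum(obs)
E   <- n * p0                         # expected counts under the null

Sigma <- n * (diag(p0) - p0 %o% p0)   # multinomial covariance under the null (singular)
mahal_sq <- drop(t(obs - E) %*% ginv(Sigma) %*% (obs - E))
pearson  <- sum((obs - E)^2 / E)      # the usual Pearson chi-squared statistic

c(mahalanobis = mahal_sq, pearson = pearson)  # agree up to numerical error
```

The quadratic form doesn't depend on which generalized inverse is used here, because the observed-minus-expected vector lies in the column space of the covariance matrix.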
it makes you wonder why there isn't an R function for the G-test.
It can be found in one or two packages, but it's so simple to calculate that I never bother to load them. Instead I usually compute it directly from the data and the expected values returned by the function that calculates the Pearson chi-squared statistic (or occasionally - at least in some situations - from the output of the `glm` function).
Just a couple of lines in addition to the usual `chisq.test` call are sufficient; it's easier to write it from scratch each time than to load a package to do it. Indeed, you can also do an "exact" test based on the G-test statistic (conditioning on both margins), using the same method that `chisq.test` does: use `r2dtable` to generate as many random tables as you like (I tend to use many more tables than the default used by `chisq.test` in R, unless the original table is so large that it would take a very long time).
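For concreteness, here's a minimal sketch of both - the small 2x3 table `tab` is made up purely for illustration:

```r
tab <- matrix(c(12, 5, 7,
                 9, 8, 4), nrow = 2, byrow = TRUE)   # a made-up 2x3 table

res <- chisq.test(tab)          # Pearson test; we reuse its expected counts
O   <- res$observed
E   <- res$expected

# G-statistic and its asymptotic chi-squared p-value (0*log(0) taken as 0)
G  <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))
df <- res$parameter
pchisq(G, df, lower.tail = FALSE)

# Conditional ("exact") Monte Carlo p-value for G, generating random tables
# with the same margins via r2dtable -- the same approach chisq.test uses for
# simulate.p.value, but applied to G. Since the margins are fixed, the expected
# counts E are the same for every simulated table.
B    <- 1e5                     # far more tables than chisq.test's default of 2000
sims <- r2dtable(B, rowSums(tab), colSums(tab))
Gsim <- sapply(sims, function(tb) 2 * sum(ifelse(tb > 0, tb * log(tb / E), 0)))
(sum(Gsim >= G) + 1) / (B + 1)  # simulated p-value
```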
References
[1]: Cressie, N. and Read, T.R. (1984),
"Multinomial Goodness-of-Fit Tests."
Journal of the Royal Statistical Society: Series B (Methodological), 46, pp. 440-464.
[2]: Groeneboom, P. and Oosterhoff, J. (1981),
"Bahadur Efficiency and Small-Sample Efficiency."
International Statistical Review, 49, pp. 127-141.