How would combining levels in a contingency table (2x3 $\rightarrow$ 2x2) affect the power of a chi squared test?

Question

A 2x2 table has a critical $\chi^2$ value of $3.84$, whilst a 2x3 table has a critical value of $5.99$ (for a significance level of $0.05$).
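For reference, both critical values can be reproduced with the Python standard library alone. This is only a sketch: the variable names are illustrative, and it relies on two standard facts, namely that a chi-square variable with 1 d.f. is a squared standard normal and that the upper tail of a chi-square with 2 d.f. is $e^{-x/2}$.

```python
import math
from statistics import NormalDist

alpha = 0.05  # significance level used in the question


def degrees_of_freedom(rows, cols):
    """Degrees of freedom of the chi-square test on a rows x cols table."""
    return (rows - 1) * (cols - 1)


# df = 1: a chi-square variable is the square of a standard normal,
# so the critical value is the squared two-sided normal quantile.
crit_2x2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# df = 2: the upper tail is exp(-x / 2), so the quantile has a closed form.
crit_2x3 = -2 * math.log(alpha)

print(degrees_of_freedom(2, 2), degrees_of_freedom(2, 3))  # 1 2
print(round(crit_2x2, 2), round(crit_2x3, 2))              # 3.84 5.99
```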

Is it reasonable to think that if you can turn your 2x3 table into a 2x2 table (by combining two groups), the power will increase? We ask because we have two cells with an expected frequency of $1$, which could give us a $\chi^2$ that is "falsely too high". Would you mind giving us a hint on how to tackle this problem?

I'm a little puzzled by some of these remarks. Perhaps that's because power would be irrelevant in the situation you posit, since then the $\chi^2$ p-value is known to be incorrect (and perhaps wildly so). As to the possibility of combining (or even splitting) the categories in a contingency table *ex post facto*, the same remark applies: in that circumstance the $\chi^2$ p-value will be very wrong indeed. Conditions for applicability of the $\chi^2$ test (of which there are many that are often overlooked) are given at http://stats.stackexchange.com/a/17148. — whuber, Jun 02 '15 at 19:02
@whuber We have two cells that don't meet the criteria for the chi-square test, because two of the expected values are less than one. Suppose our $\chi^2$ p-value for the 2x3 contingency table is 0.04, and after changing it into a 2x2 contingency table we get a $\chi^2$ p-value of 0.4. Doesn't that mean we have "increased the power", or should we conclude that "because we wanted to avoid false rejection of the null hypothesis, we combined two groups"? — Chi23, Jun 02 '15 at 19:11
No, it means you *decreased* the power. (Maybe you mean test *size* instead of *power*?) However, it is practically meaningless to change the power of a test that does not achieve its nominal size (and in fact is known not to be applicable in the first place). That power comparison makes no sense. If you want to increase the power of *any* hypothesis test, you can do it much more easily than this. Here's fully working code (in `R`): `hypothesis.test — whuber, Jun 02 '15 at 19:49

Glen_b · Answer 1 · 2015-06-02T23:55:04.877

Suppose we sidestep whuber's first objection in one of two ways: by using exact p-values for the statistic (say, by conditioning on the margins and looking at the exact discrete distribution of the chi-square statistic)*, or by setting aside the detail in the question and assuming sample sizes large enough that the chi-square approximation is adequate in all cases. Either choice removes any urge to combine categories, but it at least lets us consider power. Suppose we also avoid the second objection by deciding which categories to combine before the observations are made, looking only at the smallest pair of expected values (given a design with those fixed margins). Then the answer depends on how the proportions are arranged (the form of association in the table) under the particular alternative we're computing power for.

* (and possibly assuming that we consider randomized tests to deal with potential mismatches of significance level between combined and original tables when doing that)
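One way to read the "exact p-value" option is sketched below: hold both margins fixed, enumerate every 2x3 table consistent with them, weight each by its multivariate hypergeometric probability, and add up the probability of tables whose chi-square statistic is at least as large as the observed one. The function names and the example counts are hypothetical, the sketch assumes every margin is positive, and it ignores the randomized-test refinement mentioned in the footnote.

```python
from math import comb


def chi2_stat(row1, col_totals, r1):
    """Pearson chi-square statistic of a 2xk table, given its first row."""
    n = sum(col_totals)
    stat = 0.0
    for a, c in zip(row1, col_totals):
        # Each column contributes a row-1 cell and a row-2 cell.
        for observed, row_total in ((a, r1), (c - a, n - r1)):
            expected = row_total * c / n
            stat += (observed - expected) ** 2 / expected
    return stat


def exact_p_value(table):
    """Exact conditional p-value of the chi-square statistic for a 2x3 table."""
    row1, row2 = table
    col_totals = [a + b for a, b in zip(row1, row2)]
    r1, n = sum(row1), sum(col_totals)
    observed = chi2_stat(row1, col_totals, r1)
    c1, c2, c3 = col_totals
    p = 0.0
    for a in range(min(c1, r1) + 1):
        for b in range(min(c2, r1 - a) + 1):
            c = r1 - a - b
            if c > c3:
                continue  # not a valid table for these margins
            prob = comb(c1, a) * comb(c2, b) * comb(c3, c) / comb(n, r1)
            # Small tolerance so that tied statistics are counted.
            if chi2_stat((a, b, c), col_totals, r1) >= observed - 1e-9:
                p += prob
    return p


# Hypothetical sparse table: two columns have small expected counts.
print(exact_p_value([[10, 2, 0], [5, 1, 3]]))
```

Because the reference distribution here is the actual discrete one, the small expected counts no longer invalidate the p-value, which is what allows a power comparison to be discussed at all.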

If the alternative is such that the small-expectation groups being combined deviate in a similar manner (that is, deviate from independence in the same direction), power will often tend to increase. If the alternative against which power is under consideration is structured so that those small-expectation groups deviate from their expected values in dissimilar ways, power will tend to decrease, as the deviations will tend to "wash out". Power will typically also tend to decrease when we combine two groups with similar expected values where one has a very small deviation and the other a large deviation (even in the same direction): the combined numerator may not change much from the sum of the squares in the two original numerators, but the expected value in the denominator is about twice as big as that of the cell making the larger contribution to the test statistic. As a result, the overall chi-squared tends to be smaller in a way that the reduced d.f. does not fully compensate for.
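These three cases can be checked numerically under the large-sample route, where the test statistic is approximately noncentral chi-square with noncentrality $\lambda = n \sum (p_{ij} - p_{i\cdot}p_{\cdot j})^2 / (p_{i\cdot}p_{\cdot j})$. The sketch below uses only the Python standard library. The sample size, column proportions and the three alternatives are all hypothetical numbers chosen for illustration, and the function names are not from any library.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
CRIT_DF1 = NormalDist().inv_cdf(1 - ALPHA / 2) ** 2  # about 3.84 (2x2 table)
CRIT_DF2 = -2 * math.log(ALPHA)                       # about 5.99 (2x3 table)


def central_sf_even_df(x, df):
    """Upper tail P(X > x) of a central chi-square with even df (closed form)."""
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= (x / 2) / (i + 1)
    return math.exp(-x / 2) * total


def power_df1(lam):
    """Power with df = 1: the statistic behaves like (Z + sqrt(lam))**2."""
    z, a, d = NormalDist(), math.sqrt(CRIT_DF1), math.sqrt(lam)
    return (1 - z.cdf(a - d)) + z.cdf(-a - d)


def power_df2(lam, terms=200):
    """Power with df = 2: Poisson(lam/2) mixture of central chi-squares."""
    weight, total = math.exp(-lam / 2), 0.0
    for j in range(terms):
        total += weight * central_sf_even_df(CRIT_DF2, 2 + 2 * j)
        weight *= (lam / 2) / (j + 1)
    return total


def noncentrality(n, col_probs, row1_given_col):
    """n * sum over cells of (p - p_indep)^2 / p_indep, simplified for 2 rows."""
    r1 = sum(c * q for c, q in zip(col_probs, row1_given_col))
    spread = sum(c * (q - r1) ** 2 for c, q in zip(col_probs, row1_given_col))
    return n * spread / (r1 * (1 - r1))


def collapse_last_two(col_probs, row1_given_col):
    """Merge columns 2 and 3 into a single column."""
    (c1, c2, c3), (q1, q2, q3) = col_probs, row1_given_col
    return (c1, c2 + c3), (q1, (c2 * q2 + c3 * q3) / (c2 + c3))


N = 100                 # hypothetical sample size
COLS = (0.6, 0.2, 0.2)  # hypothetical column proportions; columns 2 and 3 get merged
SCENARIOS = {           # hypothetical P(row 1 | column) under each alternative
    "same direction": (0.40, 0.65, 0.65),
    "opposite directions": (0.50, 0.30, 0.70),
    "small and large deviation": (0.40, 0.52, 0.78),
}

results = {}
for name, q in SCENARIOS.items():
    full = power_df2(noncentrality(N, COLS, q))
    merged = power_df1(noncentrality(N, *collapse_last_two(COLS, q)))
    results[name] = (full, merged)
    print(f"{name}: 2x3 power = {full:.3f}, 2x2 power = {merged:.3f}")
```

With these assumed numbers, merging raises the power in the first scenario and lowers it in the other two. In the "opposite directions" scenario the merged table is exactly independent, so the power falls all the way to the significance level.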

Since we're combining groups only on the basis of their expected values (under a fixed-margin assumption so we can calculate it before seeing the data), we seem to have no basis to assume the smaller expected values will fortuitously line up with some common trend in the table.

[In this context, it's not clear what you mean by "falsely too high" -- your chi square could go down as easily as up. Can you explain or illustrate what you mean by the term?]
