
In my Master's thesis I conducted an experiment in which subjects had to decide between two alternatives. I had a control group and a treatment group. The hypothesis was that more subjects in the treatment group would decide for alternative B than in the control group.

I've got the following results:

Control group:

- Alternative A: 15
- Alternative B: 16

Treatment group:

- Alternative A: 17
- Alternative B: 6

I am using Pearson's chi-square test to check whether the difference is significant. I do that in SPSS, so I only get the "2-sided significance". The test says the p-value is 0.059.

My question is whether it is correct to divide the p-value by 2 in order to get the p-value for the 1-sided significance test. As far as I understand the concept, this would be correct, since my hypothesis is formulated in only one direction.

Thanks a lot.

Oh, and another short question. I am not a native English speaker but I have to write my thesis in English, so I have a short question regarding language: is it correct to write "the p-level is below the 5%-quantile, so the finding is significant"?

Thanks!

StepMuc
  • This is an odd feature of SPSS: it always performs two-sided tests, with no option to change this. If you need the test to be one-sided, you have to recalculate the obtained p-value yourself. – StijnDeVuyst Sep 04 '15 at 10:47

3 Answers


There is a reason why the 'two-tailed chi-squared' is seldom used: if you do a $\chi^2$ test for contingency tables, then the test statistic is (without the continuity correction):

$X^2 = \sum_{i,j}\frac{(o_{ij}-e_{ij})^2}{e_{ij}}$

where $o_{ij}$ are the observed counts in cell $i,j$ and $e_{ij}$ are the expected counts in cell $i,j$. Under relatively weak assumptions it can be shown that $X^2$ approximately follows a $\chi^2$ distribution with $1$ degree of freedom (this holds for a 2x2 table, as in your case).

If you assume independence between the row and column variables (which is $H_0$), then the $e_{ij}$ are estimated from the marginal probabilities.
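
As a minimal sketch of this computation in R (using the counts from the question; the variable names are my own):

```r
# Observed counts: rows = group (control, treatment), columns = choice (A, B)
o <- matrix(c(15, 16,
              17,  6), nrow = 2, byrow = TRUE)

# Expected counts under H0: e_ij = (row total i) * (column total j) / N
e <- outer(rowSums(o), colSums(o)) / sum(o)

sum((o - e)^2 / e)  # X^2, approximately 3.56
```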

This is just a short intro to the $\chi^2$ test for contingency tables. The most important thing is that the numerator of each term in $X^2$ is the squared difference between the observed and the expected counts. So whether $o_{ij} < e_{ij}$ or $o_{ij} > e_{ij}$ makes no difference to the value of $X^2$.

So the $\chi^2$ test for contingency tables tests whether the observations are either smaller or larger than expected! So it is a two-sided test, even though the critical region is defined in one (the right) tail of the $\chi^2$ distribution.

So the point is that the $\chi^2$ test is a two-sided test (it can reject values $o_{ij}$ that are either too small or too large), but it uses a one-sided critical region (the right tail of the $\chi^2$ distribution).

So how should you interpret your result: if $H_0: \text{ 'row variable and column variable are independent' }$, then the probability of observing a value at least as extreme as the computed $X^2$ is 0.059. This is called the p-value of the test.
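
You can reproduce this number in R (a sketch; `correct = FALSE` switches off Yates' continuity correction, which should match the 'Pearson Chi-Square' row that SPSS reports):

```r
tab <- matrix(c(15, 16,
                17,  6), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("control", "treatment"),
                              choice = c("A", "B")))

chisq.test(tab, correct = FALSE)
# X-squared = 3.56, df = 1, p-value = 0.059
```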

(Note that, by the above, 'at least as extreme' covers deviations in both directions: $o_{ij}$ too high or too low.)

In order to 'decide' something, you first have to choose a significance level. This is the 'risk that you accept of making type I errors'. A significance level of $5\%$ is commonly used.

You reject the null hypothesis when the p-value (0.059) is smaller than the chosen significance level (0.05). This is not the case for your table, so you do not reject $H_0$ at a significance level of $5\%$.

As far as your question at the bottom is concerned, you should say (although in your example it is not the case): 'the p-value is lower than or equal to the chosen significance level of 0.05, so $H_0$ is rejected and we conclude that the row and column variables are dependent' (but, as said, in your example the p-value is higher than the 0.05 significance level).

Maybe you should also take a look at Misunderstanding a P-value?.

EDIT: to respond to questions/remarks in the comments, I have added the following:

@StijnDeVuyst:

An 'extreme' case may make this clear. Assume that we know that the population is normal with an unknown $\mu$ but $\sigma=1$, i.e. $N(\mu,1)$.

We want to test the hypothesis $H_0: \mu = 0$ versus $H_1: \mu \ne 0$. If we observe a value $x=2$ from a sample, then the p-value of this observation is $0.02275$ (1-pnorm(q=2)) multiplied by two, because our critical region is two-tailed.

We could also define the critical region in another way: if $H_0$ is true, then the population has a standard normal distribution, so the test statistic $X$ satisfies $X \sim N(0,1)$. Then, by the definition of a $\chi^2$ distribution with one degree of freedom, we also know that $X^2 \sim \chi^2_{(1)}$.

We observed $x=2$, thus $x^2=4$, and if we compute the p-value of 4 for a $\chi^2_{(1)}$ we find 1-pchisq(q=4,df=1) = 0.0455.

Note that this is exactly equal to 2*(1-pnorm(q=2)) = 0.0455.
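
The two computations side by side in R, as a minimal check:

```r
x <- 2
2 * (1 - pnorm(x))       # two-tailed p-value under N(0,1): 0.0455
1 - pchisq(x^2, df = 1)  # one-tailed p-value under chi-squared(1): 0.0455
```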

So we perform the same test, $H_0: \mu = 0$ versus $H_1: \mu \ne 0$ (different or not, so two-sided), with two equivalent critical regions: one that is one-tailed (the one based on the $\chi^2$ distribution) and one that is two-tailed (the one based on the normal distribution).

@StepMuc:

I will not try to be precise here; the goal is to give you a 'feeling' for what this is all about, because you asked for it in the comments.

For the 'idea behind' hypothesis testing I refer to What follows if we fail to reject the null hypothesis?.

So, in hypothesis testing, if you want to 'find evidence' for something, you assume the opposite. You want to show that the group you belong to (treatment/control) 'influences' your choice of 'A' or 'B', so you want to show that 'group' (treatment/control) and 'choice' (A/B) are 'dependent'. If you want to show that, then you assume the opposite, so

$H_0: \text{ group and choice are independent }$

and the alternative is then

$H_1: \text{ group and choice are dependent }$.

The next thing you, as a scientist, have to decide is the 'significance level $\alpha$'. This is the probability that the test rejects $H_0$ while in reality it is true. When we reject $H_0$, we conclude (see What follows if we fail to reject the null hypothesis?) that we have found statistical evidence for $H_1$, with probability $\alpha$ that this evidence is 'false evidence'. It is up to you (or your risk appetite) how high you choose $\alpha$. Common values are 0.001, 0.01, 0.05, and 0.1. The higher $\alpha$, the higher the risk of discovering 'false evidence'; false evidence for $H_1$ is called a type I error.

The $\chi^2$ test for contingency tables tests $H_0: \text{ group and choice are independent }$ versus $H_1: \text{ group and choice are dependent }$.

If $H_0$ is true, then it can be shown that the $X^2$ defined above comes from a $\chi^2$ distribution. From the table that you have, you can compute $X^2$; this gives you a number.

You have 54 people: 31 in the control group and 23 in the treatment group. Assume now that you let these people choose A/B at random and you generate e.g. 1,000,000 tables with these random outcomes; if you compute $X^2$ as above for each of these tables, then the probability that the computed $X^2$ is larger than or equal to the one for your table is 0.059 (which is the p-value in your question).
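
A small Monte Carlo sketch of this idea in R (my own illustration, not the computation SPSS performs: under $H_0$ each subject chooses A with the pooled probability 32/54, and the simulated proportion will be close to, though not exactly, the asymptotic 0.059):

```r
set.seed(1)
sizes <- c(31, 23)  # control and treatment group sizes
p_A   <- 32 / 54    # pooled probability of choosing A under H0

# observed statistic for the table in the question
obs    <- matrix(c(15, 16, 17, 6), nrow = 2, byrow = TRUE)
x2_obs <- chisq.test(obs, correct = FALSE)$statistic

# simulate 100000 tables under H0 and compute X^2 for each
x2_sim <- replicate(100000, {
  a   <- rbinom(2, sizes, p_A)  # simulated 'A' counts per group
  sim <- cbind(a, sizes - a)    # simulated 2x2 table
  suppressWarnings(chisq.test(sim, correct = FALSE)$statistic)
})

mean(x2_sim >= x2_obs)  # roughly 0.06
```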

So what we have now is: we assumed that $H_0$ is true, and under that assumption we found that the probability of obtaining this value of $X^2$ or higher is 0.059. If this is a 'low enough' probability, then we have found that 'if $H_0$ is true, we observe something very improbable', so $H_0$ must be false and $H_1$ is 'statistically proven'.

We still have to define what is meant by 'low enough', and that is defined as 'lower than or equal to the chosen significance level $\alpha$'.

So if you choose a significance level $\alpha=0.05$, then, as the probability of obtaining this value of $X^2$ or higher was 0.059, you will not reject $H_0$ at the $5\%$ significance level, so you find no evidence for $H_1$ at the $5\%$ significance level. So you have to accept $H_0$, that group and choice are independent.

If you are prepared to accept more type I errors and set $\alpha=0.1$, then, as your p-value is lower than 0.1, you will reject $H_0$ and conclude that group and choice are dependent at the $10\%$ significance level.
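
The two decisions as a one-line check each (trivial, but it makes the rule explicit):

```r
p <- 0.059
p <= 0.05  # FALSE: do not reject H0 at the 5% level
p <= 0.10  # TRUE:  reject H0 at the 10% level
```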

  • I don't quite agree with this. In the case of a contingency table, the sum of all observed counts ($o_{ij}$) must equal the sum of all expected counts ($e_{ij}$). So 'too small' in one cell means 'too large' in other cells. There is no 'too small' or 'too large', only 'different' or 'not different'. As you say, the critical region is in the right tail of the $\chi^2$-distribution, and that makes it essentially a one-sided test in my opinion. – StijnDeVuyst Sep 04 '15 at 11:11
  • @StijnDeVuyst: and if you do a two-sided t-test, then you test $H_0: \mu= 0$ versus $H_1: \mu \ne 0$. So also 'different' or 'not different'; why is this a two-sided test then? I think you have to **distinguish** between a **'two-sided test'** and a **'two-sided critical region'**. So you can have a two-sided test with a one-sided critical region. The $\chi^2$ test for contingency tables is a two-sided test with a one-sided critical region (because of the squaring), and the t-test above is a two-sided test with a two-sided critical region. –  Sep 04 '15 at 11:12
  • I see what you mean. This is perhaps just a dispute over terminology: for me 'two-sided' is indeed the same as 'two-sided critical region', because I only look at what region of the test statistic distribution supports the alternative hypothesis. In that regard, the t-test you mention is two-sided, the chi-square is one-sided. – StijnDeVuyst Sep 04 '15 at 11:26
  • @StijnDeVuyst: I added an example at the bottom of my answer. –  Sep 04 '15 at 11:50
  • (+1) The two-tailed test has a use in assessing whether results are "too good to be true". Fisher used it on Mendel's pea data - see e.g. [Pires & Branco (2010), "A Statistical Model to Explain the Mendel–Fisher Controversy", *Statist Sci.*, **25**, 4](https://projecteuclid.org/euclid.ss/1300108237). [I should've said "the one-tailed test on the other tail"] – Scortchi - Reinstate Monica Sep 04 '15 at 13:26
  • @Scortchi: thanks for the reference (+1), I will take a look at it. I have some doubts about this two-tailed $\chi^2$ test, but I will first read the reference. –  Sep 04 '15 at 13:38
  • @f coppens: Thanks a lot for your explanations. However, I don't think I understood entirely. My takeaway would be: it doesn't matter how my hypothesis is formulated, I always use the 2-sided p-value when I use the chi-square test. I am not sure I entirely understand the part about the hypothesis: you say the hypothesis is that control group and treatment group are independent. If the test says "yes", then that means that the results of the control group are very likely (100-p%) not a "variation of the results of the treatment group". Is that correct? – StepMuc Sep 04 '15 at 13:49
  • @StepMuc: I will add this to my answer, but I don't have the time right now; I will do it in a few hours. –  Sep 04 '15 at 13:51
  • @f coppens: That's great - thanks a lot! I just had the idea that it might help if I give the hypothesis that I use. In my experiment, subjects should decide for A or B. My hypothesis was that subjects decide for B in the Treatment group. Does this change anything? – StepMuc Sep 04 '15 at 13:58
  • @StepMuc: I will take that into account and try to explain things so that you can 'see' what is behind them; then you will understand better. But that will only be possible in a few hours, at the latest tomorrow. –  Sep 04 '15 at 14:21
  • @f coppens: Again, I get what you mean. But although your criterion for classifying a test as one-sided or two-sided is plausible and valid in itself, I just don't think it is very useful. It is confusing, because now you have to explain to people that a test is two-sided but the p-value is still a one-tail probability. This is problematic, to say the least. For example, what about ANOVA, where $H_1: not(\mu_1=\mu_2=\mu_3)$? How many sides does this test have in your view? In my view it is simple: the F-test is one-sided because the critical region is only for large values of F. – StijnDeVuyst Sep 04 '15 at 15:20
  • @StepMuc: I tried to give additional explanation at the bottom of my answer –  Sep 04 '15 at 15:52
  • @StijnDeVuyst: The ANOVA question: I would say that it is a test with a one-tailed critical region. As you talk about three means, we are in a multivariate case (3 in your example), and even in the bivariate case the test itself cannot be 'classified' as one- or two-sided; to put it simply, in two dimensions there is no left side and no right side. But tell me, how would you classify the test that I give in my answer, the one based on the normal distribution? There are two equivalent critical regions, one one-tailed and one two-tailed; how would you classify it? –  Sep 04 '15 at 16:12
  • @fcoppens: I would say these are two different tests: a two-sided $Z$-test and a one-sided $\chi^2$-test, because my view of one/two-sided depends on where the rejection region is in the support of the $H_0$ test statistic distribution. I know both are essentially the "same" and give the same conclusion, as you convincingly show. But this happens sometimes. For example in ANOVA with 2-level factors: you can either use (one-sided) $F$-tests for the Mean-Square ratios, or use (two-sided) $t$-tests on the regression coefficients. They give exactly the same $p$-values. – StijnDeVuyst Sep 04 '15 at 17:48
  • @StijnDeVuyst: can you be more precise on *"you can either use (one-sided) F-tests for the Mean-Square ratios, or use (two-sided) t-tests on the regression coefficients. They give exactly the same p-values"*? What do you mean by that? –  Sep 05 '15 at 06:44
  • For example in R: A – StijnDeVuyst Sep 05 '15 at 10:20
  • So you confirm what I say in my example with the normal variable $X$ and its square $X^2$ in my answer, because the F in the output of these instructions is ... $t^2$. So you have a **two-sided** test with two equivalent critical regions: one that is **two-tailed** (the one for the $t$) and one that is **one-tailed** (the one for the $F$); in this case $F=t^2$, which is why the critical regions are equivalent. –  Sep 05 '15 at 11:18
  • Yes, $F=t^2$. It is the same situation as your $X$ and $X^2$. I confirm, that is not the problem. We just have an argument on what it means for a test to be two-sided. You look at what $H_1$ means in the parameter space ($\mu$ smaller/larger or different than...), whereas I look at what $H_1$ means for the test statistic. Literature is curiously unclear about this. Further, I suggest we end the debate here before the administrators kick us out! – StijnDeVuyst Sep 05 '15 at 18:01
  • @StijnDeVuyst: why would they kick us out? I learn things from this discussion, and that's the goal of SE, isn't it? But you teach in Ghent and I do as well, so maybe we can continue the discussion there one day? –  Sep 05 '15 at 18:12

You are correct that the one-tailed p-value is for hypotheses that are formulated as one-sided, and that two-tailed p-values are for two-sided hypotheses, in which either group (in this case) can prove to be "better" than the other.

Since you're talking about treatments, I assume that your field is medicine, psychology or something similar. It is very rare for hypotheses in these fields to be one-sided (in my experience at least), and if you want to test the effect of a treatment, you must have very convincing arguments for why the treatment effect could under no circumstances be negative.

So in practice, one-tailed p-values are rarely used (I can't recall ever having seen them in a paper), and I think you should use the two-tailed test as well.

JonB
  • Hi Jonas, I agree that two-sided tests are often applied, but a two-sided $\chi^2$ test is not so frequent; I have tried to explain the reason for that in my answer. –  Sep 10 '15 at 08:05

As far as I can see, the two-sidedness does not refer to the chi-square test at all, but rather to the corresponding two-sided test of two proportions. The chi-square part is indeed a one-tailed test, which it should be. The same kind of vocabulary is used in e.g. openepi.com. Some more details are covered in another answer; see https://stats.stackexchange.com/a/157005/18276.

If I am correct, then the entire discussion of when a two-sided chi-square test should be used is more or less off topic. Or at least it is the answer to a question that was not asked.

It's really odd that SPSS does not offer a clearly labelled test of proportions, since that is such a common task. Even if a chi-square test in a 2x2 table is equivalent to a test of proportions, the output would be easier to understand if it had been based on proportions rather than a substitute. It is also strange that they haven't included tests that are less sensitive to small samples (the Agresti-Coull or mid-P tests, for example).
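
To illustrate the equivalence, here is a sketch in R: `prop.test` without continuity correction reproduces the Pearson chi-square p-value, and its `alternative` argument gives a genuinely one-sided test. (With this ordering, the hypothesis from the question, more B in the treatment group, is `alternative = "less"`; since the observed difference runs the other way, the one-sided p-value comes out large.)

```r
b <- c(16, 6)   # numbers choosing B (control, treatment)
n <- c(31, 23)  # group sizes

# two-sided test of two proportions = Pearson chi-square on the 2x2 table
prop.test(b, n, correct = FALSE)                        # p ~ 0.059

# one-sided version, H1: p_B(control) < p_B(treatment)
prop.test(b, n, alternative = "less", correct = FALSE)  # p ~ 0.97 here
```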

Robert L