Distribution of test statistics in Pearson's Chi Square test for multinomial data

Question

Let $\mathbf{X} = (X_1,\ldots,X_k) \sim \mathrm{Mul}(n,\mathbf{p})$ be a multinomial random variable. Consider the following hypothesis test: $$H_0: \mathbf{p} = \mathbf{p_0} := (p_{01},\ldots, p_{0k})$$ versus $$H_1: \mathbf{p} \neq \mathbf{p_0}$$

Pearson's $\chi^2$ test statistic is $$T(\mathbf{X}) = \sum_{i=1}^k \frac{(X_i-np_i)^2}{np_i}$$ where $p_i$ denotes the null value $p_{0i}$. Pearson's $\chi^2$ test assumes that, under the null hypothesis, $T(\mathbf{X}) \sim \chi^2_{k-1}$.

Why does $T(\mathbf{X})$ follow a chi-square distribution? For a $\chi^2$ distribution, each term in the sum should be the square of a standard normal RV. According to my computations, $$\sum_{i=1}^k \frac{(X_i-np_i)^2}{np_i(1-p_i)}\sim \chi^2_{k-1}$$ Is there a theoretical explanation for the missing $1-p_i$ term in the denominator?
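
For what it is worth, the two-cell case $k=2$ can be worked by hand and hints at where the $1-p_i$ factor goes. This is only a sketch of the special case, not a proof of the general result. Since $X_2 = n - X_1$ and $p_2 = 1 - p_1$, we have $X_2 - np_2 = -(X_1 - np_1)$, so $$T(\mathbf{X}) = \frac{(X_1-np_1)^2}{np_1} + \frac{(X_1-np_1)^2}{n(1-p_1)} = (X_1-np_1)^2\,\frac{(1-p_1)+p_1}{np_1(1-p_1)} = \frac{(X_1-np_1)^2}{np_1(1-p_1)}$$ The two terms are not independent: they are the same deviation counted twice, and together they collapse into a single squared standardized binomial count, which is approximately $\chi^2_1$ for large $n$. Here the variance factor $1-p_1$ arises from the constraint $X_1 + X_2 = n$ rather than from each term's own denominator.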

I think you are misstating things. The chi-square distribution for the goodness-of-fit test is an asymptotic result. I was recently, and appropriately, admonished on another post when I falsely asserted that the terms have to be squares of independent standard normals for the exact distribution to be chi-square. Furthermore, there are restrictions: the asymptotic result won't even hold if some cells are too sparse. Also note that T(X) has the form (observed-expected)^2 — Michael R. Chernick, Nov 29 '16 at 11:43
I tried to edit and add to my comment but timed out. Here is what got left out. Also note that T(X) has the form (observed-expected)^2/expected, which is the known form for the chi-square test. You seem to be thinking that the denominator terms need to be normalized by the variances. That sounds reasonable, but it is wrong. The theoretical reason is the proof of the asymptotic result under the conditions of the theorem. The theorem shows that T(X) has the stated asymptotic distribution, but your proposal does not. — Michael R. Chernick, Nov 29 '16 at 12:04
A sum of squares of independent standard normal variables follows a chi-square distribution, and the number of independent variables determines its degrees of freedom. I am not worried about the asymptotic result; my concern is that T(X) is not a sum of squares of standard normal RVs. See, $E[X_i] = np_i$ and $Var(X_i) = np_i(1-p_i)$. The $\chi^2$ test somehow assumes that $Var(X_i) = np_i$. — user2329744, Nov 29 '16 at 12:06
@MichaelChernick: Sorry, I can't understand which theorem you are talking about. Could you point me to a reference? Thanks! — user2329744, Nov 29 '16 at 12:09
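
The disagreement in the comments above can be checked numerically. Since $Var(X_i) = np_i(1-p_i)$, the statistic $T(\mathbf{X})$ has expectation $\sum_i (1-p_i) = k-1$, which matches the mean of a $\chi^2_{k-1}$ variable, whereas the variance-scaled version has expectation $k$. The following is a minimal simulation sketch, not taken from the thread: the function name `simulate_statistics`, the null vector `p0`, the sample size, and the number of replications are all illustrative assumptions, and the critical value 7.815 is the 95% quantile of $\chi^2_3$, so it applies only to this choice of $k=4$.

```python
import numpy as np

def simulate_statistics(p, n=1000, reps=20000, seed=0):
    """Draw multinomial samples under H0 and return both statistics per draw."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    counts = rng.multinomial(n, p, size=reps)      # shape (reps, k)
    expected = n * p
    sq_dev = (counts - expected) ** 2
    # Pearson's statistic: (observed - expected)^2 / expected
    pearson = (sq_dev / expected).sum(axis=1)
    # The question's proposal: each term divided by Var(X_i) = n p_i (1 - p_i)
    variance_scaled = (sq_dev / (expected * (1.0 - p))).sum(axis=1)
    return pearson, variance_scaled

p0 = [0.1, 0.2, 0.3, 0.4]   # illustrative null probabilities (assumption)
k = len(p0)
pearson, variance_scaled = simulate_statistics(p0)

# Mean of chi-square with k-1 degrees of freedom is k-1.
print("mean of Pearson T:        ", pearson.mean())
print("mean of variance-scaled T:", variance_scaled.mean())

# Rejection rates at the nominal 5% level, using the chi^2_3 critical value.
crit = 7.815                 # 95% quantile of chi-square with 3 df (valid for k=4 only)
reject_pearson = (pearson > crit).mean()
reject_scaled = (variance_scaled > crit).mean()
print("rejection rate, Pearson T:        ", reject_pearson)
print("rejection rate, variance-scaled T:", reject_scaled)
```

Under these assumptions, the Pearson statistic should average close to $k-1 = 3$ and reject at roughly the nominal rate, while the variance-scaled statistic should average close to $k = 4$ and reject too often when compared with $\chi^2_{k-1}$.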
