Goodness of fit between columns of a 10000x2 table (with low counts)

Question

I have a $10000\times2$ table (see below) where column A holds observed data and column B holds the output of a model that should fit the observed data in A.

$n_{A}$ is the total number of points in column A and $n_{B}$ is the total number of points in column B; the two are not necessarily equal.

How should I characterize the goodness of fit of this model output (B) with my observed data (A)?

The Chi-Squared test:

$\chi^{2}=\sum\limits_{i=1}^{N} \frac{(O_i-E_i)^2}{E_i}$

(where $O_i$ is the observed frequency and $E_i$ is the expected frequency) raises the question of what to do when $E_i=0$. As you can see in the table, there are many bins where $E_i$ is zero.
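To make the problem concrete, here is a minimal sketch in Python. It assumes NumPy is available; the function name `pearson_chi2` and the five-bin toy counts are made up for illustration and are not the real table. It evaluates the formula above term by term, so bins with $E_i=0$ show up as `inf` (when $O_i>0$) or `nan` (when $O_i=0$).

```python
import numpy as np

def pearson_chi2(observed, expected):
    """Evaluate sum((O - E)^2 / E) naively, bin by bin.

    Returns the total and the per-bin terms. No special handling of
    E_i = 0 is applied: those bins come out as inf or nan.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Silence the divide-by-zero / 0/0 warnings so the raw result is visible.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (observed - expected) ** 2 / expected
    return terms.sum(), terms

# Hypothetical toy counts (NOT the real data): two bins have E_i = 0.
O = np.array([3, 0, 1, 2, 0])
E = np.array([2, 0, 0, 3, 1])

stat, terms = pearson_chi2(O, E)
print(terms)  # per-bin contributions, including the undefined ones
print(stat)   # the sum is undefined as soon as one term is
```

A single bin with $E_i=0$ is enough to make the whole statistic undefined, which is exactly the situation in a sparse table like this one.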

The same happens if I use a Poisson log-likelihood ratio test. This is how I've seen it expressed (taken from here):

$-2\ln\lambda=2 \sum\limits_{i=1}^{N} \left(E_{i} - O_{i} + O_{i}\ln\frac{O_{i}}{E_{i}}\right)$

This clearly has the same problem as the chi-squared test above whenever $E_i=0$.
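The same kind of sketch for the likelihood-ratio statistic, again assuming NumPy and the same made-up toy counts. The function name `poisson_lr` is hypothetical, and the code assumes the usual convention that $O_i\ln(O_i/E_i)$ is taken as 0 when $O_i=0$. Under that assumption, bins where both counts are zero contribute nothing, but any bin with $E_i=0$ and $O_i>0$ still gives an infinite term.

```python
import numpy as np

def poisson_lr(observed, expected):
    """Evaluate 2 * sum(E - O + O * ln(O / E)), bin by bin.

    Assumption: O * ln(O / E) is treated as 0 when O == 0.
    Bins with E == 0 and O > 0 are left as inf.
    """
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # np.where evaluates both branches, hence the suppressed warnings.
        log_term = np.where(O > 0, O * np.log(O / E), 0.0)
    terms = 2.0 * (E - O + log_term)
    return terms.sum(), terms

# Hypothetical toy counts (NOT the real data).
O = np.array([3, 0, 1, 2, 0])
E = np.array([2, 0, 0, 3, 1])

stat, terms = poisson_lr(O, E)
print(terms)  # the bin with O = 1, E = 0 is infinite
print(stat)
```

So the convention for empty observed bins removes the $0/0$ case, but not the case where the model predicts nothing in a bin that does contain observed points.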

Fisher's exact test was mentioned in the original question (here) but later retracted.

So what can I do with this table?

This question comes from this original post. I've tried to phrase it as simply as possible.

A very similar question was asked here but it was never fully answered.

Did you see that I extensively revised (again) my answer to the original question at http://stats.stackexchange.com/questions/25946/goodness-of-fit-for-2d-histograms, which may render this whole approach of using a contingency table of bin counts not necessary? — Peter Ellis, Apr 19 '12 at 06:45
