Given a data set of $N$ points with sample mean $\overline{x} \pm \Delta\overline{x}$ (here $\Delta\overline{x}$ is the standard error of the mean, given by $s/\sqrt{N}$) and sample variance $s^2$, I am required to test the hypothesis that the data set is approximated by a Poisson distribution.

I consider the ratio $\frac{s^2}{\overline{x}}$. If the data set were distributed according to a Poisson distribution, we would expect that $\frac{s^2}{\overline{x}}$ is close to 1. Now, in general $\frac{s^2}{\overline{x}}$ is not going to be exactly 1, since my sample size is finite.

What I would like to do, then, is find the "standard error" associated with $s^2$, so that I can find the error associated with $\frac{s^2}{\overline{x}}$ via propagation of uncertainties:

$u(s^2/\overline{x}) = \sqrt{\left(\frac{1}{\overline{x}^2}\right)(\Delta s^2)^2 + \left(\frac{s^4}{\overline{x}^4}\right) \left(\frac{s}{\sqrt{N}}\right)^2} $

How would I do this? What would my $\Delta s^2$ need to be?

I have found a single short paper on the topic, but the formula presented therein, $\Delta s^2 = s^2\sqrt{\frac{2}{N-1}}$, strikes me as overly simplistic: https://web.eecs.umich.edu/~fessler/papers/files/tr/stderr.pdf
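
To make the calculation concrete, here is a minimal sketch that plugs the formula from the linked paper into the propagation expression above. The function name `dispersion_ratio_with_error` and the `counts` data are made up for illustration. The sketch inherits the assumptions of both formulas: normal-theory behaviour of $s^2$, and independent errors in $s^2$ and $\overline{x}$. Neither is guaranteed for count data.

```python
import math
import statistics


def dispersion_ratio_with_error(data):
    """Return (s^2 / mean, propagated uncertainty) for a sample.

    Uses Delta s^2 = s^2 * sqrt(2 / (N - 1)) (normal-theory formula)
    and treats the errors in s^2 and the mean as independent.
    """
    n = len(data)
    mean = statistics.fmean(data)
    var = statistics.variance(data)  # sample variance, N - 1 denominator
    if mean == 0:
        raise ValueError("ratio is undefined when the sample mean is 0")

    se_mean = math.sqrt(var / n)             # s / sqrt(N)
    se_var = var * math.sqrt(2.0 / (n - 1))  # assumed Delta s^2

    ratio = var / mean
    u = math.sqrt(se_var**2 / mean**2 + (var**2 / mean**4) * se_mean**2)
    return ratio, u


# Hypothetical counts, for illustration only.
counts = [3, 5, 2, 4, 6, 3, 4, 1, 5, 7]
ratio, u = dispersion_ratio_with_error(counts)
print(f"s^2 / mean = {ratio:.3f} +/- {u:.3f}")
```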

Jordan Levin
  • $S^2$ is a sample statistic; in other words, it comes from quantities computed on your own sample, so you can rewrite it in terms of those quantities. What I mean is: $S^2={n \over n-1}(E(x ^2) - (E (x))^2)$, so start from here. – Firebug Apr 29 '17 at 18:56
  • Possible duplicate of [Standard deviation of standard deviation](https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation) – gammer Apr 29 '17 at 19:01
  • The formula you have at the end assumes normality, but you can't necessarily rely on normality for this; similarly you wouldn't rely on independence of variance and mean for a similar reason (indeed simulation suggests they can be more than a little related). You also need to remove the possibility that the entire sample is 0. – Glen_b May 01 '17 at 08:00
  • Close voters: I think the Poisson here is important; this is one of the reasons why I prevented closing as a duplicate – Glen_b May 01 '17 at 09:55
  • It looks to me (from a few simulations across a few different values of $\lambda$ - when not too small - and $N$) like the standard error of $s^2/\bar{x}$ might be reasonably well approximated by $\sqrt{\frac{2}{N-1}}$ (more extensive simulations would be needed to confirm that conjecture) ... but as I said earlier, you must exclude the possibility of a mean of 0. – Glen_b May 01 '17 at 10:13
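
The conjecture in the last comment can be checked with a small simulation. The sketch below is one possible way to do it, not the code used in the comment: the helper names, the choice $\lambda = 5$, $N = 30$ and the number of replications are all assumptions. Samples with a mean of 0 are skipped, as the comment requires.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for modest lam; it becomes slow when lam is large.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_ratio_sd(lam, n, reps, seed=0):
    """Empirical standard deviation of s^2 / mean over Poisson samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        xs = [poisson_sample(lam, rng) for _ in range(n)]
        mean = statistics.fmean(xs)
        if mean == 0:  # ratio undefined; exclude all-zero samples
            continue
        ratios.append(statistics.variance(xs) / mean)
    return statistics.stdev(ratios)


lam, n = 5.0, 30  # illustrative values only
empirical = simulate_ratio_sd(lam, n, reps=2000)
conjectured = math.sqrt(2.0 / (n - 1))
print(f"simulated SE: {empirical:.4f}, sqrt(2/(N-1)): {conjectured:.4f}")
```

Rerunning with several values of `lam` and `n` shows where the approximation starts to degrade, in particular for small $\lambda$.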

1 Answer

A implies B does not mean that B implies A. For a Poisson distribution, mean = variance, but mean = variance does not mean that the distribution is Poisson. For example, the normal distribution $N(\mu, \mu)$ with $\mu > 0$ has mean = variance, yet it is a normal distribution, not a Poisson one. So your strategy has a problem: even if you find a way to test that mean = variance, you still cannot conclude that the data come from a Poisson distribution.
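
A quick numerical illustration of this counterexample, as a sketch with an arbitrarily chosen $\mu = 4$ and sample size: data drawn from $N(\mu, \mu)$ give a variance-to-mean ratio close to 1 even though the values are not counts at all.

```python
import math
import random
import statistics

rng = random.Random(1)
mu = 4.0  # arbitrary positive value, for illustration

# Normal data with mean mu and variance mu (standard deviation sqrt(mu)).
xs = [rng.gauss(mu, math.sqrt(mu)) for _ in range(5000)]

ratio = statistics.variance(xs) / statistics.fmean(xs)
non_integer = sum(1 for x in xs if x != int(x))

print(f"s^2 / mean = {ratio:.3f}")
print(f"non-integer values: {non_integer} of {len(xs)}")
```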

So my suggestion is to use the Kolmogorov-Smirnov goodness-of-fit test, which you can find in a textbook or online. However, this test has low power, so a large sample is needed. If your sample size is below 50, then even when the data come from a distribution far from Poisson, the chance of rejecting the null hypothesis that the data come from a Poisson distribution is very low.

user158565
  • The usual distribution of the Kolmogorov-Smirnov statistic (i.e. the one in tables and computer packages) is based on the assumption of sampling from a continuous distribution. If used on a discrete distribution like the Poisson, it tends to be conservative (i.e. to yield a lower significance level than the nominal one) -- often considerably lower. This will also affect power. One could consider a simulation study at a range of values of the Poisson mean to see how the critical values are affected (or equivalently to see how to adjust the significance level appropriately). – Glen_b May 01 '17 at 10:03
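
The simulation study suggested in this comment could look something like the sketch below. It assumes a fully specified Poisson mean (no parameter estimated from the data), and the values $\lambda = 3$, $N = 30$ and the helper names are illustrative choices. It estimates the 5% critical value of the Kolmogorov-Smirnov statistic under the Poisson null and compares it with the usual large-sample continuous-case value $1.358/\sqrt{N}$.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method, modest lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def ks_statistic(xs, lam):
    """max_k |F_n(k) - F(k)| against the Poisson(lam) CDF.

    Both functions are step functions that jump only at integers,
    so checking the integers 0..max(xs) is enough.
    """
    n, kmax = len(xs), max(xs)
    counts = [0] * (kmax + 1)
    for x in xs:
        counts[x] += 1
    pmf = math.exp(-lam)
    cdf = ecdf = d = 0.0
    for k in range(kmax + 1):
        cdf += pmf
        ecdf += counts[k] / n
        d = max(d, abs(ecdf - cdf))
        pmf *= lam / (k + 1)
    return d


def simulated_critical_value(lam, n, alpha=0.05, reps=2000, seed=0):
    """Empirical (1 - alpha) quantile of the KS statistic under the null."""
    rng = random.Random(seed)
    stats = sorted(
        ks_statistic([poisson_sample(lam, rng) for _ in range(n)], lam)
        for _ in range(reps)
    )
    return stats[min(reps - 1, math.ceil((1 - alpha) * reps) - 1)]


lam, n = 3.0, 30  # illustrative values only
simulated = simulated_critical_value(lam, n)
asymptotic = 1.358 / math.sqrt(n)  # continuous-case approximation
print(f"simulated: {simulated:.4f}, continuous-case: {asymptotic:.4f}")
```

If the Poisson mean is estimated from the same data, the null distribution changes again, and the simulation would need to re-estimate the mean inside each replication.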