Is the proportion of outliers detected by mean + 2.5*standard deviation for a Poisson distribution larger or smaller than 5%?

Question

For example:

The distribution looks like a Poisson distribution. If data beyond mean + 2.5*standard deviation are treated as outliers, will the proportion of outliers be larger or smaller than 5%? (5% being the figure for a normal distribution.) Thanks!

If the data *really* comes from a Poisson distribution, it depends on the value of the parameter, but it can be worked out. If the data just *looks like* a Poisson, then it depends on the actual distribution it comes from. If you specifically want to look at the 5% of observations that are most outlying, why not do that directly by computing the empirical quantiles instead of making assumptions about the distribution? — Chris Haug, Aug 04 '17 at 18:35
@Chris Haug I don't want to specify that 5% are outliers. I'm just wondering whether the proportion of outliers detected by mean + 2.5*standard deviation for a Poisson distribution is larger or smaller than 5%. — zero_yu, Aug 04 '17 at 19:09
Consider [this](https://stats.stackexchange.com/q/56402/603) answer. — user603, Aug 05 '17 at 09:03

Chris Haug · Accepted Answer · 2017-08-04T21:48:26.787

If $X \sim \text{Poisson}(\lambda)$, the population equivalent of this probability is:

$$f(\lambda):=1-P[\lambda-2.5\sqrt{\lambda} \leq X \leq \lambda+2.5\sqrt{\lambda}]$$

In general, since $X$ has mean $\lambda$ and standard deviation $\sqrt{\lambda}$, Chebyshev's inequality with $k = 2.5$ gives $f(\lambda) \leq 1/2.5^2 = 0.16$ for all $\lambda$, so it is never more than 16%. Whether it is greater or smaller than 5% depends on the value of $\lambda$. Here is some R code that illustrates this:

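    # Chebyshev bound for k = 2.5 standard deviations
    cheb <- 1/(2.5^2)
    lambdas <- seq(0.0001, 5, length.out = 10000)
    # f(lambda) = 1 - P[lambda - 2.5*sqrt(lambda) <= X <= lambda + 2.5*sqrt(lambda)];
    # ceiling(.) - 1 keeps an integer lower endpoint inside the interval
    fl <- sapply(lambdas, function(l) 1 - (ppois(l + 2.5*sqrt(l), l) - ppois(ceiling(l - 2.5*sqrt(l)) - 1, l)))

    # Plot f(lambda) with the 5% level and the Chebyshev bound as reference lines
    plot(lambdas, fl, ylim = c(0, 0.2), ylab = "", main = "Probability of falling outside bounds")
    abline(h = cheb, col = "red")
    abline(h = 0.05, col = "blue")
    legend("topright", legend = c("f(lambda)", "5%", "Chebyshev bound"), col = c("black", "blue", "red"), lty = c(0, 1, 1), pch = c(1, NA, NA))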

And the result below:

For very large $\lambda$, it tends to the same value as for the normal distribution, which is not 5%, as you claim, but about 1.2% (that is, $2(1-\Phi(2.5)) \approx 0.0124$).
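As a rough check of that limit, the sketch below compares the two-sided normal tail beyond 2.5 standard deviations with the exact Poisson probability for a few large values of $\lambda$. The helper `f` and the vector `big_lambdas` are illustrative names and values chosen here, not part of the code above.

```r
# Two-sided normal tail beyond 2.5 standard deviations
normal_tail <- 2 * pnorm(-2.5)

# Exact Poisson probability of falling outside
# [lambda - 2.5*sqrt(lambda), lambda + 2.5*sqrt(lambda)]
f <- function(l) {
  s <- 2.5 * sqrt(l)
  1 - (ppois(floor(l + s), l) - ppois(ceiling(l - s) - 1, l))
}

# Illustrative large values of lambda (an arbitrary choice)
big_lambdas <- c(10, 100, 1000, 10000, 1e6)
round(cbind(lambda = big_lambdas,
            f      = sapply(big_lambdas, f),
            normal = normal_tail), 4)
```

Because the Poisson is discrete, $f(\lambda)$ approaches the normal value in small jumps rather than smoothly.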

What you are suggesting with regard to "outliers" involves estimating $\lambda$ from data, so the empirical coverage of this interval may differ from the one that assumes the true $\lambda$, but the point still stands that the answer to your question is "it depends on the parameter of the Poisson distribution".
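One way to see how much estimation matters is a small simulation. This is only a sketch: the values of `lambda`, `n`, `reps` and the seed are assumptions picked for illustration. It flags points outside the sample mean plus or minus 2.5 sample standard deviations and compares the average flagged share with the population value computed from the true $\lambda$.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
lambda <- 0.5      # assumed true parameter (hypothetical value)
n      <- 1000     # assumed sample size
reps   <- 500      # number of simulated samples

# Share of each sample flagged when mean and sd are estimated from that sample
flagged <- replicate(reps, {
  x <- rpois(n, lambda)
  m <- mean(x)
  s <- sd(x)
  mean(x < m - 2.5 * s | x > m + 2.5 * s)
})

# Population value using the true lambda, for comparison
s_true <- 2.5 * sqrt(lambda)
f_true <- 1 - (ppois(floor(lambda + s_true), lambda) -
               ppois(ceiling(lambda - s_true) - 1, lambda))

c(empirical = mean(flagged), population = f_true)
```

The two numbers are most likely to diverge for values of $\lambda$ where $\lambda + 2.5\sqrt{\lambda}$ is close to an integer, because the estimated cutoff can then land on either side of that integer from one sample to the next.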

@Chris Haug But the mean is always between 0 and 1. Can I still use Chebyshev's inequality? — zero_yu, Aug 05 '17 at 17:58
@linghao It holds regardless of $\lambda$. If you're asking whether you can derive a sharper (smaller) bound for Poisson with $0 \leq \lambda \leq 1$, I don't know. The graph above is computed on a grid, so it doesn't prove that $f(\lambda)$ doesn't reach the Chebyshev bound for some specific value. — Chris Haug, Aug 05 '17 at 18:58
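
A rough sketch of how the case $0 < \lambda \leq 1$ could be examined without a grid. In that range the lower endpoint $\lambda - 2.5\sqrt{\lambda}$ is negative, so $f(\lambda) = P[X > \lambda + 2.5\sqrt{\lambda}]$. Between the points where the upper endpoint crosses an integer $k$, the outliers are exactly $X \geq k$ and $P[X \geq k]$ increases with $\lambda$, so the largest values are approached just below the crossing points solving $\lambda + 2.5\sqrt{\lambda} = k$. The names `k`, `lambda_star`, `sup_f` and `f_at_one` are illustrative.

```r
# Crossing points: lambda + 2.5*sqrt(lambda) = k, a quadratic in sqrt(lambda)
k <- 1:3   # integers crossed by the upper endpoint for lambda in (0, 1]
lambda_star <- ((-2.5 + sqrt(2.5^2 + 4 * k)) / 2)^2

# Just below lambda_star[k] the outliers are X >= k, so f approaches P[X >= k]
sup_f <- 1 - ppois(k - 1, lambda_star)

# On the last piece (lambda_star[3], 1] the outliers are X >= 4, largest at lambda = 1
f_at_one <- 1 - ppois(3, 1)

data.frame(k, lambda_star, sup_f)
max(sup_f, f_at_one) < 1 / 2.5^2   # compare with the Chebyshev bound
```

Under these assumptions the largest value is approached just below $\lambda \approx 0.123$, where $f(\lambda)$ is close to $1 - e^{-0.123} \approx 0.116$: below the Chebyshev bound of 0.16, but above 5%.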
