For example: (image of the data distribution, not reproduced here)

The distribution looks like a Poisson distribution. If data beyond mean + 2.5*standard deviation are treated as outliers, will the proportion of outliers be larger or smaller than 5%? (where 5% is the figure for a normal distribution) Thanks!

zero_yu
  • If the data *really* comes from a Poisson distribution, it depends on the value of the parameter, but it can be worked out. If the data just *looks like* a Poisson, then it depends on the actual distribution it comes from. If you specifically want to look at the 5% of observations that are most outlying, why not do that directly by computing the empirical quantiles instead of making assumptions about the distribution? – Chris Haug Aug 04 '17 at 18:35
  • @Chris Haug I don't want to specify that 5% are outliers. I'm just wondering whether the proportion of outliers detected by mean + 2.5*standard deviation for a Poisson distribution is larger or smaller than 5%. – zero_yu Aug 04 '17 at 19:09
  • Consider [this](https://stats.stackexchange.com/q/56402/603) answer. – user603 Aug 05 '17 at 09:03

1 Answer

If $X \sim \text{Poisson}(\lambda)$, the population equivalent of this probability is:

$$f(\lambda):=1-P[\lambda-2.5\sqrt{\lambda} \leq X \leq \lambda+2.5\sqrt{\lambda}]$$

In general, since a Poisson($\lambda$) variable has mean $\lambda$ and standard deviation $\sqrt{\lambda}$, Chebyshev's inequality gives $f(\lambda) \leq 1/2.5^2 = 0.16$ for all $\lambda$, so the probability is never more than 16%. Whether it is greater or smaller than 5% depends on the value of $\lambda$. Here is some R code that illustrates this:

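    # Chebyshev upper bound on the probability of falling outside mean +/- 2.5 sd
    cheb <- 1/(2.5^2)
    lambdas <- seq(0.0001, 5, length.out = 10000)
    # f(lambda) = 1 - P[lower <= X <= upper]; ppois(ceiling(lower) - 1, l) is P[X < lower],
    # so probability mass sitting exactly on the lower bound counts as inside the interval
    fl <- sapply(lambdas, function(l) {
      lower <- l - 2.5*sqrt(l)
      upper <- l + 2.5*sqrt(l)
      1 - (ppois(upper, l) - ppois(ceiling(lower) - 1, l))
    })

    plot(lambdas, fl, ylim=c(0,0.2), ylab="", main="Probability of falling outside bounds")
    abline(h=cheb, col="red")
    abline(h=0.05, col="blue")
    legend("topright", legend = c("f(lambda)","5%","Chebyshev bound"), col=c("black","blue","red"), lty=c(0,1,1), pch=c(1,NA,NA))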

And the result below:

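(plot "Probability of falling outside bounds": $f(\lambda)$ against $\lambda$, with horizontal lines at 5% and at the Chebyshev bound; image not reproduced here)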
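To make the dependence on $\lambda$ concrete, here is a short worked calculation. It is only a sketch that follows from the definition of $f(\lambda)$ above. For $\lambda < 6.25$ the lower bound $\lambda - 2.5\sqrt{\lambda}$ is negative, so only the upper tail contributes:

$$f(\lambda) = P[X > \lambda + 2.5\sqrt{\lambda}] = 1 - \sum_{k=0}^{\lfloor \lambda + 2.5\sqrt{\lambda} \rfloor} \frac{e^{-\lambda}\lambda^{k}}{k!}$$

Two sample values:

$$f(0.1) = 1 - e^{-0.1} \approx 0.095, \qquad f(1) = 1 - e^{-1}\left(1 + 1 + \tfrac{1}{2} + \tfrac{1}{6}\right) \approx 0.019$$

For $\lambda = 0.1$ the upper bound is about $0.89$, so every nonzero count is flagged and the proportion is above 5%. For $\lambda = 1$ the upper bound is $3.5$, so only counts of 4 or more are flagged and the proportion is below 5%.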

For very large $\lambda$, it tends to the same value as for the normal, which is not 5%, as you claim, but $2(1-\Phi(2.5)) \approx 1.24\%$.
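The normal value can be checked directly with `pnorm`. The snippet below is a minimal sketch, with illustrative variable names that are not part of the original code. It also prints the one-sided figure, in case only values above mean + 2.5*standard deviation are counted as outliers.

```r
# Number of standard deviations used for the outlier rule.
k <- 2.5

# Two-sided: probability of falling more than k sd from the mean
# on either side of a normal distribution.
p_two_sided <- 2 * pnorm(-k)

# One-sided: probability of falling more than k sd above the mean only.
p_upper <- pnorm(-k)

print(c(two_sided = p_two_sided, upper_only = p_upper))
```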

What you are suggesting with regard to "outliers" involves estimating $\lambda$ from data, so the empirical coverage of this interval may differ from the one that assumes the true $\lambda$, but the point still stands that the answer to your question is "it depends on the parameter of the Poisson distribution".
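One way to see how much estimation matters is a small simulation. This is a rough sketch only: the values of `lambda`, `n` and `n_rep` are assumptions chosen for illustration and are not taken from the question's data. It compares the average proportion flagged by the sample mean and sample standard deviation with the population value computed from the true $\lambda$.

```r
# Hypothetical settings, chosen only for illustration.
set.seed(1)
lambda <- 0.5   # assumed true Poisson parameter
n      <- 200   # assumed sample size
n_rep  <- 2000  # number of simulated samples

# For each simulated sample, the proportion of observations outside
# sample mean +/- 2.5 * sample standard deviation.
flag_rate <- replicate(n_rep, {
  x <- rpois(n, lambda)
  m <- mean(x)
  s <- sd(x)
  mean(x < m - 2.5 * s | x > m + 2.5 * s)
})

# Population value f(lambda), computed with the true parameter.
lower <- lambda - 2.5 * sqrt(lambda)
upper <- lambda + 2.5 * sqrt(lambda)
f_true <- 1 - (ppois(upper, lambda) - ppois(ceiling(lower) - 1, lambda))

print(c(simulated = mean(flag_rate), population = f_true))
```

Rerunning with other values of `lambda` or `n` shows how far the two numbers can drift apart.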

Chris Haug
  • You seem to conflate $\lambda$ and $\hat\lambda$ – user603 Aug 04 '17 at 21:03
  • @user603 Thanks, I've tried to clarify what I meant – Chris Haug Aug 04 '17 at 21:49
  • @Chris Haug But the mean is always between 0 and 1. Can I still use Chebyshev's inequality? – zero_yu Aug 05 '17 at 17:58
  • @linghao It holds regardless of $\lambda$. If you're asking whether you can derive a sharper (smaller) bound for Poisson with $0 \leq \lambda \leq 1$, I don't know. The graph above is computed on a grid, so it doesn't prove that $f(\lambda)$ doesn't reach the Chebyshev bound for some specific value. – Chris Haug Aug 05 '17 at 18:58