
Suppose I have a sample of size $N$ obtained experimentally; e.g., I have counted the number of birds at a certain location at a certain time.

Now suppose that the sample (the bird counts) comes from a normal distribution with unknown mean and standard deviation.

Looking at the data, I find two peculiar observations: one is so ridiculously large that I know instantly there are not that many birds on our planet, and the other is negative, which would suggest the existence of anti-matter with wings.

So, given the troublesome data, how could I show, at the 95% confidence level, that these two peculiar values are outliers?

My gut feeling is that when the sample size is small, such extreme observations should be unlikely. As the sample size grows, more extreme observations will be seen, but values that are too extreme should still not appear.
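One way to make this gut feeling concrete is a sketch that assumes, unrealistically, that the true mean and standard deviation are known. For $N$ independent standard normal values, $P(\max_i |Z_i| \le c) = (2\Phi(c) - 1)^N$, so the cutoff that the largest standardized value exceeds only 5% of the time can be solved for directly. The code uses Python's standard-library `statistics.NormalDist`; the function name `max_abs_z_threshold` is hypothetical. When the mean and standard deviation must be estimated from the same data, the usual adjustment is Grubbs' test, which this sketch does not implement.

```python
from statistics import NormalDist


def max_abs_z_threshold(n: int, alpha: float = 0.05) -> float:
    """Return c such that P(max_i |Z_i| > c) = alpha for n iid standard normal Z_i.

    Hypothetical helper. Assumes the true mean and standard deviation are
    known, so every observation can be standardized exactly (an idealization).
    """
    # P(max |Z_i| <= c) = (2*Phi(c) - 1)**n; set this equal to 1 - alpha.
    per_obs_coverage = (1 - alpha) ** (1 / n)
    # Solve 2*Phi(c) - 1 = per_obs_coverage for c.
    return NormalDist().inv_cdf((1 + per_obs_coverage) / 2)


if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        print(n, round(max_abs_z_threshold(n), 2))
```

The cutoff grows with $N$, but only slowly (roughly like $\sqrt{2\ln N}$), which matches the intuition above: larger samples admit more extreme values, but not arbitrarily extreme ones.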

birdy
  • See [here](http://stats.stackexchange.com/a/121075/603) – user603 Feb 13 '15 at 21:35
  • When the subject matter tells you a value is impossible, there is no need (or usefulness) in testing that value statistically. The most important thing is to spend a little time trying to find out how such a value made it into your database: sometimes huge systematic problems are revealed by one tiny little discrepancy. For instance, if it was an error of human data entry, then likely there are *other* such errors, but perhaps of not so obvious a magnitude. – whuber Feb 13 '15 at 22:05
  • May I treat such observations as missing data? – birdy Feb 13 '15 at 23:09
  • That's an interesting idea. I think it depends. For instance, if all such corrupted values arise from a common cause, you would have to be very careful even about removing them. As a hypothetical example, suppose the presence of a "-" caused the processing to be erroneous and neglect that character. All the negative values would be increased, thereby biasing the dataset. Even if you deleted them or treated them as missing at random, the analysis would be biased. (I experienced something like this once where the initial digit of any value above 999999 was silently removed...) – whuber Feb 13 '15 at 23:53
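whuber's hypothetical about a dropped "-" can be simulated to see why neither keeping the corrupted values nor deleting them removes the bias. This is a sketch with made-up parameters (normal values with mean 0.5 and standard deviation 1, 10,000 draws); it is not real bird data, and the corruption mechanism is the hypothetical one from the comment.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical true values: mean 0.5, sd 1, so a sizable share is negative.
true_values = [random.gauss(0.5, 1.0) for _ in range(10_000)]

# Hypothetical corruption: the "-" sign is silently dropped,
# so every negative value becomes positive.
corrupted = [abs(x) for x in true_values]

# "Fix" by deleting the affected records. This is possible here only
# because the simulation knows which values were originally negative.
deleted = [x for x in true_values if x >= 0]

print("true mean      ", round(mean(true_values), 3))
print("sign dropped   ", round(mean(corrupted), 3))
print("records deleted", round(mean(deleted), 3))
```

Both repaired versions overstate the mean, because the corruption affected only one part of the distribution, so the affected records are not missing at random.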

0 Answers