Boxplot Outliers

Question

I'm looking for outliers in a non-normally distributed dataset:

n: 1,900
Mean: 2,738
StDev: 1,544
Min: 1
Max: 22,102
Anderson-Darling: 40
p < 0.005

The boxplot shows outliers in one direction, beyond the upper whisker, but none below the lower whisker. Why is that?

This is answered (in passing) at http://stats.stackexchange.com/a/1153/919 . — whuber, Jun 21 '16 at 02:58

score 2 · Accepted Answer · answered Jun 21 '16 at 07:33

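Your variable is right-skewed and probably bounded to be positive. This is perhaps easiest to see in graphs:

[Figure: strip plots with boxplots of simulated normal, fat-tailed, right-skewed, and left-skewed samples]

You can see that in the skewed graphs the outliers are all on one side. A long right tail puts many points beyond the upper boxplot limit (Q3 + 1.5 IQR), while on the short, bounded side the lower limit (Q1 - 1.5 IQR) usually lies below the smallest value the variable can take, so nothing can fall beyond it.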
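The boxplot rule behind this can be written out. The derivation below assumes the conventional Tukey rule (points more than 1.5 × IQR beyond the quartiles are plotted as outliers), which many programs use by default but a particular package may vary; b is a hypothetical lower bound on the variable. The χ²(2) lines use the exact quartiles of the distribution behind the "right skewed" sample in the simulation, and the last line uses only the summary statistics given in the question.

```latex
\begin{aligned}
&\text{Tukey limits (assumed } k = 1.5\text{):} &&
  L = Q_1 - 1.5\,\mathrm{IQR}, \quad U = Q_3 + 1.5\,\mathrm{IQR}, \quad \mathrm{IQR} = Q_3 - Q_1 \\
&\text{No low outliers if } X \ge b \text{ and:} && L \le b \iff Q_1 - b \le 1.5\,\mathrm{IQR} \\
&X \sim \chi^2_2,\ q_p = -2\ln(1-p): && Q_1 = 2\ln\tfrac{4}{3} \approx 0.575, \quad Q_3 = 2\ln 4 \approx 2.773, \quad \mathrm{IQR} = 2\ln 3 \approx 2.197 \\
& && L = 2\ln\tfrac{4}{3} - 3\ln 3 \approx -2.720 < 0, \quad U = 2\ln 4 + 3\ln 3 \approx 6.068 \\
& && P(X < L) = 0, \quad P(X > U) = e^{-U/2} \approx 0.048 \\
&\text{Question's data:} && \frac{2738 - 1}{1544} \approx 1.77\ \text{SD below the mean}, \quad \frac{22102 - 2738}{1544} \approx 12.54\ \text{SD above the mean}
\end{aligned}
```

So for a χ²(2) variable about 5% of points are expected above the upper limit and none can fall below the lower one. In the question's data the maximum sits about 12.5 standard deviations above the mean while the minimum is less than 2 below it, which indicates the same kind of long right tail.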

For those who are interested: I created that graph in Stata using the following code:

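* stripplot is community-contributed (by Nick Cox); install it once with:
* ssc install stripplot
clear all
set seed 1234567
set obs 4
gen distribution = _n                        // one observation per distribution
label define dist 1 "normal"       ///
                  2 "fat tails"    ///
                  3 "right skewed" ///
                  4 "left skewed"
label values distribution dist
expand 1000                                  // 1,000 draws per distribution
gen x     =  rnormal() if distribution == 1  // standard normal
replace x =  rt(4)     if distribution == 2  // Student's t with 4 df
replace x =  rchi2(2)  if distribution == 3  // chi-squared with 2 df
replace x = -rchi2(2)  if distribution == 4  // mirrored chi-squared

// strip plot of each sample with a boxplot drawn beside it
stripplot x , over(distribution)   ///
              stack width(0.5)     ///
              box(barw(0.2)) iqr   ///
              boffset(-0.3) h(0.5)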
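For readers without Stata, here is a rough Python/NumPy analogue of the simulation above. Instead of plotting, it counts how many points fall below and above the boxplot limits for each distribution. It assumes the conventional 1.5 × IQR rule and NumPy's default percentile interpolation, which may differ slightly from what Stata or stripplot use; the helper name tukey_outliers is hypothetical, and reusing the seed 1234567 does not reproduce Stata's random draws.

```python
import numpy as np


def tukey_outliers(x, k=1.5):
    """Count points below Q1 - k*IQR and above Q3 + k*IQR (hypothetical helper)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    n_low = int(np.sum(x < lower_limit))
    n_high = int(np.sum(x > upper_limit))
    return n_low, n_high


# Same four shapes as the Stata example, 1,000 draws each.
rng = np.random.default_rng(1234567)
n = 1000
samples = {
    "normal": rng.standard_normal(n),
    "fat tails": rng.standard_t(4, size=n),
    "right skewed": rng.chisquare(2, size=n),
    "left skewed": -rng.chisquare(2, size=n),
}

results = {name: tukey_outliers(x) for name, x in samples.items()}
for name, (n_low, n_high) in results.items():
    print(f"{name:>12}: {n_low:3d} below, {n_high:3d} above")
```

For the right-skewed sample the count below the lower limit should be zero and the count above positive; the left-skewed sample shows the mirror image. The normal and t(4) samples usually have a few outliers on both sides.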

Maarten, there is no doubt that the single-, double-, and some triple-digit values in my dataset are outliers. Is it prudent to just manually remove these and re-run the boxplot? — Harper, Jun 21 '16 at 11:36
Outliers aren't necessarily bad. If they are typos, then by all means drop them, but if they are genuine observations, dropping them would be bad. — Maarten Buis, Jun 21 '16 at 14:17
