
For my research I conducted a creativity test and measured the number of ideas each subject produced. Some subjects are extreme outliers: they had either a lot of ideas or only 1 or 2. My intuition is to trim 5% of my data to obtain more robust estimates for my regression analyses.

  1. Is this justifiable?

  2. Is there a paper that can be cited confirming that 5% trimming is a good approach given skewed data?

user670186
  • Have you looked at whether another type of distribution matches your data, e.g. Poisson or binomial? I'm assuming from your question you've noticed that your data is not normally distributed and haven't investigated alternative distributions. – Michelle Feb 13 '12 at 05:18

1 Answer


The trimmed mean answers a different question than the mean: "What is a typical value for the distribution?" If you use the trimmed distribution, you are explicitly stating that you are not interested in the outliers or the tails of the distribution. If you believe that the "outliers" really are outliers (i.e., they do not belong to the distribution but are of "another kind"), then trim. If you think they belong to the distribution but you want a less skewed distribution, you could consider winsorising.
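To make the difference between trimming and winsorising concrete, here is a minimal sketch in plain Python. The idea counts are made up for illustration, the function names are hypothetical, and "5%" is assumed to mean 5% cut from each tail (some software defines the proportion differently, so check before comparing numbers).

```python
from statistics import mean


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the observations from each tail."""
    data = sorted(values)
    k = int(proportion * len(data))  # observations removed per tail
    return mean(data[k:len(data) - k])


def winsorized_mean(values, proportion=0.05):
    """Mean after replacing each tail's extremes with the nearest kept value."""
    data = sorted(values)
    n = len(data)
    k = int(proportion * n)  # observations replaced per tail
    if k == 0:
        return mean(data)
    clipped = [data[k]] * k + data[k:n - k] + [data[n - k - 1]] * k
    return mean(clipped)


# Hypothetical idea counts for 20 subjects, including a few extreme values.
idea_counts = [1, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 10, 10, 11, 30, 45]

print("mean:           ", mean(idea_counts))
print("trimmed mean:   ", trimmed_mean(idea_counts))
print("winsorized mean:", winsorized_mean(idea_counts))
```

Trimming discards the extreme subjects entirely, while winsorising keeps the sample size and only pulls the extremes in to the nearest retained value, which is why it suits the case where the extreme subjects are believed to belong to the distribution.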

Besides that, I would not recommend entering a trimmed data set into an OLS regression. It would be better to use a robust regression technique right away, e.g.:

  • Theil-Sen Regression
  • Least trimmed squares
  • Regression based on MM-estimators

All of these techniques can be computed with the WRS package in R.
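As a rough illustration of why a robust estimator helps, the sketch below implements the simplest of the three, Theil-Sen, in plain Python: the slope is the median of all pairwise slopes and the intercept is the median of the residual offsets. This is not the WRS implementation; the function names and the data are hypothetical, and an ordinary least-squares slope is included only for contrast.

```python
from itertools import combinations
from statistics import mean, median


def theil_sen(x, y):
    """Return (slope, intercept) of the Theil-Sen line for paired data."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # pairs with equal x have no defined slope
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def ols_slope(x, y):
    """Ordinary least-squares slope, shown only for comparison."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: y = 2x + 1, except for one grossly outlying response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 7, 9, 11, 13, 15, 60]

slope, intercept = theil_sen(x, y)
print("Theil-Sen slope and intercept:", slope, intercept)
print("OLS slope:", ols_slope(x, y))
```

With these numbers the single outlier pulls the OLS slope well above 2, while the Theil-Sen fit recovers the line followed by the other seven points, and no observation had to be removed beforehand.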

A good start for robust statistics is Wilcox's book.

Wilcox, R. R. (2010). Fundamentals of modern statistical methods: Substantially improving power and accuracy. Springer Verlag.

user603
Felix S