The data I'm trying to analyze are the quadratic estimates from a quadratic fit to a curve. Most of the values lie between -0.15 and 0.15, but I have outliers in both directions, as extreme as -23.00. Outlier analysis and removal have been deemed insufficient as the primary coping method, so I need to transform the data. I wanted to use a log transformation, but adding a constant to the data seems sketchy (it would have to be a large constant to cover all the extreme outliers).

Any suggestions as to what kinds of transformations might work here? I need normally distributed data in the end so that I can run ANOVAs.
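
To make the concern about the constant concrete, here is a minimal sketch of what `log(x + c)` does to data shaped like this. The values, the constant `shift = 24.0`, and the helper names are all made up for illustration; they are not the real data. The sketch measures how far each outlier sits from the bulk, in units of the bulk's range, before and after the transform.

```python
import math

def shifted_log(values, shift):
    """Return log(x + shift) for each x; every x + shift must be positive."""
    if any(x + shift <= 0 for x in values):
        raise ValueError("shift too small: log needs strictly positive input")
    return [math.log(x + shift) for x in values]

def distance_in_bulk_widths(point, bulk_values):
    """Distance from point to the bulk midpoint, in units of the bulk's range."""
    lo, hi = min(bulk_values), max(bulk_values)
    return abs(point - (lo + hi) / 2) / (hi - lo)

# Hypothetical values: a bulk within [-0.15, 0.15] plus one outlier per side.
bulk = [-0.15, -0.05, 0.0, 0.05, 0.15]
outliers = [-23.0, 23.0]
shift = 24.0  # assumed constant, just large enough to make -23.00 positive

t_bulk = shifted_log(bulk, shift)
t_outliers = shifted_log(outliers, shift)

# Map each raw outlier to (distance before, distance after the transform).
report = {}
for raw, transformed in zip(outliers, t_outliers):
    before = distance_in_bulk_widths(raw, bulk)
    after = distance_in_bulk_widths(transformed, t_bulk)
    report[raw] = (before, after)
    print(f"outlier {raw:+.2f}: {before:.1f} bulk-widths before, {after:.1f} after")
```

With these made-up numbers the positive outlier is pulled toward the bulk, but the negative one ends up relatively farther away, because the bulk itself is squeezed into a very narrow interval once a large constant is added.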

kjetil b halvorsen
Michelle
  • Why transform, rather than do something that doesn't require normality? What are you trying to find out from the ANOVA? – Glen_b Jun 26 '14 at 01:44
  • Basically because a reviewer on a manuscript specifically asked that I transform the data, and it's best to do what they ask. The ANOVA is dead simple: a one-way ANOVA with 3 levels, and the question is whether the average quadratic estimate varies among conditions; more specifically, whether the control condition differs from the two treatment conditions and whether the two treatment conditions differ from each other. – Michelle Jun 26 '14 at 15:47
  • Actually, the reviewer specifically asked for a log transform, but may not have realized that I have negatives. I could just add the constant but I'm not very familiar with transformations and am having trouble figuring out whether adding a constant is considered appropriate or not. – Michelle Jun 26 '14 at 16:00
  • Whether it might make sense to add a constant depends on the shape of the distribution and what you're trying to achieve, but from your description adding a constant *won't* help with the left-hand side. In any case, to figure out whether adding a constant would help you get your paper accepted, you would need to know what was in the mind of the reviewer. ...(ctd) – Glen_b Jun 27 '14 at 00:11
  • (ctd) ... Since the reviewer has asked for the impossible, you would appear to have some leverage (with the editor) to argue to do something that is both more sensible and actually possible, rather than go further down the rabbit hole of trying to make your method fit a reviewer's flawed understanding of the data. Do you know why the reviewer specifically wanted you to take logs? – Glen_b Jun 27 '14 at 00:14
  • No, I don't know why they specifically suggested logs, but the spirit of their point was that just using an outlier approach eliminated a number of data points and they wanted to see the full data set analyzed. It seems a little crazy to keep data points that are 4+ SDs away from the mean, but I do need to address the point. You're right about the leverage though. I suppose I could just use Kruskal-Wallis and explain why. – Michelle Jun 27 '14 at 17:12
  • The title of the thread just suggested is more specific than the answers, which cover outcomes that include large positive and large negative values. – Nick Cox Oct 02 '20 at 01:10
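
The Kruskal-Wallis option raised in the comments can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the three groups are invented, the function names are hypothetical, and the p-value uses the usual chi-square approximation, which for 3 groups (2 degrees of freedom) reduces to `exp(-H / 2)` and may be rough for small samples.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_3(groups):
    """Tie-corrected H statistic and approximate p-value for exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles 3 groups (df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function with df = 2
    return h, p

# Hypothetical data: one control and two treatment groups, with extreme values.
control = [-0.10, -0.02, 0.03, 0.08, -23.0]
treat_a = [0.05, 0.11, 0.14, 0.09, 0.12]
treat_b = [-0.12, -0.08, -0.14, 19.5, -0.06]

h_full, p_full = kruskal_wallis_3([control, treat_a, treat_b])

# Shrinking the extreme value without changing its rank leaves H unchanged.
milder_control = [-0.2 if x == -23.0 else x for x in control]
h_mild, p_mild = kruskal_wallis_3([milder_control, treat_a, treat_b])

print(f"H = {h_full:.3f}, p = {p_full:.4f}")
print(f"same H with a milder outlier: {h_full == h_mild}")
```

Because the test uses only ranks, every observation stays in the analysis and the size of an extreme value does not matter beyond its rank. The specific contrasts (control versus each treatment, treatment versus treatment) would still need separate follow-up comparisons, which this sketch does not cover.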
