
The dependent variable of my problem is highly concentrated around zero. Here is a stemplot:

  The decimal point is 6 digit(s) to the right of the |

  -2 | 511
  -1 | 92221
  -0 | 87777666666665555555555555555444444444444444444444444444444433333333+2428
   0 | 00000000000000000000000000000000000000000000000000000000000000000000+2113
   1 | 00000000000000000000000000000000000011111111111111111111111111111122+110
   2 | 0000000000111111233333444444555666666677778889
   3 | 00112457778889
   4 | 11233456999
   5 | 0000389
   6 | 01477
   7 | 259
   8 | 033
   9 | 002356
  10 | 9
  11 | 
  12 | 
  13 | 069
  14 | 
  15 | 
  16 | 
  17 | 13

Normally, when I have a dependent variable (DV) that looks like this, I apply a logarithmic transformation, for reasons both economic and mathematical. But obviously this will not work here, because I have values below zero.

Is there another monotonically increasing transformation function that will reduce the peakedness of this distribution?

Glen_b
gregmacfarlane
  • Cube root. But that means `sign(x) * abs(x)^(1/3)` or the equivalent in your favourite language. – Nick Cox Aug 19 '13 at 17:45
  • Why would you transform your DV? It sort of sounds like your reason is "because that's what I'd do if the circumstances were different" ... which doesn't seem like an especially good reason. What's a good reason to transform this particular DV? What does it achieve? – Glen_b Aug 19 '13 at 17:47
  • @Glen_b in my case, I care more about the elasticity with respect to my IV than the effect. Extreme values in the DV can interfere with this. – gregmacfarlane Aug 19 '13 at 17:51
  • Please excuse some level of ignorance on my part with respect to what you're trying to achieve. Is the issue that you estimate elasticity in a way that isn't robust to y-values that are a number of sd's from the mean? – Glen_b Aug 19 '13 at 17:56
  • Elasticity calculation depends on log scale, doesn't it? I don't think any transformation will help there, although a GLM with log link might help. Also, although I gave a specific suggestion, @Glen_b's question of why do you (think you) want to do this is much more fundamental. – Nick Cox Aug 19 '13 at 17:59
  • In my case the DV is a score that I constructed from a series of approximately log-normal distributions: home prices in a ten year period. Some homes lose value, and some homes gain value. But more valuable homes have more to gain or to lose, so I'm right now trying to figure out how robust my analysis is to different specifications. If my IV estimates change sign when I rescale the DV, I'll know that I need to reconsider my DV more fundamentally. I'd actually love to hear your perspectives on this, but it may need to be a new question. – gregmacfarlane Aug 19 '13 at 18:06
  • GLM with log link does not require the data to be positive, just the mean. But if I understand your plot correctly, a strong majority of your data are zero or negative. Is that consistent with elasticity in the sense of http://en.wikipedia.org/wiki/Elasticity_(economics) ? Or are you using a different definition? – Nick Cox Aug 19 '13 at 18:16
  • See also http://meta.stats.stackexchange.com/questions/1707/is-it-reasonable-to-put-try-to-avoid-acronyms-in-the-help-on-asking-questions – Nick Cox Aug 19 '13 at 18:17
  • Since you have values below zero, why not translate your data by subtracting the minimum, and then apply whatever algorithm/transform you like? – Ahmad Hassanat Mar 19 '18 at 10:45
  • If zero makes substantive sense, as it usually does, then that is lost by this procedure. Worse, studies with different empirical minima are hard to compare. – Nick Cox Oct 13 '18 at 11:32
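
The signed cube root that Nick Cox suggests in the comments above can be sketched in Python as follows. This is only an illustration, not code from the thread: the function name `signed_cube_root` and the sample `values` are made up for the example, with magnitudes chosen to resemble the stemplot's scale.

```python
import math


def signed_cube_root(x: float) -> float:
    """Return sign(x) * |x|**(1/3), which is defined for every real x."""
    # abs(x) ** (1/3) is always a non-negative real number;
    # copysign then restores the sign of the original value.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Hypothetical DV values (assumed, not taken from the question's data),
# listed in increasing order.
values = [-2.5e6, -4.0e5, 0.0, 1.0e6, 8.0e6, 1.7e7]
transformed = [signed_cube_root(v) for v in values]

for v, t in zip(values, transformed):
    print(f"{v:>12.1f} -> {t:>8.2f}")
```

The transformation is monotonically increasing, keeps zero at zero, and preserves the sign of each value, so it avoids the shift-by-the-minimum problem raised in the last comment. As the comments also point out, it does not by itself give coefficients an elasticity interpretation the way a log scale does.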

0 Answers