I have transformed my variables using the ln function in Stata to address violations of the assumptions of the linear regression model. The transformation resolved most of the issues, but the data still appear to be negatively skewed, which results in a significant IM test, as shown below.

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |       8.42      7    0.2968
            Skewness |      17.92      3    0.0005
            Kurtosis |       0.51      1    0.4735
---------------------+-----------------------------
               Total |      26.86     11    0.0048

I have previously tried mboxcox to find appropriate transformations (my data contain zeros, so I had to add 1). It suggests nothing suitable apart from the square and cube roots of the variables, which are undesirable because they complicate interpretation.

Should I be bothered about this skewness issue? Skewness is approx -0.7.

Cesare Camestre
  • What assumption of the linear regression model (I assume that you're referring to OLS) is violated by skewness? – abaumann May 29 '13 at 15:52
  • The skewness is affecting the normality of the residuals, and also the IM-test above. – Cesare Camestre May 29 '13 at 21:04
  • Sounds to me like you have an identification problem, then. – abaumann May 29 '13 at 21:17
  • I tried various transformations. boxcox won't work because of the zeros, and even when I take them into account the Box-Cox method suggests the square or cube root, which could lead to problems in interpretation, as indicated above. – Cesare Camestre May 29 '13 at 21:39
  • I don't see that square or cube root is more difficult to interpret than log(something + 1). Note that a skewness of 0.7 is positive, not negative. – Nick Cox May 29 '13 at 22:39
  • I do apologise; the skewness is -0.7, hence negative. Yes, it would be too complex to have the independent variable as a square root and the dependent variable as a cube root (I used mboxcox). – Cesare Camestre May 30 '13 at 08:55
  • I ran mboxcox again, and it suggests powers of 0.4 and 0.1 for the independent variable, so even worse than if we had to present the square or cube roots! The interpretation would be complex. Also, to run mboxcox I had to add 1 to the age where it was 0. – Cesare Camestre May 30 '13 at 09:12
  • Anyone interested in following this might note that OP re-posted at http://stats.stackexchange.com/questions/60431/mboxcox-interpreting-difficult-regressions – Nick Cox May 30 '13 at 16:08

2 Answers

1) OLS regression assumes that the residuals are normally distributed, not that the variables are.

2) A skew of 0.7 is pretty minor.

3) The idea of "adding 1" to take the log has problems. Why 1? Why not .1? Or .01? Or, for that matter, 100? These would give different results, perhaps substantially so.

4) Often, transformation isn't the right solution to such problems. You could try robust regression, for instance.

Finally, why don't you tell us more about the details of your problem so that we can help more? What are your DV and IVs? What is your N? What are your hypotheses? What are you trying to find out?
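Point 3 can be checked numerically. The following is a minimal sketch in Python rather than Stata, using made-up ages (the `ages` list and both helper functions are illustrative assumptions, not the poster's data). It computes the moment-based skewness of log(age + c) for several constants c, to show how much the choice of c can matter.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_shift_skewness(values, shift):
    """Skewness of log(value + shift); shift must make every argument positive."""
    return skewness([math.log(v + shift) for v in values])


# Hypothetical ages in years, including zeros (investment at incorporation).
ages = [0, 0, 1, 2, 3, 5, 8, 12, 20, 35]

for shift in (0.01, 1.0, 100.0):
    s = log_shift_skewness(ages, shift)
    print(f"shift = {shift:>6}: skewness of log(age + shift) = {s:.3f}")
```

With these illustrative values, a small constant pushes the zeros far into the left tail and the skewness comes out negative, while a large constant leaves the original right skew almost untouched. The sign of the skewness therefore depends on an arbitrary choice.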

Peter Flom
  • 1) Agreed. 2) Before the log transformation it was much larger. 3) If I don't add 1, logs can't be used. 4) The log transformation, coupled with regression using robust standard errors, solves most of the problems apart from this skewness. 5) It's a small dataset of approx. 700 investments; the dependent variable is the investment in millions of pounds, one of the independent variables is age (which could be 0), and there are a number of binary variables. – Cesare Camestre May 29 '13 at 21:07
  • So just to clarify, age could be 0 because the investment is made at the incorporation date. I suspect that adding 1 to all of the ages would do no harm, but how would I know? – Cesare Camestre May 29 '13 at 21:37
  • 1) may make the rest of your question moot. 2) OK 3) So, do something else. 4) OK 5) Size of data set is irrelevant. I would try robust regression. – Peter Flom May 30 '13 at 10:14
  • 3) such as? Root does not solve the problem. – Cesare Camestre Jun 05 '13 at 14:53
  • Such as robust regression. – Peter Flom Jun 05 '13 at 17:08
  • And the question remains: how will I know whether robust regression has solved the problem? – Cesare Camestre Jun 06 '13 at 08:53
  • You would have to look at the literature; it's too big a topic for a comment or even an answer, but generally, this is the sort of problem it is designed to solve – Peter Flom Jun 06 '13 at 10:19

So your problem is you have a skewed dependent variable, you want to apply the log transformation but your data contains 0s, and you want to be able to interpret the results. This sounds to me like a good case for not transforming the dependent variable but instead using the log link function. Details can be found here: http://blog.stata.com/2011/08/22/
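To make the log-link idea concrete, here is a minimal sketch in Python rather than Stata. It fits E[y | x] = exp(b0 + b1*x) by applying Newton's method to the Poisson score equations, which only requires y to be non-negative, not a count. The function name `fit_log_link` and the noise-free data are illustrative assumptions, not the poster's investments.

```python
import math


def fit_log_link(x, y, max_iter=50, tol=1e-10):
    """Fit E[y | x] = exp(b0 + b1 * x) via Newton steps on the Poisson score."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: sum of (y - mu) times each regressor (1 and x).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix: sum of mu times the outer product of (1, x).
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * delta = g by Cramer's rule.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: ages with zeros, and a continuous (non-count) outcome
# generated without noise from exp(0.5 + 0.2 * age).
ages = [0, 0, 1, 2, 3, 5, 8, 12]
investment = [math.exp(0.5 + 0.2 * a) for a in ages]

b0, b1 = fit_log_link(ages, investment)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Neither variable is transformed, so zeros in age cause no difficulty, and b1 reads as the approximate proportional change in the expected outcome per unit of age. In Stata the analogous fit would typically come from poisson with the vce(robust) option, which is what the linked post discusses.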

Maarten Buis
  • Poisson regression is not relevant here because the dependent variable is not a count. The independent variables are the ones with 0s. – Cesare Camestre Jun 05 '13 at 14:53
  • Poisson regression is very relevant even if your dependent variable is not a count, as you can read in the link... – Maarten Buis Jun 05 '13 at 15:21