
I am doing panel data analysis, and some of my variables have high kurtosis. I am not sure whether I have to transform these variables. I have tried deleting outliers, but one of the variables is still not normal unless I delete many observations, which then changes the results so that the model is no longer significant.

Note: this variable has a maximum of around 17,000,000 and a minimum of -5000,000, so I couldn't use a log transform.

Any help is appreciated. Thanks.

Ben
  • Why is it a problem if some variables have high kurtosis? – Glen_b May 19 '14 at 00:45
  • Isn't it? So should I just ignore it? Thanks for the reply. – Ben May 19 '14 at 06:07
  • Neither the x variables nor the raw y-variable are assumed to be normal. There are things [assumed to be normal](http://en.wikipedia.org/wiki/Panel_data#Analysis_of_panel_data), however (and you probably shouldn't ignore them), but it doesn't sound like you looked at anything that would tell you about them. – Glen_b May 19 '14 at 10:01
  • What exactly do you suggest I look at to learn about them? Thanks again. – Ben May 19 '14 at 13:16
  • The residuals should tell you about the $\nu_{it}$ (for which you might use a Q-Q plot, say, or [with some caution](http://stats.stackexchange.com/questions/51718/assessing-approximate-distribution-of-data-based-on-a-histogram/51753#51753), perhaps a histogram), while the individual random effect estimates would tell you about the $\mu_i$ that they estimate (though typically the number of effects is relatively small, making those hard to assess, and many people don't worry too much if those don't look especially normal). – Glen_b May 20 '14 at 01:20
  • Sorry, but I didn't understand how looking at a Q-Q plot or histogram can reduce the kurtosis. My N=100 and T=5 years, that is, 500 cases. – Ben May 20 '14 at 17:28
  • The aim was not to reduce the kurtosis of y, since I already explained we didn't make any distributional assumption about y. The point was to look at a Q-Q plot of residuals to see how badly non-normal the residuals were, since we make an assumption about the error term, which the residuals estimate. – Glen_b May 20 '14 at 20:32
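
To make the residual check from the comments concrete, here is a minimal sketch in Python (NumPy only). Everything in it is an assumption for illustration: the data are simulated with N=100 and T=5, the regressor is deliberately heavy-tailed, and the model has a single regressor with unit effects removed by the within transformation. It is not the poster's data or model. It computes the coordinates one would plot in a normal Q-Q plot of the residuals, plus the correlation between them as a rough straightness measure.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
N, T = 100, 5  # panel dimensions from the question; the data are simulated

# Hypothetical data: heavy-tailed regressor, unit effects mu_i, normal errors nu_it
x = rng.standard_t(df=3, size=(N, T)) * 1e6
mu = rng.normal(size=(N, 1))
nu = rng.normal(size=(N, T))
y = 2e-6 * x + mu + nu

# Within transformation: subtract each unit's time mean to remove mu_i
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)

# One-regressor fixed effects slope and its residuals
beta = (x_w * y_w).sum() / (x_w ** 2).sum()
resid = (y_w - beta * x_w).ravel()

def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a normal distribution)."""
    z = (v - v.mean()) / v.std()
    return (z ** 4).mean() - 3.0

# Normal Q-Q coordinates: theoretical quantiles vs sorted standardized residuals
n = resid.size
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
sample = np.sort((resid - resid.mean()) / resid.std())

# Correlation of the Q-Q points; values near 1 indicate a nearly straight plot
qq_corr = np.corrcoef(theoretical, sample)[0, 1]

print("excess kurtosis of x:        ", excess_kurtosis(x.ravel()))
print("excess kurtosis of residuals:", excess_kurtosis(resid))
print("Q-Q correlation of residuals:", qq_corr)
```

In this simulated setup the regressor has high kurtosis while the residuals do not, which is the distinction drawn above: the check concerns the residuals, not the raw variables. Plotting `theoretical` against `sample` with any plotting library gives the Q-Q plot itself.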

1 Answer


You don't need normally distributed variables to do panel data analysis (and you probably shouldn't drop outliers unless you have some reason to think they were measured incorrectly). Fixed effects estimation will consistently estimate the parameters without normality. You may, however, need to correct the standard errors that you compute, and most software packages can do this for you. For example, R's plm package can compute robust standard errors. See Section 3.4 of the vignette: http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf
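
As a rough illustration of the point about standard errors, the sketch below fits a one-regressor fixed effects model by the within transformation and computes both the conventional standard error and a standard error clustered by unit. The data are simulated with heavy-tailed (Student t) errors, and all names and values are assumptions for illustration. The clustered estimator here is a basic sandwich form without small-sample corrections, so it will not match any particular package's output exactly, and it is not the Driscoll-Kraay estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta_true = 100, 5, 0.5  # hypothetical panel size and slope

# Simulated panel: unit effects mu_i and heavy-tailed idiosyncratic errors
x = rng.normal(size=(N, T))
mu = rng.normal(size=(N, 1))
nu = rng.standard_t(df=5, size=(N, T))  # non-normal errors on purpose
y = beta_true * x + mu + nu

# Within transformation removes the unit effects
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)

# Fixed effects (within) estimate of the slope
sxx = (x_w ** 2).sum()
beta_hat = (x_w * y_w).sum() / sxx
resid = y_w - beta_hat * x_w

# Conventional standard error, with degrees of freedom N*T - N - 1
sigma2 = (resid ** 2).sum() / (N * T - N - 1)
se_classic = np.sqrt(sigma2 / sxx)

# Cluster-robust standard error (clusters = units), basic sandwich form
scores = (x_w * resid).sum(axis=1)  # one score per unit
se_cluster = np.sqrt((scores ** 2).sum()) / sxx

print("beta_hat:  ", beta_hat)
print("se_classic:", se_classic)
print("se_cluster:", se_cluster)
```

The slope estimate lands close to the assumed true value even though the errors are not normal; the choice of standard error affects the inference, not the point estimate.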

bmciv
  • That was very helpful, thank you so much @bmciv. Do you have a source where I can read more about this? Most of the panel data articles I've read don't mention the issue of normality. I corrected the standard errors using Driscoll-Kraay fixed effects in Stata. – Ben May 21 '14 at 13:48
  • @Ben The reason most panel data articles don't mention normality is that it is a very strong assumption that turns out not to be needed as long as you have a large number of observations. I don't know of a reference that deals with this for the particular case of panel data models, but you could check out Bruce Hansen's Econometrics textbook [link](http://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics.pdf) and compare the assumptions made in the Normal regression model (Sec. 3.18) to the weaker OLS assumptions (Sec. 4.3). The same reasoning would apply to panel data models. – bmciv May 21 '14 at 15:20