High kurtosis, skewness and outliers

Question

I am currently working on my master's thesis, which is about excess returns (Sharpe ratio) of Asian REITs. I have just transformed all the data into variables that are ready to use in SPSS. The panel data contain some missing values, because some firms did not have a listing in, e.g., 2002 but did in 2005. In these cases I kept the identifiers (CUSIP codes) and left the missing data blank.

Before I perform a fixed-effects regression (LSDV method), I explore the data for normality, skewness and kurtosis. This is where I get confused, because some of the variables show high kurtosis (38.024) and skewness (5.480). I transformed the variables by taking the natural log (ln) of all of them. The QQ plots and histograms improved, but not for every variable. Was it wise to take the ln at all, given that some of the values are negative? Taking the ln of those values results in empty cells. What else can I do to improve skewness and kurtosis? The square root cannot be taken of negative numbers either and, if I am correct, if one variable is transformed, all variables in the data set have to be transformed.
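To make the two diagnostics and the ln problem concrete, here is a minimal sketch in Python (not SPSS). It uses the plain moment-based definitions of skewness and excess kurtosis; statistical packages often apply small-sample corrections, so their numbers can differ slightly. The function names and the sample values are made up for illustration.

```python
import math

def central_moment(xs, k):
    """k-th central moment, dividing by n (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

def skewness(xs):
    """Moment-based skewness: m3 / m2^1.5."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

def safe_ln(xs):
    """Natural log where it is defined; None for zero or negative values."""
    return [math.log(x) if x > 0 else None for x in xs]

# Hypothetical values: mostly small, one large, one negative.
values = [0.5, 1.0, 1.5, 2.0, 2.5, 40.0, -0.3]
print(skewness(values), excess_kurtosis(values))
print(safe_ln(values))  # the negative entry comes back as None
```

The `None` entries correspond to the empty cells described above: the ln is simply undefined for zero and negative values, so those observations drop out of any analysis that uses the transformed variable.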

When analyzing for outliers I use the ‘outlier labeling rule’ suggested by Tukey et al (1986). In line with what the paper recommends, the G value is set to 2.2. But this results in an awfully large number of outliers. Although outliers are not representative, I am afraid that I would have to delete (in some cases) more than 20 observations, while the average number of observations per variable is about 800. I could relax the ‘G’ value. Is this correct?
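For reference, the labeling rule flags any value below Q1 - g(Q3 - Q1) or above Q3 + g(Q3 - Q1). The sketch below is in Python rather than SPSS and assumes the "inclusive" quartile definition from the standard library; other quartile definitions move the fences slightly. The function names and the sample data are hypothetical.

```python
import statistics

def outlier_fences(xs, g=2.2):
    """Lower and upper fences: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - g * spread, q3 + g * spread

def label_outliers(xs, g=2.2):
    """Return the values that fall outside the fences for the given g."""
    lower, upper = outlier_fences(xs, g)
    return [x for x in xs if x < lower or x > upper]

# Made-up data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(outlier_fences(data))
print(label_outliers(data))
```

A larger g widens the fences, so it can only flag the same number of points or fewer; a smaller g flags more.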

Thanks in advance

Henk

Regression does not require that *data* be normal, it requires that residuals be normal. — Peter Flom, Apr 17 '13 at 21:14
possible duplicate of [Regression on a non-normal dependent variable](http://stats.stackexchange.com/questions/11256/regression-on-a-non-normal-dependent-variable) — Peter Flom, Apr 17 '13 at 21:15
It's not clear to me if you are talking about your *response variable* being possibly non-normal, or your *explanatory variables*. 1st, @PeterFlom is right that the marginal distribution of your response is irrelevant (it may help you to read my answer here: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262//33320#33320)). 2nd, regression makes no assumptions about the distributions of your X variables (although you may worry that some observations have excess leverage & could end up driving your results). — gung - Reinstate Monica, Apr 17 '13 at 21:39
The returns (R), which eventually form the response variable, are not normally distributed. But this is in line with the results of Liow and Sim (2006). The skewness and kurtosis values do not deviate that significantly from normality. The actual response variable, however, the Sharpe ratio ((R1-Rf)/st. dev), has a kurtosis ranging from 11.120 to 239.436 and a skewness of -3.002 to -14.975. But for clarification, in my question I was referring to the explanatory variables. Out of 7 explanatory variables, 3 show signs of non-normality, even after taking the ln. — Henk, Apr 18 '13 at 11:00
To be a bit more explicit: I want to find out which variables explain the risk-adjusted returns (Sharpe ratio). — Henk, Apr 18 '13 at 11:09
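The point raised in the comments, that the normality assumption concerns the residuals rather than the raw variables, can be illustrated with a small sketch. This is a single-predictor ordinary least squares fit in plain Python, not the LSDV panel model from the question, and all data values are invented for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys):
    """Observed y minus fitted y for each observation."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Invented data: a strongly right-skewed predictor and small symmetric noise.
xs = [1, 1, 1, 2, 2, 3, 3, 20]
noise = [0.5, -0.5, 0.2, -0.2, 0.1, -0.1, 0.3, -0.3]
ys = [2 + 3 * x + e for x, e in zip(xs, noise)]

res = residuals(xs, ys)
print(min(ys), max(ys))    # y inherits the skew of x
print(min(res), max(res))  # residuals stay small and centred on zero
```

Here both x and y are far from normal, yet the residuals are small and roughly symmetric, and it is the residuals that the diagnostic plots should examine. The extreme x value does carry a lot of leverage, which is the separate concern mentioned in the comments.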
