I am currently working on my master's thesis, which is about the excess returns (Sharpe ratio) of Asian REITs. I have just transformed all the data into variables that are ready to use in SPSS. The panel data contain some missing values, because some firms did not have a listing in, e.g., 2002 but did in 2005. In those cases I kept the identifiers (CUSIP codes) but left the data cells blank.
Before I run a fixed-effects regression (LSDV method), I am exploring the data for normality, skewness and kurtosis. This is where I get confused, because some of the variables show high kurtosis (38.024) and skewness (5.480). I transformed the data by taking the natural log (ln) of all variables. The Q-Q plots and histograms improved for some variables, but not for all. But was it wise to take the ln at all, given that some of the values are negative? Taking the ln of those values results in empty (missing) values. What else can I do to reduce skewness and kurtosis? The square root cannot be taken of negative numbers either and, if I am correct, if one variable is transformed, all variables in the data set have to be transformed.
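To make the problem concrete, here is a minimal sketch in Python (not SPSS) of what I described. The numbers in `sample` are made up, the function names are my own, and the skewness and kurtosis are the simple moment-based versions, so they will differ slightly from the bias-adjusted values that SPSS reports.

```python
import math


def safe_ln(value):
    """Natural log, or None where it is undefined (missing or non-positive input)."""
    if value is None or value <= 0:
        return None
    return math.log(value)


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis, ignoring missing (None) values.

    Assumption: these are the plain population-moment formulas, not the
    bias-adjusted ones used by SPSS.
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = sum(observed) / n
    m2 = sum((v - mean) ** 2 for v in observed) / n
    m3 = sum((v - mean) ** 3 for v in observed) / n
    m4 = sum((v - mean) ** 4 for v in observed) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical values for one variable; None marks a firm-year without a listing.
sample = [-0.4, 0.1, 0.2, None, 0.3, 2.5]

# The negative value becomes missing after the log, on top of the original gap.
logged = [safe_ln(v) for v in sample]
lost = sum(1 for before, after in zip(sample, logged)
           if before is not None and after is None)

print(skewness_and_excess_kurtosis(sample))
print(logged, "observations lost to the log:", lost)
```

So every negative observation drops out of the analysis once the ln is applied, which is exactly what worries me.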
When checking for outliers I use the ‘outlier labeling rule’ suggested by Tukey et al. (1986). In line with what the paper recommends, the ‘G’ value is set to 2.2. But this results in an awfully large number of outliers. Although outliers are not representative, I am afraid that I would have to delete (in some cases) more than 20 observations, while the average number of observations per variable is about 800. I could relax the ‘G’ value. Is this correct?
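For reference, this is roughly how I understand the rule, written as a small Python sketch rather than SPSS syntax: a value is labeled an outlier if it lies below Q1 - g(Q3 - Q1) or above Q3 + g(Q3 - Q1). The data in `sample` are invented, the function names are my own, and the quartile method (`"inclusive"`) is an assumption; SPSS may compute percentiles differently, so the fences can differ slightly.

```python
import statistics


def outlier_fences(values, g=2.2):
    """Lower and upper fences of the outlier labeling rule for multiplier g."""
    observed = [v for v in values if v is not None]
    # Assumption: "inclusive" quartiles; other percentile definitions shift the fences.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    spread = g * (q3 - q1)
    return q1 - spread, q3 + spread


def label_outliers(values, g=2.2):
    """Return the non-missing values that fall outside the fences."""
    lower, upper = outlier_fences(values, g)
    return [v for v in values if v is not None and (v < lower or v > upper)]


# Hypothetical values for one variable; None marks a missing firm-year.
sample = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, None, 3.0, -2.5]

print(outlier_fences(sample))
print(label_outliers(sample))
```

A larger g widens the fences, so fewer observations get labeled; that is what I mean by relaxing the value.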
Thanks in advance
Henk