
First, I want to mention that I am fully aware of this question and its answers. Still, things do not work out as intended (using R).

I have a lot of hourly rainfall data that include zeros (so those are natural zeros, no errors, no missing data, no over-sensitivity whatsoever). Most of my hourly rainfall data are therefore heavily skewed towards the zeros. For some tests I need, however, +- a normal distribution. I tried log10(rain+1) and log1p(rain) without success. Based on the previous question mentioned above, I then tried a Box-Cox transformation. Several R packages offer one, but none works the way I need:

  • BoxCox from forecast library: really strange histogram with high negative values
  • BoxCox from geoR library: locks the lower bound to 0, but still far from normality
  • BoxCox from EnvStats library: Error: All non-missing, finite values of 'x' must be positive
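
For reference, this is the transformation I am trying to apply, in its usual one-parameter form and in the shifted two-parameter form. Reading $\lambda_2$ as the shift that geoR calls `lambda2` is my assumption:

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
  \log y, & \lambda = 0
\end{cases}
\qquad
y^{(\lambda_1, \lambda_2)} =
\begin{cases}
  \dfrac{(y + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\[1ex]
  \log (y + \lambda_2), & \lambda_1 = 0
\end{cases}
```

The one-parameter form is defined for $y > 0$ in its standard formulation, which is presumably why zeros are rejected; the two-parameter form only needs $y + \lambda_2 > 0$.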

I have some reproducible code here with all my attempts:

rain <- c(0.5,0.0,2.9,3.7,2.8,0.9,0.0,3.4,0.0,1.7,0.0,9.9,0.0,0.7,0.1,0.0,0.0,0.7,0.0,0.9,16.4,0.2,0.8,0.0,1.8,0.1,11.0,9.9,3.9,0.6,0.0,8.9,4.8,0.0,0.0,1.8,0.8,3.4,0.0,0.0,0.3,9.1,6.6,0.3,0.0,11.7,0.0,0.2,1.1,1.7,0.0,1.0,0.0,0.5,0.0,3.6,3.4,1.3,0.5,2.1,1.8,12.1,0.0,0.0,2.3,2.5,0.2,0.0,0.0,0.0,3.2,0.1,1.4,1.8,9.0,3.1,4.8,0.0,1.3,0.0,8.7,1.7,0.0,2.3,0.0,0.0,0.0,0.0,1.0,4.6,1.9,0.0)
hist(rain)
qqnorm(rain)
qqline(rain)

#### Log transform
rain.log10 <- log10(rain+1)
hist(rain.log10)

rain.log1p <- log1p(rain)
hist(rain.log1p)

##### BoxCox from forecast lib

library(forecast)
lda <- BoxCox.lambda(rain, method = "guerrero")

trans.rain <- BoxCox(rain,lda)
hist(trans.rain)

#### BoxCox from geoR lib

library(geoR)
ml <- boxcoxfit(rain, lambda2=TRUE)
ml$lambda
trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=ml$lambda[2])
#trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=NULL)
hist(trans2.rain)
qqnorm(trans2.rain)
qqline(trans2.rain)

#### BoxCox from EnvStats lib

library(EnvStats)
boxcox(rain)
# Error: All non-missing, finite values of 'x' must be positive
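
A minimal base-R sketch of the shifted (two-parameter) Box-Cox transform, written out by hand so that it does not depend on any package. The function name `box_cox_shifted` and the values `lambda = 0.25` and `shift = 1` are hypothetical placeholders, not fitted estimates. Note also that, as far as I can tell from the geoR documentation, `dboxcox` evaluates the density of the Box-Cox transformed normal distribution rather than transforming the data, so a hand-written transform may be the safer comparison.

```r
# Two-parameter Box-Cox transform written out in base R.
# `lambda` and `shift` are illustrative placeholders, not fitted values.
box_cox_shifted <- function(x, lambda, shift = 0) {
  y <- x + shift
  # The transform is only defined for strictly positive shifted values.
  if (any(y <= 0, na.rm = TRUE)) {
    stop("x + shift must be strictly positive")
  }
  # lambda = 0 is the log limit of the power transform.
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# First ten values of `rain`, repeated here so the snippet runs on its own.
rain_demo <- c(0.5, 0.0, 2.9, 3.7, 2.8, 0.9, 0.0, 3.4, 0.0, 1.7)

trans_demo <- box_cox_shifted(rain_demo, lambda = 0.25, shift = 1)
hist(trans_demo)
```

With `shift = 1`, every zero in the input maps to exactly 0 in the output, whatever `lambda` is chosen.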
GeoEki
  • For what "tests [do you] need... +- a normal distribution"? Your data have "natural zeros, no errors, no missing data, no over-sensitivity whatsoever", so you should use tests & models that are appropriate for those data, not try a transformation so that you can shoehorn something inappropriate. – gung - Reinstate Monica Feb 28 '16 at 17:41
  • Ordinary kriging (if the empirical distribution of the data is skewed then the kriging estimators are highly sensitive to a few large data values); but mainly for regression kriging. – GeoEki Feb 28 '16 at 19:14
  • Trying to transform to normality any sample that's heavily concentrated on a single value is basically hopeless. What exactly are you trying to figure out? There is probably a reasonable nonparametric procedure that would help. – dsaxton Feb 28 '16 at 19:57
  • There aren't any nonparametric versions of regression kriging (aka universal kriging or kriging with drift), but there *are* versions that are appropriate for such data (such as spatial GLMs). But please notice at the outset that the assumptions for this form of kriging apply to the *residuals,* not to the data themselves. – whuber Feb 28 '16 at 20:03
  • Yeah I'm aware that those assumptions are only relevant for the residuals. I was just curious if I could also improve my variogram (which I'm not very satisfied with) with transformed data. But the answer from @dsaxton might be true, that heavy concentration on a single value is probably hopeless for such a transformation. – GeoEki Mar 01 '16 at 11:15
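
The point about concentration on a single value can be checked directly. A strictly monotone transformation maps all tied values to one common value, so the share of observations sitting on the most frequent value cannot change. The sketch below uses only the first ten values of `rain`; the helper name `share_at_mode` is a hypothetical one introduced for illustration.

```r
# First ten values of `rain`, repeated here so the snippet runs on its own.
rain_demo <- c(0.5, 0.0, 2.9, 3.7, 2.8, 0.9, 0.0, 3.4, 0.0, 1.7)

# Share of observations that sit on the single most frequent value.
share_at_mode <- function(x) max(table(x)) / length(x)

share_raw  <- share_at_mode(rain_demo)         # untransformed data
share_log  <- share_at_mode(log1p(rain_demo))  # log(1 + x)
share_sqrt <- share_at_mode(sqrt(rain_demo))   # square root

print(c(raw = share_raw, log1p = share_log, sqrt = share_sqrt))
```

All three shares are identical, so the spike at zero is only relocated, never spread out.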
