First, I want to mention that I am fully aware of this question here as well as its answers. Still, things don't work out as intended (using R).
I have a lot of hourly rainfall data that includes zeros (these are natural zeros: no errors, no missing data, no over-sensitivity whatsoever). Most of my hourly rainfall data is therefore heavily skewed towards zero. For some tests, however, I need an approximately normal distribution. I tried fiddling around with log10(rain+1) or log1p(rain) without success. Based on the previous question mentioned above, I tried to solve this with a Box-Cox transformation. There are several R packages capable of doing so, but none works the way I need:
- BoxCox from the forecast library: really strange histogram with high negative values
- BoxCox from the geoR library: locks the lower bound to 0, but still far from normality
- BoxCox from the EnvStats library: Error: All non-missing, finite values of 'x' must be positive
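For reference, the shifted (two-parameter) Box-Cox transformation I am after can be sketched in a few lines of base R. This is only a minimal illustration: the helper name `boxcox_shifted`, the short sample `x`, and the chosen `lambda` and `lambda2` values are made up here and do not come from any of the packages above.

```r
# Two-parameter (shifted) Box-Cox transform, written out by hand.
# Hypothetical helper for illustration; not taken from forecast, geoR or EnvStats.
boxcox_shifted <- function(x, lambda, lambda2 = 1) {
  # The shift lambda2 must make every value strictly positive
  stopifnot(all(x + lambda2 > 0))
  if (abs(lambda) < 1e-8) {
    log(x + lambda2)                      # limiting case lambda -> 0
  } else {
    ((x + lambda2)^lambda - 1) / lambda   # general case
  }
}

# Small made-up sample in the style of the rainfall data (not the full series)
x <- c(0, 0, 0.5, 2.9, 0, 16.4)

# Arbitrary example parameters, not fitted values
y <- boxcox_shifted(x, lambda = 0.25, lambda2 = 1)
y
```

With `lambda = 0` and `lambda2 = 1` this reduces to `log1p(x)`. Because the transform is monotone, all zeros in `x` end up at one and the same value in `y`.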
I have some reproducible code here with all my attempts:
rain <- c(0.5,0.0,2.9,3.7,2.8,0.9,0.0,3.4,0.0,1.7,0.0,9.9,0.0,0.7,0.1,0.0,0.0,0.7,0.0,0.9,16.4,0.2,0.8,0.0,1.8,0.1,11.0,9.9,3.9,0.6,0.0,8.9,4.8,0.0,0.0,1.8,0.8,3.4,0.0,0.0,0.3,9.1,6.6,0.3,0.0,11.7,0.0,0.2,1.1,1.7,0.0,1.0,0.0,0.5,0.0,3.6,3.4,1.3,0.5,2.1,1.8,12.1,0.0,0.0,2.3,2.5,0.2,0.0,0.0,0.0,3.2,0.1,1.4,1.8,9.0,3.1,4.8,0.0,1.3,0.0,8.7,1.7,0.0,2.3,0.0,0.0,0.0,0.0,1.0,4.6,1.9,0.0)
hist(rain)
qqnorm(rain)
qqline(rain)
#### Log transform
rain.log10 <- log10(rain+1)
hist(rain.log10)
rain.log1p <- log1p(rain)
hist(rain.log1p)
#### BoxCox from forecast lib
library(forecast)
lda <- BoxCox.lambda(rain, method="guerrero")
trans.rain <- BoxCox(rain,lda)
hist(trans.rain)
#### BoxCox from geoR lib
library(geoR)
ml <- boxcoxfit(rain, lambda2=TRUE)
ml$lambda
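# NOTE: as far as I can tell from the geoR docs, dboxcox() is the density of the
# Box-Cox transformed normal distribution, so this may not be the transformed data itself
trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=ml$lambda[2])
#trans2.rain <- dboxcox(rain, lambda=ml$lambda[1], lambda2=NULL)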
hist(trans2.rain)
qqnorm(trans2.rain)
qqline(trans2.rain)
#### BoxCox from EnvStats lib
library(EnvStats)
boxcox(rain)
# Error: All non-missing, finite values of 'x' must be positive
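To put a number on "far from normality" instead of judging only by eye from the histograms and Q-Q plots, something like the following base-R check could be used. It is a rough sketch on a short made-up sample `x` (not the full `rain` series), and the variable names are only illustrative.

```r
# Small made-up sample in the style of the rainfall data (not the full series)
x <- c(0.5, 0.0, 2.9, 3.7, 0.0, 0.9, 0.0, 3.4, 0.0, 1.7, 0.0, 9.9)

# Fraction of exact zeros in the sample
zero_share <- mean(x == 0)

# Shapiro-Wilk normality test on the log1p-transformed values, zeros included
sw_all <- shapiro.test(log1p(x))

# Same test restricted to the wet hours (values > 0), for comparison
sw_wet <- shapiro.test(log1p(x[x > 0]))

zero_share
sw_all$p.value
sw_wet$p.value
```

Comparing the two p-values gives a rough idea of how much of the departure from normality is due to the zeros alone.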