Error in boxcox.default(y ~ x) : response variable must be positive

I am getting this error in R when I perform a Box-Cox transformation on my data.

Why is this error happening? Here is my data.

This is time series data, and I have to fit a logarithmic regression of the form:

$$y=a+b(\log x_1)+c(\log x_2)$$

I need to estimate $a$, $b$, and $c$, and then check whether such a relationship exists at all.
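The model above can be fitted directly with `lm()`, taking the logs inside the formula. In this model the logarithms apply to $x_1$ and $x_2$, so it is those predictors, not $y$, that must be positive.

What follows is a minimal sketch on invented data. The names `x1` and `x2`, the true coefficients, and the noise level are all assumptions.

```r
## Hypothetical example data: x1 and x2 must be strictly positive for log()
set.seed(42)
n  <- 200
x1 <- runif(n, 1, 100)
x2 <- runif(n, 1, 100)
## Simulate y from assumed coefficients a = 2, b = 0.5, c = -1.5
y  <- 2 + 0.5 * log(x1) - 1.5 * log(x2) + rnorm(n, sd = 0.1)

## Fit y = a + b*log(x1) + c*log(x2); log() is applied inside the formula
fit <- lm(y ~ log(x1) + log(x2))
coef(fit)     # estimates of a, b, c
summary(fit)  # t-tests on b and c, plus an overall F-test
```

The t-tests and F-test in `summary(fit)` are one way to ask whether the relationship exists. For time series data, they assume independent errors. If the residuals are autocorrelated, those standard errors may be too optimistic.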

Glen_b
Komal
    As the error message says, you're getting this error because your response vector $y$ contains values that are not strictly positive (negative values or zeros). When the Box-Cox procedure determines which transformation to use, it uses the [geometric mean](http://en.wikipedia.org/wiki/Geometric_mean) $(y_1\cdot y_2\cdots y_n)^{1/n}$ in the computation. The geometric mean is only usable here when all $y_i$ are positive: a single zero makes it zero, and taking roots of negative numbers may lead to [imaginary/complex numbers](http://en.wikipedia.org/wiki/Imaginary_number). Therefore all $y_i$ must be positive in order to use Box-Cox. – MånsT Jan 09 '13 at 09:49
    You might want to look into the related Yeo-Johnson transformation within the `boxCox` function in the package `car`, and the `yjPower` function in the same package. – Glen_b Feb 15 '14 at 07:49
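Unlike Box-Cox, the Yeo-Johnson transformation is defined for zero and negative values.

The sketch below writes out its textbook definition by hand, for illustration only. `yeo_johnson()` is a hypothetical helper, not the `car` implementation. In practice the `car` functions named in the comment would be the ones to use.

```r
## Hand-written Yeo-Johnson transformation; yeo_johnson() is a hypothetical helper
yeo_johnson <- function(y, lambda, eps = 1e-8) {
  out <- numeric(length(y))
  pos <- y >= 0
  ## Non-negative values: Box-Cox with power lambda applied to y + 1
  if (abs(lambda) > eps) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(y[pos] + 1)
  }
  ## Negative values: Box-Cox with power 2 - lambda applied to 1 - y, then negated
  if (abs(lambda - 2) > eps) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(1 - y[!pos])
  }
  out
}

y <- c(-3, -0.5, 0, 2, 10)
yeo_johnson(y, lambda = 1)    # lambda = 1 leaves the data unchanged
yeo_johnson(y, lambda = 0.5)  # finite for negative and zero values as well
```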

2 Answers


Yes, `boxcox()` only works with positive values of the response variable $Y$; more details can be found on Wikipedia. To work around this limitation, you can try to predict a shifted version $Y+\mu$ of your variable instead, with $\mu > -\min Y$ so that every $Y+\mu$ is positive.

A quick code example:

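```r
library(MASS)

## Invent example data for x and y (set.seed makes it reproducible)
set.seed(1)
y <- c(rnorm(100, 3, 300), rnorm(30, 1600, 400))
x <- seq_along(y)
## Histogram of y shows that y is skewed and contains negative values
hist(y)
## Define the grid of shifts (mu) and Box-Cox powers (lambda)
eps <- 1e-5
n <- 100
mu <- seq(-min(y) + eps, max(y), length.out = n)  # smallest shift makes every y + mu > 0
lambda <- seq(0, 5, length.out = n)
## Initialize, then fill in the log-likelihood: rows index lambda, columns index mu
lik <- matrix(0, n, n)
for (i in 1:n) {
  lik[, i] <- boxcox((y + mu[i]) ~ x, lambda = lambda, plotit = FALSE)$y
}
## Plot log-likelihood values; image() wants one row of z per x value, hence t(lik)
image(mu, lambda, t(lik), xlab = "mu", ylab = "lambda", main = "log-likelihood")
```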
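After computing the grid, the shift and power with the largest log-likelihood can be read off and used to transform $y$.

This self-contained sketch rebuilds the invented data on a coarser 40 × 40 grid. The grid size is an arbitrary choice to keep it fast. `lambda_hat`, `mu_hat`, and `z` are illustrative names. A grid search only approximates the maximum, so a finer grid or a numerical optimizer could refine it.

```r
library(MASS)

## Same invented data as above
set.seed(1)
y <- c(rnorm(100, 3, 300), rnorm(30, 1600, 400))
x <- seq_along(y)

## Coarser grid than above (assumed 40 points per axis)
n      <- 40
mu     <- seq(-min(y) + 1e-5, max(y), length.out = n)
lambda <- seq(0, 5, length.out = n)
lik    <- matrix(0, n, n)
for (i in seq_len(n)) {
  lik[, i] <- boxcox((y + mu[i]) ~ x, lambda = lambda, plotit = FALSE)$y
}

## lik has one row per lambda and one column per mu
best       <- which(lik == max(lik), arr.ind = TRUE)[1, ]
lambda_hat <- lambda[best[1]]
mu_hat     <- mu[best[2]]

## Box-Cox transform of the shifted response at the selected parameters
z <- if (lambda_hat == 0) log(y + mu_hat) else ((y + mu_hat)^lambda_hat - 1) / lambda_hat
c(lambda = lambda_hat, mu = mu_hat)
```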
geneorama
ThePawn
  • Thank you, I will try it. Is the result equivalent to a log transformation? Can you give a link or an explanation with sample data? It would be of great help :))) – Komal Jan 09 '13 at 11:52
  • $\lambda = \mu = 0$ is equivalent to a log transformation, but in the general case the result is different. You can try running the example with any vectors $y$ and $x$ of the same length. – ThePawn Jan 10 '13 at 00:12
  • [Sample data file link](https://dl.dropbox.com/u/53624395/11.csv): link to the data file on which I want to perform the operation. This is time series data, and I have to fit a logarithmic regression of the form $y=a+b\log x_1+c\log x_2$, find $a$, $b$, $c$, and then check whether such a relationship exists. – Komal Jan 11 '13 at 05:35
  • Please help. I have given the link for the data file. – Komal Jan 11 '13 at 05:37
  • @ThePawn The edit by geneorama appears to be an improvement, but is quite extensive. Can you please double-check that you're happy that it doesn't radically alter the meaning of your answer? – Glen_b Feb 15 '14 at 00:11
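ThePawn's point about $\lambda = \mu = 0$ can be checked numerically. The plain Box-Cox transform $(y^\lambda - 1)/\lambda$ tends to $\log y$ as $\lambda \to 0$. With a nonzero shift, $\lambda = 0$ gives $\log(y + \mu)$ instead.

`box_cox()` is a hypothetical helper for this illustration. It omits the geometric-mean scaling that `boxcox()` uses internally for the likelihood.

```r
## Plain (unscaled) Box-Cox transform; box_cox() is a hypothetical helper
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(0.5, 1, 2, 10, 100)
box_cox(y, 0)      # exactly log(y)
box_cox(y, 1e-6)   # very close to log(y): the transform is continuous at 0
box_cox(y, 1)      # y - 1: lambda = 1 only shifts the data
```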

Zeros will also make `boxcox()` fail, since the "response variable must be positive". However, when your data contain many zeros with a specific meaning (the measured event did not occur at all), it is a good idea to exclude them from the transformation instead of shifting every value by an arbitrary epsilon.
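Both zeros and negative values trigger the same error, and the geometric mean shows why.

This sketch reproduces both cases with `MASS::boxcox()` and computes the geometric mean on the log scale. `try_boxcox()` and `geo_mean()` are hypothetical helper names, and the response vectors are invented.

```r
library(MASS)

## Run boxcox() and return "ok" or the error message; hypothetical helper
try_boxcox <- function(y, x = seq_along(y)) {
  tryCatch({
    boxcox(y ~ x, plotit = FALSE)
    "ok"
  }, error = function(e) conditionMessage(e))
}

msgs <- c(positive = try_boxcox(c(1, 2, 3, 5, 8)),
          zero     = try_boxcox(c(0, 2, 3, 5, 8)),
          negative = try_boxcox(c(-1, 2, 3, 5, 8)))
msgs  # the zero and negative cases both fail with "response variable must be positive"

## Geometric mean computed on the log scale; hypothetical helper
geo_mean <- function(y) exp(mean(log(y)))
geo_mean(c(1, 4))                     # 2
geo_mean(c(0, 4))                     # 0: one zero collapses it
suppressWarnings(geo_mean(c(-1, 4)))  # NaN: log() of a negative number
```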

When you add 1 to the zeros, the transformation maps them to $(1^\lambda-1)/\lambda = 0$. The reverse transformation $(0\cdot\lambda+1)^{1/\lambda}$ then gives 1, not the original 0, unless you subtract the shift again.
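The round trip described above can be traced on a few numbers. The data, the shift of 1, and $\lambda = 0.5$ are arbitrary illustrative choices. `bc()` and `bc_inv()` are hypothetical helpers for the forward and inverse transforms, valid for $\lambda \ne 0$.

```r
## Forward and inverse Box-Cox for lambda != 0; hypothetical helpers
bc     <- function(y, lambda) (y^lambda - 1) / lambda
bc_inv <- function(z, lambda) (z * lambda + 1)^(1 / lambda)

y      <- c(0, 0, 3, 8)          # two "event did not occur" zeros
shift  <- 1
lambda <- 0.5
z      <- bc(y + shift, lambda)  # 0 0 2 4: the zeros map to 0
back   <- bc_inv(z, lambda)      # 1 1 4 9: the shift is still included
back - shift                     # 0 0 3 8: subtract it to recover y
```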

If you instead add a small fraction such as 0.1, then at large negative $\lambda$ the zeros are transformed into arbitrarily large negative values:

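```r
## Box-Cox transform of a zero shifted by 0.1, across a range of lambdas
lambdas <- seq(-6, 6, 0.01)
lambdas <- lambdas[abs(lambdas) > 1e-8]  # drop lambda = 0, where the formula divides by zero
plot(lambdas, (0.1^lambdas - 1) / lambdas, type = "l")
```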
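The plot above varies $\lambda$ for a fixed shift of 0.1. Fixing $\lambda$ and varying the shift shows instead how strongly the arbitrary epsilon drives where the zeros land. $\lambda = -2$ and the list of epsilons are illustrative assumptions.

```r
## Where a zero lands after adding eps and applying Box-Cox at a fixed lambda
lambda <- -2
eps    <- c(1, 0.1, 0.01, 0.001)
zero_transformed <- (eps^lambda - 1) / lambda
data.frame(eps, zero_transformed)  # each tenfold smaller eps moves the zeros about 100x further out
```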

All in all, I feel safer handling zeros separately when they have a specific meaning.
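Handling the zeros separately could be done in three steps:

1. Keep an explicit indicator of which observations are zero.
2. Estimate $\lambda$ from the positive values only.
3. Transform only those positive values.

This is a sketch under assumptions. The log-normal data with 30 zeros are invented. The intercept-only model and the $\lambda$ grid are arbitrary choices. How the zeros then enter the analysis, for example through a separate model for whether the event occurred, is left open.

```r
library(MASS)

## Invented data: zeros mean "the event did not occur"
set.seed(7)
y <- c(rep(0, 30), rlnorm(70, meanlog = 2, sdlog = 1))

occurred <- y > 0    # keep the zero / non-zero information explicitly
pos_vals <- y[occurred]

## Estimate lambda from the positive values only (intercept-only model)
bc_fit     <- boxcox(pos_vals ~ 1, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc_fit$x[which.max(bc_fit$y)]

## Transform only the positive values; zeros stay NA instead of being shifted
y_trans <- rep(NA_real_, length(y))
y_trans[occurred] <- if (abs(lambda_hat) < 1e-8) log(pos_vals) else
  (pos_vals^lambda_hat - 1) / lambda_hat
```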

yImI
    Zeros that mean zeros may need either no transformation at all (just a suitable model) or a transformation that respects them, as already answered. – Nick Cox Feb 25 '20 at 07:53