Error in boxcox.default(y ~ x) : response variable must be positive

Question

Error in boxcox.default(y ~ x) : response variable must be positive

I am getting this error in R when I am performing a Box-Cox transformation on data.

Why is this error happening? Here is my data.

These are time series data, and I need to fit a logarithmic regression of the form:

$$y=a+b(\log x_1)+c(\log x_2)$$

I need to estimate $a$, $b$ and $c$, and then check whether a relation of this type actually exists.

As the error message says, you're getting this error because there are non-positive (negative or zero) values in your response vector $y$. When the Box-Cox procedure determines which transformation to use, it uses the [geometric mean](http://en.wikipedia.org/wiki/Geometric_mean) $(y_1\cdot y_2\cdots y_n)^{1/n}$ in the computation. The geometric mean is only defined when all $y_i$ are positive, as taking roots of negative numbers may lead to [imaginary/complex numbers](http://en.wikipedia.org/wiki/Imaginary_number). Therefore all $y_i$ must be positive in order to use Box-Cox. – MånsT, Jan 09 '13 at 09:49
You might want to look into the related Yeo-Johnson transformation within the `boxCox` function in the package `car`, and the `yjPower` function in the same package – Glen_b, Feb 15 '14 at 07:49
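To make the Yeo-Johnson suggestion concrete, here is a minimal base-R sketch of the transformation itself. The helper name `yeo_johnson` is made up for illustration and is not the `car` implementation; it only shows why negative and zero values are not a problem for this family.

```r
# Hand-written Yeo-Johnson transform (hypothetical helper, not the car code).
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative part: Box-Cox applied to y + 1
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # Negative part: Box-Cox with power 2 - lambda applied to 1 - y, sign flipped
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -(((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda))
  } else {
    out[!pos] <- -log1p(-y[!pos])
  }
  out
}

y   <- c(-3, -0.5, 0, 0.5, 3)
yj1 <- yeo_johnson(y, 1)  # lambda = 1 leaves the data unchanged
yj0 <- yeo_johnson(y, 0)  # lambda = 0 takes log(y + 1) for y >= 0
```

In practice the packaged version is probably the better choice, since it also handles estimating lambda.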

score 5 · Accepted Answer · edited Feb 15 '14 at 00:09


Yes, `boxcox` only works with positive values of the response variable $Y$. More details can be found on Wikipedia. To work around this limitation, you can try to model a shifted version $Y+\mu$ of your variable instead, with $\mu \gt -\min Y$ so that every shifted value is positive.
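Written out, the shifted version corresponds to what is usually called the two-parameter Box-Cox family. The notation below is the standard textbook form, not something taken from this answer:

$$Y^{(\lambda,\mu)} = \begin{cases} \dfrac{(Y+\mu)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex] \log(Y+\mu), & \lambda = 0 \end{cases} \qquad \text{with } Y_i + \mu > 0 \text{ for all } i.$$

The $\lambda = 0$ branch is the limit of the first one as $\lambda \to 0$, because $(Y+\mu)^{\lambda} = e^{\lambda \log(Y+\mu)} \approx 1 + \lambda \log(Y+\mu)$ for small $\lambda$.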

A quick code example:

    library(MASS)

    ## Invent example data for x and y
    set.seed(1)  # for reproducibility
    y <- c(rnorm(100, 3, 300), rnorm(30, 1600, 400))
    x <- seq_along(y)
    ## Histogram of y shows that y is skewed
    hist(y)
    ## Define the grid of shifts (mu) and powers (lambda)
    eps <- 1e-5
    n <- 100
    mu <- seq(-min(y) + eps, max(y), length.out = n)  # every mu makes y + mu > 0
    lambda <- seq(0, 5, length.out = n)
    ## Profile log-likelihood: rows index lambda, columns index mu
    lik <- matrix(0, n, n)
    for (i in 1:n) {
      lik[, i] <- boxcox((y + mu[i]) ~ x, lambda = lambda, plotit = FALSE)$y
    }
    ## Plot log-likelihood values (image() puts matrix rows on the x-axis)
    image(lambda, mu, lik, xlab = "lambda", ylab = "mu", main = "log-likelihood")

answered Jan 09 '13 at 07:51 by ThePawn · edited Feb 15 '14 at 00:09 by geneorama

Thank you, I will try it. Is the result equivalent to a log transformation? Can you give a link or an explanation with sample data? It would be of great help. – Komal Jan 09 '13 at 11:52
$\lambda = \mu = 0$ will be equivalent to a log transformation but in the general case, it is going to be different. You can try to run the example with any vector $y$ and $x$ that have the same size. – ThePawn Jan 10 '13 at 00:12
[Sample data file link](https://dl.dropbox.com/u/53624395/11.csv): this is the data file on which I want to perform the operation. These are time series data, and I have to fit a logarithmic regression of the form $y=a+b(\log x_1)+c(\log x_2)$, estimate $a$, $b$, $c$, and then check whether a relation of this type exists. – Komal Jan 11 '13 at 05:35
Please help. I have given the link for the data file. – Komal Jan 11 '13 at 05:37
@ThePawn The edit by geneorama appears to be an improvement, but is quite extensive. Can you please double-check that you're happy that it doesn't radically alter the meaning of your answer? – Glen_b Feb 15 '14 at 00:11
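For the regression form raised in the comments above, the response is not transformed at all, so Box-Cox is not strictly needed to estimate $a$, $b$ and $c$. A minimal sketch with `lm` follows. The data are simulated stand-ins and the column names `y`, `x1`, `x2` are assumptions, since the contents of the linked file are not shown here.

```r
# Simulated stand-in data: the names y, x1, x2 are assumptions,
# not taken from the linked CSV file.
set.seed(1)
n  <- 200
x1 <- runif(n, 1, 50)  # predictors must be strictly positive for log()
x2 <- runif(n, 1, 50)
y  <- -20 + 4 * log(x1) + 7 * log(x2) + rnorm(n)  # y itself may be negative

fit <- lm(y ~ log(x1) + log(x2))
est <- coef(fit)  # estimates of a, b, c (in that order)
summary(fit)      # t-tests and R-squared help judge whether the relation holds
```

With real time series data it would also be worth checking the residuals for autocorrelation, for example with `acf(resid(fit))`, before trusting the standard errors.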

score 0 · Answer 2 · answered Feb 25 '20 at 06:53

Zeros will also block the `boxcox()` function, naturally, since the "response variable must be positive". However, when you have a lot of zeros in your data with a specific meaning (the measured event did not occur at all), it's a good idea to exclude them from the transformation instead of increasing the value by an arbitrary epsilon.

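When you add 1 to the zeros, the transformed value $(1^\lambda-1)/\lambda$ becomes 0 for every $\lambda$, but the reverse transformation $(0\cdot\lambda+1)^{1/\lambda}$ returns 1 rather than the original 0, unless you also remember to subtract the shift again.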
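The round trip described above can be checked with a few lines of R. The helpers `bc` and `bc_inv` are written by hand for illustration (they are hypothetical names, not functions from `MASS`), and the shift of 1 and $\lambda = 0.5$ are arbitrary choices.

```r
# Box-Cox transform and its inverse, written out by hand (hypothetical helpers).
bc     <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda
bc_inv <- function(z, lambda) if (lambda == 0) exp(z) else (lambda * z + 1)^(1 / lambda)

y      <- c(0, 0, 2, 5)  # data containing zeros
shift  <- 1
lambda <- 0.5

z    <- bc(y + shift, lambda)  # the shifted zeros map to 0 for any lambda
back <- bc_inv(z, lambda)      # returns y + shift, so zeros come back as 1
orig <- back - shift           # subtracting the shift recovers the zeros
```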

If you instead add a small fraction to the zeros, then at large negative lambdas the transformed value is pushed into an arbitrarily large negative range:

    lambdas <- seq(-6, 6, 0.01)
    lambdas <- lambdas[lambdas != 0]  # the formula divides by lambda
    plot(lambdas, (0.1^lambdas - 1) / lambdas, type = "l")

All in all, I felt safer handling the zeros separately when they have a specific meaning.
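One possible way to handle the zeros separately is sketched below, under the assumption that a zero means "the event did not occur" and that only the positive observations should drive the choice of lambda. The data are simulated, and this is an illustration of the idea rather than the exact procedure used in this answer.

```r
library(MASS)

# Simulated data: zeros mean "event did not occur"; positive values are skewed.
set.seed(42)
x <- runif(300)
y <- ifelse(runif(300) < 0.3, 0, rlnorm(300, meanlog = 1 + x, sdlog = 0.5))

# Flag the zeros and keep them out of the transformation
is_zero <- y == 0
pos <- data.frame(x = x[!is_zero], y = y[!is_zero])

# Estimate lambda from the positive observations only
bc <- boxcox(y ~ x, data = pos, lambda = seq(-2, 2, 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

The flag `is_zero` can then be modelled on its own (for example as a binary outcome), while the transformed positive part is analysed separately.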

Zeros that mean zeros may need either no transformation at all (just a suitable model) or a transformation that respects them, as already answered. — Nick Cox, Feb 25 '20 at 07:53
