Censored regression in R

Question

I have a Tobit model of the form: \begin{align} Y^*_i &= X_i\beta + \epsilon_i \\ Y_i &= \max(Y^*_i,0). \end{align}

The regressors are one continuous variable and 30 factors (modelling seasonal effects during the day). In my sample there are among $3116$ observations $194$ left-censored and $2922$ uncensored.

I tried the R packages AER and censReg. With tobit() in AER I get an error message:

Error in solve.default(L %*% V %*% t(L)) : 
Lapack routine dgesv: system is exactly singular: U[2,2] = 0

Thus a first question: do I have too many factors? However AER has a predict function. Given the data $Y_i$ (non-negative) I would like to make a prediction of the censored data - $E[Y|Y>0,X_i]$ - is this what the predict function of AER gives me?

How can I predict in the package censReg?

EDIT 1) as Achim Zeileis said, there is a data problem with my factors, they are 0 or NAN too often and the tobit algorithm cannot find a solution. I have to fix the data.

EDIT 2) Please look at the following example:

library(AER)
N = 10
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
set.seed(100) 
x = rnorm(8*N)+1
beta = 5
epsilon = rnorm(8*N,sd = sqrt(1/5))
y.star = x*beta+fcoeff+epsilon ## latent response
y = y.star 
y[y<0] <- 0 ## censored response

fit <- tobit(y~0+x+f)
summary(fit)

coef(fit) # very satisfying estimates

my.range = range(y, y.star, predict(fit))

plot(y, ylim=my.range)
lines(predict(fit), col="red")
lines(y.star, col="blue")

As Achim Zeileis writes predict() predicts $Y^*$ but how can we predict $Y$ efficiently and in an unbiased fashion?

Have you tried the VGAM package, something like `vgam(y ~ factors, family=tobit(Lower=0))`? Not sure how to deal with a mix of censored and un-censored though — kezz_smc, Apr 30 '15 at 16:37
Can you post a reproducible example? I suspect that at least one of your regressors is not identified. If your factors are dummy variables, maybe there is one dummy for which all observations are censored? Or something like that. And as for the `predict()` method: This produces the latent mean of Y*. But you can easily obtain the conditional mean by hand. With a reproducible example, I can show you how. The `crch()` function in the package of the same name has a somewhat more flexible `predict` method. — Achim Zeileis, Apr 30 '15 at 16:54
@AchimZeileis I edited the question. You are right about the reason for the error message. Could you briefly comment on how to predict $Y$? I think I need one of the expressions including the inverse Mills ration as explained [here](https://files.nyu.edu/mrg217/public/tobit1.pdf) Thanks! — Richi W, May 03 '15 at 17:20
@Sha Both the `AER` and the `crch` packages work with all R 3.x.y versions. CRAN even hosts current Windows binaries for R 3.3.y. — Achim Zeileis, Jul 31 '17 at 10:58
How would you compute the censored y with this method if it is doubly censored >=0 and <=1? — sgfPLP, Feb 04 '21 at 05:09

score 8 · Accepted Answer · edited May 03 '15 at 18:57

8

The predict() and fitted() methods for tobit objects compute the estimates for the latent mean $\mu = E[y^*] = x^\top \beta$. Additionally, the scale parameter $\sigma$ is available as $scale in the objects:

mu <- fitted(fit)
sigma <- fit$scale

The probability of a non-zero observation is then $P(y > 0) = \Phi(\mu/\sigma)$, i.e.:

p0 <- pnorm(mu/sigma)

The conditional expectation of the censored $y$ given that it is non-zero is $E[y | y > 0] = \mu + \sigma \cdot \lambda(\mu/\sigma)$, where $\lambda(\cdot)$ is the inverse Mills ratio $\lambda(x) = \phi(x)/\Phi(x)$:

lambda <- function(x) dnorm(x)/pnorm(x)
ey0 <- mu + sigma * lambda(mu/sigma)

Finally, the unconditional expectation is $E[y] = P(y > 0) \cdot E[y | y > 0]$, i.e.:

ey <- p0 * ey0

If you want to visualize everything together in a time series style plot:

plot(y, ylim = my.range)
lines(mu, col = "slategray")
lines(y.star, col = "black")
lines(ey0, col = "green")
lines(ey, col = "blue")

The reason that the predict() method for tobit objects does not provide all of this automatically is that for all the distributions other than the normal / Gaussian, the relationship is not that easy. But maybe we should at least support the normal case.

edited May 03 '15 at 18:57

gung - Reinstate Monica

132,789
81
357
650

answered May 03 '15 at 18:54

Achim Zeileis

13,510
1
29
53

2

What if I want to predict new data in the sense that you describe ? Then I replace the line mu – Richi W May 04 '15 at 06:34
3

Yes. In the standard `tobit()` model, the standard deviation of the underlying Y* does not depend on any covariates and is hence constant. With the `crch` package you can also estimate heteroscedastic tobit models if desired. – Achim Zeileis May 04 '15 at 07:04
Thank you for the explanation. This helps a lot! May I ask how to proceed when there are two limits, e.g. a lower limit at -0.3 and an upper limit at 1? – Nerd Feb 23 '22 at 18:09
1

The probability $P(a < Y < b)$ is then $\Phi((b - \mu)/\sigma) - \Phi((a - \mu)/\sigma)$. The conditional expectation needs to be adapted accordingly. To work out the details, this discussion might help: https://stats.stackexchange.com/questions/166273/expected-value-of-x-in-a-normal-distribution-given-that-it-is-below-a-certain-v – Achim Zeileis Feb 23 '22 at 21:50

Censored regression in R

1 Answers1

Linked