I have a Tobit model of the form: \begin{align} Y^*_i &= X_i\beta + \epsilon_i \\ Y_i &= \max(Y^*_i,0). \end{align}
The regressors are one continuous variable and 30 factors (modelling seasonal effects during the day). In my sample there are among $3116$ observations $194$ left-censored and $2922$ uncensored.
I tried the R packages AER
and censReg
. With tobit()
in AER
I get an error message:
Error in solve.default(L %*% V %*% t(L)) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
Thus a first question: do I have too many factors? However AER
has a predict function. Given the data $Y_i$ (non-negative) I would like to make a prediction of the censored data - $E[Y|Y>0,X_i]$ - is this what the predict function of AER
gives me?
How can I predict in the package censReg
?
EDIT 1) as Achim Zeileis said, there is a data problem with my factors, they are 0 or NAN too often and the tobit
algorithm cannot find a solution. I have to fix the data.
EDIT 2) Please look at the following example:
library(AER)
N = 10
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
set.seed(100)
x = rnorm(8*N)+1
beta = 5
epsilon = rnorm(8*N,sd = sqrt(1/5))
y.star = x*beta+fcoeff+epsilon ## latent response
y = y.star
y[y<0] <- 0 ## censored response
fit <- tobit(y~0+x+f)
summary(fit)
coef(fit) # very satisfying estimates
my.range = range(y, y.star, predict(fit))
plot(y, ylim=my.range)
lines(predict(fit), col="red")
lines(y.star, col="blue")
As Achim Zeileis writes predict()
predicts $Y^*$ but how can we predict $Y$ efficiently and in an unbiased fashion?