
Can someone tell me what the assumptions for unbiasedness are for a simple probit model like this: $ Prob(y=1|x) = G^{-1}(\beta_0 + x\beta) $?

I know that binary dependent variable models are estimated by MLE, so are there any assumptions to check, as in the case of the classical linear model? So far I know that multicollinearity won't cause bias but can significantly increase the standard errors and therefore mess up the t-statistics.

Maarten Buis
m3div0
  • Multicollinearity won't "mess up" the $t$-statistics. They will exactly portray the amount of information present in your data. The fact that you would like there to be more information in your data than what is available does not mean that the $t$-statistics are wrong. – Maarten Buis Dec 26 '14 at 13:18
  • To extend on @Maarten's comment, multicollinearity makes it hard to pick apart the effects of the different predictors. It is hard to discern which predictor is having what effect. As such it is unsurprising that standard errors on their estimated effects are high, and inference hard (low *t*, high *p*). This doesn't make them wrong, just shows the difficulty in estimation in this situation. – Silverfish Dec 26 '14 at 13:48
  • OK, thanks for the clarification. Do you have any hints about the assumptions, or about what I should worry about most when doing research? – m3div0 Dec 26 '14 at 13:58

1 Answer


When designing a probit model, I worry about whether:

A) I have a sufficiently large number of observations (more than what is needed for linear regression).

B) All relevant variables are included, the latent variable model is correct in theory, and there is no perfect linear dependence between any of the variables.

C) My explanatory variables are exogenous (maybe only applicable to econometrics).

The last two are fairly obvious and apply to linear regression as well. The first may require some more explanation, which is given below.
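For reference, the latent variable model mentioned in B) is usually written as follows. This is only a sketch in standard notation: $y^*$, $\varepsilon$ and $\Phi$ (the standard normal CDF) are symbols I am introducing, and I am assuming that $G^{-1}$ in the question is meant to be $\Phi$, i.e. that $G$ is the probit link.

$$ y^* = \beta_0 + x\beta + \varepsilon, \qquad \varepsilon \mid x \sim N(0,1), \qquad y = \mathbf{1}[y^* > 0] $$

$$ Prob(y=1|x) = Prob(\varepsilon > -(\beta_0 + x\beta)) = \Phi(\beta_0 + x\beta) $$

The last equality uses the symmetry of the normal distribution. Points B) and C) amount to saying that this equation is the right one and that $\varepsilon$ is independent of $x$.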

Let $\theta$ be the true parameter values, $\hat \theta$ be the parameter values estimated by the probit model, and $n$ be the number of observations.

A very important yet subtle point: nothing says that the probit estimator is unbiased. What we can say is that it is consistent, provided the model is correctly specified. This means that as the number of observations approaches infinity, the parameter estimates converge in probability to the true parameter values. More compactly,

$$ \mathrm{plim}\:\hat \theta = \theta$$
where $ \mathrm{plim}$ stands for the probability limit.

This is very different from unbiasedness, which would mean that $E[\hat \theta]=\theta$ for any finite $n$.

For this reason it is important to have a sufficiently large amount of data when fitting a probit model. The minimal amount of data you need depends on the problem; this link discusses the issue for a logit model, and the idea is the same for probit.
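The difference between consistency and unbiasedness can be checked with a small Monte Carlo experiment. The following is a minimal sketch in plain Python, not a definitive implementation: the true values $\beta_0 = 0$, $\beta_1 = 1$, the standard normal regressor, the sample sizes, and the helper names `fit_probit`, `simulate` and `mean_slope` are all my own assumptions for illustration. The fit uses Fisher scoring, and samples where it fails to converge (for example under perfect separation) are simply dropped, which itself slightly distorts the small-sample averages.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def fit_probit(x, y, max_iter=50, tol=1e-8):
    """Fit P(y=1|x) = Phi(b0 + b1*x) by Fisher scoring.

    Returns (b0, b1), or None if the iterations do not converge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # expected information matrix (symmetric 2x2)
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = 0.5 * (1.0 + math.erf(eta / SQRT2))   # normal CDF
            p = min(max(p, 1e-12), 1.0 - 1e-12)        # guard against 0 and 1
            d = math.exp(-0.5 * eta * eta) / SQRT2PI   # normal density
            w = d / (p * (1.0 - p))
            r = (yi - p) * w
            g0 += r
            g1 += r * xi
            h = d * w
            i00 += h
            i01 += h * xi
            i11 += h * xi * xi
        det = i00 * i11 - i01 * i01
        if det <= 1e-12:
            return None  # information matrix numerically singular
        # Scoring step: solve the 2x2 system (information) * step = score.
        s0 = (i11 * g0 - i01 * g1) / det
        s1 = (i00 * g1 - i01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(b0) > 10 or abs(b1) > 10:
            return None  # diverging, e.g. perfect separation
        if max(abs(s0), abs(s1)) < tol:
            return b0, b1
    return None


def simulate(n, rng, b0=0.0, b1=1.0):
    """Draw n observations from the latent variable probit model."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1 if b0 + b1 * xi + rng.gauss(0, 1) > 0 else 0 for xi in x]
    return x, y


def mean_slope(n, reps, seed=0):
    """Average estimated slope over reps simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        fit = fit_probit(*simulate(n, rng))
        if fit is not None:  # drop samples where the fit failed
            slopes.append(fit[1])
    return sum(slopes) / len(slopes)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, round(mean_slope(n, reps=200), 3))
```

I would expect the average slope to differ noticeably from 1 at the smallest sample size and to move toward 1 as $n$ grows, which is what consistency without unbiasedness looks like, but the exact numbers depend on the seed and the design, so treat this as something to run rather than a result.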

As for multicollinearity, in theory it only affects the standard errors of the estimates (and hence the t-tests). However, in applications it may cause problems with estimation as well. Unlike the linear model, the probit has no closed-form solution, so it must be estimated numerically. The numerical estimation can become unreliable if the multicollinearity is strong enough.
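One way to see why the numerical side suffers is to look at the information matrix that Newton-type optimizers have to invert at each step. The sketch below is an illustration under my own assumptions, not how any particular package does it: a probit without intercept, two regressors where the second is the first plus noise, hypothetical true coefficients of 0.5 each, and helper names (`information_matrix`, `condition_number`, `slope_std_error`, `collinearity_demo`) that I made up.

```python
import math
import random


def information_matrix(x1, x2, b1, b2):
    """Expected information of a no-intercept probit, evaluated at (b1, b2).

    Returns (a, b, c), representing the symmetric matrix [[a, b], [b, c]].
    """
    a = b = c = 0.0
    for u, v in zip(x1, x2):
        eta = b1 * u + b2 * v
        p = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        d = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
        h = d * d / (p * (1.0 - p))
        a += h * u * u
        b += h * u * v
        c += h * v * v
    return a, b, c


def condition_number(a, b, c):
    """Largest over smallest eigenvalue of [[a, b], [b, c]]."""
    lam_max = 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    det = a * c - b * b  # equals lam_max * lam_min
    return lam_max * lam_max / det


def slope_std_error(a, b, c):
    """Asymptotic standard error of the first coefficient."""
    return math.sqrt(c / (a * c - b * b))


def collinearity_demo(noise, n=500, seed=0):
    """Return (condition number, std. error) when x2 = x1 + noise * e."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [u + noise * rng.gauss(0, 1) for u in x1]
    info = information_matrix(x1, x2, 0.5, 0.5)  # hypothetical true values
    return condition_number(*info), slope_std_error(*info)


if __name__ == "__main__":
    for noise in (1.0, 0.1, 0.01):
        cond, se = collinearity_demo(noise)
        print(noise, cond, se)
```

As the noise shrinks, the two regressors become nearly identical, the condition number grows, and so does the standard error. A very large condition number means each optimizer step solves a nearly singular linear system, which is where convergence failures and unstable estimates come from.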

Zachary Blumenfeld