Can one estimate a probit regression using OLS, or does it have to be done with maximum likelihood?

Question

One could apply the inverse cumulative distribution function to the probability and then run OLS. Would that be a mistake, or can it be done? What are the consequences?

You cannot do this if your $y$-values (outcomes, dependent variables) are binary. The inverse of the Gaussian CDF is $\pm \infty$ for $0$ and $1$. However, if your outcomes are already probabilities (or can be treated as such, e.g. because they are empirical percentages), then you can do it. You should, however, check the residuals for normality and homoscedasticity. – Igor F., Jan 30 '20 at 12:35
@IgorF. the outcomes are already probabilities. So I can use OLS, and it is OK as long as the residuals are normal and homoskedastic? – adrCoder, Jan 30 '20 at 12:49
See the same question about logistic regression: https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting – kjetil b halvorsen, Jan 30 '20 at 12:49
@adrCoder: Yes, it is conceptually the same as the probit regression, except that your error model is different. By doing OLS, you assume Gaussian errors on the *transformed* data. Only by examining the residuals can you see whether this assumption was correct. – Igor F., Jan 30 '20 at 13:00
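
The approach discussed in these comments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (proportions strictly inside $(0, 1)$ with Gaussian noise on the probit scale), and the names `ols_simple`, `true_a`, and `true_b` are hypothetical. It uses only the standard library, where `NormalDist.inv_cdf` plays the role of $\Phi^{-1}$ and raises an error at $0$ and $1$ instead of returning $\pm \infty$.

```python
import random
from statistics import NormalDist, StatisticsError, pstdev

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x. Returns (a, b, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


# Binary outcomes cannot be transformed: Phi^{-1}(0) and Phi^{-1}(1) are
# -inf and +inf, and NormalDist.inv_cdf raises an error at those points.
try:
    std_normal.inv_cdf(1.0)
    binary_transform_ok = True
except StatisticsError:
    binary_transform_ok = False

# Hypothetical data: proportions strictly inside (0, 1), generated with
# Gaussian noise on the probit scale (an assumption made for illustration).
random.seed(0)
true_a, true_b = -0.5, 1.2
x = [i / 10 for i in range(-20, 21)]
y = [std_normal.cdf(true_a + true_b * xi + random.gauss(0, 0.1)) for xi in x]

# Transform, y' = Phi^{-1}(y), then run OLS on the transformed outcomes.
y_transformed = [std_normal.inv_cdf(yi) for yi in y]
a_hat, b_hat, residuals = ols_simple(x, y_transformed)

# Crude homoscedasticity check: residual spread in the lower vs upper half of x.
half = len(x) // 2
spread_low = pstdev(residuals[:half])
spread_high = pstdev(residuals[half:])

print("binary outcome transformable:", binary_transform_ok)
print("estimated intercept and slope:", a_hat, b_hat)
print("residual sd (low x, high x):", spread_low, spread_high)
```

With real data, the comparison of residual spreads would only be a first look; a residual plot or a normal Q-Q plot would be the more usual way to check the Gaussian-error assumption on the transformed scale.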

score 1 · Answer 1 · answered Jan 30 '20 at 12:18

This is not quite what you suggest, but what is sometimes done is to estimate a so-called "linear probability model" (LPM). That entails ignoring the binary nature of the dependent variable and running OLS on the $y_i$ directly.

That has certain undesirable effects, mainly that the fitted values of such a linear regression can leave the unit interval and hence produce predicted probabilities below 0 or above 1.

On the other hand, the fitted partial effects (which for OLS are of course nothing but the slope coefficients) are often very similar to those obtained from a probit model estimated by ML in the range where the regressors have their main support.

Hence, from a practical perspective, the differences can often be modest.
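
To make the comparison concrete, here is a minimal sketch on simulated data, not a definitive implementation. Everything in it is an assumption made for illustration: the data-generating values `true_a` and `true_b`, a single standard-normal regressor, and a hand-written Fisher scoring loop standing in for a library ML routine. It fits the LPM by OLS, fits the probit by ML, and compares the LPM slope with the probit average partial effect, i.e. the sample mean of $\phi(a + b x_i)\, b$.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical binary data from a latent-variable probit model.
random.seed(1)
true_a, true_b = 0.3, 0.8
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 if true_a + true_b * xi + random.gauss(0, 1) > 0 else 0 for xi in x]

# Linear probability model: OLS of the raw 0/1 outcome on x.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
lpm_slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
lpm_intercept = mean_y - lpm_slope * mean_x
fitted = [lpm_intercept + lpm_slope * xi for xi in x]
n_outside = sum(1 for f in fitted if f < 0 or f > 1)

# Probit by maximum likelihood, using Fisher scoring on (a, b).
a, b = 0.0, 0.0
for _ in range(25):
    g0 = g1 = s00 = s01 = s11 = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        # Clamp the probability away from 0 and 1 to avoid division by zero.
        p = min(max(std_normal.cdf(eta), 1e-10), 1 - 1e-10)
        dens = std_normal.pdf(eta)
        u = dens * (yi - p) / (p * (1 - p))  # score contribution
        w = dens * dens / (p * (1 - p))      # expected information weight
        g0 += u
        g1 += u * xi
        s00 += w
        s01 += w * xi
        s11 += w * xi * xi
    # Solve the 2x2 system (information matrix) * step = score.
    det = s00 * s11 - s01 * s01
    a += (s11 * g0 - s01 * g1) / det
    b += (s00 * g1 - s01 * g0) / det

# Average partial effect of x in the probit model: mean of phi(a + b*x) * b.
probit_ape = b * sum(std_normal.pdf(a + b * xi) for xi in x) / n

print("LPM slope:", lpm_slope)
print("probit average partial effect:", probit_ape)
print("LPM fitted values outside [0, 1]:", n_outside)
```

In this setup the two effect estimates should come out close to each other, while some LPM fitted values fall outside the unit interval. How close they are on real data depends on where the regressors have their support.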

Christoph Hanck


So it is OK to take the inverse normal and run OLS? – adrCoder Jan 30 '20 at 12:24
As I wrote, the LPM is a bit different from what you have in mind (that said, I am not sure what precisely you aim to take the inverse normal of). The LPM transforms nothing and runs OLS on the data as it comes in. – Christoph Hanck Jan 30 '20 at 12:27
Ok thank you Christoph. I am taking the inverse of probabilities. – adrCoder Jan 30 '20 at 12:49
@adrCoder I believe you wanted to say that you are applying the inverse of the normal CDF to your outcome values, i.e. $y' = \Phi^{-1}(y)$, right? – Igor F. Jan 30 '20 at 13:11
