monotonic transformation, probit vs logit

Question

At my firm I am developing a probit model. I noticed that when benchmarking against a logit specification, the logit slightly improves the model's goodness-of-fit.

A colleague argued that this is pure luck, because the probit and logit are very similar. He also said that I could apply a monotonic transformation to my data and get better results with a probit.

What's the intuition behind this argument?

Details: I regress a binary variable taking the value 1 if an individual is in financial distress. The data cover a period of 20 years for 1000 individuals. The probit function has the form $P(Y=1|X)=\Phi(X\beta)$, while the logit function is given by $P(Y=1|X)=\frac{1}{1+e^{-X\beta}}$. The explanatory variables are macro variables such as GDP, unemployment, etc. I then computed, for each year, the average of the actual and of the predicted values across all individuals. From these I computed an $R^2$ as the squared correlation between the actual and predicted yearly averages. When repeating the process with logit I noticed a slight improvement.
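To make the goodness-of-fit measure concrete, here is a minimal sketch of the yearly-average $R^2$ described above. The function names and the toy numbers are hypothetical and only illustrate the calculation; they are not the actual data or code.

```python
from collections import defaultdict
from math import sqrt

def yearly_means(values, years):
    """Average the values within each year, returned in year order."""
    groups = defaultdict(list)
    for value, year in zip(values, years):
        groups[year].append(value)
    return [sum(groups[y]) / len(groups[y]) for y in sorted(groups)]

def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def yearly_average_r2(actual, predicted, years):
    """Squared correlation between yearly mean outcomes and yearly mean predictions."""
    return pearson(yearly_means(actual, years), yearly_means(predicted, years)) ** 2

# Toy example (made-up numbers): two individuals observed in each of three years.
years = [2007, 2007, 2008, 2008, 2009, 2009]
actual = [0, 0, 1, 1, 0, 1]                 # observed distress indicator
predicted = [0.1, 0.2, 0.6, 0.8, 0.3, 0.5]  # fitted probabilities from some model
r2 = yearly_average_r2(actual, predicted, years)
print(r2)
```

Note that this measure only sees one point per year, so with 20 years it is based on 20 pairs of averages, whatever the number of individuals.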

You must give more details for this to be answerable. Is this multinomial logit/probit? — kjetil b halvorsen, Apr 09 '17 at 08:34
@kjetilbhalvorsen thanks for your comment, what do you have in mind? — branchwarren, Apr 09 '17 at 10:07
Well, what does your data represent? Sample sizes? Number of predictor variables? binomial logit/probit or multinomial logit/probit? What is your real problem? — kjetil b halvorsen, Apr 09 '17 at 10:29

kjetil b halvorsen · Accepted Answer · 2020-05-19T03:33:39.563

Does this mean that you have data for each individual for each of the twenty years? In that case, you should account for that in the modeling (with time series methods or, maybe, a random effect for each person). Apart from that, the sample size here is large enough that the better fit with logit just might be real. Whether it gives better predictions is more doubtful.

My reason for saying so is the following: the logit function approaches its asymptotes much more slowly than the probit. We can take logit coefficients and divide them by approximately 1.6 to get probit coefficients (see http://andrewgelman.com/2006/06/06/take_logit_coef/). Then we can make the following plot comparing the two models:
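The same comparison can also be made numerically. The following is a minimal sketch, assuming the approximate scaling factor of 1.6 mentioned above, so that the logistic curve at `x` is compared with the normal CDF at `x / 1.6`. The function and variable names are illustrative only.

```python
from math import erfc, exp, sqrt

SCALE = 1.6  # approximate ratio of logit to probit coefficients (an approximation, not exact)

def logistic_cdf(x):
    """Logit link: P(Y=1) as a function of the linear predictor x."""
    return 1.0 / (1.0 + exp(-x))

def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2.0))

# Print both curves on the logit scale; the probit argument is rescaled by 1/SCALE.
for x in (-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0):
    print(f"x={x:5.1f}  logit={logistic_cdf(x):.5f}  probit={normal_cdf(x / SCALE):.5f}")

# Largest absolute gap between the two curves on a grid over [-8, 8].
grid = [i / 10 for i in range(-80, 81)]
max_gap = max(abs(logistic_cdf(x) - normal_cdf(x / SCALE)) for x in grid)

# How much more probability the logit leaves in the lower tail at x = -6.
tail_ratio = logistic_cdf(-6.0) / normal_cdf(-6.0 / SCALE)
print(max_gap, tail_ratio)
```

On the probability scale the two curves stay close everywhere, but in relative terms the logit keeps far more probability in the tails, which is the sense in which the probit reaches near-certain predictions faster.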

I once had a large bioassay dataset where probit fit (marginally) better than logit. In that case the explanation is clear: the probit, approaching $0$ and $1$ faster, better models the fact that above a certain toxicity level all the organisms die (and below a certain level there is no toxicity at all). In your case it is the opposite: the covariates can never predict a default (or its absence) with certainty, so the probit gives overconfident predictions, and for that reason the logit is better. With your kind of data I would use the logistic fit for future risk calculations, even if the probit had happened to fit better in training! (That, of course, is a kind of Bayesian thinking: if the data go against prior information, sometimes it is better to stick to the prior!)

Also look at "Difference between logit and probit models".

Thanks for your answer. First, yes, I considered using fixed or random effects, but the improvements are marginal and I prefer to stay with a simple model. Second, it's not clear to me why the logit performs better in my case (for example, it performs better in 2008: during the financial crisis the fit between the actual and predicted curves is better for the logit specification, and the probit tends to underestimate the 2008 level compared to the logit). Do you have any clue why my colleague mentioned the monotonic transformation? — branchwarren, Apr 09 '17 at 11:13
I don't understand what he meant by applying a monotonic transformation. Ask him! I tried to explain why the logit should be expected to fit better: the probit goes to the asymptote (and certain predictions) **way too fast**. — kjetil b halvorsen, Apr 09 '17 at 11:16
