Goodness of fit for Linear Probability Model (LPM)

Question

I'm running a linear probability model (LPM), i.e. my outcome is binary and I have predictors that are categorical and continuous (I'm aware of some of the pros and cons of using LPM for a binary outcome).

Besides checking the robust standard errors, I was wondering what else I should check for this model.

Since R² is not a good measure for a binary outcome, what can test the goodness of fit for LPM? Unfortunately I couldn't find much on this topic.

Thanks!

Why isn't $R^2$ a good measure? Yes, $R^2$ [loses its usual interpretation when you go nonlinear](https://stats.stackexchange.com/questions/551915/interpreting-nonlinear-regression-r2), such as a logistic regression, but your model is linear. — Dave, Dec 06 '21 at 13:15
What is a linear probability model? What is the optimality criterion used to fit it? If the LPM is just OLS, i.e., minimizes sum of squared errors, then you don't need a goodness of fit test because you already know it doesn't fit---it yields negative probabilities or probabilities > 1. — Frank Harrell, Dec 06 '21 at 13:23

score 2 · Answer 1 · edited Feb 04 '22 at 20:35

I think the answer to your question is to use the "percent correctly predicted" measure. Quoting directly from Wooldridge's textbook:

"Still, there are ways to use the estimated probabilities (even if some are negative or greater than one) to predict a zero-one outcome. As before, let $\hat{y}_i$ denote the fitted values—which may not be bounded between zero and one. Define a predicted value as $\tilde{y}_i = 1$ if $\hat{y}_i \ge .5$ and $\tilde{y}_i = 0$ if $\hat{y}_i < .5$. Now we have a set of predicted values, $\tilde{y}_i$, $i = 1, \ldots, n$, that, like the $y_i$, are either zero or one. We can use the data on $y_i$ and $\tilde{y}_i$ to obtain the frequencies with which we correctly predict $y_i = 1$ and $y_i = 0$, as well as the proportion of overall correct predictions. The latter measure, when turned into a percentage, is a widely used goodness-of-fit measure for binary dependent variables: the percent correctly predicted."
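As a rough illustration of the quoted procedure, here is a minimal Python sketch. It assumes the LPM is fit by plain OLS via `numpy.linalg.lstsq`. The function name `percent_correctly_predicted`, the simulated data, and the 0.5 cutoff exposed as `threshold` are illustrative choices of mine, not anything taken from the textbook.

```python
import numpy as np

def percent_correctly_predicted(y, X, threshold=0.5):
    """Fit an LPM by OLS and summarize how often the 0/1 prediction matches y.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column to the predictors.
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta                       # fitted values, may leave [0, 1]
    y_tilde = (y_hat >= threshold).astype(int)  # predicted zero-one outcome
    correct = y_tilde == y
    return {
        "overall": 100 * correct.mean(),
        "ones": 100 * correct[y == 1].mean(),    # % of y = 1 predicted correctly
        "zeros": 100 * correct[y == 0].mean(),   # % of y = 0 predicted correctly
        "pct_fitted_outside_01": 100 * ((y_hat < 0) | (y_hat > 1)).mean(),
    }

# Simulated data (an assumption made only so the sketch runs end to end).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))

result = percent_correctly_predicted(y, x)
print(result)
```

Reporting the two class-specific rates next to the overall rate is worthwhile: when one outcome is rare, a model can score a high overall percentage while almost never predicting the rare outcome correctly.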

score 1 · Accepted Answer · answered Dec 06 '21 at 13:30

A goodness-of-fit test generally refers to comparing the posed model with an ANOVA-type model, using replications in the sampling design. This is also referred to as a test for lack of fit. When replications do not exist, pseudo-replicates are obtained by grouping observations that are near one another. The LPM is an OLS model, hence the usual lack-of-fit test is applicable. See for instance https://en.wikipedia.org/wiki/Lack-of-fit_sum_of_squares
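To make the decomposition concrete, here is a minimal Python sketch of the lack-of-fit F statistic. It assumes a single predictor that takes a few discrete values, so exact replicates exist and no grouping into pseudo-replicates is needed. The function name `lack_of_fit_f` and the simulated data are hypothetical. Because a binary outcome is heteroskedastic, the F reference distribution is at best approximate for an LPM, so treat the statistic as a rough diagnostic.

```python
import numpy as np

def lack_of_fit_f(x, y):
    """Return (F statistic, df lack of fit, df pure error) for a straight-line fit.

    Hypothetical helper; assumes x has repeated values (true replicates).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Residual sum of squares from the OLS line (the LPM).
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)

    # Pure-error sum of squares: variation of y around its mean within each x level.
    levels = np.unique(x)
    ss_pure = float(sum(((y[x == v] - y[x == v].mean()) ** 2).sum() for v in levels))

    # Whatever residual variation is not pure error is attributed to lack of fit.
    ss_lof = sse - ss_pure
    df_lof = len(levels) - design.shape[1]
    df_pure = n - len(levels)
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure

# Simulated data (an assumption): P(y = 1) is truly linear in x here.
rng = np.random.default_rng(1)
x_demo = rng.integers(0, 5, size=400)
y_demo = rng.binomial(1, 0.1 + 0.15 * x_demo)

f_stat, df_lof, df_pure = lack_of_fit_f(x_demo, y_demo)
print(f_stat, df_lof, df_pure)
```

A large F relative to an F distribution with `df_lof` and `df_pure` degrees of freedom would suggest the straight line misses structure in the group means. With a continuous predictor, the `levels` step would have to be replaced by some binning rule, and the result then depends on how the bins are chosen.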

answered Dec 06 '21 at 13:30 by user277126

Why would the normal lack of fit test be applicable when LPM gets the variance structure **and** distribution incorrect? Goodness of fit is better assessed through directed assessments, e.g., nonlinearity and non-additivity. Replication is not required. – Frank Harrell Dec 06 '21 at 13:34
