0

I'm running a linear probability model (LPM), i.e. my outcome is binary and I have predictors that are categorical and continuous (I'm aware of some of the pros and cons of using LPM for a binary outcome).

Besides checking the robust standard errors, I was wondering what else I should check for this model.

Since R² is not a good measure for a binary outcome, what can I use to assess goodness of fit for an LPM? Unfortunately, I couldn't find much on this topic.

Thanks!

  • 1
    Why isn't $R^2$ a good measure? Yes, $R^2$ [loses its usual interpretation when you go nonlinear](https://stats.stackexchange.com/questions/551915/interpreting-nonlinear-regression-r2), such as a logistic regression, but your model is linear. – Dave Dec 06 '21 at 13:15
  • 2
    What is a linear probability model? What is the optimality criterion used to fit it? If the LPM is just OLS, i.e., minimizes sum of squared errors, then you don't need a goodness of fit test because you already know it doesn't fit---it yields negative probabilities or probabilities > 1. – Frank Harrell Dec 06 '21 at 13:23

2 Answers

2

I think the answer to your question is to use the 'percent correctly predicted' measure. Quoting directly from Wooldridge's textbook:

"Still, there are ways to use the estimated probabilities (even if some are negative or greater than one) to predict a zero-one outcome. As before, let $\hat{y}_i$ denote the fitted values—which may not be bounded between zero and one. Define a predicted value as $\tilde{y}_i = 1$ if $\hat{y}_i \geq .5$ and $\tilde{y}_i = 0$ if $\hat{y}_i < .5$. Now we have a set of predicted values, $\tilde{y}_i$, $i = 1, \ldots, n$, that, like the $y_i$, are either zero or one. We can use the data on $y_i$ and $\tilde{y}_i$ to obtain the frequencies with which we correctly predict $y_i = 1$ and $y_i = 0$, as well as the proportion of overall correct predictions. The latter measure, when turned into a percentage, is a widely used goodness-of-fit measure for binary dependent variables: the percent correctly predicted."

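To make the quoted rule concrete, here is a minimal sketch in plain Python. It assumes a single continuous predictor, so the OLS fit has a closed form, and it uses made-up data; the function names `fit_lpm` and `percent_correctly_predicted` are hypothetical, not from the textbook or any library.

```python
def fit_lpm(x, y):
    """OLS fit of y on a constant and one predictor x (closed form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def percent_correctly_predicted(y, fitted, threshold=0.5):
    """Predict 1 when the fitted value is >= threshold, then score the hits.

    Assumes y contains at least one 0 and at least one 1.
    """
    predicted = [1 if f >= threshold else 0 for f in fitted]
    n_ones = sum(1 for yi in y if yi == 1)
    n_zeros = len(y) - n_ones
    hit_ones = sum(1 for yi, p in zip(y, predicted) if yi == 1 and p == 1)
    hit_zeros = sum(1 for yi, p in zip(y, predicted) if yi == 0 and p == 0)
    return {
        "overall": 100.0 * (hit_ones + hit_zeros) / len(y),
        "ones": 100.0 * hit_ones / n_ones,
        "zeros": 100.0 * hit_zeros / n_zeros,
    }


# Made-up illustrative data (an assumption, not from the question).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_lpm(x, y)
fitted = [intercept + slope * xi for xi in x]  # may fall outside [0, 1]
pcp = percent_correctly_predicted(y, fitted)
print(pcp)
```

Reporting the hit rates for the ones and the zeros separately, as the quote suggests, matters because the overall percentage can look good even when one of the two outcomes is almost never predicted correctly.
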
Karolis Koncevičius
Smallex
1

A goodness-of-fit test generally refers to comparing the posed model with an ANOVA-type model through replications in the sampling design. This is also referred to as a test for lack of fit. When replications do not exist, pseudo-replicates are obtained by grouping observations that are near one another. The LPM is an OLS model, hence the normal lack-of-fit test is applicable. See, for instance, https://en.wikipedia.org/wiki/Lack-of-fit_sum_of_squares
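
As a rough sketch of that computation, the plain-Python code below splits the residual sum of squares of a straight-line fit into pure error (variation within replicates) and lack of fit (distance of the replicate means from the fitted line). The function name `lack_of_fit_f` and the data are made up for illustration, and a single predictor with exact replicates is assumed. With a binary outcome the errors are heteroskedastic and non-normal, so treating the statistic as exactly F-distributed is itself an assumption.

```python
from collections import defaultdict


def lack_of_fit_f(x, y):
    """Lack-of-fit F statistic for a straight-line OLS fit.

    Requires replicated x values.
    Returns (F, df_lack_of_fit, df_pure_error).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Group the responses by distinct x value (the replicates).
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pure_error = 0.0
    ss_lack_of_fit = 0.0
    for xi, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pure_error += sum((yi - group_mean) ** 2 for yi in ys)
        ss_lack_of_fit += len(ys) * (group_mean - (intercept + slope * xi)) ** 2

    n_params = 2  # intercept and slope
    df_lof = len(groups) - n_params
    df_pe = n - len(groups)
    f_stat = (ss_lack_of_fit / df_lof) / (ss_pure_error / df_pe)
    return f_stat, df_lof, df_pe


# Made-up data: four x levels with five binary replicates each.
x = [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5
y = [0, 0, 0, 0, 1,
     0, 0, 0, 0, 1,
     1, 1, 1, 1, 0,
     1, 1, 1, 1, 0]

f_stat, df_lof, df_pe = lack_of_fit_f(x, y)
print(f_stat, df_lof, df_pe)
```

The statistic would then be compared with an F distribution on `df_lof` and `df_pe` degrees of freedom; a large value suggests the straight line misses structure in the replicate means.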

user277126
  • 2
    Why would the normal lack of fit test be applicable when LPM gets the variance structure **and** distribution incorrect? Goodness of fit is better assessed through directed assessments, e.g., nonlinearity and non-additivity. Replication is not required. – Frank Harrell Dec 06 '21 at 13:34
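
One directed check of the kind this comment mentions, a test for nonlinearity, could be sketched as follows: add a squared term to the LPM and see how much the residual sum of squares drops. This is only an illustration in plain Python with made-up data; `ols_sse` is a hypothetical helper that solves the normal equations directly. Because LPM errors are heteroskedastic, the classical F statistic computed here is at best approximate, and a Wald test based on robust standard errors would be the more defensible version.

```python
def ols_sse(columns, y):
    """Fit OLS via the normal equations and return the residual sum of squares.

    `columns` is a list of regressor columns (include a column of ones
    for the intercept).
    """
    k = len(columns)
    n = len(y)
    xtx = [[sum(columns[a][i] * columns[b][i] for i in range(n))
            for b in range(k)] for a in range(k)]
    xty = [sum(columns[a][i] * y[i] for i in range(n)) for a in range(k)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution for the coefficients.
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(aug[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (aug[r][k] - tail) / aug[r][r]

    fitted = [sum(beta[a] * columns[a][i] for a in range(k)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Made-up illustrative data (an assumption, not from the question).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

ones = [1.0] * len(x)
x_sq = [xi ** 2 for xi in x]

sse_linear = ols_sse([ones, x], y)
sse_quadratic = ols_sse([ones, x, x_sq], y)

# F statistic for the single added (squared) term.
n, k_full = len(y), 3
f_stat = (sse_linear - sse_quadratic) / (sse_quadratic / (n - k_full))
print(sse_linear, sse_quadratic, f_stat)
```

The same pattern extends to non-additivity by adding an interaction column instead of a squared term.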