Are consistently negative values of Efron's pseudo-R2 possible in logistic regression?

Question

I am conducting logistic regression and looking to calculate pseudo-R2 values alongside AIC and BIC for model evaluation. I selected Efron's pseudo-R2 because of its simple calculation and the similarity to a proper R2 value. When I run a series of logistic regressions, it produces a negative value. However, it is commonly stated that pseudo-R2 values fall between 0 and 1 (example here). Am I calculating something wrong, or is this 0 to 1 range false for Efron's pseudo-R2?

The equation I am using for Efron's pseudo-R2 is: $ R^2 = 1 - \frac{\sum(y_i-\pi_i)^2}{\sum(y_i-\bar y)^2} $

Where:

$ y $ is an array of 1s and 0s, representing the true outcome labels in the data
$ \pi $ is an array of 1s and 0s, representing the predicted outcome labels as a result of the logistic regressions
$ \bar y $ is the arithmetic mean of $ y $, calculated as $ \frac{\sum y_i}{n} $ and equivalent to $ p $ or the probability of a 1 outcome

This seems to be the definition consistently given online. (I don't have access to the foundational journal article.)

Looking at the equation, this appears to happen for three reasons: (1) the large sample size (n = 4,000), (2) the relative inaccuracy of the logistic regression, and (3) the fact that the value of p (mean y = ) is close to 0.5. The large sample size combined with frequent errors blows up the numerator, and having p close to 0.5 shrinks the denominator. Indeed, when I calculate the value on a subset of my data (about the first 100 rows), I get a positive pseudo-R2.

Again, though, pseudo-R2 values are described as being consistently between 0 and 1. It seems there are two possibilities: (1) this range is a simplification and a bad model can produce a negative value (as with a conventional R2), or (2) there is something wrong with my understanding.

More information on implementation, although I believe my implementation is correctly reflecting the equation above: I am using the python sklearn implementation, which has limited metrics for inference, so I am writing functions to calculate pseudo-R2 values. I wrote the function based on the formula above and confirmed another function from this site returns the same values, leading me to believe this is a characteristic of the metric and not of an incorrect implementation of the equation.

score 8 · Accepted Answer · answered Feb 10 '22 at 21:29

Your problem is here:

$\pi$ is an array of 1s and 0s, representing the predicted outcome labels as a result of the logistic regressions.

That's incorrect. The $\pi$ values should be the predicted probabilities of class membership returned by logistic regression. See the explanation of the formula in the table on the UCLA web page that you cite.

Logistic regression does not return class membership assignments. It sometimes appears to, based on a hidden assumption of p = 0.5 for categorization after modeling. But even if strict assignments are needed, that's not necessarily the best cutoff choice.

Your formula is related to the accuracy of the classification (at the chosen probability cutoff), which is not a good choice for evaluating classification models of any sort. The numerator of the second term in that formula, with $\pi$ values correctly taken as probabilities, is the basis of the Brier score, the mean-square error between observations and predicted probabilities. That's a strictly proper scoring rule, which you might consider using on its own. Chapter 8 of Frank Harrell's course notes covers several ways to evaluate the quality of logistic regression models.
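To make the difference concrete, here is a minimal sketch in plain Python. The data are made up for illustration, and `efron_r2` is a hypothetical helper name, not something from scikit-learn. It evaluates the formula from the question twice: once with hard 0/1 labels and once with predicted probabilities.

```python
def efron_r2(y, pi):
    """Efron's pseudo-R^2: 1 - sum((y - pi)^2) / sum((y - mean(y))^2)."""
    y_bar = sum(y) / len(y)
    numerator = sum((yi - p) ** 2 for yi, p in zip(y, pi))
    denominator = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - numerator / denominator

# Hypothetical data: 8 balanced outcomes and made-up predicted P(y = 1).
y = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.8, 0.7, 0.6, 0.45, 0.2, 0.3, 0.55, 0.6]

# Hard labels at the conventional 0.5 cutoff (3 of 8 end up misclassified).
labels = [1 if p >= 0.5 else 0 for p in probs]

r2_from_labels = efron_r2(y, labels)  # what the question computed
r2_from_probs = efron_r2(y, probs)    # what the formula intends
print(r2_from_labels, r2_from_probs)  # roughly -0.5 and 0.3075
```

With hard labels the numerator is just the number of misclassifications, so for $\bar y = 0.5$ the value reduces to $1 - 4 \times$ (error rate) and turns negative as soon as more than 25% of cases are misclassified. With a fitted scikit-learn model whose classes are 0 and 1, the probabilities of class 1 would typically come from `model.predict_proba(X)[:, 1]`.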

EdM


Thanks @EdM, very helpful clarification, and this is helping tease out my misunderstandings about logistic regression. For 100% clarity: the pi probabilities I want would be the probability of the model returning 1, correct? (The sklearn implementation gives them for both 0 and 1.) – stattletail Feb 10 '22 at 21:38

Even with this correction (+1), it is still possible to get a negative pseudo-$R^2$, especially with out-of-sample forecasting. Suppose that the $y_i$ are actually unpredictable and are iid with probability $\frac12$ of being $1$ or $0$; then any model that wrongly thinks it can predict informative probabilities (perhaps as a result of noise in the training data) has a negative expected pseudo-$R^2$. – Henry Feb 10 '22 at 22:02

@stattletail if y=1 is "success" then yes you want the probability that y =1. As long as you are internally consistent you should be OK. To double-check, make sure that near-perfect predictions give small (near 0) values in the numerator of the second term of the formula. That said, I think you would be much better off using an alternate measure of model quality, as Harrell discusses in the linked document. Also, try a search on this site. – EdM Feb 10 '22 at 22:40
@Henry your point was my initial guess as to the problem, until I read the details in the question. I agree completely. – EdM Feb 10 '22 at 22:42
Thanks both - very helpful! – stattletail Feb 10 '22 at 23:11
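
Henry's scenario can be illustrated with a small simulation. This is only a sketch under stated assumptions: the outcomes are fair coin flips, and the "model" is mimicked by predicted probabilities drawn uniformly from [0.2, 0.8] independently of the outcomes, as if it had learned only noise. The seed, the sample size of 4,000 (borrowed from the question), and the helper name `efron_r2` are illustrative choices.

```python
import random

def efron_r2(y, pi):
    """Efron's pseudo-R^2: 1 - sum((y - pi)^2) / sum((y - mean(y))^2)."""
    y_bar = sum(y) / len(y)
    numerator = sum((yi - p) ** 2 for yi, p in zip(y, pi))
    denominator = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - numerator / denominator

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 4000

# Unpredictable outcomes: each y_i is 1 with probability 1/2.
y = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

# Assumed "overconfident" model: probabilities that carry no information about y.
pi_noise = [rng.uniform(0.2, 0.8) for _ in range(n)]

# Baseline that always predicts the observed prevalence.
pi_baseline = [sum(y) / n] * n

r2_noise = efron_r2(y, pi_noise)        # negative: worse than the baseline
r2_baseline = efron_r2(y, pi_baseline)  # zero by construction
print(r2_noise, r2_baseline)
```

Under these assumptions the expected squared error per observation is about 0.25 + 0.03 = 0.28 for the noise model against at most 0.25 for the baseline, so the pseudo-$R^2$ comes out at roughly $-0.12$.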

Dave · Answer 2 · 2022-02-20T03:37:12.427

I think it’s important to remember what $R^2$ means in the linear case.

$$R^2= 1-\dfrac{ \sum\bigg( y_i-\hat y_i \bigg)^2 }{\sum\bigg( y_i-\bar y \bigg)^2 } $$

If we want to measure our ability to predict conditional means by how low a square loss we achieve, we had better have a lower square loss than the naïve model that guesses $\bar y$ every time!

This is exactly what is going on in the pseudo-$R^2$. If you do a worse job of predicting the conditional probability (not label) than a naïve model that always predicts the overall prevalence, then the numerator is larger than the denominator, resulting in pseudo-$R^2<0$. In the case of a probability model, square loss is called Brier score and is not the usual loss function. Brier score is, however, a strictly proper scoring rule, which means, a little loosely speaking, that it seeks out the true conditional probability values.
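
As a short worked step (the notation $\text{BS}$ for the model's Brier score is introduced here for convenience), dividing the numerator and denominator by $n$ makes the comparison with the naïve model explicit. For binary outcomes $y_i^2 = y_i$, so $\frac1n\sum(y_i-\bar y)^2 = \bar y - \bar y^2 = \bar y(1-\bar y)$, which is the Brier score of always predicting the prevalence:

$$R^2_{\text{Efron}} = 1-\dfrac{\frac1n\sum\big( y_i-\pi_i \big)^2}{\frac1n\sum\big( y_i-\bar y \big)^2} = 1-\dfrac{\text{BS}}{\bar y\,(1-\bar y)}$$

The pseudo-$R^2$ is therefore negative exactly when $\text{BS} > \bar y(1-\bar y)$; with $\bar y = 0.5$ that threshold is $0.25$.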

The typical loss function in logistic regression is log loss, which corresponds to maximum likelihood estimation of the coefficients. It makes sense to compare the log loss values in a similar way. This is McFadden’s $R^2$.
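
A minimal sketch of that comparison, assuming the usual definition of McFadden's $R^2$ as one minus the ratio of the model's log-likelihood to that of an intercept-only model. The data and the helper name `mcfadden_r2` are hypothetical, and the probabilities are made up rather than taken from a fitted model.

```python
import math

def mcfadden_r2(y, pi):
    """McFadden's pseudo-R^2: 1 - loglik(model) / loglik(null).

    The null model predicts the observed prevalence for every case.
    Assumes every probability is strictly between 0 and 1.
    """
    y_bar = sum(y) / len(y)

    def log_lik(probs):
        # Bernoulli log-likelihood of the observed 0/1 outcomes.
        return sum(math.log(p) if yi == 1 else math.log(1 - p)
                   for yi, p in zip(y, probs))

    return 1 - log_lik(pi) / log_lik([y_bar] * len(y))

# Hypothetical data: 8 balanced outcomes and made-up predicted P(y = 1).
y = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.8, 0.7, 0.6, 0.45, 0.2, 0.3, 0.55, 0.6]

r2_model = mcfadden_r2(y, probs)      # about 0.2455
r2_null = mcfadden_r2(y, [0.5] * 8)   # 0: the baseline compared with itself
print(r2_model, r2_null)
```

As with the square-loss version, predictions with a worse log loss than the prevalence-only baseline would give a negative value.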

Indeed, I say that it always makes sense to compare how your model does on a loss function of interest with how some baseline model does. In OLS linear regression, there is a convenient interpretation as the “proportion of variance explained”, but even if we lack such an interpretation, comparing our performance to the performance of a baseline model gives us some idea of whether our model provides value.

UCLA has a nice webpage about $R^2$-style metrics for probability models like logistic regression.

Vanderbilt's Frank Harrell has some thoughts on how to measure the value added by a model.

See some related ideas [here](https://fharrell.com/post/addvalue). — Frank Harrell, Feb 11 '22 at 13:36
