
Given a data set with binary outcomes $y\in\{0,1\}^n$ and a predictor matrix $X\in\mathbb{R}^{n\times p}$, the standard logistic regression model estimates coefficients $\beta_{MLE}$ that maximize the binomial likelihood. When $X$ is full rank, $\beta_{MLE}$ is unique; when perfect separation is not present, it is finite.

Does this maximum likelihood model also maximize the ROC AUC (aka $c$-statistic), or does there exist some coefficient estimate $\beta_{AUC} \neq \beta_{MLE}$ which will obtain a higher ROC AUC? If it is true that the MLE does not necessarily maximize ROC AUC, then another way to look at this question is "Is there an alternative to likelihood maximization which will always maximize ROC AUC of a logistic regression?"

I am assuming that models are otherwise the same: we're not adding or removing predictors in $X$, or otherwise changing the model specification, and I'm assuming that the likelihood-maximizing and AUC-maximizing models are using the same link function.

Sycorax
  • Surely $\beta_{\text{AUC}} \neq \beta_{\text{MLE}}$ if, e.g., some link function generates a better fit than a logit? Other than that, good question, if the data generating process can be assumed to be logit. – runr Aug 19 '19 at 13:15
  • Good question, but consider this. ROC and AUC are used to compare two different models, so if the MLE solution for any given model is unique, you can get a different AUC only if you change the specification of the current model and estimate a new, different model via MLE. So at this point another question would be: is there any other "better" estimation method (maximization algorithm, etc.) other than simple MLE, applicable to the same model, that gives different estimates of the coefficients, leading to new "better" betas with a higher AUC? – Fr1 Aug 19 '19 at 13:18
  • @Nutle exactly, that would be a different specification – Fr1 Aug 19 '19 at 13:20
  • @Fr1 Yes, that is what unique means. What I'm implying in my question is something like "what if there is some alternative to the MLE which achieves a higher AUC?" If it is true that there is a different linear model (a model other than the MLE) which achieves a higher AUC, then that would be interesting to know about. – Sycorax Aug 19 '19 at 13:22
  • @Nutle: I assume the OP means $\beta_{AUC}$ *in which the basic model is the same*. Otherwise "maybe probit" is a sufficient answer :) – Cliff AB Aug 19 '19 at 13:27
  • @Nutle This is why I edited my question to specify "I am assuming that the likelihood-maximizing and AUC-maximizing models are using the same link function." – Sycorax Aug 19 '19 at 13:29
  • @Sycorax exactly, this is a tremendously good question. I think the answer may lie in the numerical optimization, which is stuff that I often leave to my mathematician friends because they know the answer better than I do. In this case we could indeed say: "given that there is a hidden true relationship between the predictors and the dependent variable, is the MLE the best possible way to retrieve this true relationship?" Good question. Conceptually this would likely involve finding a new way of defining a cost function for the optimization and/or finding better optimization methods. – Fr1 Aug 19 '19 at 13:30
  • For example, instead of finding the coefficients that maximize the likelihood, find the coefficients that minimize a certain cost function that performs particularly well with a 0-1 dependent variable, based on simulated data. I am just thinking aloud, because the question is interesting. Clearly I do not know the answer, but I think there are people researching it. At least I know people researching new numerical methods, but the two things are not exactly the same. – Fr1 Aug 19 '19 at 13:32
  • @Sycorax what else do we assume? :) Assumptions are important, since if we _know_ the true DGP with link and variables used, the MLE is uniformly most powerful unbiased statistic. – runr Aug 19 '19 at 13:34
  • @Nutle I don't think that further assumptions are necessary. If your observation about the MLE's nice properties in certain settings is relevant, then perhaps we could use that observation to show that the MLE corresponds to the highest AUC among linear models with a certain link. I guess I would hope that the MLE also implies you have the best AUC among that family of models, but it's not obvious to me why that must be the case. – Sycorax Aug 19 '19 at 13:39

1 Answer


It is not the case that $\beta_{MLE} = \beta_{AUC}$.

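To illustrate this, consider that the AUC can be written as the probability that a randomly chosen positive case receives a higher prediction than a randomly chosen negative case:

$P(\hat y_1 > \hat y_0 \mid y_1 = 1, y_0 = 0)$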
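As a rough illustration of this pairwise definition, here is a minimal Python sketch that counts correctly ordered (positive, negative) pairs. The function name `pairwise_auc`, the example scores, and the convention of counting ties as one half are assumptions made for the illustration; they are not part of the formula above.

```python
def pairwise_auc(scores, labels):
    """Estimate P(score of a positive > score of a negative).

    Ties are counted as 1/2 (an assumed convention, not from the formula).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical predictions and outcomes, chosen only for illustration.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

auc = pairwise_auc(scores, labels)
print(auc)  # 3 of the 4 (positive, negative) pairs are ordered correctly: 0.75

# A strictly increasing transform of the scores leaves the AUC unchanged.
auc_transformed = pairwise_auc([s ** 3 for s in scores], labels)
print(auc_transformed)
```

Only comparisons between scores enter the computation, which is why any strictly increasing transformation of the predictions gives the same value.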

In other words, the ordering of the predictions is the only thing that affects the AUC. This is not the case with the likelihood function. So as a mental exercise, suppose we have a single predictor and our dataset does not show perfect separation (i.e., $\beta_{MLE}$ is finite). Now, if we take the largest value of the predictor and increase it by some small amount, the likelihood of this solution changes, but the AUC does not, because the ordering of the predictions remains the same. Thus, if the old MLE maximized the AUC, it will still maximize the AUC after changing the predictor, but it will no longer maximize the likelihood.

Thus, at the very least, $\beta_{AUC}$ is not unique; any $\beta$ that preserves the ordering of the predictions achieves exactly the same AUC. In general, since the AUC and the likelihood are sensitive to different aspects of the data, I believe we should be able to find a case where $\beta_{MLE}$ does not maximize the AUC. In fact, I'd venture a guess that this happens with high probability.
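The non-uniqueness can be checked numerically. The sketch below, in plain Python, evaluates several (intercept, slope) pairs with a positive slope on a small made-up dataset: all of them produce the same ordering and hence the same AUC, while their log-likelihoods differ. The data, the candidate coefficients, and the helper names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(b0, b1, xs, ys):
    # Bernoulli log-likelihood under a logit link: sum of y*z - log(1 + e^z).
    return sum(y * (b0 + b1 * x) - log1pexp(b0 + b1 * x) for x, y in zip(xs, ys))


def pairwise_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Made-up single-predictor data without perfect separation.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]

# Arbitrary (intercept, slope) candidates; every slope is positive.
candidates = [(-2.0, 1.0), (0.0, 1.0), (-2.0, 5.0), (3.0, 0.1)]

results = []
for b0, b1 in candidates:
    probs = [sigmoid(b0 + b1 * x) for x in xs]
    results.append((pairwise_auc(probs, ys), log_likelihood(b0, b1, xs, ys)))

for (b0, b1), (auc, ll) in zip(candidates, results):
    print(f"b0={b0:5.1f} b1={b1:4.1f}  AUC={auc:.4f}  logLik={ll:.4f}")
```

Every candidate ranks the observations by $x$, so the AUC column is constant, whereas the log-likelihood column varies from row to row.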

EDIT (moving comment into answer)

The next step is to prove that the MLE doesn't necessarily maximize the AUC (which the argument above has not yet shown). One can do this by taking something like predictors 1, 2, 3, 4, 5, 6, $x$ (with $x > 6$) with outcomes 0, 0, 0, 1, 1, 1, 0. Any positive value of $\beta$ will maximize the AUC (regardless of the value of $x$), since it ranks the three 1s above the 0s at 1, 2, 3 and misorders only the point at $x$. But we can choose an $x$ large enough that $\beta_{MLE} < 0$.
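This toy example can be checked with a short script. The sketch below fits a one-predictor logistic regression with a hand-written Newton iteration (with step halving) in plain Python and compares the AUC of the fitted slope with that of a positive slope. The choice $x = 30$, the function names, and the fitting routine are assumptions made for the illustration; setting the score at $\beta = 0$ to zero suggests that any $x > 14$ should give a negative fitted slope.

```python
import math


def sigmoid(z):
    # Stable logistic function for positive and negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_lik(b0, b1, xs, ys):
    return sum(y * (b0 + b1 * x) - log1pexp(b0 + b1 * x) for x, y in zip(xs, ys))


def fit_logistic(xs, ys, iters=100, tol=1e-10):
    """Newton's method with step halving for intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Negative Hessian (2x2), solved in closed form.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not decrease.
        step, current = 1.0, log_lik(b0, b1, xs, ys)
        while step > 1e-8 and log_lik(b0 + step * d0, b1 + step * d1, xs, ys) < current:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


def pairwise_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


xs = [1, 2, 3, 4, 5, 6, 30]   # x = 30 is an assumed "large enough" value
ys = [0, 0, 0, 1, 1, 1, 0]

b0, b1 = fit_logistic(xs, ys)
auc_mle = pairwise_auc([b0 + b1 * x for x in xs], ys)
auc_pos = pairwise_auc([1.0 * x for x in xs], ys)   # any positive slope

print("fitted slope:", b1)
print("AUC at the MLE:", auc_mle)
print("AUC with a positive slope:", auc_pos)
```

With a negative fitted slope the ordering is reversed, so only the three pairs involving the point at $x$ are ordered correctly (3 of 12), whereas a positive slope orders 9 of the 12 pairs correctly.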

Cliff AB
  • (+1) Ah! Of course: since it's about ordering, we could arbitrarily change the intercept, which obviously must change the likelihood value, but the ordering must be the same because none of the feature coefficients have changed, so the AUC will remain fixed. – Sycorax Aug 19 '19 at 13:42
  • +1. Does the _edit_ example work with $n \rightarrow \infty$, though? If we need to take large enough $x$ for this to work with large $n$, doesn't the probability of such values existing quickly converge to 0, for some fixed logit? – runr Aug 19 '19 at 14:17
  • @Nutle: well, it depends what you mean by $n \rightarrow \infty$. If we took $n$ copies (predictors + outcomes) of my toy dataset, then yes, the result would hold. However, if we took $n$ copies of that set of predictors, and the data really came from a logistic regression model, that would almost never happen (as you point out). Note, however, that something akin to this *could* happen with high probability if the relation between the predictors and the outcome didn't really follow a logistic regression model. – Cliff AB Aug 19 '19 at 14:28
  • Yes, thanks, I was talking about the size. So, assuming such a heavy-tailed distribution is known, would the example still hold if the MLE estimate was adjusted for the true distribution? What I'm going for is: if the probability of such an $x$ existing for any sample size $n$ is not approaching 0, shouldn't the MLE estimate react to it accordingly and not act as it would with an outlier? Sorry if I'm not entirely clear here with the wording. – runr Aug 19 '19 at 14:37