
I am predicting a binary outcome (e.g., credit default) with logistic regression.

For each observation, in addition to my own observed predictors, I have obtained a probability estimate from an external source (e.g., a likelihood of default from a black-box algorithm). I'd like to test whether incorporating this estimate improves my model.

The approach I am considering is to include this estimate as a predictor in the model and do a $\chi^2$ test for the improvement. Because I am expecting a linear relationship between the estimate and the underlying probability of the outcome, I suppose that rather than $x$ I should include the term transformed as $\ln\frac{x}{1-x}$.

Is this transformation appropriate, and is the overall approach recommended?

kjetil b halvorsen
C8H10N4O2
  • This unaccepted [answer](http://stats.stackexchange.com/a/199417/39632) seems to support this approach in the context of a prior distribution. However, in this case I don't have the distribution, just a black-box estimate, and I'm not sure whether that changes anything. – C8H10N4O2 Jun 16 '16 at 13:55

1 Answer


It should be OK to include a score from an external source as a predictor. This can be especially useful if the people who produced the external score have access to information not available to you. Remember that the linear predictor in logistic regression is on the log-odds scale, so if the external score is meant to be interpreted as a probability, such as a default probability, you could first transform it to log odds via $\log \left( \text{score}/(1-\text{score}) \right)$. Alternatively, if the score varies over a large range, you could model it with a spline term.
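As a rough illustration of the idea, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the data are simulated, the variable names (`x1`, `z`, `score`) are hypothetical, and the small Newton-Raphson fitter stands in for whatever logistic regression routine you normally use. It fits the model with and without the logit-transformed external score and compares them with a likelihood-ratio $\chi^2$ test on 1 degree of freedom.

```python
import math
import random

def logit(p, eps=1e-6):
    """Log odds of a probability, clipped away from 0 and 1 so it stays finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return beta, ll

# Simulated data (purely illustrative): x1 is our own predictor, z is
# information only the external source sees, score is its probability estimate.
random.seed(0)
rows = []
for _ in range(2000):
    x1, z = random.gauss(0, 1), random.gauss(0, 1)
    y = 1 if random.random() < sigmoid(-1 + 0.8 * x1 + 1.0 * z) else 0
    score = sigmoid(-1 + z + random.gauss(0, 0.3))
    rows.append((x1, score, y))

y = [r[2] for r in rows]
X_base = [[1.0, r[0]] for r in rows]                 # intercept + own predictor
X_full = [[1.0, r[0], logit(r[1])] for r in rows]    # plus logit(external score)

beta_base, ll_base = fit_logistic(X_base, y)
beta_full, ll_full = fit_logistic(X_full, y)

# Likelihood-ratio test for the one added term; for 1 degree of freedom the
# chi-square upper tail probability equals erfc(sqrt(stat / 2)).
lr_stat = 2 * (ll_full - ll_base)
p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2))
print("coefficient on logit(score):", beta_full[2])
print("LR statistic:", lr_stat, "p-value:", p_value)
```

A coefficient near 1 on the logit-transformed score would suggest the external probabilities are roughly calibrated on the log-odds scale once your own predictors are accounted for. If the score can be exactly 0 or 1, some clipping such as the `eps` used here is needed before taking the logit.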

kjetil b halvorsen