
I'm designing a logistic regression model to predict hospital mortality.

Why? To estimate 'adjusted' odds ratios for a variable of interest on mortality.

Methods (set up using a training dataset, 75% of the total; a code sketch follows this list):

  1. Started with 19 candidate variables (dataset of 1,684 observations).
  2. Included all variables with p < 0.2 on univariate analysis.
  3. Applied stepwise selection (the stepAIC function from the MASS package in R).
  4. Tested for confounding using interaction terms for variables in later models.
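
A minimal sketch of steps 1-3 in R, assuming a data frame `dat` with a binary 0/1 outcome `died` and the candidate predictors as its remaining columns (all object names here are illustrative):

    # Sketch of the workflow above; `dat` and `died` are assumed names
    library(MASS)

    set.seed(42)
    train_idx <- sample(nrow(dat), size = floor(0.75 * nrow(dat)))
    train <- dat[train_idx, ]
    test  <- dat[-train_idx, ]

    # Step 2: univariate screening, keeping predictors with p < 0.2
    preds <- setdiff(names(train), "died")
    keep  <- preds[sapply(preds, function(v) {
      fit <- glm(reformulate(v, response = "died"), data = train,
                 family = binomial)
      min(coef(summary(fit))[-1, "Pr(>|z|)"]) < 0.2
    })]

    # Step 3: stepwise selection starting from the screened full model
    full  <- glm(reformulate(keep, response = "died"), data = train,
                 family = binomial)
    final <- stepAIC(full, direction = "both", trace = FALSE)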

When I run predictions on the test cohort (25%), I get the following model diagnostics:

  • Sensitivity 12%
  • Specificity 95%
  • Accuracy 78%

Looking at the confusion matrix, the model predicts the largest class for nearly every observation, which yields high accuracy but a very poor model overall.
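
For reference, a sketch of how these diagnostics arise at the default 0.5 threshold, reusing the illustrative names from the sketch above:

    # Hard-classify test-set predictions at a 0.5 threshold
    p_hat <- predict(final, newdata = test, type = "response")
    pred  <- factor(as.integer(p_hat >= 0.5), levels = c(0, 1))
    truth <- factor(test$died, levels = c(0, 1))
    tab   <- table(predicted = pred, actual = truth)

    sensitivity <- tab["1", "1"] / sum(tab[, "1"])  # true positive rate
    specificity <- tab["0", "0"] / sum(tab[, "0"])  # true negative rate
    accuracy    <- sum(diag(tab)) / sum(tab)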

How can I improve the model?

Possible solutions?

  1. Go back to the drawing board and find 'better' variables that may be more predictive of mortality?
  2. Balance the classes in the training dataset via up-/down-sampling?
  • How many events do you have? What do you mean by "the model is predicting the outcome to be the largest class"? – Todd D Jul 22 '20 at 16:34

1 Answer


I am almost certain that your logistic regression does not predict only one outcome, i.e., a probability of $\hat{p}_i=0$ or $\hat{p}_i=1$ for the target class for all instances $i$. Rather, it predicts some $\hat{p}_i\in[0,1]$, which you then compare to a threshold $\theta$ that you chose in some way, possibly $\theta=0.5$. You then label instance $i$ as "target class" or "non-target class" based on $\hat{p}_i$ and $\theta$. And it happens that $\hat{p}_i\geq\theta$ for (nearly) all $i$ (or, alternatively, $\hat{p}_i<\theta$ for (nearly) all $i$).
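
One quick way to see this is to inspect the predicted probabilities themselves rather than the thresholded labels (a sketch; the model and data names follow the illustrative code in the question):

    # Distribution of predicted probabilities on the test set
    p_hat <- predict(final, newdata = test, type = "response")
    summary(p_hat)  # with a rare outcome, most p_hat typically sit below 0.5
    hist(p_hat, breaks = 30,
         main = "Predicted mortality probabilities on the test set")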

The solution to your conundrum is not to use a threshold and hard classification at all, but to deal directly with the probabilistic classification given by $\hat{p}$. More information can be found at "Reduce Classification Probability Threshold". I also recommend "Why is accuracy not the best measure for assessing classification models?", because every criticism leveled there at accuracy applies equally to precision, recall, etc.
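
As a sketch of what working with $\hat{p}$ directly can look like, one can evaluate the probabilistic predictions with proper scoring rules such as the Brier score or the log loss (again reusing the illustrative `p_hat` and `test$died` from above, with the outcome coded 0/1):

    # Proper scoring rules: lower is better for both
    y <- test$died
    brier   <- mean((p_hat - y)^2)
    logloss <- -mean(y * log(p_hat) + (1 - y) * log(1 - p_hat))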

Stephan Kolassa
  • We really need a canonical thread for these questions. [old-man-yells-at-cloud.png] – Sycorax Jul 22 '20 at 16:53
  • The use of p-values to select variables is a no-no. – Frank Harrell Jul 22 '20 at 17:32
  • @Frank, is this to say that using a p-value to test any point null hypothesis is a no-no? Because I seldom see the difference between variable selection and point null testing. – JTH Jul 22 '20 at 23:04
  • @JTH: testing *a single hypothesis* is fine. (Or multiple ones, with proper correction.) What is *not* fine is testing one coefficient, then removing it from the model if $p>0.05$ (or similar), and *then testing other coefficients*. Everything after that first test is invalid. Then again, I am not quite as stringently against stepwise model building as long as the goal is *prediction* as opposed to inference. – Stephan Kolassa Jul 23 '20 at 01:59