
My question concerns the principles, or the right way of thinking, behind using a logistic regression model inside a larger-scale forward prediction (a simulation).

Let's say I have a trained logistic regression model that predicts whether a machine component will fail or not, given predictors such as hours in service, component make, etc.
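For reference, this is the standard logistic regression form such a model uses to turn predictors into a failure probability. The notation is assumed here, not taken from the original model: $x_1$ could be hours in service and the remaining $x_j$ indicator variables for component make.

$$P(\text{fail} \mid x_1, \dots, x_k) = \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\big)}$$

The model output is therefore a probability; the failed/not-failed label only appears once a threshold is applied to it.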

Normally, for a single prediction, I would use a probability of 0.5 as the threshold for labelling the outcome as failed or not failed. However, suppose that for a specific combination of predictor values the predicted probability is always 0.40.

So for a single prediction the outcome would be false (not failed), and if I repeat this prediction 100 times, I would get zero predicted failures. But if I have 100 such cases, would I not expect to see about 40 failures?
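To make the arithmetic explicit, assume the 100 cases are independent and each has failure probability $p = 0.40$. The 50% labelling rule predicts $\sum_{i=1}^{100} \mathbb{1}[p \ge 0.5] = 0$ failures, whereas the actual number of failures $N$ follows a $\text{Binomial}(100,\, 0.40)$ distribution:

$$E[N] = np = 100 \times 0.40 = 40, \qquad \operatorname{SD}(N) = \sqrt{np(1-p)} = \sqrt{24} \approx 4.9$$

So roughly 40 failures, give or take about 5, is what the model itself implies for 100 such cases.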

So I am thinking of implementing my logistic regression model in the following way for each simulation:

  1. Use the model to calculate the probability of failure
  2. Generate a uniform random variable between 0 and 1
  3. If the random variable is less than the probability, then assign "failure occurs"
  4. Repeat for each simulated case and count or plot failures over time etc.
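The four steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: the coefficients (`INTERCEPT`, `COEF_HOURS`, `COEF_MAKE_B`) and all function names are hypothetical stand-ins for a real fitted model. Steps 2 and 3 together are simply a Bernoulli($p$) draw, since a uniform $U$ on $[0, 1)$ satisfies $P(U < p) = p$.

```python
import math
import random

# Hypothetical coefficients for illustration only; in practice these come
# from the fitted logistic regression (intercept, hours in service, make indicator).
INTERCEPT = -3.0
COEF_HOURS = 0.0005
COEF_MAKE_B = 0.8


def failure_probability(hours: float, make_b: bool) -> float:
    """Step 1: the logistic model maps predictors to a probability of failure."""
    eta = INTERCEPT + COEF_HOURS * hours + COEF_MAKE_B * (1.0 if make_b else 0.0)
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_failure(p: float, rng: random.Random) -> bool:
    """Steps 2-3: draw U ~ Uniform[0, 1) and record a failure if U < p."""
    return rng.random() < p


def count_failures(p: float, n_cases: int, seed: int = 0) -> int:
    """Step 4: repeat over simulated cases and count failures."""
    rng = random.Random(seed)  # seeded so a simulation run is reproducible
    return sum(simulate_failure(p, rng) for _ in range(n_cases))


def count_threshold_failures(p: float, n_cases: int, threshold: float = 0.5) -> int:
    """The 50% labelling rule from the question, for comparison."""
    return n_cases if p >= threshold else 0


if __name__ == "__main__":
    p = 0.40
    print(count_threshold_failures(p, 100))  # 0 under the 50% rule
    print(count_failures(p, 100, seed=42))   # a random count, typically near 40
```

The thresholded count is always 0 for $p = 0.40$, while the sampled count varies from run to run around $np = 40$. Using a seeded `random.Random` instance keeps each simulation run reproducible.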

Is my thinking on this correct? Is there some fundamental flaw in my reasoning here? Any suggestions, or references to books or studies that deal specifically with this issue, would be much appreciated.

Fritz45
  • Isn't this just a roundabout way of reporting the value ($0.40$) computed by the logistic regression?? – whuber Aug 20 '21 at 11:23
  • @whuber yes, in a sense it is. However, in this context my simulation model has to take action for each specific prediction, based on whether or not failure has occurred and also WHEN it occurred (the model loops over all components for many years). This prediction then has further downstream effects in the simulation, because the actions (e.g. repair or replace) have constraints. So it matters whether the model blindly uses the predicted label (fail or no fail) or uses the failure probability combined with a random draw to decide whether failure occurs. I hope that explains it better. – Fritz45 Aug 20 '21 at 23:37
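A time-stepped version of the loop described in this comment might look like the sketch below. It rests on a loudly stated assumption: that the model's output can be read as the probability of failing within one simulation year given current hours in service. If the model was fitted over a different horizon, the probability would need rescaling first. The coefficients, the `Component` class, and the replace-on-failure action are all hypothetical.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Component:
    make_b: bool                      # hypothetical make indicator
    hours: float = 0.0                # hours in service of the current part
    failure_years: list[int] = field(default_factory=list)  # WHEN failures occurred


def yearly_failure_probability(hours: float, make_b: bool) -> float:
    # Hypothetical coefficients; ASSUMES the fitted model gives the probability
    # of failing within the next year given current hours in service.
    eta = -3.0 + 0.0005 * hours + (0.8 if make_b else 0.0)
    return 1.0 / (1.0 + math.exp(-eta))


def simulate(components: list[Component], n_years: int,
             hours_per_year: float, seed: int = 0) -> list[Component]:
    rng = random.Random(seed)
    for year in range(n_years):
        for comp in components:
            p = yearly_failure_probability(comp.hours, comp.make_b)
            if rng.random() < p:  # Bernoulli(p) draw instead of a 0.5 threshold
                comp.failure_years.append(year)
                # Hypothetical action: replace at year end, so the new part
                # starts the next year with zero hours.
                comp.hours = 0.0
            else:
                comp.hours += hours_per_year
    return components


if __name__ == "__main__":
    fleet = simulate([Component(make_b=i % 2 == 0) for i in range(100)],
                     n_years=10, hours_per_year=2000.0, seed=1)
    print(sum(len(c.failure_years) for c in fleet))  # total simulated failures
```

Because each failure is drawn rather than thresholded, components with a 0.40 yearly probability will fail in some years and not others, which is what lets downstream constraints (such as limited repair capacity, which would be applied at the replacement step) come into play.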
