My question concerns the principles, or the right way of thinking, behind using a logistic regression model for larger-scale forward prediction.
Let's say I have a trained logistic regression model that predicts whether a machine component will fail, given predictors such as hours in service, component make, etc.
Normally, for a single prediction, I would use a 50% probability threshold to label the outcome as failed or not-failed. However, suppose that for a specific combination of predictor variables the predicted probability is always 0.40.
So for a single prediction, the outcome would be labeled not-failed, and if I repeat this prediction 100 times, I get zero predicted failures. But if I have 100 such cases, would I not expect to see about 40 failures?
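As a quick arithmetic check of that intuition, here is a small Python sketch. It assumes the 100 cases are independent Bernoulli trials, each with failure probability 0.40, so the failure count is Binomial(n, p) with mean n·p and variance n·p·(1 - p). The function name `compare_threshold_vs_expected` is hypothetical, chosen for illustration.

```python
def compare_threshold_vs_expected(p, n_cases, threshold=0.5):
    """Contrast hard-threshold labels with the expected number of failures.

    Assumes the n_cases outcomes are independent Bernoulli(p) trials.
    """
    # A hard threshold labels every identical case the same way.
    threshold_failures = n_cases if p >= threshold else 0
    # Expected value of a Binomial(n, p) count.
    expected_failures = n_cases * p
    # Spread around that mean: Binomial variance n * p * (1 - p).
    variance = n_cases * p * (1 - p)
    return threshold_failures, expected_failures, variance


labels, expected, var = compare_threshold_vs_expected(p=0.40, n_cases=100)
print(labels, expected, var)
```

With p = 0.40 and 100 cases, thresholding yields 0 failures, while the expected count is 40 with a variance of 24 (a standard deviation of roughly 4.9), so seeing anywhere from the low 30s to the high 40s would be unremarkable.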
So I am thinking of implementing my logistic regression model in the following way for each simulation:
- Use the model to calculate the probability of failure
- Generate a uniform random variable between 0 and 1
- If the random variable is less than the probability, then assign "failure occurs"
- Repeat for each simulated case, then count or plot failures over time, etc.
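The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a fitted model: the coefficients (`INTERCEPT`, `COEF_HOURS`, `COEF_MAKE`), the makes "A" and "B", and the helper names are all hypothetical, chosen so that a make "A" component with 2600 hours has a failure probability close to the 0.40 in the question. It only counts failures per run; tracking them over time would need an additional time dimension in the cases.

```python
import math
import random

# HYPOTHETICAL coefficients for illustration only; a real model would
# supply its own fitted intercept and slopes.
INTERCEPT = -3.0
COEF_HOURS = 0.001                 # effect per hour in service (hypothetical)
COEF_MAKE = {"A": 0.0, "B": 0.5}   # effect of component make (hypothetical)


def failure_probability(hours, make):
    """Step 1: logistic model, p = 1 / (1 + exp(-linear_predictor))."""
    eta = INTERCEPT + COEF_HOURS * hours + COEF_MAKE[make]
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_failures(cases, rng):
    """One simulation run: draw a Bernoulli outcome for each case."""
    failures = 0
    for hours, make in cases:
        p = failure_probability(hours, make)
        # Step 2-3: rng.random() is uniform on [0, 1), so u < p
        # occurs with probability p.
        if rng.random() < p:
            failures += 1
    return failures


def run_simulations(cases, n_runs, seed=0):
    """Step 4: repeat the run to get a distribution of failure counts."""
    rng = random.Random(seed)
    return [simulate_failures(cases, rng) for _ in range(n_runs)]


if __name__ == "__main__":
    # 100 hypothetical make "A" components, each with 2600 hours in service.
    cases = [(2600, "A")] * 100
    counts = run_simulations(cases, n_runs=1000)
    print("p per case:", round(failure_probability(2600, "A"), 3))
    print("mean failures per run:", sum(counts) / len(counts))
```

Averaged over many runs, the failure count should settle near 100 × p (about 40 here), and the spread of `counts` shows the run-to-run variability that a single thresholded prediction hides.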
Is my thinking on this correct? Is there some fundamental flaw in my reasoning? I would much appreciate any suggestions, or references to books or studies that deal specifically with this issue.