
Background: I'm trying to determine if there is a difference in the rate of failure between two populations. Population 1 is an engine run on one type of fuel. Population 2 is the same engine run on a different type of fuel.

I have 1000 engines in my first group, along with a list of when they failed: [5, 8, 10, ...]. I have 100 engines in the second group: [6, 12, ...].

If every engine had failed, I could test whether the means differ with a two-sample unpaired t-test. However, since most of the engines never failed, I'm not sure how to easily test whether the two populations are really the same.

Dave2e
  • Since engine failure times are unlikely to be normally distributed, and many engines are still running, I would not recommend the t-test. I would recommend a rank-sum test such as the Wilcoxon test. I suggest a Google search on survival tests. – Dave2e Mar 07 '21 at 18:03
  • @Dave2e the t-test does not require the sample to be normally distributed. Even so, performing a t-test on the log-transformed failure times, to reduce the influence of long-surviving components, could yield twice the power of a rank-based test. – AdamO Mar 08 '21 at 17:48

1 Answer


This is a classic situation for survival analysis, which by design takes into account cases with no failure by the end of the study (right-censored observations).
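To use survival methods, each engine is usually recorded as a (duration, event) pair: an engine that failed contributes its failure time with an event flag of 1, and an engine still running contributes the time it was last observed with a flag of 0 (right-censored). A minimal sketch, assuming every engine started at time 0 and was followed until the same hypothetical `study_end`; the function name `to_survival_data` and the numbers are made up for illustration, since the question's failure lists are truncated:

```python
def to_survival_data(failure_times, n_engines, study_end):
    """Encode one group as (durations, events) for survival analysis.

    Assumption (hypothetical): all engines started at time 0 and every
    engine that had not failed was still running at `study_end`.
    """
    if len(failure_times) > n_engines:
        raise ValueError("more failure times than engines")
    if any(t > study_end for t in failure_times):
        raise ValueError("failure recorded after the end of the study")
    n_censored = n_engines - len(failure_times)
    # Failed engines: observed failure time, event = 1.
    # Still-running engines: censored at study_end, event = 0.
    durations = list(failure_times) + [study_end] * n_censored
    events = [1] * len(failure_times) + [0] * n_censored
    return durations, events


# Hypothetical toy group: 6 engines, 3 failures, observation stopped at t = 20.
durations, events = to_survival_data([5, 8, 10], n_engines=6, study_end=20)
print(durations)  # [5, 8, 10, 20, 20, 20]
print(events)     # [1, 1, 1, 0, 0, 0]
```

If engines entered service at different times, each still-running engine would instead be censored at its own running time rather than at a common `study_end`.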

If you only have 2 groups, no predictors other than the type of fuel, and you don't consider any engines that might have been repaired and put back into service, then showing Kaplan-Meier estimates of survival over time for each group and using a log-rank test to compare the 2 groups would be a standard approach. Tools are provided by standard statistical software, for example the `survival` package in R.
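To show what the Kaplan-Meier estimate and the log-rank test actually compute, here is a minimal pure-Python sketch. It is not the `survival` package's implementation: the function names are hypothetical, it assumes right-censored data encoded as durations plus 0/1 event flags, and the toy data are made up. In practice you would use `survfit` and `survdiff` from R's `survival` package, or an equivalent library.

```python
import math


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate: list of (time, S(time)) at each observed failure time."""
    failure_times = sorted({t for t, e in zip(durations, events) if e})
    surv, curve = 1.0, []
    for t in failure_times:
        at_risk = sum(1 for d in durations if d >= t)  # still running just before t
        failures = sum(1 for d, e in zip(durations, events) if d == t and e)
        surv *= 1.0 - failures / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value), 1 df."""
    failure_times = sorted({t for t, e in zip(dur_a + dur_b, ev_a + ev_b) if e})
    observed_a = expected_a = variance = 0.0
    for t in failure_times:
        n_a = sum(1 for d in dur_a if d >= t)
        n_b = sum(1 for d in dur_b if d >= t)
        f_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e)
        f_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e)
        n, f = n_a + n_b, f_a + f_b
        observed_a += f_a
        # Expected failures in group A if both groups share one hazard
        expected_a += f * n_a / n
        if n > 1:
            variance += f * (n_a / n) * (n_b / n) * (n - f) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; not enough failures to compare")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (not from the question): event 0 = still running at the end.
dur_a, ev_a = [5, 8, 10, 20, 20, 20], [1, 1, 1, 0, 0, 0]
dur_b, ev_b = [6, 12, 20, 20], [1, 1, 0, 0]
print(kaplan_meier(dur_a, ev_a))
chi2, p = logrank_test(dur_a, ev_a, dur_b, ev_b)
print(f"log-rank chi2 = {chi2:.3f}, p = {p:.3f}")
```

In Python, the `lifelines` package provides `KaplanMeierFitter` and `lifelines.statistics.logrank_test` for the same two steps.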

EdM
  • +1 Echoing that the `survival` package also allows parametric survival regression with censored data, such as exponential or Weibull models, which would allow predictions, post-hoc estimates of mean survival time, and perhaps greater power. – AdamO Mar 08 '21 at 17:51
  • Do you have a recommendation for packages in Python? – Jonathan Hay Mar 08 '21 at 18:15
  • @JonathanHay see the [`lifelines`](https://github.com/CamDavidsonPilon/lifelines/blob/master/docs/Survival%20analysis%20with%20lifelines.rst) package. The link is to an explanation of how to use it, starting with Kaplan-Meier and log-rank, then going on to parametric models. – EdM Mar 08 '21 at 18:56
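As a sketch of the parametric route mentioned in the comments, the simplest case is an exponential model (a Weibull model with shape 1), which assumes a constant failure rate. Under right-censoring its maximum-likelihood rate has a closed form, failures divided by total running time, so two fuels can be compared with a likelihood-ratio test and the mean time to failure estimated as 1/rate. This is a minimal pure-Python illustration with hypothetical function names and toy data; the constant-hazard assumption is strong and should be checked. A Weibull fit needs numerical optimization, for example `survreg(..., dist = "weibull")` in R's `survival` or `WeibullFitter` in `lifelines`.

```python
import math


def exponential_fit(durations, events):
    """Constant-rate (exponential) MLE under right-censoring.

    Log-likelihood: l(rate) = d * log(rate) - rate * T, where d is the number
    of failures and T the total running time (failed + censored engines).
    Returns (rate_hat, maximized log-likelihood).
    """
    d = sum(events)
    total_time = sum(durations)
    if d == 0:
        return 0.0, 0.0  # l(rate) = -rate * T is maximized as rate -> 0
    rate = d / total_time
    return rate, d * math.log(rate) - rate * total_time


def exponential_lrt(dur_a, ev_a, dur_b, ev_b):
    """Likelihood-ratio test of 'same failure rate' for two groups (1 df)."""
    rate_a, ll_a = exponential_fit(dur_a, ev_a)
    rate_b, ll_b = exponential_fit(dur_b, ev_b)
    _, ll_pooled = exponential_fit(dur_a + dur_b, ev_a + ev_b)
    # Clamp tiny negative values caused by floating-point rounding
    stat = max(0.0, 2.0 * (ll_a + ll_b - ll_pooled))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return rate_a, rate_b, stat, p_value


# Hypothetical toy data (not from the question): event 0 = still running at the end.
dur_a, ev_a = [5, 8, 10, 20, 20, 20], [1, 1, 1, 0, 0, 0]
dur_b, ev_b = [6, 12, 20, 20], [1, 1, 0, 0]
rate_a, rate_b, stat, p = exponential_lrt(dur_a, ev_a, dur_b, ev_b)
print(f"mean time to failure: A = {1 / rate_a:.1f}, B = {1 / rate_b:.1f}")
print(f"LRT = {stat:.3f}, p = {p:.3f}")
```

If a group has no failures, its estimated rate is 0 and the mean time to failure is not estimable from this model, so the `1 / rate` lines would need a guard.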