How to apply a t-test for product failure?

Question

Background: I'm trying to determine if there is a difference in the rate of failure between two populations. Population 1 is an engine run on one type of fuel. Population 2 is the same engine run on a different type of fuel.

I have 1000 engines in my first group, along with the list of when they failed [5,8,10...]. I have 100 engines in the second group [6,12...].

If every engine had failed, I could test whether the means differ with a two-sample unpaired t-test. However, since most of the engines never failed, I'm not sure how to test whether the two populations are really the same.

Since it is unlikely that the engine failure times are normally distributed, and many engines are still running, I would not recommend the t-test. I would recommend a rank-sum test such as the Wilcoxon test. I suggest a Google search on survival tests. – Dave2e, Mar 07 '21 at 18:03
@Dave2e the t-test does not require that the distribution of the sample is normally distributed. Even so, performing a t-test on the log-transformed performance times to reduce the influence of long-surviving components could yield twice the power of a rank-based test. – AdamO, Mar 08 '21 at 17:48

score 2 · Accepted Answer · answered Mar 08 '21 at 17:27

This is a classic situation for survival analysis, which by design takes into account cases for which there has been no failure by the end of the study.

If you only have 2 groups, no predictors other than the type of fuel, and you don't consider any engines that might have been repaired and put back into service, then showing Kaplan-Meier estimates of survival over time for each group and using a log-rank test to compare the 2 groups would be a standard approach. Tools are provided by standard statistical software, for example the survival package in R.
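To make the approach concrete, here is a minimal pure-Python sketch of a Kaplan-Meier estimate and a two-group log-rank test. The function names, the study end time of 20, and the toy data are all made up for illustration; in practice a tested library (the survival package in R, or an equivalent elsewhere) is the safer choice. Each engine gets an observed time plus an event flag: 1 if it failed at that time, 0 if it was still running when observation stopped (censored).

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  -- observed time per engine (failure time, or time last seen running)
    events -- 1 if the engine failed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # engines still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value) on 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected failures in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0  # no information to separate the groups
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square (1 df) upper tail
    return chi2, p_value

# Toy data (invented): observation stops at time 20, so survivors are censored at 20.
fuel1_times, fuel1_events = [5, 8, 10, 20, 20, 20, 20, 20], [1, 1, 1, 0, 0, 0, 0, 0]
fuel2_times, fuel2_events = [6, 12, 20, 20], [1, 1, 0, 0]

print(kaplan_meier(fuel1_times, fuel1_events))
print(logrank_test(fuel1_times, fuel1_events, fuel2_times, fuel2_events))
```

The key point is that engines that never failed still contribute: they count in the at-risk denominator for as long as they were observed, rather than being dropped or treated as failures.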

EdM


+1 Echoing that the `survival` package also allows for parametric survival regression with censored data, using distributions such as the exponential or Weibull, which would allow for predictions and post-hoc estimates of mean survival time, and perhaps greater power. – AdamO Mar 08 '21 at 17:51
Do you have a recommendation for packages in Python? – Jonathan Hay Mar 08 '21 at 18:15
@JonathanHay see the [`lifelines`](https://github.com/CamDavidsonPilon/lifelines/blob/master/docs/Survival%20analysis%20with%20lifelines.rst) package. The link is to an explanation of how to use it, starting with Kaplan-Meier and log-rank, then going on to parametric models. – EdM Mar 08 '21 at 18:56
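
As a rough illustration of the parametric route mentioned in the comments, here is a minimal sketch of the simplest case, an exponential model with right-censored data, written in plain Python rather than with any particular package. Under that model the maximum-likelihood failure rate is the number of failures divided by the total running time, so censored engines contribute their running time but no failure. The function names and toy data are hypothetical, and the constant-hazard assumption is a strong one; a Weibull fit relaxes it but needs numerical optimization, which is not shown here.

```python
import math

def exponential_fit(times, events):
    """Exponential MLE with right censoring.

    Returns (rate, mean_time_to_failure, log_likelihood_at_mle).
    times  -- observed running time per engine
    events -- 1 if the engine failed at that time, 0 if still running (censored)
    """
    failures = sum(events)
    exposure = sum(times)  # total running time, censored engines included
    if failures == 0:
        return 0.0, math.inf, 0.0
    rate = failures / exposure
    loglik = failures * math.log(rate) - rate * exposure
    return rate, exposure / failures, loglik

def exponential_lr_test(times_a, events_a, times_b, events_b):
    """Likelihood-ratio test of equal failure rates in two groups (1 df)."""
    ll_a = exponential_fit(times_a, events_a)[2]
    ll_b = exponential_fit(times_b, events_b)[2]
    ll_pooled = exponential_fit(list(times_a) + list(times_b),
                                list(events_a) + list(events_b))[2]
    stat = max(0.0, 2.0 * (ll_a + ll_b - ll_pooled))  # clamp tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))        # chi-square (1 df) upper tail
    return stat, p_value

# Toy data (invented): observation stops at time 20, so survivors are censored at 20.
fuel1_times, fuel1_events = [5, 8, 10, 20, 20, 20, 20, 20], [1, 1, 1, 0, 0, 0, 0, 0]
fuel2_times, fuel2_events = [6, 12, 20, 20], [1, 1, 0, 0]

print(exponential_fit(fuel1_times, fuel1_events))
print(exponential_lr_test(fuel1_times, fuel1_events, fuel2_times, fuel2_events))
```

The fitted mean time to failure is the kind of post-hoc estimate the comment refers to, but it is only as trustworthy as the assumed distribution.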
