How to prove that specific data is useless for a given problem?

Question

I have a dataset of sensor values and machine breakdowns. Based on the sensor values, I am trying to build an early-warning system for upcoming breakdowns (a classic predictive maintenance task). I have been analyzing the data for quite some time with several methods and have come to suspect that there is no connection between the sensor values and the machine breakdowns.

Therefore, now I want to show more formally that there indeed is no connection. Is there a (common) way to do this?

If I fit a stochastic process* on the sensor values and get normally distributed residuals**, does this prove that the sensor values are of no predictive power regarding the breakdowns?

Many thanks in advance!

*(e.g. random walk with mean reversion, moving average or autoregressive process)

**(for both cases: close to a breakdown and far away from the next breakdown)

score 0 · Accepted Answer · answered Aug 14 '20 at 15:09

The problem I see with your approach is the assumptions that it makes about the temporal association between the sensor values and machine breakdowns.

If you were sure that only aberrations in sensor values "close [in time] to a breakdown" mattered, then your approach might make some sense. (I'm assuming that you would perform some test to document minimal differences in the characteristics of the process between the two types of time periods.) That assumption is certainly common in survival analysis, which typically assumes that predictor values at any specific time are directly associated with the risk of failure at that time. My initial reaction, however, is that you could test that possibility more simply and directly by evaluating the sensor readings over time rather than trying to model the sensor readings themselves.

Say, however, there was another scenario: a sensor anomaly at some time represents non-fatal damage to the machine. Wear and tear during continued operation then leads to the final breakdown sooner than it would have occurred otherwise. Or the breakdown might be associated with the integrated deviations of a sensor reading over time, so that the details of the time course matter. Unless your definition of "close to a breakdown" is very broad, your modeling and comparison of the stochastic processes would not be informative in those scenarios.
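To make the second scenario concrete, here is a minimal toy sketch. Everything in it is made up for illustration: the two reading histories, the baseline `BASELINE`, and the window length `WINDOW` are assumptions, not values from the question. It only shows that a comparison restricted to the window just before a breakdown can look identical to normal operation while the accumulated deviation differs.

```python
from statistics import mean

BASELINE = 1.0  # hypothetical nominal sensor level
WINDOW = 3      # hypothetical "close to breakdown" window, in samples

# Made-up histories: machine_a suffers an early anomaly and breaks down
# right after its last sample; machine_b is a healthy reference.
machine_a = [1.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
machine_b = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def last_window_mean(readings, window=WINDOW):
    """Mean of the final `window` readings (the 'close to breakdown' view)."""
    return mean(readings[-window:])


def cumulative_deviation(readings, baseline=BASELINE):
    """Sum of deviations from the baseline over the whole history."""
    return sum(x - baseline for x in readings)


for name, readings in [("a", machine_a), ("b", machine_b)]:
    print(name, last_window_mean(readings), cumulative_deviation(readings))
```

In this toy case the last-window summary cannot tell the two machines apart, while the cumulative deviation can. Whether anything like this holds in real data is an empirical question.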

Many thanks for your extensive answer! You're correct: I make the assumption that there is a temporal association between the sensor readings and machine breakdowns. Thanks for explaining the problem with this. However, I think I have to examine temporal associations in order to make a prediction about future breakdowns. Unfortunately, I don't quite understand what you mean by "evaluating the sensor readings over time rather than trying to model sensor readings themselves." What do you have in mind for evaluating the sensor readings over time? — Seven Up, Aug 17 '20 at 11:29
@SevenUp I'm thinking of things like integrating readings over time, evaluating the frequency or duration of readings that might be beyond some limits, or even looking at time lags between extreme sensor readings and failure times. There's no way to be much more specific without playing a lot with the data, and you'll have to be careful to avoid overfitting, finding trends that work with the present data but don't extend well to new data. This is a difficult problem. — EdM, Aug 17 '20 at 13:10
Ok, many thanks again for giving some examples. I have already tried some of them (and, of course, many others), and I will try the remaining ones you mentioned, too. However, if the results continue to be weak, I think it will be necessary to "prove" that there isn't much to learn from the data, e.g. to show that "the only 'pattern' in the data is noise." — Seven Up, Aug 17 '20 at 14:17
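The kinds of summaries EdM lists in the comments (integrated readings, frequency or duration of out-of-limit readings, lag between extreme readings and failure) could be sketched roughly as follows. This is only an illustration under assumed names and made-up numbers: the functions `integrated_deviation`, `exceedance_stats`, and `lag_to_failure`, the limits, and the sample data are hypothetical and would need adapting to the real sensors.

```python
from typing import Optional, Sequence, Tuple


def integrated_deviation(readings: Sequence[float], baseline: float,
                         dt: float = 1.0) -> float:
    """Rectangle-rule integral of (reading - baseline) over the window."""
    return sum((x - baseline) * dt for x in readings)


def exceedance_stats(readings: Sequence[float], lower: float,
                     upper: float) -> Tuple[int, int]:
    """Return (count of out-of-limit samples, longest consecutive run)."""
    count = longest = run = 0
    for x in readings:
        if x < lower or x > upper:
            count += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return count, longest


def lag_to_failure(times: Sequence[float], readings: Sequence[float],
                   upper: float, failure_time: float) -> Optional[float]:
    """Time from the latest reading above `upper` to the failure, or None."""
    extreme = [t for t, x in zip(times, readings)
               if x > upper and t <= failure_time]
    return failure_time - max(extreme) if extreme else None


# Made-up example: eight samples, failure observed at time 10.
times = [0, 1, 2, 3, 4, 5, 6, 7]
readings = [1.0, 1.2, 3.5, 3.8, 1.1, 0.9, 4.0, 1.0]

features = {
    "integrated_deviation": integrated_deviation(readings, baseline=1.0),
    "exceedances": exceedance_stats(readings, lower=0.0, upper=3.0),
    "lag_to_failure": lag_to_failure(times, readings, upper=3.0,
                                     failure_time=10),
}
print(features)
```

Any limits or window lengths chosen this way should be checked on data that was not used to pick them, for the overfitting reason mentioned above.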
