
I am not well versed in anything beyond basic statistics but have been tasked with coming up with a "grading" scale for wear on a part based on data we have collected. I am in need of help figuring out the best way to approach the problem.

The problem:

This is very generalized, but essentially I have a tool that wears during its lifetime. Multiple channels of data are recorded throughout the tool's lifetime (coolant temperature, rotational speed, calculated load, etc.) at regular intervals (>1 Hz). When the tool is retired, we have a file with millions of data points per channel that span the tool's entire lifetime. At that point, the wear of the tool is measured (the difference in diameter compared to when it was new). My task is to take the time-based data over the tool's lifetime and use it to create a grading scale that corresponds to the amount of wear the tool sees, graded by each of these channels. The end goal is to find the degree to which each variable contributes to the overall tool wear (is it high rotational speed, high operating temperatures, low load, or some combination?).

My attempts:

I figured I would need a single number to represent each data channel over the entire lifetime if I was going to look for some sort of correlation. So essentially I need to reduce the millions of data points per channel down to one "representative" number, and right here I hit a snag, as that doesn't seem trivial. I attempted simply integrating the data, but understandably that ends up more or less correlating the wear with the time span the data was taken over, since time spans a much greater range than the value of any single channel. If I were instead to use an average of each channel, I would lose all time effects (like being held at a lower speed for a long time vs. a higher speed for a short time).
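For illustration, here is a minimal Python sketch (not from the question; the channel name, sampling rate, and data are all made up) of a few candidate summary statistics for one channel, showing why the plain mean and the plain integral each discard something:

```python
import numpy as np

# Synthetic stand-in for one recorded channel (e.g. rotational speed),
# sampled at 2 Hz over the tool's lifetime. All values are illustrative.
rng = np.random.default_rng(0)
t = np.arange(0.0, 1000.0, 0.5)                    # timestamps in seconds
speed = 3000 + 50 * rng.standard_normal(t.size)    # noisy operating speed

# Trapezoidal integral of the channel over time
integral = np.sum(0.5 * (speed[:-1] + speed[1:]) * np.diff(t))

features = {
    "mean": speed.mean(),             # discards how long conditions were held
    "integral": integral,             # dominated by total runtime, not level
    "duration": t[-1] - t[0],         # runtime itself, for comparison
    "p95": np.percentile(speed, 95),  # captures time near the extremes
    "std": speed.std(),               # variability of the operating point
}
```

With only ~40 tools, a small set of such features per channel (mean, spread, a high percentile, total runtime) is often a more defensible starting point than forcing everything into one number.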

What I need help with:

Above all, I need to know: are there proven methods for reducing large time-series data down to single representative values? From there I could look into multivariate regression models more (see here, here, and here).
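As a sketch of that second stage, assuming each tool has already been reduced to one feature per channel, an ordinary least squares fit across tools could look like the following (all data and feature names here are synthetic; with ~40 tools, only a handful of features can be estimated reliably):

```python
import numpy as np

# Hypothetical per-tool feature matrix: one row per retired tool,
# one column per channel summary. Values are synthetic.
rng = np.random.default_rng(1)
n_tools = 40
X = np.column_stack([
    rng.normal(3000, 50, n_tools),   # mean rotational speed
    rng.normal(70, 5, n_tools),      # mean coolant temperature
    rng.normal(500, 200, n_tools),   # total runtime (hours)
])

# Simulated wear: a linear combination of the features plus noise
true_coefs = np.array([0.002, 0.05, 0.01])
wear = X @ true_coefs + rng.normal(0, 0.5, n_tools)

# Ordinary least squares with an intercept column; the fitted
# coefficients estimate each feature's contribution to wear.
A = np.column_stack([np.ones(n_tools), X])
coefs, *_ = np.linalg.lstsq(A, wear, rcond=None)
```

In practice the coefficients would only be interpretable after checking collinearity between the features (e.g. total runtime correlating with everything else), which is exactly the snag described above with the integral.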

Or maybe there are methods that skip that intermediate step and can correlate data that has multiple independent variable samples for a single dependent variable sample?

  • Do you have only one measurement of wear, when the tool is retired? Or do you have measurements during the tool's lifetime as well? – mkt Jun 13 '19 at 11:55
  • One measurement after the tool is retired – user2731076 Jun 13 '19 at 11:55
  • Can you tell us how many tools you have data on? And were they retired after similar lengths of time, or did they have different lifetimes? – mkt Jun 13 '19 at 11:58
  • About 40 tools (we get data on about 4-5 a week), and they are mostly in the same ballpark of time but can vary by ~25% of the average, with a few that were retired in about half the usual time. – user2731076 Jun 13 '19 at 12:01
  • How many time points for the channel data are available? – Edgar Jun 14 '19 at 13:48
  • 1
    From what I understand you are trying to do a regression on the "wear". Maybe I have to disappoint you, but this kind of analysis means actually doing the work yourself. Some techniques may help you, but there is no general way. – cherub Jun 14 '19 at 14:30
  • Do you want to take an action here, i.e., set things up so that the correlation between your measured parameters can be learned from the time series by ML or DL? – Mario Jun 16 '19 at 15:28
  • There are between 100k and 1M samples per channel depending on the channel and length of data file. There isn't really any action that needs to be taken, we are just trying to find what conditions lead to excessive wear. My latest thought was to take histograms of each channel and attempt to correlate to the amount of time in each bin. That would give correlation if more time spent at a particular setting leads to increased wear. – user2731076 Jun 17 '19 at 11:57
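The histogram idea from the last comment could be sketched like this, assuming a fixed sampling interval so that bin counts convert directly to dwell times (the channel, bin edges, and data are illustrative):

```python
import numpy as np

# Synthetic single channel for one tool, sampled at a fixed interval.
rng = np.random.default_rng(2)
dt = 0.5                                  # seconds between samples
speed = rng.normal(3000, 100, 200_000)    # illustrative operating speed

# Count time spent in each operating-range bin: with a fixed sampling
# interval, (samples in bin) * dt = dwell time in that bin.
bins = np.linspace(2500, 3500, 11)        # 10 operating-range bins
counts, _ = np.histogram(speed, bins=bins)
dwell_time = counts * dt                  # seconds spent in each bin
```

Stacking the `dwell_time` vectors across tools gives a feature matrix whose columns (time spent in each operating range) can then be correlated with the measured wear, which directly tests whether more time at a particular setting leads to more wear.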

0 Answers