I am not well versed in anything beyond basic statistics but have been tasked with coming up with a "grading" scale for wear on a part based on data we have collected. I am in need of help figuring out the best way to approach the problem.
The problem:
This is very generalized, but essentially I have a tool that wears during its lifetime. Multiple channels of data (coolant temperature, rotational speed, calculated load, etc.) are recorded at regular intervals (>1Hz) throughout the tool's lifetime. When the tool is retired, we have a file with millions of data points per channel spanning the tool's entire lifetime. At that point, the wear of the tool is measured (the difference in diameter compared to when it was new). My task is to take the time-based data over the tool's lifetime and use it to create a grading scale that relates the amount of wear the tool sees to each of these channels. The end result would be to find the degree to which each variable contributes to the overall tool wear (is it high rotational speed, high operating temperature, low load, or a combination?).
My attempts:
I figured I would need a single number to represent each data channel over the entire lifetime if I was going to look for some sort of correlation, so essentially I need to reduce the millions of data points per channel down to one "representative" number. Right there I hit a snag, as that doesn't seem trivial. I tried simply integrating the data over time, but understandably that ends up more or less just correlating the wear with the time span the data was taken over, since the time span varies over a much greater range than the value of any single channel. If I were instead to use an average of each channel, I would lose all time effects (for example, being held at a lower speed for a long time vs. a higher speed for a short time).
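To make the snag concrete, here is a minimal sketch of the two reductions I tried, run on made-up constant-speed traces rather than real data. The function name `reduce_channel`, the 1 s sample spacing, and the speed values are all assumptions for illustration only.

```python
import numpy as np

def reduce_channel(values, dt):
    """Reduce one channel (sampled every dt seconds) to (integral, mean)."""
    values = np.asarray(values, dtype=float)
    integral = float(np.sum(values) * dt)  # rectangle-rule time integral
    mean = float(np.mean(values))          # time average for uniform sampling
    return integral, mean

dt = 1.0  # hypothetical sample spacing in seconds

# Synthetic rotational-speed traces (values invented for illustration)
slow_long = np.full(2000, 1000.0)   # low speed held for a long time
fast_short = np.full(1000, 2000.0)  # high speed held for a short time
slow_short = np.full(1000, 1000.0)  # low speed held for a short time

for name, trace in [("slow_long", slow_long),
                    ("fast_short", fast_short),
                    ("slow_short", slow_short)]:
    integral, mean = reduce_channel(trace, dt)
    print(f"{name}: integral={integral:.0f}, mean={mean:.0f}")
```

In this toy case the integral cannot tell `slow_long` from `fast_short`, and the mean cannot tell `slow_long` from `slow_short`, which is exactly the trade-off described above.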
What I need help with:
Above all, I need to know whether there are proven methods for reducing large time-based data down to single representative values. From there I could look into multivariate regression models more (see here, here, and here).
Or maybe there are methods that skip that intermediate step and can correlate data that has multiple independent variable samples for a single dependent variable sample?
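For reference, the regression step I have in mind, once each tool is reduced to one row of representative values, would look roughly like the sketch below. It is an ordinary least-squares fit on standardized features using NumPy; the function name `fit_wear_model`, the three feature columns, and all numbers are invented for illustration and are not my real data.

```python
import numpy as np

def fit_wear_model(features, wear):
    """Least-squares fit of wear on standardized per-tool features.

    features: (n_tools, n_channels) array, one representative value
              per channel per tool (how to compute these is the open question).
    wear:     (n_tools,) array of measured diameter loss.
    Returns [intercept, coef_1, ..., coef_n] on the standardized scale.
    """
    features = np.asarray(features, dtype=float)
    wear = np.asarray(wear, dtype=float)
    # Standardize each channel so coefficient sizes are roughly comparable
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    X = np.column_stack([np.ones(len(z)), z])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, wear, rcond=None)
    return coef

# Synthetic example: 30 tools, 3 hypothetical channel summaries
rng = np.random.default_rng(0)
features = rng.normal(loc=[80.0, 1500.0, 0.6],
                      scale=[5.0, 200.0, 0.1],
                      size=(30, 3))
# Invented ground truth: wear depends on columns 0 and 2 only
wear = 2.0 + 0.01 * features[:, 0] + 0.5 * features[:, 2]

coef = fit_wear_model(features, wear)
print("intercept:", coef[0])
print("standardized coefficients:", coef[1:])
```

With standardized inputs, the relative sizes of the coefficients give a rough indication of how much each channel contributes, though that reading assumes the channels are not strongly correlated with each other.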