I have a collection of univariate, irregularly spaced, financial time series. Each series is labeled by its class. The image below shows some example data.
A note on the data:
- The time series could be made evenly spaced by filling in missing values with 0, but I'm not sure if this is the way to go. Based on the data generating process, zeros would make much more physical sense than values obtained via interpolation.
- The stationarity of the time series cannot be guaranteed.
- Other properties of the time series cannot be guaranteed. For example, the time series may have steps and pulses or outliers.
- The time series don't exist in their own worlds. Information from one time series will very likely provide information about another.
The Task:
I will receive new, unlabeled data that will almost always be at future time points. For example, the new data might look like this:
+-----------+-------+
| Date | Value |
+-----------+-------+
| 8/9/2019 | 15.75 |
| 8/5/2019 | 4.72 |
| 8/18/2019 | 28.19 |
| 8/14/2019 | 16.44 |
+-----------+-------+
This new data could be from any of the classes. For example, the 3rd row above could be from class W while the rest of the rows could be from class F. (But, because it is unlabeled, I will not know which class each row is from.) For each row of my new data and for each class, I would like to get the subjective probability that the given row came from that class. Now, I don't think making these inferences row by row would be a good idea. I would like to consider all of the new data simultaneously because otherwise I'm throwing away perfectly good predictive information. (But if there's only a row-by-row solution, then that's fine.)
I know that time series classification is a thing, but I'm not sure how or even if it could be used with my data. From what I have read, it sounds like time series classification works by matching up the series on a common time axis. I don't think this will work for me because my new unlabeled data comes from the future. Also, my new data cannot be assumed to be from the same class. What is the best way to do this? Keywords of what to Google are also welcome. I think that what I am trying to do is called probabilistic classification. I think that one of the first steps is to make probabilistic forecasts for each time series. Someone let me know if that's wrong.