Assume we are receiving a continuous time-series: $$X_1 = \{x_{1,1},\ldots,x_{1,n}\} \in \mathbb{R}^n$$ $$\vdots$$ $$X_i = \{x_{i,1},\ldots,x_{i,n}\} \in \mathbb{R}^n$$
At each step $i$ (knowing the full history $X_1,\ldots,X_i$) we want to predict some other variable $y_i\in\mathbb{R}$. The time series $y_1,\ldots,y_i$ is also continuous and is generally mean-reverting to 0 (i.e. it fluctuates around 0).
We are given some training data $X_1,\ldots,X_N$ with $y_1,\ldots,y_N$ of length $N$, which we can use to analyse the relationship between $X$ and $y$, find a model that fits this data, etc.
Now, in production (with a time-series of length $M$ where $N\ll M$), we only receive $X_i$ at each time step $i$, but never $y_i$. So while in production, we have no way of knowing whether our current estimate $\hat{y}_i$ has diverged from $y_i$ (since we don't even receive/know past values of $y_1,\ldots,y_{i-1}$ at step $i$).
How would one approach a problem like this? I am asking because I have only come across setups where one knows $X_1,\ldots,X_i$ and $y_1,\ldots,y_{i-1}$ at step $i$, so the model can be re-calibrated at each step. But if we never receive any $y_i$, then I feel the only option is to calibrate the model once on the training data using e.g. multiple linear regression (and then hope for the best). However, I suspect that a single fit will not be enough to fit all the data well.
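To make the "calibrate once, then freeze" baseline concrete, here is a minimal sketch in Python with NumPy. The synthetic data, the coefficients in `true_w`, and the helper names `fit_ols` and `predict` are all made up for illustration; the only point is that the model is fitted on $X_1,\ldots,X_N$ and $y_1,\ldots,y_N$ and afterwards only ever sees $X_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data (hypothetical, illustration only).
N, n = 500, 3
X_train = rng.normal(size=(N, n))
true_w = np.array([0.5, -0.2, 0.1])  # assumed "true" relationship
y_train = X_train @ true_w + 0.01 * rng.normal(size=N)


def fit_ols(X, y):
    """Least-squares fit of y ~ intercept + X @ w; returns (intercept, w)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def predict(model, x_i):
    """Predict y_i from a single observation x_i using the frozen model."""
    intercept, w = model
    return float(intercept + x_i @ w)


# Calibrated once on the training data and never updated afterwards.
model = fit_ols(X_train, y_train)

# "Production": X_i arrives one step at a time and y_i is never observed.
X_prod = rng.normal(size=(5, n))
y_hat = [predict(model, x_i) for x_i in X_prod]
```

Nothing in the production loop touches any $y_i$, so the quality of `y_hat` rests entirely on the training relationship continuing to hold.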
So perhaps one could split/identify different regimes in the training data (based on what the current $X_i$ is), and then do a model fit for each regime independently. That is, do localised calibration (almost like a hash table). Then the question is, what sort of technique would one use to identify local clusters?
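As a rough sketch of what I mean by localised calibration: cluster the training $X_i$, fit one linear model per cluster, and in production route each new $X_i$ to the model of its nearest cluster centre. The choice of k-means (with a simple farthest-point initialisation) is only an assumption to make the idea concrete, not a claim that it is the right technique; the helper names and the two-regime synthetic data are hypothetical as well.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(centroids, X)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, assign(centroids, X)


def assign(centroids, X):
    """Index of the nearest centroid for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def fit_ols(X, y):
    """Least-squares coefficients [intercept, w_1, ..., w_n]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_local_models(X, y, k):
    """One linear model per cluster; tiny clusters fall back to a global fit."""
    centroids, labels = kmeans(X, k)
    global_coef = fit_ols(X, y)
    models = []
    for j in range(k):
        mask = labels == j
        enough = mask.sum() > X.shape[1] + 1
        models.append(fit_ols(X[mask], y[mask]) if enough else global_coef)
    return centroids, models


def predict_local(centroids, models, x_i):
    """Look up the regime of x_i (nearest centroid) and use its local model."""
    j = int(assign(centroids, x_i[None, :])[0])
    coef = models[j]
    return float(coef[0] + x_i @ coef[1:])


# Hypothetical training data with two regimes that relate X to y differently.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(200, 2))
X_b = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(200, 2))
X_train = np.vstack([X_a, X_b])
y_train = np.concatenate([X_a[:, 1], -X_b[:, 1]]) + 0.01 * rng.normal(size=400)

centroids, models = fit_local_models(X_train, y_train, k=2)
y_hat = predict_local(centroids, models, np.array([3.0, 1.0]))
```

Here the regime is decided from the current $X_i$ alone, which matches the production constraint that no $y_i$ is ever available. The number of clusters `k` is something I would have to pick from the training data.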
Any thoughts greatly appreciated.
Edit: It's not that I am missing $y_{N+1},\ldots,y_M$ (the values that are not in the training data). In principle I have them, but I have to pretend that whenever I receive the next $X_i$ I do not know the corresponding $y_i$ if $i>N$. That is, the model can only ever know $y_1,\ldots,y_N$ (the values from the training data).