I am looking for a method of forecasting short time series. I need to make multiple such forecasts at parallel, so I need some simple method that scales to large data. My data looks like $n\times k+1$ (where $n$ is in thousands or more and $k$ is something between 3 and 10) matrix
$$ \begin{array}{ccccc} x_{1,t-k} & x_{1,t-k+1} & \dots & x_{1,t-1} & x_{1,t} \\ x_{2,t-k} & x_{2,t-k+1} & \dots & x_{2,t-1} & x_{2,t} \\ \dots \\ x_{n,t-k} & x_{n,t-k+1} & \dots & x_{n,t-1} & x_{n,t} \\ \end{array} $$
The series may, or may not, be correlated with each other.
I need to make forecases for the $x_{i, t+1}, x_{i, t+2}, x_{i, t+3}$ points (the forecast horizon would hopefully be short) for each $i$-th series. In most cases I can assume some simple linear trends (I am not hoping for much more since the limited data for each of the series), so I could simply try something like a linear regression to approximate AR($k$) model, but in my data, from time to time after a steady linear upward trend I observe a random drop and I would like my method to be somehow sensitive to noticing such change (e.g. when there is a change between $x_{i,t-1}$ and $x_{i,t}$) and not suggesting linear upward trend after noticing it. Below I show a made-up example that looks similar to my data, that was created using the a10 dataset from fpp package by Rob Hyndman.
library("fpp")
s1 <- seq(1, 100, by = 10)
s2 <- seq(10, 110, by = 10)
X <- NULL
for (i in seq_along(s1))
X <- rbind(X, a10[s1[i]:s2[i]])
All this means that the common simple methods like linear regression; random walk forecast that takes $x_{i,t}$ as a forecast for the $t+1, t+2, t+3$ values; random walk with drift; predicting mean of previous timepoints etc. do not seem as a good choice for me. On another hand, limited data and computational limitations make more advanced methods not feasible.
I was thinking of something like using simple exponential smoothing of the changes between the timepoints $\Delta_{i,t} = x_{i,t} - x_{i,t-1}$, i.e. taking
$$ \begin{align} \ell_{i,t} &= (1-\alpha) \ell_{i,t-1} + \alpha \Delta_{i,t} \\ \hat x_{i,t+1} &= x_{i,t} + \ell_{i,t} \\ \hat x_{i,t+2} &= \hat x_{i,t+1} + \ell_{i,t} \\ \dots \end{align} $$
with some pretty hign $\alpha$ (this could be optimized), but I do not want to re-design the wheel, so I'm looking for some hints either on this approach, or suggesting something better then this. I'd need prediction intervals for my forecast, so I'd be grateful also for a comment on this.
Another problem is that this approach ignores the fact that I could also use the other series to build a general model. I believe that using the whole dataset could lead to improvements (some kind of shrinkage). Maybe I could use something like hierarchical forecasting described by Rob Hyndman (see e.g. those slides or the hts vignette) where the parameters to aggregate the individual series could be estimated using regression (this easily scales).