Disclaimer: The method described here is original research, not based on a thorough reading of the literature. It is my best attempt at improvising a K-fold CV method for a multivariate time series analysis with relatively short input window lengths (assuming no or low dependence over longer time spans), in a setting where the presence of data sources was not homogeneous over the data collection period.
First, the series of observations is transformed into a series of observation history windows of length h, with step 1 between windows. The principle is then to split the window dataset into S ordered slices (where S >> K, to approximate random splitting), each of length >> h (so as not to waste data), and hand the slices out alternately (like dealing playing cards) to separate model instances. To keep the resulting subsets more cleanly separated, a quarantine window of length h at the beginning of each slice is held out of training.
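For concreteness, here is a minimal sketch of the windowing and slice assignment in Python/NumPy. The function names, parameters, and toy numbers (make_windows, assign_slices, the linspace-based slice boundaries) are my own illustration rather than part of the method description; quarantined windows are simply marked with fold -1 so they can be dropped from training later.

```python
import numpy as np

def make_windows(series, h):
    """Turn a (T, n_features) series into T - h + 1 history windows of length h, step 1."""
    T = len(series)
    return np.stack([series[t:t + h] for t in range(T - h + 1)])

def assign_slices(n_windows, S, K, h):
    """Split window indices into S ordered slices, deal the slices round-robin
    ("like playing cards") to K folds, and quarantine the first h windows of
    each slice by marking them with fold -1 (to be excluded from training)."""
    fold_of_window = np.full(n_windows, -1, dtype=int)
    slice_bounds = np.linspace(0, n_windows, S + 1, dtype=int)
    for s in range(S):
        start, stop = slice_bounds[s], slice_bounds[s + 1]
        fold_of_window[start + h:stop] = s % K   # slice s goes to fold s mod K
    return fold_of_window

# Toy usage: 50,000 time steps, 4 features, h = 24, S = 100 slices (length >> h), K = 5 folds.
series = np.random.randn(50_000, 4)
windows = make_windows(series, h=24)
fold_of_window = assign_slices(len(windows), S=100, K=5, h=24)
```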
The models are trained on all slices except their own, and their own slices are used for validation. Validation of the collection/ensemble of models is done by summing the validation error over all slices, where each slice is processed by the submodel that has not been trained on it. Testing on unseen data can be done using an average (or another suitable combination) of the outputs of all the trained model instances. Alternatively, one can first distill the ensemble into a single model, training it to reproduce the validation outputs.
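Continuing the sketch above, and assuming targets aligned one-to-one with the windows plus some regressor with a scikit-learn-style fit/predict interface (model_factory is a placeholder, not a real library object), the per-fold training, summed validation error, and averaged test-time prediction could look roughly like this. Note that the quarantined windows (fold -1) are dropped entirely here, which is slightly stricter than only holding them out of training.

```python
import numpy as np

def train_and_validate(windows, targets, fold_of_window, K, model_factory):
    """Train one model per fold on all slices except its own (and except the
    quarantined windows), and sum validation error over all slices, each slice
    scored by the model that never saw it during training."""
    X = windows.reshape(len(windows), -1)            # flatten each history window
    models, total_val_error = [], 0.0
    for k in range(K):
        train = (fold_of_window >= 0) & (fold_of_window != k)
        val = fold_of_window == k
        model = model_factory()
        model.fit(X[train], targets[train])
        total_val_error += np.sum((model.predict(X[val]) - targets[val]) ** 2)
        models.append(model)
    return models, total_val_error

def ensemble_predict(models, X_new):
    """Test on unseen data by averaging the outputs of all trained instances."""
    return np.mean([m.predict(X_new) for m in models], axis=0)
```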
This method is intended to reduce dependence on the stationarity of the data-generating process (including measurement reliability) over the collection period. It is also intended to give every part of the data roughly the same influence on the model.
Note that the slice length should not align too closely with periods that appear (or are expected to appear) in the data, such as the typical daily, weekly, and yearly cycles; otherwise the subsets will be more biased. Imagine, for example (and it is a silly one), a situation where one fold contains all the night hours and another contains all the day hours, and the task is to predict air temperature from radon gas concentration. I have no idea what to expect from the radon gas, but a best guess made with no sensible input is certainly lower at night than during the day.
One way to test the performance of the resulting CV ensemble is to hold out every (K+1)-th slice and test the ensemble on the resulting subset. This can be extended to an outer cross-validation where a different subset is held out in each fold, at the cost of a factor of K+1 in the amount of computation needed.
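A sketch of that held-out-test variant, under the same illustrative assumptions as above: slices are dealt into K + 1 bins, the bin selected by test_bin is reserved for testing the ensemble (marked as fold K here), and the remaining bins become the K CV folds. The outer cross-validation then simply loops test_bin over 0..K and retrains the K inner models each time, which is where the factor of K + 1 in computation comes from.

```python
import numpy as np

def assign_slices_with_test(n_windows, S, K, h, test_bin):
    """Like assign_slices, but slices are dealt into K + 1 bins; the bin selected
    by test_bin becomes a held-out test subset (marked as fold K), the other bins
    become CV folds 0..K-1, and -1 still marks quarantined windows."""
    fold_of_window = np.full(n_windows, -1, dtype=int)
    slice_bounds = np.linspace(0, n_windows, S + 1, dtype=int)
    for s in range(S):
        start, stop = slice_bounds[s], slice_bounds[s + 1]
        b = s % (K + 1)                              # deal slices into K + 1 bins
        if b == test_bin:
            fold = K                                 # held-out test subset
        else:
            fold = b if b < test_bin else b - 1      # remaining bins -> folds 0..K-1
        fold_of_window[start + h:stop] = fold
    return fold_of_window
```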