I have a time-series cross-sectional dataset consisting of 100 individuals that each had 4 features measured yearly for 21 consecutive years. One of the features is binary and the other three are continuous.
Below is a fictitious example of what my dataset looks like:
# 100 individuals x 21 years = 2100 rows
x1 <- rep(1:100, each = 21)               # person ID
x2 <- rep(2000:2020, times = 100)         # year
x3 <- round(rnorm(2100), digits = 2)      # continuous features
x4 <- round(rnorm(2100), digits = 2)
x5 <- round(rnorm(2100), digits = 2)
x6 <- sample(0:1, 2100, replace = TRUE)   # binary class
x <- data.frame(x1, x2, x3, x4, x5, x6)
colnames(x) <- c("Person", "Year", "X1", "X2", "X3", "Y")
> head(x)
Person Year X1 X2 X3 Y
1 1 2000 1.07 -0.38 -2.78 0
2 1 2001 1.03 1.35 0.35 0
3 1 2002 -0.14 -2.23 0.46 1
4 1 2003 -0.88 -0.22 0.12 1
5 1 2004 0.17 1.79 0.64 0
6 1 2005 -0.45 2.10 1.75 0
> tail(x)
Person Year X1 X2 X3 Y
2095 100 2015 0.55 2.21 -0.54 1
2096 100 2016 0.70 0.04 2.12 1
2097 100 2017 -2.49 -1.47 -1.19 1
2098 100 2018 -0.70 1.17 0.79 0
2099 100 2019 1.21 0.47 0.31 0
2100 100 2020 -0.92 -1.53 1.20 0
I wish to train different learning algorithms on this dataset to forecast/predict each individual's class, $Y$.
I am struggling to see how off-the-shelf learning algorithms such as decision trees, support vector machines, and neural networks can be trained and tuned on this type of data in R. For cross-sectional data I usually train and tune learning algorithms with the caret package.
Q1: Is it possible to adapt and apply machine learning methods to solve this type of problem?
Q2: Is this the best way to store time-series cross-sectional data for analysis in R?
Although I do not know where to start with this type of classification problem, I realise that ordinary $k$-fold cross validation is not suitable for tuning hyperparameters, since observations on the same individual are probably correlated across time and random folds would let future observations leak into the training set. Would moving/sliding window cross validation be a possible solution?
Q3: Is there a package available in R for doing moving/sliding window cross validation?
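
To make the moving-window idea concrete, here is a minimal base-R sketch of the kind of split I have in mind. The helper `make_year_slices` and its arguments `window` and `horizon` are hypothetical names of my own, not from any package, and the choice of a 10-year training window with a 1-year test horizon is only an assumption for illustration.

```r
# Hypothetical helper (not from any package): build moving-window folds by year.
# Each fold trains on `window` consecutive years and tests on the next
# `horizon` years; the window then slides forward one year at a time.
make_year_slices <- function(years, window = 10, horizon = 1) {
  yrs <- sort(unique(years))
  n_slices <- length(yrs) - window - horizon + 1
  stopifnot(n_slices >= 1)
  lapply(seq_len(n_slices), function(i) {
    list(train_years = yrs[i:(i + window - 1)],
         test_years  = yrs[(i + window):(i + window + horizon - 1)])
  })
}

# Toy panel with the same layout as the example data (values are random)
x <- data.frame(Person = rep(1:100, each = 21),
                Year   = rep(2000:2020, times = 100),
                Y      = sample(0:1, 2100, replace = TRUE))

slices <- make_year_slices(x$Year, window = 10, horizon = 1)

# Row indices for the first fold: all individuals, split by time only
train_idx <- which(x$Year %in% slices[[1]]$train_years)
test_idx  <- which(x$Year %in% slices[[1]]$test_years)
```

With these settings the first fold would train on 2000 to 2009 and test on 2010, and the last would train on 2010 to 2019 and test on 2020, so every test year lies strictly after its training years for all 100 individuals.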