Time-series cross-sectional classification problem

Question

I have a time-series cross-sectional dataset consisting of 100 individuals that each had 4 features measured yearly for 21 consecutive years. One of the features is binary and the other three are continuous.

Below is a fictitious example of what my dataset looks like:

n  <- 100 * 21                        # 100 individuals x 21 years = 2100 rows
x1 <- rep(1:100, each = 21)           # person ID
x2 <- rep(2000:2020, times = 100)     # year
x3 <- round(rnorm(n), digits = 2)     # continuous features
x4 <- round(rnorm(n), digits = 2)
x5 <- round(rnorm(n), digits = 2)
x6 <- sample(0:1, n, replace = TRUE)  # binary class
x  <- data.frame(x1, x2, x3, x4, x5, x6)
colnames(x) <- c("Person", "Year", "X1", "X2", "X3", "Y")

> head(x)
  Person Year    X1    X2    X3 Y
1      1 2000  1.07 -0.38 -2.78 0
2      1 2001  1.03  1.35  0.35 0
3      1 2002 -0.14 -2.23  0.46 1
4      1 2003 -0.88 -0.22  0.12 1
5      1 2004  0.17  1.79  0.64 0
6      1 2005 -0.45  2.10  1.75 0

> tail(x)
     Person Year    X1    X2    X3 Y
2095    100 2015  0.55  2.21 -0.54 1
2096    100 2016  0.70  0.04  2.12 1
2097    100 2017 -2.49 -1.47 -1.19 1
2098    100 2018 -0.70  1.17  0.79 0
2099    100 2019  1.21  0.47  0.31 0
2100    100 2020 -0.92 -1.53  1.20 0

I wish to train different learning algorithms on this dataset to forecast/predict each individual's class, $Y$.

I am finding it difficult to see how off-the-shelf learning algorithms such as decision trees, support vector machines and neural networks can be trained and tuned on this type of data in R. I usually use the caret package when training and tuning learning algorithms on cross-sectional data.

Q1: Is it possible to adapt and apply machine learning methods to solve this type of problem?
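As an illustration of what "adapting" might look like, one common approach is to add lagged copies of each variable within each person, so that every row carries some of its own history and can be passed to an ordinary cross-sectional learner. The sketch below uses only base R on a small toy panel; the helper `lag_k` and the column names `X1_lag1` and `Y_lag1` are hypothetical, and whether one lag is adequate is an assumption that would need checking on the real data.

```r
# Hypothetical helper: shift a vector back by k steps, padding the start with NA.
lag_k <- function(v, k = 1) c(rep(NA, k), head(v, -k))

# Small toy panel in the same long format: 3 people x 5 years.
set.seed(1)
panel <- data.frame(Person = rep(1:3, each = 5),
                    Year   = rep(2000:2004, times = 3),
                    X1     = round(rnorm(15), 2),
                    Y      = sample(0:1, 15, replace = TRUE))

# Rows must be ordered by person and year before lagging.
panel <- panel[order(panel$Person, panel$Year), ]

# ave() applies lag_k separately within each person, so values never
# leak from one individual's series into the next.
panel$X1_lag1 <- ave(panel$X1, panel$Person, FUN = lag_k)
panel$Y_lag1  <- ave(panel$Y,  panel$Person, FUN = lag_k)

# The first year of each person has no history, so those rows are dropped.
lagged <- panel[complete.cases(panel), ]
```

The resulting `lagged` data frame has one row per person-year with last year's values as extra columns, which is the shape that cross-sectional tools generally expect.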

Q2: Is this the best way to store time-series cross-sectional data for analysis in R?

Although I do not know where to start with this type of classification problem, I realise that ordinary $k$-fold cross validation cannot be used to tune hyperparameters, since the observations are probably correlated across time. Would moving/sliding window cross validation be a possible solution?
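To make the moving/sliding window idea concrete, here is a minimal base R sketch that builds the splits by calendar year, so that every training set lies strictly before its test set and all individuals from a given year stay on the same side of the split. The helper `make_year_slices` and its arguments `window`, `horizon` and `fixed` are hypothetical names, and the choice of 10 training years with a 1-year horizon is only an assumption for illustration.

```r
# Hypothetical helper: rolling-origin (sliding window) splits by calendar year.
#   window  = number of consecutive years used for training
#   horizon = number of years immediately afterwards used for testing
#   fixed   = TRUE for a sliding window, FALSE for an expanding window
make_year_slices <- function(years, window = 10, horizon = 1, fixed = TRUE) {
  yrs <- sort(unique(years))
  n_splits <- length(yrs) - window - horizon + 1
  if (n_splits < 1) stop("not enough distinct years for this window/horizon")
  lapply(seq_len(n_splits), function(i) {
    first_train <- if (fixed) i else 1
    train_years <- yrs[first_train:(i + window - 1)]
    test_years  <- yrs[(i + window):(i + window + horizon - 1)]
    # Return row indices, so the splits can be applied to the long-format data.
    list(train = which(years %in% train_years),
         test  = which(years %in% test_years))
  })
}

# Toy panel with the same layout as the example data: 100 people x 21 years.
panel <- data.frame(Person = rep(1:100, each = 21),
                    Year   = rep(2000:2020, times = 100))

slices <- make_year_slices(panel$Year, window = 10, horizon = 1)
length(slices)  # number of train/test splits
```

Each element of `slices` holds row indices for one split. If caret is used for tuning, such index lists could in principle be supplied through the `index` and `indexOut` arguments of `trainControl`; caret's own `createTimeSlices` works on a single ordered series, so it would probably need a wrapper like this one for panel data.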

Q3: Is there a package available in R for doing moving/sliding window cross validation?

Why not give logistic regression a go if you are trying to predict Y given X1, X2, X3? Do you know the data set to not be suitable? — paisanco, Jul 03 '15 at 01:26
