I am a software developer. I do not have formal training in time series analysis. I have started reading Chatfield and Brockwell. I have enough wisdom to reach out to professional statisticians in your field for insightful commentary so I can avoid doing something wrong.
Problem
How can I apply leave one out and k-fold cross validation on my time series?
Details
Technically, I have 10 independent time series, one per participant. For each series, we have participant id, timestamp (data taken at one-second intervals), heart rate, GIS location, GIS zone (the zone is a GIS polygon of special interest for fatigue), and a binary variable indicating whether the user is fatigued. My goal is to use cross validation so I can build a model to detect fatigue.
My data looks something like this:
- participant id, timestamp, heartrate, lat, long, zone, fatigue
- 1, 10:30, 130, 70, 38, 39, 1, 0
- 1, 10:30, 130, 72, 38, 39, 1, 0
- ...
- 10, 10:30, 138, 72, 38, 39, 1, 0
- ...
where I can tell which time series I am in based on the participant.
Attempts
Let me divide my time series by participant id. I have [1,2,3,4,5,6,7,8,9,10], where 1 represents all the data I have for participant 1, that is, my time series 1. We can consider the series independent of each other, so I can do something like:
Leave one out
- 1 Train: [2,3,4,5,6,7,8,9,10] Test: [1]
- 2 Train: [1,3,4,5,6,7,8,9,10] Test: [2]
- 3 Train: [1,2,4,5,6,7,8,9,10] Test: [3]
- 4 Train: [1,2,3,5,6,7,8,9,10] Test: [4]
- 5 Train: [1,2,3,4,6,7,8,9,10] Test: [5]
- 6 Train: [1,2,3,4,5,7,8,9,10] Test: [6]
- 7 Train: [1,2,3,4,5,6,8,9,10] Test: [7]
- 8 Train: [1,2,3,4,5,6,7,9,10] Test: [8]
- 9 Train: [1,2,3,4,5,6,7,8,10] Test: [9]
- 10 Train: [1,2,3,4,5,6,7,8,9] Test: [10]
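To make the leave-one-out scheme above concrete, here is a minimal Python sketch of what I mean. The function name `leave_one_participant_out` is my own invention for illustration, and the sketch only produces participant ids for each fold; it assumes that all rows for a participant are then selected by id and kept together. As far as I know, scikit-learn's `LeaveOneGroupOut` does the same kind of split when given participant ids as `groups`.

```python
def leave_one_participant_out(participant_ids):
    """Yield (train_ids, test_ids), holding out one whole participant per fold.

    Hypothetical helper: it splits on participant id, not on individual rows,
    so every observation of the held-out participant stays out of training.
    """
    ids = sorted(set(participant_ids))
    for held_out in ids:
        train = [p for p in ids if p != held_out]
        yield train, [held_out]


participants = list(range(1, 11))  # participants 1..10, one series each
folds = list(leave_one_participant_out(participants))

for i, (train, test) in enumerate(folds, start=1):
    print(i, "Train:", train, "Test:", test)
```

Each fold's row-level training set would then be every row whose participant id is in `train`, and the test set every row of the single held-out participant.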
2-fold validation
I have really confused myself. I was thinking about this approach, but a colleague told me I had it all wrong because I was doing a "within time series" approach and I needed to do an "across time series" approach.
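If I understand the "across time series" idea correctly (this is my assumption, not something my colleague spelled out), a 2-fold split would put whole participants into each fold rather than cutting any single series into pieces. A rough sketch of that, using a made-up helper `participant_k_fold` and an arbitrary shuffle seed:

```python
import random


def participant_k_fold(participant_ids, k=2, seed=0):
    """Yield (train_ids, test_ids) with whole participants assigned to folds.

    Hypothetical helper: participants are shuffled, dealt into k folds, and
    each fold is used once as the test set. No series is ever split in time.
    """
    ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(ids)
    folds = [sorted(ids[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(p for j, fold in enumerate(folds) if j != i for p in fold)
        yield train, test


splits = list(participant_k_fold(range(1, 11), k=2))

for i, (train, test) in enumerate(splits, start=1):
    print(i, "Train:", train, "Test:", test)
```

With 10 participants and k=2, each fold trains on five participants and tests on the other five. I believe scikit-learn's `GroupKFold` offers a similar grouping by participant, though I have not checked whether its fold assignment matches this sketch.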
I also checked out this, which I think is again for the "within" time series approach because you take 1 time series and divide it into m parts. I have 10 independent time series that supposedly observe the same/similar effect and are independent from each other. I am trying to detect