Supervised learning: setting labels on sliding windows of sensor data

Question

Suppose I have accelerometer data collected with one sensor, with one label for each measured data point. These labels describe different states of my system, e.g., $state_A, state_B, state_C$, etc., and I want to use this information to train a classifier that recognizes these states.

Now, let's say that I want to use a fixed sliding window to extract some features rather than feeding the raw data to the classifier. The issue is that some of these sliding windows could contain more than one unique label: e.g., the time window contains the transition from $state_A$ to $state_C$. What should I do with these kinds of windows? Should I discard them? Should I set a threshold to determine whether I use them or not (e.g., if 90% or more of the measured points contain the same label, then it is OK to use the window)? Are there any best practices for handling these kinds of situations?

In the following figure I add an example of this issue: the image shows a plot of three different states that the system could have, encoded as 1, 2, and 3 for visualization purposes. Let's say that I want to take non-overlapping windows of 25 samples each, so the vertical red lines show the beginning and the end of every time window.
Some windows contain only one unique state, but others contain more than one.
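To make the threshold idea concrete, here is a minimal sketch, not a recommendation. It assumes non-overlapping windows of 25 samples and a 90% purity cutoff, as in the question; the function name `label_windows`, the parameter `min_purity`, and the toy label sequence are made up for illustration. A window whose dominant label covers less than the cutoff gets `None` so it can be discarded or handled separately.

```python
from collections import Counter

def label_windows(labels, width=25, min_purity=0.9):
    """Assign one label to each non-overlapping window of `width` samples.

    Returns a list of (start, end, label, purity) tuples, where `end` is
    exclusive. `label` is None when the most common label covers less than
    `min_purity` of the window (hypothetical convention for "mixed").
    """
    out = []
    for start in range(0, len(labels) - width + 1, width):
        window = labels[start:start + width]
        label, count = Counter(window).most_common(1)[0]
        purity = count / width
        out.append((start, start + width,
                    label if purity >= min_purity else None, purity))
    return out

# Hypothetical label sequence: 40 samples of state 1, then 35 of state 3.
labels = [1] * 40 + [3] * 35
for start, end, label, purity in label_windows(labels):
    print(start, end, label, round(purity, 2))
```

With this toy sequence the middle window (samples 25 to 49) straddles the transition, so it is the only one marked as mixed. Whether to drop such windows, keep the majority label, or give them their own "transition" class depends on what you want at inference time.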

What do you want to happen with such a window during inference? Do you want to know about the transition? Do you prefer the new or old class? Or is either class OK? Or do you want it to be classified as unknown? The answer guides what to do during training/labeling — Jon Nordby, Jun 01 '19 at 08:58

score 1 · Answer 1 · answered Dec 11 '17 at 17:41

Your sliding window will generate something at every time period, assuming you slide it by one time slice at a time. So the feature you're generating from the sliding window will have a value at each time point, which corresponds to your target state.

The question is: which time slice that's contained in your window is the one that you attach the value to? For example, say your time window is 10 wide. If the first time is time 0, you won't be able to generate a feature for times 0 through 8, but at time 9 your window will contain 0-9 and you generate a feature for time 9. Then slide it over one (to times 1-10), and calculate your feature again and assign that value to time 10. And so on.
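As a rough sketch of that alignment (assuming a width-10 trailing window, mean and standard deviation as the two features, and the label at the last time slice of the window as the target; the name `trailing_features` and the toy data are hypothetical):

```python
from statistics import mean, pstdev

def trailing_features(values, labels, width=10):
    """For each time t >= width - 1, compute features over the `width`
    samples ending at t and pair them with the label at time t."""
    rows = []
    for t in range(width - 1, len(values)):
        window = values[t - width + 1 : t + 1]
        rows.append((t, mean(window), pstdev(window), labels[t]))
    return rows

# Hypothetical data: 12 readings, state A for times 0-8, state B afterwards.
values = [float(i) for i in range(12)]
labels = ["A"] * 9 + ["B"] * 3

rows = trailing_features(values, labels)
for t, m, sd, state in rows:
    print(t, m, round(sd, 3), state)
```

No rows are produced for times 0 through 8. The first row is attached to time 9 and carries the state at time 9, even though most of the samples in that window were recorded in the earlier state.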

If your feature were the mean of the values in your window, this would be called a trailing mean. You may not want to use the centered mean, which is what most people mean when they say "moving average". Why? Because a centered mean over an 11-wide window at time 20 would use times 15-25, which includes five units of time that have not yet happened.
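The index arithmetic can be checked with two small helper functions (the names are made up for illustration, and the centered version assumes an odd width):

```python
def trailing_window(t, width):
    """Inclusive (first, last) time indices of a trailing window ending at t."""
    return t - width + 1, t

def centered_window(t, width):
    """Inclusive (first, last) time indices of a centered window of odd width."""
    half = width // 2
    return t - half, t + half

print(trailing_window(20, 11))  # (10, 20): past and present samples only
print(centered_window(20, 11))  # (15, 25): reaches five samples into the future
```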

If you're doing everything after you've gathered all of your data, peeking "into the future" might not be a problem, but if you're ever going to try to predict what comes next, you won't have access to the future. Also, I've used the example of a "moving average" or "running mean", and both of these can introduce artifacts into your calculations because things snap in and out of the window. You may want to use some kind of "falling off over time" in your window so that things don't pop out.
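One common way to get that kind of falling off is an exponentially weighted moving average. That is my reading of the idea, not the only option, and the smoothing factor `alpha=0.3` and the single-spike data below are arbitrary choices for illustration. The sketch compares it with a plain trailing mean:

```python
def trailing_mean(values, width):
    """Plain trailing mean; None until the window is full."""
    out = []
    for t in range(len(values)):
        if t < width - 1:
            out.append(None)
        else:
            out.append(sum(values[t - width + 1 : t + 1]) / width)
    return out

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average: old samples fade gradually
    instead of dropping out of a fixed window all at once."""
    out = []
    avg = None
    for v in values:
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

# Hypothetical signal: flat at 0 with a single spike at time 5.
values = [0.0] * 5 + [10.0] + [0.0] * 9

print(trailing_mean(values, 5))
print([round(x, 3) for x in ewma(values)])
```

With the trailing mean, the spike holds the feature at a constant level for five steps and then vanishes the moment it leaves the window. With the exponential weighting, its influence shrinks a little at every step, so nothing pops out suddenly.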

Or are you wanting to compress your data so that only the sensor readings and state at each jump are recorded/modeled? (Assuming your data is as jumpy as your example.)

answered Dec 11 '17 at 17:41

Wayne


Following your example, let's say I have 100 samples, where every sample has its own label (hence, 100 labels in total), and I want the classifier to recognize which of the possible states the system is in at a given time point. So I want to extract the mean and the variance of 10 time windows of width 10 that I slide through my series (0 - 9, 10 - 19, etc.) to train a classifier. Some windows will have only 1 state (the ten labels will have the same value), while others will have jumps (8 labels show _state A_ while 2 labels show _state B_). What should I do with the latter windows? – Dec 11 '17 at 18:16
@Fustin: I think you're confusing the creation of features with training a model. The state at any point in time doesn't matter in terms of creating something like a mean and SD of the last 10 measurements as two features. They co-exist with the state, and there will be a mean and an SD for each time slice, just as there is a state. When creating your model, you may need to account for the last N states when predicting the next state. That's a separate concept and depends on what kind of model you're using. – Wayne Dec 11 '17 at 20:26
Every window has to have one unique associated state, right? For example, a KNN needs, for training, feature vectors with only 1 label per vector. So, when testing, the raw data is also mapped into the feature space and, based on distances, the testing data receives the label of the closest neighbor(s). In such situations I find it confusing how to declare the state of a time window that contains 2 or more states. If I take 10 samples per window, it is clear that I will end up with 1 value for the mean and 1 for the SD, but... if there is more than 1 state, how do I "select" the correct one? – Dec 12 '17 at 10:26
Depends on your model and its use. If you are trying to predict what (hidden) state your system is in at the last time slice you have so far, your _target_ will be the state corresponding to the end of the window. If you are trying to predict the state your system will be in at the next time slice, the next time slice's state would be your _target_. If you can figure out the state before the current/next time slice, you could use those states as predictors, depending on your modeling technique, but don't confuse that with your _target_. – Wayne Dec 13 '17 at 03:50
