
I am trying to create a tool that labels refrigerator temperature readings. A reading is taken every 5 minutes, and its label identifies whether or not it was taken while the refrigerator was defrosting. Periodically, the refrigerator will defrost, meaning that a heater will turn on, causing the temperature to rise. Defrosts vary greatly in their height, duration, and shape. The only thing that is consistent about them is that they occur at constant time intervals (e.g. every 3 hours). During normal operation (when the refrigerator is not defrosting), a refrigerator's temperature will bounce around as its compressor turns on and off and as its door is opened and shut. The image below shows an example of what a time series of temperature readings might look like, with defrosts shown in blue.

Example of Temperature Readings

I need to design a tool that can take a series of temperature readings and classify whether or not the LAST temperature reading (the rightmost one) is part of a defrost. The training data consists of a 1D array of temperature readings and an equal-length 1D array that holds the class of each reading (the class is either 0 or 1, where 1 means defrost).

I've tried both convolutional and recurrent neural networks without much luck. They don't seem to be able to learn to utilize the spacing between defrosts, and so they keep erroneously classifying last readings as being part of a defrost, even when not enough time has elapsed. For example, in the image below, my trained networks erroneously classify the last reading as being the start of a defrost, even though it is too soon for a defrost. Is there a machine learning or deep learning tool that is better at recognizing periodic patterns? Is there a better technique for classifying my data? Thanks.

Last reading will be mislabeled

GreenBlue
  • Can you clarify what you mean by "at fairly even time intervals" ? Is the time interval stochastic or deterministic? – Skander H. Jan 25 '20 at 20:03
  • Is your data collected at a fixed interval of time? If so, please post it in CSV format. – IrishStat Jan 25 '20 at 22:38
  • How long was the analysis window you used for the CNN/RNN? This pattern looks very detectable, especially if you include the delta (first order difference) as a feature. – Jon Nordby Jan 26 '20 at 11:17
  • Do you have any requirements for the detection delay, how long it takes before the event is detected? – Jon Nordby Jan 26 '20 at 11:24
  • @jonnor, I used convolutional filters of length 50 in the first layer and length 100 in the second convolutional layer. The delay needs to be as small as possible. Ideally, there would be no delay at all, so that the defrost is detected at its first reading. If this is unreasonable, then a small delay of one or two readings could work. Any more than that would be too much. – GreenBlue Jan 27 '20 at 02:29
  • Detecting the defrost using this data with short delay is much much harder than detecting with a long delay. So much that I'd say you should use a different sensor if you need that. Electric current, vibration, or sound. If you can detect after a cycle then a 1x7 CNN should do OK. – Jon Nordby Jan 27 '20 at 08:11
  • @IrishStat, I will attach a csv. The data is collected every 5 minutes. The training data is comprised of a 1D array of temperatures and an equal length 1D binary array whose elements are either 0 or 1 (to represent the classification of the corresponding temperature reading). The test data is just a 1D array of temperature readings. – GreenBlue Jan 27 '20 at 15:12
  • @Skander H., the time interval is roughly constant but with a little bit of error, so it is deterministic with a small stochastic error – GreenBlue Jan 27 '20 at 15:15
  • how is the binary variable assigned or is it observed somehow .... – IrishStat Jan 27 '20 at 18:20
  • @IrishStat, the data came with no labels. Therefore, in order to create training data, I was forced to generate labels myself. I wrote some programs that were somewhat able to guess where the defrosts were, and then I had to manually check the results with my own eyes and make corrections. It was all very time-intensive. – GreenBlue Jan 27 '20 at 19:35
  • It had seemed to me that you wished to use a continuous variable (temperature) to predict a series that was binary. Now given that the observed data is continuous with some unusual values , the statistical problem might be to characterize the observed series and to possibly identify systematic pulses i.e. pulses that arise with fixed regularity. – IrishStat Jan 27 '20 at 19:43
  • @IrishStat, yes I think I agree. But I'm not sure how to build a classifier that captures the "fixed regularity" part. My classifiers keep giving false positives whenever anything looks like a pulse, no matter when it occurs. – GreenBlue Jan 27 '20 at 19:53
  • post your temperature data – IrishStat Jan 27 '20 at 20:02
  • @IrishStat Sorry for the delay. Here is a link to training data. You can open it in Python using "file = open('Data_18K_One_Third_Fake_Defrost.pickle', 'rb')" and then "(X,Y) = pickle.load(file)". Each row of X is a series of temperature readings, and each row of Y is the corresponding class labels. Thanks. https://drive.google.com/open?id=1-S8KoPg4ovC3Yc26fiZddMkMyVCedn0T – GreenBlue Feb 03 '20 at 20:30
  • See https://stats.stackexchange.com/questions/16117 for a similar problem. – whuber Feb 03 '20 at 21:35
  • Unfortunately, this is what I get … Simply take your 500 temperature readings taken every 5 minutes and send me a txt file. Why is your file so large? [![enter image description here](https://i.stack.imgur.com/SfbNl.png)](https://i.stack.imgur.com/SfbNl.png) – IrishStat Feb 03 '20 at 21:20
  • What about fitting a Bayesian structural time series model with your binary series as the target and temperature as a covariate? This would allow you to incorporate seasonality (cyclicality?) into the model, and you could then predict current status with new temperature observations. See "Example 3: recession modeling" in [this post](http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html) for a worked example in R. – ulfelder Feb 03 '20 at 21:40

1 Answer


I would apply a decent smoother matched to the characteristic time scale (EWMA, Savitzky-Golay, ...) to the time series, and I would look at the divergence from that smooth. If you are sampling every 5 minutes, then 12 samples span one hour and 24 span two, so the EWMA weight should be something like 1/12 or 1/24, or the Savitzky-Golay window should be around 12 or 24 samples long.
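As a rough illustration of "divergence from the smooth", here is a minimal sketch of a causal EWMA in plain Python. The function name `ewma_residuals`, the toy readings, and the threshold of 3.0 are all made up for this example, and the choice to compare each reading with the smooth built from earlier readings only is my assumption (it keeps the feature usable for classifying the last reading).

```python
def ewma_residuals(readings, alpha=1.0 / 12.0):
    """Return (smooth, residuals) for a sequence of temperature readings.

    alpha is the EWMA weight; 1/12 roughly matches one hour of 5-minute samples.
    Each residual is the reading minus the smooth of the readings before it.
    """
    if not readings:
        return [], []
    level = float(readings[0])  # start the smooth at the first reading
    smooth, residuals = [], []
    for x in readings:
        residuals.append(x - level)              # divergence from the smooth so far
        level = alpha * x + (1.0 - alpha) * level  # then fold the reading in
        smooth.append(level)
    return smooth, residuals


# Toy series (invented): flat at 4 degrees with a 3-sample bump to 10 degrees.
toy = [4.0] * 30 + [10.0] * 3 + [4.0] * 10
toy_smooth, toy_resid = ewma_residuals(toy)

# Hypothetical threshold; a real one would have to be tuned on labelled data.
flagged = [i for i, r in enumerate(toy_resid) if r > 3.0]
print(flagged)
```

The residual is large on the first reading of the bump, which is the property you want if the detection delay has to be small. It also goes negative right after the bump, because the smooth has been pulled upward.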

I would also cyclize the time, scaling each value by 2π so that it wraps around exactly once per period:

(hour of day) --> [cos(2π·hour/24), sin(2π·hour/24)]

(day of week) --> [cos(2π·day/7), sin(2π·day/7)]

(week of year) --> [cos(2π·week/53), sin(2π·week/53)]

And add flags for weekends and holidays.
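A minimal sketch of these time features, using only the standard library. It assumes each value is scaled by 2π so that it completes one full turn per period (otherwise hour 23 and hour 0 would not end up next to each other). The helper names `cyclic` and `time_features`, and the `holidays` argument, are hypothetical; the holiday dates would have to come from whatever calendar applies to your data.

```python
import math
from datetime import datetime


def cyclic(value, period):
    """Map a periodic value onto the unit circle as (cos, sin)."""
    angle = 2.0 * math.pi * value / period
    return math.cos(angle), math.sin(angle)


def time_features(ts, holidays=()):
    """Build cyclical time features plus weekend/holiday flags for a timestamp.

    holidays is an assumed, user-supplied collection of datetime.date objects.
    """
    hour = ts.hour + ts.minute / 60.0          # fractional hour of day
    hour_cos, hour_sin = cyclic(hour, 24)
    day_cos, day_sin = cyclic(ts.weekday(), 7)  # Monday == 0
    week = ts.isocalendar()[1]                  # ISO week number, 1..53
    week_cos, week_sin = cyclic(week, 53)
    is_weekend = 1 if ts.weekday() >= 5 else 0
    is_holiday = 1 if ts.date() in holidays else 0
    return [hour_cos, hour_sin, day_cos, day_sin,
            week_cos, week_sin, is_weekend, is_holiday]


features = time_features(datetime(2020, 1, 25, 20, 5))
print(features)
```

Using both the cosine and the sine matters: either one alone maps two different times of day to the same value.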

You aren't going to get everything. If someone has a Super Bowl (or other sports) party at their house, a kid's birthday, or some other celebration, then the fridge might get a lot of atypical mileage.

A decent Random Forest should do a solid job here. Feed it the residuals from the smooth, the cyclized time/date, and the flags for weekend or holiday, and it should do a fair job of predicting the defrost events.
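To make the pipeline concrete, here is a small sketch with NumPy and scikit-learn. The data is SYNTHETIC and invented purely for illustration (a 4-degree baseline, a 6-degree bump lasting 4 samples every 3 hours, Gaussian noise); it is not the asker's data, and real defrosts are far less tidy. The weekend and holiday flags are left out to keep it short, and every parameter value is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# SYNTHETIC series, made up for this sketch: one reading every 5 minutes,
# a defrost every 36 samples (3 hours) that lasts 4 samples.
n = 2000
period = 36
t = np.arange(n)
y = ((t % period) < 4).astype(int)                 # 1 = defrost
temps = 4.0 + 6.0 * y + rng.normal(0.0, 0.5, size=n)

# Causal EWMA residual: each reading minus the smooth of earlier readings.
alpha = 1.0 / 12.0
resid = np.empty(n)
level = temps[0]
for i, x in enumerate(temps):
    resid[i] = x - level
    level = alpha * x + (1.0 - alpha) * level

# Cyclical hour-of-day features (the series is assumed to start at midnight).
hour = (t * 5.0 / 60.0) % 24.0
angle = 2.0 * np.pi * hour / 24.0
X = np.column_stack([resid, np.cos(angle), np.sin(angle)])

# Chronological split, so the model is scored on readings later than its training data.
split = int(0.7 * n)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])
accuracy = model.score(X[split:], y[split:])
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

A high score here says very little about real refrigerators, since the synthetic defrosts are perfectly regular and all the same shape. Also note that the time-of-day features only help if the defrost schedule is locked to the clock; if it drifts, a feature such as time since the last detected defrost may be needed instead.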

If you had decent dummy data I could show this to you in pseudocode and give decent graphs and fit analyses.

EngrStudent