I'm working with time-series data to train a binary classification model that predicts whether an event will happen in the future. The likelihood of the event depends on the time slot within the day, so I want to encode the time feature as a continuous variable. I read the article http://blog.davidkaleko.com/feature-engineering-cyclical-features.html. It explains how hours can be encoded with sin and cos to preserve their cyclical nature. But what about hours and minutes together (e.g. half past five)? The goal is to encode the time as a continuous variable that captures how the likelihood of the event increases with every minute as time advances (e.g. from 5 pm to 7 pm).

Brandon
  • Time is continuous, so minutes are handled just like hours! Convert to a single decimal representation of time; that is, do not keep hours/minutes/seconds as separate variables. Then you can use sin/cos, more general trigonometric polynomials, or periodic splines: https://stats.stackexchange.com/questions/225729/what-are-periodic-version-of-splines, https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/splinefun, https://www.fromthebottomoftheheap.net/2014/05/09/modelling-seasonal-data-with-gam/ – kjetil b halvorsen Feb 25 '20 at 15:58
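
The "trigonometric polynomial" idea in the comment above can be sketched as follows. This is only an illustration, not code from the comment: the function name `fourier_time_features` and the parameters `n_harmonics` and `period` are hypothetical, and it assumes hours in 0..23 and minutes in 0..59.

```python
import numpy as np

def fourier_time_features(hours, minutes, n_harmonics=2, period=24.0):
    """Return an (n, 2 * n_harmonics) array of sin/cos features of the time of day.

    Hypothetical helper: assumes hours in 0..23 and minutes in 0..59.
    """
    # Single decimal representation of time, in fractional hours.
    t = np.asarray(hours, dtype=float) + np.asarray(minutes, dtype=float) / 60.0
    angle = 2.0 * np.pi * t / period
    cols = []
    for k in range(1, n_harmonics + 1):
        # The k-th harmonic completes k full cycles per day.
        cols.append(np.sin(k * angle))
        cols.append(np.cos(k * angle))
    return np.column_stack(cols)

# Three example times: 17:00, 17:30 and 23:59.
features = fourier_time_features([17, 17, 23], [0, 30, 59])
```

With `n_harmonics=1` this reduces to the plain sin/cos pair; extra harmonics let a linear model fit daily patterns with more than one peak, at the cost of more columns.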

1 Answer

I would add the minutes to the hours as a fraction, and then encode the resulting fractional hour cyclically:

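import numpy as np

# Fractional hour in [0, 24). Bracket access is needed because df.min is the DataFrame.min method.
df['hr'] = df['hr'] + df['min'] / 60
df['hr_sin'] = np.sin(df['hr'] * (2. * np.pi / 24))
df['hr_cos'] = np.cos(df['hr'] * (2. * np.pi / 24))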
Davide ND
  • But wouldn't dividing by 24 then no longer make sense? – Brandon Mar 06 '20 at 09:38
  • The hour becomes a float in $[0,24)$, so it does make sense to divide by 24, the length of the cycle. I am of course assuming that hour takes values in $[0,23]$ and minute takes values in $[0,59]$, which seems the most logical option – Davide ND Mar 06 '20 at 09:46
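
As a rough check of the point discussed in these comments, the sketch below applies the same encoding to a small made-up DataFrame and compares distances in (sin, cos) space. The sample rows and the helper `circle_distance` are assumptions for illustration only; the column names `hr` and `min` follow the answer.

```python
import numpy as np
import pandas as pd

# Made-up sample times: 00:00, 17:00, 17:30, 23:59.
df = pd.DataFrame({"hr": [0, 17, 17, 23], "min": [0, 0, 30, 59]})

# Fractional hour in [0, 24), then the angle on the 24-hour circle.
hour_frac = df["hr"] + df["min"] / 60.0
angle = 2.0 * np.pi * hour_frac / 24.0
df["hr_sin"] = np.sin(angle)
df["hr_cos"] = np.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two rows in (sin, cos) space (hypothetical helper)."""
    return float(np.hypot(a["hr_sin"] - b["hr_sin"], a["hr_cos"] - b["hr_cos"]))

# 23:59 vs 00:00 are one minute apart across midnight.
d_midnight = circle_distance(df.iloc[3], df.iloc[0])
# 17:00 vs 17:30 are thirty minutes apart.
d_half_hour = circle_distance(df.iloc[1], df.iloc[2])
```

Here `d_midnight` comes out smaller than `d_half_hour`, so the encoding treats 23:59 and 00:00 as neighbours, and distance grows smoothly with each minute rather than jumping at hour boundaries.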