I have a data set with time of day (0 to 24 hours) as the independent variable and a continuous response variable; together they show what looks like a skewed sine relationship. For visualization, below are two example data sets: one is a sine wave, the other a skewed sine wave involving the arctan function, as described in https://math.stackexchange.com/a/2430837 .

Now I would like to quantify that rhythmic behaviour with a type of regression analysis that forces the estimates at 0 and at 24 hours to be the same (same endpoints) and that lets me address the following questions:

  • What is the estimated time of day when the maximum (or minimum) occurs?
  • Is the amplitude significantly different from zero?
  • Is there a significant skewness in this rhythmicity?

Which model would you suggest? It does not need to involve a sine wave, but I need to be able to address the questions above. Thank you.

[Figure: sine wave without a skew]

[Figure: second example plot, no description given]

Joshua
  • Judging by your graphs, whatever you are doing already works pretty well. In my experience if the driver is astronomical/climatic/meteorological, sinusoids work nicely. The more human the phenomenon (eating, sleeping, traffic), the more you might need something else. – Nick Cox Jan 09 '20 at 14:02
  • @NickCox the skewed sine function is nice, but I would not know how to estimate it. Here I just generated data from a predefined function. – Joshua Jan 09 '20 at 15:24
  • Nonlinear least squares or a maximum likelihood engine, I imagine. I've never tried. – Nick Cox Jan 09 '20 at 15:28
  • Arctic sea ice cover is one series (time of year is the driver) with asymmetric peaks and troughs but a few sine, cosine pairs work fine. – Nick Cox Jan 09 '20 at 17:05

1 Answer

You could do all this with a GAM and allow the model to identify the shape of the deterministic relationship, then answer your questions using the posterior distribution of the model.

You can use a cyclic cubic regression spline to constrain the end points of 0 and 24 to be the same. For example:

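library(mgcv)  # provides gam() and the cyclic cubic spline basis 'cc'

# Boundary knots at 0 and 24 make the smooth join up across midnight
knots <- list(ToD = c(0, 24))
m <- gam(y ~ s(ToD, bs = 'cc', k = 15), data = df, method = 'REML',
         family = XXXX, knots = knots)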

where XXXX is a suitable family for the conditional distribution of the response variable.
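To make this concrete, here is a minimal sketch on simulated data (not your data). The arctan-based curve, the `skew` value, the noise level and the use of `gaussian()` in place of XXXX are all assumptions made purely for illustration; the curve is one common arctan form of a skewed sine and may not match the linked formula exactly.

```r
library(mgcv)

set.seed(1)

# Simulated example data (assumed): skewed sine over a 24 hour cycle
n    <- 300
ToD  <- runif(n, 0, 24)
x    <- 2 * pi * ToD / 24
skew <- 0.6                                   # assumed skew parameter, |skew| < 1
mu   <- atan(skew * sin(x) / (1 - skew * cos(x))) / skew
df   <- data.frame(ToD = ToD, y = mu + rnorm(n, sd = 0.3))

# Cyclic cubic regression spline with boundary knots at 0 and 24
knots <- list(ToD = c(0, 24))
m <- gam(y ~ s(ToD, bs = "cc", k = 15), data = df, method = "REML",
         family = gaussian(), knots = knots)  # gaussian() is an assumption

# The fitted values at the two ends of the day should coincide
p <- unname(predict(m, newdata = data.frame(ToD = c(0, 24))))
print(p)
```

The two printed predictions should agree up to numerical precision, which is the endpoint constraint asked for in the question.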

Your questions could be addressed as follows:

  1. You can get this by evaluating the estimated smooth function s(ToD) on a fine grid over the range of ToD and then finding the point in that range at which the maximum is reached. You can also use posterior simulation to quantify the uncertainty in that time. See this answer that I provided on a related problem (it is essentially the same issue): https://stats.stackexchange.com/a/191489/1390

  2. One answer to this could be gleaned directly from the Wald-like test shown when you look at the summary of the model. Strictly this would be a test of the null hypothesis that the smooth function is equal to 0.

    A more direct test would be to proceed as per question/answer 1 and instead of finding the maximum of the curve, find the minimum and the maximum and their difference, and repeat using posterior simulation to put an uncertainty estimate on the amplitude.

  3. You could formulate this as a test on the difference between the derivatives of the estimated smooth function over some time period before and after the maximum. Again, evaluate the derivative over the two time intervals, compute the difference in the absolute values of these derivatives as the point estimate of asymmetry (it is zero for a symmetric peak), and then repeat the process as above on a large number of draws from the posterior distribution to get an estimate of the uncertainty.
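The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated from an assumed arctan-based skewed sine, the family is taken to be Gaussian, the posterior is approximated by a multivariate normal centred on the estimated coefficients, derivatives are replaced by average slopes over an arbitrary 3 hour window, and `curve_summary()` is a hypothetical helper, not part of mgcv.

```r
library(mgcv)

set.seed(42)

# Simulated example data (assumed), as no real data were supplied
n    <- 400
ToD  <- runif(n, 0, 24)
x    <- 2 * pi * ToD / 24
skew <- 0.6                                   # assumed skew parameter
df   <- data.frame(ToD = ToD,
                   y = atan(skew * sin(x) / (1 - skew * cos(x))) / skew +
                       rnorm(n, sd = 0.3))

m <- gam(y ~ s(ToD, bs = "cc", k = 15), data = df, method = "REML",
         knots = list(ToD = c(0, 24)))

# Fine grid over the day; 24 is left out because it duplicates 0
step   <- 0.05
n_grid <- 480
tod    <- (0:(n_grid - 1)) * step
Xp     <- predict(m, newdata = data.frame(ToD = tod), type = "lpmatrix")

# Draws from the approximate posterior of the coefficients
nsim  <- 2000
betas <- rmvn(nsim, coef(m), vcov(m))         # nsim x p matrix
fits  <- unname(Xp %*% t(betas))              # one fitted curve per column

h <- 3                                        # window in hours (arbitrary choice)
w <- round(h / step)                          # window in grid steps

# Hypothetical helper: peak time, amplitude and slope asymmetry of one curve
curve_summary <- function(f) {
  i_max  <- which.max(f)
  i_min  <- which.min(f)
  before <- (i_max - 1 - w) %% n_grid + 1     # wrap around midnight
  after  <- (i_max - 1 + w) %% n_grid + 1
  slope_before <- (f[i_max] - f[before]) / h  # average slope leading up to the peak
  slope_after  <- (f[after] - f[i_max]) / h   # average slope after the peak
  c(peak_time = tod[i_max],
    amplitude = f[i_max] - f[i_min],
    asymmetry = abs(slope_before) - abs(slope_after))
}

post <- apply(fits, 2, curve_summary)         # 3 x nsim matrix
ci   <- apply(post, 1, quantile, probs = c(0.025, 0.5, 0.975))
print(ci)
```

Each column of `ci` gives a median and a 95% interval for one of the three quantities. Two caveats: the peak time is a circular quantity, so plain quantiles are only sensible when the peak is not close to midnight, and the amplitude of any single draw is non-negative by construction, so its interval is better read as an estimate of size than as a formal test against zero.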

If you provide some example data I could add some code examples in R to show you what I mean/illustrate what I'm getting at with the above.

Gavin Simpson