I have a dependent binary variable Y, and an independent date variable X. I want to find out if there is any seasonality (at the year level).

A few notes:

  1. In my model the binary variable is non-deterministic, but I believe it may have a higher probability of being 1 depending on the point in the season. (This may be obvious, but it feels important to make clear.)

  2. There are usually 0, 1 or very few data points for each date.

  3. There is only about two years' worth of data.

  4. There may be one or more underlying trends that need to be accounted for.

Some example data: [three images in the original post, not reproduced here]

kjetil b halvorsen
cammil
  • I'd use a local polynomial smoother as a function of date. It's clear that your average won't dip much below 1, but you need a smoother to make clear whether there are systematic dips. A coarser method is to average in bins of say 15 or 30 days. Do you expect any influence from day of week? – Nick Cox May 26 '16 at 11:07
  • See also http://stats.stackexchange.com/questions/144745/ – Elvis May 26 '16 at 12:19
  • @NickCox Smoothing makes sense. By polynomial smoothing do you mean a weighted moving average? Or perhaps Savitzky-Golay? I am not familiar with smoothing, so I am not sure whether there are various methods, or whether they have serious benefits or drawbacks for my question. I think the chance of there being an influence from day of week is 0.06. – cammil May 26 '16 at 12:55
  • @NickCox Are you also suggesting that I use regular time series methods after the smoothing? How would I account for the variable number of data points for each X? – cammil May 26 '16 at 12:56
  • There are many, many smoothing methods. You need one that doesn't assume regular spacing. I don't think Savitzky-Golay qualifies as usually implemented. For one explanation of local polynomial see http://www.stats.uwo.ca/faculty/bellhouse/stat%209945b/locpoly.pdf What is variously implemented as lowess, loess or locfit has the same broad flavour. – Nick Cox May 26 '16 at 13:00
  • @NickCox the link you gave in your comment is now inaccessible. Do you have another source? I'd really like to hear more about this. – StatsSorceress May 30 '20 at 00:19
  • https://www.researchgate.net/publication/312888374_Local_polynomial_regression_in_complex_surveys may be the same paper. (I don't think I recorded any details for myself to be sure.) https://www.stata.com/manuals/rlpoly.pdf gives Stata details you can skip if not useful, but also some examples and several references. – Nick Cox May 30 '20 at 09:39
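
The two exploratory ideas from the comments above (a local polynomial smoother that does not assume regular spacing, and coarse averages in bins of 15 or 30 days) can be sketched roughly as follows. This is only an illustration in Python with NumPy, not the commenter's own code: the function names `local_linear_smooth` and `binned_means`, the Gaussian kernel, the 30-day bandwidth and the simulated data are all assumptions made for the example.

```python
import numpy as np

def local_linear_smooth(x, y, grid, bandwidth):
    """Local linear (degree-1 local polynomial) estimate of E[y | x] on `grid`.

    x may be irregularly spaced and may contain repeated values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(grid))
    for i, x0 in enumerate(grid):
        u = (x - x0) / bandwidth
        sw = np.sqrt(np.exp(-0.5 * u**2))  # square roots of Gaussian kernel weights
        design = np.column_stack([np.ones_like(x), x - x0])
        # Weighted least squares around x0; the intercept is the fitted value at x0.
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

def binned_means(x, y, width):
    """Coarser alternative: mean of y in consecutive bins of `width` days."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.floor((x - x.min()) / width).astype(int)
    counts = np.bincount(bins)
    sums = np.bincount(bins, weights=y)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN where a bin has no observations

# Simulated stand-in data (made up): day index over about two years, binary outcome.
rng = np.random.default_rng(0)
days = np.sort(rng.integers(0, 730, size=600)).astype(float)
p_true = 0.8 + 0.15 * np.cos(2 * np.pi * days / 365.25)
y = rng.binomial(1, p_true)

grid = np.arange(0, 730, 10.0)
smooth = local_linear_smooth(days, y, grid, bandwidth=30.0)
monthly = binned_means(days, y, width=30.0)
```

Because the smoother works on the 0/1 values directly, the fitted curve is an estimated proportion and can stray slightly outside [0, 1] near the ends of the data. The bandwidth controls how much is averaged away, so it is worth trying a few values before reading anything into a dip.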

1 Answer

I'd be inclined to fit either a dummy-seasonal or a trigonometric-seasonal set of predictors in a logistic regression model.

In the dummy-seasonal case, for example, you might fit an effect for each month.
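
As a rough illustration of the month-effect idea, here is a minimal Python/NumPy sketch on simulated data (the dates, sample size and probabilities are all made up). It relies on one fact about this particular model: when month indicators are the only predictors, the maximum-likelihood logistic fit reproduces each month's observed proportion, so no iterative fitting is needed. Once a trend or other covariates are added, that shortcut no longer applies and a proper logistic regression routine is required.

```python
import numpy as np

# Simulated stand-in data (made up): about two years of dates, one 0/1 outcome each.
rng = np.random.default_rng(1)
offsets = rng.integers(0, 730, size=800).astype("timedelta64[D]")
dates = np.datetime64("2014-05-01") + offsets

# Calendar month of each observation, 0 = January.
month = dates.astype("datetime64[M]").astype(int) % 12
y = rng.binomial(1, 0.7 + 0.2 * np.cos(2 * np.pi * month / 12))

# One indicator column per month. There is no separate intercept column,
# so the 12 dummies are not collinear with it.
X = (month[:, None] == np.arange(12)).astype(float)

# With only these dummies in the model, the fitted probability for a month
# is that month's observed proportion of ones.
counts = X.sum(axis=0)
prop = (X.T @ y) / counts

# The corresponding coefficients are the log-odds of those proportions.
# If a month were all 0s or all 1s this would be infinite (the MLE would not exist).
log_odds = np.log(prop / (1 - prop))
```

Plotting `prop` against month, with the counts alongside, gives a first look at whether any months stand out before fitting anything more elaborate.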

If you think seasonality will be smooth across a year you might prefer the low-order terms in a trigonometric parameterization.
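
A minimal sketch of the trigonometric version, again in Python with NumPy on simulated data. It uses a single sine/cosine pair with a one-year period plus a linear trend, and compares it with a trend-only model by a likelihood-ratio test. The helper `fit_logistic`, the simulated coefficients and the choice of a linear trend are assumptions for illustration; in practice a library routine (for example a logistic regression in statsmodels) would be the more usual choice.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (coefficients, maximised log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

# Simulated stand-in data (made up): irregular dates over about two years.
rng = np.random.default_rng(2)
days = rng.integers(0, 730, size=1000).astype(float)
t = days / 365.25                            # time in years
true_eta = 0.5 + 0.3 * t + 0.8 * np.cos(2 * np.pi * t)
y = rng.binomial(1, 1 / (1 + np.exp(-true_eta))).astype(float)

ones = np.ones_like(t)
X_trend = np.column_stack([ones, t])
X_season = np.column_stack([ones, t, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

_, ll_trend = fit_logistic(X_trend, y)
beta, ll_season = fit_logistic(X_season, y)

# Likelihood-ratio test for the two seasonal terms. For 2 degrees of freedom
# the chi-square upper tail probability is exactly exp(-x / 2).
lr_stat = 2 * (ll_season - ll_trend)
p_value = np.exp(-lr_stat / 2)

# Size of the seasonal swing on the log-odds scale.
amplitude = np.hypot(beta[2], beta[3])
```

Adding a second pair with period half a year (`sin(4πt)`, `cos(4πt)`) allows a less sinusoidal seasonal shape, at the cost of two more parameters. With only about two years of data, trend and seasonality are hard to separate, so the test should be read with that in mind.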

If you expect serial dependence, you might need to consider some form of time series model (but you have varying gaps between points, so it would not be the usual discrete-time form).

Glen_b