
I am building a linear regression model and my dependent variable is highly seasonal. Now I want to add months as independent variables, as follows (it is incomplete):

[Image: proposed coding of months as independent variables (not recoverable)]

First of all, is this a correct way of adding seasonality? Also, someone said we should not include all twelve months in the model, but he could not remember why. Is that so? Why?

Mahdi
  • Possible duplicate of [Linear regression - date as dummy variable](https://stats.stackexchange.com/questions/425585/linear-regression-date-as-dummy-variable) – mkt Sep 26 '19 at 08:32
  • @mkt How is this a duplicate? Am I missing something? The other one was about coding years as dummy variables; this is about months. That seems very different. And this one is about seasonality. – Peter Flom Sep 26 '19 at 12:21
  • @PeterFlom IMO, the advice in that answer - to not treat time as categorical because it is interval data - addresses the basic question here. Perhaps it's not the ideal duplicate target though. – mkt Sep 26 '19 at 12:27
  • Both answers to date make excellent points. Much depends on what you are modelling, which is not stated. Economic analyses often use indicator variables for months, partly because seasonality can be spiky (e.g. high sales in December, low productivity in August). Climatic analyses often find that sine and cosine pairs work well to mimic variations in temperature, even rainfall, etc. With your data, watch out: there may be 52 weeks in some years and 53 in others, and even Christmas may fall in different weeks in different years. The small print is the exact definition of week. – Nick Cox Sep 26 '19 at 12:58
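On the 52 vs 53 weeks point: assuming ISO 8601 weeks (an assumption; the question does not say how weeks are defined), the number of weeks in a given year can be checked with Python's standard library, because December 28 always falls in the last ISO week of its year. `iso_weeks_in_year` is a hypothetical helper name.

```python
from datetime import date


def iso_weeks_in_year(year: int) -> int:
    """Number of ISO 8601 weeks in `year` (52 or 53).

    Assumes ISO weeks: December 28 always lies in the last ISO week
    of its year, so its week number equals the year's week count.
    """
    return date(year, 12, 28).isocalendar()[1]


for year in range(2015, 2022):
    print(year, iso_weeks_in_year(year))
```

For example, 2015 and 2020 each have 53 ISO weeks, while 2016 to 2019 have 52, so a model with one term per week of the year needs a rule for week 53.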

2 Answers


I would argue that seasonality, in this case, is no longer linear. The effect of summer, for example, is not of the same magnitude in the middle of summer as at the end of it.

However, this is exactly what you assume when coding the months this way. The effect of 'January' is taken to be the same for all weeks in January, including the weeks only partially in January. Additionally, a week split across two months would get the combined effect of both the ending and the beginning month.

Coding the individual months instead of the seasons would actually be better for this part of the problem, since it allows more variability in the effects instead of forcing them into a form that doesn't really suit them.

To really tackle this problem, a non-linear approach might be more appropriate, modelling the seasonal effect with a suitable function, possibly something sine-like.
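One common way to get a sine-like seasonal effect while staying within linear regression is harmonic (Fourier) regression: sine and cosine terms of time are added as regressors, so the fitted seasonal curve is smooth and periodic, yet the model is still linear in its coefficients. This is a minimal sketch under stated assumptions: weekly data indexed 0, 1, 2, …; a period of 365.25 / 7 ≈ 52.18 weeks; two harmonic pairs; synthetic data; and `fourier_terms` is a hypothetical helper, not a library function.

```python
import numpy as np


def fourier_terms(t, period, n_harmonics):
    """Columns sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..n_harmonics."""
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)


# Synthetic weekly data for illustration only (4 years of 52 weeks).
# Assumed period of 365.25 / 7 weeks sidesteps the 52 vs 53 week issue.
rng = np.random.default_rng(1)
t = np.arange(4 * 52)  # week index
period = 365.25 / 7
y = 100 + 15 * np.cos(2 * np.pi * (t - 28) / period) + rng.normal(0, 2, t.size)

# Ordinary least squares on intercept + 2 sine/cosine pairs.
X = np.column_stack([np.ones(t.size), fourier_terms(t, period, n_harmonics=2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# Amplitude of the first harmonic, combining its sine and cosine coefficients.
amplitude = np.hypot(coef[1], coef[2])
print(round(coef[0], 1), round(amplitude, 1))
```

Each sine/cosine pair lets the data choose both the size and the timing of the seasonal peak; adding more pairs allows sharper or asymmetric seasonal shapes at the cost of extra parameters.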

Nick Cox
Mathijs

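First, you can't include all twelve months together with an intercept, because one month has to serve as the reference for the others. Every observation falls in exactly one month, so the twelve indicators always sum to 1, which duplicates the intercept column (perfect multicollinearity, often called the "dummy variable trap"). Dropping one month fixes this, and each remaining month's coefficient then measures its difference from that reference month.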
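A quick numerical check of the reference-month point, using NumPy on hypothetical monthly data (3 years, months coded 1 to 12): with an intercept plus all twelve indicators the design matrix is rank-deficient, so least squares has no unique solution; dropping one month (here January, an arbitrary choice) restores full column rank.

```python
import numpy as np

# Hypothetical example: 3 years of monthly observations, months coded 1..12.
months = np.tile(np.arange(1, 13), 3)

# One indicator column per month (12 columns); each row has exactly one 1.
all_dummies = (months[:, None] == np.arange(1, 13)[None, :]).astype(float)
intercept = np.ones((len(months), 1))

# Intercept + all 12 dummies: the dummies sum to the intercept column,
# so the 13 columns have rank 12.
X_full = np.hstack([intercept, all_dummies])
print(X_full.shape[1], np.linalg.matrix_rank(X_full))

# Drop January (column 0) so it becomes the reference month: full column rank.
X_ref = np.hstack([intercept, all_dummies[:, 1:]])
print(X_ref.shape[1], np.linalg.matrix_rank(X_ref))
```

Which month is dropped does not change the fitted values, only what the coefficients are measured against.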

Second, if you have data more finely divided than months (e.g. weekly or daily), that would be good. If you only have monthly data, you can make do. Is all your data from the same location? If not, things get more complicated.

Let's assume it is all from one place. Then, using dummy variables like this, you can only get a rough idea of seasonality by month, but you can examine the coefficients of the dummy variables to get some idea of its shape. What else might be tried?
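A sketch of examining the dummy coefficients, assuming synthetic monthly data (the seasonal pattern and noise are made up for illustration) and January as the reference month: the intercept estimates the January mean, and each month's coefficient estimates how much that month differs from January, so listing or plotting them gives a rough seasonal profile.

```python
import numpy as np

# Synthetic data for illustration only: 5 years of monthly values with a
# made-up seasonal pattern peaking in July, plus noise.
rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 5)
true_effect = 10 * np.sin(2 * np.pi * (months - 4) / 12)
y = 50 + true_effect + rng.normal(0, 1, size=months.size)

# Design matrix: intercept + indicators for February..December (January is the reference).
dummies = (months[:, None] == np.arange(2, 13)[None, :]).astype(float)
X = np.column_stack([np.ones(months.size), dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# coef[0] estimates the January mean; coef[k] (k = 1..11) estimates how much
# month k + 1 differs from January.
month_effects = np.concatenate([[0.0], coef[1:]])  # January is 0 by construction
for m, eff in zip(range(1, 13), month_effects):
    print(f"month {m:2d}: {eff:+6.2f}")
```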

  1. If "seasonality" is really "temperature", you could try to find the average temperature for the month (or a shorter time period) and use that. That would make it continuous (and July in one year would not be identical to July in another year). You could use a spline of temperature or, if you have a more specific idea of the effect, implement that. But splines are pretty flexible.

  2. You could code month as numeric and then graph the results (maybe a different line for each year). That might give you ideas about the right model: maybe some trig function of month (probably transformed somehow), or maybe a spline.
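For option 1, a minimal cubic regression spline of temperature can be built from a truncated power basis and fit by ordinary least squares. This is a sketch under assumptions: the temperatures and response are synthetic, knots at the quartiles are an arbitrary choice, and `cubic_spline_basis` is a hypothetical helper. In practice a library's B-spline or natural-spline basis is numerically better behaved.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic regression spline:
    1, x, x^2, x^3, and (x - k)^3 for x > k (else 0) for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


# Synthetic data for illustration only: a hypothetical monthly average
# temperature (deg C) and a response that depends on it nonlinearly.
rng = np.random.default_rng(2)
temp = rng.uniform(-5, 30, size=120)
y = 20 + 0.05 * (temp - 12) ** 2 + rng.normal(0, 1, size=temp.size)

# Knots at the temperature quartiles (a common default, not a rule).
knots = np.quantile(temp, [0.25, 0.5, 0.75])
X = cubic_spline_basis(temp, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicted response over a grid of temperatures, e.g. for plotting.
grid = np.linspace(-5, 30, 50)
pred = cubic_spline_basis(grid, knots) @ coef
```

Because the spline is just a set of extra columns, it can sit in the same regression alongside trend or other covariates.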

There are probably other good approaches, too.

Peter Flom