
I apologise if this has been asked before. If so, please point me in the right direction. I have looked, but cannot find an appropriate answer.

I am attempting to fit cumulative logit models using the ordinal (clm) and VGAM (vglm) R packages. I have continuous and discrete explanatory variables. Both discrete variables are the number of days it rained in a month, so each takes a single integer value between 0 and 31. As the distance between adjacent values is equal regardless of level (0|1 = 15|16 = 30|31), are dummy variables required?

If so, I understand that some R regression packages automatically dummy code factors. Is this the case with the clm and vglm functions?

Finally, how would you recommend selecting the reference level for these dummy variables?

  • Here is a similar question asking about the use of dates in a regression model: https://stats.stackexchange.com/questions/65900/does-it-make-sense-to-use-a-date-variable-in-a-regression – Matt L. Aug 14 '18 at 19:06
  • Possible duplicate of [Does it make sense to use a date variable in a regression?](https://stats.stackexchange.com/questions/65900/does-it-make-sense-to-use-a-date-variable-in-a-regression) – Matt L. Aug 14 '18 at 19:07
  • I do not think this is a duplicate of that one, as the OP here has a number of days, not a date strictly speaking @MattL. – mdewey Aug 15 '18 at 12:20
  • See also https://stats.stackexchange.com/questions/332688/what-type-of-data-are-dates/332715#332715 – kjetil b halvorsen Aug 15 '18 at 12:32

1 Answer


If you are prepared to assume that, scientifically, the difference between 1 day and 2 days is the same as that between 20 days and 21 days, then you could enter this as a continuous variable, either as a linear term or as something more complicated.

Since it is a continuous variable, there is, strictly speaking, no reference category, although the intercepts (thresholds) in your model will correspond to all of your covariates having the value zero.
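As a rough illustration of the difference between the two codings, here is a minimal base R sketch. The data frame `d`, its columns `rain_days` and `month`, and the response name `rating` in the commented calls are all hypothetical and invented for the example. It uses `model.matrix()` to show how a formula term is coded: a numeric column stays a single column, a factor is expanded into treatment-coded dummies, and `relevel()` changes which level is the reference. My understanding is that `clm` and `vglm` build their design matrices from the formula in the same way, but treat that as an assumption to check against your own fits.

```r
# Hypothetical toy data: names and values are made up for illustration only.
d <- data.frame(
  rain_days = c(0L, 3L, 12L, 20L, 31L, 7L),   # integer count between 0 and 31
  month     = factor(c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
                     levels = c("Jan", "Feb", "Mar"))
)

# Numeric predictor: a single column, hence one slope and no reference category.
X_num <- model.matrix(~ rain_days, data = d)
colnames(X_num)

# Factor predictor: R creates treatment-coded dummies automatically and
# drops the first level ("Jan"), which becomes the reference.
X_fac <- model.matrix(~ month, data = d)
colnames(X_fac)

# The reference level is changed on the factor itself, before fitting.
d$month <- relevel(d$month, ref = "Mar")
X_ref <- model.matrix(~ month, data = d)
colnames(X_ref)

# With the packages installed and an ordered factor response (here the
# hypothetical `rating`), the same terms would go into calls such as:
#   ordinal::clm(rating ~ rain_days + month, data = d)
#   VGAM::vglm(rating ~ rain_days + month,
#              family = VGAM::cumulative(parallel = TRUE), data = d)
```

Which level to use as the reference does not change the fitted model, only which comparisons the coefficients report, so a level that gives a natural baseline is usually a reasonable choice.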

mdewey
  • I ended up standardising the frequency measure, producing the proportion of days it rained per month. However, I still had the option of treating month as continuous or categorical. I am including month in the model to try to explain any seasonal effects that are not explained by the other covariates. As this is unlikely to be linear, I have dummy coded it. However, if other covariates show a non-linear effect, I'll likely run a nonparametric ordinal regression model. Thank you for your help! – AwkwardAttempts Aug 16 '18 at 07:27