I am trying to fit a mixed model to estimate the effect of X on Y while accounting for non-independence in my data. The non-independence arises from one variable, ProductID. My dataset has close to 10,000 observations and 5,500 unique ProductID values. Is such a low average number of observations per ProductID (fewer than 2) a conceptual problem for fitting the mixed model below in R?

lmer(Y ~ X + (1 | ProductID), data = myData)

If it is a conceptual problem, should I use a different grouping variable, such as ProductCategoryID (100 unique values), in place of ProductID to account for non-independence?

SanMelkote

1 Answer

lmer should have no problem fitting a model with an average of 1.8 observations per cluster. In general, problems arise when there are too few groups, not too many. The minimum sample size per cluster is 1, as long as some clusters contain more than one observation, so that the between-product variance can be separated from the residual variance. Changing the grouping level as you suggest would discard information and should be avoided if possible. Instead, if there is also variation at the ProductCategoryID level, it may make sense to specify nested random effects:

lmer(Y ~ X + (1 | ProductCategoryID/ProductID), data = myData)
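
To check this on data shaped like yours, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the sample sizes simply mimic the numbers in the question, and the standard deviations (0.5 for category, 1 for product, 1 for residual), the intercept of 2 and the slope of 0.3 are arbitrary. It is not a claim about your real data, only a way to see that both models can be fitted with roughly 1.8 observations per product.

```r
library(lme4)

set.seed(1)

# Hypothetical sizes chosen to mimic the question
n_cat  <- 100    # product categories
n_prod <- 5500   # products
n_obs  <- 10000  # observations

# Assign each product to a category; the first 100 products guarantee
# that every category is used at least once
prod_cat <- c(seq_len(n_cat), sample(n_cat, n_prod - n_cat, replace = TRUE))

# Every product gets one observation; the remaining rows are spread at random,
# so many products end up with a single observation
prod_id <- c(seq_len(n_prod), sample(n_prod, n_obs - n_prod, replace = TRUE))

# Simulated random effects (assumed SDs: category 0.5, product 1)
u_cat  <- rnorm(n_cat,  sd = 0.5)
u_prod <- rnorm(n_prod, sd = 1)

X <- rnorm(n_obs)
Y <- 2 + 0.3 * X + u_cat[prod_cat[prod_id]] + u_prod[prod_id] + rnorm(n_obs)

myData <- data.frame(
  Y = Y,
  X = X,
  ProductID = factor(prod_id),
  ProductCategoryID = factor(prod_cat[prod_id])
)

# Average cluster size: 10000 / 5500, about 1.8 as in the question
mean(table(myData$ProductID))

# Random intercept for product only
fit_prod <- lmer(Y ~ X + (1 | ProductID), data = myData)

# Products nested within categories
fit_nested <- lmer(Y ~ X + (1 | ProductCategoryID/ProductID), data = myData)

fixef(fit_nested)    # fixed-effect estimates
VarCorr(fit_nested)  # estimated variance components

# Likelihood ratio comparison of the two models; anova() refits with ML by default
anova(fit_prod, fit_nested)
```

The final comparison gives a rough indication of whether the category-level variance is worth keeping. Because a variance of zero lies on the boundary of the parameter space, the p-value from this test tends to be conservative, so treat it as a guide and not a strict decision rule.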
Robert Long