Build a single lm() model from hundreds of stores in R?

Question

I’ve built many lm() models in R, but this is a new challenge.

I have 100s of independent stores as objects. Each store has a simple two-column time series (one X and one Y). Very simple!

I want to build a single linear model that is based on these independent stores and their corresponding time series.

Is this mixed modeling? If I do a loop I’ll just get 100s of different models, one for each store. But I want 1 single model that is fit to all of them.

Yes, this can be tackled by mixed-effects modelling. You didn't state your dependent or independent variables, but basically your hundreds of stores would simply be a group variable for your observations, and then, in simplified terms, you estimate the fixed effect (the overall average effect) and the random effects (deviations in effects due to sampling of a population). — Firebug, Jan 24 '18 at 18:37

score 1 · Answer 1 · edited Jun 11 '20 at 14:32

I want to build a single linear model that is based on these independent stores and their corresponding time series.

Is this mixed modeling? If I do a loop I’ll just get 100s of different models, one for each store. But I want 1 single model that is fit to all of them.

Mixed-effects modeling is one of the modeling approaches you can employ, and it might lend itself better to the data at hand than other paradigms, but in no way can we simply say "this data at hand requires mixed-effects modeling".

Having said that, yes, this paradigm you presented can be tackled by mixed-effects modelling. You didn't state your dependent or independent variables, but basically your hundreds of stores would simply be a group variable for your observations, and then, in simplified terms, you estimate the fixed effect (the overall average effect) and the random effects (deviations in effects due to sampling of a population).
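To make the idea concrete, here is a minimal sketch of a random-intercept model fitted with lme() from the nlme package. Everything in it is an assumption for illustration, since the question does not show the real data: the data are simulated, and the names stores, store, X and Y as well as the numbers of stores and observations are made up.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)
n_stores <- 100   # hypothetical number of stores
n_obs    <- 20    # hypothetical number of observations per store

# Simulated long-format data: one row per observation, plus a store identifier.
stores <- data.frame(
  store = factor(rep(seq_len(n_stores), each = n_obs)),
  X     = runif(n_stores * n_obs)
)

# Each store gets its own intercept shift around a common line Y = 2 + 3 * X.
store_shift <- rnorm(n_stores, sd = 1)
stores$Y <- 2 + 3 * stores$X + store_shift[as.integer(stores$store)] +
  rnorm(nrow(stores), sd = 0.5)

# Random intercept per store; the fixed effects describe the overall line.
fit_mixed <- lme(Y ~ X, random = ~ 1 | store, data = stores)

fixef(fit_mixed)         # overall intercept and slope
head(ranef(fit_mixed))   # estimated per-store deviations from the overall intercept
```

This gives one model for all stores: the fixed effects are the "single" line, and the random effects absorb store-to-store differences. If the slope may also vary by store, random = ~ 1 + X | store is the usual extension.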

Note that this paradigm assumes you are not interested in the effect of each individual store; the stores are treated as a sample from a larger population. If you are interested in modeling the specific stores at hand, and perhaps comparing their coefficients, then their effects should be treated as fixed effects.
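If the per-store coefficients are themselves of interest, one way to treat them as fixed effects is an interaction between X and a store factor in a single lm() call. A minimal sketch on simulated data; the data frame, the column names and the five stores A to E are assumptions for illustration:

```r
set.seed(2)

# Hypothetical data: 5 stores, 30 observations each, with store-specific slopes.
stores <- data.frame(
  store = factor(rep(LETTERS[1:5], each = 30)),
  X     = runif(150)
)
true_slope <- c(1, 2, 3, 4, 5)
stores$Y <- true_slope[as.integer(stores$store)] * stores$X +
  rnorm(150, sd = 0.1)

# X * store expands to X + store + X:store,
# i.e. a separate intercept and slope for every store.
fit_fixed <- lm(Y ~ X * store, data = stores)

coef(fit_fixed)    # "X" is the slope for store A; "X:storeB" etc. are differences from it
anova(fit_fixed)   # the X:store row tests whether slopes differ between stores
```

With hundreds of stores this estimates hundreds of coefficients, which is part of why the random-effects view is attractive when the individual stores are not the target.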

You can read more about mixed-effects modeling on this site by browsing the mixed-model tag, and I recommend you do exactly that, because applying it without understanding what is happening can open another can of worms.

Also, if you want to build on a correlation structure directly instead of specifying the random effects, read up on Generalized Least Squares (here's a question on SO comparing gls and lme).
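As a rough sketch of that alternative, the code below fits gls() with a compound-symmetry correlation within each store and compares it with a random-intercept lme(); both functions come from nlme. The simulated data and the names stores, store, time, X and Y are assumptions, not something taken from the question.

```r
library(nlme)

set.seed(3)
n_stores <- 50   # hypothetical
n_obs    <- 12   # hypothetical

stores <- data.frame(
  store = factor(rep(seq_len(n_stores), each = n_obs)),
  time  = rep(seq_len(n_obs), times = n_stores),
  X     = runif(n_stores * n_obs)
)
stores$Y <- 1 + 2 * stores$X +
  rnorm(n_stores)[as.integer(stores$store)] +   # shared store-level shift
  rnorm(nrow(stores), sd = 0.5)                 # observation-level noise

# Compound symmetry: a constant correlation between observations of the same store.
fit_gls <- gls(Y ~ X, data = stores,
               correlation = corCompSymm(form = ~ 1 | store))

# Random-intercept model; it implies the same kind of within-store correlation here.
fit_lme <- lme(Y ~ X, random = ~ 1 | store, data = stores)

coef(fit_gls)    # GLS estimates of intercept and slope
fixef(fit_lme)   # fixed effects of the mixed model, expected to be very close
```

Since each store is a time series, a serial structure such as corAR1(form = ~ time | store) may be a more natural choice than compound symmetry; which one fits better depends on the actual data.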

score -3 · Answer 2 · answered Jan 24 '18 at 17:45

I guess you want to gather all the data in one data frame and then fit your model. If so, it is simple:

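# Two example data frames standing in for two stores
df1 <- data.frame(X = runif(50), Y = runif(50))
df2 <- data.frame(X = rnorm(50), Y = rnorm(50))
# Stack them and fit one pooled model
df <- rbind(df1, df2)
lmModel <- lm(Y ~ X, data = df)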

and you can generalize this to as many data frames as you have.
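For many stores, one possible way to do the stacking is sketched below. It assumes the per-store objects can be collected into a named list of data frames; store_list and the store names are hypothetical. It also keeps a store column, so the grouping is still available to whatever model is fitted afterwards.

```r
set.seed(4)

# Hypothetical stand-in for the per-store objects: a named list of data frames.
store_list <- lapply(1:3, function(i) data.frame(X = runif(10), Y = runif(10)))
names(store_list) <- paste0("store_", 1:3)

# Tag each data frame with its store name, then stack all of them.
tagged <- Map(function(d, id) {
  d$store <- id
  d
}, store_list, names(store_list))

stacked <- do.call(rbind, tagged)
rownames(stacked) <- NULL

str(stacked)   # 30 rows with columns X, Y and store
```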


This is incorrect, you're breaking the assumption of independence between observations. – Firebug Jan 24 '18 at 18:38
the person who asked did not state such an assumption – Jan 24 '18 at 18:51
Yeah, because it's not his assumption. It's a basic assumption of linear modelling as per the `lm` function. – Firebug Jan 24 '18 at 18:53
whether observations from different stores are independent is basic neither for the lm function nor for linear modelling in general – Jan 24 '18 at 18:57

The problem lies not in observations from different stores, but in observations from the same store. They are obviously correlated (unless the null is true, and then that defeats your whole modelling approach). Your approach is also at risk of Simpson's Paradox. – Firebug Jan 24 '18 at 19:02
And `lm` is based around maximum likelihood estimation, which reduces to OLS in the case of basic linear models, which explicitly assumes the correlation of residuals is the identity matrix. – Firebug Jan 24 '18 at 19:03
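
The Simpson's Paradox risk raised in these comments can be illustrated with a small simulation. This is only a sketch on made-up data, constructed so that Y falls with X inside every store while stores with larger X also have a much larger Y overall; all names and numbers are assumptions.

```r
set.seed(5)
n_stores <- 20   # hypothetical
n_obs    <- 25   # hypothetical

store       <- factor(rep(seq_len(n_stores), each = n_obs))
store_level <- rep(seq_len(n_stores), each = n_obs)   # store-level offset 1..20

# X is centred on the store's level.
X <- store_level + rnorm(n_stores * n_obs, sd = 0.5)

# Within a store Y decreases with X (slope -1),
# but stores with a higher level have a much higher Y overall.
Y <- 3 * store_level - 1 * (X - store_level) +
  rnorm(n_stores * n_obs, sd = 0.3)

d <- data.frame(store, X, Y)

# Pooled fit ignores the stores; the second fit adjusts for them.
pooled_slope <- coef(lm(Y ~ X, data = d))[["X"]]
within_slope <- coef(lm(Y ~ X + store, data = d))[["X"]]

pooled_slope   # positive: driven by differences between stores
within_slope   # negative: the relationship inside each store
```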
