
Edit: I know some people voted to close this question as off-topic because it reads more like a Cross Validated question. However, I am not asking about the coding itself (though I may have worded it that way). I include a piece of my code only in case the mistake is in my code or in how I specified the model. The question is really about how to use or adjust Bayesian model selection via BIC in the mixed-effects setting. We know that BIC is not normally used in this situation because of its poor and slow performance; that is exactly why we want to investigate and improve it. But before making any improvement, we first want to reproduce the "poor" performance.

Suppose we have the linear mixed-effects model (multiple measurements $j$ per subject $i$) $$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\begin{pmatrix} 1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \beta_{i1}\\\beta_{i2}\\\beta_{i3}\\ \beta_{i4}\end{pmatrix}+ \epsilon_{ij}$$ where $\alpha=\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}$ are the fixed effects and $\beta_i=\begin{pmatrix} \beta_{i1}\\\beta_{i2}\\\beta_{i3}\\ \beta_{i4}\end{pmatrix} \sim N({\bf 0}, {\bf D})$ are the random effects, with ${\bf D}_{44}=0$, i.e. we force $\beta_{i4}=0$, and $\epsilon_{ij} \stackrel{iid}{\sim}N(0,\sigma^2)$. We want to compare the following models:

The full model $$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\begin{pmatrix} 1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \beta_{i1}\\\beta_{i2}\\\beta_{i3}\\ \beta_{i4}\end{pmatrix}+ \epsilon_{ij}$$

And the reduced model ($\bf \text{which is the true model!}$) $$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\begin{pmatrix} 1 & x_{ij2} & x_{ij3} \end{pmatrix}\begin{pmatrix} \beta_{i1}\\\beta_{i2}\\\beta_{i3}\end{pmatrix}+ \epsilon_{ij}$$ And even simpler models: $$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\begin{pmatrix} 1 & x_{ij2} \end{pmatrix}\begin{pmatrix} \beta_{i1}\\\beta_{i2}\end{pmatrix}+ \epsilon_{ij}$$ and

$$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\begin{pmatrix} 1 \end{pmatrix}\begin{pmatrix} \beta_{i1}\end{pmatrix}+ \epsilon_{ij}$$ However, when I use `lmer` in R to fit these models, BIC always selects the simplest model as the correct one, which is not true.

Here is the code for fitting the models with `lmer`, where `y` contains the observations for all subjects and `X` is the design matrix whose first column is all 1s. Did I specify my models correctly in `lmer`?

    # Full model
    lmer(y ~ X - 1 + (0 + X | subject))
    # Reduced model 1
    lmer(y ~ X - 1 + (0 + X[, 1:3] | subject))
    # Reduced model 2
    lmer(y ~ X - 1 + (0 + X[, 1:2] | subject))
    # Reduced model 3
    lmer(y ~ X - 1 + (0 + X[, 1] | subject))
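For reference, the criterion compared above is $\text{BIC} = -2\log\hat{L} + k\log n$. One known subtlety in mixed models is that the effective sample size $n$ in the penalty is ambiguous: it could plausibly be the total number of observations or the number of subjects, and the choice changes the penalty. A minimal sketch of the computation (Python only for illustration; the log-likelihood and parameter count below are hypothetical placeholders, not values from my fits):

```python
import math

def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 * logLik + k * log(n)."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical values: 4 fixed effects + 10 free entries of D + sigma^2 = 15 parameters.
ll, k = -2500.0, 15
print(bic(ll, k, n=1600))  # penalty based on total observations (200 subjects x 8)
print(bic(ll, k, n=200))   # penalty based on number of subjects
```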

Many thanks!

Edit: In particular, we consider the case of 200 subjects with 8 observations each. The covariates $x_{ij}=(x_{ij1},...,x_{ij4})^T$ are simulated by fixing $x_{ij1}=1$ and then generating $x_{ijk} \sim \text{Uniform}(-2,2)$ for $k=2,3,4.$ So $X_i$ looks like $$X_i=\begin{pmatrix}1 & x_{i12} & x_{i13} & x_{i14}\\ 1 & x_{i22} & x_{i23} & x_{i24}\\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{i82} & x_{i83} & x_{i84} \end{pmatrix}_{8 \times 4}$$

And we choose $\alpha=(1,1,1,1)^T$ and $\sigma^2=1$ in the model, with $\beta_i=(\beta_{i1},...,\beta_{i4})^T \sim N(0, D)$, where $$D=\begin{pmatrix} 9 & 4.8 & 0.6 & 0 \\ 4.8 & 4 & 1 & 0 \\ 0.6 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}_{4 \times 4}$$ Note that by forcing $D_{44}=0$ and $\mathbb E(\beta_i)={\bf 0}$, we actually force $\beta_{i4}=0$.
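The data-generating process described above can be sketched as follows (a NumPy sketch for concreteness; Python is used here only to show the simulation, while the fitting itself would still be done with `lmer` in R):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n_obs, p = 200, 8, 4        # subjects, observations per subject, covariates
alpha = np.ones(p)             # fixed effects (1, 1, 1, 1)^T
sigma2 = 1.0
D = np.array([[9.0, 4.8, 0.6, 0.0],
              [4.8, 4.0, 1.0, 0.0],
              [0.6, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])

# Design: intercept column plus Uniform(-2, 2) covariates.
X = np.empty((m, n_obs, p))
X[:, :, 0] = 1.0
X[:, :, 1:] = rng.uniform(-2.0, 2.0, (m, n_obs, p - 1))

# Since D_44 = 0 forces beta_i4 = 0, draw only the first three components.
beta = np.zeros((m, p))
beta[:, :3] = rng.multivariate_normal(np.zeros(3), D[:3, :3], size=m)

eps = rng.normal(0.0, np.sqrt(sigma2), (m, n_obs))
y = np.einsum('ijk,ik->ij', X, alpha + beta) + eps  # y_ij = x_ij^T (alpha + beta_i) + eps_ij
```

Flattening `y` and `X` into 1600 rows, with a `subject` factor of the same length, gives the inputs used in the `lmer` calls above.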

Nan
  • Could you describe your data as well? Also, you could simulate data in order to see what the expected AIC or BIC scores are when the true model is correct. The AIC and BIC scores might turn out to be unlucky for your particular data/sample, but also, in general, they are only estimates and are only valid asymptotically (only for large sample sizes is the likelihood ratio close to a chi-squared distributed variable). – Sextus Empiricus Feb 09 '19 at 22:43
  • @MartijnWeterings Hi, thanks for replying. I realized that adding the random effects actually increases the residual, but I want to compare models that all have random effects. Sorry, I might have modeled it wrong. If that is not the case, is there any other way to fit these models? – Nan Feb 09 '19 at 23:16
  • I was a bit off. The residual, $x \beta + \epsilon$, indeed increases but the error/residual term that is actually used in [the likelihood function](https://stackoverflow.com/questions/20980116), $\epsilon$, decreases. So adding those extra terms does increase the log-likelihood after all. – Sextus Empiricus Feb 10 '19 at 00:27
  • There is however an additional term in the likelihood function for the additional parameters $\beta$, and this changes the entire likelihood function making it not a comparison of nested models (and making a comparison by BIC or AIC wrong). Furthermore, there is an [issue with the degrees of freedom](https://stats.stackexchange.com/questions/146988). – Sextus Empiricus Feb 10 '19 at 00:29
  • An example that demonstrates the problem of the changing likelihood function and the models not being nested is when you change the variances a bit by multiplying $\beta$ by 100 and dividing $\epsilon$ by 100. Then the likelihood will decrease for additional parameters (so the criteria will increase even without considering a penalty term). – Sextus Empiricus Feb 10 '19 at 00:45
  • @MartijnWeterings Thanks for your suggestions. But I think we cannot generate the 1600 observations, `y`, one by one as shown in your code, since within one subject, say $i$, the 8 observations $y_{ij}$ are not independent of each other. And if you generate multivariate normal $y_i \sim N(X_i\alpha, X_iDX_i^T+\sigma^2I_8)$ and then run `lmer`, the results change dramatically. – Nan Feb 10 '19 at 05:16
  • I computed it as $y_{i} \sim N(X_{i} (\alpha+\beta_i), \sigma^2 I_8)$; this makes the observations not independent because of the shared mean term $\beta_i$. You could indeed translate this into some correlation structure and have the conditionality on $X$ the same for all subjects $i$. – Sextus Empiricus Feb 10 '19 at 10:33
  • Is your question actually about BIC specifically (and why it turns out like this in this case), or is it about comparing models more generally? In the latter case, what is it that you are trying to do? Why are you comparing the models with different random errors? If you have some good reasons to do this, then I guess you might develop some cross-validation method (instead of using the BIC) that satisfies your needs. – Sextus Empiricus Feb 10 '19 at 10:43
  • @MartijnWeterings Thanks for your comments. I was actually repeating the simulation study from a paper I was reading. In that paper they are using BIC to compare the model with different random errors. – Nan Feb 10 '19 at 16:39
  • @MartijnWeterings Sorry, I might have gotten it wrong, but I still think that your $y_{ij}$ are independent since the off-diagonal part of the covariance matrix is zero. – Nan Feb 10 '19 at 16:47
  • the $y_{ij}$ dependency stems from using the same $\beta_i$ for different observations $j$. It is like your formula:$$y_{ij}=\begin{pmatrix}1 & x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1 \\\alpha_2\\\alpha_3\\\alpha_4\end{pmatrix}+\begin{pmatrix} 1& x_{ij2} & x_{ij3} & x_{ij4} \end{pmatrix}\begin{pmatrix}\beta_{i1}\\\beta_{i2}\\\beta_{i3}\\\beta_{i4}\end{pmatrix}+\epsilon_{ij}$$ what you did is convert this into a $$y_{ij}=\begin{pmatrix}1 &x_{ij2} &x_{ij3} &x_{ij4} \end{pmatrix}\begin{pmatrix} \alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4\end{pmatrix}+\hat\epsilon_{ij}$$ – Sextus Empiricus Feb 10 '19 at 17:08
  • On a different note: your question has been placed on hold for being off-topic. I do not believe that it is about `lmer` or programming specifically; it is much more about the statistics behind the code, so it should be left open. However, your exact problem is a bit unclear. Is it about the use of BIC for this specific case? Or is there something more general? Stating the underlying problem or motivation might make it clearer. – Sextus Empiricus Feb 10 '19 at 17:17
  • @MartijnWeterings Thanks for the comments. But I am not quite sure we can convert your model to my model. According to the paper I was reading and the wiki page about mixed-effects models, https://en.wikipedia.org/wiki/Mixed_model, the random effects do not change the mean of the observations but rather the covariance of the errors. I will go and check more resources, but I deeply appreciate your comments. The only thing that confused me about your model is that I did not see how, within one subject, the observations $y_{ij}$ are related. – Nan Feb 10 '19 at 18:26
  • https://stats.stackexchange.com/questions/325306/intuition-about-parameter-estimation-in-mixed-models-variance-parameters-vs-co – Sextus Empiricus Feb 10 '19 at 19:37
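The point debated in the comments, whether drawing one shared $\beta_i$ per subject and then i.i.d. errors really induces within-subject dependence, can be checked numerically: conditional on $X_i$, the two views agree because $\text{Cov}(y_i \mid X_i) = X_i D X_i^T + \sigma^2 I$. A Python sketch under the simulation settings from the question (three random-effect components, so $D$ here is the upper-left $3 \times 3$ block; this is my illustration, not part of the original exchange):

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs = 4                        # observations per subject (small, for a quick check)
sigma2 = 1.0
D = np.array([[9.0, 4.8, 0.6],
              [4.8, 4.0, 1.0],
              [0.6, 1.0, 1.0]])  # covariance of (beta_i1, beta_i2, beta_i3)
alpha = np.ones(3)

# Fix one design matrix X_i and simulate many subjects that share it.
Xi = np.column_stack([np.ones(n_obs), rng.uniform(-2.0, 2.0, (n_obs, 2))])

reps = 200_000
beta = rng.multivariate_normal(np.zeros(3), D, size=reps)  # beta_i ~ N(0, D), one per subject
eps = rng.normal(0.0, np.sqrt(sigma2), (reps, n_obs))      # i.i.d. errors
y = (alpha + beta) @ Xi.T + eps                            # y_i = X_i (alpha + beta_i) + eps_i

emp_cov = np.cov(y, rowvar=False)                 # empirical Cov(y_i | X_i)
theo_cov = Xi @ D @ Xi.T + sigma2 * np.eye(n_obs) # marginal covariance
print(np.max(np.abs(emp_cov - theo_cov)))
```

The empirical off-diagonal entries are clearly nonzero and match $X_i D X_i^T$: the shared $\beta_i$ in the mean is exactly what produces the within-subject correlation in the marginal covariance.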

0 Answers