
I want to compare two generalized linear mixed models (GLMMs), model A and model B, which differ in their link function. More specifically, I want to test whether these models are equivalent or whether model A is better than model B.
The dependent variable is a continuous, always positive variable. Model A is the linear mixed model and model B is the GLMM with the log link function. I want to show that using the additive model or the multiplicative model is equivalent$^*$. I have no prior assumption about whether the process is additive or multiplicative.
I chose the mean squared error (MSE, i.e. $\displaystyle \frac{1}{N}\sum^N_{i=1} \left(\widehat{y}_i - y_i\right)^2$) as the criterion for comparing the two models.

Thus, the hypotheses are $H_0 : MSE_A = MSE_B $ and $H_1 : MSE_A < MSE_B$

The test statistic is $T = MSE_A - MSE_B $

As the distribution of $T$ under the null hypothesis is unknown, I thought to:

  1. create $P$ datasets in which the predictions of the two models are randomly permuted (i.e., for each $i$, $\widehat{y}_{A,i}$ is randomly assigned to model A or model B and $\widehat{y}_{B,i}$ to the other one)
  2. for each permuted dataset $p$, compute the statistic $T^*_p$
  3. compare the observed value of $T$ to the distribution of $T^*$ to conclude the test (a sketch of this procedure is given after the list)
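For illustration, here is a minimal sketch of this permutation procedure in R. It assumes vectors `y` (observed responses), `pred_A` and `pred_B` (the fitted values of models A and B for the same observations); these names are placeholders, not objects from my actual analysis.

```r
## Minimal sketch of the proposed permutation test (placeholder object names).
## `y`, `pred_A`, `pred_B` are assumed to be numeric vectors of equal length.
set.seed(1)
P <- 5000                                    # number of permutations

sq_err_A <- (pred_A - y)^2                   # per-observation squared errors, model A
sq_err_B <- (pred_B - y)^2                   # per-observation squared errors, model B
T_obs <- mean(sq_err_A) - mean(sq_err_B)     # observed T = MSE_A - MSE_B

T_star <- replicate(P, {
  swap <- runif(length(y)) < 0.5             # randomly relabel each prediction pair
  mean(ifelse(swap, sq_err_B, sq_err_A)) -   # "MSE_A" after permutation
    mean(ifelse(swap, sq_err_A, sq_err_B))   # "MSE_B" after permutation
})

## one-sided p-value for H1: MSE_A < MSE_B (small T supports H1)
p_value <- mean(T_star <= T_obs)
p_value
```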

Is this a valid procedure? If not, do you have any suggestions on how to implement this test?

$^*$ My boss wants me to use the additive model. I want to show him (or not) that using the multiplicative model is equivalent in terms of model performance. Then, if the two models are equivalent, it would be more appropriate to use the GLMM with the log link function because the dependent variable is always positive.

Marion H.
  • How big is your data set? The t-test is for small samples. If it is big enough, you could use the bootstrap: create a large number $P$ of samples (with replacement) and compare the two distributions of MSE. – YCR Feb 17 '16 at 15:51
  • Thanks for answering. I used generalized linear mixed models; my dataset has 6 repeated measures for 32 observations. Why would the permutation test only be for small samples? As I don't know the distribution of $T$ under the null hypothesis, I have to estimate it. Bootstrapping will not let me estimate the null distribution, or I have missed something. – Marion H. Feb 17 '16 at 17:39
  • Can you explain why you are interested in this test? From a model selection standpoint, you would simply choose the model with lower $MSE$ (regardless of any hypothesis test). In absence of a priori knowledge, it is the best decision you could make using $MSE$ as your choice criterion. If you are interested in rejecting model $A$ as the "true" model in favor of model $B$, then $MSE$ is not the appropriate statistic to use (a likelihood based or even a Bayesian approach would be more fitting). – Zachary Blumenfeld Feb 22 '16 at 06:13
  • @ZacharyBlumenfeld I just edited the post. Hope it will be helpful. Thanks for answering. – Marion H. Feb 22 '16 at 09:44
  • 1
    You might want to consider the answers to [this question](http://stats.stackexchange.com/questions/48714/prerequisites-for-aic-model-comparison/100671#100671) – probabilityislogic Feb 24 '16 at 09:14

1 Answer


If you just want to compare the link functions and find which one is "best", I think you can use a likelihood-ratio test, where you compare the likelihood ratio with a quantile of a $\chi^2$ distribution (Wilks' theorem). If the test rejects $H_0$: "the likelihoods are equal", it means that one of the models fits the data significantly better, so you can select the one with the higher likelihood. For GLMMs, there are numerical methods for evaluating the likelihood; see for instance http://people.math.aau.dk/~rw/Undervisning/Topics/Handouts/6.hand.pdf or the R package "glmm".
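As a rough illustration (not part of the original answer), the two models and their likelihoods could be obtained in R with lme4. The variable names (`y`, `x`, `subject`, data frame `d`) and the Gamma family for the log-link model are assumptions made for the sketch, since the original post does not specify them.

```r
## Hedged sketch: fitting both models and computing the likelihood-ratio
## statistic; object and variable names are illustrative assumptions.
library(lme4)

# Model A: linear mixed model (identity link); ML fit so logLik values are comparable
fit_A <- lmer(y ~ x + (1 | subject), data = d, REML = FALSE)

# Model B: GLMM with a log link (a Gamma family is assumed here for the
# positive continuous response; the original post does not name the family)
fit_B <- glmer(y ~ x + (1 | subject), data = d, family = Gamma(link = "log"))

logLik(fit_A)
logLik(fit_B)

# Likelihood-ratio statistic; for nested models it would be compared to a
# chi-squared quantile with df equal to the difference in parameter counts
lr <- as.numeric(2 * (logLik(fit_B) - logLik(fit_A)))
lr
```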

If the variables of your models are not the same, it seems to me that it is more appropriate to use a criterion such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). These criteria let you select the model with the best fit to the data while avoiding overfitting. To compute them, you also have to compute the likelihood of each model; you then choose the model with the smaller criterion value.
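Continuing the sketch above (with `fit_A` and `fit_B` as the two fitted model objects), the criteria can be obtained directly:

```r
## Information-criterion comparison; the model with the smaller value is preferred
AIC(fit_A, fit_B)
BIC(fit_A, fit_B)
```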

Jacky1
  • 2
    For GLMMs with a small dataset, a simulated LRT (e.g. using simulate() in lme4) may be the safer choice than a parametric test. I don't see the point of AIC / BIC - if models differ only in their link functions, overfitting won't be addressed anyway. – Florian Hartig Feb 21 '16 at 19:02
  • 1
    Also, be careful using information criteria to compare models where the likelihood (and consequently the information criteria) is reported for a transformation of the data rather than the raw data (I don't know if this is the case here but I am a little suspicious due to the different link functions). – Richard Hardy Feb 21 '16 at 21:20
  • I agree with you; it would be good to have a little more information on the models to compare. – Jacky1 Feb 22 '16 at 09:22
  • @Jacky1 I edited my post. Model A is the linear mixed model and model B is the GLMM with the log link function. – Marion H. Feb 23 '16 at 09:40