
I have the output from two LMER models and I'd like to calculate AIC & BIC. I believe I've read the tables correctly, but I'm uncertain about the k parameter; have I understood it correctly?

My models:

#Model1
                     Mixed Linear Model Regression Results
================================================================================
Model:                  MixedLM     Dependent Variable:     Volume              
No. Observations:       183         Method:                 REML                
No. Groups:             13          Scale:                  135112391671077.1250
Min. group size:        13          Log-Likelihood:         -3232.0814          
Max. group size:        15          Converged:              Yes                 
Mean group size:        14.1                                                    
--------------------------------------------------------------------------------
                 Coef.          Std.Err.     z   P>|z|    [0.025       0.975]   
--------------------------------------------------------------------------------
Intercept        33526487.924  7715549.309 4.345 0.000 18404289.158 48648686.690
Score             4612862.511  2011290.523 2.293 0.022   670805.524  8554919.498
Group Var 758032009969204.875 27899429.226                                      
================================================================================

# I used k = Intercept + Score + Group Var = 3

#Model2
             Mixed Linear Model Regression Results
================================================================
Model:            MixedLM Dependent Variable: Transformed_Volume
No. Observations: 181     Method:             REML              
No. Groups:       13      Scale:              8698.0110         
Min. group size:  13      Log-Likelihood:     -1098.8257        
Max. group size:  15      Converged:          Yes               
Mean group size:  13.9                                          
----------------------------------------------------------------
                    Coef.   Std.Err.   z   P>|z|  [0.025  0.975]
----------------------------------------------------------------
Intercept           330.777   74.894 4.417 0.000 183.989 477.566
Score                39.102   16.390 2.386 0.017   6.979  71.224
Group Var         71886.612  333.046                            
Group x Score Cov  2161.829   70.984                            
Score Var            65.015   18.319                            
================================================================

# I used k = Intercept + Score + Group Var + Group x Score Cov + Score Var = 5

My calculations:

import numpy as np

#AIC & BIC Model1
k1 = 3                # Intercept + Score + Group Var
l1 = -3232.0814       # log-likelihood
n1 = np.log(183)      # log of the number of observations

AIC1 = 2*k1 - 2*l1
BIC1 = k1*n1 - 2*l1

#AIC & BIC Model2
k2 = 5                # Intercept + Score + Group Var + Group x Score Cov + Score Var
l2 = -1098.8257
n2 = np.log(181)

AIC2 = 2*k2 - 2*l2
BIC2 = k2*n2 - 2*l2

Result:

AIC1: 6470.1628
BIC1: 6479.791258458525
AIC2: 2207.6514
BIC2: 2223.6438851563294
  • The random effect structure also has parameters in addition to the fixed effects. Also, you need to fit the model by maximum likelihood instead of REML, so set `method='ML'` to get a likelihood instead of just a *profile* likelihood. Lastly, there are already default methods for `AIC` and `BIC` in R that work. – AdamO May 18 '21 at 18:16
  • Thanks @AdamO. Do I understand you correctly that the random effect structure is in itself an extra parameter (so k2=6)? I'm using python, so the programs in R are out of bounds. When I try to fit with `reml=False`, as done in Python, I get two different warnings: `RuntimeWarning: invalid value encountered in log` & `ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals`. One problem I can identify from this is that my `Score` variable ranges from -1 to 1. Is there any other way to compare the models while still using the REML method? – OLGJ May 18 '21 at 19:14
  • You have used the `lme4-nlme` tag, so @AdamO perhaps presumed that you are using R, since that tag is specifically for the R packages `lme4` and `nlme`. I will edit your question to change the tags. – Robert Long May 18 '21 at 20:37
  • I didn't, but I see that my post was edited by someone who probably thought I was. Thanks @Rob – OLGJ May 18 '21 at 20:56
  • Ahh - OK, no worries, I see there was another edit before mine :) – Robert Long May 18 '21 at 20:58
  • Calculating degrees of freedom for mixed-effects models is non-obvious in general; see https://stats.stackexchange.com/questions/274565/degrees-freedom-reported-by-the-lmer-model-dont-seem-plausible/274585#274585 and https://stats.stackexchange.com/questions/185360/t-value-associated-with-nlme-lme4/185372#185372 – Tim May 19 '21 at 07:04

1 Answer


If you are using statsmodels in Python, then you can just use the `aic` and `bic` attributes of the fitted `MixedLMResults` object (a `LikelihoodModelResults` subclass):

https://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLMResults.html

https://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLMResults.aic.html#statsmodels.regression.mixed_linear_model.MixedLMResults.aic

https://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLMResults.bic.html#statsmodels.regression.mixed_linear_model.MixedLMResults.bic
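
For example, here is a minimal sketch with made-up data (the column names `Volume`, `Score` and `Subject` are placeholders, not taken from your code). Note that the model has to be fitted by ML (`reml=False`) for `aic` and `bic` to be defined; under REML they come back as `nan`:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data standing in for the real dataset: 13 groups of 14 observations,
# a Score covariate on [-1, 1], and a group-level random intercept.
rng = np.random.default_rng(0)
n_groups, n_per_group = 13, 14
df = pd.DataFrame({
    "Subject": np.repeat(np.arange(n_groups), n_per_group),
    "Score": rng.uniform(-1, 1, n_groups * n_per_group),
})
group_effect = rng.normal(0, 2, n_groups)  # random intercepts
df["Volume"] = 5 + 3 * df["Score"] + group_effect[df["Subject"]] + rng.normal(0, 1, len(df))

# Random-intercept model, fitted by ML (reml=False) so the log-likelihood,
# and therefore result.aic / result.bic, are defined and comparable.
result = smf.mixedlm("Volume ~ Score", data=df, groups=df["Subject"]).fit(reml=False)
print(result.aic, result.bic)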

Mixed models are quite complex models. There are several packages in R that can fit them, and since the R ecosystem for mixed models is much more mature than Python's (R is designed specifically for statistics, whereas Python is a general-purpose programming language), and you seem to be having some difficulties in Python, you might want to consider using one of the R packages such as lme4. If you want to stay within Python, you can still call R functions from Python using the rpy2 package:
https://rpy2.github.io/doc/v2.9.x/html/introduction.html
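
If you go the rpy2 route, a rough sketch might look like the following (this assumes R and `lme4` are installed, uses the rpy2 2.x-style pandas conversion, and re-uses the hypothetical `df` from the snippet above):

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

pandas2ri.activate()    # automatic pandas <-> R data.frame conversion (rpy2 2.x style)
lme4 = importr("lme4")  # requires lme4 to be installed in your R library
stats = importr("stats")

# Same random-intercept model as above, but fitted by R's lmer.
formula = ro.Formula("Volume ~ Score + (1 | Subject)")
fit = lme4.lmer(formula, data=df, REML=False)  # REML=False -> ML fit, so AIC/BIC are comparable
print(stats.AIC(fit)[0], stats.BIC(fit)[0])    # R's generic AIC() / BIC()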

Edit:

I really would recommend using R for these kinds of complex models. There is a far bigger user community for mixed models in R than in Python.

Another thing you could try is rescaling your variables. Looking at the coefficient estimates, particularly for Model 1, these are very big numbers. It's possible that the software is running into numerical problems because of this when fitting.
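
For instance, something along these lines (again just a sketch, assuming a pandas DataFrame `df` with your `Volume`, `Score` and grouping columns; the exact scaling is up to you):

import statsmodels.formula.api as smf

# Express Volume in millions so the coefficients and variance components
# are of a more moderate magnitude before fitting.
df["Volume_millions"] = df["Volume"] / 1e6

result = smf.mixedlm("Volume_millions ~ Score", data=df,
                     groups=df["Subject"]).fit(reml=False)
print(result.summary())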

  • Hi Robert, thanks for answering! If I understand you correctly, your suggestion of using `my_fitted_model.aic` & `my_fitted_model.bic` does not work (it returns `nan`). It's stated here on their GitHub: https://github.com/statsmodels/statsmodels/issues/6153. It's the final calculation I require for my thesis, so changing languages is not optimal (I have no R experience), but I'll check out the `rpy2` package. Thanks! – OLGJ May 19 '21 at 09:11
  • @OLGJ I've edited my answer – Robert Long May 19 '21 at 09:25
  • Thank you. I'll accept this answer. Lesson learned -> avoid statsmodels if possible. – OLGJ May 19 '21 at 14:53
  • You're welcome. The good news is that, although the learning curve for R is quite steep, there are many great resources on the internet, such as this site as well as Stack Overflow of course. – Robert Long May 19 '21 at 15:40