
I want to look for a relationship between the competition facing a hospital and mortality within the hospital. Since patients in the same hospital may be more correlated than patients in different hospitals, I decided to adopt a mixed model. I have a data set of more than 150k rows and 720 hospitals, so I treat the hospital as a random effect. I also treat Trimester (20 levels, because the study covers 5 years of data divided into trimesters) as a random effect. The variables Hospital_status (the status of the hospital) and Hospital_caseload (the number of patients treated by the hospital) relate to the hospital; the other variables relate to the patients.

This is my model:

MultModel <- glmer(Death30 ~ HHI + age + Sex + Emmergency + neoadjTrt +
                     denutrition + Charlson + `Right colectomy` +
                     colectomie_transverse + Total.colectomy + Hospital_status +
                     Hospital_caseload + (1 | Trimester) + (1 | Hospital_ID),
                   data = data, family = binomial(link = "logit"), nAGQ = 0)

However, I have some doubts about the correctness of the model. What could the problems be if I don't take the hospital effect into account and instead fit the model below?

MultModel <- glmer(Death30 ~ HHI + age + Sex + Emmergency + neoadjTrt +
                     denutrition + Charlson + `Right colectomy` +
                     colectomie_transverse + Total.colectomy + Hospital_status +
                     Hospital_caseload + (1 | Trimester),
                   data = data, family = binomial(link = "logit"), nAGQ = 0)

But if I do take the hospital effect into account, could it be a problem to also put the other hospital-level variables (that is, Hospital_status and Hospital_caseload) in the model?

As a last question, does nAGQ = 0 give a good model? I use it because of how slowly R runs the model. What value should I give nAGQ to have the most accurate and fastest model? What other tricks can I use to speed up fitting the model without affecting its quality?

Seydou GORO

1 Answer


What could be the problems if I don't take into account of hospital effect and fit the model below?

You have already answered that question yourself in your opening paragraph:

patients in the same hospital may be more correlated than patients in different hospitals

So, if you don't account for this, the standard errors for the fixed effects will be wrong. Random intercepts are a good way to account for it. The alternatives are fitting fixed effects for hospitals, which is not a good idea here since you have so many, or generalised estimating equations (GEE), which can take even longer to fit than mixed models in some circumstances.

But if take into account hospital effect, could it be a problem to put in the model the other variables related to hospital (that is Hospital_status and Hospital_caseload)

There is nothing wrong with that. It is normal to include group-level predictors.

As a last question, does nAGQ=0 give a good model, I use it because of the slowness of R to run the model. What value should I give to nAGQ to have the most accurate and fastest model?What other tricks can I use to speed up the execution of the model without affecting the quality?

You might get more accurate results with nAGQ > 0; the higher, the better. A good way to assess whether you need to is to take some samples from your dataset, run the models with nAGQ = 0 and nAGQ > 0, and compare the results on the smaller datasets. If you find little difference, then you have a good reason to stick with nAGQ = 0 on the full dataset. For example, you could randomly choose 72 hospitals and use all the observations from them. Alternatively, you could just sample one tenth of the whole dataset. Ideally, repeat this as many times as you can.
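A minimal sketch of the hospital-subsampling idea, assuming the data frame and variable names from the question:

```r
library(lme4)

# Randomly choose 72 of the 720 hospitals and keep all their rows
set.seed(42)
hosp_sample <- sample(unique(data$Hospital_ID), 72)
sub <- data[data$Hospital_ID %in% hosp_sample, ]

# The same model formula as in the question
f <- Death30 ~ HHI + age + Sex + Emmergency + neoadjTrt +
  denutrition + Charlson + `Right colectomy` +
  colectomie_transverse + Total.colectomy + Hospital_status +
  Hospital_caseload + (1 | Trimester) + (1 | Hospital_ID)

# Fit with both integration settings on the subset
m0 <- glmer(f, data = sub, family = binomial, nAGQ = 0)
m1 <- glmer(f, data = sub, family = binomial, nAGQ = 1)

# Compare fixed effects; if the differences are negligible,
# nAGQ = 0 is defensible on the full data
cbind(nAGQ0 = fixef(m0), nAGQ1 = fixef(m1))
```

Sampling whole hospitals (rather than random rows) preserves the within-hospital clustering that the random intercept is there to capture.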

Apart from that, you can fit with nAGQ = 0, extract the estimates from the fitted model, then refit the model with nAGQ > 0 using the extracted estimates as starting values. See this question and answer for details about why and how to do this:
Convergence in Linear Mixed-Effects Model
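In lme4 this warm-start can be done with getME() to pull out the estimates and update() to refit; a sketch with a shortened formula for illustration (substitute the full formula from the question):

```r
library(lme4)

# Quick first pass with the cheap approximation
m0 <- glmer(Death30 ~ HHI + age + (1 | Hospital_ID),
            data = data, family = binomial, nAGQ = 0)

# Extract the random-effect (theta) and fixed-effect estimates
ss <- getME(m0, c("theta", "fixef"))

# Refit with adaptive quadrature, starting from those estimates
m1 <- update(m0, nAGQ = 1, start = ss)
```

Because the optimizer starts near the solution, the slower nAGQ = 1 fit usually needs far fewer iterations.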

You could also try a different optimizer using glmerControl. And finally, you could run it on a faster machine and/or a machine with more memory. The cloud is a good and inexpensive way to do that.
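For example, switching the optimizer to bobyqa and raising the iteration limit looks like this (again with a shortened formula for illustration):

```r
library(lme4)

# Alternative optimizer with a higher evaluation budget
ctrl <- glmerControl(optimizer = "bobyqa",
                     optCtrl = list(maxfun = 2e5))

m <- glmer(Death30 ~ HHI + age + (1 | Hospital_ID),
           data = data, family = binomial,
           nAGQ = 0, control = ctrl)
```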

Robert Long
  • Thank you @Robert Long for your response. I have more questions for you, please. I don't understand the last method which consists in extracting the results from the fitted model and refitting the model using the extracted results as a starting value. I've never done anything like that before. – Seydou GORO Sep 16 '20 at 10:08
  • @SeydouGORO You are welcome. I have updated the answer with a link that should help. – Robert Long Sep 16 '20 at 10:34
  • I tried the method of refitting using this code: ```ss – Seydou GORO Sep 16 '20 at 14:00
  • Wait, so are you saying that the model in your question doesn't converge ? – Robert Long Sep 16 '20 at 14:33
  • Exactly. With ```nAGQ=0```, there is no convergence problem. I tried to refit the model with nAGQ=1 using the value extracted from the first model as you recommended. So the model did not converge with ```nAGQ=1```. I am wondering if this is not due to the unbalanced structure of my data, because the observation within the hospitals varies between 5 and more than 1200 – Seydou GORO Sep 16 '20 at 14:58
  • There could be many reasons and it's very difficult to isolate the problem. That does seem quite strange. I would try fitting the model on random subsets of the data as per my other suggestion, and perhaps set `maxfun` in `lmerControl` to higher values – Robert Long Sep 16 '20 at 15:04
  • Could I use ```glmerControl``` instead of ```lmerControl```? – Seydou GORO Sep 16 '20 at 15:20
  • Yes, sorry `glmerControl`. Also why do you fit random intercepts for `Trimester` and what happens if you don't? – Robert Long Sep 16 '20 at 17:36
  • Thanks Robert Long. Despite its slowness (about 30 min to run), the model ran correctly. Honestly, I don't even know what its purpose is (fitting time as a random effect); my tutor first wanted me to fit year (five categories, 2015 to 2019) as a random effect. I said I didn't understand the interest of this. Since I have to test several models, some with time in years and others with time in trimesters, he finally proposed a model with trimester as a random effect. Trimester having 20 categories (which I consider a little large in number), I figured it wouldn't cost me anything. – Seydou GORO Sep 16 '20 at 22:58
  • I would run a model without trimester as a random effect. Code it as numeric and put it in the fixed effects and compare the output with the model that had random effects for it. R doesn't use multiple cores by default, but most machines have them these days. You can run multiple R sessions and have maybe 8 (depending on your hardware) run at the same time. This is only helpful if your model is CPU-bound and not RAM-bound though. You should check the operating system resource manager to see that first. I once had a model that took 8 days to run :( – Robert Long Sep 17 '20 at 06:49