I have a study in which each patient (record_id) can have from 1 to 5 concurrent aneurysms, and each aneurysm may be treated differently. We want to know whether one treatment (treatmentBinary) differs from the other and which risk factors may contribute to adverse effects.

I've set the data up so that we have one observation per aneurysm rather than per patient. That means one patient may contribute up to 5 observations, with a variable aneurysm_id denoting which aneurysm each observation refers to.

I'm testing the model on this data:

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 record_id float(aneurysm_id neurotromb treatmentBinary)
"007128de18ce5cb1635b8f27c5435ff3" 1 0 1
"00abd7bdb6283dd0ac6b97271608a122" 1 0 .
"0142103f84693c6eda416dfc55f65de1" 1 0 .
"0153826d93a58d7e1837bb98a3c21ba8" 1 0 0
"01c729ac4601e36f245fd817d8977917" 1 0 .
"01c729ac4601e36f245fd817d8977917" 2 0 .
"01dd90093fbf201a1f357e22eaff6b6a" 1 0 .
"0208e14dcabc43dd2b57e2e8b117de4d" 1 0 .
"0210f575075e5def7ffa77530ce17ef0" 1 0 1
"022cc7a9397e81cf58cd9111f9d1db0d" 1 0 1
"02afd543116a22fc7430620727b20bb5" 1 0 .
"0303ef0bd5d256cca1c836e2b70415ac" 1 0 .
"0303ef0bd5d256cca1c836e2b70415ac" 2 0 .
"041b2b0cac589d6e3b65bb924803cf1a" 1 0 0
"0536317a2bbb936e85c3eb8294b076da" 1 0 .
"06161d4668f217937cac0ac033d8d199" 1 0 1
"065e151f8bcebb27fabf8b052fd70566" 1 0 1
"065e151f8bcebb27fabf8b052fd70566" 2 0 .
"065e151f8bcebb27fabf8b052fd70566" 3 0 .
"065e151f8bcebb27fabf8b052fd70566" 4 0 1
"07196414cd6bf89d94a33e149983d102" 1 0 1
"0721c38f8275dab504fc53aebcc005ce" 1 0 1
"0721c38f8275dab504fc53aebcc005ce" 2 0 0
"0721c38f8275dab504fc53aebcc005ce" 3 0 0
"0721c38f8275dab504fc53aebcc005ce" 4 0 .
"07bef516d53279a3f5e477d56d552a2b" 1 0 .
"08678829b7e0ee6a01b17974b4d19cfa" 1 0 1
"08bb6c65e63c499ea19ac24d5113dd94" 1 0 1
"08f036417500c332efd555c76c4654a0" 1 1 0
"090c54d021b4b21c7243cec01efbeb91" 1 0 1
"09166bb44e4c5cdb8f40d402f706816e" 1 0 .
"0930159addcdc35e7dc18812522d4377" 1 0 0
"096844af91d2e266767775b0bee9105e" 1 0 .
"09884af1bb9d59803de0c74d6df57c23" 1 0 .
"09e03748da35e9d799dc5d8ddf1909b5" 1 0 0
"0a4ce4a7941ff6d1f5c217bf5a9a3bf9" 1 0 0
"0a5db40dc58e97927b407c9210aab7ba" 1 0 1
"0a5db40dc58e97927b407c9210aab7ba" 2 0 1
"0a73c992955231650965ed87e3bd52f6" 1 0 .
"0a84ab77fff74c247a525dfde8ce988c" 1 0 1
"0a84ab77fff74c247a525dfde8ce988c" 2 0 0
"0a84ab77fff74c247a525dfde8ce988c" 3 0 .
"0af333ae400f75930125bb0585f0dcf5" 1 0 0
"0af73334d9d2166191f3385de48f15d2" 1 0 .
"0b341ac8f396a8cdb88b7c658f66f653" 1 0 1
"0b341ac8f396a8cdb88b7c658f66f653" 2 0 .
"0b35cf4beb830b361d7c164371f25149" 1 0 .
"0b35cf4beb830b361d7c164371f25149" 2 0 .
"0b3e110c9765e14a5c41fadcc3cfc300" 1 . .
"0b6681f0f441e69c26106ab344ac0733" 1 0 1
"0b8d8253a8415275dbc2619e039985bb" 1 0 1
"0b8d8253a8415275dbc2619e039985bb" 2 0 1
"0b8d8253a8415275dbc2619e039985bb" 3 0 0
"0b92c26375117bf42945c04d8d6573d4" 1 0 .
"0b92c26375117bf42945c04d8d6573d4" 2 0 .
"0ba961f437f43105c357403c920bdef1" 1 0 0
"0bb601fabe1fdfa794a5272408997a2f" 1 0 0
"0c75b36e91363d596dc46bd563c3f5ef" 1 0 .
"0d461328a3bae7164ce7d3a10f366812" 1 0 .
"0d4cc4eb459301a804cbef22914f44a3" 1 0 .
"0d4e29e11bb94e922112089f3fec61ef" 1 0 .
"0d4e29e11bb94e922112089f3fec61ef" 2 0 .
"0d513c74d667f55c8f4a9836c304149c" 1 0 .
"0da25de126bb3b3ee565eff8888004c2" 1 0 .
"0da25de126bb3b3ee565eff8888004c2" 2 0 .
"0db9ae1f2201577f431b7603d0819fa6" 1 0 .
"0dd8a681f6a5d4c888831a591e57a747" 1 0 1
"0e05d6958d878368b5fb831211fad6a1" 1 0 .
"0e3ff41e0e2b2cb5ec336fd0b04e5d44" 1 0 0
"0f61e560ab56b8fea1f2593d7d3b2718" 1 0 .
"0f61e560ab56b8fea1f2593d7d3b2718" 2 0 .
"0f69f1f998984d37f133185179d63c60" 1 0 1
"1037032886a93e66406a4c910d1ef747" 1 0 .
"1037032886a93e66406a4c910d1ef747" 2 0 .
"1044b81b354b420e85ae835ea07de2d6" 1 0 .
"10620fc488346291281212a404681386" 1 0 .
"1074389c469944edf026d193a55b1148" 1 0 1
"1090d5a678119b03cddab609289a4d3c" 1 0 0
"111eebb45cef2211a2a2ff0219095e6a" 1 0 .
"11ddcbc8de8ef56cbc578fc81b602ffc" 1 0 1
"11f22488513cf717c333786c789b0289" 1 0 1
"11f22488513cf717c333786c789b0289" 2 0 1
"121552b22cee2a1eb4360b4d2534cd39" 1 0 0
"1251d707c5dc9243dc45d04beb7c3493" 1 0 .
"125689659bb3821fa81698dd72462773" 1 0 .
"127ba572433921c5bb408fc62eb9b5d7" 1 0 0
"129bea3f73e84e37d77d55fadfeb49dd" 1 0 1
"12e8dc6fb87822be26d6678cee9644f5" 1 0 1
"12f05a65f771c9675c2c5e9cdbfc33d1" 1 0 1
"12f05a65f771c9675c2c5e9cdbfc33d1" 2 0 1
"13d2bc86f1a19ed2959cd7354bc92d1d" 1 0 .
"13db5ede38e2ae1da17884c9a18df202" 1 0 1
"13f946e50df8ad74d7cf9fa05b4ad05b" 1 0 .
"146c4b8be7996a9789873fe55a47ab41" 1 0 0
"147fadd87da13a0271225d944d2a5e98" 1 0 1
"14a1dcfa015343bbefaac9a3a45769e5" 1 1 1
"14a1dcfa015343bbefaac9a3a45769e5" 2 1 .
"14d1377f74a63ffa29db2d99e7f6a1ce" 1 0 .
"150017d944a87b4c61f90034380c0659" 1 0 1
"150f6ca1ea453260eabf3472d3ebcad1" 1 0 1
end
[/CODE]

This is an excerpt of 100 observations.

I'm running a mixed logistic model with neurotromb as the dependent variable and treatmentBinary as the independent variable, grouping by record_id. I haven't used aneurysm_id in the model, and I'm not sure whether I should.

In any case the model takes a long time to run and never reaches convergence. I don't understand why and am hoping someone perhaps can see?

Thank you.

Robert Long
Paze

1 Answer

You don't want aneurysm_id in the model because this is the measurement-level identifier. For the first question, "whether one treatment is different than the other", your model should look something like:

outcome ~ treatmentBinary + confounders + competing exposures + (1 | record_id)
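
As a concrete rendering of that formula in R, here is a minimal sketch using glmer from lme4, fitted to simulated data. Only the names neurotromb, treatmentBinary and record_id come from the question; the covariate `age10` and every simulated value are hypothetical stand-ins for real confounders.

```r
library(lme4)

set.seed(1)

# Simulate 200 patients with 1 to 5 aneurysms each (made-up data).
n_pat <- 200
sizes <- sample(1:5, n_pat, replace = TRUE)
d <- data.frame(record_id = factor(rep(seq_len(n_pat), times = sizes)))

# Hypothetical patient-level confounder, already standardised.
d$age10 <- rep(rnorm(n_pat), times = sizes)

# Treatment varies at the aneurysm level.
d$treatmentBinary <- rbinom(nrow(d), 1, 0.5)

# Patient-level random intercept, then a binary outcome on the logit scale.
u <- rep(rnorm(n_pat), times = sizes)
eta <- -1 + 0.5 * d$treatmentBinary + 0.3 * d$age10 + u
d$neurotromb <- rbinom(nrow(d), 1, plogis(eta))

# Random-intercept logistic model: one intercept per patient.
fit <- glmer(neurotromb ~ treatmentBinary + age10 + (1 | record_id),
             data = d, family = binomial)
print(fixef(fit))
```

With real data, the simulated columns would be replaced by the actual confounders and competing exposures.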

For the second question, "what risk factors may contribute to adverse effects", the model will be similar, but you must run a separate model for each risk factor, including only confounders (and competing exposures) and not mediators. This is because, for the association of variable A with the outcome, a variable B may be a confounder and should therefore be included, but when assessing the association of variable B with the outcome, A would be a mediator and should not be included. A causal diagram is very helpful for determining the set of variables to include. See this answer for more details:
How do DAGs help to reduce bias in causal inference?

As to the specific reasons for Stata having problems converging, this could be related to the small cluster sizes. You could try using -meqrlogit- instead of -melogit-, or alternatively try glmer from the lme4 package in R.
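
One way to check whether small clusters are a plausible culprit is to tabulate how many usable observations each patient contributes. The sketch below is base R on made-up toy data that only borrows the column names from the question. The counts 530 and 440 are the ones glmer reports in the comments below.

```r
# Toy data with the question's column names; all values are made up.
d <- data.frame(
  record_id       = c("a", "b", "b", "c", "d", "d", "d", "e"),
  neurotromb      = c(0, 0, 0, 0, 1, 1, 0, 0),
  treatmentBinary = c(1, NA, 0, 0, 1, NA, 1, 0)
)

# Keep only rows the model can use (outcome and treatment both observed).
cc <- d[complete.cases(d[, c("neurotromb", "treatmentBinary")]), ]

# Usable observations per patient, then the distribution of cluster sizes.
# Clusters of size 1 carry little information about between-patient variance.
cluster_size <- table(cc$record_id)
size_dist <- table(cluster_size)
print(size_dist)

# Outcome prevalence among usable rows; rare events make estimation harder.
event_rate <- mean(cc$neurotromb)
print(event_rate)

# Counts reported by glmer in the comments: 530 observations, 440 patients.
n_obs <- 530
n_groups <- 440
print(n_obs / n_groups)   # average usable observations per patient

# (treatmentBinary | record_id) requests a random intercept AND a random
# slope per patient, i.e. 2 * 440 random effects, more than 530 observations.
n_random_effects <- 2 * n_groups
print(n_random_effects > n_obs)
```

On the real data, the same two tables would show how many patients contribute a single complete observation and how rare neurotromb is.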

Robert Long
  • My model seems to look correct (I'm just trying to run a very simple model to make it run, but the problem is it won't reach convergence). The model is outcome - treatmentBinary || record_id: A simple model but yet won't run. I'm wondering if there is some problem with the data above that would confuse the model? – Paze Aug 25 '20 at 08:12
  • The outcome is binary, right? Did you try using Stata's `difficult` option? Try it in R and see what happens: `glmer(outcome ~ (1 | record_id), family=binomial)` – Robert Long Aug 25 '20 at 08:17
  • I ran this: glmer(neurotromb ~ (treatmentBinary | record_id), family=binomial) ------------And got: Error: number of observations (=530) < number of random effects (=880) for term (treatmentBinary | record_id); the random-effects parameters are probably unidentifiable ---------------------- – Paze Aug 25 '20 at 08:32
  • Try `glmer(neurotromb ~ treatmentBinary + (1 | record_id), family=binomial)` – Robert Long Aug 25 '20 at 08:34
  • That seems to run fine: "Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: binomial ( logit ) Formula: neurotromb ~ treatmentBinary + (1 | record_id) AIC BIC logLik deviance df.resid 147.2843 160.1029 -70.6421 141.2843 527 Random effects: Groups Name Std.Dev. record_id (Intercept) 52.97 Number of obs: 530, groups: record_id, 440 Fixed Effects: (Intercept) treatmentBinary -13.2453 0.2845 " -------------Any idea why this runs but stata hangs and won't converge? – Paze Aug 25 '20 at 08:42
  • I also tried with ,difficult in stata but to no avail. – Paze Aug 25 '20 at 08:42
  • Are you using `-meqrlogit-` or `-melogit-` ? – Robert Long Aug 25 '20 at 08:49
  • I am using melogit – Paze Aug 25 '20 at 09:02
  • and did you try `-meqrlogit-` ? – Robert Long Aug 25 '20 at 09:08
  • Yes, it returns: . meqrlogit neurotromb treatmentBinary || record_id: Refining starting values: Iteration 0: log likelihood = -161.94647 Iteration 1: log likelihood = -152.50747 (not concave) Iteration 2: log likelihood = -152.50747 (backed up) Performing gradient-based optimization: initial values not feasible – Paze Aug 25 '20 at 09:10
  • I've updated my answer. – Robert Long Aug 25 '20 at 09:12
  • Thank you, I'll stick to R for now for this project. Last thing, as I'm not very familiar with R, is -13.24 the coefficient in the analysis (the one I posted), and is 0.2845 the p-value? Also the glmer seems to be a linear analysis while my outcome variable is binary - is this okay? – Paze Aug 25 '20 at 09:17
  • the "g" in glmer means generalised - it's a logistic model because you specified `family=binomial` and the default link function is the logit. The output will be on the log-odds scale. glmer doesn't produce p values (there are ways to get them but I would strongly advise against it) – Robert Long Aug 25 '20 at 09:20
  • Great, thank you. I'll have to do a bit of reading on how to interpret the model, though. In medical literature, p-values are king and often it's difficult to get published without them. I'll at least need to understand why I'm abandoning them in this case. – Paze Aug 25 '20 at 09:32
  • I hear you. I worked in a medical school for many years. – Robert Long Aug 25 '20 at 09:37
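
For reading the glmer output quoted in the comments above: the fixed effects are on the log-odds scale, so exponentiating the treatmentBinary coefficient gives an odds ratio, and `plogis()` converts a log-odds value to a probability. This is a minimal base R sketch using only the two estimates from that output. It says nothing about their uncertainty, and the very large random-intercept standard deviation reported there (52.97) suggests the fit may be degenerate, so the numbers are illustrative only.

```r
# Fixed-effect estimates copied from the glmer output in the comments
# (log-odds scale).
b0 <- -13.2453   # (Intercept)
b1 <- 0.2845     # treatmentBinary

# Odds ratio for treatmentBinary = 1 versus 0, conditional on the patient.
odds_ratio <- exp(b1)
print(odds_ratio)

# Predicted probabilities for a patient whose random intercept is zero.
p_ref <- plogis(b0)        # treatmentBinary = 0
p_trt <- plogis(b0 + b1)   # treatmentBinary = 1
print(c(p_ref, p_trt))
```

So 0.2845 is a coefficient (a log odds ratio of roughly 1.33 once exponentiated), not a p-value, and -13.2453 is the intercept.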