Best GLM to model a random variable that represents a count over a non-fixed interval of time

Question

The random variable on which I am seeking to fit a GLM is the number of times a patient has a blood glucose level measurement above a specified threshold before they are escalated onto a stronger therapy. My goal is to examine the fitted model to determine if there are any factors which are associated with an increased number of measurements before therapy escalation.

The type of hypothesis I would wish to test is: Being male is associated with having more blood glucose measurements above the escalation threshold compared with females.

Initially I fit a Poisson regression model and interpreted the coefficient associated with being male as the average increase in the number of measurements before escalation associated with gender = male.

However, it seems this is not strictly a Poisson process, because I am not counting events in a fixed time interval. Because the patients have medical histories of different lengths, I have to adjust for this by including year of treatment start as a predictor variable in the model.

y = the vector of above-threshold measurement counts observed in patients 1 through n

X = the n x m matrix of predictors (a.k.a. regressors), where m is the number of explanatory variables, including gender and year of treatment start

I did consider using a negative binomial GLM which would correct for the fact that the variance does not equal the mean for this random variable but is there a better technique?

score 1 · Accepted Answer · answered Nov 23 '20 at 19:41


You're probably on the right track with Poisson regression, but you need to adjust for the treatment duration. This is done with the offset() function in R's glm(); the documentation for it isn't great, but this answer is very helpful. In your case, it would be something like

# Poisson GLM with log(treatment duration) as an offset, so the model describes a rate
my_model <- glm(cases ~ gender + offset(log(treatment_duration)),
                data = my_data, family = poisson)
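To see what the offset does, here is a minimal sketch on simulated data. The data frame, the follow-up times and the true rates (1.0 events per year for females, 1.5 for males) are all made-up assumptions for illustration, not anything taken from the question. Because the model uses a log link, the gender coefficient is a log rate ratio, so it is exponentiated before interpretation rather than read as an additive difference in counts.

```r
# Simulated data; every name and number here is an illustrative assumption.
set.seed(1)
n <- 2000
my_data <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  treatment_duration = runif(n, min = 0.5, max = 10)  # years of follow-up
)

# Assumed true event rates per year of follow-up.
rate <- ifelse(my_data$gender == "male", 1.5, 1.0)
my_data$cases <- rpois(n, lambda = rate * my_data$treatment_duration)

# offset(log(duration)) fixes the coefficient of log(duration) at 1,
# so the linear predictor models log(expected cases / duration).
my_model <- glm(cases ~ gender + offset(log(treatment_duration)),
                data = my_data, family = poisson)

# Coefficients are on the log scale; exponentiate to get a rate ratio
# (male rate divided by female rate, since "female" is the reference level).
rate_ratio <- exp(coef(my_model)[["gendermale"]])
rate_ratio
```

With this setup the estimated rate ratio should land close to the assumed value of 1.5, even though patients were followed for very different lengths of time.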

However, this analysis may run into problems if there is a link between the number of times a patient has a blood glucose level measurement above a threshold and whether or not a patient's therapy is escalated (or some interaction between this and gender), since this would create some serious and possibly insurmountable confounds. Without knowing more, all I can say for now is good luck!


Eoin


Thanks Eoin - very helpful. I was not aware of the offset function - I will certainly look into it. Interesting that you raised the issue of confounding when there is an interaction between the number of measurements above target and the therapy escalation. This is almost certainly the case: each measurement above the threshold is a signal that the doctor should escalate the therapy. Maybe I will have to reformulate the problem as a logistic regression where y=1 if therapy is escalated after 3 or fewer measurements above threshold, y=0 if no escalation after 3 or more measurements above threshold – Jude Wells Nov 23 '20 at 20:27
See https://stats.stackexchange.com/questions/306494/can-a-variable-be-used-both-as-an-offset-and-an-independent-variable/306497#306497 – kjetil b halvorsen Nov 23 '20 at 21:15
