Modeling both mean and variance in a linear model

Question

I have a variable $X$ that decays log-normally with time, and I have estimated the mean and the SD of that log-linear relationship. I also have a (categorical) variable $Y$ which—I hypothesize—will affect linearly both the mean and the SD. It is this variability between $Y$ and the mean and the SD that I am interested in, and my question is what model is suitable for this.

I have been searching for it and, apparently, what I am looking for is a GLM of the gamma family, but I am not sure why or if there are better alternatives to it. I would appreciate any hint.

Edit: As requested, I give more details and context. In the real world, $X$ represents the level of a certain biomarker of inflammation, which decays log-normally with time, $T$, the range of which goes from 0 (the first measurement) to 120 hours, i.e., I have several measurements per patient, and I have around 1000 measurements overall.

I have another variable, which I called $Y$ in the pre-edit text, which is the type of surgery undertaken by the patient. This is a binary variable ("minimally invasive surgery", "not minimally invasive surgery"). I want to know how this variable (and, potentially, others) affects the mean and variance of the log-normal relationship between the levels of the biomarker and time.

Edit 2: As requested, I provide a plot of the relationship between $X$ and time. I would like to build a model that allows me to simulate data with the same distribution as you see in the image, but taking into consideration the fact that patients may have undertaken either minimally invasive surgery or not minimally invasive surgery. I mean, I don't want "two curves", but addressing the variability in the mean and the SD that the surgery variable introduces.

Could you give some more details and context? Sample size? What do your variables represent in the real world? Can you show us a plot? With a GLM you can model, via the link function and the variance function, mean and variance separately. If you in addition want to model the variance as a function of covariables, look into `gamlss` (search this site). — kjetil b halvorsen, Oct 16 '20 at 16:45
@kjetilbhalvorsen Done! I hope the edit clarifies the problem. GAMLSS may be what I am looking for, though. — Cromack, Oct 16 '20 at 18:51
Please say more about just what you mean by "decays log-normally with time." An example plot would help a lot, as @kjetilbhalvorsen notes, as would a reference to the type of analysis you are trying to perform and some theoretical basis on which you expect both the mean and the variance to change. The variance of most distributions is a function of the mean, so modeling the mean might get you what you want anyway. — EdM, Oct 16 '20 at 20:10
@EdM Done! I understand that "decays log-normally with time" was kind of vague and a plot really helps. I hope this edit clarifies things. — Cromack, Oct 17 '20 at 07:16
This looks like two processes, one with increasing production of the biomarker for a period of time (approx. 24 hours), followed by an exponential decay (linear in log-concentration versus time). Does your knowledge of the subject matter provide information about the dynamics of both of those processes and how the 2 types of surgery might affect those processes? A model based on fundamental processes would be more useful than simply curve-fitting means/variances. Also: about how many measurements per patient, spaced over what times? — EdM, Oct 17 '20 at 13:18
@EdM Unfortunately, what I know about the biomarker-time relationship is almost limited to the information contained in the plot. I do know, though, that the not minimally invasive surgery would imply a slower decay in the biomarker concentration. Regarding the question of how many measurements per patient, that depends heavily on the patient, but on average a patient is measured between 3 and 4 times, spaced over roughly 10 hours. — Cromack, Oct 17 '20 at 18:17

score 3 · Accepted Answer · answered Oct 18 '20 at 15:33

The closer you can bring your model to underlying biological reality, the better. Just fitting an arbitrary distribution to a set of data won't be nearly as satisfying.

The data (plotted on a log scale) look pretty much like they follow a broken stick: a straight upward-sloping line (representing an exponential increase in the original concentration scale) up to about 24 hours, followed by a straight downward-sloping line thereafter (representing an exponential decay of concentration). On the log scale, it looks like the spread of data around those 2 underlying trends is reasonably constant over time, on the order of 1 to 1.5 log-10 units.
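As a rough sketch, the broken stick described above could be written as follows. The symbols $\beta_0, \beta_1, \beta_2, \psi, \sigma$ are notation assumed here for illustration; they do not come from the question.

$$\log_{10} X(t) = \beta_0 + \beta_1 t + \beta_2 (t - \psi)_+ + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

where $(t - \psi)_+ = \max(0,\, t - \psi)$ and $\psi$ is the breakpoint (about 24 hours in the plot). The slope is $\beta_1 > 0$ before the break and $\beta_1 + \beta_2 < 0$ after it, and a single constant $\sigma$ corresponds to the roughly constant spread seen on the log scale.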

So a change-point analysis based on linear modeling on the log scale of concentration seems like a more promising approach. For your data, with a single slope breakpoint in a continuous variable, the `segmented` package in R might be the simplest of several that allow for such analysis. In particular, you will be able to include the binary surgery-treatment variable as a predictor in the model and directly test what seems (from a comment) to be the main hypothesis: that the type of surgery treatment affects the exponential decay rate.
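To make the idea concrete, here is a minimal Python/NumPy sketch of a single-breakpoint fit on the log scale. It is not the `segmented` package or its estimation algorithm: it simply grid-searches candidate breakpoints and fits ordinary least squares at each one. The data are simulated, every numeric value is invented for illustration, and the names `fit_broken_stick`, `group`, and the 0/1 coding of surgery type are hypothetical. It also treats all observations as independent, which real repeated-measures data would not satisfy.

```python
import numpy as np

def fit_broken_stick(t, y, group, candidates):
    """Grid-search one breakpoint psi; at each psi fit by least squares
        y = b0 + b1*t + b2*(t - psi)_+ + b3*group*(t - psi)_+ + error,
    so b3 is the difference in post-break slope between the two groups."""
    best = None
    for psi in candidates:
        hinge = np.maximum(t - psi, 0.0)  # (t - psi)_+
        X = np.column_stack([np.ones_like(t), t, hinge, group * hinge])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {"psi": float(psi), "coef": coef, "rss": rss}
    return best

# Simulated data; all values below are assumptions, not estimates from the post.
rng = np.random.default_rng(0)
n = 1000
t = rng.uniform(0.0, 120.0, n)               # hours since first measurement
group = rng.integers(0, 2, n).astype(float)  # 1 = not minimally invasive (hypothetical coding)
true_psi = 24.0
true_hinge = np.maximum(t - true_psi, 0.0)
# y plays the role of log10 concentration; group 1 decays more slowly after the break.
y = (0.5 + 0.06 * t - 0.09 * true_hinge + 0.01 * group * true_hinge
     + rng.normal(0.0, 0.3, n))

fit = fit_broken_stick(t, y, group, np.arange(6.0, 97.0, 1.0))
b0, b1, b2, b3 = fit["coef"]
slope_after_mis = b1 + b2         # post-break slope, group 0
slope_after_open = b1 + b2 + b3   # post-break slope, group 1

print("estimated breakpoint (h):", fit["psi"])
print("post-break slopes:", slope_after_mis, slope_after_open)
```

In this parameterization the hypothesis from the comments (surgery type changes the decay rate) corresponds to `b3` being nonzero. A dedicated package would additionally give standard errors for the breakpoint, which a plain grid search does not.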

There will be a few complications with this type of repeated-measures data. For one, the multiple measurements on individuals mean that the observations will not all be independent. Ideally that should be taken into account in terms of differences among individuals in biomarker levels or slopes with respect to time, for example treating those as random effects in a mixed model. (With only 3 or 4 observations per patient and breakpoint times and slopes and intercepts on both sides of the break to be estimated from the data, treating patients as fixed effects probably wouldn't work.) This page discusses how to include random effects in change-point analysis. Or you might find a way to incorporate the change-point analysis into nonlinear modeling and use the `nlme` function in its eponymous package to handle the random effects.

For another, the paucity of data beyond 48 hours suggests that there might be some systematic differences between the patients who were followed for a long time and those who weren't. That would need to be investigated, along with any systematic differences between the patients who received the two types of treatment.

Thank you for such a thorough answer! It gives some really good starting points for building the model I want. Thanks again! — Cromack, Oct 18 '20 at 19:03
