
I'm struggling to understand the concept of a saturated model.

Here is some brief context:

Suppose that we have $N$ observations of $(Y, X_1, X_2)$.
Suppose that each $Y_i$ is Poisson distributed with parameter $\lambda_{i}$. We wish to estimate $E(Y_i \mid X_{i1}, X_{i2})$, which is equivalent to estimating $\lambda_{i}$ for $i = 1, \ldots, N$.

Suppose we fit a GLM (call this the fitted model) by assuming the relationship $g(\mu_{i}) = \beta_1 X_{i1} + \beta_2 X_{i2}$ $(1)$, where $\mu_i = E(Y_i \mid X_{i1}, X_{i2}) = \lambda_i$.
We estimate $\beta_1, \beta_2$ by maximum likelihood, then substitute the estimates back into $(1)$ to obtain the estimates $\hat{\mu}_i$.
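
For concreteness, here is a minimal R sketch of this workflow on simulated data (the data and variable names are just for illustration):

```r
set.seed(1)
N  <- 50
x1 <- rnorm(N)
x2 <- rnorm(N)
y  <- rpois(N, lambda = exp(0.5 * x1 - 0.3 * x2))  # true Poisson means

# Fit the GLM in (1) with the canonical log link (no intercept, matching (1))
fit <- glm(y ~ 0 + x1 + x2, family = poisson(link = "log"))
coef(fit)              # estimates of beta_1, beta_2
mu_hat <- fitted(fit)  # estimated mu_i = g^{-1}(beta_1 x_i1 + beta_2 x_i2)
```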

Consider another type of model: the saturated model. It is defined as the model that has as many parameters (here the $\mu_i$) as there are observations ($N$ in this example).
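
Continuing the sketch above, one way (among several equivalent parameterizations) to fit the saturated model explicitly is to give each observation its own coefficient:

```r
# Saturated model: one indicator (hence one coefficient) per observation,
# i.e. N parameters for N data points. If some y_i = 0, the corresponding
# coefficient diverges towards -Inf under the log link, but the fitted
# value still approaches 0.
sat <- glm(y ~ 0 + factor(seq_len(N)), family = poisson(link = "log"))
max(abs(fitted(sat) - y))  # ~0: the saturated fit reproduces each y_i
```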

My understanding about the saturated model is as follows:

  1. There are $N$ parameters, $\mu_1, \mu_2, \ldots, \mu_N$, to be estimated.
  2. The difference between the saturated model and the fitted model is that in the saturated model we estimate $\mu_1, \mu_2, \ldots, \mu_N$ directly, whereas in the fitted model we estimate $\beta_1, \beta_2$.

I have 2 questions please:

  1. I see that in the fitted model there are also $N$ parameters $\mu_1, \mu_2, \ldots, \mu_N$ to be estimated (we have $N$ observations $Y_1, Y_2, \ldots, Y_N$, each $Y_i$ with its own $\mu_i$, and we estimate these $\mu_i$ by first estimating the coefficients $\beta_1, \beta_2$ and substituting them back into $(1)$).
    So why do we define the saturated model as "the model that has as many parameters as observations" when this also seems to be the case for the usual fitted model?

  2. Does the saturated model depend on the explanatory variables $X_1, X_2$? I think it does not, but in Taylor's answer to the thread "In a GLM, is the log likelihood of the saturated model always zero?", I see that the design matrix $\mathbb{X}$ becomes the identity matrix ($1$ on the diagonal, $0$ elsewhere). To me this suggests that the saturated model does depend on the explanatory variables $X_1, X_2$, but I don't know why the design matrix $\mathbb{X}$ is transformed into this form when we deal with the saturated model (a small sketch of this construction follows below).
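
To make question 2 concrete, here is a minimal R sketch (continuing the simulated data above; the construction is my own illustration, not taken from Taylor's answer) of fitting the saturated model with an identity design matrix directly:

```r
# Identity design matrix: column j is the indicator of observation j,
# so each coefficient belongs to a single observation. This is exactly
# the same model that factor(seq_len(N)) encodes above.
X_sat <- diag(N)
sat2  <- glm(y ~ 0 + X_sat, family = poisson(link = "log"))
head(cbind(y, mu_hat = fitted(sat2)))  # fitted means equal the observed y_i
```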

Thank you very much for your help!

  • @awhug: Hi, thank you very much for your comment. It clarifies things a lot. Regarding your "key point", I actually think that given a dataset and the assumption that $Y$ follows some exponential family distribution, we have only 1 saturated model (i.e. we estimate it by taking the log-likelihood, differentiating with respect to each $\mu_i$, and solving for the $\mu_i$; since there is only 1 unique set $\mu_1, \ldots, \mu_N$ that maximizes the likelihood of the saturated model, I think this implies there is only 1 saturated model). – InTheSearchForKnowledge Jun 10 '21 at 08:26
  • A saturated model may come from a specific dataset and therefore have a unique set of $\beta$ parameter estimates, but all saturated models will have the same log-likelihood and residual deviance, and make the same predictions. – awhug Jun 10 '21 at 08:49
  • @awhug: thanks for your response! I think that with regard to a saturated model, there are no $\beta$ parameters to be estimated. What we need to estimate is the $\mu_i$ directly instead, right? – InTheSearchForKnowledge Jun 10 '21 at 09:51
  • I don't think that's quite right. A fitted saturated Poisson model has as many $\beta$ parameters as there are observations ($N$) - this is what makes it a saturated model. However, if you want to know the predicted value of the outcome $y_i$ in your dataset for any given saturated model (which can also be used in the log-likelihood function), you don't really need to estimate anything at all, because you already know $\hat{\mu}_i = y_i$ (i.e. the predictions of the outcome are perfect). – awhug Jun 10 '21 at 10:35
  • @awhug: Hi, thanks for your response. I think we still have to estimate the $\mu_i$ in the saturated model, right? https://stats.stackexchange.com/questions/184753/in-a-glm-is-the-log-likelihood-of-the-saturated-model-always-zero This link shows how we can estimate the $\mu_i$. – InTheSearchForKnowledge Jun 10 '21 at 10:52
  • That answer shows why the log-likelihood is not always zero (which is true). But as the answer itself suggests, in the saturated model $\mu_i = y_i$. If you try substituting $y_i$ for the estimated $\mu_i$ in the last formula (dropping any $y_i = 0$ because $\log(0)$ is undefined), you'll find it yields the log-likelihood of any given saturated model (a short sketch of this computation appears after these comments). – awhug Jun 10 '21 at 12:07
  • [Here's some R code demonstrating what I mean](https://gist.github.com/awhug/f2461864ce39d29f8c862ac52232a09f) – awhug Jun 10 '21 at 12:31
  • @awhug: Hi, thanks for your response and the R code. There is one thing that confuses me: in your example, when you fit 2 saturated models, I see that they are 2 saturated models of DIFFERENT datasets (because the $\mathbb{X}$ are different). What I mean is that given 1 dataset and 1 assumption about the distribution, there is only 1 saturated model; the saturated model is fitted using maximum likelihood estimation, and it turns out to predict the dataset perfectly (as it is designed to). What do you think? Thank you so much for your response. – InTheSearchForKnowledge Jun 10 '21 at 17:25
  • I see what you mean, but I'm not sure I agree. If you have a dataset where you're *exclusively* using $N - 1$ explanatory variables (EVs) you've observed, so $N_\beta = N$ (including the intercept), this almost makes sense. But even then, you can keep adding interactions between EVs and/or polynomials for non-linear effects. This applies even when you've only got a few EVs ([increasing interactions example here](https://stats.stackexchange.com/a/493794)). Also, if you have more EVs than observed outcomes, there are lots of different possible combinations that form different saturated models. – awhug Jun 10 '21 at 23:47
  • Here's a small example of what I mean. Say we have $N = 5$ Poisson-distributed observed outcomes and 4 explanatory variables, $A$ through $D$. We could create a saturated model as $g(\mu_i) = \beta_0 + \beta_1 A + \cdots + \beta_4 D$. But we could also investigate an $A$-by-$B$ interaction, creating a saturated model as $g(\mu_i) = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_4 AB$. Likewise for an $A$-by-$C$ interaction, or for an $A^2$ explanatory variable, and so on. Now admittedly, the design matrix $\mathbb{X}$ will differ for these, but all the models come from the same dataset. – awhug Jun 11 '21 at 00:08
  • @awhug: Thank you very much for your help. I am wondering what technical condition makes every saturated model (on a given dataset) predict the outcome perfectly (i.e. $\hat{\mu}_i = Y_i$). Is it due to the use of maximum likelihood estimation (in which we differentiate the likelihood function with respect to each $\mu_i$)? (I see that all saturated models are different, but the one thing they have in common is the MLE of $\mu$.) Thank you for your help; this is my last question. – InTheSearchForKnowledge Jun 11 '21 at 07:42
  • No problem at all! My pleasure. The best answer I've come across on this site is [this one](https://stats.stackexchange.com/a/135819) - it's motivated through OLS, but the same principle applies to a GLM. I believe it's a result of having as many equations as unknowns, rather than a result of MLE per se. – awhug Jun 11 '21 at 08:23
  • @awhug: Hi, I would like to ask for your help one more time, please. We say that the "saturated model" is the model that has as many parameters as data points; do "parameters" here mean the $\mu_1, \ldots, \mu_N$ or the coefficients $\beta_1, \ldots, \beta_N$? (Suppose the response $Y_i$ follows a Poisson($\mu_i$) distribution.) I am so confused about it. Thank you very much for your help! – InTheSearchForKnowledge Jun 12 '21 at 07:26
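
A minimal R sketch of the log-likelihood computation awhug describes above, continuing the simulated data and the `sat` fit from the earlier sketches (`dpois()` evaluates the Poisson log-density at $\hat{\mu}_i = y_i$ directly, and handles $y_i = 0$ because $\Pr(Y = 0 \mid \lambda = 0) = 1$):

```r
# Log-likelihood of the saturated model: substitute mu_hat_i = y_i
# into the Poisson log-likelihood
ll_sat <- sum(dpois(y, lambda = y, log = TRUE))

# Matches what glm reports for the explicitly fitted saturated model
logLik(sat)  # equal to ll_sat up to numerical tolerance
ll_sat
```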
