
Suppose there is a set of $n$ independent observations $y_i$ from the exponential family of distributions. How can we prove that in a saturated GLM the maximum-likelihood fitted values satisfy $\hat\mu_i = y_i$ for all $1 \leq i \leq n$?

Edit: Using the definition proposed in the answer below, I am still not sure I get the proof right. Let $\dot g$ denote the derivative of the link function $g$ and let $V$ be the variance function. The score equations for a GLM are $X^TD(y-\mu) = 0$, where $D = \mathrm{diag}\left( [\dot g(\mu_i)V(\mu_i) ]^{-1} \right)$ is $n \times n$; $D$ has full rank since it is diagonal with nonzero diagonal entries. If this arises from a saturated model, then by the answer below $\operatorname{rank} X = p = n$. If we can somehow see that $X$ is square, the invertible matrix theorem gives an inverse $C$ of $X^T$, and since $D^{-1}CX^TD$ is then the identity, $$ y - \mu = D^{-1}CX^TD(y - \mu) = D^{-1}C\,0 = 0. $$ So is there anything that forbids linearly dependent columns of $X$? Can $X$ be singular? As far as I can tell, ruling that out would nail the proof. What do you think?
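For a concrete sanity check (not a proof), here is a minimal sketch assuming a Poisson family and the statsmodels `sm.GLM` interface; the data are made up for illustration. Fitting a saturated model with an identity model matrix, so that each observation gets its own parameter, reproduces the data exactly as fitted means:

```python
# Saturated Poisson GLM: identity model matrix, one parameter per observation.
import numpy as np
import statsmodels.api as sm

y = np.array([3.0, 1.0, 7.0, 2.0, 5.0])  # positive counts, no ties
X = np.eye(len(y))                       # saturated design: p = n, full rank

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.mu)                            # fitted means mu-hat
print(np.allclose(fit.mu, y))            # True: mu-hat_i = y_i
```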

Mikkel Rev

1 Answer


Do you have a working definition of a saturated model? It might help to develop one and work from there. If $Y$ is continuously valued with no ties, then you could say that the rank of $\mathbf{X}$, i.e. the model matrix, equals the sample size $n$. If there are ties in $Y$, you can simplify things by using a weighted likelihood with frequency weights for the non-distinct $Y$ observations and keep that definition, as sketched below.
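To make the tie-handling concrete, here is a minimal sketch of that suggestion, assuming a Poisson family and statsmodels' `freq_weights` argument (the specific family and data are illustrative, not part of the original answer):

```python
# Collapse tied observations into distinct values with frequency weights,
# then fit a saturated model with one parameter per distinct y value.
import numpy as np
import statsmodels.api as sm

y = np.array([2.0, 2.0, 5.0, 5.0, 5.0, 9.0])   # ties in y
y_distinct, counts = np.unique(y, return_counts=True)
X = np.eye(len(y_distinct))                    # square model matrix, full rank

fit = sm.GLM(y_distinct, X, family=sm.families.Poisson(),
             freq_weights=counts).fit()
print(fit.mu)                                  # equals y_distinct exactly
```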

AdamO
  • We know that $X^T D(y-\mu) = 0$, where $D$ is the diagonal matrix with entries $[g'(\mu_i) V(\mu_i)]^{-1}$ for link function $g$. By the invertible matrix theorem, $X^T D (y-\mu) = 0$ has only the trivial solution $y-\mu = 0$ if $X^T D$ is square and has full rank; establishing that would complete the proof. Is there something that forbids $X^T D$ from being non-square in a saturated model? Is it impossible to have more predictors than observations? Can we somehow guarantee that $D$ has full rank? It is diagonal, but is that enough? – Mikkel Rev Nov 30 '16 at 03:09
  • Actually, $D$ has full rank, so it suffices to see that $X$ is square. Is there something in the definition of a GLM that forbids $X$ from being $m \times n$ with $m > n$? Intuitively that would be a problem because it would force linearly dependent rows in $X$. – Mikkel Rev Nov 30 '16 at 03:27
  • @MariusJonsson $\mathbf{X}^T\mathbf{D} \left( y- \mathbf{X}\hat{\beta} \right) = 0$ is the likelihood equation for a GLM. You should focus on the structure of $\mathbf{X}$ as it relates to $\beta$, especially in terms of being a projection. – AdamO Dec 02 '16 at 18:00
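To illustrate the algebra discussed in these comments (the shapes and entries below are arbitrary assumptions, not from the thread): when $X$ is square with full rank and $D$ is diagonal with nonzero entries, $X^T D$ is invertible, so the only solution of $X^T D z = 0$ is $z = y - \mu = 0$.

```python
# Check that X^T D is invertible for square full-rank X and nonsingular diagonal D.
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))        # square; full rank with probability 1
D = np.diag(rng.uniform(0.5, 2.0, n))  # diagonal, strictly positive entries

A = X.T @ D
print(np.linalg.matrix_rank(A))        # n: A is invertible
print(np.linalg.solve(A, np.zeros(n))) # zero vector: only the trivial solution
```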