
Suppose there is a set of $n$ independent observations $y_i$ from the exponential family of distributions. How can we prove that in a saturated GLM the maximum-likelihood fitted values satisfy $\hat\mu_i = y_i$ for all $1 \leq i \leq n$?

Edit: Using the definition proposed in the answer below, I am still not sure I get the proof right. Let $\dot g$ denote the derivative of the link function $g$ and let $V$ be the variance function. The score equations for a GLM are $X^TD(y-\mu) = 0$, where $D = \mathrm{diag}\left( [\dot g(\mu_i)V(\mu_i) ]^{-1} \right)$ is $n \times n$; $D$ has full rank since it is diagonal with nonzero diagonal entries. If this arises from a saturated model, then by the answer below $\operatorname{rank} X = p = n$. If we can somehow see that $X$ is square, the invertible matrix theorem gives an inverse $C$ of $X^T$, and since $D^{-1}CX^TD$ is then the identity, $$ y - \mu = D^{-1}CX^TD(y - \mu) = D^{-1}C\,0 = 0. $$ So is there anything that forbids linearly dependent columns of $X$? Can $X$ be singular? As far as I can tell, ruling that out would nail the proof. What do you think?
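For a concrete sanity check (not a proof), here is a minimal sketch assuming a Poisson family and the statsmodels `sm.GLM` interface; the data are made up for illustration. Fitting a saturated model with an identity model matrix, so that each observation gets its own parameter, reproduces the data exactly as fitted means:

```python
# Saturated Poisson GLM: identity model matrix, one parameter per observation.
import numpy as np
import statsmodels.api as sm

y = np.array([3.0, 1.0, 7.0, 2.0, 5.0])  # positive counts, no ties
X = np.eye(len(y))                       # saturated design: p = n, full rank

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.mu)                            # fitted means mu-hat
print(np.allclose(fit.mu, y))            # True: mu-hat_i = y_i
```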

Mikkel Rev

1 Answer


Do you have a working definition of a saturated model? It might help to develop one and work from there. If $Y$ is continuously valued with no ties, then you could say that the rank of $\mathbf{X}$, i.e. the model matrix, equals the sample size $n$. If there are ties in $Y$, you can simplify things by using a weighted likelihood with frequency weights for the non-distinct $Y$ observations and keep that definition, as sketched below.
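To make the tie-handling concrete, here is a minimal sketch of that suggestion, assuming a Poisson family and statsmodels' `freq_weights` argument (the specific family and data are illustrative, not part of the original answer):

```python
# Collapse tied observations into distinct values with frequency weights,
# then fit a saturated model with one parameter per distinct y value.
import numpy as np
import statsmodels.api as sm

y = np.array([2.0, 2.0, 5.0, 5.0, 5.0, 9.0])   # ties in y
y_distinct, counts = np.unique(y, return_counts=True)
X = np.eye(len(y_distinct))                    # square model matrix, full rank

fit = sm.GLM(y_distinct, X, family=sm.families.Poisson(),
             freq_weights=counts).fit()
print(fit.mu)                                  # equals y_distinct exactly
```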

AdamO
  • We know that $X^T D(y-\mu) = 0$, where $D$ is the diagonal matrix with entries $[g'(\mu_i) V(\mu_i)]^{-1}$ for link function $g$. By the invertible matrix theorem, $X^T D (y-\mu) = 0$ has only the trivial solution $y-\mu = 0$ if $X^T D$ is square and has full rank; establishing that would complete the proof. Is there something that forbids $X^T D$ from being non-square in a saturated model? Is it impossible to have more predictors than observations? Can we somehow guarantee that $D$ has full rank? It is diagonal, but is that enough? – Mikkel Rev Nov 30 '16 at 03:09
  • Actually, $D$ has full rank, so it suffices to see that $X$ is square. Is there something in the definition of a GLM that forbids $X$ from being $m \times n$ with $m > n$? Intuitively that would be a problem because it would force linearly dependent rows in $X$. – Mikkel Rev Nov 30 '16 at 03:27
  • @MariusJonsson $\mathbf{X}^T\mathbf{D} \left( y- \mathbf{X}\hat{\beta} \right) = 0$ is the likelihood equation for a GLM. You should focus on the structure of $\mathbf{X}$ as it relates to $\beta$, especially in terms of being a projection. – AdamO Dec 02 '16 at 18:00
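To illustrate the algebra discussed in these comments (the shapes and entries below are arbitrary assumptions, not from the thread): when $X$ is square with full rank and $D$ is diagonal with nonzero entries, $X^T D$ is invertible, so the only solution of $X^T D z = 0$ is $z = y - \mu = 0$.

```python
# Check that X^T D is invertible for square full-rank X and nonsingular diagonal D.
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))        # square; full rank with probability 1
D = np.diag(rng.uniform(0.5, 2.0, n))  # diagonal, strictly positive entries

A = X.T @ D
print(np.linalg.matrix_rank(A))        # n: A is invertible
print(np.linalg.solve(A, np.zeros(n))) # zero vector: only the trivial solution
```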