
I understand that, in general, MLE has no requirement that the observations be identically or even independently distributed.

But in order to decompose the MLE objective in the conventional way (e.g. to express it as a sum of log-likelihoods), my text (and several others I've skimmed) describes the observations as being "i.i.d.".

For example, in discussing how linear regression can be seen as an MLE, my text says:

Since the examples are assumed to be i.i.d., the conditional log-likelihood is given by $$\sum_{i=1}^{m} \log p\left(y^{(i)}\mid\mathbf{x}^{(i)}; \boldsymbol{\theta}\right) = -m\log\sigma-\frac{m}{2}\log(2\pi)-\sum_{i=1}^{m}\frac{\lVert \widehat{y}^{(i)}- y^{(i)}\rVert^2}{2\sigma^2}\tag{5.65}$$

I see how this works, but the only assumption needed here is that the $p\left(y^{(i)}\mid\mathbf{x}^{(i)}; \boldsymbol{\theta}\right)$ are independent, and in fact they are explicitly, by construction, not identically distributed. As the text says just a few lines earlier:

we define $p\left(y|\mathbf{x}\right) = \mathcal{N}(y;\widehat{y}(\mathbf{x};\mathbf{w}),\sigma^2)$. The function $\widehat{y}(\mathbf{x};\mathbf{w})$ gives the mean of the Gaussian.

In other words (as is clear from the derivation of (5.65)), each observation's distribution is assumed to have a different mean ($\widehat{y}^{(i)}$).
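
To spell out the step I believe (5.65) actually relies on: independence alone factorizes the joint conditional density, and taking the log turns the product into a sum, with each factor allowed its own mean:

$$\log p\left(y^{(1)},\dots,y^{(m)}\mid\mathbf{x}^{(1)},\dots,\mathbf{x}^{(m)};\boldsymbol{\theta}\right)=\log\prod_{i=1}^{m}p\left(y^{(i)}\mid\mathbf{x}^{(i)};\boldsymbol{\theta}\right)=\sum_{i=1}^{m}\log\mathcal{N}\!\left(y^{(i)};\widehat{y}^{(i)},\sigma^{2}\right)$$

Nothing in that step appears to need identical distributions; only the factorization (i.e. independence) is used.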

Am I misunderstanding this? Do the observation distributions need to be identical in order to decompose the log-likelihood this way (for example, in establishing that a least-squares estimator is an MLE), or is it sufficient that they be independent?
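
(For concreteness, here is a small numerical sketch of what I mean, with made-up data and NumPy assumed: the joint negative log-likelihood of independent Gaussians that are *not* identically distributed, because each has its own mean $\widehat{y}^{(i)}$, still decomposes into the sum of per-observation terms appearing in (5.65).)

```python
# Minimal sketch (hypothetical data): independent Gaussian observations
# y_i ~ N(yhat_i, sigma^2) with *different* means yhat_i.
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 5, 0.7
x = rng.normal(size=m)
w, b = 2.0, -1.0
yhat = w * x + b                       # each observation gets its own mean
y = yhat + sigma * rng.normal(size=m)  # simulated targets

# Per-observation negative log-densities: independence is enough to add these.
per_obs_nll = 0.5 * np.log(2 * np.pi) + np.log(sigma) + (yhat - y) ** 2 / (2 * sigma ** 2)

# Closed form corresponding to (the negative of) equation (5.65).
total_nll = m * np.log(sigma) + 0.5 * m * np.log(2 * np.pi) + np.sum((yhat - y) ** 2) / (2 * sigma ** 2)

print(np.isclose(per_obs_nll.sum(), total_nll))  # True: the decomposition holds
```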

orome
  • Decomposing as a sum of log-likelihood contributions from each individual observation only needs independence, not i.i.d. – kjetil b halvorsen Aug 24 '17 at 14:57
  • @kjetilbhalvorsen: Good, that makes sense. Any idea what the text is trying to say there, or is it just a mistake? – orome Aug 24 '17 at 14:58
  • What is it the text says that you think is wrong? The likelihood you have written seems to be a concentrated likelihood; the parameters $\beta$ in the mean function seem to have been removed, so it is useful for the estimation of the variance parameter. You didn't give us much from the text, but nothing in there needs i.i.d. Some results, for example asymptotics of $\hat{\beta}$, might need some restrictions on just *how different* the distributions can be. – kjetil b halvorsen Aug 24 '17 at 15:02
  • @kjetilbhalvorsen: That the $p\left(y^{(i)}|\mathbf{x}^{(i)}\right)$ are identically distributed (after defining them, as is obviously necessary for the demonstration, not to be): "i.I.d". – orome Aug 24 '17 at 15:06
  • That does not make sense. It is the $y_i$ that have distributions, not the $p(y_i \mid x_i)$ – kjetil b halvorsen Aug 24 '17 at 15:09
  • Sorry, being sloppy: That the distributions of the $y^{(i)}$, *namely* $p\left(y^{(i)}\mid\mathbf{x}^{(i)}; \boldsymbol{\theta}\right)$, are identical, when in fact they are not, since each is defined to have a mean of $\widehat{y}^{(i)} \equiv\widehat{y}(\mathbf{x}^{(i)};\mathbf{w})$ (the text conflates $\mathbf{w}$ and $\boldsymbol{\theta}$). – orome Aug 24 '17 at 15:29
