I understand that, in general, MLE has no requirement that the observation probabilities be identically or even independently distributed.
But in order to decompose the MLE in the conventional way (e.g. to express it as a sum of log-likelihoods), my text (and several others I've skimmed) describes the observations as being "i.i.d.".
For example, in discussing how linear regression can be seen as an MLE, my text says:
> Since the examples are assumed to be i.i.d., the conditional log-likelihood is given by $$\sum_{i=1}^{m} \log p\left(y^{(i)}\mid\mathbf{x}^{(i)}; \boldsymbol{\theta}\right) = -m\log\sigma-\frac{m}{2}\log(2\pi)-\sum_{i=1}^{m}\frac{\lVert \widehat{y}^{(i)}- y^{(i)}\rVert^2}{2\sigma^2}\tag{5.65}$$
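(For concreteness, the decomposition step I have in mind is just
$$\log\prod_{i=1}^{m} p\left(y^{(i)}\mid\mathbf{x}^{(i)};\boldsymbol{\theta}\right)=\sum_{i=1}^{m}\log p\left(y^{(i)}\mid\mathbf{x}^{(i)};\boldsymbol{\theta}\right),$$
which, as far as I can tell, requires only that the joint likelihood factor across examples, i.e. independence.)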
I see how this works, but the only assumption actually used here is that the $p\left(y^{(i)}\mid\mathbf{x}^{(i)}; \boldsymbol{\theta}\right)$ are independent; in fact they are explicitly, by construction, not identically distributed. As the text says just a few lines earlier:
> we define $p\left(y\mid\mathbf{x}\right) = \mathcal{N}(y;\widehat{y}(\mathbf{x};\mathbf{w}),\sigma^2)$. The function $\widehat{y}(\mathbf{x};\mathbf{w})$ gives the mean of the Gaussian.
In other words (as clearly expressed in the derivation of (5.65)), each observation's distribution is assumed to have a different mean, $\widehat{y}^{(i)} = \widehat{y}\left(\mathbf{x}^{(i)};\mathbf{w}\right)$.
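Making that explicit (this is my own restatement of the text's definition), observation $i$ is modeled with its own density, sharing $\sigma$ but with a mean that varies with the input:
$$p\left(y^{(i)}\mid\mathbf{x}^{(i)};\boldsymbol{\theta}\right)=\frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{\lVert\widehat{y}^{(i)}-y^{(i)}\rVert^{2}}{2\sigma^{2}}\right),\qquad\widehat{y}^{(i)}=\widehat{y}\left(\mathbf{x}^{(i)};\mathbf{w}\right),$$
so two observation distributions coincide only if their inputs happen to produce the same prediction.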
Am I misunderstanding this? Do observation probabilities need to be identically distributed in order to decompose them this way (for example, in establishing that a least-squares estimator is an MLE), or is it sufficient that they be independent?
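For what it's worth, here is a quick numerical sanity check I put together (just a sketch; the toy data and names `y_hat` and `sigma` are mine, not the text's) suggesting that (5.65) holds even when every example has a different mean:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, sigma = 5, 0.7

# Per-example predictions y_hat^(i): deliberately all different, so the
# observation distributions are independent but NOT identically distributed.
y_hat = rng.normal(size=m)
y = y_hat + rng.normal(scale=sigma, size=m)

# Left-hand side: sum of per-example Gaussian log-densities
# (this step uses independence only).
lhs = norm.logpdf(y, loc=y_hat, scale=sigma).sum()

# Right-hand side: the closed form of (5.65).
rhs = (-m * np.log(sigma) - (m / 2) * np.log(2 * np.pi)
       - np.sum((y_hat - y) ** 2) / (2 * sigma ** 2))

print(np.isclose(lhs, rhs))  # True: the two sides agree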