I am facing a similar situation to the one addressed in this question, and the accepted answer there has helped me a lot, but I need to resolve a doubt.
The accepted answer draws on the excellent resource "Matrix Cookbook" to show that
$$\frac{\partial \mathbf{L}}{\partial \mathbf{\Sigma}}= -1/2\left(\mathbf{\Sigma^{-1}-\Sigma^{-1}(y-\mu)(y-\mu)'\Sigma^{-1}}\right)$$
where $\mathbf{L}$ is the log-likelihood of the Gaussian vector $\mathbf{y}$ with covariance matrix $\mathbf{\Sigma}$ and mean $\mathbf{\mu}$.
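For what it's worth, the formula itself checks out numerically. Here is a quick NumPy sketch (my own, not from the linked answer) comparing it against central finite differences of $L = -\tfrac{1}{2}\log|\mathbf{\Sigma}| - \tfrac{1}{2}\mathbf{(y-\mu)'\Sigma^{-1}(y-\mu)}$ with constants dropped, perturbing each entry of $\mathbf{\Sigma}$ independently, which I believe is the convention behind this Matrix Cookbook result:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
y = rng.normal(size=n)
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)  # a well-conditioned SPD covariance

def loglik(S):
    """Gaussian log-likelihood of y, up to an additive constant."""
    d = y - mu
    return -0.5 * np.log(np.linalg.det(S)) - 0.5 * d @ np.linalg.inv(S) @ d

# analytic gradient, as given in the accepted answer
d = y - mu
Si = np.linalg.inv(Sigma)
grad = -0.5 * (Si - Si @ np.outer(d, d) @ Si)

# central finite differences, one entry of Sigma at a time
eps = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num[i, j] = (loglik(Sigma + E) - loglik(Sigma - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # the two gradients agree
```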
If I'm not mistaken, to solve for the $\mathbf{\Sigma}$ that maximizes $\mathbf{L}$, one would then set $\frac{\partial \mathbf{L}}{\partial \mathbf{\Sigma}}=0$ and get
$$\mathbf{\Sigma^{-1}=\Sigma^{-1}(y-\mu)(y-\mu)'\Sigma^{-1}}$$
Pre- and post-multiplying by $\mathbf{\Sigma}$ gives
$$\mathbf{\Sigma=(y-\mu)(y-\mu)'}$$
Which brings me to my doubt: the RHS here is a rank-one outer product, so it has determinant 0 and is not invertible, yet the covariance matrix $\mathbf{\Sigma}$ on the LHS must be invertible. Likewise, the intermediate equation
$$\mathbf{I=\Sigma^{-1}(y-\mu)(y-\mu)'}$$
seems to be a contradiction since $\mathbf{[(y-\mu)(y-\mu)']^{-1}}$ does not exist.
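Just to confirm my claim about the RHS, a two-line NumPy check (using a stand-in vector `d` for $\mathbf{y-\mu}$) shows the outer product really is rank one and singular:

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.normal(size=4)   # stands in for y - mu
M = np.outer(d, d)       # the outer product (y - mu)(y - mu)'

print(np.linalg.matrix_rank(M))  # 1: every column is a multiple of d
print(np.linalg.det(M))          # essentially 0, so M has no inverse
```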
Anyway, what this seems to say is that the first-order condition for maximizing $\mathbf{L}$ requires $\mathbf{\Sigma}$ to be rank deficient with determinant 0, in which case it could not really be called a covariance matrix, and $\mathbf{L}$ itself would be undefined at its maximizer.
And yet the answerer says they use these formulae all the time for ML parameter estimation, so I guess I am missing something. Please help.