As @jbowman explains in the comment above, that specific form of the deviance,
$D_i = -2 \sum_k n_{ik} \log(p_{ik})$, holds only for the classification
problems discussed in the referenced post.
For linear models with Normal errors, the corresponding result, $SSE = D = \sum_i (y_i - \mu_i)^2$,
can be derived from the general definitions of the deviance ($D$) and the
scaled deviance ($D^*$) together with the Normal pdf, as follows:
First remember that
$$
f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right).
$$
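Taking logs, the log-likelihood of a single observation is
$$
l(\mu; y) = \log f(y; \mu, \sigma^2) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y-\mu)^2}{2\sigma^2},
$$
and in the saturated model each observation is fitted exactly, i.e. $\mu = y$.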
The scaled deviance of a single observation is therefore
$$
\begin{align}
D^{*}(y, \mu) &= 2l(y;y) - 2l(\mu;y) \\
&= 2\log(f(y;y)) - 2\log(f(y;\mu))\\
&= 2 \bigg[-\frac{1}{2}\log(2 \pi \sigma^2) - \frac{(y - y)^2}{2\sigma^2}\bigg]
- 2 \bigg[-\frac{1}{2}\log(2 \pi \sigma^2) - \frac{(y - \mu)^2}{2\sigma^2}\bigg]\\
&= \frac{(y - \mu)^2}{\sigma^2}
\end{align}
$$
and $D^{*}(y, \mu) = \frac{D(y, \mu)}{\phi} = \frac{D(y, \mu)}{\sigma^2}$, where the last equality holds since for a Normal distribution the dispersion parameter equals the variance, i.e. $\phi = \sigma^2$. Therefore
$$
D(y, \mu) = (y - \mu)^2.
$$
Since the deviance is additive over independent observations, summing over the $n$ observations with fitted means $\mu_i$ gives $D = \sum_{i=1}^{n} (y_i - \mu_i)^2 = SSE$, as claimed.
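As a quick numerical check, here is a minimal Python sketch (the simulated data, least-squares fit, and variable names are all illustrative, and $\sigma^2$ is treated as known, as in the derivation) that computes the scaled deviance directly from the two log-likelihoods and confirms that $\phi \, D^{*} = \sigma^2 D^{*}$ equals the residual sum of squares:

```python
import numpy as np
from scipy.stats import norm

# Simulate data from a simple linear model (illustrative values)
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
sigma = 0.5  # sigma^2 treated as known/fixed, as in the derivation
y = 1.0 + 2.0 * x + rng.normal(scale=sigma, size=n)

# Ordinary least-squares fit; mu_hat are the fitted means mu_i
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_hat = X @ beta_hat

# Scaled deviance D* = 2 l(y; y) - 2 l(mu; y), summed over observations
ll_saturated = norm.logpdf(y, loc=y, scale=sigma).sum()  # saturated model: mu_i = y_i
ll_fitted = norm.logpdf(y, loc=mu_hat, scale=sigma).sum()
scaled_deviance = 2.0 * (ll_saturated - ll_fitted)

# D = phi * D* with phi = sigma^2, and it should equal the SSE
sse = np.sum((y - mu_hat) ** 2)
print(np.allclose(sigma**2 * scaled_deviance, sse))  # True
```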
For a good introduction to deviance, see
McCullagh, P. and Nelder, J. A. (1989). *Generalized Linear Models* (2nd ed.). Chapman and Hall, pp. 23-25, 33-36.