Consider the classical linear model $$Y=X\beta +\epsilon,$$ where $Y \in \mathbb{R}^n$ is the vector of observations, $X\in \mathbb{R}^{n\times p}$ is the known covariate matrix, $\beta \in \mathbb{R}^p$ is the unknown parameter with $p < n$, and $\epsilon \in \mathbb{R}^{n}$ is the noise, $\epsilon \sim \mathcal{N}(0,\sigma^{2}I)$.
The classical least squares estimator is $$\hat{\beta}= (X^TX)^{-1}X^TY,$$ assuming $X$ has full column rank.
If $\beta^{0}$ is the true parameter, the (scaled) prediction error is $E=||X(\hat{\beta}- \beta^{0})||_2^2/\sigma^2$. Since $Y = X\beta^{0} + \epsilon$, we have $\hat{\beta}- \beta^{0} = (X^TX)^{-1}X^T\epsilon$, and hence $$E=||X(\hat{\beta}- \beta^{0})||_2^2/\sigma^2=||X(X^TX)^{-1}X^T \epsilon||_2^2/\sigma^2=||\gamma||_2^2/\sigma^2,$$ where $\gamma := X(X^TX)^{-1}X^T \epsilon$.
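To fix ideas, here is a minimal numpy sketch of the setup (the dimensions, seed, and $\sigma$ below are my own arbitrary choices, not from the book), checking the identity $X(\hat{\beta}-\beta^{0}) = \gamma$ numerically:

```python
# Minimal sketch of the model above; n, p, sigma and the seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 5, 1.5

X = rng.normal(size=(n, p))                   # known covariate matrix
beta0 = rng.normal(size=p)                    # true parameter beta^0
eps = rng.normal(scale=sigma, size=n)         # eps ~ N(0, sigma^2 I)
Y = X @ beta0 + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # hat{beta} = (X^T X)^{-1} X^T Y
P = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix X (X^T X)^{-1} X^T
gamma = P @ eps

print(np.allclose(X @ (beta_hat - beta0), gamma))  # the identity in the display above
E = np.sum(gamma**2) / sigma**2                    # E = ||gamma||_2^2 / sigma^2
```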
It is easy to see that $\gamma \sim \mathcal{N}(0,\sigma^2 X(X^TX)^{-1}X^T)$. However, it is claimed in High Dimensional Statistics by Bühlmann and van de Geer (on page 101) that $E$ follows a chi-square distribution with $p$ degrees of freedom. I cannot see how this is true (it would be true if $\gamma \sim \mathcal{N}(0,D)$ for some diagonal matrix $D$ with non-negative diagonal entries). Am I missing something here?
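For what it's worth, a quick simulation (again only a sketch, with the same arbitrary $n$, $p$, $\sigma$ as above) lets one compare the empirical distribution of $E$ for a fixed design $X$ with $\chi^2_p$:

```python
# Sketch: draw E = ||P eps||^2 / sigma^2 repeatedly for a fixed X and compare with chi^2_p.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma, reps = 200, 5, 1.5, 20000

X = rng.normal(size=(n, p))
P = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix X (X^T X)^{-1} X^T

eps = rng.normal(scale=sigma, size=(reps, n))    # each row is one draw of eps
E_samples = np.sum((eps @ P) ** 2, axis=1) / sigma**2  # P symmetric, so each row of eps @ P is (P eps)^T

print(E_samples.mean())                               # should be close to p
print(stats.kstest(E_samples, stats.chi2(df=p).cdf))  # KS test against chi^2_p
```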