In Elements of Statistical Learning, p. 47, at the very bottom, it states that $\hat{\beta}$ and $\hat{\sigma}^2$ are statistically independent.
Is the claim that they are independent conditional on $X$, or that they are marginally independent? The book seems to mean the latter, which confuses me, because it seems to me that they are not marginally independent.
We know that $$ \hat{\beta} = (X^TX)^{-1}X^Ty$$
$$\hat{\sigma}^2 = \frac{1}{N - p - 1} ||y - \hat{y} ||_2^2 = \frac{1}{N - p - 1} ||y - X\hat{\beta} ||_2^2 $$
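To make sure I'm reading these formulas correctly, here is a minimal numpy sketch of the two estimators. The design $X$, the true coefficients, and the noise level are all made up for illustration and are not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data just to illustrate the formulas: N observations, p predictors,
# plus an intercept column, so X is N x (p + 1).
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=2.0, size=N)

# beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# sigma_hat^2 = ||y - X beta_hat||^2 / (N - p - 1)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (N - p - 1)
```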
Looking at these formulas, $\hat{\sigma}^2$ depends on $\hat{\beta}$ through the residuals $y - X\hat{\beta}$, so it is not obvious to me how the two could be marginally independent.
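Here is what I mean by "conditioned on $X$", as a rough simulation sketch (again with a made-up fixed design and Gaussian errors): hold $X$ fixed, redraw $y$ many times, and look at the sample correlation between one coordinate of $\hat{\beta}$ and $\hat{\sigma}^2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design held constant across replications (made up for illustration).
N, p = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, -0.5, 2.0])
sigma = 1.5

n_sims = 20_000
beta1_hats = np.empty(n_sims)
sigma2_hats = np.empty(n_sims)

for s in range(n_sims):
    # New Gaussian noise each replication; X is held fixed throughout.
    y = X @ beta_true + rng.normal(scale=sigma, size=N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ beta_hat
    beta1_hats[s] = beta_hat[1]
    sigma2_hats[s] = residuals @ residuals / (N - p - 1)

# Sample correlation between one coefficient estimate and the variance estimate.
print(np.corrcoef(beta1_hats, sigma2_hats)[0, 1])
```

If the book's claim is about independence conditional on $X$, I would expect this correlation to be close to zero; what I'm unsure about is whether anything like that is supposed to hold marginally, when $X$ itself is treated as random.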