A geometric, intuitive view
You can view the observation vector $y_1, y_2, y_3, \dots, y_n$ as being partitioned into two parts. These two parts are components in two orthogonal subspaces.
One part is the vector of fitted values $\hat y_1, \hat y_2, \hat y_3, \dots, \hat y_n$. The fitted vector lies in the subspace spanned by the regressor vectors $x$.
The other part is the vector of residuals $\epsilon_1, \epsilon_2, \epsilon_3, \dots, \epsilon_n$. The residual vector lies in the orthogonal complement of that subspace.
In this view of two subspaces, you can see the distribution of $y_1, \dots, y_n$ as a spherically symmetric $n$-dimensional multivariate normal distribution. It splits into $d$ independent normal components in the subspace of the fit and $n-d$ independent normal components in the residual space. These two parts are independent of each other, and the squared length of the residual part, the RSS, is after division by $\sigma^2$ a chi-squared variable with $n-d$ degrees of freedom.
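As a quick check of this decomposition, here is a minimal simulation sketch (the design matrix, coefficients, and noise level are illustrative choices, not from the text above): it projects $y$ onto the column space of a matrix $X$, verifies that the fitted part and the residual part are orthogonal, and compares the simulated RSS$/\sigma^2$ values against the $\chi^2_{n-d}$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, sigma = 100, 3, 2.0
X = rng.normal(size=(n, d))        # regressors spanning a d-dimensional subspace
beta = np.array([1.0, -2.0, 0.5])  # true mean X @ beta lies inside that subspace

def rss_once():
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_fit = X @ beta_hat              # component in the model subspace
    resid = y - y_fit                 # component in the orthogonal complement
    assert abs(y_fit @ resid) < 1e-6  # the two parts are orthogonal
    return resid @ resid              # RSS = squared length of the residual part

samples = np.array([rss_once() for _ in range(5000)]) / sigma**2
# RSS / sigma^2 should match a chi-squared distribution with n - d degrees of freedom
print(stats.kstest(samples, stats.chi2(df=n - d).cdf))
```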

The above explains the partition into a model part and a residual part. You can extend this to multiple models when they are nested. For instance, in the above image
- the plane is the space of the larger model, a sum of two vectors, $y_{fit 2} = a x_1 + b x_2$,
- and the black line inside that plane is the smaller model $y_{fit 1} = a x_1$.
If the space of $y_{fit 1}$ is inside the space of $y_{fit 2}$, then you can find the fit $y_{fit 1}$ by first fitting $y_{fit 2}$ and then treating that fit as the observation when fitting $y_{fit 1}$.
The difference between $y_{fit 1}$ and $y_{fit 2}$ can be seen as an additional residual: it is the residual from fitting $y_{fit 1}$ when starting from $y_{fit 2}$, and this difference vector is multivariate normal distributed. The difference in RSS between the two models is the squared length of this vector, so after division by $\sigma^2$ it is a chi-squared variable with $d_2 - d_1$ degrees of freedom, where $d_1$ and $d_2$ are the dimensions of the two models.
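The two-stage fitting claim and the RSS identity are easy to verify numerically. The following sketch (with illustrative data; the names x1, x2 and fit are mine, not from the text) fits the small model both directly to $y$ and to the fitted values of the large model, and checks that the drop in RSS equals the squared length of $y_{fit 2} - y_{fit 1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.5 * x1 + rng.normal(size=n)  # true mean lies on the line spanned by x1

def fit(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

y_fit2 = fit(np.column_stack([x1, x2]), y)  # big model: a*x1 + b*x2
y_fit1 = fit(x1[:, None], y)                # small model: a*x1
y_fit1_via2 = fit(x1[:, None], y_fit2)      # small model fitted to y_fit2

print(np.allclose(y_fit1, y_fit1_via2))     # True: two-stage fit agrees

rss1 = np.sum((y - y_fit1) ** 2)
rss2 = np.sum((y - y_fit2) ** 2)
extra = np.sum((y_fit2 - y_fit1) ** 2)      # the "additional residual", squared
print(np.isclose(rss1 - rss2, extra))       # True: RSS difference = its length^2
```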
This view of a multivariate normal distribution that can be split up into separate, independent, lower-dimensional multivariate normal parts only holds when the true model, the true population mean, actually lies inside the space of the (smaller) model.
The statistic
$$F = \frac{(RSS_1 - RSS_2)/(d_2 - d_1)}{RSS_2/(n - d_2)}$$
is a ratio of two independent chi-squared variables, each divided by its degrees of freedom, and it is only F-distributed when the null hypothesis is true, because only then is the numerator a central chi-squared variable.
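Finally, a hedged simulation of this last point (sample size, seed, and variable names are arbitrary choices): when the data are generated under the null hypothesis, so the true mean lies on the line of the smaller model, the ratio built from the two independent chi-squared parts should follow the $F(d_2 - d_1,\, n - d_2)$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 40, 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X1, X2 = x1[:, None], np.column_stack([x1, x2])
d1, d2 = 1, 2

def rss(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2)

fs = []
for _ in range(reps):
    y = 1.5 * x1 + rng.normal(size=n)  # null is true: the x2 coefficient is 0
    rss1, rss2 = rss(X1, y), rss(X2, y)
    fs.append(((rss1 - rss2) / (d2 - d1)) / (rss2 / (n - d2)))

# the simulated statistics should match the F(d2 - d1, n - d2) distribution
print(stats.kstest(fs, stats.f(dfn=d2 - d1, dfd=n - d2).cdf))
```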