Let us show the result for the general case, of which your formula for the test statistic is a special case. In general, we need to verify that the statistic can, according to the characterization of the $F$ distribution, be written as the ratio of independent $\chi^2$ r.v.s, each divided by its degrees of freedom.
Let $H_{0}:R^\prime\beta=r$ with $R$ and $r$ known and nonrandom, and with $R:k\times q$ having full column rank $q$. This represents $q$ linear restrictions on $k$ regressors which (unlike in the OP's notation) include the constant term. So, in @user1627466's example, $p-1$ corresponds to the $q=k-1$ restrictions of setting all slope coefficients to zero.
In view of $Var\bigl(\hat{\beta}_{\text{ols}}\bigr)=\sigma^2(X'X)^{-1}$, we have
\begin{eqnarray*}
R^\prime(\hat{\beta}_{\text{ols}}-\beta)\sim N\left(0,\sigma^{2}R^\prime(X^\prime X)^{-1} R\right),
\end{eqnarray*}
so that (with $B^{-1/2}=\{R^\prime(X^\prime X)^{-1} R\}^{-1/2}$ being a "matrix square root" of $B^{-1}=\{R^\prime(X^\prime X)^{-1} R\}^{-1}$, i.e., a matrix satisfying $(B^{-1/2})^\prime B^{-1/2}=B^{-1}$, obtained via, e.g., a Cholesky decomposition)
\begin{eqnarray*}
n:=\frac{B^{-1/2}}{\sigma}R^\prime(\hat{\beta}_{\text{ols}}-\beta)\sim N(0,I_{q}),
\end{eqnarray*}
as
\begin{eqnarray*}
Var(n)&=&\frac{B^{-1/2}}{\sigma}R^\prime Var\bigl(\hat{\beta}_{\text{ols}}\bigr)R\frac{(B^{-1/2})^{\prime}}{\sigma}\\
&=&\frac{B^{-1/2}}{\sigma}\sigma^2B\frac{(B^{-1/2})^{\prime}}{\sigma}=B^{-1/2}B(B^{-1/2})^{\prime}=I,
\end{eqnarray*}
where the second line uses the variance of the OLSE and the last equality follows from the defining property $(B^{-1/2})^{\prime}B^{-1/2}=B^{-1}$ of the matrix square root.
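For concreteness, here is a minimal R sketch (with an arbitrary design matrix and the "all slopes zero" restriction assumed purely for illustration) showing that a Cholesky-based choice of $B^{-1/2}$ indeed standardizes the variance as claimed:
# illustrative check that a Cholesky-based B^{-1/2} satisfies B^{-1/2} B (B^{-1/2})' = I,
# so that n has identity covariance (X and R below are arbitrary examples)
set.seed(1)
X <- cbind(1, matrix(rnorm(60), ncol = 3))   # 20 x 4 design including a constant
R <- rbind(0, diag(3))                       # restricts the three slope coefficients
B <- t(R) %*% solve(crossprod(X)) %*% R      # B = R'(X'X)^{-1}R
Bm12 <- chol(solve(B))                       # upper-triangular U with U'U = B^{-1}
round(Bm12 %*% B %*% t(Bm12), 12)            # identity matrix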
This vector, as shown in the answer that you link to (see also here), is independent of $$d:=(n-k)\frac{\hat{\sigma}^{2}}{\sigma^{2}}\sim\chi^{2}_{n-k},$$
where $\hat{\sigma}^{2}=y'M_Xy/(n-k)$ is the usual unbiased error variance estimate, with $M_{X}=I-X(X'X)^{-1}X'$ the "residual maker" matrix from regressing on $X$.
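As an aside, the claimed $\chi^{2}_{n-k}$ distribution of $d$ is also easy to check by simulation; here is a quick sketch (sample size, coefficients and error variance chosen arbitrarily for illustration):
# quick Monte Carlo sketch (arbitrary n, k, beta, sigma assumed) that
# (n-k) * sigmahat^2 / sigma^2 behaves like a chi^2_{n-k} variable
set.seed(2)
n <- 30; k <- 4; sigma <- 2
X <- cbind(1, matrix(rnorm(n*(k-1)), ncol = k-1))
d <- replicate(10000, {
  y <- drop(X %*% c(1, .5, -1, 2)) + rnorm(n, sd = sigma)
  sum(resid(lm(y ~ X - 1))^2) / sigma^2      # equals (n-k) * sigmahat^2 / sigma^2
})
c(mean(d), n - k)                            # both close to n - k = 26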
So, as $n^\prime n$ is the sum of squares of $q$ independent standard normals,
\begin{eqnarray*}
\frac{\overbrace{n^\prime n}^{\sim\chi^{2}_{q}}/q}{d/(n-k)}=\frac{(\hat{\beta}_{\text{ols}}-\beta)^\prime R\left\{R^\prime(X^\prime X)^{-1}R\right\}^{-1}R^\prime(\hat{\beta}_{\text{ols}}-\beta)/q}{\hat{\sigma}^{2}}\sim F_{q,n-k}.
\end{eqnarray*}
In particular, under $H_{0}:R^\prime\beta=r$, this reduces to the statistic
\begin{eqnarray}
F=\frac{(R^\prime\hat{\beta}_{\text{ols}}-r)^\prime\left\{R^\prime(X^\prime X)^{-1}R\right\}^{-1}(R^\prime\hat{\beta}_{\text{ols}}-r)/q}{\hat{\sigma}^{2}}\sim F_{q,n-k}.
\end{eqnarray}
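As a quick sanity check (with sample size, design and the "all slopes zero" restriction chosen purely for illustration), computing this expression directly in R reproduces the overall $F$ statistic reported by lm:
# illustrative check: the Wald form of F reproduces the overall F statistic from lm
set.seed(3)
n <- 50; k <- 4                               # k regressors including the constant
X <- cbind(1, matrix(rnorm(n*(k-1)), ncol = k-1))
y <- rnorm(n)                                 # data generated under the null
fit <- lm(y ~ X - 1)
bhat <- coef(fit)
sig2hat <- sum(resid(fit)^2) / (n - k)
R <- rbind(0, diag(k-1)); r <- rep(0, k-1); q <- k-1
Fstat <- drop(t(t(R) %*% bhat - r) %*% solve(t(R) %*% solve(crossprod(X)) %*% R) %*%
  (t(R) %*% bhat - r)) / (q * sig2hat)
c(Fstat, summary(lm(y ~ X[, -1]))$fstatistic["value"])   # the two agree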
For illustration, consider the special case $R^\prime=I$, $r=0$, $q=2$, $\hat{\sigma}^{2}=1$ and $X^\prime X=I$. Then,
\begin{eqnarray}
F=\hat{\beta}_{\text{ols}}^\prime\hat{\beta}_{\text{ols}}/2=\frac{\hat{\beta}_{\text{ols},1}^2+\hat{\beta}_{\text{ols},2}^2}{2},
\end{eqnarray}
the squared Euclidean distance of the OLS estimate from the origin, standardized by the number of elements, highlighting that, since the $\hat{\beta}_{\text{ols},j}^2$ are squared standard normals and hence $\chi^2_1$, the $F$ distribution may be seen as an "average $\chi^2$" distribution.
In case you prefer a little simulation (which is of course not a proof!), here is one in which we test the null that none of the $k$ regressors matters; they indeed do not, so we simulate the null distribution.

We see very good agreement between the theoretical density and the histogram of the Monte Carlo test statistics.
library(lmtest)
n <- 100
reps <- 20000
sloperegs <- 5 # number of slope regressors, q or k-1 (minus the constant) in the above notation
critical.value <- qf(p = .95, df1 = sloperegs, df2 = n-sloperegs-1)
# critical value for the null that none of the slope regressors matters
Fstat <- rep(NA,reps)
for (i in 1:reps){
  y <- rnorm(n)
  X <- matrix(rnorm(n*sloperegs), ncol=sloperegs)
  reg <- lm(y~X)
  Fstat[i] <- waldtest(reg, test="F")$F[2]
}
mean(Fstat>critical.value) # very close to 0.05
hist(Fstat, breaks = 60, col="lightblue", freq = F, xlim=c(0,4))
x <- seq(0,6,by=.1)
lines(x, df(x, df1 = sloperegs, df2 = n-sloperegs-1), lwd=2, col="purple")
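As an aside, the lmtest package is not essential here: inside the loop, the same overall $F$ statistic can also be extracted directly from the model summary,
Fstat[i] <- summary(reg)$fstatistic[1]   # overall F test that all slope coefficients are zero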
To see that the versions of the test statistic in the question and the answer are indeed equivalent, note that the null corresponds to the restrictions $R'=[0\;\;I]$ and $r=0$.
Let $X=[X_1\;\;X_2]$ be partitioned according to which coefficients are restricted to be zero under the null (in your case, all but the constant, but the derivation to follow is general). Also, let $\hat{\beta}_{\text{ols}}=(\hat{\beta}_{\text{ols},1}^\prime,\hat{\beta}_{\text{ols},2}')'$ be the suitably partitioned OLS estimate.
Then,
$$
R'\hat{\beta}_{\text{ols}}=\hat{\beta}_{\text{ols},2}
$$
and
$$
R^\prime(X^\prime X)^{-1}R\equiv\tilde D,
$$
the lower right block of
\begin{align*}
(X^\prime X)^{-1}&=\left(\begin{array}{cc} X_1'X_1&X_1'X_2 \\ X_2'X_1&X_2'X_2\end{array}\right)^{-1}\\
&\equiv\left(\begin{array}{cc} \tilde A&\tilde B \\ \tilde C&\tilde D\end{array}\right).
\end{align*}
Now, use results for partitioned inverses to obtain
$$
\tilde D=(X_2'X_2-X_2'X_1(X_1'X_1)^{-1}X_1'X_2)^{-1}=(X_2'M_{X_1}X_2)^{-1}
$$
where $M_{X_1}=I-X_1(X_1'X_1)^{-1}X_1'$.
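A small numeric illustration in R (with an arbitrary partition of the regressors assumed) of this partitioned-inverse identity:
# illustrative check that the lower-right block of (X'X)^{-1} equals (X2' M_{X1} X2)^{-1}
set.seed(4)
n <- 40
X1 <- cbind(1, rnorm(n))                     # unrestricted part: constant plus one regressor
X2 <- matrix(rnorm(n*2), ncol = 2)           # coefficients restricted to zero under H0
X <- cbind(X1, X2)
Dtilde <- solve(crossprod(X))[3:4, 3:4]      # lower-right block of (X'X)^{-1}
M1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)
all.equal(Dtilde, solve(t(X2) %*% M1 %*% X2), check.attributes = FALSE)   # TRUE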
Thus, the numerator of the $F$ statistic becomes (without the division by $q$)
$$
F_{num}=\hat{\beta}_{\text{ols},2}'(X_2'M_{X_1}X_2)\hat{\beta}_{\text{ols},2}
$$
Next, recall that by the Frisch-Waugh-Lovell (FWL) theorem, illustrated numerically below, we may write
$$
\hat{\beta}_{\text{ols},2}=(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
$$
so that
\begin{align*}
F_{num}&=y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}(X_2'M_{X_1}X_2)(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y\\
&=y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
\end{align*}
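Here is the promised numeric check of the FWL step (again with arbitrary illustrative data): the coefficients on $X_2$ from the full regression coincide with those from regressing $M_{X_1}y$ on $M_{X_1}X_2$.
# illustrative check of FWL: coefficients on X2 from the full regression equal
# those from regressing M_{X1} y on M_{X1} X2
set.seed(5)
n <- 40
X1 <- cbind(1, rnorm(n)); X2 <- matrix(rnorm(n*2), ncol = 2)
y <- rnorm(n)
M1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)
b2.full <- coef(lm(y ~ X1 + X2 - 1))[3:4]
b2.fwl  <- solve(t(X2) %*% M1 %*% X2) %*% t(X2) %*% M1 %*% y
all.equal(unname(b2.full), drop(b2.fwl))     # TRUE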
It remains to show that this numerator is identical to $\text{RSSR}-\text{USSR}$, the difference between the restricted and the unrestricted sum of squared residuals.
Here,
$$\text{RSSR}=y'M_{X_1}y$$
is the residual sum of squares from regressing $y$ on $X_1$, i.e., with $H_0$ imposed. In your special case, this is just $\text{TSS}=\sum_i(y_i-\bar y)^2$, the residual sum of squares of a regression on a constant only.
Again using FWL (which also shows that the residuals of the two approaches are identical), we can write $\text{USSR}$ (SSR in your notation) as the SSR of the regression
$$
M_{X_1}y\quad\text{on}\quad M_{X_1}X_2
$$
That is, writing $P_{Z}=Z(Z'Z)^{-1}Z'$ for the projection matrix onto the column space of $Z$,
\begin{eqnarray*}
\text{USSR}&=&y'M_{X_1}'M_{M_{X_1}X_2}M_{X_1}y\\
&=&y'M_{X_1}'(I-P_{M_{X_1}X_2})M_{X_1}y\\
&=&y'M_{X_1}y-y'M_{X_1}M_{X_1}X_2((M_{X_1}X_2)'M_{X_1}X_2)^{-1}(M_{X_1}X_2)'M_{X_1}y\\
&=&y'M_{X_1}y-y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y,
\end{eqnarray*}
where the last steps use that $M_{X_1}$ is symmetric and idempotent.
Thus,
\begin{eqnarray*}
\text{RSSR}-\text{USSR}&=&y'M_{X_1}y-(y'M_{X_1}y-y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y)\\
&=&y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
\end{eqnarray*}
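Finally, a numeric check (once more with arbitrary illustrative data) that $\text{RSSR}-\text{USSR}$ indeed equals this quadratic form, so that the two versions of the $F$ statistic coincide:
# illustrative check that RSSR - USSR equals y' M_{X1} X2 (X2' M_{X1} X2)^{-1} X2' M_{X1} y
set.seed(6)
n <- 40
X1 <- cbind(1, rnorm(n)); X2 <- matrix(rnorm(n*2), ncol = 2)
y <- rnorm(n)
M1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)
RSSR <- sum(resid(lm(y ~ X1 - 1))^2)           # restricted: regress y on X1 only
USSR <- sum(resid(lm(y ~ X1 + X2 - 1))^2)      # unrestricted: regress y on X1 and X2
Fnum <- t(y) %*% M1 %*% X2 %*% solve(t(X2) %*% M1 %*% X2) %*% t(X2) %*% M1 %*% y
all.equal(RSSR - USSR, drop(Fnum))             # TRUE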