I would encourage OP to conceptually separate the mathematical and statistical principles of ANOVA.
Mathematical Principles of ANOVA
Consider a variable $Y_k, \; k = 1, \ldots, N,$ with sample variance $s^2 = \sum_{k = 1}^N (Y_k - \bar{Y}_{\centerdot})^2/(N - 1).$ Now consider a grouping index $i = 1, \ldots, I$, with no particular meaning, that divides $1, \ldots, N$ into equal (for convenience) groups of size $n$, so that $N = In$. We can then rewrite the variance as
$$s^2 = \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{\centerdot \centerdot})^2/(N - 1).$$
This is exactly the same quantity under a different indexing scheme. The next two steps, subtracting and adding the group means and then expanding the square, are entirely algebraic. The cross-product term vanishes because, within each group, the deviations $Y_{ij} - \bar{Y}_{i \centerdot}$ sum to zero:
\begin{align*}
(N - 1)s^2 &= \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{i \centerdot} + \bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 \\
&= n\sum_{i = 1}^I (\bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 + \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{i \centerdot})^2.
\end{align*}
The math doesn't care about the interpretation of these terms, and the decomposition always works (at least for the one-way layout).
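To make the "no interpretation needed" point concrete, here is a minimal sketch that checks the decomposition numerically on an arbitrary grouping. The function name `sum_of_squares_decomposition` and the simulated data are my own illustrative choices, not anything from a particular library, and the sketch assumes equal group sizes as in the derivation above.

```python
import random


def sum_of_squares_decomposition(groups):
    """Return (total, between, within) sums of squares for equal-sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / n for g in groups]

    # (N - 1) * s^2: squared deviations from the grand mean
    total = sum((y - grand_mean) ** 2 for y in all_values)
    # n * sum_i (group mean - grand mean)^2
    between = n * sum((m - grand_mean) ** 2 for m in group_means)
    # sum_i sum_j (Y_ij - group mean)^2
    within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return total, between, within


random.seed(0)
I, n = 4, 5
y = [random.gauss(0.0, 1.0) for _ in range(I * n)]

# The grouping is arbitrary: consecutive blocks of n values, nothing more.
groups = [y[i * n:(i + 1) * n] for i in range(I)]

total, between, within = sum_of_squares_decomposition(groups)
print(total, between + within)  # the two numbers agree up to rounding error
```

Shuffling `y` before grouping changes how the total splits into the two pieces, but not the fact that they add back up to the total.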
Statistical Principles of ANOVA
So far in this example, not a single distributional statement was made about $Y_k$ or the re-indexed $Y_{ij}$, and that's because the mathematical decomposition didn't need any. A statistical device, the null hypothesis that there are no group effects, along with the assumptions of normality, independence, and a common variance, leads to $Y_{ij} \sim N(\mu, \sigma^2)$ independently for all $i, j$.
I won't go through every step along the way to the F-statistic, but note that the sample variance of the group means $\bar{Y}_{i \centerdot}$ is $\sum_{i = 1}^I (\bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 / (I - 1)$, and, when scaled by the appropriate constant, it has a $\chi_{I-1}^2$ distribution under the null hypothesis. You can probably "see" this quantity in the decomposition above, as well as hints of the F-statistic if you divide the first term by the second.
In Casella and Berger's text Statistical Inference, both $t$ and $F$ distributions are introduced under a section "The Derived Distributions." As far as I know, the F-distribution was derived ad hoc (from a scaled ratio of $\chi^2$ random variables) for the purposes of testing ANOVA null hypotheses.