Given a sample from a population, and a model $y=f(x)+\epsilon$ fit to that sample, what is the sampling distribution of $R^2$ (i.e., the scaled Brier score)? I could compute its variance easily enough, but $R^2$ can't exceed 1, and a confidence interval based only on its mean and variance might. I'm looking for an approach that works for different $\epsilon$ distributions (e.g., binomial, gamma, etc.) and different $f$'s (e.g., arbitrary algorithms). I have a use-case where bootstrapping is costly.
For clarity, the quantity that I'm calling $R^2$ (or the scaled Brier score) is defined as $$ 1-\frac{\sum_i \left(y_i-\hat{f}(x_i)\right)^2}{\sum_i \left(y_i-\bar{y}\right)^2} $$
where $\hat{f}$ is the fitted model and $\bar{y}$ is the mean of $y$. Assume for simplicity that $y$ is not categorical, though I'd be interested if this could generalize to multinomial logits.
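For concreteness, here's a minimal sketch of how I compute this quantity (the function name `r_squared` is just illustrative):

```python
import numpy as np

def r_squared(y, y_hat):
    """Scaled Brier score / R^2: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives 1, and a model that just predicts $\bar{y}$ gives 0; the statistic is bounded above by 1 but unbounded below, which is exactly why a symmetric mean-and-variance interval is awkward here.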
A related question here discusses the distribution of $R^2$ under the null hypothesis that the coefficients of a regression model are all zero.