As you noticed, model like $y_i = f(x_i) + \epsilon$ assumes that the noise is independent and identically distributed, so the variance is the same for all observations. If this isn't the case, you need a model that accounts for that. How can you estimate the functional form of variance? The answer depends on what kind of model you choose for it. $f(x_i)$ can have an unlimited number of forms and way of estimating it, the same applies to $\sigma^2_0(x_i)$.
I guess that you don't have any idea of the functional form of the variance that could be appropriate, otherwise, you wouldn't ask the question, at least not the broad one. If you don't, you probably want a flexible nonparametric model. A popular one that does exactly this is Gaussian process, where you estimate both the mean function $m(\mathbf{x})$ and covariance function ${k(\mathbf{x}, \mathbf{x}')}$ to model the distribution over functions as Gaussian process
$$
f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), \;k(\mathbf{x}, \mathbf{x}'))
$$
If your model was polynomial regression, than the model assumed constant variance. If the variance is non-constant, you need a different model accounting for that. As about calculating it from residuals, for pointwise variance you would have only single point to calculate each variance, so it cannot be calculated from the residuals.