Not sure if this is exactly what you're asking, but one nice example of the interaction between functional analysis and statistical regression (some might call it approximation theory, though I'm not sure approximation theorists would consider basic functional analysis to be approximation theory proper) is non-parametric curve estimation.
Consider a fixed-design, homoskedastic model
$$
y_t = f(x_t) + \sigma \epsilon_t
$$
with the $x_t$'s evenly spaced on $[0,1]$ and the errors $\epsilon_t$ independent with mean zero.
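To make things concrete, here is a minimal NumPy sketch of simulating such a model (the particular $f$, $\sigma$, and $n$ are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, evenly spaced design on (0, 1] and i.i.d. mean-zero errors.
n = 200
x = np.arange(1, n + 1) / n
f = lambda t: np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)  # hypothetical "true" f
sigma = 0.3
y = f(x) + sigma * rng.standard_normal(n)   # y_t = f(x_t) + sigma * eps_t
```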
In a purely mathematical context, there are two classical approaches to approximating a function $f$.
- Assume $f$ lies in, for example, the Hilbert space $L^2[0,1]$. Then $f$ can be expanded, in the $L^2$ sense, as a Fourier series with respect to an orthonormal basis $\{\phi_j\}$:
$$
f = \sum_j \theta_j \phi_j.
$$
- Consider a kernel $K_h(x) = \frac{1}{h} K(\frac{x}{h})$. Fourier analysis says that, if $f$ is sufficiently smooth, the convolution $f * K_h \rightarrow f$ uniformly as $h \rightarrow 0$.
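As a quick sanity check of the second point, one can discretise the convolution on a fine grid and watch the sup-norm error shrink as $h \rightarrow 0$ (Gaussian kernel; the test function, grid, and truncation at $\pm 5h$ are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check that g * K_h -> g uniformly as h -> 0, for a smooth g.
g = lambda t: np.exp(-t**2) * np.cos(3 * t)
dx = 1e-3
grid = np.arange(-3, 3, dx)
gx = g(grid)

for h in [0.5, 0.1, 0.02]:
    m = int(5 * h / dx)                            # truncate the Gaussian kernel at +/- 5h
    u = dx * np.arange(-m, m + 1)                  # kernel grid, same spacing as g's grid
    Kh = np.exp(-(u / h)**2 / 2) / (h * np.sqrt(2 * np.pi)) * dx  # K_h(u) times Riemann weight
    conv = np.convolve(gx, Kh, mode="same")        # discretised (g * K_h) on the grid
    interior = np.abs(grid) < 1                    # ignore boundary truncation effects
    print(h, np.max(np.abs(conv - gx)[interior]))  # sup error over the interior, shrinking in h
```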
Both of these approximation schemes suggest statistical estimators $\hat{f}$; one then needs to show that the nice approximation properties survive in the noisy environment.
The series method is derived from the Hilbert space approach. The nonparametric problem is reduced to a (semi-)parametric one of estimating the Fourier coefficients $\{ \theta_j \}$.
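A minimal sketch of such a projection (series) estimator, using the cosine basis on $[0,1]$ and estimating each $\theta_j$ by a sample average against the simulated $x$, $y$ above (the function names are mine, not standard):

```python
import numpy as np

def cosine_basis(j, t):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j t)."""
    return np.ones_like(t) if j == 0 else np.sqrt(2) * np.cos(np.pi * j * t)

def series_estimator(x, y, J):
    """Keep the first J + 1 coefficients; theta_j is estimated by (1/n) sum_t y_t phi_j(x_t)."""
    theta_hat = [np.mean(y * cosine_basis(j, x)) for j in range(J + 1)]
    return lambda t: sum(th * cosine_basis(j, t) for j, th in enumerate(theta_hat))

# f_hat = series_estimator(x, y, J=10)
# f_hat(np.linspace(0, 1, 5))
```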
The kernel method convolves the kernel $K_h$ with the empirical measure $\frac{1}{n}\sum_t y_t \, \delta_{x_t}$ (where $\delta_x$ is the Dirac delta at $x$) rather than with $f$ itself.
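A matching sketch of the kernel estimator: with the $x_t$'s evenly spaced, the $1/n$ factor plays the role of the Riemann weight, so this is literally a discretised convolution of $K_h$ with that empirical measure (a Priestley–Chao-type estimator; again the names are just for illustration):

```python
import numpy as np

def kernel_estimator(x, y, h):
    """Convolve K_h with (1/n) sum_t y_t delta_{x_t}: f_hat(t) = (1/n) sum_t y_t K_h(t - x_t)."""
    def K_h(u):
        return np.exp(-(u / h)**2 / 2) / (h * np.sqrt(2 * np.pi))  # Gaussian kernel, bandwidth h
    def f_hat(t):
        t = np.atleast_1d(t)
        return K_h(t[:, None] - x[None, :]) @ y / len(x)
    return f_hat

# f_hat = kernel_estimator(x, y, h=0.05)
# f_hat(np.linspace(0, 1, 5))
```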
For the series method, you need to choose a cut-off $J$. Higher $J$ means larger variance and smaller squared bias, i.e. a "rougher" (less smoothed) $\hat{f}$; this corresponds to a smaller bandwidth $h$ in the kernel method.
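To see the trade-off numerically, one could compare the empirical integrated squared error of the series estimator across cut-offs, reusing `f`, `x`, `y`, and `series_estimator` from the sketches above; very small and very large $J$ should typically both do worse than something in between:

```python
import numpy as np

grid = np.linspace(0, 1, 500)
for J in [1, 3, 8, 20, 80]:
    f_hat = series_estimator(x, y, J)
    ise = np.mean((f_hat(grid) - f(grid))**2)  # empirical integrated squared error vs. the true f
    print(f"J = {J:3d}   ISE ~ {ise:.4f}")
```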
A more elementary example would be OLS, where you're approximating the data vector by its orthogonal projection onto a finite-dimensional subspace of a Hilbert space (the column space of the design matrix). If you inspect the proof of, say, the Gauss-Markov theorem, you'll see an immediate link between the Hilbert space structure and the model assumptions.
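A minimal sketch of that projection picture, on a fresh set of hypothetical data (here `np.linalg.lstsq` just plays the role of the orthogonal projection):

```python
import numpy as np

rng = np.random.default_rng(1)

# OLS as orthogonal projection of y onto the column space of X in R^n.
n, k = 50, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                  # the projection of y onto col(X)
resid = y - y_hat

print(np.allclose(X.T @ resid, 0))    # residuals orthogonal to col(X): the normal equations
```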