Not sure if this is exactly what you're asking, but one nice example of the interaction between functional analysis and statistical regression (some might call it approximation theory, though I'm not sure approximation theorists would consider basic functional analysis to be approximation theory proper) is non-parametric curve estimation.
Consider a fixed-design, homoskedastic model
$$
y_t = f(x_t) + \sigma \epsilon_t
$$
with the $x_t$'s evenly spaced on $[0,1]$ and the errors $\epsilon_t$ independent with mean zero.
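To make things concrete, here is a minimal NumPy sketch of simulating such a model (the particular $f$, $\sigma$, and $n$ are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, evenly spaced design on (0, 1] and i.i.d. mean-zero errors.
n = 200
x = np.arange(1, n + 1) / n
f = lambda t: np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)  # hypothetical "true" f
sigma = 0.3
y = f(x) + sigma * rng.standard_normal(n)   # y_t = f(x_t) + sigma * eps_t
```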
In a purely mathematical context, there are two classical approaches to approximating a function $f$.
- Assume $f$ lies in, for example, the Hilbert space $L^2[0,1]$. Then $f$ can be expanded, in the $L^2$ sense, as a Fourier series with respect to an orthonormal basis $\{\phi_j\}$:
$$
f = \sum_j \theta_j \phi_j.
$$
- Consider a kernel $K_h(x) = \frac{1}{h} K(\frac{x}{h})$. Fourier analysis says that, if $f$ is sufficiently smooth, the convolution $f * K_h \rightarrow f$ uniformly as $h \rightarrow 0$.
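As a quick sanity check of the second point, one can discretise the convolution on a fine grid and watch the sup-norm error shrink as $h \rightarrow 0$ (Gaussian kernel; the test function, grid, and truncation at $\pm 5h$ are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check that g * K_h -> g uniformly as h -> 0, for a smooth g.
g = lambda t: np.exp(-t**2) * np.cos(3 * t)
dx = 1e-3
grid = np.arange(-3, 3, dx)
gx = g(grid)

for h in [0.5, 0.1, 0.02]:
    m = int(5 * h / dx)                            # truncate the Gaussian kernel at +/- 5h
    u = dx * np.arange(-m, m + 1)                  # kernel grid, same spacing as g's grid
    Kh = np.exp(-(u / h)**2 / 2) / (h * np.sqrt(2 * np.pi)) * dx  # K_h(u) times Riemann weight
    conv = np.convolve(gx, Kh, mode="same")        # discretised (g * K_h) on the grid
    interior = np.abs(grid) < 1                    # ignore boundary truncation effects
    print(h, np.max(np.abs(conv - gx)[interior]))  # sup error over the interior, shrinking in h
```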
Both of these approximation schemes suggest statistical estimators $\hat{f}$; one then needs to show that the nice approximation properties survive in the noisy environment.
The series method is derived from the Hilbert space approach. The nonparametric problem is reduced to a (semi-)parametric one of estimating the Fourier coefficients $\{ \theta_j \}$.
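A minimal sketch of such a projection (series) estimator, using the cosine basis on $[0,1]$ and estimating each $\theta_j$ by a sample average against the simulated $x$, $y$ above (the function names are mine, not standard):

```python
import numpy as np

def cosine_basis(j, t):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j t)."""
    return np.ones_like(t) if j == 0 else np.sqrt(2) * np.cos(np.pi * j * t)

def series_estimator(x, y, J):
    """Keep the first J + 1 coefficients; theta_j is estimated by (1/n) sum_t y_t phi_j(x_t)."""
    theta_hat = [np.mean(y * cosine_basis(j, x)) for j in range(J + 1)]
    return lambda t: sum(th * cosine_basis(j, t) for j, th in enumerate(theta_hat))

# f_hat = series_estimator(x, y, J=10)
# f_hat(np.linspace(0, 1, 5))
```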
The kernel method convolves the kernel $K_h$ with the empirical measure $\frac{1}{n}\sum_t y_t \, \delta_{x_t}$ (where $\delta_x$ is the Dirac delta at $x$) rather than with $f$ itself.
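A matching sketch of the kernel estimator: with the $x_t$'s evenly spaced, the $1/n$ factor plays the role of the Riemann weight, so this is literally a discretised convolution of $K_h$ with that empirical measure (a Priestley–Chao-type estimator; again the names are just for illustration):

```python
import numpy as np

def kernel_estimator(x, y, h):
    """Convolve K_h with (1/n) sum_t y_t delta_{x_t}: f_hat(t) = (1/n) sum_t y_t K_h(t - x_t)."""
    def K_h(u):
        return np.exp(-(u / h)**2 / 2) / (h * np.sqrt(2 * np.pi))  # Gaussian kernel, bandwidth h
    def f_hat(t):
        t = np.atleast_1d(t)
        return K_h(t[:, None] - x[None, :]) @ y / len(x)
    return f_hat

# f_hat = kernel_estimator(x, y, h=0.05)
# f_hat(np.linspace(0, 1, 5))
```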
For the series method, you need to choose a cut-off $J$. Higher $J$ means larger variance and smaller squared bias, i.e. a "rougher" (less smoothed) $\hat{f}$; this corresponds to a smaller bandwidth $h$ in the kernel method.
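To see the trade-off numerically, one could compare the empirical integrated squared error of the series estimator across cut-offs, reusing `f`, `x`, `y`, and `series_estimator` from the sketches above; very small and very large $J$ should typically both do worse than something in between:

```python
import numpy as np

grid = np.linspace(0, 1, 500)
for J in [1, 3, 8, 20, 80]:
    f_hat = series_estimator(x, y, J)
    ise = np.mean((f_hat(grid) - f(grid))**2)  # empirical integrated squared error vs. the true f
    print(f"J = {J:3d}   ISE ~ {ise:.4f}")
```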
A more elementary example would be OLS, where you're approximating the data vector by its orthogonal projection onto a finite-dimensional subspace of a Hilbert space (the column space of the design matrix). If you inspect the proof of, say, the Gauss-Markov theorem, you'll see an immediate link between the Hilbert space structure and the model assumptions.
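A minimal sketch of that projection picture, on a fresh set of hypothetical data (here `np.linalg.lstsq` just plays the role of the orthogonal projection):

```python
import numpy as np

rng = np.random.default_rng(1)

# OLS as orthogonal projection of y onto the column space of X in R^n.
n, k = 50, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                  # the projection of y onto col(X)
resid = y - y_hat

print(np.allclose(X.T @ resid, 0))    # residuals orthogonal to col(X): the normal equations
```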