
I just want to know why the curse of dimensionality affects non-parametric approaches so much more than parametric ones.

Karolis Koncevičius
olad uhg
  • Reference, please – Dave Oct 19 '21 at 00:06
  • @Dave from the book An Introduction to Statistical Learning with Applications in R, I quote "When the number of features p is large, there tends to be a deterioration in the performance of KNN and other local approaches that perform prediction using only observations that are near the test observation for which a prediction must be made. This phenomenon is known as the curse of dimensionality, and it ties into the fact that nonparametric approaches often perform poorly when p is large" – olad uhg Oct 19 '21 at 00:32
  • Could you please edit that (useful!) information into the original question? – Dave Oct 19 '21 at 00:37
  • The problem in this case is the locality. The curse of dimensionality makes the locality infinite in size. All observations become equally far away from any point because the universe collapses into a point. – Aksakal Oct 19 '21 at 02:37
  • Here's an intuitive explanation: A parametric method has a constrained solution set. So adding new dimensions increases the number of solutions much less than for non-parametric methods, which in turn makes them less prone to the curse of dimensionality. – drmaettu Oct 19 '21 at 07:47
  • Rather than _nonparametric_ it would be slightly better to refer to constrained vs. unconstrained models. There are parametric methods that suffer the curse of dimensionality, e.g., a regression model with all possible two-way interactions. – Frank Harrell Oct 19 '21 at 12:34

1 Answer


I like the intuitive explanation given by @drmaettu in the comments and want to offer another heuristic, intuitive answer. Suppose that $X_1, X_2, \dots, X_n$ forms an independent and identically distributed sample from some $d$-variate probability distribution with unknown density $f$. The objective is to estimate $f$ based on the observed sample.

If we just assume that $f$ is, say, twice continuously differentiable, all information about the shape, curvature, etc. of the function has to be deduced from the sample. A good estimate thus requires the space to be adequately populated as we know essentially nothing about the function. However, the number of required data points to adequately populate a space increases exponentially with the number of dimensions.¹
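
To make this tangible, here is a minimal numerical sketch (the sample size, radius, and list of dimensions are arbitrary choices): sample uniform points in $[0,1]^d$ and look at how many of them lie within a fixed distance of the centre, and at how similar the nearest and farthest distances from the centre become. Fixed-radius neighbourhoods empty out quickly, and, as noted in the comments, the observations become almost equally far away from the query point.

```python
# Sketch: how "local" neighbourhoods behave as the dimension grows.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # arbitrary sample size

for d in (1, 2, 5, 10, 20, 50):
    x = rng.uniform(size=(n, d))              # i.i.d. uniform sample in [0,1]^d
    dist = np.linalg.norm(x - 0.5, axis=1)    # Euclidean distances to the centre
    frac = np.mean(dist <= 0.5)               # share of points within radius 0.5
    ratio = dist.max() / dist.min()           # close to 1 means all points are roughly equally far away
    print(f"d={d:3d}  fraction within 0.5 of centre: {frac:.4f}  max/min distance: {ratio:.2f}")
```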

On the other hand, if we assume that $f(x) = (2\pi)^{-\frac{d}{2}}\exp\left(-\frac 12\Vert x-\theta\Vert_2^2\right)$, where $\theta\in\mathbb R^d$ is unknown, then we effectively know what the function looks like at every point $x\in\mathbb R^d$ with regard to its shape, curvature, etc. That is, much less information has to be deduced from the sample.
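
A minimal sketch of this plug-in idea (the dimension, sample size, and random seed below are arbitrary): the sample mean alone determines an estimate of the whole density, which can then be evaluated at any point, whether or not observations happen to fall nearby.

```python
# Sketch: in the Gaussian location model, estimating d numbers pins down the whole density.
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 500
theta = rng.normal(size=d)                   # the unknown location (simulated here)
x = rng.normal(loc=theta, size=(n, d))       # i.i.d. sample from N(theta, I_d)

theta_hat = x.mean(axis=0)                   # only d numbers are learned from the data

def f_hat(point):
    """Plug-in estimate of the density at an arbitrary point of R^d."""
    return (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum((point - theta_hat) ** 2))

print("||theta_hat - theta|| =", np.linalg.norm(theta_hat - theta))
print("estimated density at theta:", f_hat(theta))
print("true density at theta:     ", (2 * np.pi) ** (-d / 2))
```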

Consequently, the optimal rate of convergence of nonparametric estimators depends on $d$, while the optimal rate of convergence of parametric estimators is $\mathcal O(n^{-\frac12})$, independent of $d$.
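
To see what this means for the required sample size, suppose we want an estimation error of at most $\varepsilon$; a back-of-the-envelope comparison (assuming the standard nonparametric rate $n^{-\frac{2}{4+d}}$ for twice continuously differentiable densities) gives

$$n^{-\frac12}\le\varepsilon \;\Longleftrightarrow\; n\ge\varepsilon^{-2} \quad\text{(parametric, independent of } d\text{)},$$

$$n^{-\frac{2}{4+d}}\le\varepsilon \;\Longleftrightarrow\; n\ge\varepsilon^{-\frac{4+d}{2}}=\varepsilon^{-2}\left(\tfrac{1}{\varepsilon}\right)^{\frac{d}{2}} \quad\text{(nonparametric, exponential in } d\text{)}.$$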


¹ Consider, for instance, the unit interval $[0,1]$. Suppose we fix a grid of 100 evenly spaced points in $[0,1]$, so that two adjacent grid points are roughly 0.01 apart. To ensure the same maximum distance of 0.01 between two adjacent points in the unit square $[0,1]^2$ would require a grid with 100² evenly spaced points. In the unit cube $[0,1]^3$, a grid with 100³ evenly spaced points would be necessary to ensure a maximum distance of 0.01 between two adjacent points.
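
A one-liner check of these counts (the list of dimensions is an arbitrary choice):

```python
# Grid points needed in [0,1]^d with 100 points per axis, i.e. 100**d in total.
points_per_axis = 100

for d in (1, 2, 3, 5, 10):
    print(f"d={d:2d}: {points_per_axis ** d:,} grid points")
```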

lmaosome