Usually a model is considered high-dimensional when $n \ll p$, where $n$ is the number of observations and $p$ is the number of variables/features (e.g. Bühlmann and van de Geer, 2011). However, other definitions appear in the literature, for example in Belloni et al. (2018) (Ref), who write:

High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size.

So when is a model high-dimensional? Is there a rule of thumb or some criterion, other than $n \ll p$, that defines a model as high-dimensional? I know that there is a similar question on StackExchange (Ref), but that post is relatively old, and the field of high-dimensional statistics/machine learning has developed over the last ten years.

tdy
timm
    "High-dimensional" is merely a qualitative characterization, whose meaning will depend on the context, the data, the contemplated models, and even the date (because it will depend on computing capabilities: in 1950, five dimensions was extremely high-dimensional). Thus, don't expect any specific or general answer. BTW, if you're looking for a "criterium," attend a bicycle race! – whuber Jan 11 '22 at 20:34
    See also the Sorites paradox. – Arya McCarthy Jan 12 '22 at 02:01

1 Answer

From a computational point of view, the criterion for high-dimensionality of a model is probably tied to the curse of dimensionality. Obviously, this is tied to computational resources as well (as mentioned in the comment by @whuber). A model whose dimension $p$ is large enough that "conventional resources" yield only a slow solution, or make the problem infeasible, would be called a "high-dimensional model".
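As a rough illustration of the curse of dimensionality mentioned above, the sketch below computes the fraction of the cube $[-1, 1]^p$ occupied by the inscribed unit ball, using the standard volume formula $\pi^{p/2} / \Gamma(p/2 + 1)$. The function name `ball_to_cube_volume_ratio` is made up for this example; it is only one of many ways to show how quickly space "empties out" as $p$ grows.

```python
import math


def ball_to_cube_volume_ratio(p: int) -> float:
    """Fraction of the cube [-1, 1]^p filled by the inscribed unit ball.

    Hypothetical helper for illustration only.
    Unit-ball volume in p dimensions: pi^(p/2) / Gamma(p/2 + 1).
    Enclosing cube volume: 2^p.
    """
    ball_volume = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    cube_volume = 2.0 ** p
    return ball_volume / cube_volume


# The ratio shrinks rapidly with p, so uniformly sampled points in the cube
# almost never land near the centre once p is moderately large.
for p in (1, 2, 3, 10, 50):
    print(f"p = {p:2d}  ratio = {ball_to_cube_volume_ratio(p):.3e}")
```

Any grid or sampling scheme that must cover the cube therefore needs a number of points that grows exponentially in $p$, which is where the dependence on available computing resources comes in.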

The case $n \ll p$ would be called an overparametrized model (see) rather than high-dimensional. In linear algebra, it would be called an underdetermined system. Recall the LASSO, which is often used in exactly this setting.
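To make the "underdetermined system" point concrete, here is a minimal NumPy sketch with simulated data (the sizes, seed, and coefficients are arbitrary assumptions, not taken from any reference). With $n < p$, the design matrix has rank at most $n$, so infinitely many coefficient vectors reproduce the observations exactly.

```python
import numpy as np

# Simulated data; n, p, the seed and beta_true are arbitrary choices.
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true  # noiseless response, for simplicity

# The rank of X cannot exceed n, so X beta = y is underdetermined.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution among all exact fits.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding any null-space direction of X gives another exact fit.
_, _, Vt = np.linalg.svd(X)   # Vt is p x p; its last p - rank rows span the null space
null_vec = Vt[-1]
beta_other = beta_min_norm + 5.0 * null_vec

print("rank of X:", rank, "with p =", p)
print("both fit y exactly:",
      np.allclose(X @ beta_min_norm, y), np.allclose(X @ beta_other, y))
print("solutions differ:", not np.allclose(beta_min_norm, beta_other))
```

Because the data alone cannot pick one solution, some extra criterion is needed; the minimum-norm choice above is one, and the $\ell_1$ penalty of the LASSO is another that favours sparse coefficient vectors.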

From a domain-specific point of view: in physics/statistics, anything with $p > 3$ is high-dimensional for classical mechanics, whereas in statistical mechanics $p$ could be in the thousands before a simulation is called high-dimensional. See degrees of freedom. In biology/genetics, "high-dimensionality" may refer to the simultaneous study of different factors; see The use of high-dimensional biology.

msuzen
    Re physics: In classical mechanics, the dimension is the number of conjugate coordinates. A system with "moderate" dimensions might consist of a few particles to a few hundred and each particle can have $3+3=6$ position+momentum coordinates. Re computation: although your characterization is interesting and potentially useful, it seems not to distinguish "dimension" from "model complexity" or "algorithmic complexity." A model can have very few dimensions, yet be computationally difficult to fit. – whuber Jan 12 '22 at 01:28
  • @whuber Agreed. A model indeed can be low dimensional with high algorithmic complexity. – msuzen Jan 12 '22 at 06:50