Questions tagged [high-dimensional]
334 questions

Pertains to a large number of features or dimensions (variables) for data. (For a large number of data points, use the tag [large-data]; if the issue is a larger number of variables than data points, use the [underdetermined] tag.)
328
votes
8 answers
Why is Euclidean distance not a good metric in high dimensions?
I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…

teaLeef
- 3,497
- 3
- 12
- 11
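The effect behind this question can be demonstrated with a short, self-contained sketch (illustrative only, not from any answer): as the dimension grows, the ratio between the farthest and nearest Euclidean distances from a query point to a random point set shrinks toward 1, so "nearest neighbour" loses its meaning.

```python
import math
import random

random.seed(0)

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Ratio of farthest to nearest distance from a random query point
# to 500 uniform points in [0, 1]^d: large in 2-D, close to 1 in 1000-D.
ratios = {}
for d in (2, 1000):
    points = [[random.random() for _ in range(d)] for _ in range(500)]
    query = [random.random() for _ in range(d)]
    dists = [euclidean(p, query) for p in points]
    ratios[d] = max(dists) / min(dists)
    print(f"d={d}: max/min distance ratio = {ratios[d]:.2f}")
```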
103
votes
11 answers
Explain "Curse of dimensionality" to a child
I have heard about the curse of dimensionality many times, but somehow I'm still unable to grasp the idea; it's all foggy.
Can anyone explain this in the most intuitive way, as you would explain it to a child, so that I (and the others as confused as I am)…

Kobe-Wan Kenobi
- 2,437
- 3
- 20
- 33
55
votes
7 answers
Best PCA algorithm for huge number of features (>10K)?
I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming.
I need to write some code to do PCA…

dsimcha
- 7,375
- 7
- 32
- 29
37
votes
3 answers
How to estimate shrinkage parameter in Lasso or ridge regression with >50K variables?
I want to use Lasso or ridge regression for a model with more than 50,000 variables. I want to do so using a software package in R. How can I estimate the shrinkage parameter ($\lambda$)?
Edits:
Here is the point I got up to:
set.seed(123)
Y <- runif…

John
- 2,088
- 6
- 27
- 37
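For readers landing here: the standard approach is to choose $\lambda$ by cross-validation (in R, `glmnet::cv.glmnet`). A hypothetical Python equivalent with scikit-learn's `LassoCV` on synthetic data — the variable names and sizes below are illustrative, not the asker's setup:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(123)
n, p = 200, 1000                      # many more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                        # only 5 truly relevant variables
y = X @ beta + rng.normal(size=n)

# LassoCV searches a grid of shrinkage values (scikit-learn calls
# lambda "alpha") and keeps the one with the best cross-validated error.
model = LassoCV(cv=5).fit(X, y)
print("selected alpha:", model.alpha_)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

In R the analogous call is `cv.glmnet(x, y, alpha = 1)`, whose `lambda.min` and `lambda.1se` fields hold the cross-validated choices of the shrinkage parameter.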
25
votes
3 answers
Should dimensionality reduction for visualization be considered a "closed" problem, solved by t-SNE?
I've been reading a lot about the $t$-SNE algorithm for dimensionality reduction. I'm very impressed with its performance on "classic" datasets, like MNIST, where it achieves a clear separation of the digits (see the original article):
I've also used it to…

galoosh33
- 2,202
- 13
- 20
23
votes
5 answers
Functional principal component analysis (FPCA): what is it all about?
Functional principal component analysis (FPCA) is something I have stumbled upon and never got to understand. What is it all about?
See "A survey of functional principal component analysis" by Shang, 2011, and I'm citing:
PCA runs into serious…

Dov
- 1,630
- 3
- 14
- 24
23
votes
1 answer
Should data be centered+scaled before applying t-SNE?
Some of my data's features have large values, while other features have much smaller values.
Is it necessary to center+scale data before applying t-SNE to prevent bias towards the larger values?
I use Python's sklearn.manifold.TSNE implementation…

stmax
- 396
- 1
- 2
- 11
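As a hedged illustration of why the question matters (synthetic data, not the asker's): t-SNE works on pairwise Euclidean distances, so a feature measured on a much larger scale dominates unless the data are standardized first, e.g. with scikit-learn's `StandardScaler`:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# One feature in the thousands, one in the hundredths: unscaled,
# pairwise distances are driven almost entirely by the first column.
X = np.column_stack([rng.normal(0.0, 1000.0, 100),
                     rng.normal(0.0, 0.01, 100)])

X_scaled = StandardScaler().fit_transform(X)   # mean 0, variance 1 per column
embedding = TSNE(n_components=2, perplexity=30.0,
                 random_state=0).fit_transform(X_scaled)
print(embedding.shape)
```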
22
votes
1 answer
Does the curse of dimensionality affect some models more than others?
The places where I have read about the curse of dimensionality explain it primarily in conjunction with kNN, and with linear models in general. I regularly see top rankers on Kaggle using thousands of features on datasets that hardly have 100k data points. They…

Dileep Kumar Patchigolla
- 701
- 2
- 8
- 17
20
votes
1 answer
Why is LASSO not finding my perfect predictor pair at high dimensionality?
I'm running a small experiment with LASSO regression in R to test if it is able to find a perfect predictor pair. The pair is defined like this: f1 + f2 = outcome
The outcome here is a predetermined vector called 'age'. f1 and f2 are created by…

Ansjovis86
- 455
- 4
- 15
19
votes
5 answers
Why is the Gaussian distribution in high-dimensional space like a soap bubble?
In the famous post "Gaussian Distributions are Soap Bubbles" it is claimed that the distribution of the points looks like a soap bubble (where it is less dense in the center and more dense at the edge) instead of a blob of mold where it is more…

Code Pope
- 781
- 6
- 17
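The claim is easy to check numerically (a minimal self-contained sketch): the norm of a $d$-dimensional standard Gaussian vector concentrates tightly around $\sqrt{d}$, so nearly all the mass lies in a thin shell — the "soap bubble" — rather than near the mode at the origin.

```python
import math
import random

random.seed(0)
d = 1000
norms = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    norms.append(math.sqrt(sum(v * v for v in x)))

mean_norm = sum(norms) / len(norms)
# The norms cluster in a narrow band around sqrt(d) ≈ 31.6 even though
# the density itself is highest at the origin.
print(f"mean norm = {mean_norm:.2f}, sqrt(d) = {math.sqrt(d):.2f}")
print(f"spread    = {max(norms) - min(norms):.2f}")
```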
19
votes
4 answers
Does "curse of dimensionality" really exist in real data?
I understand what the "curse of dimensionality" is, and I have done some high-dimensional optimization problems and know the challenge of exponentially many possibilities.
However, I doubt that the "curse of dimensionality" exists in most real-world data…

Haitao Du
- 32,885
- 17
- 118
- 213
17
votes
1 answer
High-dimensional regression: why is $\log p/n$ special?
I am trying to read up on the research in the area of high-dimensional regression, where $p$ is larger than $n$, that is, $p \gg n$. The term $\log p/n$ seems to appear often in rates of convergence for regression estimators.
For…

Greenparker
- 14,131
- 3
- 36
- 80
17
votes
3 answers
Curse of dimensionality- does cosine similarity work better and if so, why?
When working with high-dimensional data, it is almost useless to compare data points using Euclidean distance; this is the curse of dimensionality.
However, I have read that using different distance metrics, such as a cosine similarity, performs…

PyRsquared
- 1,084
- 2
- 9
- 20
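One concrete reason cosine similarity can behave better on data like term counts (an illustrative toy example, not from the question): it is invariant to vector length, so two documents with the same term proportions look identical even when one is much "longer", while their Euclidean distance is large. This does not make cosine immune to distance concentration in general.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

doc_a = [3, 0, 1, 4]   # term counts
doc_b = [6, 0, 2, 8]   # same proportions, twice the length
doc_c = [0, 5, 0, 1]   # different topic

print(cosine(doc_a, doc_b))     # same direction despite different lengths
print(euclidean(doc_a, doc_b))  # large despite identical proportions
print(cosine(doc_a, doc_c))     # small: genuinely different
```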
17
votes
2 answers
How do I know my k-means clustering algorithm is suffering from the curse of dimensionality?
I believe that the title of this question says it all.

mathieu
- 273
- 1
- 2
- 6
15
votes
4 answers
PCA on high-dimensional text data before random forest classification?
Does it make sense to do PCA before carrying out a Random Forest Classification?
I'm dealing with high-dimensional text data, and I want to do feature reduction to help avoid the curse of dimensionality, but don't Random Forests already do some sort…

Maus
- 253
- 1
- 2
- 5