Questions tagged [high-dimensional]

Pertains to a large number of features or dimensions (variables) for data. (For a large number of data points, use the tag [large-data]; if the issue is a larger number of variables than data, use the [underdetermined] tag.)

334 questions
328 votes · 8 answers

Why is Euclidean distance not a good metric in high dimensions?

I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…
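The distance-concentration effect this question alludes to can be checked numerically. The sketch below is my own illustration (function name and parameters are not from the question): as the dimension grows, the ratio between the farthest and nearest neighbour distances shrinks toward 1, so Euclidean distance stops discriminating between points.

```python
import numpy as np

def distance_contrast(d, n=500, seed=0):
    """Ratio of farthest to nearest Euclidean distance from a random
    query point to n uniform points in the d-dimensional unit cube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.max() / dists.min()  # close to 1 means "all points look equidistant"

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 2))
```

In low dimensions the contrast ratio is large; by d = 1000 it collapses toward 1, which is one concrete reading of "Euclidean distance is not a good distance in high dimensions".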
103 votes · 11 answers

Explain "Curse of dimensionality" to a child

I have heard about the curse of dimensionality many times, but somehow I'm still unable to grasp the idea; it's all foggy. Can anyone explain this in the most intuitive way, as you would explain it to a child, so that I (and others as confused as I am)…
55 votes · 7 answers

Best PCA algorithm for huge number of features (>10K)?

I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming. I need to write some code to do PCA…
dsimcha
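For a sense of what "PCA with many features" can look like in practice, here is a sketch of my own (synthetic stand-in data, not the asker's) using randomized SVD, which scikit-learn exposes via `PCA(svd_solver="randomized")`; it computes only the leading components without forming the full p-by-p covariance matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

# Randomized SVD approximates only the top-k principal components,
# avoiding the O(p^2) covariance matrix that exact PCA would require.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2000))   # 500 samples, 2000 features (stand-in data)

pca = PCA(n_components=10, svd_solver="randomized", random_state=0)
Z = pca.fit_transform(X)
print(Z.shape)                          # 10-dimensional scores per sample
```

For truly huge feature counts, the same idea scales because the cost depends on the number of requested components rather than on p squared.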
37 votes · 3 answers

How to estimate shrinkage parameter in Lasso or ridge regression with >50K variables?

I want to use Lasso or ridge regression for a model with more than 50,000 variables, and I want to do so using a software package in R. How can I estimate the shrinkage parameter ($\lambda$)? Edit: here is the point I got up to: set.seed(123) Y <- runif…
John
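In R this is typically done with `glmnet::cv.glmnet`, which picks $\lambda$ by cross-validation. A comparable sketch in Python's scikit-learn (my own synthetic data and variable names, not the asker's setup; scikit-learn calls $\lambda$ `alpha`):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# LassoCV scans a grid of shrinkage values and keeps the one with the
# best mean cross-validated error.
rng = np.random.default_rng(123)
n, p = 200, 1000                        # p >> n, as in the question
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                          # only five truly active variables
y = X @ beta + rng.standard_normal(n)

model = LassoCV(cv=5, random_state=0).fit(X, y)
print("chosen alpha:", model.alpha_)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

With a strong sparse signal, the cross-validated fit recovers the five active variables with positive coefficients while zeroing out most of the rest.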
25 votes · 3 answers

Should dimensionality reduction for visualization be considered a "closed" problem, solved by t-SNE?

I've been reading a lot about the $t$-SNE algorithm for dimensionality reduction. I'm very impressed with its performance on "classic" datasets like MNIST, where it achieves a clear separation of the digits (see the original article). I've also used it to…
23 votes · 5 answers

Functional principal component analysis (FPCA): what is it all about?

Functional principal component analysis (FPCA) is something I have stumbled upon and never got to understand. What is it all about? See "A survey of functional principal component analysis" by Shang, 2011; I quote: PCA runs into serious…
23 votes · 1 answer

Should data be centered+scaled before applying t-SNE?

Some of my data's features have large values, while other features have much smaller values. Is it necessary to center+scale data before applying t-SNE to prevent bias towards the larger values? I use Python's sklearn.manifold.TSNE implementation…
stmax
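One common recipe (a sketch of my own, not from the question or its answers) is to standardize each feature before t-SNE so that large-scale features do not dominate the pairwise distances the algorithm works from:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Two features on wildly different scales: without scaling, the first
# one would dominate every pairwise distance t-SNE computes.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1000, 100),   # large-scale feature
    rng.normal(0, 0.01, 100),   # tiny-scale feature
])

X_scaled = StandardScaler().fit_transform(X)  # mean 0, variance 1 per column
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X_scaled)
print(emb.shape)
```

Whether standardizing is appropriate depends on whether the original scales carry meaning; when they are arbitrary units, scaling is usually the safer default.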
22 votes · 1 answer

Does the curse of dimensionality affect some models more than others?

The sources where I have been reading about the curse of dimensionality explain it primarily in connection with kNN, and with linear models in general. I regularly see top rankers on Kaggle using thousands of features on datasets that hardly have 100k data points. They…
20 votes · 1 answer

Why is LASSO not finding my perfect predictor pair at high dimensionality?

I'm running a small experiment with LASSO regression in R to test whether it is able to find a perfect predictor pair. The pair is defined like this: f1 + f2 = outcome. The outcome here is a predetermined vector called 'age'; f1 and f2 are created by…
Ansjovis86
19 votes · 5 answers

Why is a Gaussian distribution in high-dimensional space like a soap bubble?

In the famous post "Gaussian Distributions are Soap Bubbles" it is claimed that the distribution of the points looks like a soap bubble (less dense in the center and denser at the edge) instead of a blob of mold, where it is more…
Code Pope
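The "soap bubble" picture can be checked directly: the norm of a standard $d$-dimensional Gaussian sample concentrates tightly around $\sqrt{d}$, so almost all of the mass lives in a thin shell rather than near the origin. A quick numerical sketch of my own:

```python
import numpy as np

# Norms of standard Gaussian samples concentrate around sqrt(d): the
# mass sits in a thin shell (the "soap bubble"), not in a blob at 0.
rng = np.random.default_rng(0)
for d in (2, 100, 10_000):
    x = rng.standard_normal((1000, d))
    norms = np.linalg.norm(x, axis=1)
    # mean norm relative to sqrt(d), and relative spread of the norms
    print(d, norms.mean() / np.sqrt(d), norms.std() / norms.mean())
```

As d grows, the mean norm divided by sqrt(d) approaches 1 while the relative spread of the norms shrinks toward 0, which is exactly the thin-shell behavior the post describes.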
19 votes · 4 answers

Does "curse of dimensionality" really exist in real data?

I understand what the "curse of dimensionality" is, and I have done some high-dimensional optimization problems, so I know the challenge of exponentially many possibilities. However, I doubt whether the "curse of dimensionality" exists in most real-world data…
Haitao Du
17 votes · 1 answer

High-dimensional regression: why is $\log p/n$ special?

I am trying to read up on research in the area of high-dimensional regression, where $p$ is larger than $n$, that is, $p \gg n$. The quantity $\log p/n$ often appears in the rates of convergence for regression estimators. For…
Greenparker
17 votes · 3 answers

Curse of dimensionality: does cosine similarity work better, and if so, why?

When working with high-dimensional data, it is almost useless to compare data points using Euclidean distance; this is the curse of dimensionality. However, I have read that using different distance metrics, such as cosine similarity, performs…
PyRsquared
17 votes · 2 answers

How do I know my k-means clustering algorithm is suffering from the curse of dimensionality?

I believe that the title of this question says it all.
mathieu
15 votes · 4 answers

PCA on high-dimensional text data before random forest classification?

Does it make sense to do PCA before carrying out Random Forest classification? I'm dealing with high-dimensional text data, and I want to do feature reduction to help avoid the curse of dimensionality, but don't Random Forests already do some sort…