Highest Voted 'sparse' Questions - Statistical Analysis Stack Exchange

90

votes

7 answers

Is Euclidean distance usually not good for sparse data (and in the more general case)?

I have seen somewhere that classical distances (like Euclidean distance) become weakly discriminative when we have multidimensional and sparse data. Why? Do you have an example of two sparse data vectors where the Euclidean distance does not perform…

asked Jun 01 '12 at 13:55

shn

2,479
9
31
38
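
A toy example can make the concern in this question concrete. The sketch below is an illustration only and is not taken from the answers: the vectors `a`, `b`, `c`, `d` and the helper `binary_vector` are made up for the example. It compares Euclidean distance with cosine similarity on sparse 0/1 vectors.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def binary_vector(active, dim=20):
    """Dense 0/1 vector with ones at the indices in `active` (hypothetical helper)."""
    return [1.0 if i in active else 0.0 for i in range(dim)]


# a and b share no active feature at all.
a = binary_vector({0, 1})
b = binary_vector({2, 3})

# c and d each have 10 active features and share 8 of them.
c = binary_vector(set(range(4, 14)))
d = binary_vector(set(range(6, 16)))

dist_ab = euclidean(a, b)
dist_cd = euclidean(c, d)
cos_ab = cosine_similarity(a, b)
cos_cd = cosine_similarity(c, d)

print("Euclidean:", dist_ab, dist_cd)
print("Cosine:   ", cos_ab, cos_cd)
```

Both pairs differ in exactly four coordinates, so Euclidean distance rates them as equally far apart, although the first pair has nothing in common and the second pair overlaps on 8 of 10 active features. Cosine similarity does separate the two cases.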

29

votes

3 answers

How exactly is sparse PCA better than PCA?

I learnt about PCA a few lectures ago in class, and while digging further into this fascinating concept I came across sparse PCA. If I'm not wrong, this is what sparse PCA is: In PCA, if you have $n$ data points with $p$ variables,…

machine-learning pca sparse

asked Dec 10 '13 at 05:34

GrowinMan

831
2
8
8

26

votes

1 answer

Difference between missing data and sparse data in machine learning algorithms

What are the main differences between sparse data and missing data? And how do they influence machine learning? More specifically, what effect do sparse data and missing data have on classification algorithms and regression (predicting numbers) type of…

machine-learning dataset missing-data sparse

asked Mar 14 '17 at 06:45

tired and bored dev

855
2
9
17

24

votes

4 answers

Is there a Random Forest implementation that works well with very sparse data?

Is there an R random forest implementation that works well with very sparse data? I have thousands or millions of boolean input variables, but only hundreds or so will be TRUE for any given example. I'm relatively new to R and noticed that there is…

r random-forest sparse

asked May 20 '12 at 18:29

Eryn

181
1
1
4

21

votes

1 answer

Clustering algorithms that operate on sparse data matrices

I'm trying to compile a list of clustering algorithms that are: (1) implemented in R; (2) able to operate on sparse data matrices (not (dis)similarity matrices), such as those created by the sparseMatrix function. There are several other questions on CV that…

r clustering sparse

asked Jan 06 '14 at 16:02

Zach

22,308
18
114
158

15

votes

2 answers

Difference between convex and concave functions

What is the difference between convex, non-convex, concave and non-concave functions? How can we tell whether a given function is convex or non-convex? And if a function is non-convex, is it necessarily concave? Thanks in…

machine-learning optimization sparse

asked Jan 23 '18 at 03:59

Honey

301
1
2
7
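
As a rough numerical illustration of the distinction asked about here, the sketch below tests the midpoint inequality $f\left(\frac{a+b}{2}\right) \le \frac{f(a)+f(b)}{2}$ on a grid of points. It is an assumption-laden sketch, not a method from the answers: the function names and the grid are invented for the example, and a check on finitely many points can only refute convexity on that grid, never prove it in general.

```python
def is_midpoint_convex(f, points, tol=1e-12):
    """Return False if some pair of points violates f((a+b)/2) <= (f(a)+f(b))/2."""
    for a in points:
        for b in points:
            if f((a + b) / 2) > (f(a) + f(b)) / 2 + tol:
                return False
    return True


def is_midpoint_concave(f, points, tol=1e-12):
    """A function f is concave exactly when -f is convex."""
    return is_midpoint_convex(lambda x: -f(x), points, tol)


# Grid on [-2, 2] with step 0.25 (an arbitrary choice for this example).
grid = [i / 4 for i in range(-8, 9)]

examples = {
    "x^2": lambda x: x * x,
    "-x^2": lambda x: -(x * x),
    "x^3": lambda x: x ** 3,
    "2x + 1": lambda x: 2 * x + 1,
}

results = {
    name: (is_midpoint_convex(f, grid), is_midpoint_concave(f, grid))
    for name, f in examples.items()
}

for name, (convex, concave) in results.items():
    print(f"{name}: convex={convex}, concave={concave}")
```

On this grid, $x^3$ fails both checks, so a non-convex function need not be concave, while the affine function $2x + 1$ passes both.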

13

votes

4 answers

$L_p$ Norms - What is special about $p=2$?

An $L_1$ norm is unique (at least partly) because $p=1$ is at the boundary between non-convex and convex. An $L_1$ norm is the 'most sparse' convex norm (right?). I understand that the $p=2$ Euclidean norm has roots in geometry and it has a clear…

regression regularization sparse

asked Mar 17 '17 at 03:40

Trenton

101
4

12

votes

1 answer

What are $\ell_p$ norms and how are they relevant to regularization?

I have been seeing a lot of papers on sparse representations lately, and most of them use the $\ell_p$ norm and do some minimization. My question is, what is the $\ell_p$ norm, and the $\ell_{p, q}$ mixed norm? And how are they relevant to…

machine-learning regularization sparse

asked Jul 12 '12 at 21:18

water

123
3
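
For reference, the $\ell_p$ norm of a vector $x$ is $\left(\sum_i |x_i|^p\right)^{1/p}$. The sketch below computes it, together with one common reading of the $\ell_{p,q}$ mixed norm: take the $\ell_p$ norm of each row of a matrix, then the $\ell_q$ norm of those row norms. That row-wise convention is an assumption here (papers differ on the order of the indices and on rows versus columns), and the function names are hypothetical.

```python
def lp_norm(x, p):
    """(sum_i |x_i|^p)^(1/p). This is a true norm only for p >= 1."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def mixed_norm(rows, p, q):
    """Assumed convention: l_p norm within each row, then l_q norm across rows."""
    return lp_norm([lp_norm(row, p) for row in rows], q)


x = [3, 4, 0, 0]
l1 = lp_norm(x, 1)   # sum of absolute values
l2 = lp_norm(x, 2)   # Euclidean length

# A matrix with a single nonzero row: the l_{2,1} norm adds up the row lengths,
# so penalizing it tends to switch off whole rows at once.
X = [[3, 4], [0, 0], [0, 0]]
l21 = mixed_norm(X, 2, 1)

print(l1, l2, l21)
```

With $q = 1$ on the outside, the mixed norm acts like an $\ell_1$ penalty on the row norms, which is why it shows up in group-sparse formulations.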

12

votes

1 answer

Does a sparse training set adversely affect an SVM?

I'm trying to classify messages into different categories using an SVM. I've compiled a list of desirable words/symbols from the training set. For each vector, which represents a message, I set the corresponding row to 1 if the word is…

classification svm sparse

asked Feb 09 '12 at 20:46

jonsca

1,790
3
20
30

12

votes

1 answer

Is large scale PCA even possible?

The classical way to do principal component analysis (PCA) is on an input data matrix whose columns have zero mean (so that PCA can "maximize variance"). This can be achieved easily by centering the columns. However, when the input matrix is sparse,…

pca algorithms dimensionality-reduction large-data sparse

asked Jul 31 '15 at 15:00

Roy

719
6
14

11

votes

4 answers

Sparsity-inducing regularization for stochastic matrices

It is well-known (e.g. in the field of compressive sensing) that the $L_1$ norm is "sparsity-inducing," in the sense that if we minimize the functional (for fixed matrix $A$ and vector $\vec{b}$)…

regression matrix normalization regularization sparse

asked Aug 23 '12 at 19:24

Justin Solomon

749
3
12

9

votes

2 answers

Selecting the number of sparse principal components to include in regression

Does anyone have experience with approaches for selecting the number of sparse principal components to include in a regression model?

pca sparse regression-strategies

asked Mar 05 '14 at 18:56

Frank Harrell

74,029
5
148
322

9

votes

5 answers

Cosine similarity on sparse matrix

I'm trying to implement item-based filtering, with a large feature space representing consumers who bought (1) or did not buy (0) a particular product. I have a long-tail distribution, so the matrix is quite sparse. R is not handling it well. What…

clustering sparse

asked Jun 06 '13 at 19:13

Olga Mu

705
1
5
12
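
For 0/1 purchase data like that described in this question, cosine similarity between two items reduces to $|A \cap B| / \sqrt{|A|\,|B|}$, where $A$ and $B$ are the sets of consumers who bought each item, so the zeros never need to be stored. The sketch below shows this in Python rather than R; the data, the item names and the function name are invented for illustration, and it is not one of the approaches from the answers.

```python
import math
from itertools import combinations


def item_cosine_similarities(purchases):
    """Cosine similarity between items for binary purchase data.

    `purchases` maps each item to the set of consumer ids who bought it.
    Only pairs with at least one shared buyer are returned.
    """
    sims = {}
    for i, j in combinations(sorted(purchases), 2):
        shared = len(purchases[i] & purchases[j])  # dot product of the 0/1 columns
        if shared:
            sims[(i, j)] = shared / math.sqrt(len(purchases[i]) * len(purchases[j]))
    return sims


# Hypothetical data: item -> consumers who bought it.
purchases = {
    "book": {1, 2, 3, 4},
    "lamp": {3, 4},
    "mug": {5},
}

sims = item_cosine_similarities(purchases)
print(sims)
```

This version still loops over every pair of items, so for a large catalogue one would likely restrict the comparison to items that share at least one buyer (for example through a consumer-to-items index) or use a sparse matrix library.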

9

votes

2 answers

Can it be overfitting when validation loss and validation accuracy are both increasing?

I am training a simple neural network on a very sparse matrix (2,400 features and 18,000 training rows) for a binary classification problem. At the end of the 1st epoch the validation loss started to increase, while the validation accuracy is also increasing. Can…

classification neural-networks overfitting sparse

asked Sep 24 '18 at 12:21

betelgeuse

93
1
5

9

votes

1 answer

User segmentation by clustering with sparse data

Imagine that I have 100k users and 1k categories. For each user, I know how much money they have spent in up to 5 categories. Obviously my data is very sparse. Now I want to group users by the money they spend on different categories. This way, I…

clustering k-means sparse

asked Mar 02 '16 at 10:10

bfaskiplar

562
2
4
14
