Questions tagged [sparse]

A sparse matrix is a matrix in which many of the elements are zero. The tag can also be used for sparsity in other contexts, such as sparse regression models or the "bet on sparsity" principle.

273 questions
90 votes, 7 answers

Euclidean distance is usually not good for sparse data (and in the more general case)?

I have seen somewhere that classical distances (like Euclidean distance) become weakly discriminant when we have multidimensional and sparse data. Why? Do you have an example of two sparse data vectors where the Euclidean distance does not perform…
shn
29 votes, 3 answers

How exactly is sparse PCA better than PCA?

I learned about PCA a few lectures ago in class and, while digging deeper into this fascinating concept, came across sparse PCA. If I'm not mistaken, this is what sparse PCA is: In PCA, if you have $n$ data points with $p$ variables,…
GrowinMan
26 votes, 1 answer

Difference between missing data and sparse data in machine learning algorithms

What are the main differences between sparse data and missing data, and how do they influence machine learning? More specifically, what effect do sparse data and missing data have on classification algorithms and regression (predicting numbers) type of…
tired and bored dev
24 votes, 4 answers

Is there a Random Forest implementation that works well with very sparse data?

Is there an R random forest implementation that works well with very sparse data? I have thousands or millions of boolean input variables, but only hundreds or so will be TRUE for any given example. I'm relatively new to R and noticed that there is…
Eryn
21 votes, 1 answer

Clustering algorithms that operate on sparse data matrices

I'm trying to compile a list of clustering algorithms that (1) are implemented in R and (2) operate on sparse data matrices (not (dis)similarity matrices), such as those created by the sparseMatrix function. There are several other questions on CV that…
Zach
15 votes, 2 answers

Difference between convex and concave functions

What is the difference between convex, non-convex, concave and non-concave functions? How can we tell whether a given function is convex or non-convex? And if a function is non-convex, is it necessarily concave? Thanks in…
Honey
13 votes, 4 answers

$L_p$ Norms - What is special about $p=2$?

An $L_1$ norm is unique (at least partly) because $p=1$ is at the boundary between non-convex and convex. An $L_1$ norm is the 'most sparse' convex norm (right?). I understand that the $p=2$ Euclidean norm has roots in geometry and it has a clear…
Trenton
12 votes, 1 answer

What are $\ell_p$ norms and how are they relevant to regularization?

I have been seeing a lot of papers on sparse representations lately, and most of them use the $\ell_p$ norm and do some minimization. My question is, what is the $\ell_p$ norm, and the $\ell_{p, q}$ mixed norm? And how are they relevant to…
water
12 votes, 1 answer

Does a sparse training set adversely affect an SVM?

I'm trying to classify messages into different categories using an SVM. I've compiled a list of desirable words/symbols from the training set. For each vector, which represents a message, I set the corresponding row to 1 if the word is…
jonsca
12 votes, 1 answer

Is large scale PCA even possible?

The classical way to do principal component analysis (PCA) is on an input data matrix whose columns have zero mean (then PCA can "maximize variance"). This can be achieved easily by centering the columns. However, when the input matrix is sparse,…
Roy
11 votes, 4 answers

Sparsity-inducing regularization for stochastic matrices

It is well-known (e.g. in the field of compressive sensing) that the $L_1$ norm is "sparsity-inducing," in the sense that if we minimize the functional (for fixed matrix $A$ and vector $\vec{b}$)…
9 votes, 2 answers

Selecting the number of sparse principal components to include in regression

Does anyone have experience with approaches for selecting the number of sparse principal components to include in a regression model?
Frank Harrell
9 votes, 5 answers

Cosine similarity on sparse matrix

I'm trying to implement item-based filtering, with a large feature space representing consumers who bought (1) or did not buy (0) a particular product. I have a long-tail distribution, so the matrix is quite sparse. R is not handling it well. What…
Olga Mu
9 votes, 2 answers

Can it be overfitting when validation loss and validation accuracy are both increasing?

I am training a simple neural network on a very sparse matrix (2400 features and 18000 training rows) for a binary classification problem. At the end of the 1st epoch the validation loss started to increase, while the validation accuracy is also increasing. Can…
betelgeuse
9 votes, 1 answer

User segmentation by clustering with sparse data

Imagine that I have 100k users and 1k categories. For each user, I know how much money they have spent in up to 5 categories. Obviously my data is very sparse. Now I want to group users by the money they spend on different categories. This way, I…
bfaskiplar