10

I don't understand why reduction in dimension is important. What is the benefit of taking some data and reducing their dimension?

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • 3
    The tone of the question does not invite constructive answers. Please consider rewording your question. – Sasha Dec 09 '11 at 19:03
  • 2
    The point may be to reduce the volume of data needed to store certain information at the expense of a slight loss of accuracy (e.g. JPEG image compression). – Sasha Dec 09 '11 at 19:04
  • 2
    Thank you for your comments, @Sasha. It's a reasonable question, so I made a minor edit to avoid the impression of bluntness (surely unintended) conveyed by the original wording. – whuber Jan 24 '12 at 21:10
  • See https://stats.stackexchange.com/questions/177102/what-is-the-intuition-behind-svd/179042#179042 for an example! – kjetil b halvorsen Dec 21 '17 at 01:09
  • You do SVD for topic modelling that is NOT probabilistic. For topic modelling that is probabilistic use LDA. If you are NOT doing topic modelling then use PCA. – Brad May 18 '18 at 12:09

3 Answers

18

Singular value decomposition (SVD) is not the same as reducing the dimensionality of the data. It is a method of decomposing a matrix into a product of other matrices, and it has lots of wonderful properties which I won't go into here. For more on SVD, see the Wikipedia page.
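
To make this concrete, here is a minimal NumPy sketch (the matrix is made up, not taken from the answer) showing the shape of the decomposition and that it reconstructs the original matrix exactly:

    import numpy as np

    # A small made-up data matrix (5 observations, 3 variables).
    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 1.0, 4.0],
                  [3.0, 2.0, 1.0],
                  [1.0, 1.0, 1.0]])

    # Thin SVD: A = U @ diag(s) @ Vt, with orthonormal columns in U and V
    # and non-negative singular values s sorted from largest to smallest.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(U.shape, s.shape, Vt.shape)           # (5, 3) (3,) (3, 3)
    print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: exact reconstruction

Because the singular values are sorted, dropping the smallest ones is the natural way to turn the decomposition into a dimension-reduction (or compression) tool.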

Reducing the dimensionality of your data is sometimes very useful. It may be that you have a lot more variables than observations; this is not uncommon in genomic work. It may be that we have several variables that are very highly correlated, e.g., when they are heavily influenced by a small number of underlying factors, and we wish to recover some approximation to the underlying factors. Dimensionality-reducing techniques such as principal component analysis, multidimensional scaling, and canonical variate analysis give us insights into the relationships between observations and/or variables that we might not be able to get any other way.
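
As a rough illustration of the "few underlying factors" case, here is a sketch that assumes scikit-learn is available and uses simulated data (not data from any real study): ten highly correlated variables are generated from two latent factors, and two principal components recover essentially all of the variance.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    # Simulate 200 observations of 10 variables that are all driven by
    # just 2 underlying factors, plus a little noise.
    factors = rng.normal(size=(200, 2))    # latent factors
    loadings = rng.normal(size=(2, 10))    # how each variable depends on them
    X = factors @ loadings + 0.1 * rng.normal(size=(200, 10))

    # PCA (computed internally via the SVD of the centered data) shows that
    # two components capture almost all of the variance.
    pca = PCA(n_components=2).fit(X)
    scores = pca.transform(X)              # 200 x 2 reduced representation

    print(pca.explained_variance_ratio_.sum())  # close to 1.0
    print(scores.shape)                         # (200, 2)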

A concrete example: some years ago I was analyzing an employee satisfaction survey that had over 100 questions on it. Well, no manager is ever going to be able to look at 100+ questions' worth of answers, even summarized, and do more than guess at what it all means, because who can tell how the answers are related and what is driving them, really? I performed a factor analysis on the data, for which I had over 10,000 observations, and came up with five very clear and readily interpretable factors which could be used to develop manager-specific scores (one for each factor) that would summarize the entirety of the 100+ question survey. A much better solution than the Excel spreadsheet dump that had been the prior method of reporting results!
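
For anyone who wants to try something along those lines, here is a very rough sketch using scikit-learn's FactorAnalysis on placeholder data; the random survey matrix, the varimax rotation, and the choice of five factors are illustrative assumptions on my part, not the actual pipeline from the story above.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(1)

    # Hypothetical stand-in for the survey: 10,000 respondents x 100 questions
    # answered on a 1-5 scale (the real data are of course not reproduced here).
    responses = rng.integers(1, 6, size=(10_000, 100)).astype(float)

    # Fit a 5-factor model; a varimax rotation (available in newer scikit-learn
    # versions) often makes the factors easier to interpret.
    fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
    factor_scores = fa.fit_transform(responses)   # shape (10000, 5)

    # Per-manager scores would then simply be averages of these factor scores
    # over each manager's respondents.
    print(factor_scores.shape)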

jbowman
  • 31,550
  • 8
  • 54
  • 107
5

Regarding the second part of your question, the benefits of dimensionality reduction for a data set may include:

  • reduce the storage space needed (the sketch after this list illustrates the saving)
  • speed up computation (for example in machine learning algorithms): fewer dimensions mean less computing, and fewer dimensions can also allow the use of algorithms that are unsuitable for a large number of dimensions
  • remove redundant features, for example there is no point in storing a terrain's area in both square meters and square miles (perhaps the data gathering was flawed)
  • reduce the data to 2D or 3D so that we can plot and visualize it, perhaps observe patterns, and gain insights
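
As a minimal sketch of the storage point (using NumPy, on a simulated matrix that is approximately low rank, which is the favourable case): keeping only the top k singular values and vectors stores k(m + n + 1) numbers instead of m * n, at the cost of a small approximation error.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, k = 1000, 500, 20

    # A matrix that is roughly rank-k plus a little noise, as real data often is.
    A = (rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
         + 0.01 * rng.normal(size=(m, n)))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the k largest singular values and the matching vectors.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    full_storage = m * n                # numbers stored for the raw matrix
    reduced_storage = k * (m + n + 1)   # numbers stored for U_k, s_k, Vt_k
    rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)

    print(full_storage, reduced_storage)  # 500000 vs 30020
    print(rel_error)                      # small: most of the signal is kept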

Beyond that, and beyond PCA, the SVD has many applications in signal processing, NLP, and many other fields.

clyfe
  • 790
  • 7
  • 8
2

Take a look at this answer of mine. The singular value decomposition is a key component of principal components analysis, which is a very useful and very powerful data analysis technique.
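
A small sketch of that connection, assuming scikit-learn and simulated data: the principal axes that PCA reports coincide (up to sign) with the right singular vectors of the column-centered data, and the explained variances are the squared singular values divided by n - 1.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 6))

    # PCA via scikit-learn.
    pca = PCA().fit(X)

    # PCA "by hand": SVD of the column-centered data matrix.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # The principal axes agree with the right singular vectors up to sign...
    print(np.allclose(np.abs(pca.components_), np.abs(Vt)))              # True

    # ...and the explained variances are s**2 / (n_samples - 1).
    print(np.allclose(pca.explained_variance_, s**2 / (X.shape[0] - 1)))  # True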

It is often used in facial recognition algorithms, and I make frequent use of it in my day job as a hedge fund analyst.

Chris Taylor
  • 3,432
  • 1
  • 25
  • 29