
I have a large dataset (large in the number of variables), and I developed a method that selects small subsets of its variables. I would like to estimate how much of the variance of the original data a given subset can capture or "explain". Is there a way to measure that?

Note: the subsets are not selected using statistical procedures such as PCA, but rather using heuristic methods based on the business meaning of the data.

amoeba
amit
  • +1. This is related to https://stats.stackexchange.com/questions/8630 (see in particular my answer there: https://stats.stackexchange.com/a/135264 as I think that the currently most accepted answer does not tell the full story). But your Q looks a bit different: instead of asking how much variance a given linear combination of variables explains, you are asking how much variance a given subset of variables explains. Correct? – amoeba May 16 '17 at 09:57
  • If so, then the answer can be obtained by linear regression. Regress your full set of variables onto your subset of variables, and measure R-squared. That's the fraction of explained variance. Do you know how to do it? – amoeba May 16 '17 at 10:01
  • Thank you for the answer. Yes, your understanding is correct. Do you mean multivariate regression? I know how to do it, of course. Isn't it a problem when the regressors appear among the response variables as well? Should I take them out first? – amit May 16 '17 at 10:28
  • Yes, multivariate regression. This is equivalent to doing univariate regressions for each of your original variables, and summing up all the R-squared values, weighting them by each variable's variance. But it's easier to do multivariate regression. If you want to know how much variance of the original data -- *including the selected subset!* -- is explained by the subset, then you should NOT take them out. – amoeba May 16 '17 at 10:56
  • @amoeba How would you have more than one dependent variable in the linear regression? I have posted another question: https://stats.stackexchange.com/questions/544487/how-much-variance-is-explained-from-a-subset-of-features – quant Sep 13 '21 at 15:26
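
The regression suggested in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the function name `explained_variance_fraction`, the synthetic data, and the chosen column indices are all hypothetical, and the per-variable weights are taken to be each variable's share of the total variance (one reading of "weighting by each variable's variance"). All variables, including the selected ones, are kept on the response side, and centering the columns stands in for fitting an intercept.

```python
import numpy as np

def explained_variance_fraction(X, subset):
    """Fraction of total variance of X explained by regressing all columns
    of X on the columns listed in `subset` (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)           # centering replaces an intercept term
    S = Xc[:, subset]                 # regressors: the selected variables
    # Multivariate least squares: one coefficient column per response variable
    coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
    resid = Xc - S @ coef
    return 1.0 - (resid ** 2).sum() / (Xc ** 2).sum()

# Synthetic data, purely illustrative: 10 correlated variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.5 * rng.normal(size=(500, 10))
subset = [0, 4, 7]                    # assumed heuristic selection

r2_total = explained_variance_fraction(X, subset)

# Same quantity via separate univariate-response regressions,
# averaged with weights proportional to each variable's variance
Xc = X - X.mean(axis=0)
S = Xc[:, subset]
per_var_r2 = []
for j in range(X.shape[1]):
    b, *_ = np.linalg.lstsq(S, Xc[:, j], rcond=None)
    r = Xc[:, j] - S @ b
    per_var_r2.append(1.0 - (r ** 2).sum() / (Xc[:, j] ** 2).sum())
weights = Xc.var(axis=0) / Xc.var(axis=0).sum()
r2_weighted = float(np.dot(weights, per_var_r2))

print(r2_total, r2_weighted)
```

The two printed numbers should agree up to floating-point error, since the weighted average of the per-variable R-squared values reduces algebraically to one minus the total residual sum of squares over the total sum of squares. The selected columns are reproduced exactly by the regression, so they contribute their full variance to the explained part.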

0 Answers