
Suppose I have 10 variables: A, B, C, D, E, F, G, H, I, J, all of which contain non-stationary data, for example daily prices from 1 Jan 2020 until the present. Now suppose I pick one of these variables at random, say D. I can look at the correlation matrix to see which pairs are correlated, and I may find that D and H have a correlation of, e.g., 0.55. Suppose I am not happy with this and would like something more strongly correlated with D, say above 0.70.

My question is this: is there a way to find a combination of these variables (formed via addition or subtraction) whose correlation with D is >= 0.7? The combination should not include D itself, i.e. the variable in question.

So, for example, the correlation of D vs. 2A+H-7J should be >= 0.7. What methodology would you adopt? Can you use factor analysis / PCA or any other method?

Thanks.

  • You can always get a correlation of $1$ simply by taking one combination to be a positive multiple of the other. If you want something less trivial, then you need to impose some constraints on the two combinations, such as that they be orthogonal. – whuber Nov 12 '20 at 21:01
  • Could you explain this a bit further? You mean something like D vs 2H? – RebeccaKennedy Nov 12 '20 at 21:15
  • Try $D$ vs. $2D.$ Or $2A+H-7J$ vs. $4A+2H-14J.$ If the triviality of this makes you unhappy, consider (say) $D$ vs. $D + (1/10^9)H:$ now the combinations are different, but their correlation can be made arbitrarily close to $1.$ – whuber Nov 12 '20 at 21:16
  • I can't use D. Sorry, I should have clarified that in my post. It has to be D vs. everything else but D. But it doesn't have to include ALL the other variables; it can be just one or two. – RebeccaKennedy Nov 12 '20 at 21:18
  • Please, then, edit your post so it states your question as intended. – whuber Nov 12 '20 at 21:21
  • The solution is any constant multiple of the least-squares coefficients for the regression of $D$ against the other variables. My post at https://stats.stackexchange.com/a/108862/919 gives the formulas in terms of the overall covariance matrix. You can't usually obtain a solution if you only have the correlation matrix. – whuber Nov 12 '20 at 21:41
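
To make the last comment concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the data are synthetic random walks standing in for the ten price series, and names such as `prices`, `weights` and `corr` are made up for this example. It regresses D on the other nine variables, computes the same weights from the full covariance matrix, and checks that rescaling the weights by a positive constant leaves the correlation unchanged.

```python
import numpy as np

# Assumption: synthetic random walks stand in for the ten daily price series.
rng = np.random.default_rng(0)
n_days, names = 500, list("ABCDEFGHIJ")
common = rng.normal(size=(n_days, 1))  # shared shock so the series are related
prices = np.cumsum(0.5 * common + rng.normal(size=(n_days, len(names))), axis=0)

target = names.index("D")
others = [i for i in range(len(names)) if i != target]
y = prices[:, target]   # D
X = prices[:, others]   # every variable except D

# Least-squares regression of D on the other variables (with an intercept).
design = np.column_stack([np.ones(n_days), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
weights = coef[1:]      # slope coefficients: the weights of the combination

# The same weights from the covariance matrix: solve S_xx w = s_xD.
S = np.cov(prices, rowvar=False)
weights_cov = np.linalg.solve(S[np.ix_(others, others)], S[others, target])

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

best = corr(y, X @ weights)               # correlation of D with the fitted combination
best_single = max(abs(corr(y, X[:, j])) for j in range(X.shape[1]))
rescaled = corr(y, X @ (3.0 * weights))   # a positive multiple gives the same correlation

print(dict(zip([names[i] for i in others], np.round(weights, 3))))
print(f"best single |corr| = {best_single:.3f}, combination corr = {best:.3f}")
```

The correlation reached by the fitted combination is the largest any linear combination of the other variables can achieve in this sample, so if it is below 0.7 no choice of weights will get there. Restricting the combination to one or two variables would mean repeating the regression on each candidate subset.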
