
I have a vector $Y$ of which I only know the mean $\overline{\mu}_Y$, not the individual data points.

How can I generate a vector $X$ so that $\operatorname{Cor}(X,Y)$ is high (> 0.5)? Is that possible at all? What else would I need to know?

The formula for the sample Pearson correlation coefficient is:

$$ r_{xy} =\frac{\sum ^n _{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^n _{i=1}(x_i - \bar{x})^2} \sqrt{\sum ^n _{i=1}(y_i - \bar{y})^2}} $$
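The formula translates directly into code. Below is a minimal R sketch; `pearson_r` is a hypothetical helper name and the two short vectors are made-up illustration data. Note that $Y$ enters the formula only through the deviations $y_i - \bar{y}$, so adding a constant to every $y_i$ (that is, changing only its mean) leaves $r_{xy}$ unchanged.

```r
# Sample Pearson correlation written out from the formula above.
# pearson_r is a hypothetical helper name, used only for illustration.
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations x_i - x-bar
  dy <- y - mean(y)  # deviations y_i - y-bar
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# Made-up illustration data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

pearson_r(x, y)        # about 0.775, the same value as cor(x, y)
pearson_r(x, y + 100)  # unchanged: shifting the mean of y does not affect r
```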

spore234
  • Do you know *anything* more about X and Y? – Tim Nov 06 '15 at 13:34
  • @Tim the vector X can be whatever I want; my goal is to generate a vector X that is highly correlated to Y, where I only know the mean of Y. – spore234 Nov 06 '15 at 13:53
  • There are infinitely many solutions to a problem defined like this. E.g. you can simply generate values from a bivariate normal with every parameter set to arbitrary values apart from one of the means... Also, "highly correlated" is *very* ambiguous (see example here http://stats.stackexchange.com/a/132538/35989). – Tim Nov 06 '15 at 13:59
  • @Tim thanks, I already suspected that this problem is infeasible. – spore234 Nov 06 '15 at 14:01
  • Asking for a high *covariance* is unclear - what counts as "high"? Perhaps if you said you wanted to achieve a *specified* covariance, that would be clearer. I'm not sure why you jump from covariance in the second paragraph to correlation in the third. – Silverfish Nov 06 '15 at 14:22
  • @Silverfish I made it more clear and fixed the typo – spore234 Nov 06 '15 at 14:24
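A minimal sketch of what Tim's bivariate-normal comment describes, assuming R with the MASS package (a recommended package shipped with standard R installations). The target correlation 0.7, the standard deviations and the mean of 30 for X are arbitrary assumed values; only the mean 32 echoes the `y.mu` used in the answers. Note that this draws a fresh Y together with X, so it does not give an X correlated with an existing Y whose values are unknown.

```r
# Draw (X, Y) pairs from a bivariate normal with a chosen correlation.
library(MASS)

set.seed(42)
mu    <- c(30, 32)  # means of X and Y (32 plays the role of the known y_mu)
rho   <- 0.7        # target correlation, an assumed value
sds   <- c(3, 1)    # standard deviations, arbitrary choices
Sigma <- diag(sds) %*% matrix(c(1, rho, rho, 1), 2) %*% diag(sds)

# empirical = TRUE makes the sample mean and covariance match mu and Sigma exactly
xy <- mvrnorm(n = 100, mu = mu, Sigma = Sigma, empirical = TRUE)
x  <- xy[, 1]
y  <- xy[, 2]
cor(x, y)  # 0.7 up to floating-point rounding
mean(y)    # 32 up to floating-point rounding
```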

2 Answers


Correlation is

$$ \rho_{X,Y} = {\sigma_{X,Y} \over \sigma_X \sigma_Y} $$

Your problem is that you don't know $\sigma_Y$, have only a loose idea of the desired $\rho_{X,Y}$ (> 0.5), don't know $\sigma_{X,Y}$, and allow $\sigma_X$ to be any value, so there are infinitely many solutions to such a problem. At the same time, you say

"I input a vector $X$ into a black box. This box knows $Y$ and spits out the correlation of $X$ and $Y$. What vector $X$ do I have to feed the box to get a high correlation?"

which makes your problem hopeless, since among the infinitely many possible vectors only some give the desired correlation. Notice also that your question is contradictory, because you say

"I don't know what is in $Y$ and I cannot use it to generate $X$."

and at the same time you want to generate $X$ such that it is dependent on $Y$ (correlation measures dependence, or degree of association). In other words, you want to generate, independently of $Y$, an $X$ that is dependent on it.
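A tiny R illustration of why the mean alone cannot help, using made-up vectors: two candidate $Y$ vectors with the same mean give opposite correlations with the same $X$, so no $X$ chosen from $\overline{\mu}_Y$ alone can guarantee a high correlation.

```r
# Two made-up Y vectors with the same mean (3) but opposite ordering.
y1 <- c(1, 2, 3, 4, 5)
y2 <- c(5, 4, 3, 2, 1)
mean(y1) == mean(y2)  # TRUE: the mean cannot tell them apart

# A candidate X chosen without seeing Y is paired with both.
x <- c(1, 2, 3, 4, 5)
cor(x, y1)  # 1
cor(x, y2)  # -1
```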

In this case, possibly the only thing you could do is generate some totally random data and hope that, by pure luck, one of the samples turns out to be correlated with $Y$. To be more efficient, you could try to learn from the output of your black box (the correlations it returns) and adapt your candidates accordingly, e.g. using a genetic algorithm.
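A minimal sketch of the adaptive idea, assuming the box accepts any numeric vector of the same length as $Y$ and returns their Pearson correlation. It uses plain hill climbing as a simpler stand-in for a genetic algorithm. `black_box` and `y_hidden` are hypothetical names: the hidden $Y$ is simulated here only so that the code runs, and the search itself reads nothing but the numbers the box returns.

```r
# Hypothetical setup: y_hidden is simulated only so the sketch runs;
# in the real problem it lives inside the black box and is never seen.
set.seed(1)
y_hidden  <- rnorm(50, mean = 32, sd = 1)
black_box <- function(x) cor(x, y_hidden)  # hypothetical oracle returning only r

# Hill climbing (a stand-in for a genetic algorithm):
# perturb one entry of the current X and keep the change only if r improves.
n    <- length(y_hidden)  # assumes the length of Y is known
x    <- rnorm(n)          # random starting vector
best <- black_box(x)

for (iter in seq_len(5000)) {
  candidate    <- x
  i            <- sample.int(n, 1)
  candidate[i] <- candidate[i] + rnorm(1)
  score        <- black_box(candidate)  # one query to the box per iteration
  if (score > best) {
    x    <- candidate
    best <- score
  }
}

best  # correlation reached using only the box's outputs
```

Each iteration costs one call to the box, so this only makes sense if the box can be queried many times.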

Tim

Does this meet your criteria?

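# R code
y.mu <- 32
y <- sort(rnorm(100, y.mu, sd=1))  # y with mean y.mu, sorted ascending
x <- sort(rnorm(100, 30, sd=3))    # x drawn independently of y, also sorted ascending
cor(x, y, method='pearson')        # sorting makes x and y co-monotone, so r is near 1
# 0.9879412 (one run; the value varies because no seed is set)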

Edit in response to the comment:

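y.mu <- 32
y <- rnorm(100, y.mu, sd=1)
x <- 3 * y + 22 + rnorm(100, 0, 1.7)  # x is a linear function of y plus independent noise
cor(x, y)
# 0.8685377 (one run; the value varies because no seed is set)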
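For this construction the population correlation follows directly from $\rho_{X,Y} = \sigma_{X,Y} / (\sigma_X \sigma_Y)$. Writing $X = 3Y + 22 + \varepsilon$, with $\sigma_Y = 1$ and noise $\varepsilon$ independent of $Y$ with $\sigma_\varepsilon = 1.7$ (the values in the code):

$$ \rho_{X,Y} = \frac{\operatorname{Cov}(3Y + 22 + \varepsilon,\, Y)}{\sigma_X \sigma_Y} = \frac{3\sigma_Y^2}{\sqrt{9\sigma_Y^2 + \sigma_\varepsilon^2}\;\sigma_Y} = \frac{3}{\sqrt{9 + 1.7^2}} = \frac{3}{\sqrt{11.89}} \approx 0.87 $$

so the sample value 0.8685377 is close to what the setup implies. Like the code, this calculation needs $Y$ itself, not just its mean.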

It is unclear why you would need two such vectors.

ui_90jax
  • Thanks, but the assumption that Y is ordered is pretty strong and does not hold in my case. – spore234 Nov 06 '15 at 14:17
  • I don't know what is in Y and I cannot use it to generate X. I just know its mean y_mu – spore234 Nov 06 '15 at 14:30
  • @spore234 if you don't know what Y is, then you cannot even generate the values of Y, let alone an X correlated with it! – Tim Nov 06 '15 at 14:37
  • As @Tim indicated, there are an infinite number of solutions to your problem the way it is specified. Using y to generate x is a sufficient but not necessary condition. – ui_90jax Nov 06 '15 at 14:44
  • @Tim The situation is as follows: I know y_mu. I input a vector X into a black box. This box knows Y and spits out the correlation of X and Y. What vector X do I have to feed the box to get a high correlation? – spore234 Nov 06 '15 at 14:48