5

My question is how reversibility affects correlation. I am effectively a statistics noob, so forgive my lack of proper terminology. I will use the following example to illustrate the question.

I have created a list of random numbers ranging from 0 to 30, and a second list indicating the first order of the number.

Random list of 10 numbers ranging from 0 to 30

Clearly, there is a strong correlation between the two columns, yet going from num to dec is 100% certain, whereas going from dec to num is a guess (10% chance of guessing the correct number). What is this phenomenon called in statistics?

A correlation plot would look as follows. It is symmetric along its diagonal, but would it make sense to have a correlation 'from' num to dec in one half (corr = 1), and 'from' dec to num in the other half (corr = 0.1)? (The relation I am referring to is probably not actually correlation, but it would be useful for data science nonetheless.)

Correlation between num and dec columns

Ben

3 Answers

8

Correlation does not have a "from" and a "to". It is symmetric: $Cor(A, B) = Cor(B, A)$. The terms "from" and "to" can make sense in the context of regression, where we speak of "independent" and "dependent" variables, or of "predictor" and "predicted". Pearson correlation is closely related to linear regression. Linear regression, in turn, has no notion of the first order of a value, so that relationship cannot be expressed in it.

So if you constructed a form of regression that has a way to express "first order of value", then that form of regression would perform better with $num$ as predictor for $dec$ than the other way around.
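To make the contrast concrete, here is a minimal sketch in Python. It assumes that dec is the tens digit of num (my reading of "first order") and, for reproducibility, uses every integer from 0 to 29 instead of a random draw. The helper names `pearson` and `best_guess_accuracy` are made up for this illustration and are not from any library. The second helper is just one possible directional measure: the fraction of rows you get right when you always guess the most common target value for each predictor value.

```python
from collections import Counter, defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation computed directly from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_guess_accuracy(predictor, target):
    """Fraction of rows guessed correctly when, for each predictor value,
    we always guess the most frequent target value seen with it."""
    groups = defaultdict(Counter)
    for p, t in zip(predictor, target):
        groups[p][t] += 1
    hits = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return hits / len(predictor)


# Assumption: dec is the tens digit of num; data is 0..29, not a random sample.
num = list(range(30))
dec = [n // 10 for n in num]

r_num_dec = pearson(num, dec)
r_dec_num = pearson(dec, num)                    # same value: no direction
acc_num_to_dec = best_guess_accuracy(num, dec)   # num determines dec
acc_dec_to_num = best_guess_accuracy(dec, num)   # 10 candidates per dec

print(r_num_dec, r_dec_num)
print(acc_num_to_dec, acc_dec_to_num)
```

The two correlations come out identical, while the guessing accuracy is 1.0 in one direction and 0.1 in the other, which matches the 100% versus 10% intuition in the question.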

Nuclear03020704
Bernhard
  • Thank you for your reply, you gave some useful pointers and terminology for further reading! I'll get back here if any questions remain – Arnold de Jager May 12 '20 at 10:53
5

This is simply a case where dec is a function of num, i.e., the value of dec is fully determined by the value of num. That is all it is called: a function. Functions of random variables are often correlated with the initial random variables, so this is not an unusual situation. The correlation indicates that the two variables are (statistically) linearly related, which they are. Obviously, in this case the correlation is not a particularly good representation of the relationship, but that is not surprising, since the functional relationship is highly non-linear.
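The "is a function of" relationship can be checked directly on the data. The following is only a sketch: `is_function_of` is a hypothetical helper written for this post, and the data again assumes dec is the tens digit of num for the integers 0 to 29.

```python
from collections import defaultdict


def is_function_of(target, source):
    """Return True if every distinct source value is paired with exactly
    one target value, i.e. target is fully determined by source."""
    seen = defaultdict(set)
    for s, t in zip(source, target):
        seen[s].add(t)
    return all(len(values) == 1 for values in seen.values())


# Assumption: dec is the tens digit of num.
num = list(range(30))
dec = [n // 10 for n in num]

dec_from_num = is_function_of(dec, num)  # each num maps to one dec
num_from_dec = is_function_of(num, dec)  # each dec maps to many nums

print(dec_from_num, num_from_dec)
```

Here dec is a function of num but num is not a function of dec, which is exactly the one-way certainty described in the question.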

Ben
2

As Bernhard mentioned, correlation does not have a "from - to" concept. It describes the relationship between two variables.

Another useful way to think about it: if we change (or filter on) one variable, how does the other variable change?

Think about the relationship between human height and weight: if we focus on the tall population, we are very likely to see larger weights. This is called "positive" correlation.

Now think about another interesting case: what happens if one variable has zero variance, i.e., all data points have the same value?

The answer can be found in this closely related post

How would you explain covariance to someone who understands only the mean?
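As a hedged sketch of why the zero-variance case is special, assuming the usual definition of Pearson correlation (the symbols $\sigma_A$ and $\sigma_B$ for the standard deviations are notation introduced here, not taken from the posts above):

$$Cor(A, B) = \frac{Cov(A, B)}{\sigma_A \, \sigma_B}$$

If $A$ takes the same value everywhere, then $\sigma_A = 0$ and also $Cov(A, B) = 0$, so the ratio becomes $0/0$ and the correlation is undefined rather than zero.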

Haitao Du