5

I was reading the following book:

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Elsevier; 2011 Jun 9. (Third Edition)

On page 96, the first line of the last paragraph says:

If the resulting value is equal to $0$, then $A$ and $B$ are independent and there is no correlation between them.

where the "resulting value" is the correlation coefficient, given by the following formula:

$$ r_{A,B}=\frac{\sum_{i=1}^n (a_i - \overline{A}) (b_i - \overline{B})}{n\sigma_A\sigma_B}. \tag{3.3} $$

However, in the last paragraph of the next page, it says:

If $A$ and $B$ are independent (i.e., they do not have correlation), then ... $Cov(A,B) = \ldots = 0$.

Up to here, everything looks good. However, by the relation $$ r_{A,B} = \frac{Cov(A,B)}{\sigma_A\sigma_B} \tag{3.5} $$ correlation and covariance are related, and as far as I remember, zero covariance between two random variables does not necessarily mean that they are independent. Yet the book says that if $r_{A,B} = 0$, then $A$ and $B$ are independent. Am I right that the book is wrong, or is something else happening here?

Ali Shakiba
  • 1
    When all the correlations are 0, it is the off-diagonal elements that should be 0. Zero correlation implies independence for a bivariate normal but not in general for other distributions. – Michael R. Chernick Apr 10 '17 at 19:19
  • Did you buy this book because it's required for a class? I can't think of other reasons to buy this thing – Aksakal Apr 10 '17 at 19:19
  • @Aksakal Yes, it is required for a course. – Ali Shakiba Apr 10 '17 at 19:31
  • 1
    If you read just a little further in the book, it explicitly tells you that zero covariance *does not* imply independence: see the bottom of p. 97. – whuber Apr 10 '17 at 19:38
  • @whuber You are right, and that is where the contradiction arises. The book says that $cov(a,b)=0$ does not imply independence, but that $r_{a,b}=0$ does. That is a contradiction. – Ali Shakiba Apr 10 '17 at 19:43
  • 2
    I will wholeheartedly agree that the book is not well written. – whuber Apr 10 '17 at 19:46
  • Sometimes this whole textbook publishing business looks like collusion between the publishers and educators: the former pay the latter to force their product on students. Some of these textbooks are awful and would never survive without being required reading. Then you have these countless editions of textbooks, and the professors requiring the latest editions. Why would anyone need a 13th edition of a calculus text? The damn thing hasn't changed in the past 100 years. – Aksakal Apr 10 '17 at 20:47
  • Please see the following thread: [Simple examples of uncorrelated but not independent $X$ and $Y$](http://stats.stackexchange.com/questions/85363); I believe it will be a constructive addition to your understanding of the issue. – usεr11852 Apr 10 '17 at 22:49
  • See [this question and its answers](http://stats.stackexchange.com/q/261377/6633) for more information. – Dilip Sarwate Apr 11 '17 at 03:01
  • 2
    Possible duplicate of [Under what additional conditions does independence follow from zero correlation?](http://stats.stackexchange.com/questions/261377/under-what-additional-conditions-does-independence-follow-from-zero-correlation) – Dilip Sarwate Apr 11 '17 at 03:03
  • Did you get a chance to look up the book's errata? – dangiankit Apr 11 '17 at 05:16
  • @dangiankit Yes, I have checked it out [here](https://wiki.illinois.edu//wiki/display/cs591han/Errata+of+Data+Mining+(3rd+Edition)). – Ali Shakiba Apr 11 '17 at 08:15

2 Answers

17

Zero correlation does not imply independence. Either:

  1. There is a typo/mistake and the book is wrong, or
  2. The book made additional assumptions earlier, for example that the joint distribution of A and B is bivariate normal. Under such additional conditions, zero correlation does imply independence.
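
To illustrate the second case, here is a sketch under the assumption that $(A,B)$ is bivariate normal with means $\mu_A, \mu_B$, standard deviations $\sigma_A, \sigma_B$, and correlation $\rho$ (these symbols are introduced here for illustration and are not taken from the book). The joint density is

$$ f_{A,B}(a,b) = \frac{1}{2\pi\sigma_A\sigma_B\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(a-\mu_A)^2}{\sigma_A^2} - \frac{2\rho(a-\mu_A)(b-\mu_B)}{\sigma_A\sigma_B} + \frac{(b-\mu_B)^2}{\sigma_B^2}\right]\right). $$

Setting $\rho = 0$ removes the cross term, and the density factorizes into the product of the two marginal normal densities:

$$ f_{A,B}(a,b) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\left(-\frac{(a-\mu_A)^2}{2\sigma_A^2}\right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_B} \exp\left(-\frac{(b-\mu_B)^2}{2\sigma_B^2}\right) = f_A(a)\, f_B(b). $$

A joint density that factorizes this way is exactly the definition of independence, so in this special case zero correlation is enough. Without joint normality (or some comparable assumption), the factorization need not hold.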
Matthew Gunn
9

Your book is wrong. Correlation zero is not a sufficient condition for independence. You can have Pearson correlation zero for variables that are not independent.

Independent variables have zero covariance, and also zero correlation provided their variances are non-zero (so that the correlation is defined). The implication only runs in that direction, so there's no contradiction here.
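
As a minimal sketch of such a counterexample, assume $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (my choice of example, not something from the book). $Y$ is completely determined by $X$, yet the covariance is exactly zero. The helper name `expect` is hypothetical and only used for this illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed toy distribution: X uniform on {-1, 0, 1}, and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)
joint = {(x, x * x): p for x in support}  # P(X = x, Y = y)


def expect(f):
    """Expectation of f(X, Y) under the joint distribution above."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())


# Covariance via Cov(X, Y) = E[XY] - E[X] E[Y], computed exactly.
ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey

# Marginal distributions of X and Y.
px, py = {}, {}
for (x, y), prob in joint.items():
    px[x] = px.get(x, 0) + prob
    py[y] = py.get(y, 0) + prob

# Independence requires P(X = x, Y = y) == P(X = x) * P(Y = y) for every pair.
independent = all(
    joint.get((x, y), Fraction(0)) == px[x] * py[y]
    for x, y in product(px, py)
)

print("Cov(X, Y) =", cov)
print("independent:", independent)
```

Here the covariance (and hence the Pearson correlation) is zero, while the independence check fails: for instance $P(X=0, Y=0) = 1/3$ but $P(X=0)\,P(Y=0) = 1/9$.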

Aksakal