Informational value of R squared and correlation?

Question

Taleb has previously undermined the typical interpretation of correlation with regard to the informational value it carries, showing how the uncertainty is reduced in a non-linear fashion.

Regarding a graph showing the relationship between COVID death tolls at lockdown time and the daily deaths afterwards, where the 15% of explained variability ($R^2$) was being used to defend the contribution, he makes the following statement:

an R-squared of .15 means, if you look at it generously, that almost all the variance is for random reasons, something like ~98% (conventional) or (entropy) >99.9%

How can these exact numbers be explained?
Is it related to the sampling distribution of $R^2$?

Why is this such an unorthodox interpretation of these statistics?

I would say that statement is in general wrong. He or she may have been talking about a specific case. — , Apr 14 '20 at 21:34
@Sören I would say it is a general statement. He has previously undermined the informational value of correlation from a mutual information lens (https://twitter.com/nntaleb/status/1135116646442590208 here, for instance), but I am not sure I follow the distinction, nor why it seems to be an unorthodox or uncommon approach to interpret these statistics. — Kuku, Apr 14 '20 at 21:40
@Dave I assume it to be since the informational value of a statistic is different from the amount of explained variance in the sample. But the derivation of those numbers (which I assume are related to information theory) is the gist of my question, indeed. — Kuku, Apr 14 '20 at 21:48
See https://stats.stackexchange.com/questions/28139/why-squaring-r-gives-explained-variance — Tim, Apr 14 '20 at 21:51
@Tim My question is not about why an $R^2$ of 0.15 is interpreted as 15% of explained variability. My question is where the 98% and 99% numbers come from (and maybe tangentially, why this is an unorthodox approach to interpreting these statistics). (Curiously enough, I think it is the second time this exact same mishap arises between us) — Kuku, Apr 14 '20 at 22:03
I think it means that Taleb thought not enough people were paying attention to him, so he should say something else that made little sense but would get headlines. — Peter Flom, Apr 15 '20 at 12:02
@PeterFlom-ReinstateMonica should I take this comment as a fundamental disagreement on his derivations? — Kuku, Apr 15 '20 at 21:41
In the first link it sounds like a description of collider bias. I find it nicely explained here: https://blog.ephorie.de/collider-bias-are-hot-babes-dim-and-eggheads-ugly — Sextus Empiricus, Apr 16 '20 at 09:35

score 0 · Answer 1 · answered Apr 17 '20 at 21:12

The statement is incorrect. If the $r$-value were 0.15, then $r^2$, i.e., the 'explained fraction', would be 0.0225, which would leave 0.9775 unexplained. However, 0.15 is already $r^2$, which means that $r=\sqrt{0.15}\approx 0.387$, so the explained fraction is 15% and the unexplained fraction is 85%.
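The arithmetic above can be checked with a short sketch. The helper name `explained_and_unexplained` and its `is_r_squared` flag are hypothetical and chosen only for illustration; the code reproduces the two readings of 0.15 discussed here and does not attempt to derive the "entropy" figure.

```python
import math


def explained_and_unexplained(value, is_r_squared):
    """Return (|r|, r^2, unexplained fraction) for a value read as r or as R^2.

    Hypothetical helper for illustration only. Taking the square root of R^2
    recovers only the magnitude of r, not its sign.
    """
    r_squared = value if is_r_squared else value ** 2
    r_magnitude = math.sqrt(r_squared)
    return r_magnitude, r_squared, 1.0 - r_squared


# Reading 0.15 as the correlation r
_, r2_a, unexplained_a = explained_and_unexplained(0.15, is_r_squared=False)

# Reading 0.15 as R^2, which is what the quoted statement says it is
r_b, r2_b, unexplained_b = explained_and_unexplained(0.15, is_r_squared=True)

print(f"0.15 as r:   r^2 = {r2_a:.4f}, unexplained = {unexplained_a:.4f}")
# prints: 0.15 as r:   r^2 = 0.0225, unexplained = 0.9775
print(f"0.15 as R^2: |r| = {r_b:.3f}, unexplained = {unexplained_b:.4f}")
# prints: 0.15 as R^2: |r| = 0.387, unexplained = 0.8500
```

The first reading lands close to the quoted "~98% (conventional)" figure, which is the coincidence this answer points to.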

Moreover, the plot does not appear to have been normalized for frequency of occurrence based on relative national population, so that the actual explained fraction is likely higher than 15%.

Carl

That would give a possible explanation for the "conventional" percentage given: a small mishap. But the question remains for the information perspective on it, and where the 99% "entropy" number comes from. – Kuku Apr 17 '20 at 21:20

@Kuku My ability to read minds stops at the first disruptive mistake in a calculation. Maybe from something like [this](https://en.wikipedia.org/wiki/Entropy_of_mixing#Proof_from_statistical_mechanics), who knows? – Carl Apr 17 '20 at 21:36
