
Taleb has previously challenged the typical interpretation of correlation with regard to the informational value it carries, showing that the uncertainty is reduced in a non-linear fashion.

With regard to a graph showing the relationship between COVID death tolls at lockdown time and the daily deaths afterwards, where the 15% of explained variability ($R^2$) was used to defend the contribution, he makes the following statement:

an R-squared of .15 means, if you look at it generously, that almost all the variance is for random reasons, something like ~98% (conventional) or (entropy) >99.9%

How can these exact numbers be explained?
Is it related to the sampling distribution of the $R^2$?

Why is this such an unorthodox interpretation of these statistics?

Sextus Empiricus
Kuku
  • 2
    I would say that statement is in general wrong. He or she may have been talking about a specific case. –  Apr 14 '20 at 21:34
  • 1
    @Sören I would say it is a general statement. He has previously questioned the informational value of correlation from a mutual information lens (here, for instance: https://twitter.com/nntaleb/status/1135116646442590208), but I am not sure I follow the distinction, nor why it seems to be an unorthodox or uncommon way to interpret these statistics. – Kuku Apr 14 '20 at 21:40
  • 1
    $98\%?$ Why not $85\%?$ – Dave Apr 14 '20 at 21:46
  • 1
    @Dave I assume it is because the informational value of a statistic is different from the amount of explained variance in the sample. But the derivation of those numbers (which I assume is related to information theory) is indeed the gist of my question. – Kuku Apr 14 '20 at 21:48
  • See https://stats.stackexchange.com/questions/28139/why-squaring-r-gives-explained-variance – Tim Apr 14 '20 at 21:51
  • 1
    @Tim My question is not about why an $R^2$ of 0.15 is interpreted as 15% of explained variability. My question is where the 98% and 99% numbers come from (and, maybe tangentially, why this is an unorthodox way to interpret these statistics). (Curiously enough, I think it is the second time this exact same mishap has arisen between us.) – Kuku Apr 14 '20 at 22:03
  • 2
    I think it means that Taleb thought not enough people were paying attention to him, so he should say something else that made little sense but would get headlines. – Peter Flom Apr 15 '20 at 12:02
  • @PeterFlom-ReinstateMonica should I take this comment as a fundamental disagreement on his derivations? – Kuku Apr 15 '20 at 21:41
  • In the first link it sounds like a description of collider bias. I find it nicely explained here: https://blog.ephorie.de/collider-bias-are-hot-babes-dim-and-eggheads-ugly – Sextus Empiricus Apr 16 '20 at 09:35

1 Answer


The statement is incorrect. If the $r$-value were 0.15, then $r^2$, i.e., the 'explained fraction', would be 0.0225, which would leave 0.9775 unexplained. However, 0.15 is already $r^2$, which means that $r=\sqrt{0.15}\approx0.387$, so the explained fraction is 15% and the unexplained fraction is 85%.

Moreover, the plot does not appear to have been normalized for frequency of occurrence based on relative national population, so that the actual explained fraction is likely higher than 15%.
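
The arithmetic behind the two readings of 0.15 can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the mutual-information formula $-\tfrac{1}{2}\ln(1-r^2)$ assumes a bivariate Gaussian, which is an assumption added here to show how an information measure scales non-linearly with $r$. It is not a reconstruction of how the ">99.9%" figure in the quote was obtained.

```python
import math


def unexplained_fraction(r_squared):
    """Fraction of variance left unexplained for a given R^2."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return 1.0 - r_squared


def gaussian_mutual_information(r):
    """Mutual information in nats between two jointly Gaussian variables
    with correlation r (assumption: bivariate Gaussian, not from the thread)."""
    return -0.5 * math.log(1.0 - r * r)


# Reading 1: 0.15 mistaken for r, so the explained fraction is r^2 = 0.0225.
r_mistaken = 0.15
unexplained_if_r = unexplained_fraction(r_mistaken ** 2)

# Reading 2: 0.15 is already R^2, so r = sqrt(0.15).
r_actual = math.sqrt(0.15)
unexplained_if_r2 = unexplained_fraction(0.15)

print(f"0.15 read as r:   unexplained = {unexplained_if_r:.4f}")
print(f"0.15 read as R^2: r = {r_actual:.3f}, unexplained = {unexplained_if_r2:.2f}")

# Illustration only: Gaussian mutual information under each reading.
print(f"MI at r = 0.15:       {gaussian_mutual_information(r_mistaken):.4f} nats")
print(f"MI at r = sqrt(0.15): {gaussian_mutual_information(r_actual):.4f} nats")
```

The first reading reproduces the roughly 98% "conventional" figure, while the second gives the usual 85%.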

Carl
  • That would give a possible explanation for the "conventional" percentage given: a small mishap. But the question remains for the information perspective on it, and where the 99% "entropy" number comes from. – Kuku Apr 17 '20 at 21:20
  • 1
    @Kuku My ability to read minds stops at the first disruptive mistake in a calculation. Maybe from something like [this](https://en.wikipedia.org/wiki/Entropy_of_mixing#Proof_from_statistical_mechanics), who knows? – Carl Apr 17 '20 at 21:36