I know this is from a comic famous for taking advantage of certain analytical tendencies, but it actually looks kind of reasonable after a few minutes of staring. Can anyone outline for me what this "modified Bayes theorem" is doing?
-
http://www.explainxkcd.com/wiki/index.php/2059:_Modified_Bayes%27_Theorem is the explanation from the author. – Tschallacka Oct 16 '18 at 09:49
-
@Tschallacka What makes you think Randall wrote that? – kasperd Oct 16 '18 at 10:32
-
@Tschallacka Unless any of [the authors](http://www.explainxkcd.com/wiki/index.php?title=2059:_Modified_Bayes%27_Theorem&action=history) is Randall himself, this is not the case. – SQB Oct 16 '18 at 10:34
-
But shouldn't you apply Bayes' theorem to $P(C)$ to update its value in the face of more evidence? – Yakk Oct 16 '18 at 19:50
-
I'm pretty sure the $P(C)$ there is just a facetious addition. – Ian MacDonald Oct 17 '18 at 18:34
-
No comment on this particular comic, but there are a few XKCD comics that are [taken rather seriously](https://security.stackexchange.com/a/6116/46979). – jpmc26 Oct 18 '18 at 23:53
-
G5W, would you please construct such an example? :) I believe it's actually not possible based on the answer below. – eric_kernfeld Oct 19 '18 at 11:56
-
My error. $P(x|H)/P(x) \ge 0$, so $P(C)\left[P(x|H)/P(x) - 1\right] \ge -P(C) \ge -1$, so $P(H|x)$ is always nonnegative. – G5W Oct 19 '18 at 13:11
2 Answers
Well by distributing the $P(H)$ term, we obtain $$ P(H|X) = \frac{P(X|H)P(H)}{P(X)} P(C) + P(H) [1 - P(C)], $$ which we can interpret as the Law of Total Probability applied to the event $C =$ "you are using Bayesian statistics correctly." So if you are using Bayesian statistics correctly, then you recover Bayes' law (the left fraction above) and if you aren't, then you ignore the data and just use your prior on $H$.
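Numerically, the mixture is easy to check. Here is a small sketch with made-up probabilities (the specific numbers are mine, not from the comic):

```python
def modified_posterior(p_h, p_x_given_h, p_x, p_c):
    """P(H|X) under the 'modified' rule: a P(C)-weighted mixture of
    the ordinary Bayesian posterior and the prior."""
    bayes = p_x_given_h * p_h / p_x          # ordinary Bayes' law
    return p_c * bayes + (1.0 - p_c) * p_h   # Law of Total Probability over C

# Illustrative numbers: prior 0.3, likelihood 0.8, marginal evidence 0.4.
print(modified_posterior(0.3, 0.8, 0.4, 1.0))  # P(C)=1: plain Bayes, ~0.6
print(modified_posterior(0.3, 0.8, 0.4, 0.0))  # P(C)=0: the prior, 0.3
print(modified_posterior(0.3, 0.8, 0.4, 0.5))  # halfway between the two, ~0.45
```

With $P(C)$ strictly between 0 and 1 you just get a fixed-weight blend of prior and posterior, which is the point of the joke.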
I suppose this is a rejoinder against the criticism that in principle Bayesians can adjust the prior to support whatever conclusion they want, whereas Bayesians would argue that this is not how Bayesian statistics actually works.
(And yes, you did successfully nerd-snipe me. I'm neither a mathematician nor a physicist though, so I'm not sure how many points I'm worth.)

-
A clever joke that's embedded in the formula above is that if you're not using Bayesian statistics correctly, your inference is completely independent of the truth. – Cliff AB Oct 16 '18 at 02:10
-
I am particularly curious about the properties of this procedure when $P(C)=1/2$ or similar. You end up with a mixture between the prior and posterior, but this is still not consistent, since the weight of the prior does not decrease as the data keep piling up. – eric_kernfeld Oct 16 '18 at 12:09
-
I hope you didn't type out your answer while crossing a busy street. I'll have no part in this... – eric_kernfeld Oct 16 '18 at 12:10
-
The sort of Bayesians caricatured above aren't Bayesian statisticians; they are Bayesian lawyers. – kjetil b halvorsen Oct 16 '18 at 12:28
-
@CliffAB I don't know if I'd call that a clever joke or a law of nature. – eric_kernfeld Oct 16 '18 at 15:30
-
@CliffAB Do you mean "Your posterior (as calculated by this formula) is independent of the evidence"? – Acccumulation Oct 16 '18 at 17:00
-
@eric_kernfeld $P(C) = \frac{1}{2}$ corresponds to someone half using Bayesian statistics correctly, i.e. they're wrong somewhere along the line...which just means they're wrong, right? $P(C)$ is binary, I believe. – Lio Elbammalf Oct 17 '18 at 09:05
-
@LioElbammalf That implies that I know whether I'm right or not, which is kind of the issue. What if I'm only 75% sure that I'm using Bayesian statistics correctly? – anaximander Oct 17 '18 at 14:43
-
@LioElbammalf: Letting $P(C) = \frac12$ in this formula just means that you'll end up assigning half as much weight to the observed evidence as Bayes' law says you should. Effectively, your "half-Bayesian" posterior distribution will end up being halfway between your prior and the proper Bayesian posterior. – Ilmari Karonen Oct 18 '18 at 14:08
-
@IlmariKaronen But if you're wrong somewhere you could be much further off... Say you're half right, but you've made assumptions that mean you calculate $P(X|H)$ to be almost 1 for all cases when in reality it should be almost 0; then you aren't halfway off... you're very off. In other words, if you're wrong somewhere your result is wrong; if you're not wrong anywhere then you're right. It's a binary thing. – Lio Elbammalf Oct 18 '18 at 14:25
-
@eric_kernfeld "_You end up with a mixture between the prior and posterior, but this is still not consistent, since the weight of the prior does not decrease as the data keep piling up_" Surely that should be something like "_...as the weight of your posterior keeps piling up_" :-) – TripeHound Oct 19 '18 at 14:22
Believe it or not, this type of model does pop up every now and then in very serious statistical models, especially when dealing with data fusion, i.e., combining information from multiple sensors to make inference about a single event.
If a sensor malfunctions, it can greatly bias the inference made when combining the signals from multiple sources. You can make a model more robust to this issue by including a small probability that the sensor is just transmitting random values, independent of the actual event of interest. This has the result that if 90 sensors weakly indicate $A$ is true, but 1 sensor strongly indicates $B$ is true, we should still conclude that $A$ is true (i.e., the posterior probability that this one sensor misfired becomes very high once we realize it contradicts all the other sensors). If the failure distribution is independent of the parameter we want to make inference on, then when the posterior probability of failure is high, the measurements from that sensor have very little effect on the posterior distribution for the parameter of interest; in fact, they are fully independent of it if the posterior probability of failure is 1.
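Here is a toy version of that "90 weak sensors vs. 1 strong dissenter" scenario. All the numbers (failure rate, accuracies) are invented purely for illustration:

```python
import math

# Toy fusion model: each sensor votes "A" or "B". A working sensor reports
# the truth with its stated accuracy; a failed sensor emits a fair coin
# flip, independent of the truth.
EPS = 0.01  # invented prior probability that any given sensor has failed

def report_lik(report, truth, accuracy):
    """P(report | truth), marginalizing over the failure indicator."""
    correct = accuracy if report == truth else 1.0 - accuracy
    return (1.0 - EPS) * correct + EPS * 0.5

# 90 weak sensors vote "A"; one near-certain sensor votes "B".
reports = [("A", 0.55)] * 90 + [("B", 1.0 - 1e-12)]

log_lr = sum(math.log(report_lik(r, "A", acc) / report_lik(r, "B", acc))
             for r, acc in reports)
post_A = 1.0 / (1.0 + math.exp(-log_lr))  # uniform prior over A and B
print(f"P(A | reports) = {post_A:.6f}")   # A wins despite the dissenter

# Posterior that the dissenting sensor failed, conditioning on A being true:
fail_post = EPS * 0.5 / report_lik("B", "A", 1.0 - 1e-12)
print(f"P(failed | dissent, A) = {fail_post:.6f}")  # near 1
```

Without the `EPS` failure mixture, the single near-certain dissenter would dominate the 90 weak votes; with it, the model instead concludes that the dissenter almost surely failed.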
Is this a general model that should be considered when it comes to inference, i.e., should we replace Bayes' theorem with the Modified Bayes Theorem when doing Bayesian statistics? No. The reason is that "using Bayesian statistics correctly" isn't really binary (or if it is, it's always false). Any analysis will have degrees of incorrect assumptions. For your conclusions to be completely independent of the data (which is what the formula implies), you would need to make extremely grave errors. If "using Bayesian statistics incorrectly" at any level meant your analysis was completely independent of the truth, the use of statistics would be entirely worthless. All models are wrong but some are useful, and all that.

-
I guess we got lucky in discovering that the static failure mode of our sensors is one extreme or the other. Noise squashing is much harder though. It's really annoying to discover the sensor is working correctly and the value received is wrong because the wire is acting like an antenna. – Joshua Oct 16 '18 at 19:53
-
@Joshua hopefully someday I'll have time to properly learn Kalman filtering for those kinds of situations (or maybe someone will write a brilliant SE answer that makes everything clear?). – mbrig Oct 17 '18 at 18:58
-
@Joshua: I believe you are thinking of a much more specific model than I am. While well characterizing the failure distribution can improve inference, these models can still be very helpful even with very vague failure distributions. For example, suppose we wanted to make inference about a parameter $\mu$, and sensor $i$'s measurement is $N(a_i \mu, 1)$ if it works. We could include a small probability that it is instead, say, $t(df = 10)$ under failure. If sensor $i$ disagrees strongly with all other sensors, the posterior probability that it is a failure becomes very high. – Cliff AB Oct 18 '18 at 15:26
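A minimal sketch of the continuous model in that last comment, taking $a_i = 1$ and invented numbers throughout: a working sensor reads $N(\mu, 1)$, a failed one reads $t(df=10)$ regardless of $\mu$, and we compute the posterior probability that a single sensor has failed given its reading and the consensus value from the other sensors.

```python
import math

def norm_pdf(x, mu=0.0, sd=1.0):
    """Normal density: the working-sensor model N(mu, 1)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def t_pdf(x, df=10):
    """Student-t density: the failure model, independent of mu."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

EPS = 0.01  # invented prior probability that a sensor has failed

def p_failure(reading, mu_hat=0.0):
    """Posterior P(failure | reading), taking mu_hat as the consensus value."""
    fail = EPS * t_pdf(reading)
    work = (1.0 - EPS) * norm_pdf(reading, mu_hat)
    return fail / (fail + work)

print(p_failure(0.5))  # plausible reading: failure stays unlikely
print(p_failure(6.0))  # wild outlier: failure becomes near-certain
```

A reading consistent with the consensus barely moves the failure probability, while a 6-sigma outlier drives it toward 1, after which that sensor contributes almost nothing to inference about $\mu$.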