
Although this question is somewhat subjective, I hope it qualifies as a good subjective question according to the FAQ guidelines. It is based on a question that Olle Häggström asked me a year ago, and although I have some thoughts about it, I do not have a definite answer and would appreciate some help from others.

Background:

A paper entitled "Equidistant letter sequences in the book of Genesis," by D. Witztum, E. Rips and Y. Rosenberg, made the extraordinary claim that the Hebrew text of the Book of Genesis encodes events which did not occur until millennia after the text was written. The paper was published in "Statistical Science" in 1994 (Vol. 9, 429-438), and was offered as a "challenging puzzle" whose solution may contribute to the field of statistics.

In reply, another paper, entitled "Solving the Bible code puzzle," by B. McKay, D. Bar-Natan, M. Bar-Hillel and G. Kalai, appeared in "Statistical Science" in 1999 (Vol. 14, 150-173). The new paper argues that Witztum, Rips and Rosenberg's case is fatally defective; indeed, that their result merely reflects the choices made in designing their experiment and collecting the data for it. The paper presents extensive evidence in support of that conclusion.

(My own interests, which are summarized in Section 8 of our paper, are detailed in another technical report with Bar-Hillel and McKay, entitled "The two famous rabbis experiments: how similar is too similar?" See also this site.)

The questions:

Olle Häggström's specific question was:

"I once suggested that your paper might be useful in a statistics course on advanced undergraduate level, for the purpose of illustrating the pitfalls of data mining and related techniques. Would you agree?"

In addition to Olle's question, let me ask a more general one.

Is there something related to statistics that we have learned (including, perhaps, some interesting questions to ask) from the Bible Code episode?

Just to make it clear, my question is restricted to insights related to statistics and not to any other aspect of this episode.

Gil Kalai
  • This is an interesting subject. I am curious why you (McKay et al 1999) would choose "War and Peace" as a control rather than, for example, random strings of letters (perhaps weighted by their observed frequencies). In other words, is it sufficient for the text to be sufficiently long, or does it have to be sufficiently long and comprehensible (or sufficiently long and of some literary value)? – David LeBauer Jan 17 '11 at 15:33
  • David, the choice of "War and Peace" as a control text (more precisely, the beginning of the Hebrew translation of "War and Peace," of the same length as the Book of Genesis) was made by the original researchers. The story, according to Aumann, is this: when Bob Aumann, who followed the experiment carefully, told Kenneth Arrow about the marvelous findings in "Genesis," Arrow asked about "War and Peace." Aumann then started reporting on the war-and-peace situation in Israel, but it turned out that what Arrow was asking was whether the same phenomenon could not be found in "War and Peace." – Gil Kalai Jan 17 '11 at 16:57
  • The Bible code episode would be a good illustration of the strengths of the Bayesian view of probability. In particular, the Bayes factor $P(D|H)/P(D|\neg H)$ is insufficiently large, given that we would assign a small prior probability $P(H)$. ($H$ being the hypothesis that there exists some mechanism whereby world events are encoded in the Bible.) – charles.y.zheng Mar 09 '11 at 09:58
  • 2
    By the way, you are free to post your own answers. I'd be very interested, as you have presumably weathered a lot of analyses of the whole experience. – Iterator Aug 12 '11 at 01:29
  • Dear Iterator, yes, yes, I plan to do it at some point. – Gil Kalai Dec 25 '14 at 08:14
  • One thing that statistics must be is complete. For instance, I introduced professor Dror Bar-Natan to an additional Bible Code format, and asked if he would examine it. He basically said that what was done was done. In his opinion, his contributions in "Solving the Bible code puzzle," and many other contributions in the study of Bible Codes, had proven that Bible Codes were just a farce. However, he had studied just one possible Bible code language out of an infinite number of possible code languages, yet rejecting one language was said to be enough to reject all the others. Logical? – Sean Feb 17 '18 at 00:58
  • Apparently not, if you consider that this still hasn't been answered. More seriously, though: there actually were some insights in the question and the comments. The main insight seems to be that you need a control if you want to demonstrate that something is unusual. – Thomas Levine Sep 05 '11 at 23:31
  • @Sean - be cautious about accidentally performing p-value hacking [link](https://xkcd.com/882/). The human search for significance is our primary motivation in being, so it is important that we not treat it in a dangerously casual or neglectful manner. – EngrStudent Nov 18 '21 at 19:28
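The need for a control, raised in the comments above, can be made concrete with a small Monte Carlo sketch. This is a deliberate simplification, not the method of either paper (the WRR experiment scored proximities between ELSs of related word pairs in Hebrew text); here the control is simply a random shuffle of the same letters, which, as David LeBauer's comment suggests, preserves the observed letter frequencies exactly:

```python
import random

def els_count(text, word, max_skip=20):
    """Count appearances of `word` as an equidistant letter sequence
    (ELS) in `text`, over skip distances 2..max_skip."""
    n, m = len(text), len(word)
    count = 0
    for skip in range(2, max_skip + 1):
        # Only start positions that keep the whole sequence in bounds.
        for start in range(max(0, n - (m - 1) * skip)):
            if all(text[start + i * skip] == word[i] for i in range(m)):
                count += 1
    return count

def shuffle_control_p(text, word, trials=200, seed=0):
    """Estimate how often a letter-shuffled control text matches or
    beats the observed ELS count (a crude permutation p-value)."""
    rng = random.Random(seed)
    observed = els_count(text, word)
    letters = list(text)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if els_count("".join(letters), word) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (trials + 1)
```

Without a reference distribution, any observed count can be made to look surprising; against frequency-matched shuffles, most counts turn out to be unremarkable.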

1 Answer


Kepler said, in effect, that it was unnecessary to postulate angelic intervention to explain the moon's orbit. William of Ockham, an English Franciscan friar, was excommunicated for leaving Avignon without permission. One can suspect that the principle of parsimony that he espoused was not especially popular with establishmentarians.

In statistics, we would generally describe models with supernumerary postulates as overfit, and testing for overfitting or unwarranted extrapolation is not merely a matter of doing controlled experiments, as a comparison between two models does not, by itself, address the parsimony or predictive accuracy of either one. As Einstein intimated, one should postulate the smallest number of relevant variables that adequately explains the data, and no fewer. Note the implication for deterministic data interpretation: if a predictive model is based upon an assumption, then whatever prediction is being made either has to be related to some hypothesis that is verifiable and deterministic, or we are engaging in what physicists call "hand-waving."
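The overfitting point can be illustrated with a minimal sketch on hypothetical data (unrelated to either paper): a degree-9 polynomial forced through 10 noisy points reproduces the sample almost exactly, yet predicts the simple underlying trend worse than a straight line does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a linear trend y = x, observed with noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = x_train + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.0, 1.0, 101)
y_test = x_test  # the noiseless trend, used only for scoring

def fit_and_score(degree):
    """Fit a polynomial of the given degree to the training data;
    return (in-sample MSE, out-of-sample MSE against the trend)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse
```

The flexible model "explains" the sample better precisely because it has fitted the noise, which is the sense in which supernumerary postulates degrade, rather than improve, predictive accuracy.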

Re: Olle Häggström's question about "...illustrating the pitfalls of data mining and related techniques": this can happen when one fails entirely to postulate a deterministic context before examining the data. In this case, the numerical analysis invites no deterministic interpretation and invites us to discount hypothesis testing, so that the interpretation invokes faith, not science. Such efforts, even if done conscientiously, are likely to be futile, and even the declaration of a miraculous event is often based upon faith-based determinism. As if we needed further evidence of this, Newton spent much of his lifetime's intellectual activity looking for numerical structure within the Bible, and yet was silent on that topic. It is plausible to ascribe that failure, along with Newton's secret failure to deduce chemistry with only alchemy as a starting point, to problems that he could not solve. Thus even physicists, including Newton, who was arguably the first physicist, can fall victim to their own unverifiable imaginings. Nor has this ever ceased to be problematic: string theorists, for example, have faith that something verifiable will eventually result from their current multi-dimensional exercises in pure mathematics, and so far there is nothing of the sort.

Many modern theologians would argue against a literal interpretation of the mere wording of biblical Divine inspiration, as in the contrary case we would pay undue attention to concepts such as what constitutes a firmament (vault) in the heavens that opened to let the waters come forth upon the Earth (Genesis). It is plausible that the modern account, in which much of the Earth's water was delivered during the late bombardment epoch by icy asteroids formed beyond the solar system's snow line, is a reasonable substitute for the rather more terse concept of a firmament, even though it contains words that would have been anachronisms in biblical times. What is perhaps astounding is that there are modern near-equivalents to "Let there be light," though I caution that the phrase could refer to several events in modern cosmology, e.g., the end of the cosmic dark ages several hundred million years after the big bang, or the ignition of the sun a mere 4.6 billion years ago.

Furthermore, there is no general agreement on these concepts, as some interpretations of the Bible are literal and deny the existence of anything older than 6000 years, again from counting up numbers as cited in the Bible. Keep in mind that scribes hand-copied the text 2000 years ago, so that minor changes in the actual text, as exemplified by the Dead Sea Scrolls, would have been inevitable; this raises the question of how one can do precise numerology on paraphrased text. All of this is beside the point, which is that the Bible shows a clear evolution over time in what constituted ethical comportment, and that today there are multiple versions variously accepted by different sects and religions as dogma, not all of which can have identical numerological significance.

There is a lot of wisdom in the Bible, but given the preceding, it is unlikely to be efficiently unpacked using numerology, as there is no unique textual version of the Bible to refer to. Thus, my answer to the question "Is there something related to statistics that we have learned (including perhaps some interesting questions to ask) from the Bible Code episode?" is that most of the learning is from the negative side of the argument, i.e., what not to do, and why not. That does not mean that one cannot analyze the Bible numerically, just that such analysis is unlikely to yield much in the way of useful information.

There are two types of information; rather than define them directly, let us illustrate them in practice. George Box famously said, "Essentially, all models are wrong, but some are useful." What that implies is that when we formulate a hypothetical explanation, we have not illustrated any fundamental truth, but rather a "working hypothesis" whose explanatory power makes it useful. Such is the way of the investigator: condemned to perpetually search for words to express the truth, though words are, at best, a convention.

What then is truth? Truth is not in words but rather behind them. Specifically, each of us sees only our own personal world view; for example, we need faith to believe that life is worth living. Thus, truth is a first-person phenomenon, and words are not truth; they are shared conventions that by their nature have no insight of their own, so that no matter how well we choose our words, it takes an individual, each of us, to actually have the insight that implies any feeling of mystery, wonder, or appreciation. That is, we live in our own skins, and our existence is a first-person, subjective phenomenon. Thus we can argue about whether in Exodus God told Moses "I am He who am" or the more literal translation "I am that which am," but the mystery is not in the words but rather in the first-person context that inspired them.

Carl
  • I was unable to detect an answer to the questions in this post. Could you point it out or make it more prominent? – whuber Dec 04 '21 at 22:22
  • @whuber OK, I agree to do that. – Carl Dec 04 '21 at 22:24
  • @whuber Done, as per your request. Gee, we seem to communicate quite frequently. Anyway, I hope to have shed some light on the subject; in particular, for any postulate, one has to examine the unspoken assumptions, which, in this case, are so numerous as to virtually exclude an unambiguous interpretation. – Carl Dec 04 '21 at 22:34