Is Babe Ruth's statement meaningful?

Question

Quote from Babe Ruth:

Every strike brings me closer to the next home run.

As I understand memorylessness, this is meaningless. For every at-bat, there is a certain probability that he will strike, and there is a certain probability that he will hit a home run, and that's that. The likelihood of a home run at any particular point in time does not increase as strikes accrue.

However, I have an intuitive understanding of what he means. Is there some statistically-rigorous way to express it or make sense of it?

Maybe it makes sense for someone looking back on Babe Ruth's career with the benefit of hindsight. Or, maybe if we imagine an omniscient deity who can see the entire timeline of the universe at once. The deity can indeed see that, from any particular moment, there are N strikes remaining before Ruth hits the next home run. Another strike reduces that number to N-1. So, indeed, every strike brings him closer to the next home run.

Epilogue

If I could go back in time and rewrite this question, I would have omitted all the baseball references and simply described a guy rolling dice, hoping for a seven. He says, "I'm hoping for a seven, but I'm not bothered when I get something else, because every roll of the dice brings me closer to that seven!" Assuming he eventually rolls a seven, is his assertion the gambler's fallacy? Why or why not?

Thanks to @Ben for articulating that this is not the gambler's fallacy. It would have been the gambler's fallacy if he had instead said, "Every roll of the dice which does not result in a seven makes it more likely that the next roll results in a seven."

The guy didn't make any such statement, and he didn't make any statement at all about probability, merely about the passage of time.

By assuming that there is a seven in his future, we have made it undeniably true that every roll of the dice brings him closer to the seven. In fact, it is trivially true. Every second that ticks by, even when he is sleeping, brings him closer to that seven.

Just to help with googling, what you're describing at the end is Eternalism https://en.wikipedia.org/wiki/Eternalism_(philosophy_of_time) and I agree that this is the framework in which Ruth's statement makes sense. — John Madden, Nov 16 '21 at 15:55
One way this could be true: the first time a player faces the pitcher, they may not perform as well as in later at-bats, and are thus more likely to hit a home run later in the game. — Glen, Nov 16 '21 at 15:58
@Glen — I see where you're coming from. For this question, I'm trying to keep it simple and ignore any psychological effects. I'm just thinking about it as a series of independent events, like coin flips or rolls of the dice. — SlowMagic, Nov 16 '21 at 16:05
[Gambler's fallacy](https://en.wikipedia.org/wiki/Gambler's_fallacy). — Stephan Kolassa, Nov 16 '21 at 16:37
I am with @StephanKolassa that this is Gambler's fallacy. "Uggg, bad cards again! I guess that brings me closer to my next straight-flush, though!" — Dave, Nov 16 '21 at 16:52
I think treating this as the Gambler's Fallacy might not give Ruth's quotation due credit. It's not necessarily even about probability. Wouldn't most people understand it as an exhortation to make the effort? In other words, being able to observe an event in a stochastic point process depends on continuing the process. — whuber, Nov 16 '21 at 17:25
I actually think the last paragraph is more accurate than it might seem. Babe Ruth himself is implicitly conditioning on there being a next home run (and implicitly that it is a finite number of swings away). Given that he will hit a home run *some time in the future*, as far off as it might be, it is true that each swing brings him one step closer to it. Babe Ruth himself is acting as the deity, but allowing N to be unknown but finite, and therefore not having any omniscience beyond the knowledge that he will hit a next home run. — Noah, Nov 16 '21 at 18:37
@Noah — You expressed it so much more elegantly than I did! — SlowMagic, Nov 16 '21 at 19:03
@StephanKolassa I disagree, Babe didn't say that every strike makes a home run more *likely* - his statement would still be true even if strikes made him *less* likely to hit a home run. Any activity with a duration brings him closer to some arbitrary time in the future - sitting in the dugout or eating a ham sandwich *also* bring him closer to his next home run. I don't think it's the Gambler's Fallacy to say "every day I am closer to death" (even if the chance of death never changes) as it only deals with the timing but not likelihood of events. — Nuclear Hoagie, Nov 17 '21 at 14:58
There's a trade-off. If Babe Ruth doesn't mind getting a strike because it brings him closer to his next inevitable home run, then his happiness shouldn't increase when he actually hits the home run - after all, he knew it was inevitably going to happen. — fblundun, Nov 17 '21 at 16:57
It would benefit readers from non-Commonwealth countries and colonies to explain the baseball context. The rules that are relevant to this statement are not common knowledge among dorks and nerds around the world — Aksakal, Nov 17 '21 at 18:37
All he said was in effect that the only way to hit a home run is to continue trying to hit the ball, and that human nature is such that if you cannot forgive yourself for a strike, you won't be all there to connect with a solid pitch. In other words, @whuber is right again. This is not a gambler's fallacy in the sense that Babe Ruth's batting average was not a negative sum game. In the Babe's case, the payoff far outweighed the losses. — Carl, Nov 17 '21 at 19:35
@Aksakal — Good point. For people in non-baseball countries, I think all you need to know is this: "strike" = undesirable outcome; "home run" = desirable outcome — SlowMagic, Nov 17 '21 at 19:37
@SlowMagic The point is that professional sportsmen/women always make money, but even professional gamblers often do not. More precisely, baseball, soccer, football, basketball and the like are positive sum games, i.e., you earn points only; there are no negative scores, even the penalties are zero or positive sum for the opposing team. This website is mixed, i.e., you can both earn points and lose them, ditto for professional gambling. Indeed, casino gambling, or lotteries are, on average, negative sum games. — Carl, Nov 20 '21 at 23:36

Ben · Accepted Answer · 2021-11-17T22:57:37.807

It is both meaningful and (usually) correct

You are overcomplicating this by bringing probability into a simple non-probabilistic assertion. You need not invoke an omniscient deity in order to accept that there is a reality that exists independently of knowledge of it. (You seem to be operating under the assumption that reality is only admissible to discussion if there is an omniscient being with total knowledge of it; this is a reasonably common misconception of probability, which is examined in this related question.)

The simplest rigorous examination of this statement is a non-statistical analysis based on looking at the underlying population of values pertaining to all the balls Babe Ruth ever hit. Let $X_1,...,X_N$ be the ordered career outcomes of all balls faced by Babe Ruth, with $X_i = \bullet$ denoting a strike and $X_i = \diamond$ denoting a home-run (we need not specify the notation for other possible outcomes). At the end of ball $n$ the number of balls until the next home-run is:

$$B_n \equiv \min \{ k \in \mathbb{N} | X_{n+k} = \diamond \}.$$

Now, we know that a strike and a home-run are mutually exclusive --- i.e., no single ball can be both. Consequently, if ball $n+1$ is a strike (i.e., if $X_{n+1} = \bullet$) and if $B_n<\infty$ (i.e., if Babe has at least one home-run left in his career) then we can easily show that $B_{n+1} = B_n-1$. This confirms Babe's statement that his strike brings him (one ball) closer to his next home-run.

The only exception to this is when Babe gets to the point where he has already hit his last home-run, so that there are no more home-runs left to come in his career. At this point we have $B_n = \infty$ and getting a strike on ball $n+1$ still gives $B_{n+1} = \infty$. In this latter case Babe is no closer to the next home-run, because there is no next home-run.
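The counting argument can be checked mechanically. Below is a minimal Python sketch; the outcome labels (`"strike"`, `"home_run"`, `"other"`), the helper name `balls_until_next_home_run`, and the short sample sequence are all made up for illustration and are not Ruth's actual record.

```python
import math

def balls_until_next_home_run(outcomes, n):
    """Return B_n: the smallest k >= 1 such that ball n+k is a home run.

    `outcomes` holds X_1..X_N in order, so X_i is outcomes[i - 1];
    n = 0 means "before the first ball". Returns math.inf when no
    home run remains (the min over an empty set).
    """
    for k in range(1, len(outcomes) - n + 1):
        if outcomes[n + k - 1] == "home_run":
            return k
    return math.inf

# Hypothetical sequence of outcomes, for illustration only.
career = ["strike", "other", "strike", "home_run", "strike", "strike"]

for n in range(len(career)):
    b_n = balls_until_next_home_run(career, n)
    b_next = balls_until_next_home_run(career, n + 1)
    # Ball n+1 is outcomes[n]. If it is a strike and a home run is still
    # to come, the count drops by exactly one.
    if career[n] == "strike" and b_n < math.inf:
        assert b_next == b_n - 1
```

The last two strikes in the sample come after the final home run, so for them both `b_n` and `b_next` are `math.inf`, matching the exception described above.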

Of course, at the time of Babe's last home-run, he probably didn't know that would be his last. (According to this historical account, Babe's last home-run was on 25 May 1935. He went on to play five more times without another home-run.) At that point his saying would be wrong, and looking back in hindsight we now know this.

Ultimately, this statement by Babe Ruth is no more controversial than if he asserted, "The elapsing of time spent not getting a home-run brings me closer to my next home-run". That is of course also true, setting aside the situation where he has no future home-runs to get closer to.

Finally, I do not agree with other comments/answers here that assert that this is the gambler's fallacy. It could (but might not) be a manifestation of the gambler's fallacy if he instead said, "Every strike makes it more likely that I will get a home-run in the future". That could be an example of the gambler's fallacy because it would assert that a bad outcome now makes a good outcome in the future more likely. (On the other hand, if strikes are not independent then it might not be.) In any case, merely asserting that the elapsing of time required for a bad outcome to occur now makes a subsequent good outcome closer in time is not the gambler's fallacy, and is not a fallacy at all.

Arguably, even if Ruth _had_ claimed that each strike makes a home run more likely, it still isn’t necessarily the gambler’s fallacy—he might very well learn something with each attempt that doesn’t go quite right, so his at-bats _aren’t_ independent. — KRyan, Nov 17 '21 at 02:18
@KRyan I am not a baseball expert but the opposing pitcher is also learning things about Ruth which makes Ruth's next home run more or less likely. If they think he can consistently hit it out of the park, then maybe they will purposely walk him. This all goes to show that his next at-bat is not independent. — emory, Nov 17 '21 at 03:00
@emory Yes, absolutely true—in which case Babe Ruth’s statement could be taken as an assertion that he’s getting more out of it than his opponents were. Considering his record, this may be plausible. Certainly isn’t fallacious, even if someone could demonstrate it doesn’t happen to be true. — KRyan, Nov 17 '21 at 03:29
@emory: Your analysis neglects that Babe Ruth is a pitcher. This gives him a unique position that can no longer occur; he could counter the pitcher learning about him by playing from experience on both sides. He may well be able to balls that the catcher can catch. — Joshua, Nov 17 '21 at 04:36
This is a great answer, but viewed through that lens the statement is also sort of ... trivial. Yeah, every strike brings him "closer" to the next home-run, but so would actually hitting a home-run (or even sitting out a round and eating a Snickers instead). It's questionable if he wasn't actually more thinking along the lines of the gambler's fallacy. — xLeitix, Nov 17 '21 at 11:58
@xLeitix I was thinking the same thing, but perhaps his intent was "Don't worry about the strikes, because you're still getting closer to the next home run." It's kind of like the saying that you learn from failure -- you also learn from successes, but it's still useful to recognize the value of mistakes. — Barmar, Nov 17 '21 at 14:55
@Joshua it seems you know a lot more about baseball than I do, but my analysis was limited to disproving that each at-bat was independent - which I think I did. A baseball expert such as yourself could construct the exact dependency. — emory, Nov 17 '21 at 15:26
For what it's worth, there's a well-documented effect in baseball where batters hit better and better each time they face the same pitcher, so batters are presumably learning more about the pitcher than vice versa. Also, in Ruth's day, a pitcher would typically pitch an entire game, much more than they do today. This would give even more opportunity for a batter to learn about the pitcher. — isaacg, Nov 17 '21 at 16:39
So then it becomes more of a historical question than a statistical one - In other words, at what point in his career did Babe Ruth make this statement? If it was before his last 6 games, it was an unambiguously true statement, otherwise it was false, even though he could not have known which was the case at the time. — Darrel Hoffman, Nov 17 '21 at 17:57
(+1) No need to invoke determinism either: at the time of Babe Ruth's utterance it was either true or false that he would hit a home run on any particular future occasion, just as now it's either true or false that he did; regardless of whether Laplace's daemon could have known. At least according to orthodox two-valued Logic. — Scortchi - Reinstate Monica, Nov 17 '21 at 22:10

score 3 · Answer 2 · answered Nov 16 '21 at 18:51

Suppose that at at-bat $t$ Babe has access to the information in the filtration $\mathcal{F}_t$. Write that Ruth's next home run will be at at-bat $N$. Further suppose that each at-bat has probability $p$ of being a home run.

Based on the currently available information at at-bat $t$, our best guess of $N$ is $\mathbb{E}[N \mid \mathcal{F}_t] = t + \frac{1-p}{p}$. At time $t+1$, our best guess is $\mathbb{E}[N \mid \mathcal{F}_{t+1}] = t + 1 + \frac{1-p}{p}$. Notice that the expected home run time is always a constant $\frac{1-p}{p}$ at-bats in the future. Ruth is forgetting that the filtration updates.
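The claim that the expected wait stays a constant $\frac{1-p}{p}$ no matter how many strikes have already accrued can be illustrated with a small simulation. This is only a sketch under the answer's i.i.d. assumption; the function names, the value $p = 0.1$, and the sample sizes are arbitrary choices for illustration.

```python
import random

def failures_before_success(p, rng):
    """Count the failures before the first success in i.i.d. Bernoulli(p) trials."""
    count = 0
    while rng.random() >= p:  # a draw below p counts as a success
        count += 1
    return count

def mean_remaining_failures(p, already_failed, trials, rng):
    """Average number of further failures, among runs that have
    already produced at least `already_failed` failures."""
    total, kept = 0, 0
    while kept < trials:
        f = failures_before_success(p, rng)
        if f >= already_failed:  # keep only runs consistent with the history
            total += f - already_failed
            kept += 1
    return total / kept

rng = random.Random(0)
p = 0.1  # assumed home-run probability per at-bat, illustration only
estimates = {s: mean_remaining_failures(p, s, 20_000, rng) for s in (0, 5, 10)}
```

Each entry of `estimates` should land close to $(1-p)/p = 9$, whether the run is conditioned on 0, 5, or 10 prior failures.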

Bridgeburners · Answer 3 · 2021-11-19T21:26:34.167

As I see it, there are two interpretations to this statement.

The non-gambler's-fallacy interpretation is merely a statement about the deterministic course of events that will occur in the future. In that case, you can index all hits that are home runs, i.e. the set $\{k_1, k_2, \dots, k_m\}$, where $k_i$ is the index of his $i$'th homerun (out of a total of $m$ homeruns) within the set of all his hits. After swinging for his $j$'th career ball, for $k_i < j < k_{i+1}$, he is one hit closer to his $k_{i+1}$'th hit, which is a homerun. This is not a probabilistic statement, but a statement that is true if you take a deterministic viewpoint of events.

But, in this interpretation, you could say it's meaningless because the fact that he hit the $j$'th ball has nothing to do with getting closer to his next home run. All that mattered here was the passage of time. You could equally say that sitting at the dinner table, or falling asleep, or sitting in a room and counting to ten also brings him closer to his next homerun.

The gambler's fallacy interpretation would be this:

"After each strike, the expected number of subsequent hits before my next homerun is lower."

This is the Gambler's fallacy, assuming that each hit in isolation has an equal probability and is independent of other hits. (But this is a questionable assumption.) Under this assumption, if the probability of each homerun is $p$, you can calculate that the expected number of swings before the next homerun is $$ E[\text{number of swings before homerun}] = \sum_{i=1}^\infty i (1-p)^{i-1} p = \frac{1}{p}. $$ This number is independent of the history of hits, making the statement the gambler's fallacy.
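The closed form $\frac{1}{p}$ for this series can be sanity-checked numerically. The snippet below is a minimal sketch; the function name `expected_swings`, the truncation at 1,000 terms, and the test values of $p$ are illustrative assumptions.

```python
def expected_swings(p, terms=1_000):
    """Partial sum of sum_{i>=1} i * (1-p)^(i-1) * p, truncated at `terms`."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, terms + 1))

# For these values of p the truncated tail is negligible,
# so the partial sum should agree with 1/p.
for p in (0.05, 0.1, 0.4):
    assert abs(expected_swings(p) - 1 / p) < 1e-9
```

Nothing in the sum refers to earlier outcomes, which is the sense in which the expectation is independent of the history of hits.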

Now, my suspicion is that, if you presented both the interpretations to Babe Ruth, he would suggest that the first interpretation is closer to what he means than the second.

I don't follow your first argument, because getting up to bat and swinging is quite different from falling asleep (that is, doing nothing). I think your model might not be capturing some important aspects of the situation. — whuber, Nov 19 '21 at 21:56

Geoffrey Johnson · Answer 4 · 2021-12-01T17:31:52.193

This is very fun to think about. I will answer in terms of probabilities since that is how you framed your original question. Ben has chosen to provide a non-probabilistic answer by retrospectively looking at the sequence of Babe's observed outcomes.

Let's consider a simplified Bernoulli probability model for whether Babe hits a home run. If we let $X$ be the result of an at-bat where $X=1$ "with probability $p$" for a homerun and $X=0$ otherwise, then $P(X=1|p)=p:=\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^n X_i$ is a statement about the proportion of homeruns over many, many (infinite in fact) at-bats. $p$ is an unknown fixed constant. When we use probability we are describing the emergent pattern of events over many, many samples. In order for a pattern to emerge that contains homeruns, an eventual home run is inevitable with enough at-bats. It might be more clear to simply say that each at-bat brings him closer to the next home run. This, then, is a no-brainer, a statement of the obvious − Babe cannot strike out indefinitely. Using our simplified Bernoulli model, does this mean that the long-run probability of Babe hitting a home run depends on his past performance?

Where we sometimes run into trouble is if we try assigning probability statements to single events. We can be confident in a single event occurring based on our knowledge of the long-run performance of the process, but a single event does not have a probability. Probability is a proportion so always ask yourself, "A proportion of what?" A proportion of many samples.

The memoryless property is stating that the emergent pattern over many, many samples is the same no matter how the sequence begins, i.e. $P(X=1|p, x_1,...,x_n)=P(X=1|p)$ where we are considering that each at-bat is independent of the others. I don't think these ideas are at odds with Babe's statement. Dave's answer (now since deleted) is correct in terms of expected values, but he is incorrect to apply these probability statements to a single at-bat. In some sense Babe is owed a homerun (in repeated trials) because he cannot strike out indefinitely and still have an expected value of $p$ (or $\frac{1}{p}$ for the geometric distribution of the number of at-bats) where $0<p<1$. Some discuss this idea using the phrase "regression to the mean." Perhaps Babe's statement and your intuition are leading us to the CDF of the geometric distribution, $P(K\le k)=1-(1-p)^{k}$ where $K$ is the number of at-bats until Babe's next homerun. $P(K\le k)$ is also a limiting proportion of many samples or trials, where each homerun constitutes the conclusion of a single sample or trial. Such statements about repeated trials give us confidence in what we should expect to see in our reality since our reality can be viewed as one such trial.

Suppose we had many measurements allowing us to estimate Babe's long-run homerun rate, as well as knowing his observed sequence. We could use this information to calculate a predictive p-value testing a hypothesis about the number of at-bats until Babe's next home run. This predictive p-value is also a long-run probability, but it gives us confidence in our hypothesis regarding the next observed result while incorporating information on Babe's long-run performance AND his most recent performance. This is not a p-value testing a hypothesis about $p$, it is a p-value testing a hypothesis about the next observed result, $H_0$: $K\ge k$ or $H_0$: $K\le k$, unconditional on the unknown fixed true $p$. This is the sort of analysis used to construct prediction intervals for, say, a time series or a Poisson process. Here is an answer discussing the prediction of a single coin toss, analogous to predicting the result of the next at-bat without considering the recent sequence leading up to the next at-bat.

As a separate but related thought experiment, think of flipping a coin with $p=0.4$ for the probability of heads and getting straight tails in 10 flips, where $X=1$ denotes a flip landing on heads and $K=k$ is the number of flips until a heads appears. If we are confident $p$ is indeed close to $0.4$ from earlier experiments based on 1,000 flips (and nothing about the coin or the flipper has changed) then we must be witnessing a rare event and it would only be natural to bet on $K=11$ since the likelihood of such a streak continuing is exceedingly rare. The predictive p-value testing the hypothesis $H_0$: $K\ge 12$ $[X=0$ (tails) on the next flip$]$ is the probability of the discrepancy between the observed result and the hypothesized result or something more extreme, $1-\Phi\bigg(\frac{\frac{400}{1000}-\frac{0}{11}}{\sqrt{0.4(1-0.4)/1000 + 0.4(1-0.4)/11}}\bigg)=0.004,$ approximated using a Wald-type test. We are therefore $100(1-0.004)\%=99.6\%$ confident the next flip will result in a heads. Additionally the p-value testing the hypothesis $H_0$: $K=11$ $[X=1$ (heads) on the next flip$]$ is $\Phi\bigg(\frac{\frac{400}{1000}-\frac{1}{11}}{\sqrt{0.4(1-0.4)/1000 + 0.4(1-0.4)/11}}\bigg)=0.981$. I am not suggesting the coin becomes more likely to land heads or tails in any given flip (see my discussion above regarding probability). I am suggesting that if we were to repeat this experiment many thousands of times, the proportion of times where the coin produces a $\hat{p}=0.4$ in 1,000 flips and then lands on tails after a string of 10 consecutive tails (or something more extreme) is incredibly small.
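For readers who want to reproduce the two Wald-type figures quoted above, here is a minimal Python sketch of the arithmetic only. The variable names are illustrative assumptions, and the snippet takes the formula exactly as written in this answer rather than offering any independent justification of the procedure.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 400 / 1000          # historical proportion of heads
n_hist, n_run = 1000, 11    # historical flips, and flips in the current run

# Standard error used in both Wald-type statistics in the text.
se = sqrt(p_hat * (1 - p_hat) / n_hist + p_hat * (1 - p_hat) / n_run)

phi = NormalDist().cdf      # standard normal CDF

# H0: K >= 12 (tails on flip 11, i.e. 0 heads in 11 flips)
p_value_tails = 1 - phi((p_hat - 0 / n_run) / se)

# H0: K = 11 (heads on flip 11, i.e. 1 head in 11 flips)
p_value_heads = phi((p_hat - 1 / n_run) / se)
```

Rounded to three decimals, `p_value_tails` and `p_value_heads` should agree with the 0.004 and 0.981 reported in the text.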

The key to succeeding in the long run with this predictive p-value strategy is to bet only on the length of a run before any data are observed (Neyman-Pearson's error rate). Otherwise, if you have historical data, are in the middle of a run, and interested not only in the ultimate length of the run but also the result of the next flip, one can view the predictive p-value as a weight of the evidence without error rate guarantees (Fisher's evidential p-value).

If we take it as known that $p=0.4$ and reference the CDF of a geometric distribution we see that the probability of the number of flips being at least 12 until a head is observed is $(1-0.4)^{11}=0.004$. This probability from the geometric CDF could be viewed as a p-value testing the hypothesis $H_0$: $p=0.4$. We could instead retain this hypothesis and call into question whether the next flip will land heads. One might also examine the conditional probability $P(K>k|K\ge 11)$.
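The geometric-tail figure and the conditional probability mentioned above can be computed directly. This is a small sketch assuming $p=0.4$ is known and flips are independent, with $K$ counted on $\{1, 2, \dots\}$; the helper name `prob_at_least` is made up for illustration.

```python
p = 0.4
q = 1 - p  # probability of tails on any single flip

def prob_at_least(k):
    """P(K >= k): the first k-1 flips are all tails."""
    return q ** (k - 1)

tail_12 = prob_at_least(12)                            # (1 - 0.4)^11
cond_12_given_11 = prob_at_least(12) / prob_at_least(11)  # P(K >= 12 | K >= 11)
```

Under these assumptions `tail_12` is the unconditional probability of at least 12 flips, while `cond_12_given_11` is the same event conditioned on the first 10 flips already being tails, which reduces to $q$.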

Knowing the limiting proportion of heads, if we want a safe bet on the outcome of each coin toss that will pay off in the long run then we should just always bet tails. This is like investing in the S&P500. If we want a safe bet on the length of a run then the geometric distribution would indicate we should bet on shorter-length runs.

Above is a time series of 50 flips from a coin with $p=0.4$. Observing runs of a few heads or tails is not uncommon, but if a run were to continue long enough we would naturally anticipate it to end, e.g. consider the string of tails leading up to flip 21. In order for $p$ to remain a constant $0.4$ the coin cannot land on heads or tails indefinitely. This is the probabilistic way of interpreting Babe's statement without invoking the gambler's fallacy, i.e. that the limiting proportion $p$ changes as a result of what we observe.

Here is a Wikipedia page on prediction intervals and predictive p-values. Here is a paper on the topic.

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/131566/discussion-on-answer-by-geoffrey-johnson-is-babe-ruths-statement-meaningful). — whuber, Nov 18 '21 at 18:51
Just a comment of opinion - this answer in places is too lengthy without good reason. In particular that picture with two bars of proportions seems like a filler. Maybe it would be received better if concentrated on a concrete point. — Karolis Koncevičius, Nov 20 '21 at 14:43
@Karolis, I appreciate your comment and initially opted for a brief answer as you suggested. However, whuber was critical of my solution, even questioning how I was interpreting probability (see the comments moved to chat). For this reason I have lengthened my solution and included the bar chart to make it unequivocal. — Geoffrey Johnson, Nov 20 '21 at 15:00
