
I am watching basketball with a friend and the Pistons lead the Hawks 42-18 after the first quarter. My friend then says this is just as likely as the Pistons winning the game 168-72. This seems wrong, but I don't know how to explain it. I know it might have something to do with the low probability of repeating a rare event four times in a row, but a natural response is that, given that they've already dominated the first quarter, doesn't it make it more likely that they can do the same, or more, in subsequent quarters?

Why isn't winning a game by 96 points just as likely as winning a quarter by 24 points?

A more extreme example: a team returns the opening kickoff for a touchdown and leads 7-0 with 14:50 remaining in the first quarter, and the announcer predictably says: "at this pace, they'll win the game 2520-0! Haha!".

The question is more generally about where the failure in probability logic occurs when short-term temporal trends are directly extrapolated over a longer period of time. I know the variance probably increases with time, so that could have something to do with it, but that also seems to increase the likelihood of extreme events, especially conditional on the trajectory starting off extreme.

wtf
  • BTW, the pistons now lead 53-24 with 7:35 remaining in Q2. – wtf Jan 19 '17 at 01:45
  • There are so many factors that could lead to the Pistons not winning 168-72. Players get tired or injured. They could get into foul trouble and have to sit on the bench for a significant amount of time. The opposition (Atlanta Hawks) could change strategy such as going from man to man defense to zone or vice versa. This can't all be put into a statistical model to accurately determine the final outcome. – Michael R. Chernick Jan 19 '17 at 01:57

2 Answers


I think this is just a simple problem with extrapolation. Time series make it more subtle, but the issue is still there.

For example, if I draw a line through human male heights and weights, I might predict that a 90,000 kg man would be 2,000 meters tall. The problem is that there is no 90,000 kg man, so we immediately see how foolish this "model" (extrapolation) is.

Similarly, if I drew a line through Apple stock prices from a few years ago, I might predict that within ten years I'd be a billionaire. Same problem -- extrapolation -- but it's disguised, because the date ten years after my prediction will actually arrive, whereas a 90,000 kg man never will.
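
Here's a minimal sketch of that failure (the height/weight numbers are made up for illustration, not real anthropometric data): a line fit over a realistic range looks sensible inside that range and gives an absurd answer far outside it.

```python
# Minimal extrapolation sketch. The height/weight numbers below are made up
# for illustration, not real anthropometric data.
import numpy as np

weights_kg = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
heights_m = np.array([1.65, 1.72, 1.78, 1.83, 1.88])

slope, intercept = np.polyfit(weights_kg, heights_m, deg=1)

# Inside the observed range, the line looks sensible...
print("75 kg ->", round(slope * 75 + intercept, 2), "m")          # about 1.74 m

# ...but extrapolated to 90,000 kg it predicts a man hundreds of meters tall.
print("90,000 kg ->", round(slope * 90_000 + intercept, 1), "m")
```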

Time series also tend to have limits, saturation, cycles, positive and negative correlations, and feedback.

And in your sports example, scoring isn't a law of nature, and teams can change their strategy. Perhaps the Hawks will purposely slow the pace down; perhaps the Pistons have played so hard they can't sustain that pace. Perhaps teams tend to play to win, not to score the maximum number of points, and a 24-point lead is reasonably comfortable later in the game. These aren't random numbers; they're tied to human performance, motivations, and strategies.

Wayne

Imagine you flip a coin heads 8 times in a row. How many heads do you expect in the next 8 flips?

  • If you know the coin is fair, the streak of 8 heads is irrelevant and you will expect 4 heads in the next 8 flips (with the total number of heads following a binomial distribution with $n = 8$ and $p = .5$). Conditional on knowing the probability $p$ of flipping heads, prior history is irrelevant.
  • On the other hand, flipping 8 heads in a row may lead you to believe that the coin is not fair, i.e. that the probability of heads $p$ is greater than $\frac{1}{2}$. If you believe in subjective probability (i.e. you're willing to treat $p$ as a random variable), you would update your beliefs about $p$ using Bayes' rule, as in the sketch below.
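
Here's a small numerical sketch of those two views (the Beta(2, 2) prior is my own illustrative choice, nothing canonical):

```python
# Sketch of the two views: (1) p is known to be 0.5, so the streak is
# irrelevant; (2) p is treated as a random variable and updated via Bayes' rule.
# The Beta(2, 2) prior is an illustrative assumption.
from scipy import stats

n_seen, heads_seen = 8, 8      # the streak we just observed
n_next = 8                     # flips we want to forecast

# View 1: known fair coin -- the streak is irrelevant.
print("Known fair coin, expected heads in next 8 flips:", n_next * 0.5)     # 4.0

# View 2: Beta prior on p; Beta-Binomial conjugacy gives the posterior directly.
a, b = 2.0, 2.0
posterior = stats.beta(a + heads_seen, b + (n_seen - heads_seen))
p_mean = float(posterior.mean())              # (2 + 8) / (4 + 8) = 10/12 ~ 0.83
print("Posterior mean of p:", round(p_mean, 3))
print("Expected heads in next 8 flips:", round(n_next * p_mean, 2))         # ~6.67
```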

A slightly more general setting

Let $\Delta s_t = s_t - s_{t-1}$ be the change in score from time $t-1$ to time $t$. Observe that the score $s_t$ (with $s_0 = 0$ at the start of the game) can be written as:

$$ s_t = \sum_{\tau=1}^{t} \Delta s_{\tau} $$

Let's assume each $\Delta s_t$ is drawn independently from some distribution $\mathcal{S}$. We can write $\Delta s_t = \mu + \epsilon_t$ where $\{\epsilon_t\}$ is a white noise process.

As in the coin flip example above, all that matters for forecasting the expected score $\mathrm{E}[s_t \mid \mathcal{F}]$, where $\mathcal{F}$ is the information available to us, is $\mu$. Past history matters only to the extent that it helps us learn what $\mu$ is.

Examples:

  • Let's assume watching the Pistons beat up the Hawks in the 1st quarter doesn't tell us anything about $\mu$. If there are 100 time increments left in the game and we think $\mathrm{E}[\mu \mid \mathcal{F}] = .02$, then we'll expect the Pistons to increase their lead by 2 points.
  • Let's assume instead that watching the Pistons beat up the Hawks has made us update our forecast of $\mu$ from $.02$ to $.2$. We then expect the Pistons to increase their lead by 20 points over the next 100 time increments. (A small numerical sketch of both cases follows below.)
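
A tiny numerical sketch of these two cases (the 100 increments and the two values of $\mu$ are from the bullets above; the noise scale $\sigma$ is my own assumption):

```python
# Numerical sketch: same 100 remaining increments, different estimates of mu.
# The noise scale sigma is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n_remaining = 100
sigma = 0.5

for mu_hat in (0.02, 0.2):                    # E[mu | F] in the two scenarios
    expected_change = n_remaining * mu_hat
    # One simulated path of the remaining lead change: Delta s_t = mu + eps_t
    simulated = float(np.sum(mu_hat + sigma * rng.standard_normal(n_remaining)))
    print(f"E[mu|F] = {mu_hat}: expected lead change = {expected_change:.0f}, "
          f"one simulated path = {simulated:+.1f}")
```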

And it gets even more complicated... (each time increment need not be independent)

If you're an astute observer, you probably realized that I previously assumed each time increment is conditionally independent (given $p$ or $\mu$). This assumption could easily be violated:

  • Teams way ahead may take their foot off the gas.
  • Teams way behind may give up entirely and stop trying to win.
  • Teams way behind may adopt negative-expectation, variance-increasing strategies.
  • Changing conditions (e.g. injuries) can make $\mu$ a time-varying process $\mu_t$.

All of these would lead you to not treat $\Delta s_t$ as iid draws from some distribution.
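
As a toy illustration of the first bullet (every parameter here is made up, not estimated from real games), suppose the scoring drift decays as the lead grows. The simulated final margin then falls well short of the naive straight-line extrapolation:

```python
# Toy simulation of "foot off the gas": the drift mu_t decays as the lead grows,
# so the increments Delta s_t are no longer iid. All numbers are illustrative
# assumptions, not estimates from real NBA data.
import numpy as np

rng = np.random.default_rng(1)
n_increments = 300        # remaining time increments
sigma = 0.8               # assumed noise scale per increment
mu_base = 0.08            # assumed drift while the game is still close

naive_forecast = n_increments * mu_base       # straight extrapolation: 24 points

final_leads = []
for _ in range(2000):
    noise = sigma * rng.standard_normal(n_increments)
    lead = 0.0
    for eps in noise:
        mu_t = mu_base * np.exp(-max(lead, 0.0) / 15.0)  # drift shrinks with the lead
        lead += mu_t + eps
    final_leads.append(lead)

print("Naive extrapolation of the final lead:", naive_forecast)
print("Mean simulated final lead:", round(float(np.mean(final_leads)), 1))
# The simulated mean comes out well below the naive 24-point forecast.
```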

Summary/Conclusion

My basic intuition is that when you observe a team run up the score in the 1st quarter, you're mostly observing random noise rather than useful information about some parameter $\mu$. Forecasting that the second quarter will be 42-18 because the first quarter was 42-18 is about as problematic as forecasting 42 heads to 18 tails in your next 60 flips because you observed 42 heads to 18 tails in your first 60 flips.

Matthew Gunn
  • I didn't read the whole answer but the preamble is a situation with independent flips. The question supposes the quarters are not independent. I do agree with your final conclusion about it being such a large margin due to random noise rather than true signal, but I still don't think the 42-18 heads is an apt analogy because, if I saw 42 heads out of 60 to start, I would soundly reject the idea that it's a fair coin. Maybe the answer to the OP has something to do with the regression effect? Not sure. – gammer Jan 19 '17 at 02:57
  • @gammer I address that in the "it's even more complicated" section. I think the key conceptual point is how much you learn about $\mu$ based upon recent history. You can easily observe this issue in the IID case. Of course, the real world almost certainly isn't IID, and that can make things much more complicated... – Matthew Gunn Jan 19 '17 at 03:00
  • But, having no other information, wouldn't you best guess at the next round of 60 flips be 42 heads? Why not use the MLE of the success probability there? – gammer Jan 19 '17 at 03:03
  • Good point @gammer! I have a [strong prior](http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf) though that it's extremely difficult to bias a coin... Though I dunno, [maybe you can](https://izbicki.me/blog/how-to-create-an-unfair-coin-and-prove-it-with-math.html)! – Matthew Gunn Jan 19 '17 at 03:04
  • Fair enough. In this case, I think the true coin is probably actually biased in favor of Atlanta... Right now, I can see it's the fourth quarter and Detroit leads by 24 still. So, maybe pretty close to 50/50... Lol – gammer Jan 19 '17 at 03:07