It might be clearer to state the weak law as $$\overline{Y}_n\ \xrightarrow{P}\ \mu \text{ as } n \to \infty , \text{ i.e. } \forall \varepsilon > 0: \lim_{n\to\infty}\Pr\!\left(\,|\overline{Y}_n-\mu| < \varepsilon\,\right) = 1$$
and the strong law as
$$\overline{Y}_n\ \xrightarrow{a.s.}\ \mu \text{ as } n \to \infty , \text{ i.e. } \Pr\!\left( \lim_{n\to\infty}\overline{Y}_n = \mu \right) = 1$$
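Almost-sure convergence is the stronger notion: it implies convergence in probability, so the strong law implies the weak law (a standard fact, stated here just for orientation) but not conversely:
$$\Pr\!\left( \lim_{n\to\infty}\overline{Y}_n = \mu \right) = 1 \;\Longrightarrow\; \forall \varepsilon > 0: \lim_{n\to\infty}\Pr\!\left(\,|\overline{Y}_n-\mu| < \varepsilon\,\right) = 1$$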
You might think of the weak law as saying that the sample average is usually close to the mean when the sample size is big, and the strong law as saying the sample average almost certainly converges to the mean as the sample size grows.
The two laws come apart when the sample average keeps making excursions away from the mean that are large enough, infinitely often, to prevent it from converging at all.
As an illustration using R, take Wikipedia's first example, with $X$ an exponentially distributed random variable with parameter $1$ and $Y= \dfrac{\sin(X) e^X}{X}$, so $E[Y]=\frac{\pi}{2}$. Note that this expectation exists only as an improper integral: $E|Y| = \int_0^\infty \frac{|\sin x|}{x}\,dx = \infty$, so the absolute integrability the strong law requires fails. Let's consider $100$ cases where the sample size is $10000$:
set.seed(1)
cases <- 100
samplesize <- 10000
# each row is one case: samplesize draws of X ~ Exponential(rate = 1)
Xmat <- matrix(rexp(samplesize * cases, rate = 1), ncol = samplesize)
Ymat <- sin(Xmat) * exp(Xmat) / Xmat
# one sample average per case (one per row)
samplemeans <- rowMeans(Ymat)
plot(samplemeans,
     main = "most sample averages close to expectation")
abline(h = pi/2, col = "red")   # E[Y] = pi/2
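In the weak-law spirit, we can put a number on "most": the fraction of the $100$ sample means landing within a tolerance $\varepsilon$ of $\pi/2$ (the choice $\varepsilon = 0.1$ below is arbitrary, purely for illustration):
eps <- 0.1   # arbitrary tolerance, just to quantify "close"
mean(abs(samplemeans - pi/2) < eps)   # fraction of the 100 cases within eps of pi/2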

But now look at how the running sample average over the same $1$ million observations fails to reach the mean and stay there:
# cumsum() traverses the matrix column-major, i.e. in the order the draws were generated
plot(cumsum(Ymat) / seq_len(samplesize * cases), type = "l",
     main = "running sample average not always converging to expectation")
abline(h = pi/2, col = "red")
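For contrast, here is a sketch (reusing the same Xmat draws) of a case where the strong law does apply: $X$ itself has $E|X| = E[X] = 1 < \infty$, so its running average converges to $1$ and stays there:
# X itself is absolutely integrable, so the strong law holds for it
plot(cumsum(Xmat) / seq_len(samplesize * cases), type = "l",
     main = "running sample average converging to expectation")
abline(h = 1, col = "red")   # E[X] = 1 for Exponential(rate = 1)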