2

This is continuation of my problem

Calculate variance of sum random variables

Suppose random variable $X$ takes 3 values $1, 2, 3$ with probability $\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{6}$. Sample 1000 times from $X$ independently and do the following operations. Take a number $T=0$. If 1 comes, add 0.18 with $T$, if 2 comes add 0.05 with $T$ if 3 comes add -0.29 with $T$. Then expected value of $T$ will be 58.33 and variance will be 27.65. Now I want to find the probability $T<50$. For this I tried to use central limit theorem namely I approximated the distribution by Normal $N(58.33,27.65)$. Then integrate the density function of this normal distribution from -infinity to 50 to get the probability. Can we use better approximation than this? Since we know higher moments of $T$, can we somehow use these values?

str
  • 45
  • 4
  • 2
    You don't actually need to approximate: you can find this distribution exactly. There are indeed ways to use the higher moments for improved approximation. See https://en.wikipedia.org/wiki/Edgeworth_series#The_Edgeworth_series and https://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion for instance. – whuber Oct 05 '19 at 20:00
  • 2
    Another approximation is the saddlepoint approximation---related to the Edgeworth approximation but often better: https://stats.stackexchange.com/questions/191492/how-does-saddlepoint-approximation-work/191781#191781 – kjetil b halvorsen Oct 05 '19 at 20:13

1 Answers1

2

You are adding many i.i.d. random variables with nice distribution, so the CLT will work fantastically. If you are after an approximately exact result, you could have a look at the asymptotic expansions mentioned in the comments or FFT.

An other (simple) option is simulation: Here, we draw 100'000 times from the distribution of interest and calculate mean, variance and the probability to be below 50.

R code

set.seed(0)
one_rv <- function(n = 1000) {
  sum(sample(c(0.18, 0.05, -0.29), size = n, 
             replace = TRUE, prob = c(1/2, 1/3, 1/6)))
}

many_rv <- replicate(1e5, one_rv())
mean(many_rv)                             # Mean: 58.3512
var(many_rv)                              # Variance: 27.69896
mean(many_rv < 50)                        # Prob < 50 by simulation: 0.056600
pnorm(50, mean = 58.33, sd = sqrt(27.65)) # By normal approx:        0.056579

The probabilities are very close (and both versions are approximations).

Michael M
  • 10,553
  • 5
  • 27
  • 43