
I have $N$ Bernoulli variables $X_1,\dots,X_N$ with $X_i\sim B(1, \pi_i)$, where each $\pi_i$ is known, and $Y=X_1+\dots+X_N$. I need to find the distribution of $Y$.

If $X_i$ and $X_j$ are independent for $i\ne j$, I can use the following simulation:

1. Generate $X_1, \dots, X_N$ from their distributions and
   compute the value of $Y$;
2. Repeat step 1 10,000 times to obtain $Y_1, \dots, Y_{10000}$,
   whose empirical distribution approximates the distribution of $Y$.
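
The two steps above can be sketched in Python with NumPy as a minimal illustration. Because the $\pi_i$ differ, $Y$ is not binomial even in the independent case; it follows the Poisson binomial distribution, whose exact probabilities can be obtained by convolving the $N$ Bernoulli pmfs one at a time, which gives a check on the simulation. The function names and the values in `pi` are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_y_independent(pi, n_sims=10_000, seed=0):
    """Monte Carlo: draw X_1..X_N independently with P(X_i = 1) = pi[i]; return Y for each replicate."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    # Each row is one replicate of (X_1, ..., X_N); uniform < pi_i gives a Bernoulli(pi_i) draw.
    x = rng.random((n_sims, pi.size)) < pi
    return x.sum(axis=1)

def poisson_binomial_pmf(pi):
    """Exact pmf of Y = X_1 + ... + X_N for independent Bernoulli(pi_i), by sequential convolution."""
    pmf = np.array([1.0])  # distribution of the empty sum: P(Y = 0) = 1
    for p in pi:
        pmf = np.convolve(pmf, [1.0 - p, p])  # add one more Bernoulli(p) term
    return pmf  # pmf[k] = P(Y = k), k = 0..N

pi = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical success probabilities
y = simulate_y_independent(pi)
empirical = np.bincount(y, minlength=len(pi) + 1) / y.size
exact = poisson_binomial_pmf(pi)
print(np.round(empirical, 3))
print(np.round(exact, 3))
```

The exact pmf makes the simulation unnecessary under independence; the simulation approach matters once dependence is introduced.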

But $X_i$ and $X_j$ are in fact dependent, so I also need to account for their correlation. Assuming $\text{cor}(X_i, X_j)=0.2$ for $i\ne j$, how can I incorporate this correlation into the simulation? Alternatively, is there another way to obtain the distribution of $Y$?
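
One common construction (swapped in here, not prescribed by the question) is a Gaussian copula: draw $Z\sim N(0,R)$ and set $X_i = 1\{Z_i < \Phi^{-1}(\pi_i)\}$, so each $X_i$ keeps its marginal $\pi_i$. The latent correlation $r_{ij}$ is not the Bernoulli correlation, so each $r_{ij}$ is solved numerically so that $\text{cor}(X_i,X_j)=0.2$, and the resulting $R$ must be positive definite. This is a sketch assuming SciPy, with hypothetical function names and $\pi_i$ values. It imposes one particular dependence structure on top of the pairwise correlations, so the distribution of $Y$ it produces is only one of many consistent with $\text{cor}=0.2$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def bernoulli_corr_from_latent(r, p_i, p_j):
    """cor(X_i, X_j) when X = 1{Z < Phi^-1(p)} and (Z_i, Z_j) is standard bivariate normal with correlation r."""
    a = [norm.ppf(p_i), norm.ppf(p_j)]
    p11 = multivariate_normal.cdf(a, mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])  # P(X_i = 1, X_j = 1)
    return (p11 - p_i * p_j) / np.sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))

def latent_corr_matrix(pi, target=0.2):
    """Solve, pair by pair, for the latent normal correlation giving the target Bernoulli correlation."""
    n = len(pi)
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            f = lambda r: bernoulli_corr_from_latent(r, pi[i], pi[j]) - target
            # brentq raises ValueError if the target is not attainable for r in (-0.99, 0.99).
            R[i, j] = R[j, i] = brentq(f, -0.99, 0.99, xtol=1e-8)
    if np.linalg.eigvalsh(R).min() <= 0:
        raise ValueError("latent correlation matrix is not positive definite")
    return R

def simulate_y_copula(pi, target=0.2, n_sims=10_000, seed=0):
    """Simulate correlated Bernoulli vectors via the Gaussian copula and return (X, Y)."""
    pi = np.asarray(pi, dtype=float)
    R = latent_corr_matrix(pi, target)
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(pi.size), R, size=n_sims)
    x = z < norm.ppf(pi)  # X_i = 1 exactly when Z_i falls below the pi_i quantile
    return x, x.sum(axis=1)

pi = [0.2, 0.3, 0.4, 0.5]  # hypothetical success probabilities
x, y = simulate_y_copula(pi)
print(np.round(np.corrcoef(x, rowvar=False), 2))  # off-diagonal entries should be near 0.2
print(np.bincount(y, minlength=len(pi) + 1) / y.size)  # estimated P(Y = k)
```

If `brentq` fails for some pair, the requested correlation is not attainable for those two marginals; if the eigenvalue check fails, the pairwise solutions do not fit together into a valid Gaussian copula.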

Tim
PepsiCo
  • I might have missed something, but the sum of $N$ **independent** Bernoulli trials with a common success probability follows a binomial distribution. Are you saying that your simulation for independent trials has that correlation, or that you want to construct a simulation with the specified correlation? – Sycorax Jan 16 '14 at 16:25
  • Thanks for the edit. I did not state the problem clearly. The $N$ Bernoulli variables are dependent and have different $\pi_i$, so I don't know how to incorporate the correlation. – PepsiCo Jan 16 '14 at 16:36
  • It is unclear whether such a distribution exists. Correlation coefficients must satisfy a stringent algebraic relationship: see http://arxiv.org/pdf/physics/0605189.pdf. – whuber Jan 16 '14 at 17:28
  • I'm looking for a solution to the same problem. Here are two papers on how to do this, though I have yet to find an implementation in R: Klotz, J. (1973). Statistical inference in Bernoulli trials with dependence. Annals of Statistics, 1, 373-379. Ladd, D. W. (1975). An algorithm for the binomial distribution with dependent trials. J. Amer. Statist. Assoc., 70, 333-340. – Roman Feb 15 '15 at 19:57
  • The correlation coefficient isn't enough to determine the distribution. See the comment thread at https://stats.stackexchange.com/a/285008/919, for instance. If you could explain what your variables represent, that might give us some clues about their joint distribution. – whuber Nov 22 '20 at 20:12
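
Both of whuber's points can be checked numerically in a short sketch. For a pair of Bernoulli variables with fixed marginals, $P(X_i=1,X_j=1)$ must lie between $\max(0,\pi_i+\pi_j-1)$ and $\min(\pi_i,\pi_j)$ (the Fréchet bounds), which caps the attainable correlation; this pairwise check is necessary but not sufficient for a valid joint distribution. And even when a common correlation of 0.2 is attainable, it does not pin down the distribution of $Y$: the example below, for $N=3$ exchangeable variables with $\pi_i=0.5$ (an illustrative choice, as are the function names), builds two joint distributions with identical marginals and pairwise correlation 0.2 whose laws for $Y$ differ; only $E[Y]$ and $\text{Var}(Y)$ are forced to agree.

```python
from itertools import product

import numpy as np

def corr_range(p_i, p_j):
    """Attainable correlation range for two Bernoulli variables (Frechet bounds on P(X_i = 1, X_j = 1))."""
    sd = np.sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))
    lo = (max(0.0, p_i + p_j - 1) - p_i * p_j) / sd
    hi = (min(p_i, p_j) - p_i * p_j) / sd
    return lo, hi

print(corr_range(0.1, 0.9))  # the upper bound is 1/9, so cor = 0.2 is impossible for this pair

def exchangeable_joint(q):
    """Joint pmf of (X_1, X_2, X_3) that spreads P(Y = k) = q[k] evenly over all outcomes with k ones."""
    outcomes = list(product([0, 1], repeat=3))
    counts = np.bincount([sum(o) for o in outcomes], minlength=4)  # C(3, k) = 1, 3, 3, 1
    return {o: q[sum(o)] / counts[sum(o)] for o in outcomes}

def summarize(joint):
    """Return marginal means, correlation matrix, pmf of Y, E[Y] and Var(Y) for a joint pmf dict."""
    x = np.array(list(joint.keys()), dtype=float)
    w = np.array(list(joint.values()))
    mean = w @ x
    cov = (x - mean).T @ np.diag(w) @ (x - mean)
    corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    y = x.sum(axis=1)
    pmf_y = np.bincount(y.astype(int), weights=w, minlength=4)
    return mean, corr, pmf_y, w @ y, w @ (y - w @ y) ** 2

# Two choices of P(Y = k), both giving P(X_i = 1) = 0.5 and cor(X_i, X_j) = 0.2.
for q in ([0.3, 0.0, 0.6, 0.1], [0.2, 0.3, 0.3, 0.2]):
    mean, corr, pmf_y, ey, vy = summarize(exchangeable_joint(q))
    print(mean, np.round(corr[0, 1], 3), pmf_y, ey, round(vy, 3))
```

In general $E[Y]=\sum_i \pi_i$ and $\text{Var}(Y)=\sum_i \pi_i(1-\pi_i) + 2\sum_{i<j}\text{cor}(X_i,X_j)\sqrt{\pi_i(1-\pi_i)\pi_j(1-\pi_j)}$ depend only on the marginals and pairwise correlations, but higher moments of $Y$ do not, which is why extra assumptions about the joint distribution are needed.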
