If there are multiple possible approximations, I'm looking for the most basic one.
2 Answers
You can approximate it with the multivariate normal distribution in the same way that the binomial distribution is approximated by the univariate normal distribution. Check *Elements of Distribution Theory* and the Multinomial Distribution notes, pages 15-17.
Let $P=(p_1,\ldots,p_k)$ be the vector of your probabilities. Then the mean vector of the multivariate normal distribution is $nP=(np_1,np_2,\ldots,np_k)$. The covariance matrix is a $k \times k$ symmetric matrix. The diagonal elements are the variances of the $X_i$'s, i.e. $np_i(1-p_i)$ for $i=1,2,\ldots,k$. The off-diagonal element in the $i$th row and $j$th column is $\text{Cov}(X_i,X_j)=-np_ip_j$, where $i \neq j$.
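For concreteness, here is a minimal sketch (the values of $n$ and $P$ are illustrative, not from the question) of how the mean vector and covariance matrix are assembled:

```python
import numpy as np

# Illustrative example: normal approximation to Multinom(n, p).
n = 100
p = np.array([0.2, 0.3, 0.5])  # example probabilities, summing to 1

mean = n * p                            # mean vector: n p_i
cov = -n * np.outer(p, p)               # off-diagonal: Cov(X_i, X_j) = -n p_i p_j
np.fill_diagonal(cov, n * p * (1 - p))  # diagonal: Var(X_i) = n p_i (1 - p_i)

print(mean)  # [20. 30. 50.]
print(cov)
```

Note that each row of this covariance matrix sums to zero, so the matrix is singular; this is the degeneracy raised in the comments below.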
-
I guess my statistical sophistication is not enough to connect the dots in this answer. If I have the sample size n and the probabilities P, how do I calculate the mean vector and the covariance matrix of the multivariate normal distribution? – ericstalbot Aug 17 '12 at 20:35
-
Check out the 2nd reference. – Stat Aug 17 '12 at 21:06
-
Stat, so that this answer can stand by itself (and be resistant to link rot), would you mind giving a summary of the solution? – whuber Aug 17 '12 at 21:32
-
Let $P=(p_1,\ldots,p_k)$ be the vector of your probabilities. Then the mean vector of the multivariate normal distribution is $nP=(np_1,np_2,\ldots,np_k)$. The covariance matrix is a $k \times k$ symmetric matrix. The diagonal elements are the variances of the $X_i$'s, i.e. $np_i(1-p_i)$ for $i=1,2,\ldots,k$. The off-diagonal element in the $i$th row and $j$th column is $\text{Cov}(X_i,X_j)=-np_ip_j$, where $i \neq j$. – Stat Aug 17 '12 at 22:38
-
Does this need a continuity correction? How would you apply it? – Jack Aidley May 23 '14 at 11:04
-
The covariance matrix is not positive definite, but rather positive semi-definite, and is not full-rank. This makes the resulting multinormal distribution undefined. This is the problem I faced. Any idea how to handle it? – Mohammad Alaggan Sep 13 '16 at 12:29
-
@M.Alaggan: The mean/covariance matrices defined here have one minor issue: for a multinomial distribution with $k$ variables, the equivalent multivariate normal has $k-1$ variates. This is evident in the simple binomial example, which is approximated by the (ordinary) normal distribution. For further discussion, see Example 12.7 of [Elements of Distribution Theory](https://www.amazon.com/dp/1107630738). – M.S. Dousti Jul 21 '17 at 17:34
-
@M.S.Dousti in the case of the binomial example you get $$\Sigma = \begin{bmatrix} n p_1(1-p_1)&-n p_1p_2 \\ -np_1p_2& np_2(1-p_2)\end{bmatrix}$$ which is a bivariate variable (number of successes and failures) whose components are perfectly correlated: $$\frac{-np_1p_2}{\sqrt{np_1(1-p_1)np_2(1-p_2)}}=-1,$$ where you need to use $p_1+p_2=1$. So I consider this not so much an issue. It only means that the distribution in $k$ variables lies in a hyperplane of the $k$-dimensional space and could be reduced to $k-1$ variables, but it does not make the expression in terms of $k$ variables a problem/issue. – Sextus Empiricus Sep 02 '19 at 19:24
-
I'd say that the distribution restricted to the hyperplane is not _exactly_ trivial, but the result is established enough that it's on wikipedia: see https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case – stephematician Mar 17 '20 at 03:54
The density given in this answer is degenerate, and so I used the following to calculate the density that results from the normal approximation:
There's a theorem which says that, given a random variable $X = [X_1, \ldots, X_m]^T \sim \text{Multinom}(n, p)$ for an $m$-dimensional vector $p$ with $\sum_i p_i = 1$ and $\sum_i X_i = n$:
$$ X \xrightarrow{d} \sqrt{n} \, \text{diag}(u) \, Q \begin{bmatrix} Z_1 \\ \vdots \\ Z_{m-1} \\ 0 \end{bmatrix} + \begin{bmatrix} n p_1 \\ \vdots \\ n p_m \end{bmatrix}, $$
for large $n$, given:
- a vector $u$ with $u_i = \sqrt{p_i}$;
- random variables $Z_i \sim N(0,1)$ for $i = 1, \ldots, m-1$, and;
- an orthogonal matrix $Q$ with final column $u$.
That is to say, with some rearrangement, we can work out an $m-1$ dimensional multivariate normal distribution for the first $m-1$ components of $X$ (which are the only interesting components because $X_m$ is the sum of the others).
A suitable value of the matrix $Q$ is $I - 2 v v^T$ with $v_i = (\delta_{im} - u_i) / \sqrt{2(1 - u_m)}$, i.e. a particular Householder transformation.
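As a numerical check of this construction (a sketch with an illustrative $p$, not code from the blog entry), one can build this Householder matrix and verify that it is orthogonal with final column $u$:

```python
import numpy as np

# Build Q = I - 2 v v^T with v_i = (delta_im - u_i) / sqrt(2 (1 - u_m)),
# where u_i = sqrt(p_i). The probabilities p are an assumed example.
p = np.array([0.2, 0.3, 0.5])
m = len(p)
u = np.sqrt(p)

v = (np.eye(m)[:, -1] - u) / np.sqrt(2 * (1 - u[-1]))
Q = np.eye(m) - 2 * np.outer(v, v)

assert np.allclose(Q @ Q.T, np.eye(m))  # Q is orthogonal
assert np.allclose(Q[:, -1], u)         # final column of Q is u
```

The key fact is that $v$ has unit norm, so $Q$ is a reflection (orthogonal and symmetric) that maps the last standard basis vector to $u$.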
If we restrict the left-hand side to the first $m-1$ rows, and restrict $Q$ to its first $m-1$ rows and $m-1$ columns (denote these $\hat{X}$ and $\hat{Q}$ respectively) then:
$$ \hat{X} \xrightarrow{d} \sqrt{n} \text{diag}(\hat{u}) \hat{Q} \begin{bmatrix} Z_1 \\ \vdots \\ Z_{m-1} \end{bmatrix} + \begin{bmatrix} n p_1 \\ \vdots \\ n p_{m-1} \end{bmatrix} \sim \mathcal{N} \left( \mu, n \Sigma \right), $$
for large $n$, where:
- $\hat{u}$ denotes the first $m-1$ terms of $u$;
- the mean is $\mu = [ n p_1, \ldots, n p_{m-1}]^T$, and;
- the covariance matrix $n \Sigma = n A A^T$ with $A = \text{diag}( \hat{u} ) \hat{Q}$.
The right-hand side of that final equation is the non-degenerate density used in calculation.
As expected, when you plug everything in, you get the following covariance matrix:
$$ (n\Sigma)_{ij} = n \sqrt{p_i p_j} (\delta_{ij} - \sqrt{p_i p_j}) $$
for $i,j = 1, \ldots, m-1$, which is exactly the covariance matrix in the original answer restricted to its first $m-1$ rows and $m-1$ columns.
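Since $\sqrt{p_i p_j}(\delta_{ij} - \sqrt{p_i p_j}) = p_i \delta_{ij} - p_i p_j$, this claim is easy to check numerically (an illustrative sketch with an assumed $p$, not the author's code):

```python
import numpy as np

# Verify that Sigma = A A^T, with A = diag(u_hat) Q_hat, reproduces
# diag(p_hat) - p_hat p_hat^T, i.e. the covariance matrix from the first
# answer restricted to its first m-1 rows and columns.
p = np.array([0.2, 0.3, 0.5])
m = len(p)
u = np.sqrt(p)

# Householder matrix Q as constructed above
v = (np.eye(m)[:, -1] - u) / np.sqrt(2 * (1 - u[-1]))
Q = np.eye(m) - 2 * np.outer(v, v)

A = np.diag(u[:-1]) @ Q[:-1, :-1]  # A = diag(u_hat) Q_hat
Sigma = A @ A.T

expected = np.diag(p[:-1]) - np.outer(p[:-1], p[:-1])
assert np.allclose(Sigma, expected)
```

The check works because $Q Q^T = I$ implies $\hat{Q}\hat{Q}^T = I - \hat{u}\hat{u}^T$, so $A A^T = \text{diag}(\hat{p}) - \hat{p}\hat{p}^T$.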
This blog entry was my starting point.
-
Another useful resource is the links provided in: https://stats.stackexchange.com/questions/2397/asymptotic-distribution-of-multinomial – stephematician Mar 17 '20 at 05:21
-
Good answer (+1). Note that you can embed links with the syntax ```[textual description](hyperlink)```. I have taken the liberty of editing this answer to embed your links. – Ben Mar 17 '20 at 06:09