Confidence intervals for the probabilities of each outcome in a multinomial

Question

I have a survey that includes questions where you can choose one option from a list.

For example: "How often do you go to the supermarket?"

A. Every day
B. 2-6 times per week
C. Once per week or less
D. Never

Say 1000 people respond to the survey: 100 choose option A, 400 option B, 400 option C and 100 option D. How do you calculate confidence intervals for these proportions? Are they simply binomial confidence intervals (i.e. the usual intervals for a single proportion)? Or is it something more complicated, because the proportions are related to each other?

I think this answer suggests that each response can be considered as an independent binomial, e.g. A vs. (B or C or D), or B vs. (A or C or D). After someone edited the title of my question to something more useful (i.e. including the word 'multinomial'), I was able to find an R package (https://cran.r-project.org/web/packages/MultinomialCI/MultinomialCI.pdf) and some academic references. In this case the confidence intervals are a bit different to those calculated using 4 binomial calculations. I am still learning, though, so I would be interested to see whether others respond to the question. — Dan, Oct 16 '20 at 11:21
The package you link to calculates *simultaneous* confidence intervals. This means that the confidence intervals are calculated for all proportions jointly, instead of just for one proportion. — COOLSerdash, Oct 16 '20 at 11:43
That seems the right way to think about it? (Thinking about all of the proportions at the same time) — Dan, Oct 17 '20 at 12:14
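
To make the distinction in these comments concrete, here is a minimal sketch in base R that compares separate 95% intervals with Bonferroni-adjusted simultaneous intervals for the counts in the question. Bonferroni is a simple stand-in chosen for illustration, not the method used by the linked package, and the names `counts`, `ci`, `separate`, `simultaneous` and `res` are made up for this example.

```r
# Counts from the question: options A-D out of 1000 respondents
counts <- c(A = 100, B = 400, C = 400, D = 100)
n <- sum(counts)
k <- length(counts)
alpha <- 0.05

# Hypothetical helper: one binomial interval per category at the given level
# (prop.test returns a Wilson score interval with continuity correction by default)
ci <- function(level) {
  t(sapply(counts, function(x) prop.test(x, n, conf.level = level)$conf.int[1:2]))
}

separate     <- ci(1 - alpha)      # each interval targets 95% coverage on its own
simultaneous <- ci(1 - alpha / k)  # Bonferroni: all k intervals jointly target >= 95%

res <- cbind(estimate  = counts / n,
             sep_lower = separate[, 1],     sep_upper = separate[, 2],
             sim_lower = simultaneous[, 1], sim_upper = simultaneous[, 2])
print(round(res, 3))
```

The simultaneous intervals are wider because each one is computed at level 1 - 0.05/4 rather than 0.95. As far as I can tell, the linked MultinomialCI package uses a different construction (Sison and Glaz), so its numbers will generally not match this Bonferroni sketch.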

Dan · Answer 1 · 2020-10-15T13:00:25.150

I tried bootstrapping confidence intervals in R. The results are similar to the standard binomial confidence intervals, which I think suggests that the standard intervals are OK, though I am not sure of the theory behind that.

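# make sample data: 1000 draws from 4 categories with probabilities 0.1, 0.4, 0.4, 0.1

set.seed(4)
n <- 1000
d <- sample(1:4, size = n, replace = TRUE, prob = c(0.1, 0.4, 0.4, 0.1))

# binomial confidence intervals, one category at a time
# (prop.test gives a Wilson score interval with continuity correction by default)

rbind(tabulate(d) / n,
      sapply(table(d), function(x) prop.test(x, n)$conf.int[1:2]))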

#              1         2         3         4
#[1,] 0.09300000 0.4260000 0.3780000 0.1030000
#[2,] 0.07606873 0.3951986 0.3479774 0.0851969
#[3,] 0.11313228 0.4573769 0.4089719 0.1239218

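# bootstrapped (percentile) confidence intervals

b <- 10000                                           # number of bootstrap resamples
bs <- sample(d, size = n * b, replace = TRUE)        # draw all resamples at once
bs <- matrix(bs, ncol = b)                           # one resample per column
bs <- sapply(1:4, function(x) colSums(bs == x)) / n  # proportion of each category per resample
bs <- apply(bs, 2, quantile, probs = c(0.025, 0.975))
rbind(tabulate(d) / n, bs)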

#       [,1]  [,2]  [,3]  [,4]
#       0.093 0.426 0.378 0.103
# 2.5%  0.075 0.396 0.348 0.084
# 97.5% 0.111 0.456 0.408 0.122
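
On the theory, a brief sketch of why the per-category binomial intervals and the bootstrap agree. This is the standard marginal property of the multinomial distribution rather than anything derived in this thread, and the symbols $n$, $k$, $p_i$, $X_i$ and $\hat p_i$ are notation introduced here for the sample size, the number of categories, the true probabilities, the observed counts and the observed proportions.

$$
(X_1,\dots,X_k) \sim \mathrm{Multinomial}(n;\, p_1,\dots,p_k)
\;\Longrightarrow\;
X_i \sim \mathrm{Binomial}(n,\, p_i),
\qquad
\hat p_i = \frac{X_i}{n},
\qquad
\operatorname{Var}(\hat p_i) = \frac{p_i(1-p_i)}{n}
$$

$$
\operatorname{Cov}(\hat p_i, \hat p_j) = -\frac{p_i\, p_j}{n}, \qquad i \neq j
$$

So an interval for one category, taken on its own, is an ordinary binomial interval. The negative covariance between categories only matters when a statement has to hold for several categories at once, which is the case that simultaneous intervals address.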
