
Suppose I have a small sample (say, 20 data points) and want to estimate the true population proportion from it. My sample contains 20 orange balls and zero red balls; the population may contain red balls, but none turned up in the sample. The code below returns an interval of 0 0 (lower and upper confidence limits) because the sample has no red balls (see this document regarding the code below). Based on this analysis, can we say we are 95% confident that the proportion of red balls in the population equals zero? Can this result be trusted? Are there alternative methods for cases like this?

library(ggplot2)

# bootstrap function: resamples x B times, plots the resample means,
# and returns mean(x) +/- 2 bootstrap standard errors
boot.mean = function(x, B, binwidth = NULL) {
  n = length(x)
  # each row of the B x n matrix is one bootstrap resample
  boot.samples = matrix(sample(x, size = n * B, replace = TRUE), B, n)
  boot.statistics = apply(boot.samples, 1, mean)
  se = sd(boot.statistics)
  if (is.null(binwidth))
    binwidth = diff(range(boot.statistics)) / 30
  p = ggplot(data.frame(x = boot.statistics), aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth) +
    geom_density(color = "red")
  plot(p)
  # symmetric normal-approximation interval
  interval = mean(x) + c(-1, 1) * 2 * se
  print(interval)
  return(list(boot.statistics = boot.statistics, interval = interval, se = se, plot = p))
}

# my data: 0 red balls (coded 1) and 20 orange balls (coded 0)
my_sample = c(rep(1, 0), rep(0, 20))
my_sample.boot = boot.mean(my_sample, 1000, binwidth = 1/30)
Curious
  • If this is a binomial experiment, you can construct a one-sided exact confidence interval or apply the rule of three. Check other posts that discuss this. – Michael R. Chernick May 05 '17 at 03:20
  • Yes, this is a binomial experiment, and the estimated minimum sample size to meet the 95% confidence level is 250 (based on the normal approximation to the binomial). However, not enough data are available to reach this sample size, which is why I considered bootstrapping. – Curious May 05 '17 at 03:25
  • Sample size depends on the required width of the interval. You can get away with less if you allow a wider interval, and the Clopper-Pearson method and the rule of three give more accurate intervals in the case of 0 failures. – Michael R. Chernick May 05 '17 at 03:35

1 Answer

You have a rare event that doesn't occur in your sample, so you can't use the bootstrap to infer anything about it: every resample of an all-zero sample is again all zeros. To use the bootstrap, you'd need to increase your sample size until the rare event actually appears in the sample, and even then you shouldn't use the bootstrap for samples with an extremely low number of successes.
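
To illustrate the point above, here is a minimal sketch (the names `x` and `boot_means` are made up for this example) showing that resampling the all-zero sample of 20 can only ever produce a mean of zero, so the bootstrap standard error is exactly zero regardless of the number of resamples:

```r
set.seed(1)

# the observed sample: 20 orange balls (coded 0), no red balls (coded 1)
x <- rep(0, 20)

# draw 1000 bootstrap resamples and record the mean of each
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# every resample mean is 0, so the spread is 0 as well
range(boot_means)
sd(boot_means)
```

Any interval built from these resample means collapses to the single point 0, which reflects the sample rather than the uncertainty about the population.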

Furthermore, I don't understand why you use the normal approximation to create a symmetric confidence interval. You can instead take quantiles of the bootstrap resample means to construct a percentile confidence interval:

#larger sample
my_sample = c(rep(1, 1), rep(0, 200))

boot.mean = function(x,B,binwidth=NULL) {
  n = length(x)
  boot.samples = matrix( sample(x,size=n*B,replace=TRUE), B, n)
  boot.statistics = apply(boot.samples,1,mean)
  se = sd(boot.statistics)
  interval = quantile(boot.statistics, c(0.025, 0.975))
  return( list(boot.statistics = boot.statistics, interval=interval, se=se) )
}

set.seed(42)
boot.mean(my_sample, 1e5)$interval
#      2.5%      97.5% 
#0.00000000 0.01492537 

Or, more easily, with the boot package:

library(boot)
set.seed(42)
# store the result under a new name so the boot.mean function above is not overwritten
boot.out <- boot(my_sample, function(x, i) mean(x[i]), 
                 R = 1e5)

#confidence interval not constrained to be symmetric
boot.ci(boot.out, type = "bca")
#Intervals : 
#  Level       BCa          
#95%   ( 0.0000,  0.0149 )  
#Calculations and Intervals on Original Scale

For further information on confidence intervals for binomial experiments, and on the methods you should actually use, see this excellent answer.
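
As a rough sketch of the alternatives mentioned in the comments (the one-sided exact Clopper-Pearson interval and the rule of three), applied to the original sample of 0 red balls in 20 draws. This uses only base R's `binom.test`; the variable names are made up for this example, and it is one possible illustration rather than the method recommended in the linked answer:

```r
n <- 20  # sample size
k <- 0   # number of red balls observed

# one-sided exact (Clopper-Pearson) 95% upper bound on the proportion
exact <- binom.test(k, n, alternative = "less", conf.level = 0.95)
upper_exact <- exact$conf.int[2]

# with k = 0 the bound has the closed form 1 - alpha^(1/n)
upper_closed <- 1 - 0.05^(1 / n)

# rule of three: approximate 95% upper bound when no events are observed
upper_rule3 <- 3 / n

round(c(exact = upper_exact, closed = upper_closed, rule3 = upper_rule3), 4)
```

Both bounds come out around 0.14 to 0.15, so with 20 draws and no red balls the data are still compatible with a population proportion of red balls well above zero.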

Roland
  • Thank you for your reply. I'm actually using the Wilson score method, not the normal approximation to the binomial; I need to update my comment above. For practical reasons we can't get more data to meet the minimum sample size, which is why I'm looking into other methods. – Curious May 05 '17 at 09:42