Let's assume I have a small sample (say 20 data points) and I want to estimate the true population proportion from it. My data look as below: 20 orange balls and zero red balls in the sample. The population may contain red balls, but we found none in our sample. The code below returns 0 0 (the lower and upper confidence limits) because there are zero red balls in the sample (see this document regarding the code below).
My question: based on this analysis, can we say we are 95% confident that the proportion of red balls in the population equals zero? Can these results be trusted? Are there alternative methods for a case like this?
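A minimal sketch of why the interval collapses to 0 0, assuming the same coding as the post (1 = red, 0 = orange). With zero red balls, every resample drawn with replacement is again all zeros, so every bootstrap mean is 0 and the bootstrap standard error is 0. The variable names here are illustrative and independent of the function in the post.

```r
# Sample coded as in the post: 1 = red ball, 0 = orange ball (no red balls observed)
x <- rep(0, 20)
B <- 1000
set.seed(1)
# Each row is one bootstrap resample of size 20, drawn with replacement
resamples <- matrix(sample(x, size = length(x) * B, replace = TRUE), nrow = B)
boot_means <- rowMeans(resamples)
# Every resample contains only zeros, so the bootstrap distribution is degenerate
print(unique(boot_means))
print(sd(boot_means))
# A mean +/- 2 SE interval therefore has zero width
interval <- mean(x) + c(-1, 1) * 2 * sd(boot_means)
print(interval)
```

The bootstrap can only resample values that were observed, so it cannot produce a non-zero proportion from a sample with no red balls.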
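library(ggplot2)

#boot function: bootstrap the sample mean (here, the sample proportion of red balls)
boot.mean <- function(x, B, binwidth = NULL) {
  n <- length(x)
  # B bootstrap resamples of size n, one per row
  boot.samples <- matrix(sample(x, size = n * B, replace = TRUE), B, n)
  boot.statistics <- apply(boot.samples, 1, mean)
  se <- sd(boot.statistics)
  if (is.null(binwidth))
    binwidth <- diff(range(boot.statistics)) / 30
  # Histogram of the bootstrap means with a density overlay
  p <- ggplot(data.frame(x = boot.statistics), aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth) +
    geom_density(color = "red")
  plot(p)
  # Normal-approximation interval: observed mean +/- 2 bootstrap standard errors
  interval <- mean(x) + c(-1, 1) * 2 * se
  print(interval)
  return(list(boot.statistics = boot.statistics, interval = interval, se = se,
              plot = p))
}

#my data: 0 red balls (coded 1) and 20 orange balls (coded 0)
my_sample <- c(rep(1, 0), rep(0, 20))
# call the boot function after it has been defined
my_sample.boot <- boot.mean(my_sample, 1000, binwidth = 1/30)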
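For comparison, a sketch of one commonly cited alternative for the zero-successes case, assuming a simple binomial model (independent draws, fixed n = 20): the exact Clopper-Pearson interval returned by base R's `binom.test`. Unlike the bootstrap interval above, it gives a non-zero upper limit even when no red balls are observed.

```r
# Observed data from the post: 0 red balls out of n = 20
x <- 0
n <- 20
# Exact (Clopper-Pearson) two-sided 95% interval from base R's binom.test
ci <- binom.test(x, n, conf.level = 0.95)$conf.int
print(ci)
# With x = 0 the upper limit has a closed form: 1 - (alpha/2)^(1/n), about 0.168 here
upper_closed_form <- 1 - 0.025^(1 / n)
print(upper_closed_form)
# "Rule of three": approximate one-sided 95% upper bound, 3/n = 0.15 here
print(3 / n)
```

The lower limit is 0, but the upper limit shows that a 20-ball sample with no red balls is still compatible with a population proportion of up to roughly 17% under this model.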