How's the Garvan?
The problem is we don't know how many zero counts are observed. We have to estimate this. A classic statistical procedure for situations like this is the Expectation-Maximisation algorithm.
A simple example:
Assume we draw from an unknown population (of 1,000,000) with a poisson constant of 0.2.
counts <- rpois(1000000, 0.2)
table(counts)
0 1 2 3 4 5
818501 164042 16281 1111 62 3
But we don't observe the zero counts. Instead we observe this:
table <- c("0"=0, table(counts)[2:6])
table
0 1 2 3 4 5
0 164042 16281 1111 62 3
Possible frequencies observed
k <- c("0"=0, "1"=1, "2"=2, "3"=3, "4"=4, "5"=5)
Initialise mean of Poisson distribution - just take a guess (we know it's 0.2 here).
lambda <- 1
Expectation - Poisson Distribution
P_k <- lambda^k*exp(-lambda)/factorial(k)
P_k
0 1 2 3 4 5
0.367879441 0.367879441 0.183939721 0.061313240 0.015328310 0.003065662
n0 <- sum(table[2:6])/(1 - P_k[1]) - sum(table[2:6])
n0
0
105628.2
table[1] <- 105628.2
Maximisation
lambda_MLE <- (1/sum(table))*(sum(table*k))
lambda_MLE
[1] 0.697252
lambda <- lambda_MLE
Second iteration
P_k <- lambda^k*exp(-lambda)/factorial(k)
n0 <- sum(table[2:6])/(1 - P_k[1]) - sum(table[2:6])
table[1] <- n0
lambda <- (1/sum(table))*(sum(table*k))
population lambda_MLE
[1,] 361517.1 0.5537774
Now iterate until convergence:
for (i in 1:200) {
P_k <- lambda^k*exp(-lambda)/factorial(k)
n0 <- sum(table[2:6])/(1 - P_k[1]) - sum(table[2:6])
table[1] <- n0
lambda <- (1/sum(table))*(sum(table*k))
}
cbind( population = sum(table), lambda_MLE)
population lambda_MLE
[1,] 1003774 0.1994473
Our population estimate is 1003774 and our poisson rate is estimated at 0.1994473 - this is the estimated proportion of the population sampled. The main problem you will have in the typical biological problems you are dealing with is assumption that the poisson rate is a constant.
Sorry for the long-winded post - this wiki is not really suitable for R code.