Determining Population Size from Error

Question

Assume you have a set of unique values $V$ with a known distribution (e.g. Normal, Uniform) but unknown size $|V|$.

Assume you have function $O(t)$ that tells you the value $v \in V$ that is closest to the target value $t$

If you generate another (smallish) set values $T$ under the same distribution, I feel that you can determine an approximation of $|V|$, given $|T|$, and the RMS or something of $T-O(T)$

Is there a formula?

Example:

V = [0.090, 0.135, 0.288, 0.413, 0.434, 0.715, 0.717, 0.797, 0.841, 0.904] (actual set)
T =     [0.096, 0.767, 0.900] (generated values)
O(T) =  [0.090, 0.797, 0.904] (closest values to T in V)
error = [0.006, 0.030, 0.004]

The assumption is that as $V$ grows, the error will shrink.

My values come from a Bernoulli$(1/2)$ distribution. $V$ is large enough that there is at least one $0$ and one $1$ in $V$. Thus, $O(t)$ is $0$ when $t\lt 1/2$, $1$ when $t\gt 1/2$, and indeterminate when $t=1/2$. Since this will be the case no matter how large $V$ is, how do you suppose $|V|$ could possibly be estimated? — whuber, Sep 18 '17 at 20:36
What is $t$, the target value? Are you estimating something from a distribution? Perhaps you can include one concrete example and show how it works, and then ask for a generalisation. — Gijs, Sep 18 '17 at 20:41
@Gijs $t$ can be any value. $O(t)$ just magically polls $V$ for the nearest value. $T$ is a generated set with the same distribution as $V$ to test $V$ against. — guest, Sep 18 '17 at 21:00

score 0 · Answer 1 · answered Sep 19 '17 at 08:22

0

If you know the distribution of $V$, find the distribution of the error of $O(t)$, which is indeed a formula of $N$, then given the observed errors you may try and derive the maximum likelihood estimator for $N$.

The distribution of the error of $O(t)$ is not easy I would think but you can start with this question and it's answers: Spacings between discrete uniform random variables.

answered Sep 19 '17 at 08:22

Gijs

3,409
11
18

1

could you post the formula please, at least for uniform or normal distributions? – guest Sep 19 '17 at 16:37

score 0 · Answer 2 · answered Sep 19 '17 at 16:55

0

In the absense of a better answer I will use the following

Given Uniform distribution: $|V| \approx$ (1/error) - 1

this conclusion was reached by looking at ideally distributed sets, and their element's distance from an extremity:

[0.5] 1/2 => 1
[0.33, 0.66] 1/3 => 2
[0.25, 0.50, 0.75] 1/4 => 3
[0.2, 0.4, 0.6, 0.8] 1/5 => 4

answered Sep 19 '17 at 16:55

guest

111
3

What is "error" in this answer? How is it computed from the $|T|$ ordered pairs $(t_i, O(t_i)),\ t_i\in T$ generated by your procedure? – whuber Sep 19 '17 at 17:31
MAE or RMSE should both be fine. I'm not sure which is better though – guest Sep 20 '17 at 14:48

Determining Population Size from Error

2 Answers2