1

Assume you have a set of unique values $V$ with a known distribution (e.g. Normal, Uniform) but unknown size $|V|$.

Assume you have function $O(t)$ that tells you the value $v \in V$ that is closest to the target value $t$

If you generate another (smallish) set values $T$ under the same distribution, I feel that you can determine an approximation of $|V|$, given $|T|$, and the RMS or something of $T-O(T)$

Is there a formula?

Example:

V = [0.090, 0.135, 0.288, 0.413, 0.434, 0.715, 0.717, 0.797, 0.841, 0.904] (actual set)
T =     [0.096, 0.767, 0.900] (generated values)
O(T) =  [0.090, 0.797, 0.904] (closest values to T in V)
error = [0.006, 0.030, 0.004]

The assumption is that as $V$ grows, the error will shrink.

guest
  • 111
  • 3
  • 1
    My values come from a Bernoulli$(1/2)$ distribution. $V$ is large enough that there is at least one $0$ and one $1$ in $V$. Thus, $O(t)$ is $0$ when $t\lt 1/2$, $1$ when $t\gt 1/2$, and indeterminate when $t=1/2$. Since this will be the case no matter how large $V$ is, how do you suppose $|V|$ could possibly be estimated? – whuber Sep 18 '17 at 20:36
  • What is $t$, the target value? Are you estimating something from a distribution? Perhaps you can include one concrete example and show how it works, and then ask for a generalisation. – Gijs Sep 18 '17 at 20:41
  • @Gijs $t$ can be any value. $O(t)$ just magically polls $V$ for the nearest value. $T$ is a generated set with the same distribution as $V$ to test $V$ against. – guest Sep 18 '17 at 21:00

2 Answers2

0

If you know the distribution of $V$, find the distribution of the error of $O(t)$, which is indeed a formula of $N$, then given the observed errors you may try and derive the maximum likelihood estimator for $N$.

The distribution of the error of $O(t)$ is not easy I would think but you can start with this question and it's answers: Spacings between discrete uniform random variables.

Gijs
  • 3,409
  • 11
  • 18
0

In the absense of a better answer I will use the following

Given Uniform distribution: $|V| \approx$ (1/error) - 1

this conclusion was reached by looking at ideally distributed sets, and their element's distance from an extremity:

[0.5] 1/2 => 1
[0.33, 0.66] 1/3 => 2
[0.25, 0.50, 0.75] 1/4 => 3
[0.2, 0.4, 0.6, 0.8] 1/5 => 4
guest
  • 111
  • 3
  • What is "error" in this answer? How is it computed from the $|T|$ ordered pairs $(t_i, O(t_i)),\ t_i\in T$ generated by your procedure? – whuber Sep 19 '17 at 17:31
  • MAE or RMSE should both be fine. I'm not sure which is better though – guest Sep 20 '17 at 14:48