4

I have random sets of $N$ random 32-bit strings, where all bits are i.i.d. with $\mathbb{P}(0) = \mathbb{P}(1) = 1/2$. Define
$\ \ \ \ $weight( 32-bit x ) = number of 1 bits in x, i.e. Hamming distance to 0
$\ \ \ \ $minweight( set $X$ ) = min$_{\text{x} \in X}$ weight( x )

How does the average minweight( $X$ ) vary with $N$, the size of $X$ ?

My $N$ is small enough that it doesn't matter whether $X$ has duplicates, or not.
A back-of-the envelope estimate would be fine.

(A harder question is to analyze the algorithm for bit-string-nearest-neighbour-searching on stackoverflow, but that seems rather specialized. If anyone could suggest another forum or reference, I'd appreciate it.)

denis
  • 3,187
  • 20
  • 34
  • @cardinal, thanks. I had thought of that -- weight( 32 bits ) ~ Normal( 16 +- 2.8 ) -- but then for N = a million thought that E normal^N would be inaccurate; in fact I get 2.2 +- .5 vs. Monte-Carlo ~ 3.5 bits, dunno. A direct, intuitive way would be nice. – denis Apr 20 '12 at 16:10

1 Answers1

5

Given the small length of the strings of interest, providing a (computable) exact solution is straightforward.

As noted in the comments, if $X$ is a random bit-string of length 32, then the weight $$ W = \mathrm{weight}(X) \>, $$ the Hamming distance with respect to zero, is just the sum of the digits and so is binomial with parameters $n = 32$ and $p = 1/2$.

Distribution of the minimum

Now, recall that if $W_1,\ldots,W_n$ is a random sample of size $n$ from distribution $F$, then $$ \renewcommand{\Pr}{\mathbb P} \Pr\left( \min_{1\leq i \leq n} W_i > w\right) = (1 - F(w))^n \>. $$

In our case, each $W_i$ is discrete and so it suffices to consider the probability of each atom.

Let $\newcommand{\Wo}{W_{(1)}}\Wo$ denote the minimum of the sample of size $n$. Then, $\Wo \in \{0,\ldots,32\}$, and so $$ \Pr(\Wo = w) = \Pr(\Wo > w - 1) - \Pr(\Wo > w) = (1-F(w-1))^n - (1-F(w))^n \>. $$

Plugging in $F(w) = 2^{-32} \sum_{k=0}^w {32 \choose k}$ will yield the desired probabilities.

Note that this results extends easily to the case of general $n$ and $p$, as well.

A note on the expected value

The expected value can readily be computed by the standard formula $$ \mathbb E \Wo = \sum_{w=0}^{32} w \Pr(\Wo = w) \>, $$ though to get a rough estimate, we can note that $$ \mathbb E \Wo \leq 32 \Pr(\Wo > 0) = 32 (1 - 2^{-32})^n \>, $$ so the expected value converges to zero geometrically fast.

cardinal
  • 24,973
  • 8
  • 94
  • 128