
I have two approaches for data sampling:

  1. Sampling from a uniform distribution on $[0, 1]$ and rejecting values outside a given range, i.e. keeping only $0.50<p<0.51$.
  2. Sampling from a uniform distribution in $(0.50,0.51)$.

Plots of samples from both approaches suggest they follow the same distribution, but how do I prove this mathematically? If they do follow the same distribution, what is it? Is it $\mathcal{U}(0.50,0.51)$?

I'm particularly interested in a rigorous explanation, but an intuitive one is also fine.

Note:

You may also refer to these two Stack Overflow posts; both are closely related, and they motivated this question:

  1. Is Excel VBA's Rnd() really this bad?
  2. How to fix broken formats when exporting R output to a TXT file?
  • You prove it by a fairly simple argument involving conditional probability – Glen_b Aug 31 '16 at 12:59
  • @Glen_b Would you be so kind as to put your comment (please elaborate a bit more) in the answer? – Anastasiya-Romanova 秀 Aug 31 '16 at 13:17
  • Frankly, the request for a proof - particularly for such an elementary case - looked rather like [routine bookwork](http://stats.stackexchange.com/tags/self-study/info), as might be set for an assignment, homework etc ... which is why I mentioned an approach but did not give the proof. In what context do you require a proof outside of that? – Glen_b Aug 31 '16 at 22:38
  • This is just a matter of curiosity and I can assure you this is **not** an assignment or homework. Besides, I need scientific references to fix my answer in link 1 since using sampling no 2 is in Excel's favour. – Anastasiya-Romanova 秀 Sep 01 '16 at 01:41
  • Hi, I've asked a question on MSE; would you mind taking a look (nothing to do with this question, though)? http://math.stackexchange.com/q/1935624/291503 – dontloo Sep 22 '16 at 10:35
  • @dontloo It's either $$\frac{dL}{dg}=\frac{f(g)}{\frac{dg}{dx}}$$ or $$\frac{dL}{dg}=\int\frac{df}{dg}\ dx$$ – Anastasiya-Romanova 秀 Sep 26 '16 at 02:30
  • @Anastasiya-Romanova秀 thanks a lot, I now understand the first one from the accepted answer under that question. How is the second notation derived? – dontloo Oct 11 '16 at 08:41
  • @dontloo Please refer to this: https://en.wikipedia.org/wiki/Leibniz_integral_rule – Anastasiya-Romanova 秀 Oct 11 '16 at 09:32
  • @Anastasiya-Romanova秀 awesome – dontloo Oct 11 '16 at 09:41
  • @Anastasiya-Romanova秀 hi, it's been a while, but I just took another look at your answer to my question. The first formula treats $\int f(g(x))dx$ as an indefinite integral while the second treats it as a definite integral, am I right? – dontloo Feb 14 '17 at 08:36

2 Answers


In both cases you sample from the $\mathcal{U}(0.50,0.51)$ distribution.

In the first case you sample uniformly from $\mathcal{U}(0,1)$ and reject the values outside the $(0.50,0.51)$ range. In the second case you sample uniformly from $\mathcal{U}(0.50,0.51)$ directly. In both cases the values of interest are sampled uniformly. Rejecting the values outside $(0.50,0.51)$ has no influence on the values inside the interval: each accepted value comes from a density that is flat on $(0.50,0.51)$, and conditioning on landing in that interval only rescales that flat density. So there is no reason why the two methods should differ.
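
To check this empirically, here is a minimal sketch in plain Python (standard library only) that implements both approaches and compares them with the two-sample Kolmogorov-Smirnov distance, i.e. the largest gap between the two empirical CDFs. The function names are hypothetical, chosen for illustration. For two samples of 5000 values, a distance below roughly $1.358\sqrt{2/5000} \approx 0.027$ means the hypothesis of a common distribution is not rejected at the 5% level (large-sample approximation).

```python
import random

LOW, HIGH = 0.50, 0.51  # target interval from the question


def sample_by_rejection(n, rng):
    """Approach 1: draw from U(0, 1) and keep only values in (LOW, HIGH)."""
    out = []
    while len(out) < n:
        u = rng.random()  # uniform on [0, 1)
        if LOW < u < HIGH:
            out.append(u)
    return out


def sample_directly(n, rng):
    """Approach 2: draw from U(LOW, HIGH) directly.

    random.uniform may return HIGH itself because of floating-point rounding.
    """
    return [rng.uniform(LOW, HIGH) for _ in range(n)]


def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step past every copy of v in both samples, so ties are handled correctly
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is the maximum
    return d


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    a = sample_by_rejection(5000, rng)  # needs about 500,000 draws from U(0, 1)
    b = sample_directly(5000, rng)
    print(f"KS distance: {ks_two_sample(a, b):.4f}")
```

Note that approach 1 keeps only about 1% of its draws, so it does the same job as approach 2 at roughly 100 times the cost.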

The density of the uniform distribution over $(a,b)$ is constant for all values within the interval, and therefore also on any subinterval $(a', b')$ with $a' \ge a$ and $b' \le b$.
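
The conditional-probability argument mentioned in the comments can be written out explicitly. Let $U \sim \mathcal{U}(0,1)$ be a raw draw in the first approach, with $a = 0.50$ and $b = 0.51$; a draw is kept only when $a < U < b$. For $a \le x \le b$:

$$P(U \le x \mid a < U < b) = \frac{P(a < U \le x)}{P(a < U < b)} = \frac{x - a}{b - a}.$$

For $x < a$ this conditional CDF is $0$, and for $x > b$ it is $1$. That is exactly the CDF of $\mathcal{U}(a,b)$, the distribution used in the second approach. Differentiating gives the density $1/(b-a) = 1/0.01 = 100$ on the interval. Because the raw draws are independent, the accepted values are also independent of one another.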

Tim
  • Is there a test statistic like p-value or a proper procedure to prove they do follow $\mathcal{U}(0.50,0.51)$? – Anastasiya-Romanova 秀 Aug 31 '16 at 12:43
  • You need only to prove that it fits the range of interest (in both cases it does by design) and that it is uniform -- there are multiple tests for uniformity check http://stats.stackexchange.com/questions/40384/fake-uniform-random-numbers-more-evenly-distributed-than-true-uniform-data or http://math.stackexchange.com/questions/2435/is-there-a-simple-test-for-uniform-distributions or http://stackoverflow.com/questions/24409639/prove-a-random-generated-number-is-uniform-distributed etc. – Tim Aug 31 '16 at 12:48
  • You can prove they are the same distribution because their density functions are identical. Namely, zero outside $[0.50, 0.51]$ and constant (which must therefore be 100) within that interval. – JDL Aug 31 '16 at 13:52
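
As an illustration of the uniformity tests mentioned above, here is a minimal sketch of a one-sample Kolmogorov-Smirnov test against $\mathcal{U}(0.50,0.51)$ in plain Python. The helper names are hypothetical, and the constant 1.358 is the usual large-sample coefficient for the 5% level. Keep in mind that such a test can only fail to reject uniformity; it does not prove it. The proof is the density argument.

```python
import math
import random

LOW, HIGH = 0.50, 0.51  # hypothesised distribution U(LOW, HIGH)


def uniform_cdf(x, low=LOW, high=HIGH):
    """CDF of U(low, high)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def ks_one_sample(sample, cdf=uniform_cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from k/n to (k+1)/n at x; check both sides
        d = max(d, (k + 1) / n - f, f - k / n)
    return d


def ks_critical_value(n, c_alpha=1.358):
    """Large-sample critical value c(alpha)/sqrt(n); 1.358 corresponds to alpha = 0.05."""
    return c_alpha / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = []
    while len(sample) < 1000:  # approach 1: rejection from U(0, 1)
        u = rng.random()
        if LOW < u < HIGH:
            sample.append(u)
    d = ks_one_sample(sample)
    crit = ks_critical_value(len(sample))
    print(f"D = {d:.4f}, 5% critical value = {crit:.4f}, reject: {d > crit}")
```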

Think of a set of data, e.g. grains of salt spread uniformly over 1 square meter, whose sizes you measure.

In the first proposal you take the whole set and then reject the values outside your desired target range. That subset follows a uniform distribution.

In the second proposal you take as your sample a subset inside your desired target range; it also follows a uniform distribution.

Neither measurement nor sampling affects the data, even if the distribution is not uniform, "provided the sample is big enough to represent the whole set well".

JStark
  • Thanks for the intuitive explanation, though your example is a case of the bivariate uniform distribution. Here I'm talking about a univariate uniform distribution. (+1) – Anastasiya-Romanova 秀 Sep 13 '16 at 01:59