
I have the following code. I am trying to figure out which distribution it samples from, and I need to write a function that generalizes sampling from this distribution to any N.

set.seed(12032345)
N <- 300
x <- runif(150, 0,1)
x <- c(x, rnorm(150, 0, 1))
x <- sample(x, N, replace = FALSE)

My approach: to me it looks like we have a 50:50 mixture of U(0,1) and N(0,1), and the code then just samples from that vector. Since N is even, would it make sense to generalize it by simulating a uniform random variable for each draw, checking which half of the (0,1) interval it falls in, and then sampling from U(0,1) if the variate is below 0.5 and from N(0,1) if it is above?

Tim
user153009
  • Wait... You have 300 samples and you sample 300 without replacement. So you just get the original samples back? – SmallChess Mar 20 '17 at 03:18
  • If you want to sample 300 mixed samples, I don't think you need the `sample` command at all. Randomly shuffling `x` is sufficient. – SmallChess Mar 20 '17 at 03:20

1 Answer


As noted by Student T, your simulation code is wrong. What your code does is generate exactly 150 samples from the standard uniform distribution and exactly 150 samples from the standard normal distribution, and then shuffle them.

set.seed(12032345)

N <- 1e5
x <- c(runif(N/2, 0,1), rnorm(N/2, 0, 1))
y <- sample(x, N, replace = FALSE)

all.equal(sort(x), sort(y))
## [1] TRUE

So the `sample` step does nothing but change the order of the samples. Samples generated like this would be "too good to be true" for a mixture, since the mixing proportion would never vary: it is always exactly 50:50.
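One way to see the "too good to be true" problem is to track component labels instead of values. The sketch below is illustrative only (the names `fixed_counts` and `mix_counts`, the seed, and the 1000 replications are arbitrary choices): it repeats each scheme many times with N = 300 and counts how many draws come from the uniform component.

```r
# Sketch: how many of the N draws come from the uniform component?
set.seed(1)
N <- 300
reps <- 1000

# Original scheme: the component counts are fixed by construction,
# and shuffling with sample() never changes them.
fixed_counts <- replicate(reps, {
  comp <- c(rep("unif", N / 2), rep("norm", N / 2))
  sum(sample(comp) == "unif")
})

# True 50:50 mixture: each draw picks its component independently.
mix_counts <- replicate(reps, sum(runif(N) < 0.5))

range(fixed_counts)  # always 150
sd(mix_counts)       # should be close to sqrt(300 * 0.5 * 0.5), about 8.66
```

Under the original scheme the count has zero spread, while under a genuine mixture it fluctuates around 150.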

If you want to sample from a mixture of the standard normal and standard uniform distributions with equal mixing proportions, you should let the sample proportions vary as well, i.e. for each draw pick one of the two distributions with equal probability.
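Written out, the target distribution is the two-component mixture with density

$$f(x) = \tfrac{1}{2}\,\mathbf{1}\{0 \le x \le 1\} + \tfrac{1}{2}\,\varphi(x), \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

so each individual value comes from U(0,1) with probability 1/2 and from N(0,1) otherwise, independently of the other values.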

res <- numeric(N)

# For each draw, flip a fair coin to pick the component
for (i in 1:N) {
  if (runif(1) > 0.5)
    res[i] <- runif(1)   # uniform component
  else
    res[i] <- rnorm(1)   # normal component
}
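Since the question asks for a function that works for any N, the loop can be vectorized. This is a minimal sketch: the name `rmix_unif_norm` and the mixing-proportion argument `p` are hypothetical additions, with `p` being the probability of drawing from U(0,1).

```r
# Hypothetical helper: draw n values from the mixture
#   p * U(0, 1) + (1 - p) * N(0, 1)
# Vectorized equivalent of the loop above (p = 0.5 by default).
rmix_unif_norm <- function(n, p = 0.5) {
  from_unif <- runif(n) < p                  # TRUE with probability p, per draw
  res <- numeric(n)
  res[from_unif]  <- runif(sum(from_unif))   # uniform component
  res[!from_unif] <- rnorm(sum(!from_unif))  # normal component
  res
}

set.seed(12032345)
x <- rmix_unif_norm(300)
```

Drawing the component labels first and then filling each group in one call avoids the per-element `if`, which matters for large N.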

A more hacky way is to observe that if the mixing proportion is $p$ and the sample size is $N$, then the number of values drawn from the uniform distribution follows a $\mathrm{Binomial}(N, p)$ distribution. So you can draw that count first and then generate the values.
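Writing $K$ for the number of uniform draws among the $N$ values, this observation reads

$$K \sim \mathrm{Binomial}(N, p), \qquad \mathrm{E}[K] = Np, \qquad \mathrm{Var}(K) = Np(1-p).$$

With $N = 300$ and $p = 1/2$ this gives $\mathrm{E}[K] = 150$ and $\mathrm{Var}(K) = 75$ (standard deviation about 8.66), whereas the code in the question always has $K = 150$ exactly.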

k <- rbinom(1, N, 0.5)   # number of draws from U(0,1)
res2 <- sample(c(runif(k, 0,1), rnorm(N-k, 0, 1)))

In the above example, `sample` is also unnecessary, since it just shuffles the values.
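As a sanity check (a sketch, with an arbitrary seed), the binomial-count construction can be compared against the exact moments of the 50:50 mixture. The mean is $p \cdot \tfrac12 + (1-p) \cdot 0 = \tfrac14$ and $\mathrm{E}[X^2] = p \cdot \tfrac13 + (1-p) \cdot 1 = \tfrac23$, so the variance is $\tfrac23 - \tfrac1{16} = \tfrac{29}{48} \approx 0.604$.

```r
# Sketch: compare the binomial-count construction with the exact
# mixture moments (mean 1/4, variance 29/48 when p = 0.5).
set.seed(42)
N <- 1e5
p <- 0.5

k <- rbinom(1, N, p)                            # number of uniform draws
res2 <- c(runif(k, 0, 1), rnorm(N - k, 0, 1))   # order is irrelevant for moments

mix_mean <- p * 1/2 + (1 - p) * 0               # E[X]
mix_var  <- p * 1/3 + (1 - p) * 1 - mix_mean^2  # E[X^2] - E[X]^2

c(sample_mean = mean(res2), exact_mean = mix_mean)
c(sample_var = var(res2), exact_var = mix_var)
```

Agreement of the first two moments does not prove the sampler is right, but a scheme with the wrong mixing proportion would show up here as a shifted mean.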

See also the thread on sampling from a mixture of two Gamma distributions.

Tim