
I have an empirical joint distribution function

$ \hat{F}(x_1,x_2) = Pr(X_1 < x_1, X_2 < x_2) $

Can I generate bivariate random numbers from this distribution subject to a condition such as $ X_1 > c_1, X_2 > c_2 $?

user67275
  • Draw from $\hat F$ and reject if $X_1 \leq c_1$ or $X_2 \leq c_2$... – Tim Oct 22 '15 at 11:01
  • @Tim That would work provided the chance of rejection is not too great. The best solution would depend on what $\hat F$ is and what algorithms are available to draw from it and its truncated version. – whuber Oct 22 '15 at 15:36
  • @whuber if it's an empirical distribution and we know nothing more about it, then this seems to be the available solution - but maybe OP can edit for more details? – Tim Oct 22 '15 at 18:43
  • @Tim If this is truly an empirical distribution then *much* more efficient methods are available. For instance, simply drop all observations with $X_1\le c_1$ or $X_2\le c_2$ from the data and use what's left as the empirical distribution to sample from. – whuber Oct 22 '15 at 19:26
  • @whuber yes, I agree, I would draw those values just like you said, but it is equivalent to rejecting the truncated values. – Tim Oct 22 '15 at 20:07
  • @Tim "Equivalent" mathematically, but not computationally! – whuber Oct 22 '15 at 20:29
  • @whuber Now it seems quite easy to condition on (X1>c1,X2>c2). But my true question is how to sample from F(x_1,x_2) – user67275 Oct 23 '15 at 00:04
  • That's not what you are asking, though. What really is your question? – whuber Oct 23 '15 at 01:53
  • @whuber Once I can get a random sample from F(x_1,x_2), then it seems easy to condition on X_1>c_1, X_2>c_2. So, I only need to find a method to random sample from empirical distribution F(x_1,x_2) – user67275 Oct 23 '15 at 02:42
  • @user67275 check my answer and the link I refer to. Empirical distribution is a discrete distribution, so you sample the same way as you would sample from any discrete distribution, just instead of sampling single values, you sample the $(x_1,x_2)$ pairs. – Tim Oct 23 '15 at 08:39
  • If that's your question, then I believe you will find many good answers at http://stats.stackexchange.com/questions/26858. Does that help? – whuber Oct 23 '15 at 14:06

1 Answer


Assuming that all you have is an empirical distribution, the simplest approach is to draw values from $\hat F$ and reject a draw if $X_1 \leq c_1$ or $X_2 \leq c_2$. A less naive implementation is to subset the $(X_1, X_2)$ values, dropping those that fall at or below either threshold, and then draw from the resulting $\hat F_\text{trunc}$ the same way you would from any other discrete distribution. A simple example of this approach in R is shown below.

```r
set.seed(123)

c1 <- -1
c2 <- 0.5
X <- data.frame(X1 = rnorm(100), X2 = rnorm(100)) # creating fake data
X_trunc <- subset(X, X1 > c1 & X2 > c2) # keep only rows with X1 > c1 and X2 > c2

# draw 1000 samples (rows) with replacement from the truncated data
X_trunc[sample(nrow(X_trunc), 1000, replace = TRUE), ]
```
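
For comparison, the rejection approach mentioned first can be sketched as follows. This is only an illustration under stated assumptions: `rejection_sample` and its `batch` argument are hypothetical names introduced here, and the normal data merely stand in for an observed sample.

```r
set.seed(1)

# Illustrative data standing in for the observed sample (assumption)
X <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
c1 <- -1
c2 <- 0.5

# Hypothetical helper: draw row indices from the empirical distribution
# in batches and keep only those with X1 > c1 and X2 > c2
rejection_sample <- function(X, n, c1, c2, batch = 1000) {
  ok <- X$X1 > c1 & X$X2 > c2
  stopifnot(any(ok))  # otherwise the loop below would never terminate
  accepted <- integer(0)
  n_drawn <- 0
  while (length(accepted) < n) {
    idx <- sample(nrow(X), batch, replace = TRUE)
    n_drawn <- n_drawn + batch
    accepted <- c(accepted, idx[ok[idx]])  # reject draws outside the region
  }
  list(sample = X[accepted[seq_len(n)], ],
       acceptance_rate = length(accepted) / n_drawn)
}

res <- rejection_sample(X, 1000, c1, c2)
res$acceptance_rate
```

The acceptance rate should be close to the share of observations that satisfy the condition. When that share is small, most draws are wasted, which is why subsetting first is computationally cheaper even though both approaches target the same distribution.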

The example above assumes that you have the full data. If instead all you have is a table of empirical probabilities for the $(x_1,x_2)$ pairs, then the procedure is the same, except that you draw the $(x_1,x_2)$ pairs with the probabilities the table assigns to them, as in the example below.

```r
library(dplyr)

# let's calculate the empirical probability of each (x1, x2) pair
FX <- count(X, X1, X2) %>%
  mutate(prob = n / sum(n))

# next, we subset and sample as above, but from the probability table
FX_trunc <- subset(FX, X1 > c1 & X2 > c2)

# note that prob is set to the tabulated probability of each pair;
# sample() rescales these weights, so they need not sum to 1 after subsetting
FX_trunc[sample(nrow(FX_trunc), 1000, replace = TRUE, prob = FX_trunc$prob), ]
```
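
To spell out why subsetting the table and passing the remaining probabilities as weights gives the conditional distribution, write $\hat p(x_1,x_2)$ for the probability mass the table assigns to a pair (this notation is an assumption introduced here, not part of the question). Then, provided at least one pair satisfies the condition,

$$ \Pr\big(X = (x_1,x_2) \mid X_1 > c_1, X_2 > c_2\big) = \frac{\hat p(x_1,x_2)\,\mathbf{1}\{x_1 > c_1,\ x_2 > c_2\}}{\sum_{(u_1,u_2):\ u_1 > c_1,\ u_2 > c_2} \hat p(u_1,u_2)} $$

The numerator is what remains in the table after subsetting, and the denominator is the renormalizing constant, which `sample()` applies implicitly because its `prob` weights need not sum to one.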

Drawing from a bivariate distribution is no different here from drawing from a univariate one: the values being drawn are pairs or, more precisely, the indexes of those pairs.

Tim
  • I appreciate your response, but it does not answer my question. You generated random numbers from normal distribution, but I want to generate them from an empirical (bivariate) distribution function. – user67275 Oct 23 '15 at 08:50
  • @user67275 I used a normal distribution to generate the numbers, but that does not matter; it is just an example. `X` can be data from *any* bivariate distribution. Check my edit where I show in greater detail how to deal with $\hat F$ if you do not have the data but only the table of probabilities. – Tim Oct 23 '15 at 08:55
  • If you have the tables of probabilities then you can recover the full data from them and apply the first method. – whuber Aug 18 '16 at 14:44
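
The last comment can be illustrated with a minimal sketch, assuming the table stores a count for each distinct pair (or probabilities that are multiples of $1/n$ for a known $n$). The table `tab` and its values are made up for illustration.

```r
set.seed(42)

# Hypothetical table of distinct (x1, x2) pairs with their counts
tab <- data.frame(x1 = c(-2, 0, 1, 2),
                  x2 = c(1, 0, 1, 2),
                  n  = c(1, 2, 3, 4))

# Repeat each row n times to recover the original observations
X_full <- tab[rep(seq_len(nrow(tab)), times = tab$n), c("x1", "x2")]

# Then apply the first method: subset and resample rows
X_trunc <- subset(X_full, x1 > -1 & x2 > 0.5)
draws <- X_trunc[sample(nrow(X_trunc), 1000, replace = TRUE), ]
```

If only probabilities are available and the sample size is unknown, the counts cannot be recovered exactly, and sampling with the `prob` argument is the more direct route.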