2

I am building an experimental design with 4 variables defined on (0,1). In notation, $x_i \in [0,1]$ with $ i=1,..., 4$). Two of these variables must satisfy the condition that $x_1 + x_2 \leq 1$. How can I perform Latin Hypercube Sampling with this condition?

I thought about rejection sampling when $x_1+x_2 > 1$, but realize that rejection sampling does not work with Latin hypercube sampling.

R Carnell
  • 2,566
  • 5
  • 21
rms
  • 23
  • 4
  • 1
    After some research, I came to realize that the editions by @rcarnell describe my question much more clearly than I could express initially. Thanks for the support – rms Jul 09 '20 at 17:57

2 Answers2

2

Strategy:

  1. Draw $X_1, ..., X_5$ from a uniform LHS
  2. Transform $X_1, X_2, X_3$ such that $X_1+X_2+X_3=1$ using the strategy I explained previously for R. The basic idea is to transform the marginal draws using the quantiles of gamma functions, then normalize those gamma quantiles. The result is a distribution like a Dirichlet distribution (although not exactly).
  3. Drop $X_3$ since it is not necessary. If $X_1+X_2+X_3=1$ and $X_i > 0$ then $X_1 + X_2 < 1$.
  4. Transform $X_4$ and $X_5$ to the desired distribution
require(lhs)

qdirichlet <- function(X, alpha)
{
  # qdirichlet is not an exact quantile function since the quantile of a
  #  multivariate distribtion is not unique
  # qdirichlet is also not the quantiles of the marginal distributions since
  #  those quantiles do not sum to one
  # qdirichlet is the quantile of the underlying gamma functions, normalized
  # This has been tested to show that qdirichlet approximates the dirichlet
  #  distribution well and creates the correct marginal means and variances
  #  when using a latin hypercube sample
  lena <- length(alpha)
  stopifnot(is.matrix(X))
  sims <- dim(X)[1]
  stopifnot(dim(X)[2] == lena)
  if(any(is.na(alpha)) || any(is.na(X)))
    stop("NA values not allowed in qdirichlet")
  
  Y <- matrix(0, nrow=sims, ncol=lena)
  ind <- which(alpha != 0)
  for(i in ind)
  {
    Y[,i] <- qgamma(X[,i], alpha[i], 1)
  }
  Y <- Y / rowSums(Y)
  return(Y)
}

set.seed(19753)
X <- randomLHS(500, 5)
Y <- X
# transform X1, X2, X3 such that X1 + X2 + X3 =1
# change the alpha parameter to change the mean of X1 and X2
Y[,1:3] <- qdirichlet(X[,1:3], rep(2,3))
# transform parameter 4 and 5 
Y[,4] <- qnorm(X[,4], 2, 1)
Y[,5] <- qunif(X[,5], 1, 3)
# drop the unncessary X3
Y <- Y[,-3]

# check that X1 + X2 < 1
stopifnot(all(Y[,1] + Y[,2] < 1.0))

# plots
par(mfrow = c(2,2))
for (i in c(1,2,4,5))
  hist(X[,i], breaks = 20, main = i, xlab = "")

par(mfrow = c(2,2))
for (i in 1:4)
  hist(Y[,i], breaks = 20, main = i, xlab = "")
kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
R Carnell
  • 2,566
  • 5
  • 21
  • thank you for the answer, @RCarnell ! Just a small additional question: Is it possible to get a uniform distribution by changing the line `Y[,1:3] – rms Jul 11 '20 at 01:32
  • No. With the regular Dirichlet distribution and with this `qdirichlet` function, if you specify $\alpha = (1,1)$ you get two uniform distributions. If you specify $\alpha = (1,1,1)$, you get "flat" distributions, but not uniform. They are decreasing in density from left to right. I don't agree with the answer that was given to the question you cited, I'll take a look at responding. Please accept my answer if it was helpful. – R Carnell Jul 12 '20 at 19:08
  • could you indicate reference(s) I can use to support the usage of this approach? The objective is both self-learning and writing a methodology section. – rms Jul 16 '20 at 22:10
  • 1
    For Latin hypercubes: Large Sample Properties of Simulations Using Latin Hypercube Sampling, Stein, 1987, Technometrics. For the fact that a Dirichlet distribution can be simulated by drawing gamma distributions and dividing by the sum, https://web.archive.org/web/20150219021331/https://www.ee.washington.edu/techsite/papers/documents/UWEETR-2010-0006.pdf. For the qdirichlet function, just me! – R Carnell Jul 17 '20 at 01:39
0

In order to implement the strategy described by @RCarnell in python, this is a translation of the function qdirichlet. The usage is similar to the one presented in the original answer

def dirichlet_ppf(X, alpha):
    # dirichlet_ppf is not an exact quantile function since the quantile of a
    #  multivariate distribtion is not unique
    # dirichlet_ppf is also not the quantiles of the marginal distributions since
    #  those quantiles do not sum to one
    # dirichlet_ppf is the quantile of the underlying gamma functions, normalized
    # This has been tested to show that dirichlet_ppf approximates the dirichlet
    #  distribution well and creates the correct marginal means and variances
    #  when using a latin hypercube sample
    #
    # Python translation of qdirichlet function by  R. Carnell
    # original: https://stats.stackexchange.com/a/476433/244679
    import numpy as np
    from scipy.stats import gamma
        
    X = np.asarray(X)
    alpha = np.asarray(alpha)
    
    assert alpha.ndim == 1, "parameter alpha must be a vector"
    assert X.ndim == 2, "parameter X must be an array with samples as rows and variables as columns"
    assert X.shape[1] == alpha.shape[0], "number of variables in each row of X and length of alpha must be equal"
    assert not (np.any(np.isnan(X)) or np.any(np.isnan(alpha))), "NAN values are not allowed in dirichlet_ppf"
    
    Y = np.zeros(shape=X.shape)
    for idx, a in enumerate(alpha):
        if a != 0. :
            Y[:, idx] = gamma.ppf(X[:, idx], a)
    
    return Y / Y.sum(axis=1)[:, np.newaxis]
rms
  • 23
  • 4
  • @RCarnell, I've translated your R code to Python. Please check if the credits are ok. – rms Jul 16 '20 at 22:11