
Edit: I found a solution to the problem, which is at the bottom of the post. I'm going to leave the post as it is in case someone else encounters a similar problem!


I was banging my head against the problem in the title and came up with a solution that partly works, but I'm not really happy with the result. The algorithm I outline below does generate a distribution with the input median and intervals, but the shape is definitely wrong. Let me explain:

The problem

In my field, it is typical to use Markov Chain Monte Carlo (MCMC) algorithms in order to generate posterior samples for parameters of highly non-linear models given the data.

For reasons I don't yet understand, although it is customary to generate MCMC samples from the posteriors, it is not customary for authors to share those posterior samples; instead, summary statistics on the posterior samples of the parameters given the data are published. The most common choice is to report the median and the "$1\sigma$" limits around it, which usually means that, if $\theta$ is the parameter of interest and we denote its median by $\mu$, then the published value $\mu^{+\sigma_1}_{-\sigma_2}$ means that

$$\int_{\mu}^{\mu+\sigma_1} p(\theta | D)\,d\theta \approx 0.34,$$

$$\int_{\mu-\sigma_2}^{\mu} p(\theta | D)\,d\theta \approx 0.34,$$

where $p(\theta | D)$ is the posterior distribution of the parameter $\theta$ given the data $D$.
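(As an aside, here is a minimal sketch of how such a summary is computed from posterior samples; the chain below is a hypothetical stand-in for real MCMC output:)

import numpy as np

# Hypothetical posterior chain for a single parameter (stand-in for real MCMC output):
chain = np.random.normal(10.0, 2.0, size=100000)

# Median and the 0.15866 / 0.84134 quantiles:
q_low, mu, q_high = np.percentile(chain, [15.866, 50.0, 84.134])
sigma2, sigma1 = mu - q_low, q_high - mu  # lower and upper "1-sigma" error bars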

The problem is that I need to sample from those posterior distributions, so I needed an efficient way to sample from a skewed distribution given only the median ($\mu$) and the upper and lower "1-sigma" limits ($\sigma_1$ and $\sigma_2$).

Proposed solution

After some thought, I reached the following conclusion: if I draw a sample from this posterior, I have equal probability ($1/2$) of sampling a point to the left or to the right of the median. To sample a point to the left of the median, I have two parameters to work with: a location parameter ($\mu$) and a scale parameter ($\sigma_2$); the same applies to sampling a point to the right of the median (with scale $\sigma_1$). Given a location and a scale parameter, the safest choice seems to be to draw samples from a Gaussian distribution. Thus, I came up with the following algorithm:

  1. Flip a coin (i.e., pick a number at random from the set $\Omega = \{0,1\}$).
  2. If heads (i.e., you got $0$), sample a value $X\sim N(0,\sigma_1^2)$ (a normal with standard deviation $\sigma_1$) and let $Y = |X|$. If tails, sample $X\sim N(0,\sigma_2^2)$ and let $Y = -|X|$.
  3. Samples from the target distribution are given by $Z = \mu + Y$.

A small Python function to sample $n$ points from this distribution would be as follows:

import numpy as np

def sample_from_errors(mu, sigma1, sigma2, n):
    # Flip n coins (0 = heads: sample from the right Gaussian; 1 = tails: sample from the left Gaussian):
    coin_flip = np.random.randint(0, 2, n)
    # Indices of the heads, and how many there are:
    idx_heads = np.where(coin_flip == 0)[0]
    n_heads = len(idx_heads)
    # Draw samples from the left Gaussian (folded to the left of mu):
    samples_left_gaussian = mu - np.abs(np.random.normal(0, sigma2, n - n_heads))
    # Same for the right one (folded to the right of mu):
    samples_right_gaussian = mu + np.abs(np.random.normal(0, sigma1, n_heads))
    return np.append(samples_left_gaussian, samples_right_gaussian)
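For example (my own usage note), drawing the $n = 10^6$ samples used in the figure below:

samples = sample_from_errors(10., 2., 10., 10**6)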

The same function, in R:

sample_from_errors <- function(mu, sigma1, sigma2, n){

  # Flip n coins (0 = heads: sample from the right Gaussian; 1 = tails: sample from the left Gaussian):
  coin_flip <- sample(c(0, 1), n, replace = TRUE)
  # Number of heads:
  n_heads <- n - sum(coin_flip)
  # Draw samples from the left and right Gaussians:
  samples_left_gaussian <- mu - abs(rnorm(n - n_heads, mean = 0, sd = sigma2))
  samples_right_gaussian <- mu + abs(rnorm(n_heads, mean = 0, sd = sigma1))
  return(c(samples_left_gaussian, samples_right_gaussian))

}
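As a quick sanity check (my own addition, not part of the original algorithm), the empirical quantiles of the `samples` drawn above should recover the inputs:

q16, q50, q84 = np.percentile(samples, [15.866, 50.0, 84.134])
print(q50, q50 - q16, q84 - q50)  # should be close to mu = 10, sigma2 = 10 and sigma1 = 2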

Does it work?

The thing is that the algorithm above does work, in the sense that if you generate samples with it and compute the median and $1\sigma$ limits as defined in the introduction, you recover the original input values. The problem is the shape of the resulting distribution. For example, if you plot the histogram for $\mu = 10$, $\sigma_1 = 2$ and $\sigma_2 = 10$, you get the following ($n=10^6$ in this case):

[Figure: example of sampled values using the algorithm]

(The solid red line is the input median; the dashed red lines are the $1\sigma$ limits.) You can clearly see that the histogram heights differ on either side of the input median ($\sim 10$), which makes total sense: the peak heights of the Gaussians I'm sampling from, $1/\sqrt{2\pi \sigma_i^2}$, depend on the variance, which explains the different heights on each side; the lower the variance, the greater the height of the distribution.
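To make this concrete, a direct computation from the mixture construction gives the density of $Z$ as

$$p_Z(z) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left(-\frac{(z-\mu)^2}{2\sigma_2^2}\right)\ \text{for } z < \mu, \qquad p_Z(z) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{(z-\mu)^2}{2\sigma_1^2}\right)\ \text{for } z \geq \mu,$$

so the density jumps by a factor of $\sigma_2/\sigma_1$ (a factor of $5$ in the example above) at the median.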

One way to solve this would be to resample the right side of the distribution to match the left side's height, but that would change the lower $1\sigma$ limit and thus defeat the whole purpose of the algorithm.

What other methods have I tried?

(This method didn't work in my original post due to a mistake I made in the optimization code. It now works nicely!)

I tried fitting skew-normal distributions given the CDF values that I'm looking for (i.e., parametrizing by the 0.15866, 0.5 and 0.84134 quantiles), and this now works fine. Here is some Python code that performs this fit:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import leastsq

def skew_normal_cdf(x, mu, sigma, alpha):
    """
    This function simply calculates the CDF at x given the parameters
    mu, sigma and alpha of a Skew-Normal distribution. It takes values or
    arrays as inputs.
    """
    # In case the input x is an array:
    if isinstance(x, np.ndarray):
        out = np.zeros(len(x))
        for i in range(len(x)):
            # quad calculates the integral from -inf to x[i]:
            out[i] = quad(lambda t: skew_normal(t, mu, sigma, alpha), -np.inf, x[i])[0]
        return out

    else:
        return quad(lambda t: skew_normal(t, mu, sigma, alpha), -np.inf, x)[0]

def skew_normal(x, mu, sigma, alpha):
    """
    This function returns the value of the Skew Normal PDF at x, given
    mu, sigma and alpha
    """
    def erf(x):
        # save the sign of x
        sign = np.sign(x)
        x = abs(x)

        # constants
        a1 =  0.254829592
        a2 = -0.284496736
        a3 =  1.421413741
        a4 = -1.453152027
        a5 =  1.061405429
        p  =  0.3275911

        # A&S formula 7.1.26
        t = 1.0/(1.0 + p*x)
        y = 1.0 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t*np.exp(-x*x)
        return sign*y

    def palpha(y,alpha):
        phi = np.exp(-y**2./2.0)/np.sqrt(2.0*np.pi)
        PHI = ( erf(y*alpha/np.sqrt(2)) + 1.0 )*0.5
        return 2*phi*PHI

    return palpha((x-mu)/sigma,alpha)*(1./sigma)

def fit_skew_normal(median, sigma1, sigma2):
    """
    This function fits a Skew Normal distribution given the median,
    the upper error bar (sigma1) and the lower error bar (sigma2).
    """

    # Define the sign of alpha, which should be positive if right-skewed
    # and negative if left-skewed:
    alpha_sign = np.sign(sigma1 - sigma2)

    # First, define the residuals:
    def residuals(p, y, x):
        mu, sqrt_sigma, sqrt_alpha = p
        return y - model(x, mu, sqrt_sigma, sqrt_alpha)

    # Now define the model:
    def model(x, mu, sqrt_sigma, sqrt_alpha):
        """
        Model for the non-linear least-squares problem.

        Note that we pass the square-roots of the scale (sigma) and shape (alpha)
        parameters, so that the former is forced to be positive and the sign of
        the latter is fixed by the values of sigma1 and sigma2:
        """
        return skew_normal_cdf(x, mu, sqrt_sigma**2, alpha_sign * sqrt_alpha**2)

    # Define the observed quantiles:
    y = np.array([0.15866, 0.5, 0.84134])

    # Define the values at which we observe the quantiles:
    x = np.array([median - sigma2, median, median + sigma1])

    # Start by assuming that mu = median, sigma = the mean of the observed sigmas, and alpha = 0 (i.e., start from a Gaussian):
    guess = (median, np.sqrt(0.5 * (sigma1 + sigma2)), 0)

    # Perform the non-linear least-squares optimization:
    plsq = leastsq(residuals, guess, args=(y, x))[0]

    # Return the fitted parameters of the Skew Normal (mu, sigma and alpha):
    return plsq[0], plsq[1]**2, alpha_sign*plsq[2]**2

# To test the code, first define a set of observed values (median, sigma1 and sigma2):
median = 10.
sigma1 = 2.
sigma2 = 10.

# Now try and search for the Skew Normal distribution that matches those 
# values:
plsq = fit_skew_normal(median,sigma1,sigma2)

# Print fitted values:
print('Fitted params:', plsq)

In this case, this code gives:

Fitted params: (14.70, 9.01, -28.25)

And the associated plot is:

[Figure: fit to a skew-normal given the quantiles]

Which is good enough for my purposes, I think!
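Finally, since the original goal was to sample from these posteriors: given the fitted parameters, samples can be drawn directly. A minimal sketch, assuming a SciPy version that ships `scipy.stats.skewnorm` (its `(a, loc, scale)` arguments correspond to our `(alpha, mu, sigma)`):

from scipy.stats import skewnorm

mu_fit, sigma_fit, alpha_fit = plsq
posterior_samples = skewnorm.rvs(alpha_fit, loc=mu_fit, scale=sigma_fit, size=10**6)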

Néstor
  • I got lost near the beginning, because $\sigma$ does not seem to be a "standard deviation" at all. Are you simply asking for a family of distributions that can be parameterized by their $0.16, 0.50,$ and $0.84$ quantiles? (If so, almost any three-parameter family can be parameterized this way, suggesting you should offer additional criteria for choosing the family. For instance, how heavy do you want the tails to be?) – whuber Jan 12 '15 at 15:55
  • How would you "share" an MCMC posterior distribution with the general public?! – Xi'an Jan 12 '15 at 15:56
  • @whuber: one solution would be to ask for a family of distributions parametrized by their quantiles, yes. However, in the last part of the problem, I show that I tried that approach and it doesn't work that well. Also, selecting how heavy I would want the tails to be is out of the question, because I don't have the information to define that (you can't know it just from the quantiles!). – Néstor Jan 12 '15 at 16:00
  • As @whuber stated, you cannot reconstruct the marginal posterior, and even less the joint posterior, from those bits of information. – Xi'an Jan 12 '15 at 16:01
  • @Xi'an: in the case of my field it would be super easy: simply share your posterior chains ($\sim 10^6$ points; we astronomers are used to that!). The number of parameters in this case is on the order of $\sim 10$-$20$. As you see, if I could get a look at the posterior chains, I would not only be able to retrieve several parameters to adjust an ad-hoc distribution, but would also be able to look at the correlations between the parameters, and consequently be able to sample directly from the posterior instead of assuming things about the distributions of each parameter! – Néstor Jan 12 '15 at 16:02
  • @Xi'an: I know it can't be reconstructed exactly, that's also out of the question (and that's not even in my original question!). However, it would be neat to have the representation that assumes the "least" information given the parameters that I have at hand. For example, in the case of values for which $\sigma_1 = \sigma_2$, I'm happy to assume the posterior is close to a Gaussian (it is the maximum entropy distribution given those bits of information, after all). However, the analogous distribution for uneven limits is unclear to me. – Néstor Jan 12 '15 at 16:08
  • Failing with an attempted solution is not evidence that you have formulated the question incorrectly! How, exactly, did using Gamma distributions and Skew-normal distributions fail? Did you not obtain distributions having the three desired quantiles? In other words, is the problem with the nature of the solution itself or does it lie in limitations of the algorithms you have applied? – whuber Jan 12 '15 at 16:18
  • BTW, @mpiktas has solved this problem, with working `R` code, at http://stats.stackexchange.com/a/6065. – whuber Jan 12 '15 at 16:25
  • @whuber: I'll post my attempt at that approach, but the limitations are basically in the algorithms that I have applied, which are different implementations of the Levenberg-Marquardt algorithm for non-linear least-squares. And yes; after that failed attempt I did see the mpiktas solution, but tried to implement the same with the skew-normal and gamma distributions, with the result that I quoted in my original question. – Néstor Jan 12 '15 at 17:03
  • You will have difficulties when the distribution family you specify is a poor fit to the actual posterior distribution. One way to gain enough flexibility in many cases is to add a location parameter. Both the three-parameter Gamma and Skew-normal distribution families should not cause problems. Could you provide some sample inputs that had caused the algorithm to fail? – whuber Jan 12 '15 at 17:07
  • I did now (the bottom part of my question implements my approach to fitting the Skew Normal). – Néstor Jan 12 '15 at 17:28
  • I think the problem may be that no Skew-normal distribution can have such quantiles. Skew-normal distributions just cannot get that skewed! It looks like some care is needed in choosing your family of distributions. – whuber Jan 12 '15 at 17:40
  • I made a mistake in the last part of my non-linear least squares fit. Now it works; I'm attaching the results. I suppose then that I don't have a question anymore... should I erase this, or leave it as it is? Thanks anyway for all the input, which made me realize the mistake! – Néstor Jan 12 '15 at 17:55
  • Since your solution appears to be the one proposed by @mpiktas, I will vote to close this thread as a duplicate--which will leave the two threads connected. – whuber Jan 12 '15 at 18:47
