
Problem: I am parameterizing distributions for use as priors and data in a Bayesian meta-analysis. The data are provided in the literature as summary statistics, almost exclusively assumed to be normally distributed, although none of the variables can be < 0 (some are ratios, some are masses, etc.).

I have come across two cases for which I have no solution: sometimes the parameter of interest is the inverse of the data, and sometimes it is the ratio of two variables.

Examples:

  1. the ratio of two normally distributed variables:
    • data: mean and sd for percent nitrogen and percent carbon
    • parameter: ratio of carbon to nitrogen.
  2. the inverse of a normally distributed variable:
    • data: mass/area
    • parameter: area/mass

My current approach is to use simulation:

e.g., for a set of percent carbon and nitrogen data with means xbar.c and xbar.n, standard errors se.c and se.n, and sample sizes n.c and n.n:

set.seed(1)
# convert standard errors back to standard deviations: sd = se * sqrt(n)
perc.c <- rnorm(100000, xbar.c, se.c * sqrt(n.c)) # percent C
perc.n <- rnorm(100000, xbar.n, se.n * sqrt(n.n)) # percent N

I want to parameterize ratio.cn = perc.c/perc.n

# parameter of interest
ratio.cn <- perc.c / perc.n

Then I choose the best-fitting distribution with support on $(0, \infty)$ for my prior:

library(MASS) # for fitdistr()
dist.fit <- list()
for(dist.i in c('gamma', 'lognormal', 'weibull')) {
    dist.fit[[dist.i]] <- fitdistr(ratio.cn, dist.i)
}
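Putting the pieces together, the candidate fits can then be compared by AIC. A self-contained sketch with made-up summary statistics standing in for the literature values (the real xbar, se, and n would come from each study):

```r
library(MASS)  # fitdistr()

set.seed(1)
# hypothetical summary statistics (placeholders for literature values)
xbar.c <- 45; se.c <- 0.5; n.c <- 10  # percent carbon
xbar.n <- 2;  se.n <- 0.1; n.n <- 10  # percent nitrogen

# back-transform standard errors to standard deviations: sd = se * sqrt(n)
perc.c <- rnorm(1e5, xbar.c, se.c * sqrt(n.c))
perc.n <- rnorm(1e5, xbar.n, se.n * sqrt(n.n))
ratio.cn <- perc.c / perc.n  # parameter of interest: C:N ratio

# fit each candidate distribution by maximum likelihood
dist.fit <- list()
for (dist.i in c("gamma", "lognormal", "weibull")) {
  dist.fit[[dist.i]] <- fitdistr(ratio.cn, dist.i)
}
sapply(dist.fit, AIC)  # lower AIC indicates the better-fitting candidate
```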

Question: Is this a valid approach? Are there other / better approaches?

Thanks in advance!

Update: the Cauchy distribution, which arises as the ratio of two normals with $\mu=0$, has limited utility since I would like to estimate the variance. Perhaps I could calculate the variance of a simulation of n draws from a Cauchy?

I did find the following closed-form approximations (Hayya et al., 1975), but I haven't tested whether they give the same results: $$\hat{\mu}_{y:x} = \mu_y/\mu_x + \sigma^2_x \mu_y / \mu_x^3 + \mathrm{cov}(x,y)\, \sigma^2_x \sigma^2_y / \mu_x^2$$ $$\hat{\sigma}^2_{y:x} = \sigma^2_x \mu_y / \mu_x^4 + \sigma^2_y / \mu_x^2 - 2\, \mathrm{cov}(x,y)\, \sigma^2_x \sigma^2_y / \mu_x^3$$

Hayya, J., Armstrong, D., and Gressis, N. (1975). A note on the ratio of two normally distributed variables. Management Science 21: 1338–1341.
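As a sanity check, the standard second-order (delta-method) approximation for a ratio of independent variables can be compared against simulation. A sketch with hypothetical means and sds (this is the textbook delta-method form assuming cov(x, y) = 0, not the Hayya et al. formulas verbatim):

```r
set.seed(1)
# hypothetical summary statistics, assuming independence (cov = 0)
mu.x <- 2;  sd.x <- 0.3   # denominator, e.g. percent N
mu.y <- 45; sd.y <- 1.5   # numerator, e.g. percent C

# second-order delta-method approximation for the ratio y/x
mu.ratio  <- mu.y / mu.x + sd.x^2 * mu.y / mu.x^3
var.ratio <- sd.y^2 / mu.x^2 + mu.y^2 * sd.x^2 / mu.x^4

# simulation check
y <- rnorm(1e6, mu.y, sd.y)
x <- rnorm(1e6, mu.x, sd.x)
c(approx = mu.ratio, sim = mean(y / x))
c(approx = sqrt(var.ratio), sim = sd(y / x))
```

For small coefficients of variation in the denominator (as here, cv = 0.15), the approximation and simulation should agree closely; the agreement degrades as the denominator's mass approaches zero.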

  • should I post the Update question about calculating the variance on random draws from the Cauchy as a separate question? – David LeBauer Oct 15 '10 at 17:39
  • david - since your variables are all positive, why do you want to fuss with $\mu = 0$? btw - in your simulation, you seem to be generating variables per.c and per.n that are independent. is that correct - and if so, is that what you want? – ronaf Oct 18 '10 at 02:25
  • no, I don't want to fuss with $\mu$ = 0; these variables are generally treated as independent, and covariance data is rarely available. Since C is fairly constant, independence is a reasonable assumption. – David LeBauer Oct 18 '10 at 16:47
  • I don't understand why the expectation of the ratio doesn't exist. If $ X $ and $ Y $ are jointly normally distributed with mean different than zero, then the mean of $ Z = \frac{X}{Y} $ is given by $ \int \int \frac{x}{y} p \left( x, y \right) dx dy $, what am I missing? – Royi Apr 13 '15 at 18:39

2 Answers


You might want to look at some of the references under the Wikipedia article on Ratio Distribution. It's possible you'll find better approximations or distributions to use. Otherwise, your approach seems sound.

Update: I think a better reference might be Marsaglia (1965), "Ratios of Normal Variables and Ratios of Sums of Uniform Variables", Journal of the American Statistical Association 60: 193–204.

See formulas 2–4 on page 195.

Update 2

On your updated question regarding variance from a Cauchy -- as John Cook pointed out in the comments, the variance doesn't exist. So, taking a sample variance simply won't work as an "estimator". In fact, you'll find that your sample variance does not converge at all and fluctuates wildly as you keep taking samples.
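A quick way to see this non-convergence is to track the running sample variance of Cauchy draws as the sample grows; a sketch:

```r
set.seed(1)
x <- rcauchy(1e5)  # standard Cauchy draws

# sample variance at increasing sample sizes
n <- c(1e2, 1e3, 1e4, 1e5)
sapply(n, function(k) var(x[1:k]))
# the values keep jumping around instead of settling down,
# because the Cauchy distribution has no finite variance
```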

  • Thanks for the reference, that is where I found the Hayya 1975 reference and the equations in my question, although I'd appreciate reassurance that the equations are appropriate for my problem. – David LeBauer Oct 15 '10 at 17:43
  • Taking a quick look at Hayya, it seems that they're concerned with obtaining a Normal approximation for the ratio and use simulations to determine when that applies (using the coefficient of variation, cv). Does the cv in your case meet the criteria? If so, the approximations apply. – ars Oct 15 '10 at 17:58
  • @David: use Marsaglia 1965 instead as updated in the answer. – ars Oct 15 '10 at 19:18
  • NB: Marsaglia published an [update in JSS in 2004](http://www.jstatsoft.org/v16/i04/paper). – David LeBauer Feb 11 '13 at 19:43

Could you not assume that $y^{-1} \sim N(\cdot,\cdot)$ for the inverse of a normal random variable, and do the necessary Bayesian computation after identifying the appropriate parameters for the normal distribution?

My suggestion below to use the Cauchy does not work as pointed out in the comments by ars and John.

The ratio of two normally distributed random variables follows the Cauchy distribution. You may want to use this idea to identify the parameters of the Cauchy that most closely fit the data you have.
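The inversion idea above can be sketched as follows, with hypothetical summary statistics; the first-order delta-method values $1/\mu$ and $\sigma/\mu^2$ provide a check, and the normal approximation of $1/y$ is only reasonable when the coefficient of variation of $y$ is small:

```r
set.seed(1)
# hypothetical summary statistics for y = mass/area
mu.y <- 150; sd.y <- 15   # cv = 0.1, so 1/y is approximately normal
y <- rnorm(1e5, mu.y, sd.y)
inv.y <- 1 / y            # parameter of interest: area/mass

# sample-based estimates for the normal approximation of 1/y
c(mean = mean(inv.y), sd = sd(inv.y))
# first-order delta-method values for comparison
c(mean = 1 / mu.y, sd = sd.y / mu.y^2)
```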

  • a. I need to estimate the variance and the variance of the Cauchy distribution is not defined. – David LeBauer Oct 15 '10 at 17:26
  • b. If I understand your second point, yes, I could assume that $y^{-1} \sim N(\mu, \sigma)$, but I still need to calculate $\mu$ and $\sigma$ from the summary statistics given for y; also, I've chosen not to consider distributions with values < 0 for variables only defined > 0 (even though in many of the cases $p(X<0 \mid X \sim N(\mu,\sigma)) \rightarrow 0$) – David LeBauer Oct 15 '10 at 17:32
  • Doesn't the Cauchy apply for zero mean normals? – ars Oct 15 '10 at 17:57
  • @ars You are correct. The Cauchy then may be of limited use. –  Oct 15 '10 at 18:15
  • Ars: Yes, I believe the Cauchy result requires zero means. But that still means that at least in that special case, the variance that David is trying to estimate DOES NOT EXIST. – John D. Cook Oct 15 '10 at 18:15
  • @David: Simply invert y and compute the sample mean and sample standard deviation and use those as estimates of mu and sigma. A normal may approximate y^-1 well if sigma is relatively small. –  Oct 15 '10 at 18:17
  • @John: true, good point; I missed David's first comment. – ars Oct 15 '10 at 18:25
  • @Srikant I can't compute the sample standard deviation since I don't have the raw data... although I could do the calculation on simulated data sets, and take the average of these simulations. – David LeBauer Oct 15 '10 at 19:27
  • The variance is infinite for any normal distribution in the denominator, not just those with zero means. Similarly, the mean is undefined for any normal distribution in the denominator. – Rob Hyndman Oct 16 '10 at 03:53
  • @rob - how do you define the variance of Y = 1/X when X is normal, as EY is undefined? – ronaf Oct 18 '10 at 03:26
  • @ronaf. Good point. E[Y^2] is infinite whenever X is normal. The same goes for E[Y^m] for any even m. If m is odd, the result is undefined. Is that better? – Rob Hyndman Oct 18 '10 at 03:57
  • @rob + @david if the variables in the ratio are *asymptotically* normal [with a non-zero mean in the denominator] and also consistent, the ratio is also asymptotically normal - as the usual delta method shows. perhaps that will render moot the discussion about non-existence of moments. – ronaf Oct 19 '10 at 03:15
  • @ronaf. Actually, that doesn't help. It is possible to have asymptotic normality as well as having non-existent moments. Asymptotics with probability distributions can have weird properties. – Rob Hyndman Oct 19 '10 at 03:16
  • @rob - i have in mind the ratio of two sample means. even tho the actual moments may be infinite or undefined, there is an asymptotic mean and asymptotic variance that go with the limiting normal distribution [again assuming the denominator is not consistently estimating zero]. it is often the parameters of the asymptotic distribution that are relevant for analyses of the data [judging by the way in which many statistical analyses are carried out these days]. in that case, the non-existence of actual moments is a side issue [or a quibble?]. – ronaf Oct 19 '10 at 03:44
  • @ronaf. I don't really follow your last comment. The asymptotic normal distribution will have a well-defined mean and variance. But they are not the same as the true asymptotic mean and variance which are undefined whenever the denominator has a non-zero density at 0. In practice, this may not matter. Imagine, for example, if the denominator is N(100,1). Then the sample values of the ratio will behave nicely with very high probability. – Rob Hyndman Oct 19 '10 at 04:26
  • @rob - if i catch your drift, i think we are on the same page regarding actual vs asymptotic parameters. we seem to agree that the latter are what really matter. – ronaf Oct 27 '10 at 03:01