31

Intuitively, the mean is just the average of observations. The variance is how much these observations vary from the mean.

I would like to know why the inverse of the variance is known as the precision. What intuition can we draw from this? And why is the precision matrix as useful as the covariance matrix for the multivariate (normal) distribution?

Insights please?

kjetil b halvorsen
cgo
  • In computing the likelihood of a multivariate Gaussian distribution, the precision matrix is more convenient to use: the covariance matrix would have to be inverted first. – user112758 May 08 '16 at 07:35
  • To nitpick a bit, the variance is not how far the observations vary from the mean, because variance is not expressed in the same units as the mean. "Point $A$ is 8 square meters away from point $B$" is unintelligible... (Tim's answer (+1) should address your specific question I believe.) – usεr11852 May 08 '16 at 08:50
  • Precision is a measure of, among other things, how likely we are to be surprised by values distant from the mean. – Alexis Mar 21 '17 at 02:33
  • I think the original question is an excellent one, because I would have thought that precision would be more of a margin of error, e.g., half the width of an uncertainty interval. This would have been more on the square root of variance scale. – Frank Harrell Jun 08 '19 at 14:27

3 Answers

33

Precision is often used in Bayesian software by convention. It gained popularity because the gamma distribution can be used as a conjugate prior for the precision.
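For concreteness, this is the standard conjugate update (with the mean $\mu$ known and the gamma distribution in its shape-rate parameterization): a $\mathrm{Gamma}(\alpha, \beta)$ prior on $\tau$ combined with observations $y_1, \dots, y_n \sim \mathcal{N}(\mu, 1/\tau)$ gives a gamma posterior,

$$\tau \mid y_{1:n} \sim \mathrm{Gamma}\!\left(\alpha + \frac{n}{2},\; \beta + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right),$$

so putting the prior on the precision keeps everything in closed form.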

Some say that precision is more "intuitive" than variance because it says how concentrated the values are around the mean rather than how spread out they are. It is said that we are more interested in how precise some measurement is rather than how imprecise it is (but honestly I do not see how this is more intuitive).

The more spread out the values are around the mean (high variance), the less precise they are (low precision). The smaller the variance, the greater the precision. Precision is just the inverse of the variance, $\tau = 1/\sigma^2$. There is really nothing more to it than that.

mdeff
Tim
14

Precision is one of the two natural parameters of the normal distribution. That means that if you want to combine two independent predictive distributions (as in a Generalized Linear Model), you add the precisions. Variance does not have this property.
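To sketch what "adding precisions" means (a standard fact about multiplying two normal densities in the same variable): if two independent sources report $\mathcal{N}(\mu_1, 1/\tau_1)$ and $\mathcal{N}(\mu_2, 1/\tau_2)$ for the same quantity, their product is proportional to a normal density whose precision is the sum and whose mean is the precision-weighted average:

$$\tau = \tau_1 + \tau_2, \qquad \mu = \frac{\tau_1 \mu_1 + \tau_2 \mu_2}{\tau_1 + \tau_2}.$$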

On the other hand, when you're accumulating observations, you average expectation parameters. The second moment is an expectation parameter.

When taking the convolution of two independent normal distributions, the variances add.

Relatedly, if you have a Wiener process (a stochastic process whose increments are Gaussian), you can argue using infinite divisibility that waiting half the time means jumping with half the variance.

Finally, when scaling a Gaussian distribution, the standard deviation is scaled.
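In symbols (standard facts, collected here for the two preceding points): for independent $X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$, $X_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ and a scalar $a$,

$$X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2), \qquad aX_1 \sim \mathcal{N}(a\mu_1,\; a^2\sigma_1^2),$$

so convolution adds variances, while scaling multiplies the standard deviation by $|a|$.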

So, many parameterizations are useful depending on what you're doing. If you're combining predictions in a GLM, precision is the most “intuitive” one.

Neil G
  • Hi Neil, could you provide an example or some links to resources that further explain the "additive" property of the precision when combining two distributions? I am not sure how to interpret it. – Kilian Batzner Jan 07 '18 at 10:52
  • @KilianBatzner http://digitool.library.mcgill.ca/webclient/DeliveryManager?application=DIGITOOL-3&owner=resourcediscovery&custom_att_2=simple_viewer&forebear_coll=&user=GUEST&pds_handle=&pid=117165&con_lng=ENG page 15. – Neil G Jan 07 '18 at 19:59
  • "That means that if you want to combine two independent predictive distributions (as in a Generalized Linear Model), you add the precisions. Variance does not have this property." - Why wouldn't variance have this property if it's simply the inverse of precision? – skeller88 Jul 16 '20 at 21:46
  • @skeller88 because additivity doesn't imply that reciprocals are additive. – Neil G Jul 16 '20 at 21:51
  • I see. Here's evidence that variance is not additive: https://stats.stackexchange.com/questions/390609/explanation-for-additive-property-of-variance – skeller88 Jul 16 '20 at 22:08
  • @skeller88 It seems like you're not carefully reading. Can you see on second look how your reference doesn't apply? If the normal distributions are independent, then their covariance is zero. That's why the variance of the convolution is the sum of the variances. – Neil G Jul 17 '20 at 00:04
  • @NeilG, the mcgill.ca link appears to be dead. What's the title and author of the reference? – mdeff Jul 31 '20 at 17:06
  • @mdeff Girdhar, Neil. The informative message model. Diss. McGill University, 2013. – Neil G Aug 01 '20 at 13:31
1

Here is my attempt at an explanation:

A) An intuition for precision can be found in the context of measurement error. Suppose you are measuring some quantity of interest with a measurement instrument (e.g., measuring a distance with a measuring tape). If you take several measurements of the quantity with the same instrument, you will likely end up with variation in the results, i.e., measurement error. These errors are often well approximated by a normal distribution. The precision parameter of a normal distribution tells you how "precise" your measurements are in the sense of having larger or smaller errors: the larger the precision, the more precise your measurement and thus the smaller your errors (and vice versa).
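For illustration only (a quick simulation sketch with made-up numbers, using NumPy), repeated measurements drawn at a high precision and at a low precision show the corresponding difference in spread:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 10.0                         # "true" distance being measured (hypothetical)
tau_high, tau_low = 100.0, 1.0    # precisions; the standard deviation is 1/sqrt(tau)

# simulate repeated measurements with a precise and an imprecise instrument
precise = rng.normal(mu, 1 / np.sqrt(tau_high), size=1000)
imprecise = rng.normal(mu, 1 / np.sqrt(tau_low), size=1000)

print(precise.std(), imprecise.std())   # roughly 0.1 and 1.0
```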

B) The reason that precision matrices are sometimes preferred over covariance matrices is analytical and computational convenience: they are simpler to work with. This is why normal distributions were classically parameterized via the precision parameter in the Bayesian context, before the computer revolution, when calculations were done by hand. The parameterization remains relevant today when working with very small variances, as it helps to avoid underflow in numerical computations.
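As a sketch of the computational convenience (a toy example with arbitrary numbers; the function name and values are just for illustration): if the precision matrix $\Lambda = \Sigma^{-1}$ is what you already have, the multivariate normal log-density $\log p(x) = \tfrac{1}{2}\log\det\Lambda - \tfrac{1}{2}(x-\mu)^\top\Lambda(x-\mu) - \tfrac{k}{2}\log 2\pi$ can be evaluated without inverting any matrix:

```python
import numpy as np

def mvn_logpdf_from_precision(x, mu, precision):
    """Multivariate normal log-density parameterized by the precision matrix.

    No inversion is needed: log|Sigma| = -log|Lambda|, and the quadratic
    form uses Lambda directly.
    """
    k = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(precision)   # log-determinant of Lambda
    quad = diff @ precision @ diff             # (x - mu)^T Lambda (x - mu)
    return 0.5 * logdet - 0.5 * quad - 0.5 * k * np.log(2 * np.pi)

# arbitrary example values
mu = np.array([0.0, 0.0])
precision = np.array([[2.0, 0.5],
                      [0.5, 1.0]])             # symmetric positive definite
print(mvn_logpdf_from_precision(np.array([0.3, -0.2]), mu, precision))
```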

The simplicity of the alternative can also be illustrated by comparing the densities of both parameterizations. Notice below how the use of $\tau = \frac{1}{\sigma^2}$ eliminates the need to divide by a parameter. In a Bayesian context (when parameters are treated as random variables) division by a parameter can make calculating posterior distributions painful.

$$p_Y(y; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y-\mu}{\sigma})^2}$$

$$p_Y(y; \mu, \tau) = \sqrt{\frac{\tau}{2\pi}}e^{-\frac{1}{2}\tau(y - \mu)^2}$$
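A quick numerical check (arbitrary example values) that the two densities above agree:

```python
import numpy as np

y, mu, sigma = 1.3, 0.5, 2.0
tau = 1 / sigma**2

p_sigma = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
p_tau = np.sqrt(tau / (2 * np.pi)) * np.exp(-0.5 * tau * (y - mu) ** 2)

print(np.isclose(p_sigma, p_tau))   # True
```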

David Nelson