
I wonder what the difference is between these two kinds of priors:

  • Non-informative
  • Improper
Bram
    It might be helpful if you could provide some context here. What do you understand about these already? Is there a particular point of confusion? – gung - Reinstate Monica Jun 05 '17 at 18:27
  • Relevant links https://stats.stackexchange.com/questions/175792/definition-of-improper-priors and https://stats.stackexchange.com/questions/133707/definition-of-weakly-informative-prior/133710#133710 and https://stats.stackexchange.com/questions/73490/what-exactly-is-weakly-informative-prior – Tim Jun 05 '17 at 21:34
  • @Tim thank you. I was looking for _non-informative_ instead of _weakly informative_. – Bram Jun 06 '17 at 03:03
  • Possible duplicate of [What is an "uninformative prior"? Can we ever have one with truly no information?](https://stats.stackexchange.com/questions/20520/what-is-an-uninformative-prior-can-we-ever-have-one-with-truly-no-information) – Xi'an Jun 08 '17 at 20:10

2 Answers


Improper priors are $\sigma$-finite non-negative measures $\text{d}\pi$ on the parameter space $\Theta$ such that $$\int_\Theta \text{d}\pi(\theta) = +\infty\,.$$ As such they generalise the notion of a prior distribution, which is a probability distribution on the parameter space $\Theta$, i.e. such that $$\int_\Theta \text{d}\pi(\theta) = 1\,.$$ They are useful in several ways to characterise

  1. the set of limits of proper Bayesian procedures (not all such limits being themselves proper Bayesian procedures);
  2. frequentist optimal procedures as in (admissibility) complete class theorems such as Wald's;
  3. frequentist best invariant estimators (since they can be expressed as Bayes estimates under the corresponding right Haar measure, usually improper);
  4. priors derived from the shape of the likelihood function, such as non-informative priors (e.g., Jeffreys').
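As a concrete illustration (a standard textbook example, not part of the original list), the flat prior on the real line is the simplest improper prior:

```latex
% Flat prior on \Theta = \mathbb{R}: take d\pi(\theta) = d\theta (Lebesgue measure).
% It is \sigma-finite, but
\int_{-\infty}^{+\infty} \mathrm{d}\theta = +\infty ,
% so it is not a probability distribution: it is an improper prior. By contrast,
% the standard normal prior integrates to one and is hence proper:
\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\theta^2/2}\, \mathrm{d}\theta = 1 .
```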

Because they do not integrate to a finite number, they do not allow for a probabilistic interpretation, but they can nonetheless be used in statistical inference if the marginal likelihood is finite, $$\int_\Theta \ell(\theta|x)\,\text{d}\pi(\theta) < +\infty\,,$$ since the posterior distribution $$\dfrac{\ell(\theta|x)\,\text{d}\pi(\theta)}{\int_\Theta \ell(\theta|x)\,\text{d}\pi(\theta)}$$ is then well-defined. It can therefore be used in exactly the same way as a posterior distribution derived from a proper prior, to derive posterior quantities for estimation such as posterior means or posterior credible intervals.
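A minimal numerical sketch of this point (an assumed model, not taken from the answer: one observation $x \sim \mathcal{N}(\theta, 1)$ combined with the flat prior $\text{d}\pi(\theta) = \text{d}\theta$): the marginal likelihood is finite, so the posterior is well-defined, and it turns out to be the proper distribution $\mathcal{N}(x, 1)$.

```python
import numpy as np

# Assumed model for illustration: one observation x ~ N(theta, 1),
# improper flat prior d pi(theta) = d theta on the real line.
x = 1.7
theta = np.linspace(x - 10.0, x + 10.0, 200_001)  # wide grid standing in for R
dtheta = theta[1] - theta[0]
lik = np.exp(-0.5 * (theta - x) ** 2) / np.sqrt(2.0 * np.pi)

# Marginal likelihood int l(theta|x) d pi(theta): finite (here ~1),
# so the normalised posterior below is well-defined despite the improper prior.
marginal = lik.sum() * dtheta

posterior = lik / marginal
post_mean = (theta * posterior).sum() * dtheta
post_sd = np.sqrt(((theta - post_mean) ** 2 * posterior).sum() * dtheta)
# The posterior is N(x, 1): post_mean is close to x, post_sd close to 1.
```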

Warning: One branch of Bayesian inference does not cope very well with improper priors, namely when testing sharp hypotheses. Indeed those hypotheses require the construction of two prior distributions, one under the null and one under the alternative, that are orthogonal. If one of these priors is improper, it cannot be normalised and the resulting Bayes factor is undetermined.
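The indeterminacy can be made explicit (a standard argument, sketched here in the answer's notation):

```latex
% Bayes factor for H_0: \theta \in \Theta_0 versus H_1: \theta \in \Theta_1:
B_{01}(x) =
  \frac{\int_{\Theta_0} \ell(\theta \mid x)\,\mathrm{d}\pi_0(\theta)}
       {\int_{\Theta_1} \ell(\theta \mid x)\,\mathrm{d}\pi_1(\theta)}
% If d\pi_1 is improper, then c\,d\pi_1 is an equally valid version of the same
% prior for any c > 0, and it rescales the Bayes factor by 1/c:
B_{01}^{(c)}(x) = \frac{1}{c}\, B_{01}(x),
% so B_{01} carries an arbitrary multiplicative constant and is undetermined.
```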

In Bayesian decision theory, when seeking an optimal decision procedure $\delta$ under the loss function $L(d,\theta)$ an improper prior $\text{d}\pi$ is useful in cases when the minimisation problem $$\arg \min_d \int_\Theta L(d,\theta)\ell(\theta|x)\text{d}\pi(\theta)$$ allows for a non-trivial solution (even when the posterior distribution is not defined). The reason for this distinction is that the decision only depends on the product $L(d,\theta)\text{d}\pi(\theta)$, which means that it is invariant under changes of the prior by multiplicative terms $\varpi(\theta)$ provided the loss function is divided by the same multiplicative terms $\varpi(\theta)$,$$L(d,\theta)\text{d}\pi(\theta)=\dfrac{L(d,\theta)}{\varpi(\theta)}\times\varpi(\theta)\text{d}\pi(\theta)$$

Non-informative priors are classes of (proper or improper) prior distributions that are determined in terms of a certain informational criterion that relates to the likelihood function, like

  1. Laplace's insufficient reason flat prior;
  2. Jeffreys (1939) invariant priors;
  3. maximum entropy (or MaxEnt) priors (Jaynes, 1957);
  4. minimum description length priors (Rissanen, 1987; Grünwald, 2005);
  5. reference priors (Bernardo, 1979; Berger & Bernardo, 1992; Bernardo & Sun, 2012);
  6. probability matching priors (Welch & Peers, 1963; Scricciolo, 1999; Datta, 2005)

and further classes, some of which are described in Kass & Wasserman (1995). The name _non-informative_ is a misnomer in that no prior is ever completely non-informative. See my discussion on this forum, or Larry Wasserman's diatribe. (Non-informative priors are most often improper.)
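A small numeric sketch of item 2 in the list above (a classical example, not taken from the answer): for a Bernoulli$(p)$ likelihood the Fisher information is $I(p) = 1/\{p(1-p)\}$, so the Jeffreys prior is $\pi(p) \propto p^{-1/2}(1-p)^{-1/2}$, i.e. a Beta$(1/2, 1/2)$, which in this case happens to be proper.

```python
import numpy as np

# Classical example: Jeffreys' prior for a Bernoulli(p) likelihood.
# Fisher information: I(p) = 1/(p(1-p)), so the (unnormalised) Jeffreys prior is
# sqrt(I(p)) = p^{-1/2} (1-p)^{-1/2}, i.e. a Beta(1/2, 1/2) kernel.
p = np.linspace(1e-6, 1.0 - 1e-6, 1_000_001)
dp = p[1] - p[0]
jeffreys = 1.0 / np.sqrt(p * (1.0 - p))

# Its integral over (0, 1) converges (to pi), so this particular
# non-informative prior is proper, unlike e.g. the flat prior on R.
norm = jeffreys.sum() * dp  # numerically close to pi
```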

Xi'an

A non-informative prior, rigorously speaking, is not a prior distribution. It is a function such that, if we treat it as if it were a distribution and apply Bayes' formula, we obtain a posterior distribution that aims to reflect, as well as possible, the information contained in the data and only in the data, or to achieve a good frequentist-matching property (i.e. a $95\%$ posterior credible interval is approximately a $95\%$ confidence interval).

A non-informative prior is often "improper". A distribution has a well-known property: its integral equals one. A non-informative prior is said to be improper when its integral is infinite (in which case it is clearly not a distribution).
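To illustrate the frequentist-matching remark (an assumed model, not from the answer: $x_1, \dots, x_n \sim \mathcal{N}(\theta, 1)$ with the flat prior): the posterior is $\mathcal{N}(\bar{x}, 1/n)$, so the $95\%$ credible interval coincides with the classical $95\%$ confidence interval, and its frequentist coverage is essentially exactly $95\%$.

```python
import numpy as np

# Assumed model: x_1..x_n ~ N(theta, 1), improper flat prior on theta.
# Posterior: N(xbar, 1/n), so the 95% posterior credible interval
# xbar +/- 1.96/sqrt(n) equals the classical 95% confidence interval.
rng = np.random.default_rng(0)
theta_true, n, reps = 0.3, 25, 20_000
xbars = rng.normal(theta_true, 1.0, size=(reps, n)).mean(axis=1)
half = 1.96 / np.sqrt(n)

# Frequentist coverage of the Bayesian credible interval: close to 0.95.
coverage = np.mean(np.abs(xbars - theta_true) <= half)
```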

Stéphane Laurent
  • I consider this definition of a "non-informative" prior to be super-restrictive! – Xi'an Jun 08 '17 at 20:11
  • @Xi'an In view of the shortness of the OP, I think this short answer is rather appropriate. – Stéphane Laurent Jun 08 '17 at 21:04
  • @Xi'an It's a quote from Bernardo (more or less). I agree with it, myself ^^ – Stéphane Laurent Jun 09 '17 at 12:49
  • @Xi'an I will paste the quote when I go home if you want. See you later. – Stéphane Laurent Jun 09 '17 at 12:50
  • @Xi'an I'm not at home yet but e.g. [here](https://people.eecs.berkeley.edu/~jordan/courses/260-spring09/other-readings/bernardo-reference-priors.pdf) *Reference posteriors are obtained by formal use of Bayes theorem with a reference prior function*. Bernardo says reference prior **function**, not distribution. – Stéphane Laurent Jun 09 '17 at 12:55
  • More seriously @Xi'an, you mean it's restrictive to Bernardian non-informative priors? That's right, and maybe to some others. I know you have more knowledge than me on this topic. But I'm Bernardo-oriented (and matching priors). – Stéphane Laurent Jun 09 '17 at 17:20