Conjugate priors outside exponential family

Question

The usual counterexample I have come across to the claim that conjugate priors do not exist outside the exponential family is the uniform distribution on $(0,\theta)$ (i.e. $U(0,\theta)$), for which $\theta$ has a Pareto conjugate prior. The Pareto distribution also acts as a conjugate prior in the $U(-\theta,\theta)$ family, but this is basically the same example. Other common examples outside the exponential family where the support depends on the unknown parameter are the shifted exponential distribution with shift $\theta$ and the Pareto distribution with scale $\theta$. It turns out that they also admit conjugate priors of a sufficiently 'nice' form, as I found out browsing some textbooks, but there was no motivation for how the priors were obtained.

For real $\theta$, suppose $\text{Exp}(\theta,1)$ denotes the shifted exponential density $$f(x)=e^{-(x-\theta)}\mathbf1_{[\theta,\infty)}(x)$$

And for positive $\alpha,\theta$, let $\text{Pareto}(\alpha,\theta)$ be the density $$f(x)=\frac{\alpha \theta^{\alpha}}{x^{\alpha+1}}\mathbf1_{[\theta,\infty)}(x)$$

These are related to the uniform distribution as follows:

$$X \sim \text{Pareto}(1,\theta)\implies \frac1X \sim U\left(0,\frac1{\theta}\right)$$

$$X \sim \text{Exp}(\theta,1) \implies e^{-X} \sim U\left(0,e^{-\theta}\right)$$
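
As a quick sanity check of these two relations, here is a minimal simulation sketch in Python (standard library only). The function names, the seed and the parameter values are illustrative assumptions of mine, not taken from any textbook. It draws from both densities by inverting their CDFs and measures how far the transformed samples are from the claimed uniform distributions.

```python
import math
import random


def sample_pareto(alpha, theta, rng):
    """Draw from Pareto(alpha, theta) by inverting F(x) = 1 - (theta/x)**alpha."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so u is never 0
    return theta * u ** (-1.0 / alpha)


def sample_shifted_exp(theta, rng):
    """Draw from Exp(theta, 1) by inverting F(x) = 1 - exp(-(x - theta))."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return theta - math.log(u)


def ks_distance_to_uniform(values, upper):
    """Kolmogorov-Smirnov distance between the empirical CDF and U(0, upper)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = min(max(x / upper, 0.0), 1.0)  # CDF of U(0, upper) at x
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed
    theta, n = 2.0, 20000   # illustrative values
    inv = [1.0 / sample_pareto(1.0, theta, rng) for _ in range(n)]
    ex = [math.exp(-sample_shifted_exp(theta, rng)) for _ in range(n)]
    # Both distances should be small (of order 1/sqrt(n)).
    print(ks_distance_to_uniform(inv, 1.0 / theta))
    print(ks_distance_to_uniform(ex, math.exp(-theta)))
```

A small distance is only consistent with the two relations, it does not prove them; the proof is the one-line CDF computation.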

Using the Pareto prior for the uniform distribution, I considered $\frac1{\theta}\sim \text{Pareto}(\alpha,a)$ for the Pareto data and $e^{-\theta}\sim \text{Pareto}(\alpha,a)$ for the exponential data.

Now one can easily show that the prior for $\theta$ in the Pareto data has pdf (taking $\beta=\frac1a$) $$\pi(\theta)=\frac{\alpha}{\beta^\alpha}\theta^{\alpha-1}\mathbf1_{[0,\beta]}(\theta) \tag{1}$$

And for the exponential data, the prior has pdf (taking $\beta=-\ln a$)

$$\pi(\theta)=\alpha e^{\alpha(\theta-\beta)}\mathbf1_{(-\infty,\beta]}(\theta) \tag{2}$$

I verified that the distributions in $(1)$ and $(2)$ are indeed conjugate priors for $\theta$ in the $\text{Pareto}(1,\theta)$ and $\text{Exp}(\theta,1)$ distributions respectively.
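
For reference, the verification can be sketched as follows, for a sample $x_1,\ldots,x_n$ with minimum $x_{(1)}=\min_i x_i$ (this notation is introduced here). For the $\text{Pareto}(1,\theta)$ data with prior $(1)$,

$$\pi(\theta\mid x_1,\ldots,x_n)\propto \theta^{\alpha-1}\mathbf1_{[0,\beta]}(\theta)\prod_{i=1}^n\frac{\theta}{x_i^{2}}\mathbf1_{[\theta,\infty)}(x_i)\propto \theta^{\alpha+n-1}\mathbf1_{[0,\min(\beta,x_{(1)})]}(\theta)$$

and for the $\text{Exp}(\theta,1)$ data with prior $(2)$,

$$\pi(\theta\mid x_1,\ldots,x_n)\propto e^{\alpha\theta}\mathbf1_{(-\infty,\beta]}(\theta)\prod_{i=1}^n e^{-(x_i-\theta)}\mathbf1_{[\theta,\infty)}(x_i)\propto e^{(\alpha+n)\theta}\mathbf1_{(-\infty,\min(\beta,x_{(1)})]}(\theta)$$

In both cases the posterior has the same form as the prior, with $\alpha$ replaced by $\alpha+n$ and $\beta$ replaced by $\min(\beta,x_{(1)})$.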

Is this how the derivation of a conjugate prior works, given that I already have one for a related distribution? Is it always the case that if $g(\theta)$ has a conjugate prior for data $X\sim F_{g(\theta)}$, then $\theta$ also has a conjugate prior for the same data $X\sim F_{\theta}$? I guess this does not really make the priors in $(1)$ and $(2)$ distinct from the Pareto prior in $U(0,\theta)$.

The fact that conjugate priors can exist outside the exponential family is apparently not surprising, since one can construct a conjugate prior whenever a sufficient statistic of fixed dimension exists for the parametric family in question. Indeed, the examples above show that not being a member of the exponential family does not in itself make a distribution ineligible for a conjugate prior.
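
To make this concrete for the $\text{Pareto}(1,\theta)$ example: a direct calculation with prior $(1)$ gives the hyper-parameter update $(\alpha,\beta)\mapsto(\alpha+n,\min(\beta,x_{(1)}))$, where $x_{(1)}$ is the sample minimum, so the data enter only through the two numbers $(n,x_{(1)})$ however large $n$ is. Below is a minimal Python sketch of that update, with a numerical check that prior times likelihood is proportional to the updated density. The function names and the numbers in the usage lines are illustrative assumptions.

```python
import math


def prior_pdf(theta, alpha, beta):
    """Density (1): alpha * theta**(alpha - 1) / beta**alpha on (0, beta]."""
    if 0.0 < theta <= beta:
        return alpha * theta ** (alpha - 1) / beta ** alpha
    return 0.0


def pareto_likelihood(theta, xs):
    """Likelihood of an i.i.d. Pareto(1, theta) sample xs."""
    if theta <= 0.0 or theta > min(xs):
        return 0.0
    return math.prod(theta / x ** 2 for x in xs)


def update(alpha, beta, xs):
    """Hyper-parameter update; it uses xs only through (len(xs), min(xs))."""
    return alpha + len(xs), min(beta, min(xs))


def posterior_ratio(theta, alpha, beta, xs):
    """Prior times likelihood, divided by the claimed posterior density.

    Under conjugacy this ratio is the same for every theta in the
    posterior support (it equals the marginal likelihood of xs).
    """
    a_post, b_post = update(alpha, beta, xs)
    numerator = prior_pdf(theta, alpha, beta) * pareto_likelihood(theta, xs)
    return numerator / prior_pdf(theta, a_post, b_post)


if __name__ == "__main__":
    xs = [3.1, 2.4, 5.0]     # made-up observations
    alpha, beta = 2.0, 4.0   # made-up prior hyper-parameters
    print(update(alpha, beta, xs))
    print([posterior_ratio(t, alpha, beta, xs) for t in (0.5, 1.0, 2.0)])
```

Updating one observation at a time gives the same hyper-parameters as updating with the whole sample at once, which is what one expects when a two-dimensional summary is sufficient for every $n$.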

But I am not sure what exactly 'fixed dimension' means here. Is a sufficient statistic of fixed dimension essentially referring to a non-trivial sufficient statistic? Consider other distributions outside the exponential family, like $\text{Laplace}(\theta,1)$ or $\text{Cauchy}(\theta,1)$ with unknown location $\theta$. Suppose a sample of size $n$ is drawn from one of them. Am I correct in saying that because they do not admit non-trivial sufficient statistics, $\theta$ is guaranteed not to have any conjugate prior? Does this make sense when $n=1$?

I'm fairly sure "fixed dimension" means you can make $n$ observations and summarise them in a statistic that's a vector of a fixed size, and it remains a sufficient statistic regardless of $n$. For an example of a sufficient statistic that isn't of constant dimension, just take the sample values themselves. They form a sufficient statistic (though not a useful one) but its dimension is proportional to $n$. — N. Virgo, Jul 29 '20 at 15:06
The conjugate prior does not depend on the parameterisation of the data, i.e., it is the same when you consider $x$, $1/x$, $\exp(x)$, and so on. — Xi'an, Jul 29 '20 at 19:56
"Does this make sense when $n=1$?" - I think not, as the fixed dimension and conjugate families of distributions relates to all values of $n$, not just a single one. The case of $n=1$ is just moving from the prior to the posterior distribution: you would need to show that any prior from a particular family led to a posterior from the same family with the same dimension of parameters to be able to extend this to all $n$ and so have a conjugate family. — Henry, Nov 17 '21 at 14:01

Xi'an · Answer 1 · 2020-08-01T07:54:14.450

The non-existence of conjugate priors outside exponential families is related to the Fisher-Darmois-Pitman-Koopman lemma, which states that, for parameterised families with fixed support (hence excluding the Uniform counterexamples), there cannot exist a sufficient statistic $S_n$ of fixed dimension, whatever the sample size $n$ is, unless the family is an exponential family. Here is a version of the Lemma due to H. Jeffreys (1939) (and reproduced from Oban (2009)):

Fisher-Darmois-Pitman-Koopman Lemma

Let the random quantities $X_1,X_2,\ldots$ be conditionally i.i.d. given the value of some random quantity $\theta$, and assume that the conditional distribution $P_X(X_i|\theta)$ is dominated by a measure $\nu$. Let $p(\cdot|\theta)$ be the corresponding conditional density.

Assume further that the support of $p(\cdot|\theta)$ is independent of the value of $\theta$: $$\forall\theta_1,\theta_2\in\Omega_\theta:\ \operatorname{supp} p(\cdot|\theta_1) = \operatorname{supp} p(\cdot|\theta_2)\quad \nu\text{-a.e.}$$ Then if there is a sufficient statistic $S_n: \Omega^n_x\to \Omega_s$ for each sample size $n\ge n_0$, and if $\Omega_s$ has finite dimension, $P_X(\cdot|\theta)$ is an exponential family model.

Indeed, if there exists a conjugate family with a fixed and finite number $p$ of hyper-parameters, the posterior update of these hyper-parameters is sufficient (since Bayesian and classical sufficiencies are equivalent for dominated models).

Sorry but I don't understand how this answers my questions. — StubbornAtom, Jul 29 '20 at 18:02
