In a previous post, Bayes-Poincaré solution to k-sample tests for comparison and the Behrens-Fisher problem?, the classical Bayesian and likelihoodist solutions to 2-sample tests for comparison and the Behrens-Fisher problem were analyzed and found to be incorrect in several respects, including:
- Two new and fictitious models/likelihoods ${M_0}$ and ${M_1}$ for the pooled data $\left( {{x_1},{x_2}} \right)$ are introduced under the hypotheses ${H_0}$ and ${H_1}$, on top of the two original models, in violation of Ockham’s razor;
- Model ${M_0}$ requires identifying equality with identity, a principle that is external to probability theory and false according to Henri Poincaré: it is not because two parameters have the same numerical value that they are identical;
- Conversely, under model ${M_1}$, it is not because there are two distinct parameters that their numerical values are necessarily different: they differ almost surely, with probability $p = 1$, if the parameters are continuous, and with probability $p < 1$ if they are discrete;
- It follows that ${M_1}$ is not the logical negation of ${M_0}$, even in the continuous case, in contradiction with the definition of the original hypotheses ${H_0}$ and ${H_1}$;
- The prior probabilities for models ${M_0}$ and ${M_1}$ are assigned quite arbitrarily, decoupled from the prior probabilities for the hypotheses ${H_0}$ and ${H_1}$ that must be computed from the prior probability distributions of the parameters of the original models;
- For a continuous parameter of interest with continuous marginal prior probability distributions under both experiments, the prior and posterior probabilities for the null hypothesis ${H_0}$ are equal to zero. It follows that the Bayes factor is undefined. Therefore, the solution cannot rely on a Bayes factor;
- The classical solution remains the same regardless of whether the parameter of interest is discrete or continuous, even though the two situations are completely different: the Bayes factor is well-defined in the discrete case and undefined in the continuous one;
- ...
A simple alternative, fully probabilistic solution free from those defects and criticisms has been proposed. The solution is straightforward for a discrete parameter of interest and makes the classical one inadmissible in the statistical sense. It is more unusual for a continuous one, even though we are, again, just doing our best to follow Henri Poincaré.
So, as an example, let’s apply this solution to the Behrens-Fisher problem with the classical, standard, continuous improper Jeffreys' priors
$p\left( {{\mu _i},{\sigma _i}} \right) = p\left( {{\mu _i}} \right)p\left( {{\sigma _i}} \right) \propto \sigma _i^{ - 1},\;i = 1,2$
over $\mathbb{R} \times {\mathbb{R}^{ + *}}$.
In order to keep full control, we start with proper priors with compact support
$p\left( {\left. {{\mu _i},{\sigma _i}} \right|N,a,b} \right) = p\left( {\left. {{\mu _i}} \right|N} \right)p\left( {\left. {{\sigma _i}} \right|a,b} \right) = {\left( {2N} \right)^{ - 1}}\frac{{\sigma _i^{ - 1}}}{{\log \left( b \right) - \log \left( a \right)}}$
over $\left[ { - N,N} \right] \times \left[ {a,b} \right]$, $0 < N$, $0 < a < b$, $i = 1,2$.
We introduce two identical sequences of discrete uniform random variables ${\left( {\mu _i^l} \right)_{l \in {\mathbb{N}^*}}},\;i = 1,2$, defined on the regular partition of $\left[ { - N,N} \right]$
${\Omega ^l} = \left\{ { - N, - N + \Delta \mu , - N + 2\Delta \mu ,...,N} \right\},\;\Delta \mu = \frac{N}{l}$
of cardinality $\left| {{\Omega ^l}} \right| = 2l + 1$.
The prior probability for the null hypothesis ${H_0}$ and the discrete parameters $\mu _1^l$ and $\mu _2^l$ is
$p\left( {\left. {{H_0}} \right|l,N} \right) = \sum\limits_{{\Omega ^l}} {p{{\left( {{\mu ^l}} \right)}^2}} = \sum\limits_{{\Omega ^l}} {{{\left( {2l + 1} \right)}^{ - 2}}} = \left( {2l + 1} \right){\left( {2l + 1} \right)^{ - 2}} = {\left( {2l + 1} \right)^{ - 1}}$
but, as we shall see, it is convenient to rewrite it as
$p\left( {\left. {{H_0}} \right|l,N} \right) = \frac{{\sum\limits_{{\Omega ^l}} {1 \times 1} }}{{\sum\limits_{{\Omega ^l}} 1 \sum\limits_{{\Omega ^l}} 1 }} = \Delta \mu \frac{{\Delta \mu \sum\limits_{{\Omega ^l}} {1 \times 1} }}{{\Delta \mu \sum\limits_{{\Omega ^l}} 1 \,\Delta \mu \sum\limits_{{\Omega ^l}} 1 }}\mathop \sim \limits_{\Delta \mu \to {0^ + }} \Delta \mu \frac{{\int\limits_{ - N}^N {{\text{d}}\mu } }}{{\int\limits_{ - N}^N {{\text{d}}\mu } \int\limits_{ - N}^N {{\text{d}}\mu } }} = \frac{N}{l}\frac{{2N}}{{{{\left( {2N} \right)}^2}}} = \frac{1}{{2l}}$
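As a sanity check, here is a minimal numerical sketch of the computation above, assuming NumPy; the value $N = 10$ is an arbitrary illustrative choice:

```python
# On the (2l+1)-point grid Omega^l, p(H0 | l, N) equals (2l+1)^(-1) exactly
# and behaves like 1/(2l) as Delta_mu -> 0+.
import numpy as np

N = 10.0
for l in [10, 100, 1000]:
    grid = np.linspace(-N, N, 2 * l + 1)        # Omega^l, step Delta_mu = N / l
    p = np.full(grid.size, 1.0 / grid.size)     # discrete uniform prior p(mu^l)
    print(l, np.sum(p ** 2), 1.0 / (2 * l + 1), 1.0 / (2 * l))
```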
Dropping the index $i$ for clarity, both joint posteriors read
$p\left( {\left. {{\mu ^l},\sigma } \right|x,l,N,a,b} \right) = \frac{{p\left( {\left. {{\mu ^l}} \right|l,N} \right)p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^l},l,N,\sigma ,a,b} \right)}}{{\sum\limits_{{\Omega ^l}} {p\left( {\left. {{\mu ^{l'}}} \right|l,N} \right)\int\limits_a^b {p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^{l'}},l,N,\sigma ,a,b} \right){\text{d}}\sigma } } }} = \frac{{p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^l},l,N,\sigma ,a,b} \right)}}{{\sum\limits_{{\Omega ^l}} {\int\limits_a^b {p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^{l'}},l,N,\sigma ,a,b} \right){\text{d}}\sigma } } }}$
with
$p\left( {\left. x \right|{\mu ^l},l,N,\sigma ,a,b} \right) = {\left( {\sqrt {2\pi } } \right)^{ - m}}{\sigma ^{ - m}}{e^{ - \frac{{{\sigma ^{ - 2}}}}{2}\sum\limits_{i = 1}^m {{{\left( {{x^i} - {\mu ^l}} \right)}^2}} }}$
We need to evaluate the integral
$\int\limits_a^b {p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^l},l,N,\sigma ,a,b} \right){\text{d}}\sigma } = \frac{{{{\left( {\sqrt {2\pi } } \right)}^{ - m}}}}{{\log \left( b \right) - \log \left( a \right)}}\int\limits_a^b {{\sigma ^{ - m - 1}}{e^{^{ - \frac{{{\sigma ^{ - 2}}}}{2}\sum\limits_{i = 1}^m {{{\left( {{x^i} - {\mu ^l}} \right)}^2}} }}}{\text{d}}\sigma } $
Let
$A\left( {{\mu ^l}} \right) = \frac{1}{2}\sum\limits_{i = 1}^m {{{\left( {{x^i} - {\mu ^l}} \right)}^2}} $, $y = A\left( {{\mu ^l}} \right){\sigma ^{ - 2}} \Leftrightarrow \sigma = A{\left( {{\mu ^l}} \right)^{\frac{1}{2}}}{y^{ - \frac{1}{2}}}$, ${\text{d}}\sigma = - \frac{1}{2}A{\left( {{\mu ^l}} \right)^{\frac{1}{2}}}{y^{ - \frac{3}{2}}}{\text{d}}y$
Then
$ \int\limits_a^b {{\sigma ^{ - m - 1}}{e^{^{ - A\left( {{\mu ^l}} \right){\sigma ^{ - 2}}}}}{\text{d}}\sigma } = - \frac{{A{{\left( {{\mu ^l}} \right)}^{\frac{1}{2}}}}}{2}\int\limits_{A\left( {{\mu ^l}} \right){a^{ - 2}}}^{A\left( {{\mu ^l}} \right){b^{ - 2}}} {{{\left( {A{{\left( {{\mu ^l}} \right)}^{\frac{1}{2}}}{y^{ - \frac{1}{2}}}} \right)}^{ - m - 1}}{y^{ - \frac{3}{2}}}{e^{^{ - y}}}{\text{d}}y} = \\ \frac{{A{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}}}{2}\int\limits_{A\left( {{\mu ^l}} \right){b^{ - 2}}}^{A\left( {{\mu ^l}} \right){a^{ - 2}}} {{y^{\frac{m}{2} - 1}}{e^{^{ - y}}}{\text{d}}y} = \frac{{A{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}}}{2}\left[ {\Gamma \left( {\frac{m}{2},A\left( {{\mu ^l}} \right){b^{ - 2}}} \right) - \Gamma \left( {\frac{m}{2},A\left( {{\mu ^l}} \right){a^{ - 2}}} \right)} \right] \\ $
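We can check this closed form against numerical quadrature. A minimal sketch assuming SciPy, whose `gammaincc` is the regularized upper incomplete gamma, with arbitrary illustrative values of $m$, $A$, $a$ and $b$:

```python
# Verify the identity above: Gamma(s, x) = gammaincc(s, x) * gamma(s) recovers
# the unregularized upper incomplete gamma function.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc

def upper_gamma(s, x):
    return gammaincc(s, x) * gamma(s)

m, A, a, b = 5, 3.7, 0.2, 8.0          # arbitrary illustrative values
numeric = quad(lambda s: s ** (-m - 1) * np.exp(-A / s ** 2), a, b)[0]
closed = 0.5 * A ** (-m / 2) * (
    upper_gamma(m / 2, A / b ** 2) - upper_gamma(m / 2, A / a ** 2)
)
print(numeric, closed)                 # should agree to quadrature precision
```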
It follows that
$ p\left( {\left. {{\mu ^l}} \right|x,l,N,a,b} \right) = \frac{{\int\limits_a^b {p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^l},l,N,\sigma ,a,b} \right){\text{d}}\sigma } }}{{\sum\limits_{{\Omega ^l}} {\int\limits_a^b {p\left( {\left. \sigma \right|a,b} \right)p\left( {\left. x \right|{\mu ^{l'}},l,N,\sigma ,a,b} \right){\text{d}}\sigma } } }} = \\ \frac{{A{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}\left[ {\Gamma \left( {\frac{m}{2},A\left( {{\mu ^l}} \right){b^{ - 2}}} \right) - \Gamma \left( {\frac{m}{2},A\left( {{\mu ^l}} \right){a^{ - 2}}} \right)} \right]}}{{\sum\limits_{{\Omega ^l}} {A{{\left( {{\mu ^{l'}}} \right)}^{ - \frac{m}{2}}}\left[ {\Gamma \left( {\frac{m}{2},A\left( {{\mu ^{l'}}} \right){b^{ - 2}}} \right) - \Gamma \left( {\frac{m}{2},A\left( {{\mu ^{l'}}} \right){a^{ - 2}}} \right)} \right]} }} \\ $
Now that the normalization constant $\log \left( b \right) - \log \left( a \right)$ has cancelled out, we can take the limits $a \to {0^ + }$ and $b \to + \infty $ to get, provided $A\left( {{\mu ^l}} \right) > 0$,
$p\left( {\left. {{\mu ^l}} \right|x,l,N} \right) = \frac{{A{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}\Gamma \left( {\frac{m}{2}} \right)}}{{\sum\limits_{{\Omega ^l}} {A{{\left( {{\mu ^{l'}}} \right)}^{ - \frac{m}{2}}}\Gamma \left( {\frac{m}{2}} \right)} }} = \frac{{A{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}}}{{\sum\limits_{{\Omega ^l}} {A{{\left( {{\mu ^{l'}}} \right)}^{ - \frac{m}{2}}}} }}$
Therefore, the null hypothesis ${H_0}$ has posterior probability
$ p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right) = \sum\limits_{{\Omega ^l}} {p\left( {\left. {\mu _1^l = {\mu ^l}} \right|{x_1},l,N} \right)p\left( {\left. {\mu _2^l = {\mu ^l}} \right|{x_2},l,N} \right)} = \\ \frac{{\sum\limits_{{\Omega ^l}} {{\text{SSE1}}{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( {{\mu ^l}} \right)}^{ - \frac{n}{2}}}} }}{{\sum\limits_{{\Omega ^l}} {{\text{SSE1}}{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}} \sum\limits_{{\Omega ^l}} {{\text{SSE2}}{{\left( {{\mu ^l}} \right)}^{ - \frac{n}{2}}}} }} \\ $
where ${\text{SSE1}}\left( {{\mu ^l}} \right) = \sum\limits_{i = 1}^m {{{\left( {x_1^i - {\mu ^l}} \right)}^2}} $ and ${\text{SSE2}}\left( {{\mu ^l}} \right) = \sum\limits_{j = 1}^n {{{\left( {x_2^j - {\mu ^l}} \right)}^2}} $, with $m$ and $n$ the sizes of samples ${x_1}$ and ${x_2}$.
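To make this concrete, here is a minimal sketch of the posterior probability of ${H_0}$ on the grid ${\Omega ^l}$, assuming NumPy; the simulated samples are purely illustrative:

```python
# Posterior p(H0 | x1, x2, l, N) using the limiting (a -> 0+, b -> +infty)
# marginal posterior p(mu^l | x) ∝ SSE(mu^l)^(-m/2).
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 8)           # first sample, m = 8
x2 = rng.normal(0.3, 2.0, 12)          # second sample, n = 12

N, l = 10.0, 1000
mu = np.linspace(-N, N, 2 * l + 1)     # Omega^l, step Delta_mu = N / l

sse1 = ((x1[:, None] - mu) ** 2).sum(axis=0)
sse2 = ((x2[:, None] - mu) ** 2).sum(axis=0)

post1 = sse1 ** (-len(x1) / 2); post1 /= post1.sum()   # p(mu_1^l | x1, l, N)
post2 = sse2 ** (-len(x2) / 2); post2 /= post2.sum()   # p(mu_2^l | x2, l, N)

p_H0_post  = np.sum(post1 * post2)     # sum over Omega^l of the product
p_H0_prior = 1.0 / (2 * l + 1)
print(p_H0_post, p_H0_prior)
```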
As expected, the ratio
$\frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right)}}{{p\left( {\left. {{H_0}} \right|l,N} \right)}}$
now has a well-defined limit as $l \to + \infty $, or equivalently $\Delta \mu \to {0^ + }$:
$ \frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right)}}{{p\left( {\left. {{H_0}} \right|l,N} \right)}} = \Delta \mu \frac{{\Delta \mu \sum\limits_{{\Omega ^l}} {{\text{SSE1}}{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( {{\mu ^l}} \right)}^{ - \frac{n}{2}}}} }}{{\Delta \mu \sum\limits_{{\Omega ^l}} {{\text{SSE1}}{{\left( {{\mu ^l}} \right)}^{ - \frac{m}{2}}}} \Delta \mu \sum\limits_{{\Omega ^l}} {{\text{SSE2}}{{\left( {{\mu ^l}} \right)}^{ - \frac{n}{2}}}} }}/\Delta \mu \frac{{\Delta \mu \sum\limits_{{\Omega ^l}} {1 \times 1} }}{{\Delta \mu \sum\limits_{{\Omega ^l}} 1 \,\Delta \mu \sum\limits_{{\Omega ^l}} 1 }} \\ \mathop \to \limits_{\Delta \mu \to {0^ + }} \frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},N} \right)}}{{p\left( {\left. {{H_0}} \right|N} \right)}} = 2N\frac{{\int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }}{{\int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{d}}\mu } \int\limits_{ - N}^N {{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }} \\ $
because all the functions involved are Riemann-integrable. In particular, this limit does not depend on the particular partition of $\left[ { - N,N} \right]$, which we used only for convenience.
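A numerical sketch of this convergence, on the same kind of illustrative simulated samples and assuming SciPy's `quad` for the integrals; the discrete ratio approaches the integral limit as $l$ grows:

```python
# Compare the discrete ratio p(H0 | x1, x2, l, N) / p(H0 | l, N) with its
# Riemann-integral limit 2N * I12 / (I1 * I2).
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
x1, x2 = rng.normal(0.0, 1.0, 8), rng.normal(0.3, 2.0, 12)
m, n, N = len(x1), len(x2), 10.0

w1 = lambda u: np.sum((x1 - u) ** 2) ** (-m / 2)   # SSE1(mu)^(-m/2)
w2 = lambda u: np.sum((x2 - u) ** 2) ** (-n / 2)   # SSE2(mu)^(-n/2)

I1  = quad(w1, -N, N)[0]
I2  = quad(w2, -N, N)[0]
I12 = quad(lambda u: w1(u) * w2(u), -N, N)[0]
limit = 2 * N * I12 / (I1 * I2)

for l in [10, 100, 1000]:
    mu = np.linspace(-N, N, 2 * l + 1)             # Omega^l
    s1 = ((x1[:, None] - mu) ** 2).sum(0) ** (-m / 2)
    s2 = ((x2[:, None] - mu) ** 2).sum(0) ** (-n / 2)
    ratio = (2 * l + 1) * np.sum(s1 * s2) / (np.sum(s1) * np.sum(s2))
    print(l, ratio, limit)   # the discrete ratio approaches the integral limit
```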
This is also the limit of the sequence of Bayes factors
$ \left. {{B_{01}}} \right|l,N = \frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right)}}{{p\left( {\left. {{H_1}} \right|{x_1},{x_2},l,N} \right)}}/\frac{{p\left( {\left. {{H_0}} \right|l,N} \right)}}{{p\left( {\left. {{H_1}} \right|l,N} \right)}} = \frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right)}}{{p\left( {\left. {{H_0}} \right|l,N} \right)}}/\frac{{1 - p\left( {\left. {{H_0}} \right|{x_1},{x_2},l,N} \right)}}{{1 - p\left( {\left. {{H_0}} \right|l,N} \right)}} \\ \mathop \to \limits_{\Delta \mu \to {0^ + }} \left. {{B_{01}}} \right|N = \frac{{p\left( {\left. {{H_0}} \right|{x_1},{x_2},N} \right)}}{{p\left( {\left. {{H_0}} \right|N} \right)}} = 2N\frac{{\int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }}{{\int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{d}}\mu } \int\limits_{ - N}^N {{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }} \\ $
For $m > 2$, $n > 2$ and non-pathological data (i.e., ${\text{SSE}}\left( \mu \right) > 0$ for all $\mu $, which fails only if all observations in a sample coincide), the improper integrals converge
$ \mathop {\lim }\limits_{N \to + \infty } \int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{d}}\mu } = \int\limits_{ - \infty }^{ + \infty } {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{d}}\mu } < + \infty \\ \mathop {\lim }\limits_{N \to + \infty } \int\limits_{ - N}^N {{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } = \int\limits_{ - \infty }^{ + \infty } {{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } < + \infty \\ \mathop {\lim }\limits_{N \to + \infty } \int\limits_{ - N}^N {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } = \int\limits_{ - \infty }^{ + \infty } {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } < + \infty \\ $
It follows that we have the undesirable but perfectly normal result
$\left. {{B_{01}}} \right|N\mathop \sim \limits_{N \to + \infty } 2N\frac{{\int\limits_{ - \infty }^{ + \infty } {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }}{{\int\limits_{ - \infty }^{ + \infty } {{\text{SSE1}}{{\left( \mu \right)}^{ - \frac{m}{2}}}{\text{d}}\mu } \int\limits_{ - \infty }^{ + \infty } {{\text{SSE2}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } }}\mathop \to \limits_{N \to + \infty } {B_{01}} = + \infty $
This is not a defect of the present method but of the uniform prior, due to the fact that
$\mathop {\lim }\limits_{N \to + \infty } \int\limits_{ - N}^N {{\text{d}}\mu } = + \infty $ while $\mathop {\lim }\limits_{N \to + \infty } \int\limits_{ - N}^N {{\text{SSE}}{{\left( \mu \right)}^{ - \frac{n}{2}}}{\text{d}}\mu } < + \infty $ for $n > 2$
This is a very intuitive result: the larger $N$, the smaller $p\left( {\left. {{H_0}} \right|N} \right)$, while $p\left( {\left. {{H_0}} \right|{x_1},{x_2},N} \right)$ barely changes because the posterior distributions concentrate their mass around the sample means.
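A numerical illustration of this divergence, again on illustrative simulated samples: the limit ratio grows linearly in $N$ once the improper integrals have essentially converged.

```python
# The ratio 2N * I12 / (I1 * I2) grows like 2N once I1, I2, I12 have
# essentially converged (m, n > 2).
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
x1, x2 = rng.normal(0.0, 1.0, 8), rng.normal(0.3, 2.0, 12)
m, n = len(x1), len(x2)

w1 = lambda u: np.sum((x1 - u) ** 2) ** (-m / 2)
w2 = lambda u: np.sum((x2 - u) ** 2) ** (-n / 2)

for N in [10.0, 100.0, 1000.0]:
    # points=[0.0] helps quad resolve the sharp peak near the sample means
    I1  = quad(w1, -N, N, points=[0.0])[0]
    I2  = quad(w2, -N, N, points=[0.0])[0]
    I12 = quad(lambda u: w1(u) * w2(u), -N, N, points=[0.0])[0]
    print(N, 2 * N * I12 / (I1 * I2))   # grows roughly tenfold with N
```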
Hence, this issue should disappear for any location prior whose normalization constant remains bounded over $\mathbb{R}$, such as a Gaussian prior $\mathcal{N}\left( {0,{\tau ^2}} \right)$. But who cares about priors with non-compact support?
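Here is a minimal sketch of that claim, rerunning the same discretization with a Gaussian $\mathcal{N}\left( {0,{\tau ^2}} \right)$ prior in place of the uniform one ($\tau = 5$ is an arbitrary assumption): the ratio now stabilizes as the truncation half-width $N$ grows, instead of diverging.

```python
# With pmf ∝ f(mu) for a Gaussian density f, the prior P(H0) is
# sum(f^2)/sum(f)^2 and the posterior P(H0) weights the likelihoods by f;
# their ratio is bounded in N, unlike the uniform case.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x1, x2 = rng.normal(0.0, 1.0, 8), rng.normal(0.3, 2.0, 12)
m, n, tau, l = len(x1), len(x2), 5.0, 200_000

for N in [10.0, 100.0, 1000.0]:
    mu = np.linspace(-N, N, 2 * l + 1)
    f = norm.pdf(mu, 0.0, tau)                   # prior weights, pmf ∝ f(mu)
    w1 = ((x1[:, None] - mu) ** 2).sum(0) ** (-m / 2)
    w2 = ((x2[:, None] - mu) ** 2).sum(0) ** (-n / 2)
    prior_H0 = np.sum(f ** 2) / np.sum(f) ** 2
    post_H0 = np.sum(f ** 2 * w1 * w2) / (np.sum(f * w1) * np.sum(f * w2))
    print(N, post_H0 / prior_H0)                 # stabilizes instead of diverging
```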
Any objections to this solution, please?
Some comments:
We can replace the common interval $\Theta = \left[ { - N,N} \right]$ by the common interval $\Theta = \left[ {L,U} \right]$, or by different intervals ${\Theta _i} = \left[ {{L_i},{U_i}} \right]$, $i = 1,2,...,k$, for $k$-sample problems;
More generally, for any proper continuous marginal prior with compact support and probability density function $f\left( {\left. {{\theta _i}} \right|{L_i},{U_i}} \right)$ on $\left[ {{L_i},{U_i}} \right]$, and any $\Delta {\theta ^l}$-fine partition $\Omega _i^l = \left\{ {\theta _{i,j}^l,\;j = 1,...,\left| {\Omega _i^l} \right|} \right\}$ of it, we introduce a sequence of discrete random variables ${\left( {\theta _i^l} \right)_{l \in {\mathbb{N}^*}}}$ with probability mass function $p\left( {\left. {\theta _{i,j}^l} \right|{L_i},{U_i}} \right) = \frac{{f\left( {\left. {{\theta _i} = \theta _{i,j}^l} \right|{L_i},{U_i}} \right)}}{{\sum\limits_{\Omega _i^l} {f\left( {\left. {{\theta _i} = \theta _{i,j}^l} \right|{L_i},{U_i}} \right)} }}$ (see the sketch after this list). Of course, the partitions $\Omega _i^l$, $i = 1,...,k$, must coincide on $\bigcap\limits_{i = 1}^k {\left[ {{L_i},{U_i}} \right]} $;
Approximating continuous r.v.s by sequences of discrete ones is a basic technique in probability theory. See for instance (Almost bullet-proof) Definition of Expectation. Compared to this example, we just make the further approximation $\int\limits_{\theta _{i,j}^l}^{\theta _{i,j + 1}^l} {f\left( {\left. {{\theta _i}} \right|{L_i},{U_i}} \right){\text{d}}} {\theta _i} \simeq \Delta {\theta ^l}f\left( {\left. {{\theta _i} = \theta _{i,j}^l} \right|{L_i},{U_i}} \right)$ in order to get Riemann sums. This makes the proof easier;
It may appear that the most suitable definition of the integral for the present purpose is the Henstock-Kurzweil gauge integral. Indeed:
It is equivalent to the Lebesgue integral for non-negative functions. As a consequence, our technique should apply to the standard Borel-Lebesgue-Kolmogorov measure-theoretic setting without any restriction;
Unlike the Riemann integral, there is no need to go through improper integrals;
Unlike the Lebesgue integral, it is based on Riemann sums that look essential in the present approach.
The multivariate case ${\Theta _i} \subset {\mathbb{R}^d},\;d > 1$ looks essentially the same but it may require some technical conditions/restrictions;
Eventually, we shall probably forget that the quantities $\left. {{B_{01}}} \right|N$ and ${B_{01}}$ above are defined as limits as $\Delta {\theta ^l} \to {0^ + }$, ${L_i} \to - \infty $, ${U_i} \to + \infty $, etc., just as we forget that an HK integral is a limit of Riemann sums or (sometimes) that an improper prior is a limit of proper ones, and talk only about generalized Bayes factors or Bayes-Poincaré factors.
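As announced in the comments above, here is a minimal sketch of the general construction: discretizing a proper continuous prior density $f$ on $\left[ {L,U} \right]$ into a pmf on a $\left( {2l + 1} \right)$-point grid. The truncated-Gaussian density below is a hypothetical example.

```python
# Discretize a proper continuous prior density f on [L, U] into a pmf,
# p(theta_j) = f(theta_j) / sum_k f(theta_k), on a (2l+1)-point grid.
import numpy as np
from scipy.stats import norm

def discretized_prior(f, L, U, l):
    """Return the grid Omega^l and the associated pmf."""
    theta = np.linspace(L, U, 2 * l + 1)
    weights = f(theta)
    return theta, weights / weights.sum()

theta, pmf = discretized_prior(lambda t: norm.pdf(t, 0.0, 5.0), -10.0, 10.0, 1000)
print(pmf.sum(), np.sum(pmf ** 2))   # pmf is normalized; sum of squares ~ p(H0)
```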