
"On the Behrens–Fisher Problem: A Review" by Seock-Ho Kim and Allen S. Cohen

Journal of Educational and Behavioral Statistics, volume 23, number 4, Winter, 1998, pages 356–377


I'm looking at this paper, and it says:

Fisher (1935, 1939) chose the statistic $$ \tau = \frac{\delta-(\bar x_2 - \bar x_1)}{\sqrt{s_1^2/n_1+s_2^2/n_2}} = t_2\cos\theta - t_1\sin\theta $$ [where $t_i$ is the usual one-sample $t$-statistic for $i=1,2$] where $\theta$ is taken in the first quadrant and $$ \tan\theta = \frac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}.\tag{13} $$ [ . . . ] The distribution of $\tau$ is the Behrens–Fisher distribution and is defined by the three parameters $\nu_1$, $\nu_2$, and $\theta$,

The parameters $\nu_i$ had earlier been defined as $n_i-1$ for $i=1,2$.
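
As a check on the identity in the quoted passage, here is a short derivation. It assumes the sign conventions $t_i = (\mu_i - \bar x_i)/(s_i/\sqrt{n_i})$ and $\delta = \mu_2 - \mu_1$, which the excerpt does not spell out; they are chosen here so that the signs agree. From (13), with $\theta$ in the first quadrant, $$ \cos\theta = \frac{s_2/\sqrt{n_2}}{\sqrt{s_1^2/n_1+s_2^2/n_2}}, \qquad \sin\theta = \frac{s_1/\sqrt{n_1}}{\sqrt{s_1^2/n_1+s_2^2/n_2}}, $$ so that $$ t_2\cos\theta - t_1\sin\theta = \frac{(\mu_2 - \bar x_2) - (\mu_1 - \bar x_1)}{\sqrt{s_1^2/n_1+s_2^2/n_2}} = \frac{\delta-(\bar x_2 - \bar x_1)}{\sqrt{s_1^2/n_1+s_2^2/n_2}} = \tau. $$ With the opposite sign convention for $t_i$, the same steps give $-\tau$.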

Now the things that are unobservable here are $\delta$ and the two population means $\mu_1$, $\mu_2$, whose difference is $\delta$, and consequently $\tau$ and the two $t$-statistics. The sample SDs $s_1$ and $s_2$ are observable and are used to define $\theta$, so that $\theta$ is an observable statistic, not an unobservable population parameter. Yet we see it being used as one of the parameters of this family of distributions!

Could it be that they should have said the parameter is the arctangent of $\dfrac{\sigma_1/\sqrt{n_1}}{\sigma_2/\sqrt{n_2}}$ rather than of $\dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$?

kjetil b halvorsen
Michael Hardy

1 Answer


The Behrens-Fisher distribution is defined as the distribution of $t_2\cos\theta - t_1\sin\theta$, where $\theta$ is a fixed real number and $t_2$ and $t_1$ are independent random variables having Student $t$-distributions with $\nu_2$ and $\nu_1$ degrees of freedom respectively.

Behrens and Fisher's solution of the Behrens-Fisher problem uses the Behrens-Fisher distribution with $\theta$ depending on the observations because it is a pseudo-Bayesian (in fact, fiducial) solution: this data-dependent distribution is a posterior-like distribution of $\tau$, in which $\delta$ is the only random part of the definition of $\tau$ because the data are held fixed.
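
To make this concrete, here is a minimal Monte Carlo sketch of the distribution of $t_2\cos\theta - t_1\sin\theta$ with $\theta$ computed once from the observed data and then held fixed. The function names and the summary statistics are hypothetical, chosen only for illustration, and the quantile is a simulation estimate rather than an exact value.

```python
import numpy as np

def theta_from_data(s1, n1, s2, n2):
    """First-quadrant angle with tan(theta) = (s1/sqrt(n1)) / (s2/sqrt(n2))."""
    return np.arctan2(s1 / np.sqrt(n1), s2 / np.sqrt(n2))

def behrens_fisher_sample(nu1, nu2, theta, size, rng):
    """Draw from the law of t2*cos(theta) - t1*sin(theta), with theta fixed."""
    t1 = rng.standard_t(nu1, size=size)  # independent Student t, nu1 d.f.
    t2 = rng.standard_t(nu2, size=size)  # independent Student t, nu2 d.f.
    return t2 * np.cos(theta) - t1 * np.sin(theta)

rng = np.random.default_rng(0)

# Hypothetical observed summary statistics (not taken from the paper).
s1, n1, s2, n2 = 2.0, 8, 3.5, 12
nu1, nu2 = n1 - 1, n2 - 1

theta = theta_from_data(s1, n1, s2, n2)   # treated as a constant from here on
draws = behrens_fisher_sample(nu1, nu2, theta, 200_000, rng)

# Estimated upper 2.5% point of the Behrens-Fisher distribution.
q = np.quantile(draws, 0.975)
print(f"theta = {theta:.4f}, estimated 97.5% quantile = {q:.3f}")
```

Under the fiducial reading above, such a quantile $q$ would give an interval for $\delta$ of the form $(\bar x_2 - \bar x_1) \pm q\sqrt{s_1^2/n_1+s_2^2/n_2}$, since everything in $\tau$ except $\delta$ is treated as fixed.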

Stéphane Laurent
  • So you're saying it's the distribution of $t_2\cos\theta - t_1\sin\theta$ where $\theta$ is _not random_, even though they say $\theta=\arctan\dfrac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}$ and $s_1$ and $s_2$ are random? So it's the _conditional_ distribution _given_ the ratio of variances? It seems to me the authors should have been a lot more explicit about this. – Michael Hardy Aug 19 '12 at 18:26
  • So should this be viewed as another instance of Fisher's technique of conditioning on an ancillary statistic? – Michael Hardy Aug 19 '12 at 18:40
  • $s_1$ and $s_2$ are data-dependent, but the data are fixed; this is like a posterior distribution in Bayesian statistics. In the expression for $\tau$, each of $\bar x_1$, $\bar x_2$, $s_1$ and $s_2$ is fixed, and $\delta$ is random. – Stéphane Laurent Aug 19 '12 at 18:40
  • Answer to your 2nd comment: I don't know. What we have here is fiducial statistics. – Stéphane Laurent Aug 19 '12 at 18:41
  • According to this answer, all of the randomness in $t_1$ and $t_2$ comes from the randomness in $\mu_1$ and $\mu_2$, and the rest is fixed. But the justification for saying that $t_1$ and $t_2$ have the particular probability distributions that are attributed to them, is the distribution of the data. Should we just say "that's because this is fiducial inference"? – Michael Hardy Aug 20 '12 at 01:48
  • @MichaelHardy I haven't mastered fiducial statistics. I know there is also a Bayesian approach that yields exactly the same solution. It would be simpler to look at these solutions; aren't they described in the paper you cited? – Stéphane Laurent Aug 20 '12 at 05:26