
[NB: the title of this post used to be "Please help me decipher Fisher on ANOVA". I've edited the question to make it more focused, and modified the title accordingly.]

Since anything written by R. A. Fisher seems to be fertile ground for misunderstanding, I will quote him at length.

At the beginning of Chapter VII ("Intraclass correlations and the analysis of variance"), on pp. 213-214 of his *Statistical Methods for Research Workers* (1973, 14th ed., rev. & enl.), Fisher writes:

    If we have measurements of $n^\prime$ pairs of brothers, we may ascertain the correlation between brothers in two slightly different ways. In the first place we may divide the brothers into two classes, as for instance elder brother and younger brother, and find the correlation between these two classes exactly as we do with parent and child. If we proceed in this manner we shall find the mean of the measurements of the elder brothers, and separately that of the younger brothers. Equally the standard deviations about the mean are found separately for the two classes. The correlation so obtained, being that between two classes of measurements, is termed for distinctness an interclass correlation. Such a procedure would be imperative if the quantities to be correlated were, for example, the ages, or some characteristic sensibly dependent upon age, at a fixed date. On the other hand, we may not know, in each case, which measurement belongs to the elder and which to the younger brother, or, such a distinction may be quite irrelevant to our purpose ; in these cases it is usual to use a common mean derived from all the measurements, and a common standard deviation about that mean. If $x_1,\; {x^\prime}_{\!1}; \;\; x_2,\;{x^\prime}_{\!2}; \;\; \cdots ;\;\;x_{n^\prime},\;{x^\prime}_{\!n^\prime}$ are the pairs of measurements given, we calculate $$\begin{array} & & \\ \overline{x} & = & \frac{1}{2n^\prime} \mathrm{S}(x + x^\prime), \\ s^2 & = & \frac{1}{2n}\{\mathrm{S}(x - \overline{x})^2 + \mathrm{S}(x^\prime - \overline{x})^2 \}, \\ r & = & \frac{1}{ns^2} \mathrm{S}\{(x-\overline{x})(x^\prime - \overline{x})\}. \end{array}$$     When this is done, $r$ is distinguished as an intraclass correlation, since we have treated all the brothers as belonging to the same class, and having the same mean and standard deviation. (…)

(I've made the point to reproduce Fisher's words, punctuation, typography, and notation exactly. You may take this to mean that a "sic" applies to the entire passage quoted above, including all the mathematical expressions.)

If I were to attempt to interpret these expressions using more modern notation (and displaying a modicum of elementary courtesy towards the reader), I'd write

$$\begin{array} & & \\ \overline{x} & = & \frac{1}{2n^\prime} \sum_{i=1}^{n^\prime}(x_i + {x^\prime}_{\!i}), \\ s^2 & = & \frac{1}{2n} \left\{ \sum_{i=1}^{n^\prime}(x_i - \overline{x})^2 + \sum_{i=1}^{n^\prime}({x^\prime}_{\!i} - \overline{x})^2 \right\}, \\ r & = & \frac{1}{ns^2} \sum_{i=1}^{n^\prime}(x_i-\overline{x})({x^\prime}_{\!i} - \overline{x}). \end{array}$$
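For concreteness, here is how I'd compute these three quantities in code, taking $n = n^\prime - 1$ as suggested in the comments below. (The function name and the arrangement of the data are my own, purely illustrative.)

```python
def intraclass_r(x, xp):
    """Fisher's intraclass-correlation formulas, with n = n' - 1.

    x, xp: the paired measurements (e.g. the two brothers in each family).
    Returns (xbar, s2, r) as defined in the displayed equations above.
    """
    n_prime = len(x)
    assert len(xp) == n_prime
    n = n_prime - 1

    # Common mean over all 2n' measurements.
    xbar = sum(xi + xpi for xi, xpi in zip(x, xp)) / (2 * n_prime)

    # Common variance about that mean, with Fisher's 1/(2n) pre-factor.
    s2 = (sum((xi - xbar) ** 2 for xi in x)
          + sum((xpi - xbar) ** 2 for xpi in xp)) / (2 * n)

    # Intraclass correlation.
    r = sum((xi - xbar) * (xpi - xbar)
            for xi, xpi in zip(x, xp)) / (n * s2)
    return xbar, s2, r
```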

Now, if we assume that $n = n^\prime - 1$, then the last two definitions become

$$\begin{array} & & \\ s^2 & = & \frac{1}{2n^\prime - 2} \left\{ \sum_{i=1}^{n^\prime}(x_i - \overline{x})^2 + \sum_{i=1}^{n^\prime}({x^\prime}_{\!i} - \overline{x})^2 \right\}, \\ r & = & \frac{1}{(n^\prime - 1)s^2} \sum_{i=1}^{n^\prime}(x_i-\overline{x})({x^\prime}_{\!i} - \overline{x}). \end{array}$$

In this form, the definition of $r$ matches that of a sample correlation coefficient for the case in which both variables have the same estimated mean $\overline{x}$ and estimated variance $s^2$.

But I'm still thrown off by the expression for this estimated variance $s^2$. According to Fisher's description, $s^2$ should be the square of "a common standard deviation about [a common] mean". Therefore, I expected the denominator in its pre-factor to be $2n^\prime - 1$. In other words, I expected the whole expression to be

$$ s^2 = \frac{1}{2n^\prime - 1} \left\{ \sum_{i=1}^{n^\prime}(x_i - \overline{x})^2 + \sum_{i=1}^{n^\prime}({x^\prime}_{\!i} - \overline{x})^2 \right\}\,. $$

But if $n = n^\prime - 1$, then Fisher uses a denominator of $2n = 2n^\prime - 2$. Does anyone know why?
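To make the discrepancy concrete, here is a quick numerical check on made-up data, comparing Fisher's $1/2n = 1/(2n^\prime - 2)$ against the $1/(2n^\prime - 1)$ I expected. Both divide the same total sum of squares about the common mean; only the denominator differs.

```python
# Toy data: n' = 4 pairs of measurements (values are arbitrary).
x = [1.0, 2.0, 3.0, 4.0]
xp = [2.0, 1.0, 4.0, 3.0]
pooled = x + xp
n_prime = len(x)

# Common mean over all 2n' measurements.
xbar = sum(pooled) / (2 * n_prime)

# Total sum of squares about the common mean.
ss = sum((v - xbar) ** 2 for v in pooled)

s2_fisher = ss / (2 * n_prime - 2)    # Fisher's 1/(2n), with n = n' - 1
s2_expected = ss / (2 * n_prime - 1)  # the usual "sample variance" denominator
```

For this data the two estimates are $10/6 \approx 1.667$ versus $10/7 \approx 1.429$ — hardly negligible at small $n^\prime$.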

Since the quote above comes from a 14th revised edition, it seems unlikely to me that this is merely a typo. Furthermore, on the following page (p. 215) he extends the discussion to the case of "trios" of brothers, and there he gives the expression: $$ s^2 = \frac{1}{3n}\{\mathrm{S}(x - \overline{x})^2 + \mathrm{S}(x^\prime - \overline{x})^2 + \mathrm{S}(x^{\prime\prime} - \overline{x})^2\}\,, $$

or, in my notation, and still assuming that $n = n^\prime - 1$, $$ s^2 = \frac{1}{3n^\prime - 3} \left\{ \sum_{i=1}^{n^\prime}(x_i - \overline{x})^2 + \sum_{i=1}^{n^\prime}({x^\prime}_{\!i} - \overline{x})^2 + \sum_{i=1}^{n^\prime}({x^{\prime\prime}}_{\!i} - \overline{x})^2 \right\}\,. $$

Therefore, if we're dealing with typos here, they are matched typos.
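If one takes the pattern at face value, the denominator for families of $k$ brothers would be $kn = k(n^\prime - 1)$. Here is a sketch of that reading — my own generalization for illustration; Fisher only states the $k = 2$ and $k = 3$ cases:

```python
def fisher_s2(families):
    """s^2 per the apparent pattern in Fisher's pairs/trios formulas.

    families: a list of n' families, each a list of k measurements.
    Uses the common mean over all k*n' values and divides the total
    sum of squares by k*(n' - 1), i.e. by kn with n = n' - 1.
    """
    k = len(families[0])
    n_prime = len(families)
    values = [v for fam in families for v in fam]

    xbar = sum(values) / (k * n_prime)
    ss = sum((v - xbar) ** 2 for v in values)
    return ss / (k * (n_prime - 1))
```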

Of course, for a sufficiently large $n^\prime$, the difference between a denominator of $2n^\prime - 1$ and one of $2n^\prime - 2$ is negligible, but my understanding is that Fisher is explicitly assuming that we are working with values of $n^\prime$ that are in general too small to justify this approximation. In fact, if the approximation $n^\prime \approx n^\prime - 1$ were valid, Fisher's notational distinction between $n^\prime$ and $n = n^\prime - 1$ would be hard to understand.

  • Browsing the book, it seems that $n$ is equal to $n'-1$. – Zen Jun 17 '13 at 03:13
  • Also, your first rendition of $r$, with a single sum, is the correct one. The $\mathrm{S}$ notation is explained at the beginning of the book. – Zen Jun 17 '13 at 03:23
  • Fisher's "two slightly different ways" amount to treating the two brothers as coming from two different classes, or as coming from the same class. The formulas above are proposed for this second case. – Zen Jun 17 '13 at 03:26
  • @Zen, if $n = n^\prime - 1$, and if the equations apply to the second case, then I don't know what to make of $s^2$. It's not the (sample) variance of the $2n^\prime$ measurements, because for that the denominator would have to be $2n^\prime - 1$, not $2n = 2(n^\prime - 1)$. The $s^2$ as given is the average of two separately computed (sample) variances, which contradicts the text's reference to a "common standard deviation about that mean". – kjo Jun 17 '13 at 09:34
  • Yeah, we can't make both $s^2$ and $r$ unbiased with the same value of $n$. Maybe we can find something in the ANOVA papers. All his papers are available at Adelaide University. http://www.adelaide.edu.au/library/special/digital/fisher/ – Zen Jun 17 '13 at 13:23
  • @Zen: thanks for the pointer to Fisher's papers. I've reworked my question to take into account your suggestions. – kjo Jun 17 '13 at 22:43
