In Oehlert (p. 218) the following algorithm for computing Tukey's one degree of freedom test for non-additivity is suggested:
- Fit a preliminary model; this will usually be an additive model.
- Get the predicted values from the preliminary model; square them and divide by twice the mean of the data.
- Fit the data with a model that includes the preliminary model and the rescaled squared predicted values as explanatory variables.
- The improvement sum of squares going from the preliminary model to the model including the rescaled squared predicted values is the single degree of freedom sum of squares for the Tukey model.
- Test for the significance of a Tukey-type interaction by dividing the Tukey sum of squares by the error mean square from the model that includes the rescaled squared predicted values.
- The coefficient for the rescaled squared predicted values is $\hat\eta$, an estimate of $\eta$. If a Tukey interaction is present, transform the data to the power $1 - \hat\eta$ to remove it. (A code sketch of these steps follows below.)
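To make the recipe concrete, here is a minimal sketch of those steps in Python with statsmodels. Everything in it is an assumption of mine rather than part of Oehlert's text: the 4×5 layout with one observation per cell, the fake additive data, and the column names.

```python
# Illustrative sketch of the algorithm above; the layout, effect sizes, and
# noise level are arbitrary, and the data are purely additive by construction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
a, b = 4, 5
rows = np.repeat(np.arange(a), b)
cols = np.tile(np.arange(b), a)
y = (10.0 + rng.normal(size=a)[rows] + rng.normal(size=b)[cols]
     + rng.normal(scale=0.5, size=a * b))
df = pd.DataFrame({"row": rows, "col": cols, "y": y})

# Steps 1-2: fit the additive preliminary model, then rescale its squared fits.
prelim = smf.ols("y ~ C(row) + C(col)", data=df).fit()
df["z"] = prelim.fittedvalues ** 2 / (2 * df["y"].mean())

# Steps 3-6: refit with z added; the 1-df improvement is the Tukey sum of
# squares, and the t-test on z is equivalent to the corresponding F-test.
tukey = smf.ols("y ~ C(row) + C(col) + z", data=df).fit()
print(tukey.tvalues["z"], tukey.pvalues["z"])

# Last step: if the interaction is judged real, 1 - eta_hat suggests a power
# transformation of the response.
eta_hat = tukey.params["z"]
print("suggested power:", 1 - eta_hat)
```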
Suppose the original model is $y = X\beta + e$. If I understand things correctly, the test can be constructed as follows: get $\hat y = X\hat\beta$ from this model and compute $Z = \hat y^2 / (2\bar y)$, squaring elementwise. Then fit the model $y = X\beta + Z\eta + u$ and test the null $\eta = 0$ against $\eta \neq 0$ with an F-test, which is equivalent to the usual t-test on this parameter.
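In that notation the whole construction is just two least-squares fits. The helper below is only an illustration of how I read it (the function name and the full-column-rank assumption on $X$ are mine); it computes the 1-df improvement sum of squares over the error mean square, so the returned $F$ equals $t^2$ for the coefficient on $Z$.

```python
# Sketch of the test in the notation y = X beta + Z eta + u, assuming X has
# full column rank (intercept plus dummy columns with a reference level dropped).
import numpy as np

def tukey_one_df_test(X, y):
    """Return (eta_hat, F, df_error) for Tukey's 1-df non-additivity test."""
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat                      # fitted values from y = X beta + e
    z = y_hat ** 2 / (2 * y.mean())           # Z = y_hat^2 / (2 * ybar)
    Xz = np.column_stack([X, z])              # augmented design [X, Z]
    coef, *_ = np.linalg.lstsq(Xz, y, rcond=None)
    rss0 = np.sum((y - y_hat) ** 2)           # restricted (additive) model
    rss1 = np.sum((y - Xz @ coef) ** 2)       # augmented model
    df_error = n - Xz.shape[1]
    F = (rss0 - rss1) / (rss1 / df_error)     # 1-df improvement SS / error MS
    return coef[-1], F, df_error
```

The p-value would then come from comparing $F$ with an $F(1, \mathrm{df}_{\text{error}})$ distribution, matching the two-sided t-test on the $Z$ coefficient.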
My question is this: wouldn't the fact that $Z$ is estimated from the data distort the size of this test? Specifically, I would think the variance is underestimated. And if so, are there any studies on the size of this test? I would also be interested in a convincing argument that we don't care about the size of this test, something an applied researcher I spoke to recently claimed.
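For what it's worth, the size question could be probed directly with a small simulation under the null (a purely additive mean, so $\eta = 0$ in truth). The sketch below just counts rejections at the nominal 5% level; the layout, error distribution, and number of replications are arbitrary assumptions on my part, and I'm only showing the setup one could run, not claiming a result.

```python
# Rough Monte Carlo check of the test's size under an additive truth.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, n_sim, alpha = 4, 5, 5000, 0.05

rows = np.repeat(np.arange(a), b)
cols = np.tile(np.arange(b), a)
# Fixed design: intercept plus row/column dummies with one level dropped.
X = np.column_stack(
    [np.ones(a * b)]
    + [(rows == i).astype(float) for i in range(1, a)]
    + [(cols == j).astype(float) for j in range(1, b)]
)
mu = 10.0 + rng.normal(size=a)[rows] + rng.normal(size=b)[cols]  # additive mean

rejections = 0
for _ in range(n_sim):
    y = mu + rng.normal(scale=0.5, size=a * b)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    z = y_hat ** 2 / (2 * y.mean())
    Xz = np.column_stack([X, z])
    coef, *_ = np.linalg.lstsq(Xz, y, rcond=None)
    rss0 = np.sum((y - y_hat) ** 2)
    rss1 = np.sum((y - Xz @ coef) ** 2)
    df_err = a * b - Xz.shape[1]
    F = (rss0 - rss1) / (rss1 / df_err)
    rejections += F > stats.f.ppf(1 - alpha, 1, df_err)

print("empirical rejection rate:", rejections / n_sim)  # compare with 0.05
```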