Stein's example prima facie appears similar to seemingly unrelated regressions (SUR), insofar as in both cases simultaneously estimating multiple parameters seems to be more effective than estimating each one separately.
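For concreteness, here is a quick Monte Carlo sketch I put together of what I mean on the Stein side (my own toy illustration, not taken from either article): for $p \ge 3$ normal means each observed once with unit variance, the James–Stein shrinkage estimator has lower total mean squared error than the coordinate-wise MLE.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_sims = 10, 20_000                      # p >= 3 means, many Monte Carlo draws
theta = rng.normal(size=p)                  # an arbitrary true mean vector

# One N(theta_i, 1) observation per mean, replicated across simulations
x = theta + rng.normal(size=(n_sims, p))

# MLE: estimate each mean by its own observation
mle = x

# James-Stein: shrink the whole observation vector toward zero
norm_sq = np.sum(x**2, axis=1, keepdims=True)
js = (1 - (p - 2) / norm_sq) * x

mse_mle = np.mean(np.sum((mle - theta)**2, axis=1))
mse_js = np.mean(np.sum((js - theta)**2, axis=1))
print(f"total MSE, MLE:         {mse_mle:.3f}")   # approximately p
print(f"total MSE, James-Stein: {mse_js:.3f}")    # strictly smaller for p >= 3
```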
The two Wikipedia articles linked above have somewhat different emphases. The article on Stein's example focuses on a minimal example exhibiting the seemingly paradoxical decrease in total mean squared error. The article on SUR emphasizes the efficiency gained by jointly estimating several regression equations that could have been estimated separately. But neither article refers to the other.
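And here is a minimal sketch of the SUR procedure as I understand it (again my own toy simulation, assuming two equations with correlated errors and the usual two-step feasible GLS): the joint estimate differs from equation-by-equation OLS, and repeating the simulation many times would show it has lower variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two regression equations with different regressors but correlated errors
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1_true, beta2_true = np.array([1.0, 2.0]), np.array([-1.0, 0.5])

err_cov = np.array([[1.0, 0.8], [0.8, 1.0]])    # strong cross-equation error correlation
errs = rng.multivariate_normal([0.0, 0.0], err_cov, size=n)
y1 = X1 @ beta1_true + errs[:, 0]
y2 = X2 @ beta2_true + errs[:, 1]

# Step 1: equation-by-equation OLS (the "separate" estimates)
b1_ols = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2_ols = np.linalg.lstsq(X2, y2, rcond=None)[0]

# Step 2: estimate the cross-equation error covariance from the OLS residuals
resid = np.column_stack([y1 - X1 @ b1_ols, y2 - X2 @ b2_ols])
sigma_hat = resid.T @ resid / n

# Step 3: feasible GLS on the stacked system (the SUR estimator)
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(n))   # inverse of Sigma (x) I_n
b_sur = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print("OLS, eq. 1 and 2:", b1_ols, b2_ols)
print("SUR, eq. 1 and 2:", b_sur[:2], b_sur[2:])
```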
Is there a precise relationship between these notions? For instance, is SUR an instance of Stein's example/paradox?