
I am currently reviewing a paper in which the authors perform the following analysis. They use the same set of independent variables to explain three different dependent variables. The hypotheses are of the form: X has a stronger positive effect on Y1 than on Y2 and Y3. The authors run separate regression models for the three DVs (Y1, Y2, Y3), each time with the same IVs. The sample is also identical; they just use a different scale to define the DV for each regression model.

so:

Y1 ~ a1 + b1X1 + c1X2 + d1X3 + e1

Y2 ~ a2 + b2X1 + c2X2 + d2X3 + e2

Y3 ~ a3 + b3X1 + c3X2 + d3X3 + e3

with hypotheses such as: c1 > c2, c1 > c3, and all coefficients > 0

The authors then simply compare c1 with c2 and c3 and conclude that their hypothesis is supported because the fitted c1 is indeed positive and larger than c2 and c3.

To me, this feels incomplete. Seeing that c1 is larger than c2 doesn't mean the difference is statistically significant.

How should I respond in my review? I'd like to comment on this if the analysis is incorrect, but would also like to be able to suggest how the analysis could be done correctly. I hope you can help.

Peter Verbeet

2 Answers


If the three Y variables are on different scales, then the coefficients will vary, even if all three $Y$s measure the same underlying variable.

E.g. suppose $Y_1$ is height in inches and $Y_2$ is height in millimeters. Then $b_2 > b_1$ (and similarly for c, d).

One solution to this problem is to standardize all the variables prior to the analysis.
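For instance, a quick sketch in Python (the variable names and values are purely illustrative) showing that z-standardization removes the unit-of-measurement difference in the height example above:

```python
import numpy as np

def standardize(v):
    """Return v as z-scores: mean 0, sample standard deviation 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(1)
height_in = rng.normal(70, 3, size=100)  # heights in inches
height_mm = height_in * 25.4             # the same heights in millimeters

# After standardizing, the unit of measurement no longer matters:
z_in, z_mm = standardize(height_in), standardize(height_mm)
print(np.allclose(z_in, z_mm))  # True: identical z-scores
```

Because z-scores are invariant to positive linear rescaling, coefficients from regressions on standardized variables are comparable across DVs measured in different units.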

For testing the differences, see this thread
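One way such a test can be set up on a single sample (a sketch, not necessarily the exact method in the linked thread) is to stack the two regressions into one long-format model with a DV indicator and full interactions, then test the interaction coefficient. All variable names and effect sizes below are made up for illustration:

```python
# Hypothetical sketch: test c1 - c2 = 0 by stacking both regressions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y1"] = 1 + 0.5*df.x1 + 0.8*df.x2 + 0.3*df.x3 + rng.normal(size=n)
df["y2"] = 2 + 0.5*df.x1 + 0.4*df.x2 + 0.3*df.x3 + rng.normal(size=n)

# Long format: one row per (observation, equation); d = 1 for equation 2.
long = pd.concat([df.assign(y=df.y1, d=0),
                  df.assign(y=df.y2, d=1)], ignore_index=True)
long["obs"] = np.tile(np.arange(n), 2)
for v in ["x1", "x2", "x3"]:
    long[v + "_d"] = long[v] * long["d"]

# The coefficient on x2_d estimates c2 - c1; cluster by observation
# because both equations are fit to the same units, so their errors
# may be correlated.
fit = smf.ols("y ~ x1 + x2 + x3 + d + x1_d + x2_d + x3_d",
              data=long).fit(cov_type="cluster",
                             cov_kwds={"groups": long["obs"]})
print(fit.t_test("x2_d = 0"))
```

A rejection of `x2_d = 0` is then direct evidence that the two coefficients differ, rather than a mere eyeball comparison of point estimates.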

Peter Flom
  • Thanks. In this case, two of the Y variables are on the same scale (as far as I can tell from what the authors write) and one is quite a bit larger than the other two. In fact, Y1 and Y2 are measured on a 7-point scale and Y3 is Y1*Y2. I guess this complicates things? – Peter Verbeet Dec 27 '13 at 17:00
  • With respect to the thread you posted the link to, it seems like that refers to the case with separate samples rather than the models being run on the same sample (as is the case here). Perhaps that matters? – Peter Verbeet Dec 27 '13 at 17:06
  • Regarding your first comment, yes, it does. Regarding your second, I am not sure ... That is one reason I didn't mark this as a duplicate. – Peter Flom Dec 27 '13 at 17:28

What needs clarification in the paper is the definition of the term "stronger effect".
In a linear regression setup, each estimated coefficient measures the marginal effect of a regressor on the conditional expected value of the DV, holding the other regressors fixed.

Assume then that

$$\frac {\partial \hat E(Y_1\mid \mathbf X)}{\partial X_1} = \hat c_1 > \hat c_2 = \frac {\partial \hat E(Y_2\mid \mathbf X)}{\partial X_1}$$

For example, say that $ \hat c_1 =10\hat c_2$. Now assume that the conditional means of the DVs, evaluated at the regressors' sample means, satisfy

$$\hat E(Y_1\mid \mathbf X=E(\mathbf X)) = 100\hat E(Y_2\mid \mathbf X=E(\mathbf X))$$

Then the elasticities, evaluated say at the regressor sample means, are

$$\varepsilon_{Y_1,X_1} =\frac {\partial \hat E(Y_1\mid \mathbf X)}{\partial X_1} \cdot \frac {E(X_1)}{\hat E(Y_1\mid \mathbf X=E(\mathbf X))} \\ = 10\hat c_2 \cdot \frac {E(X_1)}{100\hat E(Y_2\mid \mathbf X=E(\mathbf X))} = 0.1\,\varepsilon_{Y_2,X_1} \ll \varepsilon_{Y_2,X_1}$$

While the marginal effect of $X_1$ on $Y_1$ is ten times larger in absolute terms, it is just 10% of the percentage effect that $X_1$ has on $Y_2$ (as measured by the elasticity).
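A quick numeric check of this example (the specific values chosen for $\hat c_2$, $E(X_1)$, and $\hat E(Y_2\mid\cdot)$ are arbitrary):

```python
# Illustrative values; only the ratios c1/c2 = 10 and Ey1/Ey2 = 100 matter.
c2, Ex1, Ey2 = 0.2, 5.0, 3.0
c1 = 10 * c2    # marginal effect on Y1 is 10x larger...
Ey1 = 100 * Ey2  # ...but Y1's conditional mean is 100x larger

eps1 = c1 * Ex1 / Ey1  # elasticity of Y1 w.r.t. X1
eps2 = c2 * Ex1 / Ey2  # elasticity of Y2 w.r.t. X1
print(eps1 / eps2)     # 0.1: one tenth of the percentage effect
```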

This was just an example to show that the concept of "stronger effect" must be carefully defined and justified in the context of the specific real-world phenomenon under study; it may be the case that the absolute and not the percentage effect is meaningful/relevant after all.

Your concern regarding whether the difference is statistically significant is also legitimate; but again, one must construct a framework within which the difference can be formally tested.

Another methodological issue is that if $Y_3 = Y_1 Y_2$, then for internal consistency we must have

$$Y_3 = (a_1 + b_1X_1 + c_1X_2 + d_1X_3 + e_1)\cdot (a_2 + b_2X_1 + c_2X_2 + d_2X_3 + e_2)$$ If you carry out the multiplication, you will obtain the internally consistent specification for the $Y_3$ DV. Matching this specification against the one postulated in the paper shows that all sorts of relations between the coefficients arise, and also that squared regressors and interaction terms are "hidden" in the $e_3$ error term, creating endogeneity of the regressors and hence undermining the validity of the estimation in the $Y_3$ equation.
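This expansion can be verified symbolically; a sketch using sympy (symbol names mirror the equations above):

```python
# Expand the product of the two linear equations to expose the squared
# and interaction terms that a linear specification for Y3 would push
# into its error term.
import sympy as sp

x1, x2, x3, e1, e2 = sp.symbols("x1 x2 x3 e1 e2")
a1, b1, c1, d1 = sp.symbols("a1 b1 c1 d1")
a2, b2, c2, d2 = sp.symbols("a2 b2 c2 d2")

y1 = a1 + b1*x1 + c1*x2 + d1*x3 + e1
y2 = a2 + b2*x1 + c2*x2 + d2*x3 + e2
y3 = sp.expand(y1 * y2)

# The expansion contains x1**2, x1*x2, x1*e2, ... terms; a model that is
# linear in x1, x2, x3 buries all of these in its error term.
print(y3.coeff(x1**2) == b1*b2)            # True
print(y3.coeff(x1*x2) == b1*c2 + b2*c1)    # True
```

The cross terms between regressors and the structural errors ($x_1 e_2$, $x_2 e_1$, etc.) are what make the composite error correlated with the regressors, i.e. the endogeneity described above.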

Alecos Papadopoulos