
$Y$ (scores among black students) $\sim X_1 + X_1^2 + X_2 + (X_1 * X_2) + (X_1^2 * X_2)$

X1           0.089626*** (e.g. same ethnic teacher)
X1^2         -0.008001***
X1*X2        0.003887*** (e.g. same ethnic teacher * principals' leadership)
X1^2*X2      -0.000231***

In this case, how can I interpret the (X1^2*X2) term?

SecretAgentMan
Jay
  • Without the coefficients this is unlikely to get a sensible answer. What is your scientific question which caused you to fit such a model? – mdewey Dec 24 '18 at 15:48
  • I have added the coefficients. The model is a two-level hierarchical linear model (students within schools). I would like to examine the effect of the share of black teachers on black students' scores and how it varies with principals' multicultural leadership. I assume that the relationship between that share and students' scores may be non-linear. – Jay Dec 24 '18 at 16:11
  • What sort of values do X1 and X2 take? From your very brief post they sound like binary (0 or 1) variables, in which case I don't understand why you would square one of them. That question aside, in a test-score context it would be very rare to have a productive use for all 3 types of effects (main effect, interaction and squared term) involving the same variable. – rolando2 Dec 24 '18 at 19:59
  • Sorry for the confusion. X1 is the percentage of black teachers and X2 is measured on a Likert scale. – Jay Dec 24 '18 at 21:24

1 Answer


This regression can be written more simply as:

$$Y \sim (X_1 + X_1^2)*X_2.$$

This model contains a main effect for the variable $X_2$, a second-order polynomial in the variable $X_1$, and the interaction between the two. In such a model, the main effects and interactions are:

$$\begin{matrix} \text{Main effect of variable } X_1 & & & & X_1+X_1^2 \\[6pt] \text{Main effect of variable } X_2 & & & & X_2 \\[6pt] \text{Interaction effect of variables } X_1 \text{ and } X_2 & & & & (X_1+X_1^2):X_2 \\[6pt] \end{matrix}$$

The individual term $X_1^2:X_2$ is not really meaningful in itself, since it is an interaction with only one of the terms in the second-order polynomial for your variable $X_1$. When interpreting the variables you should keep all the parts of your polynomial variable together.
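One way to keep the polynomial parts together is to look at the slope in $X_1$ as a whole. As a sketch, assume the fitted model is the ordinary polynomial-with-interaction regression $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_1 X_2 + \beta_5 X_1^2 X_2$ (the $\beta$ labels are introduced here for illustration and are not in the question). Then

$$\frac{\partial Y}{\partial X_1} = (\beta_1 + \beta_4 X_2) + 2(\beta_2 + \beta_5 X_2) X_1,$$

so the coefficient on $X_1^2 X_2$ describes how the curvature in $X_1$ shifts per unit of $X_2$. The Python snippet below evaluates this with the four coefficients reported in the question. The function names and the 1 to 5 range for $X_2$ are assumptions, since the question does not give the length of the Likert scale.

```python
# Coefficients reported in the question. The X2 main effect was not reported
# and is not needed for the slope in X1.
B_X1 = 0.089626        # X1
B_X1SQ = -0.008001     # X1^2
B_X1_X2 = 0.003887     # X1*X2
B_X1SQ_X2 = -0.000231  # X1^2*X2


def linear_coef(x2):
    """Coefficient on X1 at a given X2: b1 + b4*X2."""
    return B_X1 + B_X1_X2 * x2


def quadratic_coef(x2):
    """Coefficient on X1^2 at a given X2: b2 + b5*X2."""
    return B_X1SQ + B_X1SQ_X2 * x2


def marginal_effect(x1, x2):
    """Slope dY/dX1 = (b1 + b4*X2) + 2*(b2 + b5*X2)*X1."""
    return linear_coef(x2) + 2.0 * quadratic_coef(x2) * x1


def turning_point(x2):
    """Value of X1 at which the slope in X1 is zero, for a given X2."""
    return -linear_coef(x2) / (2.0 * quadratic_coef(x2))


if __name__ == "__main__":
    # Assumed 1-5 Likert range for X2; the question does not state the range.
    for x2 in range(1, 6):
        print(x2, round(quadratic_coef(x2), 6), round(turning_point(x2), 3))
```

With the reported estimates, the quadratic coefficient is negative for every non-negative $X_2$ and grows in magnitude as $X_2$ increases, so under these assumptions the curve in $X_1$ becomes more sharply concave at higher values of $X_2$.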

Ben
  • I don't follow. The original regression explicitly references five functionally independent variables and (perhaps, depending on what "$\sim$" is intended to mean), implicitly references a constant and therefore estimates either five or six independent parameters. Your model estimates only three or four (including the constant). In what sense is it equivalent, then? – whuber Dec 26 '18 at 20:30
  • The symbol $\sim$ is used in the same way the OP uses it in his question, i.e., as part of `R` syntax for a regression equation. (The symbol $*$ is product interaction and the symbol $:$ is single interaction.) In this syntax, the regression $Y \sim (X_1 + X_1^2) * X_2$ is just a shorthand for the equivalent regression $Y \sim X_1 + X_1^2 + X_2 + X_1:X_2 + X_1^2:X_2$ (both of which implicitly include constants, since there is no $-1$ term in either expression). – Ben Dec 26 '18 at 23:05
  • The model matrix for the formula `y ~ (x1 + x1^2)*x2` has four columns while the model matrix for the formula `y ~ x1 + I(x1^2) + x2 + I(x1*x2) + I(x1^2*x2)` corresponding to (perhaps one, but a natural one) interpretation of the OP's model has six columns. The models are not the same, so I'm struggling to see how your answer addresses the question. – whuber Dec 26 '18 at 23:35
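The column counts in the last comment can be reproduced by hand. The sketch below is written in Python rather than R and uses made-up data. It builds the columns each formula would yield, assuming R's formula rule that `^` denotes crossing (so `x1^2` reduces to `x1`) unless it is wrapped in `I()`. The function names are hypothetical.

```python
# Hypothetical toy data; the values are illustrative only.
x1 = [10.0, 20.0, 30.0, 40.0]
x2 = [1.0, 2.0, 3.0, 4.0]


def columns_without_I(x1, x2):
    """Columns for y ~ (x1 + x1^2)*x2, where x1^2 collapses to x1."""
    return {
        "(Intercept)": [1.0] * len(x1),
        "x1": list(x1),
        "x2": list(x2),
        "x1:x2": [a * b for a, b in zip(x1, x2)],
    }


def columns_with_I(x1, x2):
    """Columns for y ~ x1 + I(x1^2) + x2 + I(x1*x2) + I(x1^2*x2)."""
    return {
        "(Intercept)": [1.0] * len(x1),
        "x1": list(x1),
        "I(x1^2)": [a ** 2 for a in x1],
        "x2": list(x2),
        "I(x1*x2)": [a * b for a, b in zip(x1, x2)],
        "I(x1^2*x2)": [a ** 2 * b for a, b in zip(x1, x2)],
    }


if __name__ == "__main__":
    print(len(columns_without_I(x1, x2)), list(columns_without_I(x1, x2)))
    print(len(columns_with_I(x1, x2)), list(columns_with_I(x1, x2)))
```

Under that reading, the first design has four columns and the second has six, which matches the counts given in the comment.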