This post analyzes the situation; explains the sense in which linear relations are transitive; locates the problem in a lack of transitivity of the slope estimates; and characterizes all cases where transitivity of the estimates does hold.
You did well by explicitly writing the error terms. Let's name them (because they are separate things) and combine the models. Thus (using subscripts to distinguish separate observations) plugging
$$x_i = c + dz_i + \delta_i\tag{1}$$
into
$$y_i = a + bx_i + \varepsilon_i\tag{2}$$
gives
$$y_i = a + b\left(c + dz_i + \delta_i\right) + \varepsilon_i = (a + bc) + (bd)z_i + (b\delta_i + \varepsilon_i).$$
This is a standard regression model in the form
$$y_i = f + gz_i + \gamma_i\tag{3}$$
where $f= a+bc,$ $g = bd,$ and $\gamma_i = b\delta_i + \varepsilon_i.$ Moreover, the implicit assumptions about the errors (independent, zero means, and identical variances) in models $(1)$ and $(2)$ imply those assumptions hold for $(3),$ at least if we assume the $\delta_i$ are independent of the $\varepsilon_i,$ because
$$E[\gamma_i] = E[b\delta_i + \varepsilon_i] = b(0) + 0 = 0$$
and
$$\operatorname{Var}(\gamma_i) = b^2\operatorname{Var}(\delta_i) + \operatorname{Var}(\varepsilon_i) = \text{constant}.$$
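If you like to check such algebra numerically, here is a minimal simulation sketch (the parameter values, noise scales, and sample size are arbitrary choices for illustration; note that `np.polyfit` returns the slope before the intercept):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for models (1) and (2).
a, b = 1.0, 2.0     # y = a + b*x + eps
c, d = -0.5, 3.0    # x = c + d*z + delta
n = 100_000

z = rng.normal(size=n)
x = c + d * z + rng.normal(scale=0.5, size=n)  # the delta_i
y = a + b * x + rng.normal(scale=0.5, size=n)  # the eps_i

# Model (3): regress y on z directly.
g_hat, f_hat = np.polyfit(z, y, 1)
print(f_hat, a + b * c)  # both close to 1 + 2*(-0.5) = 0
print(g_hat, b * d)      # both close to 2*3 = 6
```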
What you will find, though, is that the error terms connecting $X$ and $Z$ change the parameter estimates. That is because the estimates depend on the particular values taken by the independent variable ($X$ or $Z$), and introducing the errors $\delta_i$ into the $X$-$Z$ regression changes the distribution of those values.
We can work out the difference. Putting hats on the names to denote estimates, as usual, the least squares regression formulas are:

- In model $(1)$, $\hat d = \operatorname{Cov}(\mathbf x, \mathbf z) / \operatorname{Var}(\mathbf z).$
- In model $(2)$, $\hat b = \operatorname{Cov}(\mathbf y, \mathbf x) / \operatorname{Var}(\mathbf x).$
- In model $(3)$, $\hat g = \operatorname{Cov}(\mathbf y, \mathbf z) / \operatorname{Var}(\mathbf z).$
Therefore
$$\left(\hat b\right)\left(\hat d\right) = \frac{\operatorname{Cov}(\mathbf x, \mathbf z)\operatorname{Cov}(\mathbf y, \mathbf x)}{\operatorname{Var}(\mathbf z)\operatorname{Var}(\mathbf x)}.$$
Although we might hope this equals $\hat g,$ that would be tantamount to claiming
$$\operatorname{Cov}(\mathbf y, \mathbf z) \overset{?}{=}\frac{\operatorname{Cov}(\mathbf x, \mathbf z)\operatorname{Cov}(\mathbf y, \mathbf x)}{\operatorname{Var}(\mathbf x)}.\tag{*}$$
Upon dividing both sides by the standard deviations of $\mathbf y$ and $\mathbf z$ (and writing $\operatorname{Var}(\mathbf x)$ as the square of the standard deviation of $\mathbf x$) we obtain
$$\operatorname{Cor}(\mathbf y, \mathbf z) \overset{?}{=}\operatorname{Cor}(\mathbf x, \mathbf z)\operatorname{Cor}(\mathbf y, \mathbf x).$$
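A quick numerical check makes this concrete. The following sketch uses arbitrary simulated data; any generic dataset will exhibit the same failure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
z = rng.normal(size=n)
x = -0.5 + 3.0 * z + rng.normal(size=n)  # model (1) with noise
y = 1.0 + 2.0 * x + rng.normal(size=n)   # model (2) with noise

def slope(v, u):
    """Least-squares slope for regressing v on u: Cov(v, u) / Var(u)."""
    return np.cov(v, u)[0, 1] / np.var(u, ddof=1)

def cor(u, v):
    return np.corrcoef(u, v)[0, 1]

d_hat = slope(x, z)  # model (1)
b_hat = slope(y, x)  # model (2)
g_hat = slope(y, z)  # model (3)
print(b_hat * d_hat, g_hat)  # close, but not equal

# The corresponding correlation identity also fails:
print(cor(y, z), cor(x, z) * cor(y, x))  # generally unequal
```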
There's no reason for correlations to satisfy this property, and counterexamples are easy to construct. Geometrically, $\mathbf x,$ $\mathbf y,$ and $\mathbf z$ are vectors (once centered) and the correlations are the cosines of the angles they form. Why should we suppose the angle between any two of them is determined by the other two angles? In fact, you can take three rods $x,$ $y,$ and $z;$ connect $z$ to $x$ securely and $y$ to $x$ securely, yet (except in special cases) still create a large range of angles between $y$ and $z.$
This geometric insight can be turned into simple examples. Consider this dataset of $(x,y,z)$ values: $(1,0,0),$ $(0,1,0),$ and $(0,0,1).$ In all three models the estimates are $1/2$ for the intercept and $-1/2$ for the slope; but obviously $-1/2 \ne (-1/2)\times(-1/2).$
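A few lines of code confirm the arithmetic (a sketch; again `np.polyfit` returns the slope first):

```python
import numpy as np

# The three-point counterexample; column i holds the i-th (x, y, z) observation.
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
z = np.array([0.0, 0.0, 1.0])

for response, predictor in [(x, z), (y, x), (y, z)]:  # models (1), (2), (3)
    slope, intercept = np.polyfit(predictor, response, 1)
    print(intercept, slope)  # prints 0.5 and -0.5 every time
```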
The final nail in this coffin is to observe that applying the correlation form of $(*)$ to two cyclic permutations of $\mathbf{x}, \mathbf{y}, \mathbf{z}$ and multiplying all three versions together gives
$$\operatorname{Cor}(\mathbf y, \mathbf z)\operatorname{Cor}(\mathbf z, \mathbf x) \operatorname{Cor}(\mathbf x, \mathbf y)=\left(\operatorname{Cor}(\mathbf y, \mathbf z)\operatorname{Cor}(\mathbf z, \mathbf x) \operatorname{Cor}(\mathbf x, \mathbf y)\right)^2,$$
showing that $(*)$ can hold in all three of these forms simultaneously only when the product of the three correlations is either $0$ or $1.$ That means either (a) at least one pair of variables is uncorrelated or (b) all pairs are perfectly correlated (with an even number of correlations equal to $-1$).
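Conversely, case (b) is easy to exhibit: make the data exactly collinear and the slope estimates compose exactly. A sketch with arbitrary values:

```python
import numpy as np

# Case (b): exactly collinear data, so every pairwise correlation is +/-1.
z = np.array([1.0, 2.0, 3.0, 4.0])
x = -0.5 + 3.0 * z   # no error term: Cor(x, z) = 1
y = 1.0 + 2.0 * x    # no error term: Cor(y, x) = 1

d_hat = np.polyfit(z, x, 1)[0]  # 3.0
b_hat = np.polyfit(x, y, 1)[0]  # 2.0
g_hat = np.polyfit(z, y, 1)[0]  # 6.0
print(np.isclose(b_hat * d_hat, g_hat))  # True: the estimates compose exactly
```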