
Consider the following regression model $$ y=\beta _{1}x_{1}+\beta _{2}x_{2}+u $$ where $x_{1}$ and $x_{2}$ are two random variables and $u$ is a disturbance term. I simulate the model, drawing $x_{1}$, $x_{2}$ and $u$ from a standard normal distribution (mean 0, standard deviation 1) and making the two regressors $x_{1}$ and $x_{2}$ correlated with a correlation coefficient of $0.995$. I set $\beta _{1}=\beta _{2}=0.5$.
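For reference, here is a minimal sketch of that simulation in Python/NumPy (not the original code; the sample size $n=1000$ and the construction $x_{2}=\rho x_{1}+\sqrt{1-\rho ^{2}}\,e$ used to induce the correlation are assumptions consistent with the tables below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, b1, b2 = 1000, 0.995, 0.5, 0.5   # n = 1000 assumed from the tables below

for draw in range(5):
    x1 = rng.standard_normal(n)
    # make x2 standard normal with corr(x1, x2) = rho
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = b1 * x1 + b2 * x2 + u

    X = np.column_stack([x1, x2])        # no intercept, as in the model
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(draw + 1, beta_hat.round(3), beta_hat.sum().round(3))
```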

The high correlation causes the expected problems, visible in the large swings of the estimated coefficients. What strikes me, however, is that their sum ($\hat{\beta}_{1}+\hat{\beta}_{2}$) seems rather stable and centers around the true sum of 1. The table below shows the coefficients and their sum for five different draws.

|                                   | (1)    | (2)   | (3)     | (4)   | (5)   |
|-----------------------------------|--------|-------|---------|-------|-------|
| $\hat{\beta}_{1}$                 | -0.445 | 0.467 | 0.966   | 0.538 | 0.189 |
| $\hat{\beta}_{2}$                 | 1.451  | 0.505 | -0.0443 | 0.479 | 0.862 |
| $\hat{\beta}_{1}+\hat{\beta}_{2}$ | 1.006  | 0.972 | 1.013   | 1.012 | 1.051 |

When one coefficient goes up, the other seems to go down.

Question: Is this something well known? Does this have a name?

This may well be trivial, but it seems somehow related to the fact that, while OLS estimates the individual coefficients very unreliably when the regressors are highly correlated, it still allows us to predict $y$ fairly well. I am referring to what Diebold calls p-consistency (see, for example, his discussion of that concept).
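A rough variance calculation, assuming unit-variance regressors and disturbance and no intercept as in the simulation above, seems to explain the pattern. With correlation $\rho$ between the regressors, $X'X \approx n\begin{pmatrix}1 & \rho \\ \rho & 1\end{pmatrix}$, so

$$\operatorname{Var}(\hat{\beta}) = \sigma^{2}(X'X)^{-1} \approx \frac{\sigma^{2}}{n(1-\rho^{2})}\begin{pmatrix}1 & -\rho \\ -\rho & 1\end{pmatrix}.$$

Each coefficient alone has variance $\sigma^{2}/\bigl(n(1-\rho^{2})\bigr)$, which blows up as $\rho \to 1$, and the covariance is strongly negative, which is why one estimate goes down when the other goes up. Their sum, however, has

$$\operatorname{Var}(\hat{\beta}_{1}+\hat{\beta}_{2}) \approx \frac{\sigma^{2}(2-2\rho)}{n(1-\rho^{2})} = \frac{2\sigma^{2}}{n(1+\rho)},$$

which stays small. With $n=1000$, $\sigma=1$ and $\rho=0.995$ this gives standard errors of roughly $0.32$ for each coefficient (about what the tables show) but only about $0.03$ for their sum. And since $x_{1}\approx x_{2}$, the fitted value $\hat{\beta}_{1}x_{1}+\hat{\beta}_{2}x_{2}\approx(\hat{\beta}_{1}+\hat{\beta}_{2})x_{1}$ depends essentially only on the well-determined sum, which would explain why prediction remains good.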

The full table with the estimation results of the five regressions is

|       | (1) y            | (2) y          | (3) y           | (4) y          | (5) y           |
|-------|------------------|----------------|-----------------|----------------|-----------------|
| x1_1  | -0.445 (0.310)   |                |                 |                |                 |
| x2_1  | 1.451*** (0.308) |                |                 |                |                 |
| x1_2  |                  | 0.467 (0.313)  |                 |                |                 |
| x2_2  |                  | 0.505 (0.313)  |                 |                |                 |
| x1_3  |                  |                | 0.966** (0.326) |                |                 |
| x2_3  |                  |                | 0.0443 (0.328)  |                |                 |
| x1_4  |                  |                |                 | 0.538 (0.320)  |                 |
| x2_4  |                  |                |                 | 0.479 (0.319)  |                 |
| x1_5  |                  |                |                 |                | 0.189 (0.312)   |
| x2_5  |                  |                |                 |                | 0.862** (0.312) |
| N     | 1000             | 1000           | 1000            | 1000           | 1000            |
| R-sq  | 0.5114           | 0.4926         | 0.4861          | 0.5155         | 0.5467          |

Standard errors in parentheses

\* p<0.05, \*\* p<0.01, \*\*\* p<0.001
    It's called *multicollinearity.* But I see you have already used that tag, suggesting you are trying to ask something else. Could you elaborate? – whuber Jan 22 '21 at 15:56
  • Yes, but does multicollinearity mean that the sum of the two coefficients must somehow add up to the true sum? This seems reasonable, I believe, but I wasn't sure about it. Also, if this would not be the case, we couldn't predict y so well despite the inconsistently estimated coefficients. – Bert Breitenfelder Jan 22 '21 at 16:02
  • I have difficulties formatting the second table. In the preview, the table looks fine but when I post it, it gets messed up. Sorry for that. – Bert Breitenfelder Jan 22 '21 at 16:02
  • "Multicollinearity" means almost collinear. Collinearity, in turn, implies at least one linear combination of the variables is constant. The dual version of that is at least one linear combination of the parameter estimates is constant. – whuber Jan 22 '21 at 16:10
  • I see. Yes, that's the answer. Thanks. – Bert Breitenfelder Jan 22 '21 at 16:13

0 Answers