
I'm using "decompose" in a figurative sense here.

I have a simple regression model: $y = b + b_1x_1 + b_2x_2 + e$

The "e" term is the residual and "b" the intercept.

I want to show the contribution of $x_1$ (and also of $x_2$) toward $y$.

For the contribution of $x_1$ I remove $x_2$ by setting it to zero. That yields: $y|_{x_1} = b + b_1x_1$

For $x_2$ then: $y|_{x_2} = b + b_2x_2$

The problem is that these two contributions do not sum to the modeled $y$ values, as the intercept is double-counted: $y|_{x_1} + y|_{x_2} = 2b + b_1x_1 + b_2x_2 \neq y$
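For concreteness, here is a minimal numerical sketch of the double-counting (Python with numpy; the data and coefficients are made up for illustration, not from my actual model):

```python
import numpy as np

# Simulate a toy dataset: y = b + b1*x1 + b2*x2 + e
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.3, size=n)

# Ordinary least squares via the design matrix [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
b, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = b + b1 * x1 + b2 * x2      # fitted values
y_given_x1 = b + b1 * x1           # "contribution" with x2 set to zero
y_given_x2 = b + b2 * x2           # "contribution" with x1 set to zero

# The naive sum carries the intercept twice ...
print(np.allclose(y_given_x1 + y_given_x2, y_hat + b))  # True
# ... so subtracting one intercept recovers the fitted values.
print(np.allclose(y_given_x1 + y_given_x2 - b, y_hat))  # True
```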

When I force the regression through the origin this works fine, and there are solid physical reasons to do so, but is there a more elegant way to do this?

Comments on the answers here (I can't add comments below them for some reason):

To Peter: yes, that is it. Imagine a stacked bar chart. I want my bar chart to have three components: the amount of $y$ determined by $x_1$, the amount of $y$ determined by $x_2$, and the residual. These should add up to the raw data.

David
  • If you set e.g. $x_2$ to 0 then $b_1$ is only the contribution of $x_1$ when $x_2 = 0$. Is that what you want? – Peter Flom Nov 21 '13 at 20:22
  • This is not the right approach. Use instead the "matching" method described at http://stats.stackexchange.com/a/46508. Equivalently, perform multiple regression. Note that in any event the "contributions" of the factors are not additive unless all factors are orthogonal. – whuber Nov 21 '13 at 21:08
  • Yes, just use simple algebra to isolate that extra intercept, the answer should not be equal to $y$, but $y + \beta_0$. In a way, you're adding two regression planes on top of each other without noticing that the "height" of the regression planes was double counted. In other words, if you subtract the intercept from the sum of $y|x_1 + y|x_2$, you should be able to recover $\hat{y}$. Notice that it will not be equal to $y$, cause you haven't added the error term back to either side. – Penguin_Knight Nov 21 '13 at 21:12

2 Answers


In the regression specification

$$y = b_0 + b_1x_1 + b_2x_2 + u$$

1) The constant term, even if it does not emerge from the theoretical model behind the regression specification, captures the possibly non-zero mean of the error term. This means we know that in all likelihood there are other factors that affect $y$; we just hope that they do not co-vary with $x_1$ and/or $x_2$. (In the last 20 years I have seen perhaps one regression where the constant term appeared not to be statistically highly significant, reinforcing the "wisdom" of including it in the regression "no matter what".)

2) If these two regressors are correlated, then by eliminating one in an attempt to capture the "pure" effect of the other, you accomplish the exact opposite: the coefficient estimate of the remaining regressor will come from a biased estimator, and so it will have a higher probability of being misleading (this is the textbook case of "omitted variables" bias; a small simulation after point 3 illustrates it). If instead both are included, you come closer to estimating the marginal effect of each regressor (its coefficient), and hence its total contribution.

3) Finally, note that in the "addition" you attempt, you add two conditional values that are conditioned on different sets, and you add them unweighted. They will almost certainly not equal the unconditional quantity (analogously, if you have a Bernoulli r.v. $c \in \{0,1\}$, then in general $P(Z\mid c=1) + P(Z\mid c=0) \neq P(Z)$).
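A small simulation sketch, with hypothetical numbers, illustrating point 2: when $x_1$ and $x_2$ are correlated, dropping $x_2$ biases the estimated coefficient on $x_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)       # correlated regressors
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Full model: both estimates are close to the true values (2 and 3).
X_full = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X_full, y, rcond=None)[0])    # approx. [1.0, 2.0, 3.0]

# Omitting x2: the x1 coefficient absorbs part of x2's effect (about 2 + 3*0.8).
X_short = np.column_stack([np.ones(n), x1])
print(np.linalg.lstsq(X_short, y, rcond=None)[0])   # approx. [1.0, 4.4]
```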

Alecos Papadopoulos

If you want $x_1$'s contribution, you don't just set $x_2$ to zero; you also set every other unexplained variable influencing $y$ to zero. That makes $b = 0$, since $b$ captures the unexplained part of $y$. So for the $x_1$ contribution, $y = b_1x_1$; for the $x_2$ contribution, $y = b_2x_2$; and for everything else, $y = b$. Hope it helps.
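A short sketch of that decomposition (Python, hypothetical data): the baseline $b$, the $b_1x_1$ piece, the $b_2x_2$ piece, and the residual add back up to the raw $y$, so they can be stacked directly in a bar chart.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Additive pieces: baseline (intercept), x1 part, x2 part, residual.
baseline = np.full(n, b)
part_x1 = b1 * x1
part_x2 = b2 * x2
resid = y - (baseline + part_x1 + part_x2)

# They reconstruct the raw data exactly, so they stack to the observed y.
print(np.allclose(baseline + part_x1 + part_x2 + resid, y))  # True
```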

Vamsi