I am trying to run an OLS regression with log per capita calories as my dependent variable and the age and years of education of the household head, plus log per capita expenditure, as my independent variables (other controls will be added eventually). When I run the regression with only age and education as controls, both are positive and significant. However, as soon as I add log per capita expenditure, the coefficient on education becomes negative and significant. I am puzzled by this result, because the correlation between education and log per capita expenditure is not that large (0.33) and all VIFs are 1.20 or lower. I have posted my regression results below, together with the VIFs, summary statistics and pairwise correlations (the summary statistics and correlations are unweighted and use 3,698 observations, whereas the regressions use 3,355 weighted observations). Could someone help me understand what is going on here? I realize that this sort of problem might (or might not) be overcome with techniques other than OLS, but I have just started learning OLS and would like to understand how to deal with it in OLS, or at least why OLS cannot deal with it.
Thanks,
Monzur
. regress log_pccal age_hhhead eduy_hhhead [pw=hhweight], r
Linear regression                                      Number of obs =    3355
                                                       F(  2,  3352) =  105.40
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0692
                                                       Root MSE      =  .25583
------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |   .0049182   .0003602    13.65   0.000      .004212    .0056244
 eduy_hhhead |   .0075136   .0011997     6.26   0.000     .0051613    .0098659
       _cons |   7.537586   .0171067   440.62   0.000     7.504045    7.571126
------------------------------------------------------------------------------
. regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r
Linear regression                                      Number of obs =    3355
                                                       F(  3,  3351) =  601.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4123
                                                       Root MSE      =  .20332
------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |    .001919   .0002945     6.52   0.000     .0013415    .0024964
 eduy_hhhead |  -.0082508    .001044    -7.90   0.000    -.0102977   -.0062039
   log_pcexp |   .3777407   .0100402    37.62   0.000     .3580552    .3974262
       _cons |   4.795607   .0730719    65.63   0.000     4.652337    4.938877
------------------------------------------------------------------------------
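One way to see where the sign change comes from is the omitted-variable identity for OLS. Assuming both regressions use exactly the same 3,355 observations and the same weights (the robust option changes only the standard errors, not the coefficients), the education coefficient without log_pcexp equals the education coefficient with it plus the log_pcexp coefficient times delta_edu, where delta_edu is the coefficient on eduy_hhhead in an auxiliary regression of log_pcexp on age_hhhead and eduy_hhhead (for example `regress log_pcexp age_hhhead eduy_hhhead [pw=hhweight]`, which is not run in this post). Plugging in the two outputs above gives the implied delta_edu:

```latex
\tilde{\beta}_{\text{edu}} = \hat{\beta}_{\text{edu}} + \hat{\beta}_{\text{exp}}\,\hat{\delta}_{\text{edu}}
\quad\Longrightarrow\quad
\hat{\delta}_{\text{edu}}
  = \frac{0.0075136 - (-0.0082508)}{0.3777407}
  = \frac{0.0157644}{0.3777407}
  \approx 0.0417
```

If that assumption holds, each extra year of the head's education goes with roughly 0.042 higher log per capita expenditure at a given age, and expenditure has a large positive coefficient on calories. The positive education coefficient in the first regression is then largely expenditure showing up through education; once expenditure is held fixed, the remaining association between education and calories is negative.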
. estat vif
    Variable |       VIF       1/VIF
-------------+----------------------
   log_pcexp |      1.20    0.832228
 eduy_hhhead |      1.16    0.863121
  age_hhhead |      1.07    0.930743
-------------+----------------------
    Mean VIF |      1.14
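A low VIF only says that a regressor is not well predicted by the other regressors; it does not rule out a sign change. As a rough cross-check, the sketch below recomputes the VIFs as the diagonal of the inverse of the regressors' correlation matrix, using the unweighted correlations from the pwcorr output in this post (so the numbers can differ slightly from the weighted estat vif results). It is plain Python with a hand-rolled solver so it needs no packages; the helper names `solve` and `vifs` are illustrative, not from any library.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def vifs(corr):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    n = len(corr)
    out = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]
        out.append(solve(corr, e)[j])  # column j of corr^{-1}, entry j
    return out


names = ["age_hhhead", "eduy_hhhead", "log_pcexp"]
# Correlations among the three regressors, copied from the pwcorr output.
R = [
    [1.0, -0.1133, 0.1796],
    [-0.1133, 1.0, 0.3254],
    [0.1796, 0.3254, 1.0],
]

vif = dict(zip(names, vifs(R)))
for name, v in vif.items():
    print(f"{name:>12}  VIF = {v:.3f}")
```

With these inputs the values come out at roughly 1.07, 1.16 and 1.18, close to the estat vif figures, which confirms that multicollinearity in the usual sense is not what drives the sign change.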
. su log_pccal eduy_hhhead log_pcexp, d
log_pccal
-------------------------------------------------------------
Obs 3698
Mean 7.783589
Std. Dev. .276406
Variance .0764003
Skewness .0350145
Kurtosis 3.511389
years of education of household head
-------------------------------------------------------------
Obs 3698
Sum of Wgt. 3698
Mean 2.984857
Std. Dev. 3.776812
Variance 14.26431
Skewness .9461994
Kurtosis 2.751041
log of hh per capita expenditure
-------------------------------------------------------------
Obs 3698
Sum of Wgt. 3698
Mean 7.762185
Std. Dev. .4636838
Variance .2150027
Skewness .4395734
Kurtosis 3.433132
. pwcorr log_pccal age_hhhead eduy_hhhead log_pcexp, sig
             | log~ccal age_hh~d eduy_h~d log_pc~p
-------------+------------------------------------
   log_pccal |   1.0000
             |
             |
  age_hhhead |   0.2282   1.0000
             |   0.0000
             |
 eduy_hhhead |   0.0855  -0.1133   1.0000
             |   0.0000   0.0000
             |
   log_pcexp |   0.6401   0.1796   0.3254   1.0000
             |   0.0000   0.0000   0.0000
             |
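The correlation table above already contains most of the explanation. Ignoring age for a moment, the standardized coefficient on education in a regression of log_pccal on education and expenditure is (r(y,edu) - r(y,exp)*r(edu,exp)) / (1 - r(edu,exp)^2). Here the numerator is 0.0855 - 0.6401*0.3254, which is about -0.123: the part of the calorie-education correlation that runs through expenditure (about 0.208) is larger than the whole calorie-education correlation (0.0855), so the education effect net of expenditure is negative. The sketch below solves R_xx b = r_xy for both specifications and converts the education coefficient back to original units with the standard deviations from su, d. It assumes the unweighted full-sample correlations are close to the weighted regression sample, so it approximates rather than reproduces the regressions; all helper names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Unweighted correlations from pwcorr; y = log_pccal.
CORR = {
    ("y", "age"): 0.2282, ("y", "edu"): 0.0855, ("y", "exp"): 0.6401,
    ("age", "edu"): -0.1133, ("age", "exp"): 0.1796, ("edu", "exp"): 0.3254,
}


def corr(a, b):
    if a == b:
        return 1.0
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]


def std_betas(xs):
    """Standardized OLS slopes of y on xs: solve R_xx beta = r_xy."""
    rxx = [[corr(a, b) for b in xs] for a in xs]
    rxy = [corr("y", a) for a in xs]
    return dict(zip(xs, solve(rxx, rxy)))


# Standard deviations from `su, d` (age was not summarized, so it stays standardized).
SD = {"y": 0.276406, "edu": 3.776812, "exp": 0.4636838}


def to_raw(beta_std, x):
    """Convert a standardized slope back to original units."""
    return beta_std * SD["y"] / SD[x]


without_exp = std_betas(["age", "edu"])
with_exp = std_betas(["age", "edu", "exp"])

edu_without = to_raw(without_exp["edu"], "edu")
edu_with = to_raw(with_exp["edu"], "edu")
exp_with = to_raw(with_exp["exp"], "exp")

# Direct correlation vs. the path through expenditure.
direct = corr("y", "edu")
indirect = corr("y", "exp") * corr("edu", "exp")

print(f"edu coef without log_pcexp: {edu_without:+.4f}")
print(f"edu coef with log_pcexp:    {edu_with:+.4f}")
print(f"log_pcexp coef:             {exp_with:+.4f}")
print(f"direct r = {direct:.4f}, indirect r*r = {indirect:.4f}")
```

Under these assumptions the education coefficient comes out at about +0.0083 without expenditure and about -0.0087 with it, close to the +0.0075 and -0.0083 estimated above. The sign flip needs no high collinearity, only an indirect path through expenditure that outweighs the direct correlation.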