Interpretation of quantitative variable in regressions with and without dummy variables

Question

I was provided with results from two regressions:

(1)

log(annual_salary) = B1*yrs_experience + B2*PhD + B3*Masters + B4*Bachelors + e

(2)

log(annual_salary) = B1*yrs_experience + e

where 'annual_salary' and 'yrs_experience' are quantitative variables, and 'PhD', 'Masters' and 'Bachelors' are qualitative (dummy) variables coded as 1 when the person holds that degree and 0 otherwise.

Should/will the coefficient on 'yrs_experience' be the same in both equations? And, what is the interpretation of the coefficient in both equations?

Note that the intercept is omitted in both cases. In (1), it is omitted because the researchers are interested in each group on its own, not in its relation to a base group. This seems fine. In (2), however, I am not convinced that the researchers are doing the right thing in removing the intercept, based on the discussion here.

The second one means that people with 0 yrs_experience will get an annual_salary of $1. If this is acceptable, the model is acceptable. If not, you need to add an intercept to the model. - user158565, Oct 22 '18 at 15:54

Isabella Ghement · Accepted Answer · 2018-10-22T17:44:57.483

The first regression model is in effect a collection of 3 sub-models, with each sub-model being applicable to a different type of degree holder.

Sub-model (1a): PhD degree holders

log(annual_salary) = B2 + B1*yrs_experience + e

Sub-model (1b): Masters degree holders

log(annual_salary) = B3 + B1*yrs_experience + e

Sub-model (1c): Bachelors degree holders

log(annual_salary) = B4 + B1*yrs_experience + e

The sub-models assume that each type of degree holder starts out with a different baseline log(annual_salary) but after that the log(annual_salary) increases at the same rate for each additional year of experience.
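To make the "parallel lines" reading concrete, here is a minimal sketch of the three sub-models as a single prediction function. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from the regressions in the question.

```python
# Hypothetical coefficient values, chosen only for illustration.
B1 = 0.03  # common slope: change in log(annual_salary) per year of experience
BASELINES = {"PhD": 11.2, "Masters": 11.0, "Bachelors": 10.8}  # stand-ins for B2, B3, B4


def predicted_log_salary(degree, yrs_experience):
    """Model (1): exactly one dummy equals 1, so its coefficient acts as the intercept."""
    return BASELINES[degree] + B1 * yrs_experience


# The gap between two degrees is the same at every experience level,
# because all three lines share the slope B1.
for yrs in (0, 5, 20):
    gap = predicted_log_salary("PhD", yrs) - predicted_log_salary("Bachelors", yrs)
    print(yrs, round(gap, 6))
```

Because exactly one of the three dummies equals 1 for each person, dropping the overall intercept loses nothing here: B2, B3 and B4 play the role of group-specific intercepts.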

The coefficient B1 in each sub-model represents the amount by which the log(annual_salary) changes, on average, for each year of additional experience among people holding the same degree. (The change will most likely be an increase.)
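Since the response is on the log scale, a change of B1 in log(annual_salary) corresponds to multiplying the predicted annual_salary by exp(B1). A small sketch of that conversion follows; the function name and the value 0.03 are illustrative assumptions, not results from the question.

```python
import math


def pct_change_per_year(b1):
    """Percent change in predicted annual_salary for one more year of experience,
    given a slope b1 on the log(annual_salary) scale."""
    return (math.exp(b1) - 1.0) * 100.0


# Hypothetical slope of 0.03 on the log scale.
print(round(pct_change_per_year(0.03), 2))
```

For small values of B1 the result is close to 100*B1, which is why such a coefficient is often read loosely as "about a 3% increase per year" when B1 is 0.03.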

The second model looks at the relationship between log(annual_salary) and yrs_experience, regardless of (or ignoring) the degree type.

The coefficient B1 in the second model represents the amount by which the log(annual_salary) changes, on average, for each year of additional experience regardless of (or ignoring) the type of degree.

As pointed out by a_statistician, the second model implies that if someone, regardless of their degree, has zero years of experience, then their predicted log(annual_salary) would be equal to 0, which is to say that their predicted annual_salary would be equal to 1 (in whatever units annual_salary was recorded in the data). This implication may be either too strong or nonsensical altogether, so I concur with you that the second model desperately needs an intercept.
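The effect of forcing the line through the origin can be sketched on a toy data set. The data below are synthetic and noise-free (an assumption made purely for illustration), generated from log(annual_salary) = 10.5 + 0.04*yrs_experience.

```python
import math

# Synthetic, noise-free data (illustrative assumption only).
yrs = [0, 2, 5, 10, 20]
log_salary = [10.5 + 0.04 * x for x in yrs]


def fit_through_origin(x, y):
    """Least-squares slope for y = b1*x (no intercept), as in model (2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_intercept(x, y):
    """Least-squares (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


b1_origin = fit_through_origin(yrs, log_salary)
b0, b1 = fit_with_intercept(yrs, log_salary)

# Without an intercept, the prediction at zero experience is forced to
# log(annual_salary) = 0, i.e. annual_salary = exp(0) = 1.
print(math.exp(b1_origin * 0))

# With an intercept, the fit recovers the generating line on this noise-free data.
print(round(b0, 4), round(b1, 4))

# The no-intercept slope is inflated to compensate for the missing baseline.
print(round(b1_origin, 4))
```

On these data the no-intercept slope comes out far larger than 0.04, because the slope has to absorb the baseline that the model is not allowed to estimate. The same mechanism means B1 in model (2) need not match B1 in model (1).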

There could be a third model considered, which allows the rate of change in the log(annual_salary) as a function of yrs_experience to be different for different degree holders.
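In the notation of the question, such a model could be written as log(annual_salary) = B2*PhD + B3*Masters + B4*Bachelors + B5*PhD*yrs_experience + B6*Masters*yrs_experience + B7*Bachelors*yrs_experience + e, where the names B5, B6 and B7 are introduced here only for illustration. Its least-squares coefficient estimates coincide with fitting a separate straight line within each degree group, which the sketch below does on hypothetical noise-free data.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical noise-free data: each degree has its own baseline AND its own slope.
data = {
    "PhD": ([0, 5, 10], [11.2, 11.45, 11.7]),         # generated with slope 0.05
    "Masters": ([0, 5, 10], [11.0, 11.2, 11.4]),      # generated with slope 0.04
    "Bachelors": ([0, 5, 10], [10.8, 10.95, 11.1]),   # generated with slope 0.03
}

# One line per degree: group-specific intercept and group-specific slope.
fits = {degree: fit_line(x, y) for degree, (x, y) in data.items()}
for degree, (intercept, slope) in fits.items():
    print(degree, round(intercept, 3), round(slope, 3))
```

If the fitted slopes turn out to be similar across degrees, the simpler model (1) with a common B1 may be adequate; if they differ markedly, the common-slope assumption is doubtful.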
