This is closely related to a question I asked yesterday, but I've now got a much more complete answer on which I was hoping to get feedback. The previous question was just looking for conceptual advice, and the replies were very helpful. You can find the relevant data and introduction here.

I wanted to find period effects for each age group. I've run two regressions using dummies as part of an interaction term. I'm hoping to find out whether my method is flawed and whether my interpretation of the results is correct. The first regression is as follows:

> ## Generate YearDummy and AgeGroupDummy using factor()
> 
> YearDummy <- factor(YearVar)
> AgeGroupDummy <- factor(AgeGroup)
> 
> ## Check to see that YearDummy and AgeGroupDummy are indeed factor variables
> 
> is.factor(YearDummy)
[1] TRUE
> is.factor(AgeGroupDummy)
[1] TRUE
> ## Regress on AgeGroup and include AgeGroup*YearDummy interaction terms
> 
> PooledOLS1 <- lm(PPHPY ~ AgeGroup + AgeGroup*YearDummy + 0, data=maildatapooled)
> summary(PooledOLS1)

Call:
lm(formula = PPHPY ~ AgeGroup + AgeGroup * YearDummy + 0, data = maildatapooled)

Residuals:
    Min      1Q  Median      3Q     Max
-38.852 -10.632   3.298  11.275  26.481

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
AgeGroup              26.2212     3.5070   7.477 3.84e-10 ***
YearDummy1           119.8836    15.6840   7.644 1.99e-10 ***
YearDummy2           123.7458    15.6840   7.890 7.55e-11 ***
YearDummy3           103.2660    15.6840   6.584 1.28e-08 ***
YearDummy4            97.7102    15.6840   6.230 5.06e-08 ***
YearDummy5           103.3295    15.6840   6.588 1.26e-08 ***
YearDummy6           103.2330    15.6840   6.582 1.29e-08 ***
YearDummy7            84.8291    15.6840   5.409 1.16e-06 ***
YearDummy8            70.7114    15.6840   4.509 3.09e-05 ***
YearDummy9            90.9566    15.6840   5.799 2.65e-07 ***
YearDummy10           50.0885    15.6840   3.194  0.00224 **
YearDummy11           37.7004    15.6840   2.404  0.01933 *
YearDummy12           33.1947    15.6840   2.116  0.03846 *
AgeGroup:YearDummy2    1.8066     4.9597   0.364  0.71695
AgeGroup:YearDummy3   -3.8022     4.9597  -0.767  0.44632
AgeGroup:YearDummy4   -1.7436     4.9597  -0.352  0.72640
AgeGroup:YearDummy5   -6.0494     4.9597  -1.220  0.22735
AgeGroup:YearDummy6   -6.7992     4.9597  -1.371  0.17552
AgeGroup:YearDummy7   -3.6752     4.9597  -0.741  0.46158
AgeGroup:YearDummy8   -0.4799     4.9597  -0.097  0.92323
AgeGroup:YearDummy9   -9.8190     4.9597  -1.980  0.05232 .
AgeGroup:YearDummy10  -2.2452     4.9597  -0.453  0.65241

My interpretation of the interaction term coefficients is that each one represents the difference between the AgeGroup slope in the period of the corresponding YearDummy and the AgeGroup slope at the top of the results (which is the slope in the baseline period, YearDummy1). This is something like the AgeGroup effect across different periods.

My second regression is as follows:

> ## Regress on YearVar and include YearVar*AgeGroupDummy interaction terms
> 
> PooledOLS2 <- lm(PPHPY ~ YearVar + YearVar*AgeGroupDummy + 0, data=maildatapooled)
> summary(PooledOLS2)

Call:
lm(formula = PPHPY ~ YearVar + YearVar * AgeGroupDummy + 0, data = maildatapooled)

Residuals:
    Min      1Q  Median      3Q     Max
-29.345  -9.325  -0.915   8.540  40.150

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
YearVar                   -7.089      1.252  -5.664 3.04e-07 ***
AgeGroupDummy1           142.292      9.211  15.447  < 2e-16 ***
AgeGroupDummy2           185.508      9.211  20.139  < 2e-16 ***
AgeGroupDummy3           218.170      9.211  23.685  < 2e-16 ***
AgeGroupDummy4           255.733      9.211  27.763  < 2e-16 ***
AgeGroupDummy5           278.180      9.211  30.200  < 2e-16 ***
AgeGroupDummy6           300.910      9.211  32.667  < 2e-16 ***
AgeGroupDummy7           282.325      9.211  30.650  < 2e-16 ***
YearVar:AgeGroupDummy2    -1.737      1.770  -0.981   0.3298
YearVar:AgeGroupDummy3    -2.401      1.770  -1.357   0.1792
YearVar:AgeGroupDummy4    -3.772      1.770  -2.131   0.0366 *
YearVar:AgeGroupDummy5    -2.915      1.770  -1.647   0.1040
YearVar:AgeGroupDummy6    -3.587      1.770  -2.026   0.0465 *
YearVar:AgeGroupDummy7    -2.372      1.770  -1.340   0.1846
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.97 on 70 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9935
F-statistic: 917.9 on 14 and 70 DF, p-value: < 2.2e-16

My interpretation of the interaction term coefficients here is that each one represents the difference between the YearVar slope for the corresponding AgeGroup and the YearVar slope at the very top (which is the slope for the baseline group, AgeGroupDummy1). That is, they are something like a period effect across the different age groups.
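
To make that reading concrete, here is a quick sketch of the YearVar slope it would imply for each age group: the YearVar coefficient plus that group's interaction coefficient. The numbers are copied by hand from the summary above, and the names base_slope, slope_diff and group_slopes are only for illustration.

```r
# Coefficients copied by hand from summary(PooledOLS2) above
base_slope <- -7.089  # YearVar: slope for the first age group (AgeGroupDummy1)

# YearVar:AgeGroupDummyK: difference in slope relative to the first age group
slope_diff <- c(AgeGroupDummy2 = -1.737,
                AgeGroupDummy3 = -2.401,
                AgeGroupDummy4 = -3.772,
                AgeGroupDummy5 = -2.915,
                AgeGroupDummy6 = -3.587,
                AgeGroupDummy7 = -2.372)

# Implied slope of PPHPY on YearVar within each age group
group_slopes <- c(AgeGroupDummy1 = base_slope, base_slope + slope_diff)
print(round(group_slopes, 3))
```

If this reading is right, every age group has a negative trend over time, and the interaction terms only say how much steeper each group's trend is than that of the first group.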

Can anyone see a problem with what I've done here or with my interpretation? This second regression is the closest thing to period effects across distinct age groups that I've been able to muster. Any critiques/new ideas are welcome.

thagzone
  • If you have a genuinely new question to ask, there is no need to reproduce all the data and output that appear in your first question: just link to it. That will help us see in what way your new question differs from the old--and will help many more people actually read through to the end. – whuber Jun 27 '13 at 16:54
  • Ok, my apologies. I didn't know the procedure and I wanted to be thorough. I'll edit the beginning to simply link to my old question. – thagzone Jun 27 '13 at 16:57
  • I believe I've asked a genuinely new question, which is for feedback on my regressions and my interpretation of their coefficients. If you concur, I'd appreciate your un-marking this question as a duplicate. Thank you! – thagzone Jun 27 '13 at 17:43

1 Answer

Some miscellaneous thoughts...

First, in an R formula A*B expands to A + B + A:B, so A + A*B is redundant. It appears from your output that R does what you mean, but the formula would be cleaner and less confusing without the duplicate term.
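
A small check of that expansion, using made-up variable names (y, A, B) rather than your data. terms() only parses the formula, so nothing needs to exist in the workspace:

```r
# Hypothetical variable names; only the structure of the formula matters here
f_redundant <- y ~ A + A * B + 0   # the pattern used in the question
f_clean     <- y ~ A * B + 0       # the same model without the duplicate term

labels_redundant <- attr(terms(f_redundant), "term.labels")
labels_clean     <- attr(terms(f_clean), "term.labels")

print(labels_redundant)                           # the terms R will actually fit
print(identical(labels_redundant, labels_clean))  # are the two formulas the same model?
```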

Second, why are you constraining the regressions to go through zero? It might make sense in your first regression, but it doesn't seem to make sense in the second.
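
It may help to see what dropping the intercept actually does when a factor is in the model. The following is only a sketch on simulated data (the data frame d and the columns year, g and y are invented stand-ins, not your variables). As far as I can tell, with a factor main effect present the fitted values and the interaction coefficients should be the same with or without `+ 0`, while the reported R-squared changes because R measures it against zero rather than against the mean when there is no intercept.

```r
set.seed(1)

# Simulated stand-in: 7 groups observed over 12 periods (84 rows)
d <- expand.grid(year = 1:12, g = factor(1:7))
d$y <- 150 + 20 * as.integer(d$g) - 8 * d$year + rnorm(nrow(d), sd = 15)

fit_no_int   <- lm(y ~ year * g + 0, data = d)  # intercept removed, as in the question
fit_with_int <- lm(y ~ year * g, data = d)      # intercept left in

# The factor absorbs the intercept, so the fitted values should agree
print(all.equal(unname(fitted(fit_no_int)), unname(fitted(fit_with_int))))

# The interaction (slope-difference) coefficients should agree as well
ix <- grep("^year:g", names(coef(fit_with_int)), value = TRUE)
print(all.equal(coef(fit_no_int)[ix], coef(fit_with_int)[ix]))

# R-squared is not comparable between the two fits
print(c(no_intercept   = summary(fit_no_int)$r.squared,
        with_intercept = summary(fit_with_int)$r.squared))
```

If that holds for your data too, the very high R-squared in your output says little about fit quality, and leaving the intercept in costs you nothing for the interaction terms.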

Third, a good piece of advice from Andrew Gelman is, "Models with interactions can often be more easily interpreted if we first pre-process the data by centering each input variable about its mean or some other convenient reference point." And that actually applies to all of the coefficients, not just the interaction.
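
A minimal sketch of that centering advice, again on simulated data with invented names (d, year, year_c, g, y). It assumes a continuous time variable interacted with a group factor, as in your second regression:

```r
set.seed(2)

# Simulated stand-in: 7 groups over 12 periods, with group-specific slopes
d <- expand.grid(year = 1:12, g = factor(1:7))
d$y <- 150 + 20 * as.integer(d$g) -
  (7 + 0.5 * as.integer(d$g)) * d$year + rnorm(nrow(d), sd = 15)

# Center the continuous input about its mean
d$year_c <- d$year - mean(d$year)

fit_raw      <- lm(y ~ year * g, data = d)
fit_centered <- lm(y ~ year_c * g, data = d)

# The slope differences (interaction terms) do not depend on centering
print(unname(coef(fit_raw)["year:g2"]))
print(unname(coef(fit_centered)["year_c:g2"]))

# The group coefficients do: raw compares groups at year 0,
# centered compares them at the average year
print(unname(coef(fit_raw)["g2"]))
print(unname(coef(fit_centered)["g2"]))
```

The design choice here is only about which reference point makes the group coefficients meaningful; a comparison at the average year is usually easier to talk about than one at year 0.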

(The overall idea that the interaction represents the difference in slope of the groups is correct. The details of what that actually means can be tricky.)

EDIT: While I'm replying to this question, I'd also ask, regarding your previous question, whether you've considered all three possible options for a linear model: complete pooling, no pooling, and mixed-effects/hierarchical?
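
Roughly, the three options could look like the sketch below. The data are simulated and the names (d, year, g, y) are invented. The mixed-effects fit assumes the lme4 package is installed and is skipped otherwise; with only seven groups it may also report a singular fit.

```r
set.seed(3)

# Simulated stand-in: 7 groups over 12 periods, with group-specific slopes
d <- expand.grid(year = 1:12, g = factor(1:7))
d$y <- 150 + 20 * as.integer(d$g) -
  (7 + 0.5 * as.integer(d$g)) * d$year + rnorm(nrow(d), sd = 15)

# Complete pooling: one intercept and one time trend for all groups
fit_pooled <- lm(y ~ year, data = d)

# No pooling: a separate intercept and time trend for each group
fit_unpooled <- lm(y ~ year * g, data = d)

# Partial pooling: group intercepts and trends vary around a common mean
# (assumption: lme4 is available)
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_mixed <- lme4::lmer(y ~ year + (1 + year | g), data = d)
  print(lme4::fixef(fit_mixed))
}

# F-test of whether group-specific terms improve on complete pooling
print(anova(fit_pooled, fit_unpooled))
```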

Wayne
  • Great, thanks for your answer. I think I had meant to take out the (A +...) business but had forgotten. Thanks for mentioning it. – thagzone Jun 27 '13 at 20:40
  • I had centered both models around 0 in order to eliminate the intercept because I was only interested in the interaction coefficients and figured it would be better to eliminate the intercept as opposed to one of the dummies. Maybe that doesn't make sense--I'll try it again with the intercept left in. Finally, if my aim is to find (crudely, probably) period effects across age groups, would you say the 1st or 2nd regression better accomplishes that? – thagzone Jun 27 '13 at 20:45
  • @thagzone: I didn't comment on the first posting because I'm not sure what you mean by "period effects". Do you simply mean a linear trend over time? I keep thinking that "period effect" would mean something like comparing changes decade-by-decade or something like that. – Wayne Jun 27 '13 at 20:52
  • Good point, it is a bit unclear. The time period in which PPHPY is measured is likely to have some effect on PPHPY (call them "period effects"), and what I want to know is if the time period had a different effect on different age groups. That is, if different age groups were affected disproportionately by those period effects. I hope that makes some sense. – thagzone Jun 27 '13 at 21:07