This is closely related to a question I asked yesterday, but I now have a much more complete answer that I was hoping to get feedback on. The previous question only asked for conceptual advice, and the responses were very helpful. You can find the relevant data and introduction here.
I want to find period effects for each age group. I've run two regressions that use dummies as part of an interaction term, and I'd like to know whether my method is flawed and whether my interpretation of the results is correct. The first regression is as follows:
> ## Generate YearDummy and AgeGroupDummy using factor()
>
> YearDummy <- factor(YearVar)
> AgeGroupDummy <- factor(AgeGroup)
>
> ## Check to see that YearDummy and AgeGroupDummy are indeed factor variables
>
> is.factor(YearDummy)
[1] TRUE
> is.factor(AgeGroupDummy)
[1] TRUE
> ## Regress on AgeGroup and include AgeGroup*YearDummy interaction terms
>
> PooledOLS1 <- lm(PPHPY ~ AgeGroup + AgeGroup*YearDummy + 0,
data=maildatapooled)
> summary(PooledOLS1)
Call:
lm(formula = PPHPY ~ AgeGroup + AgeGroup * YearDummy + 0, data =
maildatapooled)
Residuals:
Min 1Q Median 3Q Max
-38.852 -10.632 3.298 11.275 26.481
Coefficients:
Estimate Std. Error t value Pr(>|t|)
AgeGroup 26.2212 3.5070 7.477 3.84e-10 ***
YearDummy1 119.8836 15.6840 7.644 1.99e-10 ***
YearDummy2 123.7458 15.6840 7.890 7.55e-11 ***
YearDummy3 103.2660 15.6840 6.584 1.28e-08 ***
YearDummy4 97.7102 15.6840 6.230 5.06e-08 ***
YearDummy5 103.3295 15.6840 6.588 1.26e-08 ***
YearDummy6 103.2330 15.6840 6.582 1.29e-08 ***
YearDummy7 84.8291 15.6840 5.409 1.16e-06 ***
YearDummy8 70.7114 15.6840 4.509 3.09e-05 ***
YearDummy9 90.9566 15.6840 5.799 2.65e-07 ***
YearDummy10 50.0885 15.6840 3.194 0.00224 **
YearDummy11 37.7004 15.6840 2.404 0.01933 *
YearDummy12 33.1947 15.6840 2.116 0.03846 *
AgeGroup:YearDummy2 1.8066 4.9597 0.364 0.71695
AgeGroup:YearDummy3 -3.8022 4.9597 -0.767 0.44632
AgeGroup:YearDummy4 -1.7436 4.9597 -0.352 0.72640
AgeGroup:YearDummy5 -6.0494 4.9597 -1.220 0.22735
AgeGroup:YearDummy6 -6.7992 4.9597 -1.371 0.17552
AgeGroup:YearDummy7 -3.6752 4.9597 -0.741 0.46158
AgeGroup:YearDummy8 -0.4799 4.9597 -0.097 0.92323
AgeGroup:YearDummy9 -9.8190 4.9597 -1.980 0.05232 .
AgeGroup:YearDummy10 -2.2452 4.9597 -0.453 0.65241
My interpretation of the interaction term coefficients is that each one is the difference between the AgeGroup slope in the year of the corresponding YearDummy and the AgeGroup slope at the top of the results, which is the slope in the reference year (YearDummy1). In other words, they show how the AgeGroup effect varies across periods.
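To make that reading concrete, here is a minimal sketch of how the implied AgeGroup slope for each year could be recovered from the coefficients. It runs on made-up data (`toy` is a hypothetical stand-in with 7 age groups and 12 years, not the actual `maildatapooled`), so the numbers mean nothing; only the mechanics carry over. Because the model gives every year its own intercept and its own slope, the reconstructed slopes should match those from fitting each year separately.

```r
# Synthetic stand-in data: 7 age groups x 12 years. All values are made up.
set.seed(1)
toy <- expand.grid(AgeGroup = 1:7, YearVar = 1:12)
toy$PPHPY <- 150 + 25 * toy$AgeGroup - 8 * toy$YearVar +
  rnorm(nrow(toy), sd = 10)
toy$YearDummy <- factor(toy$YearVar)

# Same formula structure as the first regression in the post
fit <- lm(PPHPY ~ AgeGroup + AgeGroup*YearDummy + 0, data = toy)
b <- coef(fit)

# Slope in year 1 is the AgeGroup coefficient itself; the slope in
# year k is that baseline plus the AgeGroup:YearDummyk coefficient
int_names <- paste0("AgeGroup:YearDummy", 2:12)
slopes <- c(b["AgeGroup"], b["AgeGroup"] + b[int_names])
names(slopes) <- paste0("Year", 1:12)

# Cross-check: fit a separate simple regression within each year
separate <- sapply(1:12, function(k) {
  coef(lm(PPHPY ~ AgeGroup, data = toy[toy$YearVar == k, ]))["AgeGroup"]
})

print(round(slopes, 3))
print(all.equal(unname(slopes), unname(separate)))
```

Note that each interaction t-test only compares one year's slope with the reference year's slope; it is not a test of whether that year's slope is zero.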
My second regression is as follows:
> ## Regress on YearVar and include YearVar*AgeGroupDummy interaction terms
>
> PooledOLS2 <- lm(PPHPY ~ YearVar + YearVar*AgeGroupDummy + 0,
data=maildatapooled)
> summary(PooledOLS2)
Call:
lm(formula = PPHPY ~ YearVar + YearVar * AgeGroupDummy + 0, data =
maildatapooled)
Residuals:
Min 1Q Median 3Q Max
-29.345 -9.325 -0.915 8.540 40.150
Coefficients:
Estimate Std. Error t value Pr(>|t|)
YearVar -7.089 1.252 -5.664 3.04e-07 ***
AgeGroupDummy1 142.292 9.211 15.447 < 2e-16 ***
AgeGroupDummy2 185.508 9.211 20.139 < 2e-16 ***
AgeGroupDummy3 218.170 9.211 23.685 < 2e-16 ***
AgeGroupDummy4 255.733 9.211 27.763 < 2e-16 ***
AgeGroupDummy5 278.180 9.211 30.200 < 2e-16 ***
AgeGroupDummy6 300.910 9.211 32.667 < 2e-16 ***
AgeGroupDummy7 282.325 9.211 30.650 < 2e-16 ***
YearVar:AgeGroupDummy2 -1.737 1.770 -0.981 0.3298
YearVar:AgeGroupDummy3 -2.401 1.770 -1.357 0.1792
YearVar:AgeGroupDummy4 -3.772 1.770 -2.131 0.0366 *
YearVar:AgeGroupDummy5 -2.915 1.770 -1.647 0.1040
YearVar:AgeGroupDummy6 -3.587 1.770 -2.026 0.0465 *
YearVar:AgeGroupDummy7 -2.372 1.770 -1.340 0.1846
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.97 on 70 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9935
F-statistic: 917.9 on 14 and 70 DF, p-value: < 2.2e-16
My interpretation of the interaction term coefficients here is that each one is the difference between the YearVar slope for the corresponding age group and the YearVar slope at the very top, which is the slope for the reference group (AgeGroupDummy1). That is, they are something like a period effect, here a linear time trend, that is allowed to differ across age groups.
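As a sketch of what that interpretation implies in practice, the per-age-group time trends can be rebuilt from the coefficients, and a joint F-test can ask whether the trends differ across groups at all, rather than relying on the individual t-tests against the reference group. Again this uses made-up data (`toy` is a hypothetical stand-in, not `maildatapooled`), and the nested-model comparison is one possible approach, not necessarily the right one for the real data.

```r
# Synthetic stand-in data: 7 age groups x 12 years. All values are made up.
set.seed(2)
toy <- expand.grid(AgeGroup = 1:7, YearVar = 1:12)
toy$AgeGroupDummy <- factor(toy$AgeGroup)
toy$PPHPY <- 150 + 25 * toy$AgeGroup -
  (6 + 0.5 * toy$AgeGroup) * toy$YearVar + rnorm(nrow(toy), sd = 10)

# Same formula structure as the second regression in the post
fit <- lm(PPHPY ~ YearVar + YearVar*AgeGroupDummy + 0, data = toy)
b <- coef(fit)

# Trend for group 1 is the YearVar coefficient; the trend for group k
# is that baseline plus the YearVar:AgeGroupDummyk coefficient
int_names <- paste0("YearVar:AgeGroupDummy", 2:7)
year_slopes <- c(b["YearVar"], b["YearVar"] + b[int_names])
names(year_slopes) <- paste0("AgeGroup", 1:7)
print(round(year_slopes, 3))

# Joint test of "all age groups share one time trend" against
# "each age group has its own trend". Both models keep the intercept,
# which changes the parameterisation but not the fitted values.
common  <- lm(PPHPY ~ YearVar + AgeGroupDummy, data = toy)
varying <- lm(PPHPY ~ YearVar * AgeGroupDummy, data = toy)
slope_test <- anova(common, varying)
print(slope_test)
```

The comparison uses 6 numerator degrees of freedom, one for each interaction coefficient.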
Can anyone see a problem with what I've done here or with my interpretation? This second regression is the closest thing to period effects across distinct age groups that I've been able to muster. Any critiques/new ideas are welcome.