I'd like to model an interaction term between a continuous variable and a categorical variable, while accounting for possible aliasing among the variables. What is the best way to do this?
As an example, suppose I have a data set containing damages incurred, car type (sedan, truck, etc.), car model, and car age.
Damages incurred ($)   Type         Model          Age
1000                   Sedan        Hyundai G30S   10
300                    Truck        Ford F150      3
500                    Motorcycle   Yamaha F90     2
I'd like to include Age as a predictor, but I have reason to suspect that the effect of car age depends on the car type; for instance, age may affect losses very differently for sedans than for trucks. So I'd like to include an interaction term, Type:Age, to account for this.
I also want to include Model. However, once I know the car model I definitely know the car type, so I cannot also include Type in the modeling equation, because it would be aliased with Model.
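To make the aliasing concrete, here is a minimal sketch in plain Python. The rows are hypothetical (the "Toyota Camry" entry is made up to give one type two models); the only assumption that matters is that every model belongs to exactly one type. Under that assumption, each Type indicator column is just the sum of the indicator columns of the models nested in it, which is why Type cannot be estimated separately once Model is in the equation.

```python
# Hypothetical (model, type) rows; each model belongs to exactly one type.
rows = [
    ("Hyundai G30S", "Sedan"),
    ("Toyota Camry", "Sedan"),      # made-up extra sedan model
    ("Ford F150", "Truck"),
    ("Yamaha F90", "Motorcycle"),
    ("Hyundai G30S", "Sedan"),
    ("Ford F150", "Truck"),
]

models = sorted({m for m, _ in rows})
types = sorted({t for _, t in rows})

# One 0/1 indicator (dummy) column per level of each factor.
model_cols = {m: [int(r[0] == m) for r in rows] for m in models}
type_cols = {t: [int(r[1] == t) for r in rows] for t in types}

# Lookup from model to the single type it belongs to.
model_to_type = dict(rows)

def type_from_models(t):
    """Rebuild the indicator column for type t by summing its models' columns."""
    nested = [model_cols[m] for m in models if model_to_type[m] == t]
    return [sum(vals) for vals in zip(*nested)]

# True for a type means its column is an exact linear combination of Model columns.
aliased = {t: type_from_models(t) == type_cols[t] for t in types}
print(aliased)
```

Every entry comes out True for this toy data, so the Type main effect carries no information beyond Model.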
However, I don't want to use Model:Age in the modeling equation, because I have reason to believe that the car model doesn't add much information beyond the car type; i.e. car type and age combined have roughly the same effect as car model and age. Moreover, including Model:Age would significantly increase the number of parameters (model degrees of freedom), since there are so many car models.
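As a rough illustration of the difference in size, the sketch below counts the extra slope parameters each interaction would add. It assumes Age is already in the model as a main effect and that the factor uses treatment (reference-level) coding, so a factor with k levels contributes k - 1 extra slopes. The level counts are hypothetical, not taken from my data.

```python
def extra_slope_params(n_levels):
    """Extra parameters added by factor:Age when Age is already in the model.

    Assumes treatment coding: the reference level's slope is the Age main
    effect, and every other level gets its own slope offset.
    """
    return n_levels - 1

n_types = 3      # hypothetical: sedan, truck, motorcycle
n_models = 200   # hypothetical number of distinct car models

type_age_params = extra_slope_params(n_types)
model_age_params = extra_slope_params(n_models)

print("Type:Age adds", type_age_params, "parameters")
print("Model:Age adds", model_age_params, "parameters")
```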
So is there a way to include Age:Type, Model, and Age in the GLM without running into significant issues in the model output? If not, what would be the best way around it?