Subsample versus indicator variables (multiple regression)

Question

In my study I have a continuous dependent variable (return) that is regressed on a continuous independent variable x1 (momentum) and a number of control variables.

I am currently investigating whether the effect of the momentum variable x1 differs by property type. Property type is indicated by a categorical variable that takes codes 1 to 10 depending on the property type; for example, code 4 means that the property is an apartment building.

I am mainly interested in whether the momentum effect is different for property types with codes 1, 2, 3 and 4.

From what I understand, I have two ways to do this:

(1) Create a subsample of all observations with property type 1, 2, 3 or 4 and run the regression within this subsample.

(2) Create N-1 dummy variables (9 in this case) for property type, plus their interaction terms with momentum, for example DumProptype_4 * momentum. This gives 9 dummy variables and 9 interaction terms.
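A minimal sketch of both approaches, assuming the data are in NumPy arrays with the hypothetical names `ret`, `momentum` and `proptype` (codes 1 to 10), with code 10 picked arbitrarily as the reference category and no control variables. The simulated data are made up for illustration. When the dummies interact with every regressor (here just the intercept and momentum), the momentum effect for code k is the `momentum` coefficient plus the `DumProptype_k*momentum` coefficient, and it matches the slope from a regression run within the code-k subsample.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def interaction_design(momentum, proptype, codes=range(1, 11), reference=10):
    """Columns: const, momentum, D_k, D_k * momentum for every code k != reference."""
    others = [k for k in codes if k != reference]
    dummies = np.column_stack([(proptype == k).astype(float) for k in others])
    interactions = dummies * momentum[:, None]        # D_k * momentum, one column per k
    X = np.column_stack([np.ones_like(momentum), momentum, dummies, interactions])
    names = (["const", "momentum"]
             + [f"DumProptype_{k}" for k in others]
             + [f"DumProptype_{k}*momentum" for k in others])
    return X, names

# Hypothetical simulated data: momentum slope 0.5 for codes 1-4, 0.1 otherwise.
rng = np.random.default_rng(1)
n = 2000
proptype = rng.integers(1, 11, size=n)                # codes 1..10
momentum = rng.normal(size=n)
ret = 1.0 + np.where(proptype <= 4, 0.5, 0.1) * momentum + rng.normal(scale=0.1, size=n)

# Approach (2): one pooled regression with 9 dummies and 9 interactions.
X, names = interaction_design(momentum, proptype)
coef = dict(zip(names, ols(ret, X)))
slope_4 = coef["momentum"] + coef["DumProptype_4*momentum"]   # effect for code 4

# Approach (1), here for a single code: regress within the code-4 subsample.
m4 = proptype == 4
slope_4_sub = ols(ret[m4], np.column_stack([np.ones(m4.sum()), momentum[m4]]))[1]

print(slope_4, slope_4_sub)   # equal up to rounding in this no-control model
```

Once control variables enter only as common (non-interacted) terms, the two approaches no longer coincide exactly: the pooled model then assumes each control has the same effect in every property type. In exchange, the pooled model lets differences between types be tested within one equation.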

My Question(s):

- Is it true that I can use either of the two methods described above, or are there restrictions on this?

- If I use the dummy variable approach, should I include all 9 interaction terms in addition to the 9 dummy variables, or only the interaction terms I am interested in?
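Writing the dummy-variable model out may help with the second question. A sketch, with $r$ a reference property type chosen by the analyst (an assumption, since the question does not name one), $D_{k,i}$ the dummy for type $k$, and $\mathbf{c}_i$ the vector of control variables:

```latex
\text{return}_i = \beta_0 + \beta_1\,\text{momentum}_i
  + \sum_{k \neq r} \gamma_k D_{k,i}
  + \sum_{k \neq r} \delta_k \left( D_{k,i} \times \text{momentum}_i \right)
  + \boldsymbol{\theta}^{\top} \mathbf{c}_i + \varepsilon_i
```

The momentum effect for type $k \neq r$ is $\beta_1 + \delta_k$, and for the reference type $r$ it is $\beta_1$. Dropping the interaction for some type $k$ is the same as imposing $\delta_k = 0$, i.e. assuming that type shares the reference type's momentum effect, so it is a restriction worth testing rather than a neutral simplification.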

Help would be greatly appreciated, thanks in advance!

Almost duplicates: https://stats.stackexchange.com/questions/373890/separate-models-vs-flags-in-the-same-model/373909#373909, https://stats.stackexchange.com/questions/486373/is-there-a-benefit-to-splitting-the-data-by-gender-or-age-range-when-building-pr/486461#486461 — kjetil b halvorsen, Feb 19 '21 at 15:14

score 0 · Answer 1 · answered May 20 '16 at 10:22


It can be a good starting point to plot, for each property type code, the dependent variable against the independent variable and observe the trend across property types. You can then combine the property types that show a similar trend, separate those with a different trend, and build a different model for each segment.
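The visual comparison can be complemented numerically by computing each property type's own slope of return on momentum. A minimal sketch using `numpy.polyfit` (degree 1) per code; the array names and simulated data are hypothetical, and the slopes ignore control variables, just as a simple scatter plot would.

```python
import numpy as np

def slopes_by_group(y, x, groups):
    """Simple-regression slope of y on x within each group (no controls)."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    return {int(g): np.polyfit(x[groups == g], y[groups == g], deg=1)[0]
            for g in np.unique(groups)}

# Hypothetical simulated data: codes 1-4 have slope 0.5, the rest 0.1.
rng = np.random.default_rng(2)
n = 3000
proptype = rng.integers(1, 11, size=n)
momentum = rng.normal(size=n)
true_slope = np.where(proptype <= 4, 0.5, 0.1)
ret = 1.0 + true_slope * momentum + rng.normal(scale=0.1, size=n)

slopes = slopes_by_group(ret, momentum, proptype)
for code, b in slopes.items():
    print(code, round(b, 2))
# Codes with similar slopes are candidates for being pooled into one segment.
```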

Alternatively, you can run a decision tree analysis with more than two splits to see whether it creates segments based on property code.
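A standard regression tree splits on the level of the outcome, not on the momentum slope. A variant closer to the question, in the spirit of model-based trees (swapped in here as an assumption, not what the answer prescribes), scores each way of splitting the property codes into two groups by the residual sum of squares of separate regressions of return on momentum in each group. The brute-force sketch below finds the best single split; repeating it within each resulting group would give further splits. Names and data are hypothetical.

```python
import itertools
import numpy as np

def rss(y, x):
    """Residual sum of squares of y regressed on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def best_slope_split(y, x, groups):
    """Two-way partition of the group codes minimising the total per-side RSS."""
    codes = sorted(np.unique(groups).tolist())
    first, rest = codes[0], codes[1:]
    best = (np.inf, None)
    # Keep the first code on the left so each partition is visited exactly once;
    # r stops before len(rest) so the right side is never empty.
    for r in range(len(rest)):
        for extra in itertools.combinations(rest, r):
            left = (first, *extra)
            mask = np.isin(groups, left)
            score = rss(y[mask], x[mask]) + rss(y[~mask], x[~mask])
            if score < best[0]:
                best = (score, left)
    return best[1]

# Hypothetical simulated data: codes 1-4 have momentum slope 0.5, the rest 0.1.
rng = np.random.default_rng(4)
n = 3000
proptype = rng.integers(1, 11, size=n)
momentum = rng.normal(size=n)
ret = 1.0 + np.where(proptype <= 4, 0.5, 0.1) * momentum + rng.normal(scale=0.1, size=n)

print(best_slope_split(ret, momentum, proptype))   # codes on the left side of the split
```

With 10 codes there are only 511 two-way partitions, so exhaustive search is cheap; with many more categories a greedy or ordering-based search would be needed.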


user2542275



Thank you for your quick response. I just found another argument that would make a single equation with dummy variables difficult: if the effects of the control variables (for instance market cap) differ across the categories (property types), then, besides the interaction terms (momentum*Dum_Category), the category dummies should also be interacted with the control variables, for instance size. Do you agree with this? – S. Gontscharoff May 20 '16 at 10:33
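The comment's point can be checked numerically. A sketch with hypothetical simulated data, using a single dummy `d` for codes 1 to 4 (a simplification of the nine dummies) and one control `size` standing in for market cap, whose effect differs between the groups; momentum is made correlated with size so that the consequence is visible. Interacting the dummy with every regressor reproduces the two subsample regressions exactly, while interacting it with momentum only forces a common size coefficient and shifts the momentum slope.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical simulated data.
rng = np.random.default_rng(3)
n = 2000
proptype = rng.integers(1, 11, size=n)
size = rng.normal(size=n)
momentum = 0.6 * size + rng.normal(size=n)          # correlated with the control
d = (proptype <= 4).astype(float)                   # dummy: property codes 1-4
# Both the momentum and the size effects differ between the two groups.
ret = (1.0 + 0.1 * momentum + 0.3 * size
       + d * (0.2 + 0.4 * momentum - 0.25 * size)
       + rng.normal(scale=0.1, size=n))

one = np.ones(n)
base = np.column_stack([one, momentum, size])
b_in = ols(ret[d == 1], base[d == 1])               # subsample: codes 1-4
b_out = ols(ret[d == 0], base[d == 0])              # subsample: other codes

# Fully interacted pooled model: every regressor also interacted with d.
X_full = np.column_stack([one, momentum, size, d, d * momentum, d * size])
b_full = ols(ret, X_full)
slope_full = b_full[1] + b_full[4]                  # momentum effect for codes 1-4

# Pooled model interacting d with momentum only (size effect forced common).
X_partial = np.column_stack([one, momentum, size, d, d * momentum])
b_partial = ols(ret, X_partial)
slope_partial = b_partial[1] + b_partial[4]

print(b_in[1], slope_full, slope_partial)
```

In the fully interacted model the base coefficients equal the "other codes" subsample fit and base plus interaction equals the codes 1 to 4 fit, so the single-equation approach loses nothing as long as the controls are interacted too.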

