Model specification for a multiple regression

Question

I would like to compare job satisfaction between two industries. I ran a t-test and concluded that the two groups differ significantly at the 5% significance level.

Now I would like to analyze how professional and demographic variables influence job satisfaction in the two industries.

lm_industry1 <- lm(JS ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE + COLLAR + SEX, data = data_industry1)
lm_industry2 <- lm(JS ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE + COLLAR + SEX, data = data_industry2)

(COLLAR and SEX are binary, all the other variables are continuous)

Could this work with a multiple regression? What about non-linear relationships and interaction terms?

Robert Long · Answer 1 · 2020-08-11T09:57:19.017

Try not to be too concerned with statistical significance.

If you think that there is a nonlinear association between one of your explanatory variables and the outcome, you can include a nonlinear term for it, such as TENURE^2, TENURE^3, or log(TENURE). Another option is to use splines.
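As a rough illustration of these options in R, here is a minimal sketch. The data are simulated stand-ins (the real data_industry1 is not shown in the question), so the data frame `dat` and the coefficients used to generate it are assumptions made purely for the example. Note that inside an R formula `^` is a formula operator, so a squared term has to be written as `I(TENURE^2)` or via `poly()`.

```r
library(splines)  # ships with base R; provides ns()

# Simulated stand-in data (hypothetical): the real data are not available here
set.seed(1)
n <- 200
dat <- data.frame(TENURE = runif(n, 1, 30), AGE = runif(n, 20, 65))
dat$JS <- 3 + 0.8 * log(dat$TENURE) + 0.01 * dat$AGE + rnorm(n, sd = 0.5)

# Quadratic term: I() protects ^ from being read as a formula operator
fit_quad <- lm(JS ~ AGE + TENURE + I(TENURE^2), data = dat)

# Cubic polynomial using orthogonal polynomial terms
fit_poly <- lm(JS ~ AGE + poly(TENURE, 3), data = dat)

# Log transform (requires TENURE > 0)
fit_log <- lm(JS ~ AGE + log(TENURE), data = dat)

# Natural cubic spline with 3 degrees of freedom
fit_spline <- lm(JS ~ AGE + ns(TENURE, df = 3), data = dat)

# One possible way to compare the candidate shapes side by side
aic_table <- AIC(fit_quad, fit_poly, fit_log, fit_spline)
print(aic_table)
```

Which functional form to use should still come from theory about how tenure relates to satisfaction; the comparison at the end is only a rough aid.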

You can include interactions, for example with MGMT * SCOPE, which is the same as MGMT + SCOPE + MGMT:SCOPE.
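The equivalence of the two notations can be checked directly from the formulas, without any data. This is a small sketch; the object names `f_star` and `f_long` are just illustrative.

```r
# Two ways of writing the same interaction model
f_star <- JS ~ MGMT * SCOPE
f_long <- JS ~ MGMT + SCOPE + MGMT:SCOPE

# terms() expands a formula into its individual model terms
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
print(labels_long)

# TRUE if both formulas expand to the same terms in the same order
same_terms <- identical(labels_star, labels_long)
print(same_terms)
```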

At all times it is important to be guided by the underlying theory and the relations between variables (see my answer to your other question), and not just to throw variables and nonlinear terms into the model in the hope of finding a "significant" result.

Since you have data on two industries, the usual way to model differences is to use the entire dataset, not to split it into two, and include an "industry" variable in the model, which you would usually interact with the variables you think differ between the two.
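A minimal sketch of that pooled approach follows. The two data frames are simulated here because the real ones are not shown, the column name `INDUSTRY` and the helper `make_data()` are hypothetical, and interacting only TENURE with industry is an arbitrary choice for illustration; in practice you would interact whichever variables theory suggests differ between the industries.

```r
# Simulated stand-ins for data_industry1 and data_industry2 (hypothetical values)
set.seed(42)
make_data <- function(n, slope) {
  d <- data.frame(AGE = runif(n, 20, 65),
                  TENURE = runif(n, 0, 30),
                  SEX = rbinom(n, 1, 0.5))
  d$JS <- 3 + 0.01 * d$AGE + slope * d$TENURE + 0.2 * d$SEX + rnorm(n, sd = 0.5)
  d
}
data_industry1 <- make_data(150, slope = 0.02)
data_industry2 <- make_data(150, slope = 0.06)

# Label each data set, then stack them into one data frame
data_industry1$INDUSTRY <- "industry1"
data_industry2$INDUSTRY <- "industry2"
data_all <- rbind(data_industry1, data_industry2)
data_all$INDUSTRY <- factor(data_all$INDUSTRY)

# TENURE * INDUSTRY lets both the intercept and the TENURE slope differ by industry
lm_pooled <- lm(JS ~ AGE + SEX + TENURE * INDUSTRY, data = data_all)
print(summary(lm_pooled))
```

In this parameterisation the `TENURE:INDUSTRYindustry2` coefficient estimates how much the TENURE slope in the second industry differs from the slope in the first, which is the comparison that two separate regressions cannot give you directly.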

Does this answer your question? If so, please consider marking it as the accepted answer, and if not, please let us know why. - Robert Long, Aug 24 '20 at 20:28
