
I would like to know how professional and demographic variables influence job satisfaction. Which method should I use?

lm.out <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE + COLLAR + SEX, data = data)

Or this sequence of nested models?

output1 <- lm(pv1 ~ AGE, data = data)
output2 <- lm(pv1 ~ AGE + EDU, data = data)
output3 <- lm(pv1 ~ AGE + EDU + SIZE, data = data)
output4 <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE, data = data)
output5 <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE + MGMT, data = data)
output6 <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE, data = data)
output7 <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE + COLLAR, data = data)
output8 <- lm(pv1 ~ AGE + EDU + SIZE + SCOPE + MGMT + TENURE + COLLAR + SEX, data = data)
Stefan
  • What’s going on in those eight outputs, some kind of stepwise regression? Why not start with education or size? (Stepwise regression is discouraged, anyway, though I want to read what you say is happening in those models.) – Dave Aug 09 '20 at 13:44
  • How would you analyse the effects of demographic and professional characteristics on job satisfaction? Just a multiple regression or a stepwise regression? –  Aug 09 '20 at 14:12

1 Answer


Do not perform any stepwise procedure to select variables

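Stepwise selection is bad. When you are interested in inference, as you seem to be, there is no hope of recovering the "correct" estimates: the coefficients, standard errors and p-values of the selected model ignore the fact that the same data were used to choose the variables, so they are biased and overconfident.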
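A small simulation can make this concrete. The sketch below is hypothetical: it generates pure-noise predictors (every true coefficient is zero), runs backward AIC selection with base R's `step()`, and compares the share of p-values below 0.05 in the full model with the share among the predictors that selection kept. The sample size, number of predictors and number of replications are arbitrary choices for illustration.

```r
# Hypothetical simulation: every predictor is pure noise, so all true coefficients are 0.
set.seed(2020)

noise_pvalues <- function(n = 100, p = 10) {
  dat <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))  # predictors V1..Vp
  dat$y <- rnorm(n)                      # outcome unrelated to every predictor
  full   <- lm(y ~ ., data = dat)
  chosen <- step(full, direction = "backward", trace = 0)  # backward selection by AIC

  # p-values of the non-intercept terms of a fitted model (empty if none remain)
  slope_p <- function(fit) {
    cf <- summary(fit)$coefficients
    cf[rownames(cf) != "(Intercept)", "Pr(>|t|)"]
  }
  list(full = slope_p(full), kept = slope_p(chosen))
}

runs   <- replicate(200, noise_pvalues(), simplify = FALSE)
p_full <- unlist(lapply(runs, `[[`, "full"))
p_kept <- unlist(lapply(runs, `[[`, "kept"))

# Share of p-values below 0.05 in the full model vs. after stepwise selection
c(full_model = mean(p_full < 0.05), after_stepwise = mean(p_kept < 0.05))
```

In the full model about 5% of the noise predictors look "significant", as they should. Among the predictors that survive selection the share is several times higher, even though none of them has any real effect.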

You are interested in:

how professional and demographic variables influence job satisfaction

The first thing to do is carefully consider whether any of the variables in question are causally related to one another. For example, education is likely to "cause" tenure: people who spent longer in education may be more likely to be in a position of tenure. Perhaps tenure "causes" job satisfaction but education itself does not. In that case the causal effect of education is mediated by tenure. If education also has some direct effect, its effect is only partially mediated. When you want to estimate the total causal effect of education on job satisfaction, it is vitally important not to include mediators in the model. Thus, depending on what your main exposure is (education, in this example), you will have a different set of variables to condition on (include in your model). The general rules are:
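A minimal simulation of the education and tenure example shows why a mediator should be left out when the target is the total effect. The data-generating process below (EDU -> TENURE -> pv1 with coefficients 0.8 and 0.5, and no direct EDU -> pv1 arrow) is an assumption made up for illustration, and `pv1` stands in for job satisfaction as in the question's models.

```r
# Hypothetical data-generating process: EDU -> TENURE -> pv1, no direct EDU -> pv1 path.
set.seed(42)
n <- 10000
EDU    <- rnorm(n)
TENURE <- 0.8 * EDU + rnorm(n)     # education raises tenure
pv1    <- 0.5 * TENURE + rnorm(n)  # only tenure affects satisfaction
dat    <- data.frame(pv1, EDU, TENURE)

# Total effect of EDU: true value 0.8 * 0.5 = 0.4
total    <- unname(coef(lm(pv1 ~ EDU, data = dat))["EDU"])
# Conditioning on the mediator leaves only the direct effect: true value 0
adjusted <- unname(coef(lm(pv1 ~ EDU + TENURE, data = dat))["EDU"])

round(c(total = total, adjusted = adjusted), 2)
```

Adjusting for TENURE blocks exactly the path through which education acts here, so the second model wrongly suggests that education has no effect on satisfaction at all.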

  • include variables that are confounders (common causes of both the exposure and the outcome);
  • do not include variables that are mediators (variables that lie on the causal path between the exposure and the outcome);
  • include variables that are competing exposures (causes of the outcome that are unrelated to the exposure); they do not remove bias, but they improve precision.
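The confounder and competing-exposure rules can be checked the same way. In this hypothetical setup (all coefficients invented for illustration), AGE is a confounder of the EDU -> pv1 relationship and SEX is a competing exposure that affects `pv1` only.

```r
# Hypothetical data: AGE confounds EDU -> pv1; SEX is a competing exposure.
set.seed(7)
n <- 10000
AGE <- rnorm(n)                    # common cause of EDU and pv1
EDU <- 0.6 * AGE + rnorm(n)        # exposure of interest
SEX <- rbinom(n, 1, 0.5)           # affects pv1 only, unrelated to EDU
pv1 <- 0.3 * EDU + 0.5 * AGE + 1.0 * SEX + rnorm(n)
dat <- data.frame(pv1, AGE, EDU, SEX)

naive    <- lm(pv1 ~ EDU, data = dat)              # confounded by AGE
adj      <- lm(pv1 ~ EDU + AGE, data = dat)        # confounder included
adj_comp <- lm(pv1 ~ EDU + AGE + SEX, data = dat)  # plus the competing exposure

# Estimate and standard error of the EDU coefficient in a fitted model
est <- function(m) summary(m)$coefficients["EDU", c("Estimate", "Std. Error")]
rbind(naive = est(naive), adjusted = est(adj), plus_SEX = est(adj_comp))
```

Omitting AGE biases the education coefficient upward (to about 0.52 under these assumptions), including AGE recovers the true value of 0.3, and adding SEX leaves the estimate essentially unchanged while shrinking its standard error.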

The best way to work this out is with a causal diagram (a directed acyclic graph, or DAG). See this question and its answers for details on how to do that:
How do DAGs help to reduce bias in causal inference?
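Once the diagram is drawn, the adjustment set can be derived mechanically. The sketch below assumes the `dagitty` R package is installed; the diagram itself is hypothetical and encodes only the relationships used as examples in this answer (AGE as a confounder, TENURE as a mediator, SEX as a competing exposure).

```r
# Assumes the 'dagitty' package is installed: install.packages("dagitty")
library(dagitty)

# Hypothetical causal diagram built from the examples in this answer
g <- dagitty("dag {
  AGE -> EDU
  AGE -> pv1
  EDU -> TENURE
  TENURE -> pv1
  SEX -> pv1
}")

# Minimal sufficient adjustment set(s) for the total effect of EDU on pv1
sets <- adjustmentSets(g, exposure = "EDU", outcome = "pv1", effect = "total")
print(sets)
```

For this diagram the only minimal adjustment set is {AGE}: TENURE is excluded because it is a mediator, and SEX is not required for unbiasedness (although including it can improve precision).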

Robert Long
  • Can I treat this as a linear regression, or would a nonlinear regression make more sense? –  Aug 10 '20 at 07:48
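  • That is impossible to say. But you can easily extend a linear model with nonlinear terms if you think that is justified, for example by including a quadratic term `I(AGE^2)` in the model (in an R formula, a bare `AGE^2` is read as just `AGE`). The model is still linear in its parameters, so it can still be fitted with `lm()`. – Robert Long Aug 10 '20 at 07:57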
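A practical caveat when adding a squared term in R: inside a model formula, `^` denotes crossing of terms rather than arithmetic, so `AGE^2` on its own collapses to `AGE`. The sketch below uses made-up data (a hypothetical inverted-U relation between AGE and `pv1`) to show the difference; wrapping the term in `I()`, or using `poly()`, gives the intended quadratic.

```r
# Made-up data: a curved relation between AGE and pv1
set.seed(1)
n   <- 500
AGE <- runif(n, 20, 65)
pv1 <- 2 + 0.20 * AGE - 0.002 * AGE^2 + rnorm(n)
dat <- data.frame(pv1, AGE)

m_wrong <- lm(pv1 ~ AGE^2, data = dat)           # ^ is formula crossing: same as pv1 ~ AGE
m_quad  <- lm(pv1 ~ AGE + I(AGE^2), data = dat)  # I() forces arithmetic squaring
m_poly  <- lm(pv1 ~ poly(AGE, 2), data = dat)    # orthogonal polynomial, same fit

names(coef(m_wrong))  # "(Intercept)" "AGE"
names(coef(m_quad))   # "(Intercept)" "AGE" "I(AGE^2)"
all.equal(fitted(m_quad), fitted(m_poly))  # TRUE: identical fitted values
```

Both `m_quad` and `m_poly` are still fitted by `lm()`: the curve is nonlinear in AGE but linear in the coefficients.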