Multiple linear regression in R

Question

I am creating a multiple linear regression model $M_1$ with aggression as response and parenting_style and sibling_aggression as explanatory variables using the child_aggression data.

Below is my code:

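library(tidymodels)  # provides linear_reg(), set_engine(), fit() and tidy()

# Fit an ordinary least squares model via the "lm" engine
m1 <- linear_reg() %>%
  set_engine("lm") %>%
  fit(aggression ~ parenting_style + sibling_aggression, data = cha)
m1 %>% tidy()  # one row per coefficient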

The output:

# A tibble: 3 x 5
  term               estimate std.error statistic     p.value
  <chr>                 <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)        -0.00578    0.0121    -0.479 0.632      
2 parenting_style     0.0620     0.0123     5.06  0.000000551
3 sibling_aggression  0.0934     0.0375     2.49  0.0130

How can I interpret each of the 3 model coefficients in simple terms?

score 4 · Accepted Answer · answered May 19 '21 at 22:29

I would interpret these results as:

Every 1-unit increase in parenting_style is associated with a 0.06 increase in aggression, holding sibling_aggression constant.
Every 1-unit increase in sibling_aggression is associated with a 0.09 increase in aggression, holding parenting_style constant.
The p-values for these two slopes indicate that, if the associations were actually absent, the probability of observing these data (or data even more extreme) would be low: about 0.013 for sibling_aggression and far smaller for parenting_style.
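
Written out with the estimates from the output above, the fitted model is roughly the following (the hat denotes the predicted value; the typesetting of the variable names is only for illustration):

$$
\widehat{\text{aggression}} = -0.00578 + 0.0620 \cdot \text{parenting\_style} + 0.0934 \cdot \text{sibling\_aggression}
$$

So for two children with the same sibling_aggression whose parenting_style differs by one unit, the predicted aggression differs by

$$
0.0620 \cdot (x + 1) - 0.0620 \cdot x = 0.0620 ,
$$

and the intercept and the sibling_aggression term cancel because they are identical for both children.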

I would also like to make a few points about this model, which will hopefully be useful.

First, I assume you have thought about the causal relations between these variables. When dealing with causal inference, it is important to decide which variable is the main exposure, that is, the one for which you want to estimate the total causal effect. If sibling_aggression is your main exposure, is it possible that it has a causal effect on parenting_style? If so, parenting_style is a mediator and you need to remove it from the model. If not, perhaps it is a confounder? If so, you should retain it.

Second, do you expect the associations between these variables to be linear? Linearity is often plausible over a small range, but over a bigger range you sometimes need to allow for non-linear associations or include interactions. For instance, perhaps the "effect" of parenting_style differs depending on the level of sibling_aggression. If so, you would want to consider the interaction between these variables. Or perhaps the direct associations are quadratic, logarithmic or of some other non-linear form. If so, you can transform the variables or introduce non-linear terms into the model.

Third, what scale are the variables on? In my experience these kinds of data are often ordinal rather than continuous. If so, this should also be taken into account.
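
As a rough sketch of how the interaction and non-linear terms mentioned above could be specified in R, the example below uses base `lm()` on simulated data. The data frame `sim`, its generating coefficients and the model names `fit_main`, `fit_int` and `fit_quad` are assumptions made up for illustration; they are not the child_aggression data and say nothing about what that data contains.

```r
# Simulated stand-in for the real data (hypothetical values, for illustration only).
set.seed(1)
n <- 500
sim <- data.frame(
  parenting_style    = rnorm(n),
  sibling_aggression = rnorm(n)
)
sim$aggression <- 0.06 * sim$parenting_style +
  0.09 * sim$sibling_aggression +
  rnorm(n, sd = 0.3)

# Main effects only, the same structure as the model in the question.
fit_main <- lm(aggression ~ parenting_style + sibling_aggression, data = sim)

# Interaction: lets the slope of parenting_style vary with sibling_aggression.
fit_int <- lm(aggression ~ parenting_style * sibling_aggression, data = sim)

# Quadratic term: allows a curved association for parenting_style.
fit_quad <- lm(aggression ~ parenting_style + I(parenting_style^2) +
                 sibling_aggression, data = sim)

# Nested-model F tests: do the extra terms improve on the main-effects fit?
print(anova(fit_main, fit_int))
print(anova(fit_main, fit_quad))
```

With the real data you would replace `sim` with your own data frame. Whether either extra term is warranted should be guided by subject-matter knowledge as well as by these tests.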

Simple models are good. As Einstein once said, "Everything should be made as simple as possible, but not simpler". However, it is important to consider that a model may be too simplistic to be useful.

If you would like to know more about how to estimate causal effects while minimising any biases, please refer to this question and answer:

How do DAGs help to reduce bias in causal inference?

I hope that at least some of this helps!

Just one small question: does the intercept really carry any meaning when interpreting the coefficient terms, or do we ignore it because it is just a constant term? Apart from that, this is an excellent answer. And thanks for the pointer on causal effects. Cheers! — Ranji Raj, May 19 '21 at 22:44
You're very welcome. The intercept is interpreted as the expected value of `aggression` when `parenting_style` and `sibling_aggression` are BOTH ZERO. So a lot will depend on the scales of your variables and whether a value of zero for them makes sense. This is one reason why, when dealing with continuous variables, it can be a good idea to centre them about their means; the intercept then refers to a case where each variable is at its mean, rather than at zero, which often makes no sense (e.g. the weight of a person is never zero) — Robert Long, May 19 '21 at 22:59
Just to summarize, if I understood your comment on the `intercept` correctly: whether it is positive or negative, we should not try to attribute any meaning to it. Is that right? — Ranji Raj, May 19 '21 at 23:07
It depends on the context and in particular on the scale(s) of the variables. If it doesn't make sense for the variables to be zero, then you might consider centring them about their means. Then the intercept takes on the interpretation of the expected value of the response when the covariates are at their means. — Robert Long, May 20 '21 at 07:29
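
A minimal sketch of the centring idea from the comments above, again on simulated data. The data frame `sim`, the centred columns `ps_c` and `sa_c`, and the generating coefficients are hypothetical names and values chosen for illustration, not taken from the child_aggression data.

```r
# Simulated stand-in data whose predictors are not centred at zero.
set.seed(2)
n <- 200
sim <- data.frame(
  parenting_style    = rnorm(n, mean = 3),
  sibling_aggression = rnorm(n, mean = 1)
)
sim$aggression <- 0.5 + 0.06 * sim$parenting_style +
  0.09 * sim$sibling_aggression + rnorm(n, sd = 0.3)

# Centre each predictor about its sample mean.
sim$ps_c <- sim$parenting_style - mean(sim$parenting_style)
sim$sa_c <- sim$sibling_aggression - mean(sim$sibling_aggression)

fit_raw <- lm(aggression ~ parenting_style + sibling_aggression, data = sim)
fit_ctr <- lm(aggression ~ ps_c + sa_c, data = sim)

# Slopes are unchanged by centring; only the intercept moves.
print(coef(fit_raw))
print(coef(fit_ctr))

# With centred predictors, the OLS intercept equals the sample mean of the response.
print(mean(sim$aggression))
```

Centring does not change the fit or the slopes. It only shifts the reference point at which the intercept is evaluated, from "both predictors equal zero" to "both predictors at their means".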
