I am trying to understand the effect that distance has on Hg levels in birds of 4 different species. I am most interested in the main effect of distance, but I am including species as an interaction term because Hg levels vary by species. However, I am not sure that I am interpreting the R output correctly.

summary(lm(blood_hg-1~GIS_distance*species-1, data=Adult_Bird))

This is the model that I am using:

lm(formula = blood_hg - 1 ~ GIS_distance * species - 1, data = Adult_Bird)

I included the "-1" so that R does not automatically use one of the species as the reference. Also, by writing the model this way, I am hoping that the results will show the effect of species on blood_hg rather than the interaction with distance.

Here is the R output:

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4342 -0.4637 -0.1594  0.3469  3.2214 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
GIS_distance             -0.0046170  0.0016497  -2.799 0.005493 ** 
speciesCACH               0.6061536  0.1764384   3.435 0.000682 ***
speciesCARW               3.9002870  0.2088432  18.676  < 2e-16 ***
speciesEABL               0.0848200  0.0989441   0.857 0.392047    
speciesHOWR               0.5478451  0.1413647   3.875 0.000133 ***
GIS_distance:speciesCARW -0.0133468  0.0026402  -5.055 7.83e-07 ***
GIS_distance:speciesEABL  0.0030014  0.0017731   1.693 0.091638 .  
GIS_distance:speciesHOWR  0.0005963  0.0020599   0.289 0.772442    
---

Residual standard error: 0.8194 on 277 degrees of freedom
  (19 observations deleted due to missingness)
Multiple R-squared:  0.6023,    Adjusted R-squared:  0.5908 
F-statistic: 52.44 on 8 and 277 DF,  p-value: < 2.2e-16

My main question is how to interpret the interaction of GIS_distance and species. And if what I am concerned with is the interaction of blood_hg and species, how do I manipulate the model to show me that?

I am learning so any advice is helpful!

kjetil b halvorsen
  • Possible duplicate of [Interaction term in linear regression](https://stats.stackexchange.com/questions/88349/interaction-term-in-linear-regression) – AdamO Aug 30 '17 at 20:59
  • I would bet you a dollar that you don't really want to remove the intercept from the model by using the *-1*. Also, saying "the interaction of blood_hg and species" doesn't make any sense because blood_hg is your dependent variable. We talk about the effect of the interaction of the independent variables *on* the dependent variable. Other than those points, your model makes sense. You might also look at the *Anova* function in the *car* package; it produces an anova table, which might be what you are looking for in interpreting your model. – Sal Mangiafico Aug 30 '17 at 23:15
  • @Sal Mangiafico, thank you for your help! I am still a bit confused about the intercept, though. It seems that when I do not remove it, I cannot see the results for one of my species. So, is the model using this species as a reference? Also, I get very different results regarding which species show statistically significant results depending on whether I include the intercept or not. Is the intercept where my independent variable (distance) equals zero? I am not sure I understand how to interpret it past that, especially regarding my interaction term with species. – Mikaela Aug 31 '17 at 22:23

2 Answers

When there are interactions it's dangerous to interpret the "significance" of lower-level coefficients of predictors that are involved in interactions, or of individual interaction coefficients when there are more than 2 levels of an interacting categorical predictor. With interactions, you should evaluate all coefficients for a particular predictor together in a "chunk test" rather than focusing on individual coefficients.
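A chunk test of this kind can be sketched with nested model comparisons in base R. This is only an illustration on simulated data: the data frame `sim`, the slopes, and the noise level are made up, and only the variable names are borrowed from the question.

```r
# Simulated stand-in for Adult_Bird (assumption: all values are invented).
set.seed(1)
species <- factor(rep(c("CACH", "CARW", "EABL", "HOWR"), each = 30))
GIS_distance <- runif(120, 0, 200)
true_slope <- c(-0.005, -0.018, -0.002, -0.004)[as.integer(species)]
blood_hg <- 1 + true_slope * GIS_distance + rnorm(120, sd = 0.5)
sim <- data.frame(blood_hg, GIS_distance, species)

full        <- lm(blood_hg ~ GIS_distance * species, data = sim)
no_interact <- lm(blood_hg ~ GIS_distance + species, data = sim)
no_distance <- lm(blood_hg ~ species, data = sim)

# Chunk test 1: do the distance slopes differ among species?
# Tests the 3 interaction coefficients together.
slopes_differ <- anova(no_interact, full)
print(slopes_differ)

# Chunk test 2: does distance matter at all?
# Tests the distance coefficient and the 3 interactions together (4 df).
distance_matters <- anova(no_distance, full)
print(distance_matters)
```

Each comparison gives a single F test for the whole set of coefficients, so the conclusion does not depend on which species happens to be the reference level.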

First, in this model omitting an intercept, the coefficient for each species represents its value only when GIS_distance = 0. Does that represent a real-world value in this case? If a continuous predictor typically has values far away from 0, those individual coefficients might not be helpful to interpret.
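If GIS_distance = 0 is not a meaningful value in the data, one common workaround (not something the question already does) is to center the distance before fitting. A minimal sketch on simulated data, where `sim` and `dist_c` are hypothetical names and the numbers are invented:

```r
# Simulated data whose distances never come near 0 (assumption: made-up values).
set.seed(2)
species <- factor(rep(c("CACH", "CARW", "EABL", "HOWR"), each = 30))
GIS_distance <- runif(120, 50, 250)
blood_hg <- 2 - 0.005 * GIS_distance + rnorm(120, sd = 0.5)
sim <- data.frame(blood_hg, GIS_distance, species)

# Center distance at its sample mean.
sim$dist_c <- sim$GIS_distance - mean(sim$GIS_distance)

raw      <- lm(blood_hg ~ GIS_distance * species - 1, data = sim)
centered <- lm(blood_hg ~ dist_c * species - 1, data = sim)

# Same fitted values and same slopes; only the species coefficients change.
same_fit <- all.equal(fitted(raw), fitted(centered))

# After centering, each species coefficient is the predicted blood_hg
# for that species at the mean distance rather than at distance 0.
species_at_mean <- coef(centered)[paste0("species", levels(sim$species))]
print(species_at_mean)
```

Centering does not change the model, only the point on the distance axis at which the species coefficients are evaluated.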

Second, note that there is no GIS_distance:speciesCACH interaction coefficient reported. The coefficient reported for the continuous GIS_distance is the value only for the reference value of species, speciesCACH, and the 3 reported interaction coefficients are the differences from that value for each of the other species. So the problems you thought you saw when you included an intercept just re-appeared at the level of these interactions when you forced omission of an intercept.
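To make that concrete, the per-species slopes can be recovered by adding each interaction coefficient to the reference slope. The numbers below are copied from the output in the question; the variable names are just for illustration.

```r
# Coefficients copied from the summary() output in the question.
b_distance <- -0.0046170   # slope for the reference species, CACH
b_interact <- c(CARW = -0.0133468, EABL = 0.0030014, HOWR = 0.0005963)

# Slope for each species = reference slope + that species' interaction term.
slopes <- c(CACH = b_distance, b_distance + b_interact)
print(slopes)
# CACH -0.0046170, CARW -0.0179638, EABL -0.0016156, HOWR -0.0040207
```

All four estimated slopes are negative. A positive interaction coefficient such as the one for EABL only means that its slope is less steep than the CACH slope, not that blood_hg increases with distance for that species.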

As both Sal Mangiafico's comments and another answer indicate, it's best to include the intercept in your model. Although the reported coefficients then don't show the reference level of a categorical predictor and (with R's default treatment coding) have coefficients for other levels that represent differences from the reference level, you can always use the model to get point estimates and confidence intervals for any illustrative combination of predictor variables that you like. The rms and emmeans packages provide useful tools for that.
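As a rough illustration of that last point, here is a base-R sketch using `predict()` instead of rms or emmeans. The data are simulated (the data frame `sim` and all of its values are assumptions), so only the mechanics carry over to the real data.

```r
# Simulated stand-in for Adult_Bird (assumption: all values are invented).
set.seed(3)
species <- factor(rep(c("CACH", "CARW", "EABL", "HOWR"), each = 30))
GIS_distance <- runif(120, 0, 200)
base_hg <- c(0.6, 3.9, 0.1, 0.5)[as.integer(species)]
blood_hg <- base_hg - 0.005 * GIS_distance + rnorm(120, sd = 0.5)
sim <- data.frame(blood_hg, GIS_distance, species)

with_int    <- lm(blood_hg ~ GIS_distance * species, data = sim)
without_int <- lm(blood_hg ~ GIS_distance * species - 1, data = sim)

# Dropping the intercept only relabels the coefficients;
# the fitted values are the same model either way.
same_fit <- all.equal(fitted(with_int), fitted(without_int))

# Point estimates and 95% confidence intervals for every species,
# including the reference level, at two illustrative distances.
grid <- expand.grid(GIS_distance = c(0, 100),
                    species = levels(sim$species))
pred <- cbind(grid, predict(with_int, newdata = grid, interval = "confidence"))
print(pred)
```

The reference species appears in `pred` like any other, even though it has no coefficient of its own in the model fitted with an intercept.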

EdM

I'm not sure whether R accounts for this automatically, but your model violates the Principle of Marginality, meaning that every term involved in an interaction must also be fitted in the model on its own.

Moving on from that, though, it appears from your output that, of the three interaction terms, only the one for CARW is significant at the 5% level. What this means is that the decrease in blood_hg per unit increase in GIS_distance is significantly steeper for CARW than for the reference species. As for the other two, their interaction coefficients are positive, so their predicted decline with GIS_distance is shallower than for the reference species, but these differences are not statistically significant.

Tom Pinder
  • Ok. Is this interpretation correct then? Distance does have a significant effect on Hg levels when all the species are combined, but when the species are split, there is only a significant decline of Hg values for the CARW species? And all species, except EABL, have a significant effect on Hg levels. This interpretation seems a bit odd, really. – Mikaela Aug 30 '17 at 21:15
  • I thought this was saying that distance has a significant effect when all species are combined and for species except EABL. And that the interaction term of species shows that the slopes were variable depending on species. – Mikaela Aug 30 '17 at 21:22
  • @Mikaela, you are essentially conducting *analysis of covariance* (ANCOVA). It will be helpful to look up the interpretation of ANCOVA. The example here may be helpful: https://rcompanion.org/rcompanion/e_04.html – Sal Mangiafico Aug 30 '17 at 23:24