1

I am seeking to generate a figure that allows for an immediate comparison between a linear trend and a non-linear (flexible) trend. Both of these trend lines will be adjusted for several covariates.

I am able to code these two models in R: one using lm() and one using gam(). These models are identical, except that the gam() model specifies the variable of interest as a spline, while the lm() model specifies the variable of interest as a linear continuous variable.

Can anyone offer advice as to how I might generate a figure the includes both the spline-modeled variable from the gam() function and the linear parameter estimate from the lm() function?

The two models are described as follows, where "EXPOSURE" is the variable of interest that I'd like to plot:

model.lm <- lm(OUTCOME ~ EXPOSURE + COVARIATE, data=data)

model.gam = gam(OUTCOME ~ s(EXPOSURE, k=5) + COVARIATE, data=data)

NOTE: I acknowledge that I am very unfamiliar with R (SAS is my primary statistical coding language), and apologize if this is a poor question to ask here.

  • Generally, you can plot fits like this: `plot(y ~ x, data = DF); curve(predict(fit, newdata = data.frame(x)), add = TRUE) #it has to be the letter x like this within curve`. Some adjustments might be necessary for your specific plots, but coding question are off-topic here. – Roland Jun 01 '18 at 06:22
  • Hi Roland, Thank you for your comment - quite helpful. And thank you for letting me know that coding questions are inappropriate here. I apologize, as I am a new user. – King_Aardvark Jun 02 '18 at 20:24

1 Answers1

2

We can't use covariates in formula in ggplot2 (if we use the tidyverse). So we have to build the models and the fit independently, merge the dataframes, then we can plot our results.

library(tidyverse)
library(mgcv)

set.seed(0)

mydata <- tibble(exposure = 1:20,
                 covariate = rnorm(20, 0, 0.1),
                 outcome = rnorm(20, 5, 2))

fit_lm <- mydata %>% 
  lm(outcome ~ exposure + covariate, data = .) %>% 
  predict(newdata = mydata, interval = "confidence") %>% 
  as_data_frame()

fit_gam <- mydata %>% 
  gam(outcome ~ s(exposure, k = 5) + covariate, data = .) %>% 
  predict(newdata = mydata, type = "link", se.fit = TRUE) %>% 
  as_data_frame() %>% 
  rename(fit_gam = fit) %>% 
  mutate(lwr_gam = fit_gam - 2 * se.fit,
         upr_gam = fit_gam + 2 * se.fit) %>% 
  select(-se.fit)

mydata %>% 
  bind_cols(fit_lm, fit_gam) %>% 
  ggplot() +
    geom_point(aes(exposure, outcome)) +
    geom_line(aes(exposure, fit), size = 1, color = "red") +
    geom_ribbon(aes(x = exposure, ymin = lwr, ymax = upr), alpha = 0.2, fill = "red") +
    geom_line(aes(exposure, fit_gam), size = 1, color = "blue") +
    geom_ribbon(aes(x = exposure, ymin = lwr_gam, ymax = upr_gam), alpha = 0.2, fill = "blue")

plot

NB : check that the confidence interval for the GAM is correct for you (see Confidence interval for GAM model).

mdag02
  • 313
  • 1
  • 5
  • +1 though you'd get much smoother curves if you predicted at say 100-200 evenly spaced values over the range of `exposure` rather than at the observed data points. – Gavin Simpson Jun 01 '18 at 15:26
  • You're right. It is left as an exercise for the reader :-) – mdag02 Jun 04 '18 at 09:46