I am not really confident in interpreting the ANOVA table of a GAM model. I understand how it can be used to compare models (see for instance this question), but I am interested in interpreting it for a single model.

For concreteness:

library( mgcv )
set.seed( 1 )
RawData <- data.frame( y = rbinom( 1000, 1, 0.5 ), x1 = rnorm( 1000 ), x2 = as.factor( rbinom( 1000, 1, 0.5 ) ), x3 = rnorm( 1000 ), x4 = as.factor( rbinom( 1000, 1, 0.5 ) ) )
fit <- gam( y ~ s( x1 ) + x2 + s( x3, by = x2 ) + x4, data = RawData, family = nb( link = log ) )
anova( fit )

Family: Negative Binomial(251657.167) 
Link function: log

Formula:
y ~ s(x1) + x2 + s(x3, by = x2) + x4

Parametric Terms:
   df Chi.sq p-value
x2  1  1.775   0.183
x4  1  0.796   0.372

Approximate significance of smooth terms:
            edf Ref.df Chi.sq p-value
s(x1)     1.000  1.000  0.047   0.828
s(x3):x20 1.000  1.000  0.078   0.779
s(x3):x21 1.000  1.001  0.188   0.665

In particular, I'd be interested in the following:

  1. Can chi.sq values be given an "explained variance" interpretation (or similar), i.e. can they be used to measure variable importance, just like for a usual linear model?
  2. Can the chi.sq values of the smooth and parametric terms be handled similarly?
  3. What to do with interactions? (As x2 and x3 in the example: x3 appears on two lines, x2 appears in those, and as a parametric term in addition.)
Tamas Ferenci

1 Answer

It's probably best to take a look at the mgcv help file ?anova.gam in R, but in answer to the specific questions:

  1. The parametric chi.sq test statistics are just like their linear model equivalents, but the test statistic used for the smooths is different, and doesn't have an explained variance interpretation. I would not try to use them directly to measure variable importance. For details see http://opus.bath.ac.uk/32382/1/spv3.pdf.

  2. No, as explained above.

  3. I would fit the model with and without the interaction and compare (but probably by AIC).
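
The comparison in point 3 could be sketched roughly as follows, reusing the simulated data from the question. The object names fit_int, fit_noint and aic_tab are illustrative, and replacing s(x3, by = x2) with a single s(x3) is only one possible way to define the model "without the interaction"; treat this as a minimal sketch under those assumptions, not a definitive recipe.

```r
library(mgcv)

# Same simulated data as in the question (same seed, same draw order)
set.seed(1)
n <- 1000
RawData <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  x2 = as.factor(rbinom(n, 1, 0.5)),
  x3 = rnorm(n),
  x4 = as.factor(rbinom(n, 1, 0.5))
)

# Model with the interaction: a separate smooth of x3 for each level of x2
fit_int <- gam(y ~ s(x1) + x2 + s(x3, by = x2) + x4,
               data = RawData, family = nb(link = log))

# Assumed "no interaction" counterpart: one common smooth of x3
fit_noint <- gam(y ~ s(x1) + x2 + s(x3) + x4,
                 data = RawData, family = nb(link = log))

# AIC() returns a data frame with one row per model; lower AIC is preferred
aic_tab <- AIC(fit_int, fit_noint)
print(aic_tab)
```

With purely random data like this, the two AIC values would be expected to be close; the point is only to show the mechanics of the comparison.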

Simon Wood

  • Thank you very much! In that case, may I ask what you would suggest for measuring the relative importance of _all_ variables (i.e. smooth terms included) in a GAM? – Tamas Ferenci Nov 10 '17 at 00:23
  • I have added this as a separate question ( https://stats.stackexchange.com/questions/313957/how-to-measure-variable-importance-in-a-gam-model ); it is indeed a different topic, so it is better not to mix the two. – Tamas Ferenci Nov 15 '17 at 20:18