Summary of a GAM fit

Question

If we fit a GAM like:

gam.fit = gam::gam(Outstate ~ Private + s(Room.Board, df = 2) + s(PhD, df = 2) + 
    s(perc.alumni, df = 2) + s(Expend, df = 5) + s(Grad.Rate, df = 2), data = College)

Here we use the College dataset, which can be found in the ISLR package.
The summary of this fit is:

> summary(gam.fit)

Call: gam(formula = Outstate ~ Private + s(Room.Board, df = 2) + s(PhD, 
    df = 2) + s(perc.alumni, df = 2) + s(Expend, df = 5) + s(Grad.Rate, 
    df = 2), data = College)
Deviance Residuals:
     Min       1Q   Median       3Q      Max 
-7522.66 -1140.99    55.18  1287.51  7918.22 

(Dispersion Parameter for gaussian family taken to be 3475698)

    Null Deviance: 12559297426 on 776 degrees of freedom
Residual Deviance: 2648482333 on 762.0001 degrees of freedom
AIC: 13924.52 

Number of Local Scoring Iterations: 2 

Anova for Parametric Effects
                        Df     Sum Sq    Mean Sq F value    Pr(>F)    
Private                  1 3377801998 3377801998 971.834 < 2.2e-16 ***
s(Room.Board, df = 2)    1 2484460409 2484460409 714.809 < 2.2e-16 ***
s(PhD, df = 2)           1  839368837  839368837 241.496 < 2.2e-16 ***
s(perc.alumni, df = 2)   1  509679160  509679160 146.641 < 2.2e-16 ***
s(Expend, df = 5)        1 1019968912 1019968912 293.457 < 2.2e-16 ***
s(Grad.Rate, df = 2)     1  148052210  148052210  42.596 1.227e-10 ***
Residuals              762 2648482333    3475698                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Anova for Nonparametric Effects
                       Npar Df Npar F   Pr(F)    
(Intercept)                                      
Private                                          
s(Room.Board, df = 2)        1  3.480 0.06252 .  
s(PhD, df = 2)               1  1.916 0.16668    
s(perc.alumni, df = 2)       1  1.471 0.22552    
s(Expend, df = 5)            4 34.350 < 2e-16 ***
s(Grad.Rate, df = 2)         1  1.981 0.15971    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I do not understand what the sections "Anova for Parametric Effects" and "Anova for Nonparametric Effects" mean. I know how an ANOVA test works, but I cannot work out what "parametric effects" and "nonparametric effects" refer to in this summary. What do they mean, and how should they be interpreted?

This question arose from part (d) of this answer to question 10, chapter 7 of An Introduction to Statistical Learning.

Gavin Simpson · Accepted Answer · 2017-10-04T15:23:06.197

The output of this approach to fitting GAMs is structured so that the linear parts of the smoothers are grouped with the other parametric terms. Notice that Private has an entry in the first table but its entry is empty in the second. This is because Private is a strictly parametric term; it is a factor variable and hence is associated with an estimated parameter which represents the effect of Private. The reason the smooth terms are separated into two types of effect is that this output allows you to decide if a smooth term has

1. a nonlinear effect: look at the nonparametric table and assess significance. If significant, leave the term as a smooth nonlinear effect. If insignificant, consider the linear effect (2. below).
2. a linear effect: look at the parametric table and assess the significance of the linear effect. If significant, you can replace the smooth with a linear term, s(x) -> x, in the formula describing the model. If insignificant, you might consider dropping the term from the model entirely (but do be careful with this; it amounts to a strong statement that the true effect is exactly zero).

Parametric table

Entries here are like what you'd get if you fitted this as a linear model and computed the ANOVA table, except that no estimates for any associated model coefficients are shown. Instead of estimated coefficients and standard errors, and associated t or Wald tests, the amount of variance explained (in terms of sums of squares) is shown alongside F tests. As with other regression models fitted with multiple covariates (or functions of covariates), the entries in the table are conditional upon the other terms/functions in the model.
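To see the kind of table meant here, one can fit an ordinary linear model with the same covariates entered linearly and ask for its ANOVA table. This is only a sketch for comparing the layout. It assumes the ISLR package is installed, the object names lin.fit and lin.tab are made up for illustration, and the numbers will not match the gam table exactly because the nonlinear parts of the smooths are absent. Note also that anova() on an lm object reports sequential sums of squares.

```r
# Load the College data used in the question (assumes ISLR is installed)
data(College, package = "ISLR")

# Ordinary linear model: every covariate enters as a plain linear term
lin.fit <- lm(Outstate ~ Private + Room.Board + PhD + perc.alumni +
                  Expend + Grad.Rate, data = College)

# ANOVA table with Df, Sum Sq, Mean Sq, F value and Pr(>F) columns,
# the same layout as "Anova for Parametric Effects"
lin.tab <- anova(lin.fit)
print(lin.tab)
```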

Nonparametric table

The nonparametric effects relate to the nonlinear parts of the smoothers fitted. None of these nonlinear effects is significant except for the nonlinear effect of Expend. There is some weak evidence (p ≈ 0.06) of a nonlinear effect of Room.Board. Each of these is associated with some number of nonparametric degrees of freedom (Npar Df) and explains an amount of variation in the response, which is assessed via an F test (by default, see argument test).

The tests in the nonparametric section can be interpreted as tests of the null hypothesis of a linear relationship against the alternative of a nonlinear relationship.
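The logic of such a test can be imitated with a nested-model F test in base R. The sketch below is an analogy on simulated data, not what summary() in the gam package computes internally: it uses a natural cubic spline from the splines package in place of a smoothing spline, and all variable names are made up for illustration.

```r
library(splines)  # ships with base R

set.seed(1)
n <- 200
x <- runif(n, 0, 2 * pi)
y <- sin(x) + rnorm(n, sd = 0.3)   # simulated, truly nonlinear relationship

fit.lin <- lm(y ~ x)               # null model: linear effect only
fit.ns  <- lm(y ~ ns(x, df = 4))   # alternative: linear plus nonlinear parts

# F test of H0: the relationship is linear, against the spline alternative
cmp <- anova(fit.lin, fit.ns)
print(cmp)

# p-value for the extra (nonlinear) degrees of freedom
p.nonlinear <- cmp[["Pr(>F)"]][2]
```

A small p-value here is evidence against linearity, which is how the Expend row in the nonparametric table should be read.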

The way you can interpret this is that only Expend warrants being treated as a smooth nonlinear effect. The other smooths could be converted to linear parametric terms. You may want to check that the smooth of Room.Board continues to have a non-significant nonparametric effect once you convert the other smooths to linear, parametric terms; it may be that the effect of Room.Board is slightly nonlinear but this is being affected by the presence of the other smooth terms in the model.
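A refit along these lines might look as follows. This is a sketch, assuming the gam and ISLR packages are installed; the name gam.fit2 is made up here, and keeping Room.Board as a smooth for the recheck is one reasonable choice rather than the only one.

```r
library(gam)   # attach so that s() is found; gam::gam() is called explicitly
data(College, package = "ISLR")

# Keep smooths only for Expend and (pending the recheck) Room.Board;
# PhD, perc.alumni and Grad.Rate now enter as ordinary linear terms
gam.fit2 <- gam::gam(Outstate ~ Private + s(Room.Board, df = 2) + PhD +
                         perc.alumni + s(Expend, df = 5) + Grad.Rate,
                     data = College)

# Only the two remaining smooths get entries in the nonparametric table
print(summary(gam.fit2))
```

If the Npar F test for s(Room.Board, df = 2) is still not significant in this summary, that term could be made linear as well.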

However, a lot of this might depend on the fact that many smooths were only allowed to use 2 degrees of freedom; why 2?

Automatic smoothness selection

Newer approaches to fitting GAMs choose the degree of smoothness for you via automatic smoothness selection, such as the penalised spline approach of Simon Wood implemented in the recommended package mgcv:

data(College, package = 'ISLR')
library('mgcv')

set.seed(1)
nr <- nrow(College)
train <- with(College, sample(nr, ceiling(nr/2)))
College.train <- College[train, ]
m <- mgcv::gam(Outstate ~ Private + s(Room.Board) + s(PhD) + s(perc.alumni) + 
               s(Expend) + s(Grad.Rate), data = College.train,
               method = 'REML')

The model summary is more concise and considers each smooth function as a whole rather than as separate linear (parametric) and nonlinear (nonparametric) contributions:

> summary(m)

Family: gaussian 
Link function: identity 

Formula:
Outstate ~ Private + s(Room.Board) + s(PhD) + s(perc.alumni) + 
    s(Expend) + s(Grad.Rate)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8544.1      217.2  39.330   <2e-16 ***
PrivateYes    2499.2      274.2   9.115   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                 edf Ref.df      F  p-value    
s(Room.Board)  2.190  2.776 20.233 3.91e-11 ***
s(PhD)         2.433  3.116  3.037 0.029249 *  
s(perc.alumni) 1.656  2.072 15.888 1.84e-07 ***
s(Expend)      4.528  5.592 19.614  < 2e-16 ***
s(Grad.Rate)   2.125  2.710  6.553 0.000452 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.794   Deviance explained = 80.2%
-REML = 3436.4  Scale est. = 3.3143e+06  n = 389

Now the output gathers the smooth terms and the parametric terms into separate tables, with the latter getting a more familiar output similar to that of a linear model. The entire effect of each smooth term is shown in the lower table. These aren't the same tests as for the gam::gam model you show; they are tests against the null hypothesis that the smooth effect is a flat, horizontal line, that is, a null or zero effect. The alternative is that the true nonlinear effect is different from zero.

Notice that the EDFs are all larger than 2 except for s(perc.alumni), suggesting that the gam::gam model may be a little restrictive.
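The EDFs can also be pulled out of the summary object rather than read off the printout. The sketch below refits m so that it runs on its own; it assumes the ISLR package is installed. Because the default behaviour of sample() changed in R 3.6.0, the training split, and hence the exact EDFs, may differ from those printed above.

```r
library(mgcv)
data(College, package = "ISLR")

# Same recipe for the training split as above
set.seed(1)
nr <- nrow(College)
train <- sample(nr, ceiling(nr / 2))
College.train <- College[train, ]

m <- mgcv::gam(Outstate ~ Private + s(Room.Board) + s(PhD) + s(perc.alumni) +
                   s(Expend) + s(Grad.Rate), data = College.train,
               method = "REML")

# s.table is the "Approximate significance of smooth terms" matrix
edf <- summary(m)$s.table[, "edf"]
print(round(edf, 3))

# Smooths using more than the 2 df allowed in the gam::gam fit
print(names(edf)[edf > 2])
```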

The fitted smooths for comparison are given by

plot(m, pages = 1, scheme = 1, all.terms = TRUE, seWithMean = TRUE)

which produces a single page of panels, one for each term in the model (figure not shown).

The automatic smoothness selection can also be co-opted to shrink terms out of the model entirely:
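The code for this step does not appear here. One way to get this behaviour in mgcv is the select = TRUE argument to gam(), which adds an extra penalty to each smooth so that the whole term, linear part included, can be shrunk towards zero. The sketch below assumes that this is how m2 was fitted; shrinkage smooths such as bs = "ts" would be another route. It also assumes the ISLR package is installed, and the training split may differ from the original on R 3.6.0 or later because the default behaviour of sample() changed.

```r
library(mgcv)
data(College, package = "ISLR")

# Same recipe for the training split as for m
set.seed(1)
nr <- nrow(College)
train <- sample(nr, ceiling(nr / 2))
College.train <- College[train, ]

# select = TRUE adds a second penalty on each smooth's null space
# (assumption: this is the option used to produce m2)
m2 <- mgcv::gam(Outstate ~ Private + s(Room.Board) + s(PhD) + s(perc.alumni) +
                    s(Expend) + s(Grad.Rate), data = College.train,
                method = "REML", select = TRUE)
```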

Having done that, we see that the model fit has not really changed

> summary(m2)

Family: gaussian 
Link function: identity 

Formula:
Outstate ~ Private + s(Room.Board) + s(PhD) + s(perc.alumni) + 
    s(Expend) + s(Grad.Rate)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8539.4      214.8  39.755   <2e-16 ***
PrivateYes    2505.7      270.4   9.266   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                 edf Ref.df      F  p-value    
s(Room.Board)  2.260      9  6.338 3.95e-14 ***
s(PhD)         1.809      9  0.913  0.00611 ** 
s(perc.alumni) 1.544      9  3.542 8.21e-09 ***
s(Expend)      4.234      9 13.517  < 2e-16 ***
s(Grad.Rate)   2.114      9  2.209 1.01e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.794   Deviance explained = 80.1%
-REML = 3475.3  Scale est. = 3.3145e+06  n = 389

All of the smooths seem to suggest slightly nonlinear effects even after we've shrunk the linear and nonlinear parts of the splines.

Personally, I find the output from mgcv easier to interpret, not least because it has been shown that the automatic smoothness selection methods will tend to fit a linear effect if that is supported by the data.
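This tendency can be checked informally on simulated data where the truth is a straight line. The sketch below uses made-up variable names and a single simulated data set, so it illustrates the behaviour rather than proving it.

```r
library(mgcv)

set.seed(42)
n <- 300
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # simulated data: the truth is linear

# Default thin plate regression spline (k = 10), smoothness chosen by REML
fit <- mgcv::gam(y ~ s(x), method = "REML")

# An EDF close to 1 means the smooth has been penalised back to a line
edf.x <- summary(fit)$s.table[1, "edf"]
print(edf.x)
```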

+1 Great explanation. (I'm curious what you mean by a "nonparametric F test," though: how would it differ from a standard F test, if at all?) — whuber, Sep 28 '17 at 21:41
@whuber that's just poor phrasing; it is a standard F test, but because of the decomposition of smooths into linear and nonlinear parts those are tests of what the output calls the "nonparametric" bit. I'll edit that. — Gavin Simpson, Sep 28 '17 at 21:49
Thanks for the great answer! Meanwhile, I was wondering about, "These tests in the nonparametric section can be interpreted as test of the null hypothesis of a linear relationship instead of a nonlinear relationship", in which "tests of the null hypothesis of a linear relationship" would mean that a higher $p$ value would give a higher possibility for the null hypothesis being true. That is, a non-linear relationship. Thus a lower one would indicate a linear relationship, thus `Expend` should have linear tendency instead of a non-linear one. But that is contradictory. — Mooncrater, Oct 02 '17 at 15:55
p-values don't work that way; for the test to even be performed, we have assumed that the null hypothesis **is true**. What the test is asking is: assuming that the relationship is linear, how much in conflict with that assumption is the evidence brought by the data? If the evidence is consistent with what we would expect were the null true, we have no grounds to reject the null. If the evidence is inconsistent with the null hypothesis, then we would be unlikely to have observed the data we did if the null were true. The p value is a measure of the evidence against the null. — Gavin Simpson, Oct 02 '17 at 18:02
So, in this instance, there is only strong evidence against the null hypothesis (of a linear relationship) for `Expend`. Hence, in a NHST framework (although that framework is not the only way to interpret p values or statistical results) you might reject the null for `Expend` and conclude that a non-linear relationship is more consistent with the data. For the other smooth terms, the evidence *as evaluated in those F tests* is consistent with the null, hence we fail to reject that hypothesis and might conclude that those relationships are consistent with being linear. — Gavin Simpson, Oct 02 '17 at 18:05
You say of the parametric table: "Entries here are essentially what you'd get if you fitted this as a linear model and computed the ANOVA table." I assume this is not literally true (i.e. that the nonlinear part of the model doesn't impact the significance estimates for the linear part at all). Surely the estimation of the linear and nonlinear parts is done jointly, and the non-linear part affects the coefficients for the linear part as well as their significance, right? — Jacob Socolar, Oct 04 '17 at 14:00
@JacobSocolar As far as I understand, the spline is decomposable into a linear component and some non-linear components (the number of which depends on the degrees of freedom allowed for the spline). What I meant was that this is the kind of output you'd get from a linear model fit followed by ANOVA (i.e. an F test for the mean square). But yes, these are partial effects in the sense that the variance explained by one part of the model depends on the other terms (& basis functions) in the model. And yes, I didn't mean that quote literally; I'll reword to "like what you would...". — Gavin Simpson, Oct 04 '17 at 15:21
