
I am fitting a time series of neuron spike data with a Poisson GAM, using the following call:

    library(mgcv)

    # Parametric effect of reward, plus smooths of the two timing covariates,
    # the four muscle covariates, and five lagged copies of the response.
    # The shared id on the two timing smooths makes them use the same
    # smoothing parameter.
    gam3.formula = "sig089a ~
     reward + s(time.from.release, k=c(30), id=0) +
     s(time.from.next.press, k=c(30), id=0) +
     s(wrist.extensors, bs='tp', k=c(5), fx=F) +
     s(wrist.flexors, bs='tp', k=c(5), fx=F) +
     s(biceps, bs='tp', k=c(5), fx=F) +
     s(triceps, bs='tp', k=c(5), fx=F) +
     s(lag1, bs='tp', k=c(5), fx=F) +
     s(lag2, bs='tp', k=c(5), fx=F) +
     s(lag3, bs='tp', k=c(5), fx=F) +
     s(lag4, bs='tp', k=c(5), fx=F) +
     s(lag5, bs='tp', k=c(5), fx=F)"
    gam3.formula = as.formula(gam3.formula)

    # Fit with bam(), with extra shrinkage on the smooths (select = TRUE)
    gam3 = bam(gam3.formula, data=new.data, family=poisson(), select=TRUE)

All variables except reward are continuous; reward is a factor with two levels. lag1 to lag5 are lagged versions of sig089a.
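
For context, the lag columns are just shifted copies of sig089a, roughly along these lines (an illustrative sketch only, not my actual preprocessing code):

    # Illustration only: lag k of a series, padded with NA at the start.
    lagged <- function(x, k) c(rep(NA, k), head(x, -k))
    for (k in 1:5) {
      new.data[[paste0("lag", k)]] <- lagged(new.data$sig089a, k)
    }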

The model completes with no problems under the default smoothness selection method (fREML for bam()):

    > summary(gam3)

    Family: poisson 
    Link function: log 

    Formula:
    sig089a ~ reward + s(time.from.release, k = c(30), id = 0) + 
    s(time.from.next.press, k = c(30), id = 0) + s(wrist.extensors, 
    bs = "tp", k = c(5), fx = F) + s(wrist.flexors, bs = "tp", 
    k = c(5), fx = F) + s(biceps, bs = "tp", k = c(5), fx = F) + 
    s(triceps, bs = "tp", k = c(5), fx = F) + s(lag1, bs = "tp", 
    k = c(5), fx = F) + s(lag2, bs = "tp", k = c(5), fx = F) + 
    s(lag3, bs = "tp", k = c(5), fx = F) + s(lag4, bs = "tp", 
    k = c(5), fx = F) + s(lag5, bs = "tp", k = c(5), fx = F)

    Parametric coefficients:
                Estimate Std. Error z value Pr(>|z|)   
    (Intercept)  2.19980    0.75146   2.927  0.00342 **
    reward0     -0.01100    0.01514  -0.726  0.46756   
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Approximate significance of smooth terms:
                                   edf Ref.df   Chi.sq  p-value    
    s(time.from.release)    1.748e+01     25 1061.985  < 2e-16 ***
    s(time.from.next.press) 2.417e+01     29  820.978  < 2e-16 ***
    s(wrist.extensors)      7.401e-04      4    0.001  0.42014    
    s(wrist.flexors)        1.184e+00      4    6.251  0.00683 ** 
    s(biceps)               5.710e-01      4    1.331  0.11758    
    s(triceps)              1.739e-04      4    0.000  1.00000    
    s(lag1)                 2.456e+00      4  150.869  < 2e-16 ***
    s(lag2)                 2.138e+00      4  109.107  < 2e-16 ***
    s(lag3)                 2.377e+00      4   73.978  < 2e-16 ***
    s(lag4)                 1.874e+00      4   30.263 2.67e-08 ***
    s(lag5)                 1.783e+00      4   29.503 2.70e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    R-sq.(adj) =  0.114   Deviance explained = 11.4%
    fREML =  63436  Scale est. = 1         n = 45787

This model's deviance explained is 11.4%. According to this post, smoothness selection by GCV can suffer from under-smoothing, and REML is more robust against this. So I made the same call, switching to REML.
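
The only change is the method argument; roughly (gam3.reml is just a name for the refit, and bam()'s default method is "fREML", which is what the first summary reports):

    # Same formula, data, and family; only the smoothing-parameter
    # selection method changes to "REML".
    gam3.reml = bam(gam3.formula, data=new.data, family=poisson(),
                    select=TRUE, method="REML")
    summary(gam3.reml)

The summary then showed: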

    Parametric coefficients:
                Estimate Std. Error z value Pr(>|z|)   
    (Intercept)  2.19983    0.75144   2.927  0.00342 **
    reward0     -0.01098    0.01513  -0.726  0.46812   
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Approximate significance of smooth terms:
                                   edf Ref.df   Chi.sq  p-value    
    s(time.from.release)    17.484203     25 1061.972  < 2e-16 ***
    s(time.from.next.press) 24.165244     29  820.978  < 2e-16 ***
    s(wrist.extensors)       0.025424      4    0.020  0.37724    
    s(wrist.flexors)         1.175150      4    6.205  0.00696 ** 
    s(biceps)                0.578627      4    1.354  0.11595    
    s(triceps)               0.008631      4    0.001  0.79583    
    s(lag1)                  2.457036      4  150.868  < 2e-16 ***
    s(lag2)                  2.137905      4  109.099  < 2e-16 ***
    s(lag3)                  2.373759      4   73.983  < 2e-16 ***
    s(lag4)                  1.873649      4   30.265 2.67e-08 ***
    s(lag5)                  1.781395      4   29.501 2.68e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    R-sq.(adj) =  0.114   Deviance explained = -169%
    -REML =  63436  Scale est. = 1         n = 45787

Now the deviance explained is negative, at -169%. Plotting and inspecting the smooth terms, I see no difference between the two fits. As far as I can tell, fitting with REML rather than the default simply made the deviance explained negative. Why does this happen, and does it say something about my model specification?


1 Answer


The problem with gam() should have been fixed in mgcv 1.8-24 (June 18, 2018). A similar problem with bam() (“mgcv_1.8-24: ‘fREML’ or ‘REML’ method of bam() gives wrong explained deviance”) will be fixed in the next version.
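
In the meantime you can check the printed figure yourself by recomputing deviance explained directly from the response and fitted values. This is only a sketch: it assumes a plain Poisson fit (no offset, unit prior weights apart from what is stored on the object) and uses the object names gam3 and gam3.reml from the question; only the number reported by summary() is suspect, not the fit itself.

    # Recompute deviance explained by hand instead of trusting summary().
    dev.expl <- function(m) {
      y  <- m$y                # response used in the fit
      w  <- m$prior.weights    # prior observation weights
      mu <- m$fitted.values    # fitted Poisson means
      d  <- sum(poisson()$dev.resids(y, mu, w))   # model deviance
      d0 <- sum(poisson()$dev.resids(y, rep(weighted.mean(y, w), length(y)), w))  # null deviance
      1 - d / d0
    }

    dev.expl(gam3)       # fit with the default (fREML) method
    dev.expl(gam3.reml)  # fit with method = "REML"

If the two fits give essentially the same smooths, the values computed this way should be essentially equal, which would confirm that the -169% is a reporting bug rather than a problem with your model specification.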

Zheyuan Li