
I want to check whether my implementation of multivariate adaptive regression splines (MARS) is correct. I have crop yield data from multiple locations and years, and I want to predict yield as a function of location, year, and several climate variables.

Before running MARS (the earth function from the earth package), I converted location and year to factors in R:

dat$year <- as.factor(dat$year)
dat$location.id <- as.factor(dat$location.id) 

I also log-transformed the yield values to avoid negative predictions:

 dat$log.yld <- log(dat$yld)
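A minimal base-R sketch of why the log transform prevents negative predictions, using made-up yields and made-up log-scale predictions (the names `pred.log` and `pred.yld` are illustrative, not from the question). A model fitted to log yield predicts on the log scale, where values can be negative; back-transforming with `exp()` always gives a positive yield. Note that `log()` needs strictly positive inputs, since `log(0)` is `-Inf`.

```r
# Toy yields (hypothetical), not the data from the question
yld <- c(0.8, 1.5, 2.3, 3.1, 0.4, 2.9)

# log() needs strictly positive values: log(0) is -Inf
stopifnot(all(yld > 0))
log.yld <- log(yld)

# A model fitted to log.yld returns predictions on the log scale,
# and these can be negative (e.g. for yields below 1)
pred.log <- c(-1.2, 0.3, 1.1)   # hypothetical log-scale predictions

# Back-transforming with exp() maps every prediction to a positive yield
pred.yld <- exp(pred.log)
print(round(pred.yld, 3))
```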

Then I fitted the model:

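library(earth)

# pred.idx: column positions of the climate variables plus location.id and year
fit <- earth(x = dat[, pred.idx],
             y = dat[, 65],   # column 65 holds the log yield values (dat$log.yld)
             degree = 2,      # allow two-way interactions between hinge terms
             pmethod = "cv",  # choose the number of terms by cross-validation
             nfold = 10,      # 10-fold cross-validation
             ncross = 3)      # repeat the cross-validation 3 times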

Is my implementation above correct? How does earth handle categorical predictors such as location and year?

Thank you

89_Simple

1 Answer


The factors are expanded into dummy (indicator) variables before being passed to the MARS algorithm; see the earth notes: http://www.milbo.org/doc/earth-notes.pdf
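To show what "expanded into dummy variables" means, here is a base-R sketch using `model.matrix()` with R's default treatment contrasts on a small hypothetical data frame (the `rain` column is a made-up stand-in for a climate variable). earth's own internal expansion may differ in details such as intercept handling, so treat this as an illustration of the idea rather than earth's exact code.

```r
# Hypothetical toy data using the column names from the question
dat <- data.frame(
  location.id = factor(c("A", "B", "C", "A", "B", "C")),
  year        = factor(c(2001, 2001, 2001, 2002, 2002, 2002)),
  rain        = c(510, 620, 480, 555, 600, 470)   # hypothetical climate variable
)

# Each factor becomes a set of 0/1 indicator columns (one per non-baseline level);
# numeric predictors such as rain are passed through unchanged
X <- model.matrix(~ location.id + year + rain, data = dat)
print(colnames(X))
print(X)
```

Each indicator column is then just another 0/1 predictor that MARS can use in its basis functions, so location and year enter the model level by level rather than as ordered numbers.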

SR13