I would like to check whether my implementation of multivariate adaptive regression splines (MARS) is correct. I have crop yield data from multiple locations and years, and I want to predict yield as a function of location, year and some climate variables.
Before running MARS (from the earth package), I converted location and year to factors in R:
dat$year <- as.factor(dat$year)
dat$location.id <- as.factor(dat$location.id)
I also log-transformed the yield values to avoid negative predictions:
dat$log.yld <- log(dat$yld)
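To make the reasoning behind the log transform explicit, here is a minimal sketch of the back-transformation I have in mind. The vectors `yld` and `pred.log` are made-up illustrative values, not my data or model output, and the sketch assumes all yields are strictly positive so that `log()` is defined.

```r
# Illustrative yield values only; not taken from my data.
yld <- c(2.1, 3.4, 0.8, 5.6)

# The model is fitted on the log scale ...
log.yld <- log(yld)

# ... and any prediction on the log scale is mapped back with exp(),
# which is strictly positive for every real input.
pred.log <- c(-1.5, 0, 1.2)   # hypothetical predictions on the log scale
pred.yld <- exp(pred.log)

# Check that every back-transformed prediction is positive.
all(pred.yld > 0)
```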
And then fitted my model:
library(earth)
# pred.idx is a placeholder for the column indices of my predictors
# (climate variables + location.id + year)
fit <- earth(x = dat[, pred.idx],
             y = dat[, 65],    # position of my log yield values
             degree = 2,
             pmethod = "cv",
             nfold = 10,
             ncross = 3)
Is my implementation above correct? How does earth handle categorical predictors such as my location and year factors?
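One way I could inspect this myself is sketched below on synthetic data. Everything in it (the data frame `d`, its columns `temp`, `loc`, `year`, and the simulated response) is hypothetical and only meant to show where the expanded predictor names can be read off a fitted earth object; it is not my actual data or model.

```r
library(earth)

set.seed(1)
n <- 200

# Synthetic stand-in for my data: one climate variable and two factors.
d <- data.frame(
  temp = runif(n, 10, 30),
  loc  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  year = factor(sample(2001:2003, n, replace = TRUE))
)

# Simulated log yield with a location effect and some noise.
d$log.yld <- 1 + 0.05 * d$temp + 0.3 * (d$loc == "B") + rnorm(n, sd = 0.1)

# Same x/y interface as in my call, without the cross-validation options.
fit <- earth(x = d[, c("temp", "loc", "year")],
             y = d$log.yld,
             degree = 2)

# Column names of the predictor matrix as earth sees it after
# expanding the factors into numeric columns.
colnames(fit$dirs)

# Terms that survived pruning, showing which expanded columns are used.
summary(fit)
```

Comparing `colnames(fit$dirs)` with the original column names shows how each factor was turned into numeric columns before the forward pass.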
Thank you