Here is an example using logistic regression.
You would need to adapt this for other models. For example, if the model needs the predictors to be on the same scale, then you would need to add another step to the fit and predict functions to normalize all the predictors.
Also, you could make the number of components a tuning variable (see the other example that you mentioned on the package website).
library(caret)
set.seed(1)
dat <- twoClassSim(200)
## Start from caret's built-in glm model code and override pieces of it
funcs <- getModelInfo("glm", regex = FALSE)[[1]]
funcs$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  ## Conduct PCA on the first five predictors and generate the new predictors
  for_pca <- 1:5
  num_pc <- 2
  pca <- preProcess(x[, for_pca], method = "pca")
  pc <- predict(pca, x[, for_pca])[, 1:num_pc, drop = FALSE]
  ## glm needs a data frame and formula, so bind the data together
  ## in a data frame and attach the outcome
  dat <- cbind(x[, -for_pca, drop = FALSE], pc)
  dat <- as.data.frame(dat)
  dat$y <- y
  ## Save the model and attach the information needed
  ## to predict new samples
  out <- glm(y ~ ., data = dat, family = binomial)
  out$pp <- pca
  out$for_pca <- colnames(x)[for_pca]
  out
}
funcs$predict <- function(modelFit, newdata, submodels = NULL) {
  ## Generate the PCs, attach and predict. All retained PCs are computed
  ## here; predict.glm only uses the columns that appear in the model.
  pc <- predict(modelFit$pp, newdata[, modelFit$for_pca])
  orig_vars <- !(colnames(newdata) %in% modelFit$for_pca)
  dat <- cbind(newdata[, orig_vars, drop = FALSE], pc)
  dat <- as.data.frame(dat)
  prob <- predict(modelFit, dat, type = "response")
  ## Predict the class
  ifelse(prob >= .5,
         modelFit$obsLevels[2],
         modelFit$obsLevels[1])
}
set.seed(2)
mod <- train(Class ~ ., data = dat,
             method = funcs,
             trControl = trainControl(method = "cv"))
For this example:
> mod
Generalized Linear Model
200 samples
15 predictor
2 classes: 'Class1', 'Class2'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results
  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.665     0.321  0.116        0.232
> coef(mod$finalModel)
(Intercept)    Linear04    Linear05    Linear06    Linear07
-0.22918645 -0.86777592  0.22813460 -0.59662663  0.52737593
   Linear08    Linear09    Linear10  Nonlinear1  Nonlinear2
-0.21819921  0.50468429 -0.14011715  0.57582282 -0.18439884
 Nonlinear3         PC1         PC2
 0.04742595  0.02288815  0.14073538
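As a rough sketch of the tuning idea mentioned at the top (not the package website example), the number of components could be exposed as a tuning parameter along these lines. The parameter name num_pc, its label, the cap of five components, and the use of pcaComp = 5 to make preProcess keep all five components are assumptions made for this illustration.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)

funcs <- getModelInfo("glm", regex = FALSE)[[1]]

## Declare a single tuning parameter (the name num_pc is hypothetical)
funcs$parameters <- data.frame(parameter = "num_pc",
                               class = "numeric",
                               label = "#Components")

## Candidate values: 1..len components, capped at the five PCA columns
funcs$grid <- function(x, y, len = NULL, search = "grid") {
  data.frame(num_pc = 1:min(len, 5))
}

funcs$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  for_pca <- 1:5
  ## Keep all five components so any value in the grid can be used
  pca <- preProcess(x[, for_pca], method = "pca", pcaComp = 5)
  pc <- predict(pca, x[, for_pca])[, 1:param$num_pc, drop = FALSE]
  dat <- as.data.frame(cbind(x[, -for_pca, drop = FALSE], pc))
  dat$y <- y
  out <- glm(y ~ ., data = dat, family = binomial)
  out$pp <- pca
  out$for_pca <- colnames(x)[for_pca]
  out
}

funcs$predict <- function(modelFit, newdata, submodels = NULL) {
  ## All five PCs are computed; predict.glm only uses those in the model
  pc <- predict(modelFit$pp, newdata[, modelFit$for_pca])
  orig_vars <- !(colnames(newdata) %in% modelFit$for_pca)
  dat <- as.data.frame(cbind(newdata[, orig_vars, drop = FALSE], pc))
  prob <- predict(modelFit, dat, type = "response")
  ifelse(prob >= .5, modelFit$obsLevels[2], modelFit$obsLevels[1])
}

set.seed(2)
mod <- train(Class ~ ., data = dat,
             method = funcs,
             tuneLength = 5,
             trControl = trainControl(method = "cv"))

## Resampled performance for each candidate and the selected value
mod$results
mod$bestTune
```

With this setup, train resamples each candidate value of num_pc and refits the final model with the best one, so the PCA is still recomputed inside every resample rather than once on the full data.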
Max