I keep running into warnings in RStudio when I work with subsets where p > n (more features than samples). ISLR 6.4.3 mentions that forward stepwise selection can be useful for high-dimensional data, which I'm trying out for learning purposes. All the examples I have found fit a full model first, but in every one of them n > p. Could someone point me in the right direction or tell me what I'm missing? Sample code below.
library(caret) # provides createDataPartition, trainControl, train
df <- read.csv('my_sample_data.csv')
df1 <- df[,1:200] # using a subset of the features to explore stepwise
set.seed(123)
# x <- df1[,-1]
# y <-df1[,1]
# train test split
trainIndex <- createDataPartition(df1$Age, p = .8,
                                  list = FALSE,
                                  times = 1)
training <- df1[trainIndex,]
testing <- df1[-trainIndex,]
dim(training) # 130 200: 130 samples, 200 columns (Age plus 199 predictors)
# parameter tuning
fitControl <- trainControl(
  method = "cv",
  number = 5)
set.seed(42)
step.fit <- train(Age ~ ., data = training,
                  method = "glmStepAIC",
                  trControl = fitControl,
                  trace = FALSE)
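
For comparison, here is a minimal sketch of what I understand forward stepwise to mean in ISLR: start from the intercept-only model and add one predictor at a time, so the full p > n model is never fit. It uses base R's `step()` on simulated data rather than caret. The data, the variable names (`x1`, `x2`, ...), and the cap of 10 steps are all assumptions made up for illustration, not part of my real data set.

```r
# Simulated p > n data; every name below is hypothetical.
set.seed(1)
n <- 50
p <- 80
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", seq_len(p))
X$y <- 2 * X$x1 - 3 * X$x2 + rnorm(n)

# Start from the intercept-only model, which is always estimable.
null_fit <- lm(y ~ 1, data = X)

# Upper scope: every candidate predictor. This formula is only used as a
# search space; the full model itself is never fit.
upper <- reformulate(paste0("x", seq_len(p)), response = "y")

# Forward selection by AIC, capped at 10 steps so the model stays well
# below n parameters.
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = upper),
            direction = "forward",
            steps = 10,
            trace = 0)

selected <- names(coef(fwd))[-1] # drop the intercept
print(selected)
```

The cap on `steps` is a design choice for this sketch: without some limit, forward selection with p > n can keep adding terms until the fit is nearly saturated.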