I am building linear models by adding one variable at a time, and I want to study the effect of each variable and how those effects change as new variables enter the model. Essentially, a stepwise regression that does not worry about significance.
Here is the code for my approach, using the mtcars data as an example of what I am trying to accomplish:
library(broom)
library(tidyverse)
data("mtcars")
# creating empty list to store the models
models <- vector("list")
# getting the independent variables to put into the model
# (dropping the response, mpg, and some variables due to collinearity)
names <- setdiff(colnames(mtcars), c("wt", "mpg", "disp", "cyl", "drat"))
# adding one independent variable to the model at a time
for (i in seq_along(names)) {
  f <- as.formula(paste("mpg ~", paste(names[1:i], collapse = "+")))
  model <- lm(f, data = mtcars)
  models[[i]] <- model
}
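As an aside, the same sequence of nested models can be built without pasting formula strings by hand, using reformulate() from base R (a sketch; preds plays the role of names above):

```r
data(mtcars)
# same predictor set as above: drop the response and the collinear variables
preds <- setdiff(colnames(mtcars), c("wt", "mpg", "disp", "cyl", "drat"))
# fit mpg ~ preds[1], mpg ~ preds[1] + preds[2], and so on
models <- lapply(seq_along(preds), function(i) {
  lm(reformulate(preds[1:i], response = "mpg"), data = mtcars)
})
```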
# Naming the models
names(models) <- paste0("MODEL", 1:length(names))
# getting the coefficients
all_coefs <- plyr::ldply(models, tidy, .id = "model")
coefs <- all_coefs %>% select(-(std.error:p.value)) %>%
spread(model, estimate)
# getting the adjusted r-squared
all_r2 <- plyr::ldply(models, glance, .id = "model")
r2 <- all_r2 %>% select(-r.squared, -(sigma:df.residual)) %>%
spread(model, adj.r.squared) %>%
mutate(term = "adj.rsquared")
# gather the p-values for each variable
p.value <- all_coefs %>% select(-(estimate:statistic)) %>%
spread(model, p.value)
# combining the r-squared and coefficients
model_results <- bind_rows(coefs, r2)
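To illustrate what I mean by studying how an effect changes: a single variable's estimate can also be pulled straight from the fitted models, e.g. for the first predictor, hp (a sketch, assuming the models list built above):

```r
# coefficient of hp in each successive model; hp enters first,
# so it is present in every fit
sapply(models, function(m) coef(m)["hp"])
```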
My question is whether this is an appropriate approach for studying a large number of models. If not, what approaches would you suggest?