Given the dataset cars.txt, we want to build a good regression model for the Midrange Price (the column MidrangePrice) using the variables Horsepower, Length, Luggage, Uturn, Wheelbase, and Width, in two ways:
- using all possible subsets selection, and
- using an automatic selection technique.
For the first part, we run the following in R:
library(leaps)  # provides leaps() for all-subsets regression
cars <- read.table(file=file.choose(), header=TRUE)
names(cars)
#regression
leap <- leaps(x=cbind(cars$Horsepower, cars$Length, cars$Luggage, cars$Uturn, cars$Wheelbase, cars$Width),
y=cars$MidrangePrice, method=c("r2"), nbest=3)
combine <- cbind(leap$which,leap$size, leap$r2)
n <- length(leap$size)
dimnames(combine) <- list(1:n,c("horsep","length","Luggage","Uturn","Wheelbase","Width","size","r2"))
round(combine, digits=3)
leap.cp <- leaps(x=cbind(cars$Horsepower, cars$Length, cars$Luggage, cars$Uturn, cars$Wheelbase, cars$Width),
y=cars$MidrangePrice, nbest=3)
combine.cp <- cbind(leap.cp$which,leap.cp$size, leap.cp$Cp)
dimnames(combine.cp) <- list(1:n,c("horsep","length","Luggage","Uturn","Wheelbase","Width","size","cp"))
round(combine.cp, digits=3)
plot(leap.cp$size, leap.cp$Cp, ylim=c(1,7))
abline(a=0, b=1)
Am I correct in my interpretation that the most adequate model is the one with 4 parameters (the intercept plus the three variables Horsepower, Wheelbase and Width) because it has the lowest Mallows' Cp value?
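To check my reading of the Cp column, here is a minimal base-R sketch that computes Mallows' Cp by hand for every subset, using Cp = SSE_p / s^2 - (n - 2p), where s^2 is the residual variance of the full model and p counts the intercept. It uses the built-in mtcars data and the predictors hp, wt, disp and qsec purely as a stand-in, since cars.txt is not reproduced here; those names are assumptions and not part of my actual problem.

```r
# Stand-in data: built-in mtcars (an assumption; replace with the cars data frame)
preds <- c("hp", "wt", "disp", "qsec")
full  <- lm(reformulate(preds, response = "mpg"), data = mtcars)
n     <- nrow(mtcars)
s2    <- summary(full)$sigma^2   # residual variance of the full model

# Enumerate every non-empty subset of the candidate predictors
subsets <- unlist(lapply(seq_along(preds),
                         function(k) combn(preds, k, simplify = FALSE)),
                  recursive = FALSE)

# Cp = SSE_p / s2 - (n - 2p), with p = number of coefficients incl. intercept
cp <- sapply(subsets, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  p   <- length(coef(fit))
  sum(resid(fit)^2) / s2 - (n - 2 * p)
})

res <- data.frame(model = sapply(subsets, paste, collapse = " + "),
                  p     = lengths(subsets) + 1,
                  cp    = round(cp, 3))

# Subsets ordered from lowest to highest Cp
res[order(res$cp), ]
```

By construction the full model always has Cp equal to its own p, so only the smaller subsets carry information. As far as I understand, the line drawn by abline(a=0, b=1) is Cp = p, and subsets lying close to or below it are the ones usually considered to have little bias.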
For the second part, we can choose between forward, backward and stepwise (both directions) selection:
#stepwise selection methods
#full model with all six candidate predictors
reg.lm1 <- lm(MidrangePrice ~ Horsepower + Length + Luggage + Uturn + Wheelbase + Width, data=cars)
#forward: start from the intercept-only model and add terms
slm.forward <- step(lm(MidrangePrice ~ 1, data=cars), scope=~ Horsepower + Length + Luggage + Uturn + Wheelbase + Width, direction="forward")
#backward: start from the full model and drop terms
slm.backward <- step(reg.lm1, direction="backward")
#stepwise: start from the full model; terms may be dropped and added back
slm.stepwise <- step(reg.lm1, direction="both")
How do I interpret the results I get from this R code?
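For context, this is a minimal sketch of how I am currently inspecting what step() returns. It again uses the built-in mtcars data with mpg, hp, wt, disp and qsec as stand-ins (assumptions, not my real variables). My understanding is that step() compares models by AIC as computed by extractAIC(), and that the printed tables show the AIC that would result from adding or dropping each term.

```r
# Stand-in data: built-in mtcars (an assumption; replace with the cars data frame)
full <- lm(mpg ~ hp + wt + disp + qsec, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# trace = 0 suppresses the per-step printout so only the final objects remain
fwd  <- step(null, scope = ~ hp + wt + disp + qsec,
             direction = "forward", trace = 0)
bwd  <- step(full, direction = "backward", trace = 0)
both <- step(full, direction = "both", trace = 0)

# Which terms ended up in each final model
formula(fwd)
formula(bwd)
formula(both)

# The path taken by the search: one row per step, with the AIC after that step
fwd$anova

# The criterion step() minimises: c(equivalent df, AIC)
extractAIC(bwd)

# Usual coefficient table for the selected model
summary(bwd)
```

Comparing formula() across the three objects shows whether the directions agree on the same subset, and the anova component shows the order in which terms entered or left.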