I am building a GLM for machine-learning purposes to predict a value y.
I want to test the following formula in every way that is considered high academic standard. I have read COOLSerdash's answer here, but I didn't find that it answers my question sufficiently.
I have built four training data sets because I want to conduct 4-fold cross-validation:
mod1 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "02", "03"), ])
mod2 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "02", "04"), ])
mod3 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "03", "04"), ])
mod4 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("02", "03", "04"), ])
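The four fits differ only in which period is held out, so they can be written as one loop; a minimal sketch, assuming df has columns y, x1 through x4, and a character period column (the simulated data below is purely a hypothetical stand-in for illustration):

```r
set.seed(1)

## Hypothetical stand-in for df -- replace with your actual data
n <- 400
df <- data.frame(
  period = rep(c("01", "02", "03", "04"), each = n / 4),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)
df$y <- rbinom(n, 1, plogis(0.8 * df$x1 - 0.5 * df$x2))

periods <- c("01", "02", "03", "04")

## Fit on three periods, leaving one out each time
mods <- lapply(periods, function(p) {
  glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
      data = df[df$period != p, ])
})

## Evaluate each model on its held-out period
acc <- mapply(function(mod, p) {
  held_out <- df[df$period == p, ]
  pred <- predict(mod, newdata = held_out, type = "response")
  mean((pred > 0.5) == held_out$y)  # naive 0.5-threshold accuracy
}, mods, periods)
```

The out-of-sample performance on the held-out folds (here a crude accuracy; deviance or AUC would usually be preferable) is what the four fits buy you beyond a single model fit to all periods.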
The Wald tests in summary() show statistically significant p-values for every coefficient in every model.
The chi-square tests (anova(modx, test = 'Chisq')) are also statistically significant for each of the independent variables.
Lastly, the Hosmer-Lemeshow test shows p < 0.05 for each model.
My question is, specifically: is this enough to conclude that all of the predictors are associated with y? I know that the sequential chi-square tests from anova() depend on the variables' position in the formula (i.e. y ~ x1 + x2 gives different sequential tests than y ~ x2 + x1), as COOLSerdash also describes.
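That order dependence applies to the sequential (Type I) deviance tests from anova(), while the Wald tests in summary() are unaffected by term order. A small sketch on simulated data (hypothetical, for illustration only), where correlated predictors make the effect visible:

```r
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)            # correlated predictors
y  <- rbinom(n, 1, plogis(x1 + x2))
d  <- data.frame(y, x1, x2)

m_ab <- glm(y ~ x1 + x2, family = binomial, data = d)
m_ba <- glm(y ~ x2 + x1, family = binomial, data = d)

## Sequential deviance attributed to x1 changes with its position
dev_ab <- anova(m_ab, test = "Chisq")["x1", "Deviance"]
dev_ba <- anova(m_ba, test = "Chisq")["x1", "Deviance"]

## Wald z-statistic for x1 from summary() is identical in both orders
z_ab <- summary(m_ab)$coefficients["x1", "z value"]
z_ba <- summary(m_ba)$coefficients["x1", "z value"]
```

Here dev_ab and dev_ba differ because the sequential test credits x1 with whatever deviance it explains given the terms entered before it, whereas z_ab equals z_ba because each Wald test conditions on all other terms in the model.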
I look forward to reading your answers.