I am building a GLM for machine-learning purposes to predict a value y.
I want to test the following formula in every way that is considered high academic standard. I have read COOLSerdash's answer here, but I didn't find that it answers my question sufficiently.
I have built four training data sets because I want to conduct 4-fold cross-validation:
mod1 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "02", "03"), ])
mod2 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "02", "04"), ])
mod3 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("01", "03", "04"), ])
mod4 <- glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
            data = df[df$period %in% c("02", "03", "04"), ])
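The four fits differ only in which period is held out, so they can be written as one loop; a minimal sketch, assuming df has columns y, x1 through x4, and a character period column (the simulated data below is purely a hypothetical stand-in for illustration):

```r
set.seed(1)

## Hypothetical stand-in for df -- replace with your actual data
n <- 400
df <- data.frame(
  period = rep(c("01", "02", "03", "04"), each = n / 4),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)
df$y <- rbinom(n, 1, plogis(0.8 * df$x1 - 0.5 * df$x2))

periods <- c("01", "02", "03", "04")

## Fit on three periods, leaving one out each time
mods <- lapply(periods, function(p) {
  glm(y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"),
      data = df[df$period != p, ])
})

## Evaluate each model on its held-out period
acc <- mapply(function(mod, p) {
  held_out <- df[df$period == p, ]
  pred <- predict(mod, newdata = held_out, type = "response")
  mean((pred > 0.5) == held_out$y)  # naive 0.5-threshold accuracy
}, mods, periods)
```

The out-of-sample performance on the held-out folds (here a crude accuracy; deviance or AUC would usually be preferable) is what the four fits buy you beyond a single model fit to all periods.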
The Wald tests in summary() show statistically significant p-values for every coefficient in every model.
The chi-square tests (anova(modx, test = 'Chisq')) are also statistically significant for each of the independent variables.
Lastly, the Hosmer-Lemeshow test shows p < 0.05 for each model.
My question is, specifically: is this enough to conclude that all of the predictors are associated with y? I know that the sequential chi-square tests from anova() depend on the variables' position in the formula (i.e. y ~ x1 + x2 gives different sequential tests than y ~ x2 + x1), as COOLSerdash also describes.
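That order dependence applies to the sequential (Type I) deviance tests from anova(), while the Wald tests in summary() are unaffected by term order. A small sketch on simulated data (hypothetical, for illustration only), where correlated predictors make the effect visible:

```r
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)            # correlated predictors
y  <- rbinom(n, 1, plogis(x1 + x2))
d  <- data.frame(y, x1, x2)

m_ab <- glm(y ~ x1 + x2, family = binomial, data = d)
m_ba <- glm(y ~ x2 + x1, family = binomial, data = d)

## Sequential deviance attributed to x1 changes with its position
dev_ab <- anova(m_ab, test = "Chisq")["x1", "Deviance"]
dev_ba <- anova(m_ba, test = "Chisq")["x1", "Deviance"]

## Wald z-statistic for x1 from summary() is identical in both orders
z_ab <- summary(m_ab)$coefficients["x1", "z value"]
z_ba <- summary(m_ba)$coefficients["x1", "z value"]
```

Here dev_ab and dev_ba differ because the sequential test credits x1 with whatever deviance it explains given the terms entered before it, whereas z_ab equals z_ba because each Wald test conditions on all other terms in the model.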
I look forward to reading your answers.