I want to do a univariate analysis on a set of variables to see which predict a binary outcome. I want to discard some of them before performing logistic regression.
I am trying to understand if I can rely on the F-test outputs (as provided by `f_classif` in `sklearn`) when my variables are non-normal and the outcome is binary.
I understand that in an OLS regression problem this F-test compares the residual variance of an intercept-only model with the residual variance of a model that includes the variable. So, I would think the original distribution of the dependent variables is not problematic. Now, in logistic regression I would think it is the same, but I can't find any background on `f_classif` for binary outcomes, and I don't understand which residuals are compared.
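For concreteness, here is a minimal sketch of the screening step I have in mind. The data are synthetic stand-ins for my real variables: the names `x_informative` and `x_noise`, the exponential (skewed, non-normal) distributions, and the size of the shift between classes are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n = 500

# Binary outcome (0/1), assumed roughly balanced for this illustration.
y = rng.integers(0, 2, size=n)

# Two skewed, non-normal predictors (hypothetical):
# one whose mean shifts with the outcome, one unrelated to it.
x_informative = rng.exponential(scale=1.0, size=n) + 2.0 * y
x_noise = rng.exponential(scale=1.0, size=n)
X = np.column_stack([x_informative, x_noise])

# f_classif returns one F statistic and one p-value per column of X.
F, p = f_classif(X, y)

for name, f_val, p_val in zip(["x_informative", "x_noise"], F, p):
    print(f"{name}: F = {f_val:.2f}, p = {p_val:.3g}")
```

My plan would be to keep only the variables with small p-values from this step and pass them to the logistic regression, which is why I need to know whether these F statistics can be trusted in this setting.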
My apologies in advance if this question is basic.