I am using SelectKBest for my feature selection process. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
My data is non-normal and skewed. I don't transform or scale it either, since I am using a tree-based method (an XGBoost binary classifier).
I have 200+ features, so for better performance I would like to reduce this number somehow.
I am using SelectKBest(score_func=f_classif).
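For context, here is a minimal sketch of my setup. The data is synthetic (make_classification is a stand-in for my real dataset) and k=50 is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Toy stand-in for the real data: 200 features, binary target.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=20, random_state=0)

# Keep the k highest-scoring features by ANOVA F-value.
selector = SelectKBest(score_func=f_classif, k=50)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (500, 50)
```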
From my understanding, f_classif interprets the values of y as class labels and computes, for each feature X[:, i] of X, an F-statistic. The formula used is exactly the one-way ANOVA F-test, with K the number of distinct values of y. I am fairly sure this rests on an underlying assumption of normally distributed features. I have been reading about alternative scoring functions for my classification task, e.g. chi2 as opposed to f_classif.
Since chi2 is non-parametric, would you say it is better suited to my data?
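For reference, swapping in chi2 would look like the sketch below. One caveat I am aware of: scikit-learn's chi2 only accepts non-negative feature values (it is intended for counts/frequencies), so I shift the illustrative data to make it non-negative; whether that shift is appropriate for real features is part of my question:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2

# Same illustrative stand-in data as before.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=20, random_state=0)

# chi2 requires non-negative inputs, so shift each feature
# so its minimum is zero (demo only).
X_nonneg = X - X.min(axis=0)

selector = SelectKBest(score_func=chi2, k=50)
X_reduced = selector.fit_transform(X_nonneg, y)
print(X_reduced.shape)  # (500, 50)
```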