
I know that for each feature-class pair, the value of the chi-square statistic is computed and compared against a threshold.

I am a little confused though. If there are $m$ features and $k$ classes, how does one build the contingency table? How does one decide which features to keep and which ones to remove?

Any clarification will be much appreciated. Thanks in advance

user721975

1 Answer


The chi-square test is a statistical test of independence that determines whether two variables are dependent. It plays a role similar to the coefficient of determination, R², in that both measure association between variables; however, the chi-square test applies only to categorical (nominal) data, while R² applies only to numeric data.

From this definition we can deduce how the chi-square technique applies to feature selection. Suppose you have a target variable (i.e., the class label) and a set of feature variables that describe each sample of the data. You compute the chi-square statistic between every feature variable and the target variable and check whether a relationship exists between the two. If the target variable is independent of a feature variable, that feature carries no information about the class and can be discarded. If they are dependent, the feature variable is important.
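To make the contingency-table part concrete: for each feature you build one table whose rows are the feature's distinct values and whose columns are the $k$ classes, with each cell holding the count of samples having that (value, class) combination. Here is a minimal, self-contained sketch in Python (the function name and toy data are illustrative, not from any particular library):

```python
from collections import Counter

def chi_square(xs, ys):
    """Chi-square statistic for two categorical variables.

    Builds the contingency table of observed counts, then sums
    (observed - expected)^2 / expected over all cells, where the
    expected count under independence is row_total * col_total / n.
    """
    n = len(xs)
    obs = Counter(zip(xs, ys))   # contingency-table cells
    row = Counter(xs)            # marginal totals of the feature
    col = Counter(ys)            # marginal totals of the class
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            stat += (obs[(x, y)] - expected) ** 2 / expected
    return stat

# A binary feature that perfectly predicts the class ...
feature = ["yes", "yes", "no", "no"]
label   = ["A",   "A",   "B",  "B"]
print(chi_square(feature, label))   # 4.0 -> strongly dependent, keep

# ... versus one that carries no information about the class.
print(chi_square(["yes", "no", "yes", "no"], label))  # 0.0 -> independent, drop
```

With $m$ features you would compute one such statistic per feature, giving $m$ scores to compare against a threshold (or to rank).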

Mathematical details are described here: http://nlp.stanford.edu/IR-book/html/htmledition/feature-selectionchi2-feature-selection-1.html

For continuous variables, the chi-square test can be applied after first "binning" the variables into discrete categories.
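As a quick illustration of the binning step, here is a simple equal-width discretizer in Python (one of several possible binning strategies; the function name is illustrative):

```python
def equal_width_bins(values, k):
    """Discretize a continuous variable into k equal-width bins,
    returning a bin index (0..k-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

heights = [1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
print(equal_width_bins(heights, 3))  # [0, 0, 1, 1, 2, 2]
```

Once binned, each bin index is treated as a categorical value and the chi-square test proceeds exactly as with a nominal feature.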

An example in R, shamelessly copied from FSelector:

# Use the HouseVotes84 data from the mlbench package
library(mlbench)    # for the data
library(FSelector)  # for the chi-square method
data(HouseVotes84)

# Calculate the chi-square statistic for each feature
weights <- chi.squared(Class ~ ., HouseVotes84)

# Print the results
print(weights)

# Select the top five variables
subset <- cutoff.k(weights, 5)

# Print the final formula that can be used in classification
f <- as.simple.formula(subset, "Class")
print(f)
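For readers who prefer Python, the same score-then-cut-off workflow (the role played by `chi.squared` plus `cutoff.k` above) can be sketched in plain Python; all names and toy data here are illustrative, not from any library:

```python
from collections import Counter

def chi_square(xs, ys):
    """Observed-vs-expected chi-square over the contingency table of xs and ys."""
    n = len(xs)
    obs, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((obs[(x, y)] - row[x] * col[y] / n) ** 2 / (row[x] * col[y] / n)
               for x in row for y in col)

def top_k_features(features, label, k):
    """Score each feature column against the label, keep the k best."""
    scores = {name: chi_square(col, label) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data: f1 matches the label perfectly, f2 is constant (uninformative).
features = {"f1": ["y", "y", "n", "n"], "f2": ["a", "a", "a", "a"]}
label = ["A", "A", "B", "B"]
print(top_k_features(features, label, 1))  # ['f1']
```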

Not strictly about feature selection, but the video below discusses the chi-square test in detail: https://www.youtube.com/watch?time_continue=5&v=IrZOKSGShC8

discipulus
  • It makes sense to "bin" continuous features and then apply the chi-square test. But there are other implementations, such as sklearn's chi2 for feature selection here, https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html. No binning at all, just summing each feature up. What are your thoughts on it? – EricX Jul 04 '20 at 13:55