There is a dataset of about 8500 mushrooms, and each datapoint has about 20 features. The features are purely categorical (color of the cap, its shape and so on), and none of them is ordered. For every datapoint it is known whether the mushroom is poisonous. My task is to determine which features distinguish the edible mushrooms from the poisonous ones.
My knowledge of statistics is limited, but I have read the following example on analysing categorical data. Based on that example, I intend to do the following:
- For every feature, perform a chi-square test of independence on the feature-by-edibility contingency table (a $k \times 2$ table for a feature with $k$ categories) to see whether there is any association.
- For every feature with an association, compute an odds ratio to judge how strong the association is.
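The two steps above can be sketched in plain Python using only the standard library, so it runs without SciPy. This is a minimal sketch under stated assumptions, not a reference implementation: each feature is assumed to be a list of category labels aligned with a list of edibility labels; the helper names `chi2_sf`, `chi_square_test` and `odds_ratio` are hypothetical; and the odds ratio compares one category against all other categories, with 0.5 added to every cell (the Haldane-Anscombe correction) because some categories may contain only poisonous or only edible mushrooms. The toy data is made up for illustration.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, y) = exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add the recurrence terms.
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def chi_square_test(feature, label):
    """Pearson chi-square test of independence on the k x 2 feature-by-label table."""
    n = len(feature)
    cells = Counter(zip(feature, label))
    row_totals = Counter(feature)
    col_totals = Counter(label)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        # A constant feature (or constant label) carries no information.
        return 0.0, 0, 1.0
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, df, chi2_sf(stat, df)


def odds_ratio(feature, label, level, positive="poisonous"):
    """Odds of `positive` for `level` vs. all other levels, with 0.5 added to each cell."""
    a = b = c = d = 0
    for f, y in zip(feature, label):
        if f == level:
            if y == positive:
                a += 1
            else:
                b += 1
        elif y == positive:
            c += 1
        else:
            d += 1
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Made-up toy data: one feature ("odor") and the edibility label.
odor = ["none", "none", "foul", "foul", "none", "almond", "foul", "none"]
label = ["edible", "edible", "poisonous", "poisonous", "edible", "edible", "poisonous", "poisonous"]

stat, df, p = chi_square_test(odor, label)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")  # prints chi2 = 5.00, df = 2, p = 0.0821
print(f"OR(foul vs rest) = {odds_ratio(odor, label, 'foul'):.1f}")  # prints OR(foul vs rest) = 21.0
```

With SciPy available, `scipy.stats.chi2_contingency` performs the same Pearson test; note that it applies Yates' continuity correction by default when the table is $2 \times 2$.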
My concern is that I will be treating each feature separately, so I will be running about 20 separate tests. Perhaps I am missing a statistical test that takes into account the fact that many categorical features (not just one) jointly determine whether a mushroom is edible.
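Running about 20 separate tests raises the chance that at least one looks significant by accident. One standard way to account for this is to adjust the p-values, for example with the Holm-Bonferroni method. The sketch below is a plain-Python illustration with a hypothetical helper `holm_adjust` and made-up p-values; it only corrects for the number of tests and does not model how the features act together.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank k-1) is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so a larger raw p-value never gets a smaller adjusted one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values for three features; compare adjusted values with the usual 0.05 level.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print([round(a, 3) for a in adjusted])  # prints [0.03, 0.06, 0.06]
```

The same adjustment is available in statsmodels as `statsmodels.stats.multitest.multipletests(pvals, method="holm")`.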