I am working with a binary data set. There are 'm' bacteria models and 'n' attributes (i.e. genes) in total. The data set records which attributes each model contains (1 for present, 0 for absent). Below is a view of the data set. As shown, not every model has every attribute. However, some attributes are constant across all the models, e.g. attributes A1, A2 and A3.
Model A1 A2 A3 A4 A5 A6 A7
M1 1 1 1 1 1 1 0
M2 1 1 1 1 1 1 1
M3 1 1 1 1 0 0 0
M4 1 1 1 0 0 1 0
M5 1 1 1 1 0 0 0
M6 1 1 1 0 1 0 1
M7 1 1 1 0 0 1 1
I want to cluster the models based on their attributes, so that models with similar attributes are grouped together. I want to know:
If I remove attributes A1, A2 and A3 from the data set before the analysis (solely because they are constant in all the models), will it affect my analysis? Or
Is it always necessary to do a PCA (or some other statistical validation) before deciding which variables to remove?
I would like to know what the common practice is in such a scenario.
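To make the question concrete, here is a minimal Python sketch of what I mean by dropping the constant attributes, using only the seven models shown above. The choice of Hamming and Jaccard distances is purely an illustrative assumption (I have not settled on a distance measure), and the helper names are hypothetical.

```python
# Presence/absence matrix copied from the table above (rows M1..M7, columns A1..A7).
attributes = ["A1", "A2", "A3", "A4", "A5", "A6", "A7"]
data = {
    "M1": [1, 1, 1, 1, 1, 1, 0],
    "M2": [1, 1, 1, 1, 1, 1, 1],
    "M3": [1, 1, 1, 1, 0, 0, 0],
    "M4": [1, 1, 1, 0, 0, 1, 0],
    "M5": [1, 1, 1, 1, 0, 0, 0],
    "M6": [1, 1, 1, 0, 1, 0, 1],
    "M7": [1, 1, 1, 0, 0, 1, 1],
}

# Attributes that take the same value in every model.
constant = [a for j, a in enumerate(attributes)
            if len({row[j] for row in data.values()}) == 1]

# Data set with the constant attributes removed.
keep = [j for j, a in enumerate(attributes) if a not in constant]
reduced = {m: [row[j] for j in keep] for m, row in data.items()}


def hamming(u, v):
    """Number of attributes on which two models differ."""
    return sum(a != b for a, b in zip(u, v))


def jaccard(u, v):
    """Jaccard distance: 1 - (present in both) / (present in either)."""
    both = sum(a and b for a, b in zip(u, v))
    either = sum(a or b for a, b in zip(u, v))
    return 1 - both / either if either else 0.0


models = list(data)
pairs = [(p, q) for i, p in enumerate(models) for q in models[i + 1:]]

# Compare every pairwise distance before and after removing the constants.
hamming_unchanged = all(
    hamming(data[p], data[q]) == hamming(reduced[p], reduced[q])
    for p, q in pairs
)
jaccard_unchanged = all(
    abs(jaccard(data[p], data[q]) - jaccard(reduced[p], reduced[q])) < 1e-12
    for p, q in pairs
)

print("constant attributes:", constant)
print("Hamming distances unchanged:", hamming_unchanged)
print("Jaccard distances unchanged:", jaccard_unchanged)
```

On this toy data the pairwise Hamming distances are identical with and without A1, A2 and A3, whereas the Jaccard distances change, because shared presences enter the Jaccard ratio. So part of what I am asking is whether the answer depends on the distance measure used for clustering.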