Here is an investigation of how adding constant attributes to (or, equivalently, removing them from) a dataset of binary attributes affects the distances computed between cases. I tested it for the various popular binary-data distance measures using SPSS.
This answer should help you decide whether you may delete attributes that are constant (or, to extend tentatively, almost constant, i.e. extremely skewed) before computing proximities between cases in binary data - the resulting matrix then being input to procedures such as hierarchical agglomerative clustering (HAC) or multidimensional scaling (MDS).
I repeatedly generated 15 random binary variables (with random, moderate skewnesses, drawn both from uncorrelated and from correlated populations; this did not change the results) and computed a proximity measure between cases. I then added 5 more variables, this time constants - either all equal to 1 or all equal to 0 - and computed the proximity matrix again. Finally, I inspected the scatterplot of the 15-variable proximity values against the 20-variable ones.
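For concreteness, here is a minimal Python sketch of that experiment. The original runs were done in SPSS; the Jaccard measure, the sample size, and the skewness range here are merely illustrative choices:

```python
import numpy as np
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_cases = 60

# 15 random binary variables with moderate, varying skewness
p = rng.uniform(0.2, 0.8, size=15)            # per-variable P(attribute = 1)
X15 = rng.random((n_cases, 15)) < p           # boolean case-by-variable matrix

# add 5 constant variables, all equal to 1 (use zeros for the "absent" scenario)
X20 = np.hstack([X15, np.ones((n_cases, 5), dtype=bool)])

# proximities between cases, before vs after (scipy's jaccard is a *distance*)
d15 = pdist(X15, metric='jaccard')
d20 = pdist(X20, metric='jaccard')

plt.scatter(d15, d20, s=8)
plt.xlabel('distances from 15 variables')
plt.ylabel('distances from 15 variables + 5 constants')
plt.show()
```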
All proximity measures for binary data currently available in SPSS were examined. In a binary variable, 1 means "the attribute is present" and 0 means "the attribute is absent". The results are shown below.
- equal: constant variables do not affect the measure at all (they are simply ignored by it)
- proportional: exact proportional relation
- linear: exact linear relation (i.e. with an intercept term)
- monotonic: exact but somewhat curved relation
- scatter: no exact relation; the scatterplot is a cloud (whose shape depends on the measure: it can be an ellipsoid, a half-ellipsoid, or something more peculiar)
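To see where these categories come from, use the standard 2x2 notation for a pair of cases: $a$ = number of attributes present in both cases, $b$ and $c$ = present in one case only, $d$ = absent in both, and $n = a+b+c+d$. Adding $k$ constant attributes equal to 1 replaces $a$ with $a+k$ and $n$ with $n+k$, so the Russell and Rao similarity $a/n$ becomes

$$\frac{a+k}{n+k} = \frac{n}{n+k}\cdot\frac{a}{n} + \frac{k}{n+k},$$

an exact linear relation, while Jaccard $a/(a+b+c)$ becomes $(a+k)/(a+b+c+k)$, which depends on $a$ and $a+b+c$ separately and so forms only a cloud. With $k$ constants equal to 0, only $d$ and $n$ grow: Jaccard is untouched ("equal"), and Russell and Rao becomes $a/(n+k)$, an exact proportional relation.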
The acronym next to each measure's name is its SPSS syntax keyword. The relationships found are:
1) When the 5 constant variables are 1 (the attribute is present)

**Similarities**

| Measure | SPSS keyword | Relation |
|---|---|---|
| Russell and Rao (simple joint probability) | RR | linear |
| Simple matching (Rand) | SM | linear |
| Jaccard | JACCARD | scatter |
| Dice (Czekanowski, Sørensen) | DICE | scatter |
| Sokal and Sneath 1 | SS1 | monotonic, almost linear |
| Rogers and Tanimoto | RT | monotonic, almost linear |
| Sokal and Sneath 2 | SS2 | scatter |
| Kulczynski 1 | K1 | scatter |
| Sokal and Sneath 3 | SS3 | linear, except that the proximity can equal 1 in both datasets (points off the line) |
| Kulczynski 2 | K2 | scatter |
| Sokal and Sneath 4 | SS4 | scatter |
| Hamann | HAMANN | linear |
| Ochiai (cosine) | OCHIAI | scatter |
| Sokal and Sneath 5 | SS5 | scatter |
| Phi (Pearson) correlation | PHI | scatter |
| Goodman and Kruskal’s lambda | LAMBDA | scatter |
| Anderberg’s D | D | scatter |
| Yule’s Y | Y | scatter |
| Yule’s Q (Goodman and Kruskal’s gamma) | Q | scatter |
| Dispersion similarity | DISPER | scatter |

**Dissimilarities**

| Measure | SPSS keyword | Relation |
|---|---|---|
| Euclidean distance | BEUCLID | equal |
| Squared Euclidean distance | BSEUCLID | equal |
| Size difference | SIZE | proportional |
| Pattern difference | PATTERN | proportional |
| Shape difference | BSHAPE | scatter |
| Variance dissimilarity | VARIANCE | proportional |
| Lance-and-Williams dissimilarity | BLWMN | scatter |
2) When the 5 constant variables are 0 (the attribute is absent)

**Similarities**

| Measure | SPSS keyword | Relation |
|---|---|---|
| Russell and Rao (simple joint probability) | RR | proportional |
| Simple matching (Rand) | SM | linear |
| Jaccard | JACCARD | equal |
| Dice (Czekanowski, Sørensen) | DICE | equal |
| Sokal and Sneath 1 | SS1 | monotonic, almost linear |
| Rogers and Tanimoto | RT | monotonic, almost linear |
| Sokal and Sneath 2 | SS2 | equal |
| Kulczynski 1 | K1 | equal |
| Sokal and Sneath 3 | SS3 | linear, except that the proximity can equal 1 in both datasets (points off the line) |
| Kulczynski 2 | K2 | equal |
| Sokal and Sneath 4 | SS4 | scatter |
| Hamann | HAMANN | linear |
| Ochiai (cosine) | OCHIAI | equal |
| Sokal and Sneath 5 | SS5 | scatter |
| Phi (Pearson) correlation | PHI | scatter |
| Goodman and Kruskal’s lambda | LAMBDA | scatter |
| Anderberg’s D | D | scatter |
| Yule’s Y | Y | scatter |
| Yule’s Q (Goodman and Kruskal’s gamma) | Q | scatter |
| Dispersion similarity | DISPER | scatter |

**Dissimilarities**

| Measure | SPSS keyword | Relation |
|---|---|---|
| Euclidean distance | BEUCLID | equal |
| Squared Euclidean distance | BSEUCLID | equal |
| Size difference | SIZE | proportional |
| Pattern difference | PATTERN | proportional |
| Shape difference | BSHAPE | scatter |
| Variance dissimilarity | VARIANCE | proportional |
| Lance-and-Williams dissimilarity | BLWMN | equal |
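These relationships are easy to spot-check outside SPSS. A minimal sketch on hypothetical data: scipy happens to implement both Jaccard and Russell and Rao (the latter as the dissimilarity $1-a/n$), so two of the entries above can be verified directly.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.random((50, 15)) < rng.uniform(0.2, 0.8, size=15)   # 15 binary variables
zeros = np.zeros((50, 5), dtype=bool)
ones = np.ones((50, 5), dtype=bool)

# Jaccard with constant 0s added: 'equal' (distances unchanged)
assert np.allclose(pdist(X, 'jaccard'),
                   pdist(np.hstack([X, zeros]), 'jaccard'))

# Russell and Rao similarity with constant 1s added: exact linear relation
s15 = 1 - pdist(X, 'russellrao')                      # a / 15
s20 = 1 - pdist(np.hstack([X, ones]), 'russellrao')   # (a + 5) / 20
assert np.allclose(s20, s15 * 15 / 20 + 5 / 20)
```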
The practical upshot is straightforward. You should not remove (or add) constant attributes, provided they are meaningful to you, with any of the "scatter" measures, because doing so alters the distances in the matrix in a nonsystematic way. In the "proportional" or "linear" cases, the decision to remove or keep them should take into account the nature of the specific clustering or MDS method. For example, in complete- or single-linkage$^1$ HAC, any proportional or linear transformation of the distances leaves the results unchanged; even a monotonic transformation does (though it may influence the decision about how many clusters to retain) - see the sketch below. In MDS, results may differ depending on whether you treat your distances as ratio-, interval-, or ordinal-level data. Hence, depending on that choice, the proportional or linear effect of deleting constant (or almost constant) attributes will or will not show up in the results of your analysis.
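Here is a minimal sketch of that invariance for single linkage, on hypothetical data, with squaring standing in for an arbitrary monotone transform of the distances:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
X = rng.random((30, 15)) < 0.5               # 30 cases, 15 binary variables

d = pdist(X, 'jaccard')
Z1 = linkage(d, method='single')
Z2 = linkage(d ** 2, method='single')        # monotone transform of the distances

# identical merge sequence (tree topology); only the merge heights differ
assert np.array_equal(Z1[:, :2], Z2[:, :2])
```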
$^1$ Actually, only these two linkage methods in HAC are fully warranted theoretically for binary data. Average-linkage methods already involve some heuristics, and "geometric" methods such as centroid or Ward should be avoided with binary data altogether.