I want to run a TwoStep cluster analysis, since the dataset I am analyzing contains important metric as well as nominal variables.
=> Question #1: Should the number of binary variables and the number of metric variables be roughly equal? I use 3 binary variables, but many more metric ones. Will one binary variable (of only a few) influence the cluster solution more than one metric variable (of many)?
=> Question #2: Does it "confuse" the algorithm if some binary variables are coded 0/1 and others 1/2? Or does it merely assess the distance between cases and not care about the coding at all?
Also, I know that with "normal" cluster analysis, you can choose different coefficients for comparing cases. Some count shared non-values as similarities (e.g. the Simple Matching coefficient), while others count only shared present values (Tanimoto / Jaccard). To my knowledge, the latter is useful if dummies are used: Two people are both NOT a member of the Republicans, NOT a member of the Democrats, NOT a member of the Green Party, but both are members of the Libertarian Party. If only positive values are considered, they have one thing in common; if both negative and positive values are considered, they have four things in common (although it is really just one).
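To make the difference concrete, here is a minimal Python sketch of the two coefficients as I understand them. The function names, the order of the party dummies, and the example vectors are my own illustration; this is not what SPSS or Stata do internally.

```python
# Dummy order (assumed for illustration): Republican, Democrat, Green, Libertarian
libertarian_a = [0, 0, 0, 1]
libertarian_b = [0, 0, 0, 1]
green = [0, 0, 1, 0]


def match_counts(a, b):
    """Return (shared presences, shared absences, mismatches) for two 0/1 vectors."""
    both_present = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_absent = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return both_present, both_absent, mismatches


def simple_matching(a, b):
    """Simple Matching: shared presences AND shared absences count as agreement."""
    present, absent, mismatch = match_counts(a, b)
    return (present + absent) / (present + absent + mismatch)


def jaccard(a, b):
    """Jaccard: only shared presences count; shared absences are ignored."""
    present, _, mismatch = match_counts(a, b)
    denominator = present + mismatch
    # Undefined when neither case has any presence; conventions differ here.
    return present / denominator if denominator else float("nan")


# Two Libertarians: 1 shared presence, 3 shared absences.
print(match_counts(libertarian_a, libertarian_b))   # (1, 3, 0)

# A Libertarian and a Green share no party, yet Simple Matching still
# credits them for both not being Republican and both not being Democrat.
print(simple_matching(libertarian_a, green))        # 0.5
print(jaccard(libertarian_a, green))                # 0.0
```

The last two lines show what worries me: under Simple Matching, two people with different party memberships still come out as 50% similar purely because of mutual non-values.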
Since I plan to use dummies to code employment status, I also have the following questions:
=> Question #3: Can I choose the coefficient used for binary variables when I do a TwoStep cluster analysis? (I was going to use SPSS, but Stata is also an option.)
=> Question #4: If not, which coefficient does that analysis use? Are mutual non-values considered a similarity?
=> Question #5: If mutual non-values are considered a similarity: Is there a way to reduce the redundancy among the dummies, as in the example above? Transforming the binary variables into metric ones is not feasible. Is there anything else I can do?
I would be VERY happy if any of you could help me with these questions! I have already searched the literature on them, but sadly I have not found any answers yet.