Can I use clustering with mixed data types in R?

Question

I know there is a similar question on Cross Validated, but mine is somewhat different.

The answers there show that, with the daisy() function, we can use categorical variables in clustering.

What I'm wondering is this: if I code the nominal variable as a sequence (for example, 1 is a small apartment, 2 is a middle-size apartment, 3 is a building, and the higher the number, the better), can I use k-means clustering with this nominal variable?

Of course, in this case the nominal variable is converted to int type (i.e., treated as continuous).

Please let me know why this can or can't be done. I would like a theoretical explanation.

So, you just want to convert a nominal variable to continuous? Or something more? — carlo, Mar 21 '17 at 14:26
@carlo Yes, right. But the nominal variable has a rank. For example, 1 is a small apartment, 2 is a middle-size apartment, 3 is a building, and the higher the number, the better. I know that converting a nominal variable to continuous is wrong, so I ranked the nominal variable's values. Is that right? — 서영재, Mar 24 '17 at 00:41
Your example seems to be ordinal rather than nominal. In any case, daisy works fine for what you want to do; I used the Matlab port of it for my master's thesis. Just be careful to tell it accurately which variable is of which type. — David Ernst, Sep 03 '17 at 21:17

score 1 · Answer 1 · answered Mar 26 '17 at 16:49

It depends on the desired effect.

For example, k-means works with squared Euclidean distances, so if you encode these values as 1, 2, 3, the distance from 1 to 3 is 2²=4, i.e., 4 times as much as the distance from 1 to 2 or from 2 to 3 (1²=1).
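To make the arithmetic concrete, here is a minimal sketch in base R. The vector names and the alternative coding 1, 2, 10 are made up for illustration; only the 1, 2, 3 coding comes from the question.

```r
# Integer codes from the question: 1 = small apartment,
# 2 = middle-size apartment, 3 = building
size <- c(small = 1, middle = 2, building = 3)

# Squared Euclidean distances between the three codes
sq_dist <- as.matrix(dist(size))^2
print(sq_dist)

# Hypothetical alternative coding with the same rank order but different
# spacing: the order is unchanged, yet the distances are very different
size_alt <- c(small = 1, middle = 2, building = 10)
sq_dist_alt <- as.matrix(dist(size_alt))^2
print(sq_dist_alt)
```

Both codings respect the same ranking, but they would pull the k-means cluster centers in different ways, so the choice of numbers is a modelling decision rather than a neutral conversion.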

This can be desired or problematic. It depends on your data's meaning; there is no single mathematically 'more correct' way.
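For comparison, the daisy route mentioned in the question and comments can be sketched as follows. This assumes the cluster package is installed; the data frame and its column name are hypothetical. The point is only that declaring the variable as an ordered factor, rather than a plain factor, changes the dissimilarities daisy computes.

```r
library(cluster)  # provides daisy()

lv <- c("small apartment", "middle-size apartment", "building")

# Hypothetical one-column data set, declared as ORDINAL (ordered factor)
housing_ordinal <- data.frame(size = factor(lv, levels = lv, ordered = TRUE))
d_ordinal <- as.matrix(daisy(housing_ordinal, metric = "gower"))

# Same values declared as NOMINAL (unordered factor)
housing_nominal <- data.frame(size = factor(lv, levels = lv))
d_nominal <- as.matrix(daisy(housing_nominal, metric = "gower"))

print(d_ordinal)  # small vs. building is farther apart than small vs. middle
print(d_nominal)  # every pair of different categories is equally far apart
```

The resulting dissimilarity matrix can then be passed to a method that accepts dissimilarities, such as pam() from the same package with diss = TRUE, instead of kmeans(), which expects numeric coordinates.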
