I have a data set consisting of $n$ elements with $d$ features each ($x_{i,f}$ denotes the value of the $f$-th feature of the $i$-th element). I would like to cluster this data set into $k$ clusters.
My problem is that one feature is nominal and another is discrete ordinal. As an example, consider elements with the following features:
- $x_{i,1}$ : the height of a person
- $x_{i,2}$ : the weight of a person
- $x_{i,3}$ : country where the person lives
- $x_{i,4}$ : number of friends the person has
Is it OK to use a simple k-means algorithm with a Euclidean distance measure?
I would introduce an indicator variable $\delta$ with the following meaning for feature $x_{i,3}$: $$\delta(i,j) = \begin{cases} 0, & \text{if }x_{i,3} = x_{j,3}\text{,}\\ 1, & \text{else.}\end{cases}$$ So two objects $i$ and $j$ have distance 0 in their third feature if it is the same (same country), and distance 1 otherwise.
Or do you know a better way to do a cluster analysis in this case?
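To make the proposed distance concrete, here is a minimal sketch of how it could be computed. It assumes the indicator $\delta$ is simply added, squared, to the squared differences of the numeric features in a Euclidean-style sum. The function names `delta` and `mixed_distance`, the `nominal_index` parameter, and the sample values are all hypothetical and only serve as illustration.

```python
import math

def delta(a, b):
    """Indicator from the question: 0 if the nominal values match, 1 otherwise."""
    return 0 if a == b else 1

def mixed_distance(x, y, nominal_index=2):
    """Euclidean-style distance between two elements.

    Assumption: the feature at `nominal_index` (country) contributes
    delta(...)**2, every other feature contributes its squared difference.
    """
    total = 0.0
    for f, (a, b) in enumerate(zip(x, y)):
        if f == nominal_index:
            total += delta(a, b) ** 2
        else:
            total += (a - b) ** 2
    return math.sqrt(total)

# Hypothetical elements: (height in cm, weight in kg, country, number of friends)
p = (180.0, 75.0, "DE", 10)
q = (180.0, 75.0, "FR", 10)   # differs from p only in country
r = (183.0, 79.0, "DE", 10)   # same country as p, different height and weight

d_pq = mixed_distance(p, q)
d_pr = mixed_distance(p, r)
```

In this sketch the features are left unscaled, so a difference of a few centimetres or kilograms already outweighs the 0/1 country term.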