I have a seemingly easy question that is nonetheless troubling me a bit.
I have pairs of vectors made up of nominal attributes. The two vectors in a pair can be of different lengths, and some of the attribute values in one might not appear in the other. See a and b below as two examples.
a
1 mathematician
2 engineer
3 mathematician
4 mathematician
5 mathematician
6 engineer
7 mathematician
8 mathematician
9 mathematician
10 mathematician
11 mathematician
12 engineer
13 mathematician
14 mathematician
15 engineer
b
1 physicist
2 surgeon
3 physicist
4 surgeon
5 physicist
6 physicist
7 surgeon
8 surgeon
9 physicist
10 physicist
11 mathematician
Can you suggest a measure that summarizes the dissimilarity between them? I am looking for something like the Euclidean distance, but for qualitative vectors.
One option I have in mind is to transform the categorical vectors into frequency vectors and compute the Euclidean distance between those. That way they would become quantitative and would have the same length. But my question is, do you find this a sound approach?
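To make the idea concrete, here is a minimal sketch in Python of what I mean, using a and b from above. The function name `frequency_distance` and the `relative` flag are just my own illustration, not an established method. I am also assuming that relative frequencies (counts divided by vector length) are the sensible default, since the vectors have different lengths, and that neither vector is empty.

```python
from collections import Counter
from math import sqrt

a = ["mathematician", "engineer", "mathematician", "mathematician",
     "mathematician", "engineer", "mathematician", "mathematician",
     "mathematician", "mathematician", "mathematician", "engineer",
     "mathematician", "mathematician", "engineer"]

b = ["physicist", "surgeon", "physicist", "surgeon", "physicist",
     "physicist", "surgeon", "surgeon", "physicist", "physicist",
     "mathematician"]


def frequency_distance(x, y, relative=True):
    """Euclidean distance between the category-frequency vectors of x and y.

    Hypothetical helper: with relative=True (an assumption on my part) the
    counts are divided by the vector length, so vectors of different lengths
    are compared on the same scale. Both vectors are assumed non-empty.
    """
    counts_x, counts_y = Counter(x), Counter(y)
    # Use the union of categories so both frequency vectors have the same
    # length; a category missing from one vector simply has frequency 0.
    categories = sorted(set(counts_x) | set(counts_y))
    scale_x = len(x) if relative else 1
    scale_y = len(y) if relative else 1
    return sqrt(sum((counts_x[c] / scale_x - counts_y[c] / scale_y) ** 2
                    for c in categories))


print(frequency_distance(a, b))                  # relative frequencies
print(frequency_distance(a, b, relative=False))  # raw counts
```

Note that this ignores the order of the elements entirely: two vectors with the same category proportions get a distance of 0 regardless of how the values are arranged.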
If anyone has other ideas, we could put together a review of distance measures for nominal vectors!