Best similarity measure for binary wine data

Question

I'm creating a dataset of wine grape varieties and their associated flavors/aromas. Here's a schematic of the data:

            Flavor1       Flavor2       Flavor3       Flavor4    ...
   Grape1      1             1             1             0
   Grape2      0             0             0             1
   Grape3      0             0             1             0
   Grape4      1             1             1             1
   ...

1 = grape has the flavor

0 = grape doesn't have the flavor

I plan to plot histograms for each grape variety and do a visual check, but I imagine there's some similarity matrix I could construct for these data. I'm not the most advanced statistics user, so something readily implementable in a statistical package would be great, if at all possible.

Thank you!

@ttnphns I don't have the knowledge to answer that question unfortunately. I was hoping that the structure of my dataset might suggest something, but maybe not. Basically, I'm looking for something easily implementable and easily interpretable. — mrt, May 21 '18 at 06:02

score 1 · Answer 1 · answered May 21 '18 at 05:25

I suggest trying one of the following distance measures.

If you are using Python, the following package gives you a list of metrics to experiment with out of the box.

Just check out the section "Metrics intended for boolean-valued vector spaces".

Here you can get a short recipe for doing so
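As a rough illustration of what such a recipe can look like, here is a minimal sketch. It assumes SciPy and NumPy (my choice for the example, not necessarily the package linked above) and uses the four schematic rows from the question as data. It builds a Jaccard similarity matrix, which ignores flavors that both grapes lack, and a simple matching matrix, which counts shared absences as agreement. Which of the two suits the wine data is a judgment call.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Schematic data from the question: rows are grapes, columns are flavors.
grapes = ["Grape1", "Grape2", "Grape3", "Grape4"]
X = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
], dtype=bool)

# Jaccard distance = 1 - (shared flavors / flavors present in either grape).
# Flavors absent from both grapes do not count towards similarity.
jaccard_sim = 1.0 - squareform(pdist(X, metric="jaccard"))

# Hamming distance = share of flavors on which two grapes disagree, so
# 1 - Hamming is the simple matching coefficient (joint absences count).
matching_sim = 1.0 - squareform(pdist(X, metric="hamming"))

np.set_printoptions(precision=2, suppress=True)
print("Jaccard similarity:")
print(jaccard_sim)
print("Simple matching similarity:")
print(matching_sim)
```

Each matrix is symmetric with ones on the diagonal, and entry (i, j) is the similarity between grape i and grape j. For example, Grape1 and Grape4 share three of the four flavors that either of them has, so their Jaccard similarity is 0.75.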

This thread might be good further reading
