I'm creating a dataset of wine grape varieties and their associated flavors/aromas. Here's a schematic of the data:
        Flavor1  Flavor2  Flavor3  Flavor4  ...
Grape1     1        1        1        0
Grape2     0        0        0        1
Grape3     0        0        1        0
Grape4     1        1        1        1
...
1 = grape has the flavor
0 = grape doesn't have the flavor
I plan to plot a histogram for each grape variety and compare them visually, but I imagine there is some kind of similarity matrix I could construct from these data to compare the varieties more directly. I'm not an advanced statistics user, so something readily implementable in a standard statistical package would be ideal.
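One common choice for presence/absence data like this is the Jaccard similarity. The post does not name it; it is offered here only as an example. For two grapes it is the number of flavors both have, divided by the number of flavors at least one of them has. Unlike the simple matching coefficient, it ignores flavors that *neither* grape has, so two grapes are not counted as similar just because both lack many flavors. The sketch below is plain Python and uses only the four toy rows from the schematic. The helper names `jaccard` and `similarity_matrix` are hypothetical, and treating two flavorless grapes as identical is an assumption.

```python
# Toy rows copied from the schematic: columns are Flavor1..Flavor4.
data = {
    "Grape1": [1, 1, 1, 0],
    "Grape2": [0, 0, 0, 1],
    "Grape3": [0, 0, 1, 0],
    "Grape4": [1, 1, 1, 1],
}

def jaccard(a, b):
    """Hypothetical helper: shared flavors / flavors present in either grape."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two grapes with no flavors at all count as identical.
    return both / either if either else 1.0

def similarity_matrix(rows):
    """Hypothetical helper: pairwise Jaccard similarity for every pair of grapes."""
    names = list(rows)
    return names, [[jaccard(rows[i], rows[j]) for j in names] for i in names]

names, sim = similarity_matrix(data)

# Print the symmetric matrix with grape names as row and column labels.
print(" " * 8 + "".join(f"{n:>8}" for n in names))
for name, row in zip(names, sim):
    print(f"{name:<8}" + "".join(f"{v:8.2f}" for v in row))
```

On these four rows, Grape1 and Grape4 come out as the most similar pair (0.75), and Grape1 and Grape2 share no flavors (0.00). Statistical packages provide the same measure as a distance, so the similarity is 1 minus that distance. In SciPy, `1 - squareform(pdist(X, metric="jaccard"))` on a boolean array `X` gives this matrix. In R, `1 - as.matrix(dist(x, method = "binary"))` does the same.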
Thank you!