I have a problem where I am trying to group observations (most likely using k-means or a similar unsupervised learning tool), where each observation consists of n variables that sum to one. In other words, I am grouping observations based on the probabilities of each of n potential states or outcomes for each observation.
For example: suppose we are testing the manufacturing of dice, and we take a sample of 1000 dice. Each die would be one observation, and we would record V1 = the percentage of the time that a one appeared when we rolled that particular die, V2 = the percentage for two, etc. We would then cluster the dice based on the percentage of the time each die landed on a 1, 2, 3, etc. (to see if, for example, one of the die-making machines was improperly calibrated). While we could simply inspect the percentages, the hope is that clustering these observations will reveal underlying trends (colors, materials, machines, etc.) that would not be apparent without unsupervised learning techniques.
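For concreteness, here is a minimal sketch of the kind of data I have in mind (assuming Python with numpy and scikit-learn; the dice sample is simulated, since I can't share the real data, and the "biased machine" group is purely hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulate 1000 dice: each row is one observation of 6 face
# frequencies, and each row sums to 1 (the compositional constraint).
fair = rng.dirichlet(np.ones(6) * 50, size=900)          # roughly fair dice
biased = rng.dirichlet([80, 50, 50, 50, 50, 50], size=100)  # hypothetical machine favoring ones
X = np.vstack([fair, biased])

assert np.allclose(X.sum(axis=1), 1.0)  # every observation sums to one

# Naive approach: k-means directly on the raw proportions.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

This runs, but my concern is whether applying k-means directly to proportions like this is appropriate, given the sum-to-one constraint on each row.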
I know this is a bit of a strange problem, but I'm sure some literature exists for it. What is this type of problem called? Can you point me toward some work that has been done on this type of problem?