I am curious to see how similarly my participants process a given experiment. (I do not think "agreement" is the right term here, so I use "similarity".) The values are continuous and have no (theoretical) ceiling, but they cannot be negative. I thought about calculating a correlation or a similarity score (e.g. cosine), but as far as I know those are designed to compare only two sets of values. In my case I have more (potentially 5 or more).

I tried searching for this problem on CV, but the related questions I could find were either about categorical data or limited to two participants. I may have been using the wrong key terms, because this seems like it should be a common problem.

So for the dataset below, the question is "how similar are the participants' values?" (comparing the rows). I expect a single real number (between 0 and 1) that expresses the similarity across all participants.

participant
P02 8 9 11 34 2 6 8
P04 14 20 35 66 8 14 12
P05 7 11 10 20 4 5 13
Bram Vanroy
  • Can you add a little bit more clarity on what exactly you want out? Do you want a single real number between 0 and 1 to measure how correlated all 3 of these rows are? If so, then I think you're asking something similar to [this](https://stats.stackexchange.com/questions/9918/how-to-compute-correlation-between-within-groups-of-variables). – Adam Kells Oct 01 '21 at 10:32

1 Answer

I think there are two ways to do this:

  1. Try using the cosine similarity generalised to multiple vectors as discussed here.
  2. Compute a pairwise correlation or distance metric between the rows and cluster on it. You can then compute a metric that tells you how similar the clusters are to each other.
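
As a rough illustration of the pairwise idea, here is a minimal sketch in Python with NumPy, using the three rows from the question. It is only one possible reading of both options: it averages the cosine similarity (and, separately, the Pearson correlation) over all pairs of participants, which is not necessarily the generalisation referred to in option 1, and it skips the clustering step of option 2. The function names `mean_pairwise_cosine` and `mean_pairwise_pearson` are made up for this example, and the code assumes no participant has an all-zero row.

```python
import numpy as np

# Rows are participants (P02, P04, P05), columns are the measured values.
data = np.array([
    [8, 9, 11, 34, 2, 6, 8],       # P02
    [14, 20, 35, 66, 8, 14, 12],   # P04
    [7, 11, 10, 20, 4, 5, 13],     # P05
], dtype=float)


def mean_pairwise_cosine(rows):
    """Average cosine similarity over all distinct pairs of rows.

    Assumes every row has a non-zero norm. For non-negative data
    the result lies between 0 and 1.
    """
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    sims = unit @ unit.T                    # full similarity matrix
    upper = np.triu_indices(len(rows), k=1)  # each pair once, no diagonal
    return float(sims[upper].mean())


def mean_pairwise_pearson(rows):
    """Average Pearson correlation over all distinct pairs of rows.

    Unlike the cosine score, this can be negative (range -1 to 1).
    """
    corr = np.corrcoef(rows)                # rows are treated as variables
    upper = np.triu_indices(len(rows), k=1)
    return float(corr[upper].mean())


print("mean pairwise cosine :", mean_pairwise_cosine(data))
print("mean pairwise Pearson:", mean_pairwise_pearson(data))
```

Both functions work for any number of participants, since they only depend on the pairwise matrix. Note that cosine similarity ignores overall scale (P04's values are roughly double P02's, but the direction is similar), so whether it fits depends on whether magnitude differences should count as dissimilarity.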
Adam Kells