I have a dataset containing the behaviour of users. Let's say we have 1000 users and 40 similar tasks. The tasks are similar, but their difficulty levels vary. To do a task, a user can follow one of 10 patterns. Since the 40 tasks are similar, the same 10 patterns apply to all of them.
Each user does at least 5 of the 40 tasks, following one of the 10 patterns each time. My goal is to see whether users tend to always follow the same pattern when performing these tasks.
We define:
- P1, P2, ..., P10 as the 10 possible patterns for the tasks
- U1, U2, ..., U1000 as the 1000 users
- T1, T2, ..., T40 as the 40 tasks
Since we don't care about differences between tasks for now, my data can be represented with P and U only:
user, patterns
U1, P1 P1 P1 P2 P2
U2, P2 P3 P5 P4 P5
U3, P1 P1 P1 P1 P1 P1 P1 P1 P1 P1 P1 P1
U4, P2 P2 P2 P2 P2 P2 P2 P1 P1 P1 P1 P1 P1
....
U1000, P3 P2 P3 P2 P3 P2
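Assuming the data is stored as plain text in the `user, patterns` format above (one user per line, patterns separated by spaces), a minimal Python sketch for loading it into a dictionary could look like this. The function name `parse_sequences` is hypothetical, and the `....` line is simply skipped.

```python
from typing import Dict, List


def parse_sequences(text: str) -> Dict[str, List[str]]:
    """Parse lines like 'U1, P1 P1 P2' into {'U1': ['P1', 'P1', 'P2']}."""
    sequences: Dict[str, List[str]] = {}
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip blank lines, the header row and elision markers such as '....'
        if not line or line.startswith("user") or set(line) == {"."}:
            continue
        user, _, patterns = line.partition(",")
        sequences[user.strip()] = patterns.split()
    return sequences


example = """user, patterns
U1, P1 P1 P1 P2 P2
U2, P2 P3 P5 P4 P5
U3, P1 P1 P1 P1 P1 P1 P1 P1 P1 P1 P1 P1
U4, P2 P2 P2 P2 P2 P2 P2 P1 P1 P1 P1 P1 P1
....
U1000, P3 P2 P3 P2 P3 P2"""

data = parse_sequences(example)
print(data["U4"].count("P2"), data["U4"].count("P1"))
```

Keeping the sequences as ordered lists (rather than only counts) preserves the task order, which matters for any order-based measure.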
Let's come back to my question:
Do users usually tend to stick to the same pattern?
I would like to know how to measure this kind of stability. What I currently do is the following:
For each user, I count how often each pattern occurs, pick the most frequent one, and divide its count by the total number of tasks this user performed. Then I get something like this:
user, proportionOfMostFrequentPattern
U1, 0.6
U2, 0.4
U3, 1
U4, 0.54
....
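The per-user proportion described above can be sketched in Python with `collections.Counter`. This is only an illustration of the calculation as described; the function name `max_pattern_proportion` is hypothetical and the sample sequences are the example rows listed earlier.

```python
from collections import Counter
from typing import Dict, List


def max_pattern_proportion(patterns: List[str]) -> float:
    """Share of a user's tasks that used his or her most frequent pattern."""
    counts = Counter(patterns)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(patterns)


sequences: Dict[str, List[str]] = {
    "U1": "P1 P1 P1 P2 P2".split(),
    "U2": "P2 P3 P5 P4 P5".split(),
    "U3": ["P1"] * 12,
    "U4": ["P2"] * 7 + ["P1"] * 6,
}

for user, patterns in sequences.items():
    print(user, round(max_pattern_proportion(patterns), 2))
```

Because only the single largest count enters the ratio, any two users with the same top count and the same number of tasks get the same score, no matter how the remaining tasks are spread over the other patterns.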
With the above approach, I can capture the proportion of tasks in which a user followed his or her most frequent pattern. However, take a look at U4: his patterns consist of 7 P2 and 6 P1. With the approach above, the proportion is 7/13, which does not take the six repetitions of P1 into account. Is there a standard metric that measures the stability of the patterns? In addition, is there a significance test for this kind of problem?
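One possible direction, sketched here under stated assumptions rather than as an established answer, is to use the normalized Shannon entropy of each user's pattern distribution, `1 - H / log(K)`, which uses every pattern's count (so U4's six P1 tasks do contribute), together with a switch rate, the fraction of consecutive tasks in which the user changed pattern. For significance, a permutation test can compare the observed mean stability with what random assignment would give. The helper names, `K = 10` as the normalizer, and the null model (pool all observed patterns, shuffle them, and redistribute them while keeping each user's number of tasks) are all assumptions; the switch rate is only meaningful if the listed order is the order in which the tasks were done.

```python
import math
import random
from collections import Counter
from typing import Dict, List

K = 10  # assumed number of possible patterns (P1..P10)


def entropy_stability(patterns: List[str], k: int = K) -> float:
    """1 - H/log(k): 1.0 if one pattern is used, 0.0 if all k are used equally."""
    n = len(patterns)
    h = -sum((c / n) * math.log(c / n) for c in Counter(patterns).values())
    return 1.0 - h / math.log(k)


def switch_rate(patterns: List[str]) -> float:
    """Fraction of consecutive task pairs where the pattern changed (needs >= 2 tasks)."""
    switches = sum(a != b for a, b in zip(patterns, patterns[1:]))
    return switches / (len(patterns) - 1)


def permutation_test(seqs: Dict[str, List[str]], n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided p-value: is the mean entropy_stability higher than under shuffling?

    Assumed null model: all observed patterns are pooled, shuffled and handed
    back to users, keeping each user's number of tasks unchanged.
    """
    rng = random.Random(seed)
    lengths = [len(p) for p in seqs.values()]
    pooled = [p for ps in seqs.values() for p in ps]
    observed = sum(entropy_stability(p) for p in seqs.values()) / len(seqs)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, total = 0, 0.0
        for n in lengths:
            total += entropy_stability(pooled[start:start + n])
            start += n
        if total / len(lengths) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (at_least_as_extreme + 1) / (n_perm + 1)


sequences = {
    "U1": "P1 P1 P1 P2 P2".split(),
    "U2": "P2 P3 P5 P4 P5".split(),
    "U3": ["P1"] * 12,
    "U4": ["P2"] * 7 + ["P1"] * 6,
    "U1000": "P3 P2 P3 P2 P3 P2".split(),
}
for user, pats in sequences.items():
    print(user, round(entropy_stability(pats), 3), round(switch_rate(pats), 3))
print("p-value:", permutation_test(sequences))
```

On the example rows, U4 (7 P2 followed by 6 P1) and U1000 (alternating P3 and P2) get almost the same entropy score, about 0.70, but their switch rates are 1/12 and 1. That difference is one reason to consider reporting a distribution-based and an order-based measure side by side.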