I've recently started learning machine learning, and a project I am working on requires me to cluster users based on the order in which they visited pages on a website. I have data in the form of:
['user_id', 1, 2, 4, 6, 3, 7, 3, 2, 4...]
Here each number is a category/page that the user visited. In addition, the length of the data is not the same for each user, i.e. some users visit more pages than others.
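For concreteness, here is a minimal sketch of that format with made-up rows, along with one possible way to compare two sequences of different lengths. The edit (Levenshtein) distance is only an assumption about what "similar" could mean here, not something taken from the paper, and the names `parse_rows`, `edit_distance` and `normalized_distance` are hypothetical helpers.

```python
def parse_rows(rows):
    """Map each user id to its ordered list of visited page categories."""
    return {row[0]: list(row[1:]) for row in rows}


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Scale the distance to [0, 1] so long and short sessions are comparable."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Made-up example rows in the same shape as my data.
rows = [
    ["user_a", 1, 2, 4, 6, 3, 7],
    ["user_b", 1, 2, 4, 6],
    ["user_c", 7, 3, 5],
]
sessions = parse_rows(rows)
users = list(sessions)

# Pairwise distance matrix, which a distance-based clustering method could consume.
matrix = [[normalized_distance(sessions[u], sessions[v]) for v in users]
          for u in users]
```

Whether an order-sensitive distance like this is the right notion of similarity for page visits is exactly the part I am unsure about.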
I realize this is really vague and that defining similarity is hard. I tried following the example in this research paper, but to be honest a lot of it went over my head.
I need help with how to approach this problem, and I am open to new ideas and suggestions.