Algorithm to calculate difference in users' tastes

Question

I have data like this: person $A$ likes movies ['X', 'Y', 'Z'] and dislikes ['V']; person $B$ likes movies ['X', 'L', 'V'] and dislikes ['Y']; and likewise for many other users. What would be a good algorithm to find the mean difference in users' tastes?

Depends on application... Any more details? #s of movies and users? Where do you want to use this dissimilarity? — , May 19 '11 at 10:43
I would like to find users with similar taste and suggest them as friends to follow. There will be N movies and X users, and each user likes movies according to their own taste. I don't want to look at movie categories; I want a matching percentage of similarity between users. — Ananth Duari, May 19 '11 at 10:51

mlwida · Accepted Answer · 2011-05-19T11:45:45.760

What you want to do is called "Collaborative Filtering". Searching the web will offer you a tremendous amount of resources for this topic, but I truly recommend this paper:

Xiaoyuan Su & Taghi M. Khoshgoftaar: A Survey of Collaborative Filtering Techniques

In section 3. Memory-Based Collaborative Filtering Techniques you'll find the basic techniques for finding users with similar taste. They generally consist of selecting a metric for a nearest-neighbor approach plus some modifications to the user-item matrix (how to deal with missing values, how to treat items that have not been rated often, etc.). This is a good place to start.
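To make the nearest-neighbor idea concrete, here is a minimal sketch in Python. It is not taken from the survey: it assumes likes are encoded as +1, dislikes as -1, and unrated movies as missing (treated as 0), and it uses cosine similarity as the metric. User "C" and the function names are made up for illustration.

```python
from math import sqrt

# Ratings from the question: +1 = like, -1 = dislike; unrated movies are absent.
# User "C" is a hypothetical third user added for illustration.
ratings = {
    "A": {"X": 1, "Y": 1, "Z": 1, "V": -1},
    "B": {"X": 1, "L": 1, "V": 1, "Y": -1},
    "C": {"X": 1, "Y": 1, "V": -1},
}

def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating vectors (missing = 0)."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_u * norm_v)

def nearest_neighbors(user, ratings, k=1):
    """Return the k other users most similar to `user`, best first."""
    scores = [(cosine_similarity(ratings[user], ratings[other]), other)
              for other in ratings if other != user]
    scores.sort(reverse=True)
    return [(other, score) for score, other in scores[:k]]

print(nearest_neighbors("A", ratings, k=2))
```

With this encoding, A and B come out with a negative similarity (they agree on X but conflict on Y and V), while the hypothetical user C ranks as A's nearest neighbor. Other metrics, such as Pearson correlation, can be dropped in by replacing `cosine_similarity`.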

After grasping the basic ideas, you may want to try more sophisticated techniques such as Singular Value Decomposition, which was applied successfully in the Netflix Prize (you'll find a link to this and other techniques in section 4. Model-Based Collaborative Filtering Techniques of the recommended paper).
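As a rough illustration of the SVD idea, the sketch below factorizes a small user-item matrix with NumPy and keeps only the top k singular values. It rests on several assumptions: the +1/-1 encoding, filling unrated movies with 0, the hypothetical user C, and the helper name `truncated_svd`. Plain SVD on a zero-filled matrix is a simplification; the approaches used in the competition treat missing ratings more carefully.

```python
import numpy as np

# Rows = users A, B, C (C is hypothetical); columns = movies X, Y, Z, V, L.
# +1 = like, -1 = dislike, 0 = not rated (a crude way to fill missing values).
R = np.array([
    [1,  1, 1, -1, 0],   # A
    [1, -1, 0,  1, 1],   # B
    [1,  1, 0, -1, 0],   # C
], dtype=float)

def truncated_svd(R, k):
    """Return the users' k-dimensional factors and the rank-k reconstruction of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    user_factors = U[:, :k] * s[:k]    # each row: one user in k latent dimensions
    R_k = user_factors @ Vt[:k, :]     # best rank-k approximation of R
    return user_factors, R_k

user_factors, R_2 = truncated_svd(R, k=2)
print(np.round(R_2, 2))
```

Users can then be compared in the low-dimensional `user_factors` space instead of the raw, sparse rating space, and entries of the reconstruction that were 0 in `R` can be read as predicted preferences.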

If you have some bucks to spend, I also recommend "Programming Collective Intelligence" by Toby Segaran, which approaches this topic in a very practical way.

score 1 · Answer 2 · answered May 19 '11 at 11:07

If you represent each movie as a categorical variable with 3 levels (like, unspecified, dislike) you can do any type of clustering analysis on your users with these covariates.
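A minimal sketch of that encoding in Python, using the two users from the question. The distance used here (the fraction of movies on which two users' levels differ) is only one possible choice and is an assumption, as are the names `encode` and `mismatch_distance`; the resulting pairwise distances could be handed to any clustering method that accepts precomputed distances.

```python
from itertools import combinations

LIKE, UNSPECIFIED, DISLIKE = "like", "unspecified", "dislike"

# Data from the question.
users = {
    "A": {"likes": {"X", "Y", "Z"}, "dislikes": {"V"}},
    "B": {"likes": {"X", "L", "V"}, "dislikes": {"Y"}},
}

# Every movie mentioned by any user, in a fixed order.
movies = sorted(set().union(*(u["likes"] | u["dislikes"] for u in users.values())))

def encode(user):
    """One categorical level per movie, in the order of `movies`."""
    return [LIKE if m in user["likes"]
            else DISLIKE if m in user["dislikes"]
            else UNSPECIFIED
            for m in movies]

def mismatch_distance(a, b):
    """Fraction of movies on which the two users' levels differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

encoded = {name: encode(u) for name, u in users.items()}
distances = {(p, q): mismatch_distance(encoded[p], encoded[q])
             for p, q in combinations(sorted(encoded), 2)}
print(distances)
```

For A and B the levels differ on L, V, Y and Z and agree only on X, so the distance is 4/5. Note that this simple distance counts "unspecified vs. like" as heavily as "like vs. dislike"; a weighted variant may be preferable.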

Nick Sabbe

Do you mean something like Fn(A, B) = C[like, unspecified, dislike] = C[1, 2, 2], where like = movies both A and B like (X) = 1; dislike = SUM(A's dislikes among B's likes (V), B's dislikes among A's likes (Y)) = 2; and unspecified = (Z, L) = 2? Is that correct? – Ananth Duari May 19 '11 at 11:33
Ananth, I would also consider assigning a value to matching dislikes. In other words, A and B both disliking X implies some degree of similarity in tastes. You may want to weight it differently from matching likes, but it should have relevance. – nycdan May 19 '11 at 16:20
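The counting discussed in these comments could be sketched as follows in Python. The function name `taste_similarity`, the default weights, and the choice to ignore unspecified movies are all assumptions made for illustration, not something proposed in the thread.

```python
def taste_similarity(a, b, like_weight=1.0, dislike_weight=0.5):
    """Weighted share of agreements among movies both users have an opinion on.

    Shared dislikes count as agreement but with a (hypothetical) lower weight;
    movies that only one user has rated are ignored.
    """
    shared_likes = len(a["likes"] & b["likes"])
    shared_dislikes = len(a["dislikes"] & b["dislikes"])
    conflicts = len(a["likes"] & b["dislikes"]) + len(a["dislikes"] & b["likes"])
    agreement = like_weight * shared_likes + dislike_weight * shared_dislikes
    total = agreement + conflicts
    return agreement / total if total else 0.0

# Data from the question.
A = {"likes": {"X", "Y", "Z"}, "dislikes": {"V"}}
B = {"likes": {"X", "L", "V"}, "dislikes": {"Y"}}

print(taste_similarity(A, B))
```

For A and B there is one shared like (X), no shared dislike, and two conflicts (Y and V), so the score is 1/3. Multiplying by 100 gives a matching percentage.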
