I have data like this: person $A$ likes movies ['X', 'Y', 'Z'] and dislikes ['V']. Person $B$ likes movies ['X', 'L', 'V'] and dislikes ['Y']. Likewise for many other users. What would be a good algorithm to find the mean difference between users' tastes?

- It depends on the application. Any more details? How many movies and users are there? Where do you want to use this dissimilarity? – May 19 '11 at 10:43
- I would like to find users with similar tastes and suggest them as friends to follow. There will be N movies and X users, and each user likes movies according to their own taste. I don't want to look at movie categories; I want a percentage of similarity between users. – Ananth Duari May 19 '11 at 10:51
2 Answers
What you want to do is called "Collaborative Filtering". Searching the web will offer you a tremendous amount of resources for this topic, but I truly recommend this paper:
Xiaoyuan Su & Taghi M. Khoshgoftaar: A Survey of Collaborative Filtering Techniques
In section 3, "Memory-Based Collaborative Filtering Techniques", you'll find the basic techniques for finding users with similar taste. They generally consist of selecting a metric for a nearest-neighbor approach plus some modifications to the user-item matrix (how to deal with missing values, how to treat items that have not been rated often, etc.). This is a good place to start.
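To make the nearest-neighbor idea concrete, here is a minimal sketch using cosine similarity as the metric. The encoding (+1 for like, -1 for dislike, 0 for unrated) and the extra user "C" are my own assumptions for illustration; they are not taken from the survey, and other metrics (e.g. Pearson correlation) may suit your data better.

```python
from math import sqrt

# Assumed encoding: +1 = like, -1 = dislike; a missing key means "not rated".
# Users A and B come from the question; user C is hypothetical.
ratings = {
    "A": {"X": 1, "Y": 1, "Z": 1, "V": -1},
    "B": {"X": 1, "L": 1, "V": 1, "Y": -1},
    "C": {"X": 1, "Y": 1, "V": -1},
}

def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating dicts (unrated movies count as 0)."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_u * norm_v)

def nearest_neighbors(user, ratings, k=3):
    """Return up to k (other_user, similarity) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(ratings[user], ratings[other]))
        for other in ratings
        if other != user
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(nearest_neighbors("A", ratings))
```

With this encoding, a user who rated only a few movies has a short vector, so you would probably want the "modifications" mentioned above (for example, requiring a minimum number of co-rated movies) before trusting the ranking.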
After grasping the basic ideas, you may want to try more sophisticated techniques such as Singular Value Decomposition, which was applied successfully in the Netflix Prize (you'll find a link to this and other techniques in section 4, "Model-Based Collaborative Filtering Techniques", of the recommended paper).
If you have some bucks to spend, I also recommend "Programming Collective Intelligence" by Toby Segaran, which approaches this topic in a very practical way.

If you represent each movie as a categorical variable with 3 levels (like, unspecified, dislike) you can do any type of clustering analysis on your users with these covariates.
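As a rough illustration of that encoding, the sketch below turns the two users from the question into rows of three-level categorical values and computes a simple mismatch distance (the fraction of movies on which two users' levels differ). The choice of distance is an assumption on my part; the resulting pairwise distances are the kind of input a hierarchical or k-medoids clustering routine would take.

```python
# Data from the question.
likes = {"A": {"X", "Y", "Z"}, "B": {"X", "L", "V"}}
dislikes = {"A": {"V"}, "B": {"Y"}}

# One column per movie that anyone has rated, in a fixed order.
movies = sorted(set().union(*likes.values(), *dislikes.values()))

def encode(user):
    """Return the user's row: one of 'like', 'dislike', 'unspecified' per movie."""
    row = []
    for movie in movies:
        if movie in likes[user]:
            row.append("like")
        elif movie in dislikes[user]:
            row.append("dislike")
        else:
            row.append("unspecified")
    return row

def mismatch_distance(a, b):
    """Fraction of movies on which the two users' levels differ (0 = identical)."""
    differing = sum(x != y for x, y in zip(encode(a), encode(b)))
    return differing / len(movies)

print(movies)
print(encode("A"))
print(encode("B"))
print(mismatch_distance("A", "B"))
```

Note that this distance treats "like vs. unspecified" the same as "like vs. dislike"; if that is too crude, you can assign different costs to each pair of levels.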

- Do you mean something like Fn(A, B) = C[like, unspecified, dislike] = C[1, 2, 2], where like = 1 (A and B both like X), dislike = 2 (A dislikes V, which B likes, and B dislikes Y, which A likes), and unspecified = 2 (Z and L)? Is that correct? – Ananth Duari May 19 '11 at 11:33
- Ananth, I would also consider assigning a value to matching dislikes. In other words, A and B both disliking X implies some degree of similarity in tastes. You may want to weight it differently than matching likes, but it should have relevance. – nycdan May 19 '11 at 16:20
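One way to sketch the weighting idea from these comments: count shared likes, shared dislikes, and conflicts (one user likes what the other dislikes), then combine them with separate weights. The function name `taste_score` and the default weights are hypothetical, chosen only for illustration; you would need to tune them on your own data.

```python
def taste_score(likes_a, dislikes_a, likes_b, dislikes_b,
                w_like=1.0, w_dislike=0.5, w_conflict=1.0):
    """Weighted agreement over movies both users rated; weights are illustrative."""
    shared_likes = len(likes_a & likes_b)
    shared_dislikes = len(dislikes_a & dislikes_b)
    # A conflict is a movie one user likes and the other dislikes.
    conflicts = len(likes_a & dislikes_b) + len(dislikes_a & likes_b)
    rated_by_both = shared_likes + shared_dislikes + conflicts
    if rated_by_both == 0:
        return 0.0  # no overlap, so no evidence either way
    agreement = w_like * shared_likes + w_dislike * shared_dislikes
    return (agreement - w_conflict * conflicts) / rated_by_both

# Users A and B from the question: 1 shared like (X), 2 conflicts (Y and V).
score = taste_score({"X", "Y", "Z"}, {"V"}, {"X", "L", "V"}, {"Y"})
print(score)
```

Movies that only one of the two users rated (Z and L here) are ignored by this score, which matches the "unspecified" bucket in the comment above.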