I have data like this: person $A$ likes movies ['X', 'Y', 'Z'] and dislikes ['V']; person $B$ likes movies ['X', 'L', 'V'] and dislikes ['Y']; and likewise for many other users. What would be a good algorithm for finding the mean difference between users' tastes?

mlwida
Ananth Duari
  • It depends on the application. Any more details? How many movies and users are there? Where do you want to use this dissimilarity? –  May 19 '11 at 10:43
  • I would like to find users with similar tastes and suggest them as friends to follow. There will be N movies and X users, and each user likes movies according to their own taste. I don't want to look at the category of the movie. I want the matching percentage of similarity among users. – Ananth Duari May 19 '11 at 10:51

2 Answers

What you want to do is called "Collaborative Filtering". Searching the web will offer you a tremendous amount of resources for this topic, but I truly recommend this paper:

Xiaoyuan Su & Taghi M. Khoshgoftaar: A Survey of Collaborative Filtering Techniques

In section 3, "Memory-Based Collaborative Filtering Techniques", you'll find the basic techniques for finding users with similar taste. They generally consist of selecting a similarity metric for a nearest-neighbor approach, plus some modifications to the user-item matrix (how to deal with missing values, how to treat items which have not been rated very often, etc.). This is a good place to start.
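To make the nearest-neighbor idea concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the encoding (+1 for like, -1 for dislike, unrated items treated as 0), the choice of cosine similarity as the metric, and the third user "C" are all assumptions made for illustration.

```python
from math import sqrt

# Assumed encoding: +1 = like, -1 = dislike; a missing key means "not rated".
ratings = {
    "A": {"X": 1, "Y": 1, "Z": 1, "V": -1},
    "B": {"X": 1, "L": 1, "V": 1, "Y": -1},
    "C": {"X": 1, "Y": 1, "V": -1},  # hypothetical third user
}

def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating dicts (unrated items count as 0)."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbours(user, ratings, k=1):
    """Return the k most similar other users as (similarity, name) pairs."""
    scored = [
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    scored.sort(reverse=True)
    return scored[:k]

print(cosine_similarity(ratings["A"], ratings["B"]))
print(nearest_neighbours("A", ratings))
```

With this toy data, A and B come out negatively correlated, because they disagree on Y and V, while the hypothetical user C is A's nearest neighbour. The paper discusses other metrics (for example Pearson correlation) that could be swapped in for the cosine.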

After grasping the basic ideas you may want to try out some more sophisticated techniques such as Singular Value Decomposition, which was applied successfully in the Netflix Prize (you'll find a link to this and other techniques in section 4, "Model-Based Collaborative Filtering Techniques", of the recommended paper).
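As a rough illustration of the SVD idea, the sketch below factorises a tiny user-item matrix with NumPy and keeps the two strongest latent dimensions. The matrix values, the users C and D, and the choice k = 2 are assumptions for the example. Treating "unspecified" as 0 is a simplification, and approaches tuned for sparse rating data typically handle missing entries more carefully.

```python
import numpy as np

users = ["A", "B", "C", "D"]          # C and D are hypothetical extra users
movies = ["X", "Y", "Z", "V", "L"]

# Assumed encoding: +1 = like, -1 = dislike, 0 = unspecified.
R = np.array([
    [1,  1, 1, -1, 0],   # A
    [1, -1, 0,  1, 1],   # B
    [1,  1, 0, -1, 0],   # C
    [0, -1, 0,  1, 1],   # D
], dtype=float)

# Thin SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                  # number of latent "taste" dimensions kept
user_factors = U[:, :k] * s[:k]        # each row is a user in latent space
R_approx = user_factors @ Vt[:k]       # rank-k approximation of R

def latent_similarity(i, j):
    """Cosine similarity between users i and j in the k-dimensional latent space."""
    a, b = user_factors[i], user_factors[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for i in range(len(users)):
    for j in range(i + 1, len(users)):
        print(users[i], users[j], round(latent_similarity(i, j), 3))
```

Users can then be compared in the low-dimensional space instead of on the raw matrix, which tends to matter more as the number of movies grows.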

If you have some bucks to spend, I also recommend "Programming Collective Intelligence" by Toby Segaran, which approaches this topic in a very practical way.

mlwida

If you represent each movie as a categorical variable with 3 levels (like, unspecified, dislike) you can do any type of clustering analysis on your users with these covariates.
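A minimal sketch of that encoding, in plain Python, might look like the following. The user "C", the simple mismatch distance, the single-linkage merging rule, and the threshold of 0.5 are all assumptions chosen to keep the example short; any clustering method that accepts categorical covariates could be used instead.

```python
from itertools import combinations

likes = {"A": {"X", "Y", "Z"}, "B": {"X", "L", "V"}, "C": {"X", "Y"}}  # C is hypothetical
dislikes = {"A": {"V"}, "B": {"Y"}, "C": {"V"}}
movies = sorted(set().union(*likes.values(), *dislikes.values()))

def encode(user):
    """One 3-level categorical value per movie: like / dislike / unspecified."""
    return [
        "like" if m in likes[user]
        else "dislike" if m in dislikes[user]
        else "unspecified"
        for m in movies
    ]

def mismatch_distance(u, v):
    """Fraction of movies on which the two users have different levels."""
    return sum(x != y for x, y in zip(encode(u), encode(v))) / len(movies)

def single_linkage(users, threshold):
    """Merge clusters while any cross-cluster pair is within the threshold."""
    clusters = [{u} for u in users]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(mismatch_distance(u, v) <= threshold for u in c1 for v in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break
    return clusters

print(single_linkage(["A", "B", "C"], threshold=0.5))
```

On this toy data A and C differ on only one of five movies and end up in the same cluster, while B stays on its own.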

Nick Sabbe
  • Do you mean something like Fn(A,B) = C[like, unspecified, dislike] = C[1, 2, 2], where like = 1 (A and B both like X), dislike = 2 (the sum of A's dislike of B's like, V, and B's dislike of A's like, Y), and unspecified = 2 (Z and L)? Is that correct? – Ananth Duari May 19 '11 at 11:33
  • Ananth, I would also consider assigning a value to matching dislikes. In other words, A and B both disliking X implies some degree of similarity in tastes. You may want to weight it differently from matching likes, but it should have relevance. – nycdan May 19 '11 at 16:20
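
The counting discussed in these comments can be sketched as below. The function names, the weights (1.0 for a shared like, 0.5 for a shared dislike), and the decision to leave unspecified movies out of the percentage are illustrative assumptions, not something fixed by the thread.

```python
def taste_counts(likes_a, dislikes_a, likes_b, dislikes_b):
    """Count agreements, conflicts and one-sided ratings for two users."""
    both_like = likes_a & likes_b
    both_dislike = dislikes_a & dislikes_b
    conflict = (likes_a & dislikes_b) | (dislikes_a & likes_b)
    rated = likes_a | dislikes_a | likes_b | dislikes_b
    unspecified = rated - both_like - both_dislike - conflict
    return {
        "both_like": len(both_like),
        "both_dislike": len(both_dislike),
        "conflict": len(conflict),
        "unspecified": len(unspecified),
    }

def match_percentage(counts, w_like=1.0, w_dislike=0.5):
    """Weighted agreement as a percentage; the weights are hypothetical."""
    agree = w_like * counts["both_like"] + w_dislike * counts["both_dislike"]
    total = agree + counts["conflict"]
    return 100.0 * agree / total if total else 0.0

counts = taste_counts({"X", "Y", "Z"}, {"V"}, {"X", "L", "V"}, {"Y"})
print(counts)
print(match_percentage(counts))
```

For A and B from the question this gives one shared like (X), no shared dislikes, two conflicts (Y and V) and two unspecified movies (Z and L), which matches the counts in the first comment above.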