Clustering a binary matrix

Question

I have a semi-small matrix of binary features of dimension 250k x 100. Each row is a user and the columns are binary "tags" describing some user behavior, e.g. "likes_cats".

user  1   2   3   4   5  ...
-------------------------
A     1   0   1   0   1
B     0   1   0   1   0
C     1   0   0   1   0

I would like to fit the users into 5-10 clusters and analyze the loadings to see if I can interpret groups of user behavior. There appear to be quite a few approaches to fitting clusters on binary data. What do we think might be the best strategy for this data?

PCA
Making a Jaccard Similarity matrix, fitting a hierarchical cluster and then using the top "nodes".
K-medians
K-medoids
Proximus?
Agnes

So far I've had some success with hierarchical clustering, but I'm really not sure it's the best way to go.

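# Load the 0/1 tag matrix (one row per user, one column per tag)
tags = read.csv("~/tags.csv")
# "binary" is the Jaccard distance; a full dist object over all 250k rows is very large, so this is likely only practical on a sample
d = dist(tags, method = "binary")
# "ward" was renamed "ward.D" in R 3.1.0
hc = hclust(d, method = "ward.D")
plot(hc)
# Per-cluster tag frequencies after cutting the tree into 6 clusters
cluster.means = aggregate(tags, by = list(cutree(hc, k = 6)), mean)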
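
For reference, here is a minimal Python sketch (Python rather than R, purely for illustration) of the Jaccard distance, which is what R documents its "binary" method as computing: among the tags that are on in at least one of two rows, the proportion that are on in only one. The function name `jaccard_distance` and the handling of two all-zero rows are choices made for this sketch. The three rows are the example users from the table above.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # tags on in both rows
    either = sum(1 for x, y in zip(a, b) if x or y)   # tags on in at least one row
    if either == 0:
        return 0.0  # convention chosen for this sketch: two all-zero rows count as identical
    return 1.0 - both / either


users = {
    "A": [1, 0, 1, 0, 1],
    "B": [0, 1, 0, 1, 0],
    "C": [1, 0, 0, 1, 0],
}

names = sorted(users)
distances = {}
for i, p in enumerate(names):
    for q in names[i + 1:]:
        distances[(p, q)] = jaccard_distance(users[p], users[q])

for pair, d in distances.items():
    print(pair, round(d, 3))

# Number of pairwise distances a full matrix over 250,000 users would need
n_users = 250_000
n_pairs = n_users * (n_users - 1) // 2
print(n_pairs)
```

The pair count at the end is why a full pairwise distance matrix over all users is expensive, and why clustering a random sample first may be the more practical route.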

For large (many nodes) and high-dimensional data it can also be worthwhile to try a graph clustering algorithm (using e.g. Tanimoto similarity and methods such as Louvain clustering, RNSC, MCL). I have some doubts whether your type of data will generate meaningful clusters (it very well may, of course), but those doubts relate to clustering in general, not to a particular type of clustering. PCA is definitely something to try. – micans, Feb 12 '14 at 13:01
To be honest, I'm surprised that this question attracted so little attention. Why is that? To me, this sounds like an extremely interesting question. – Dror Atariah, Apr 27 '15 at 11:25

D L Dahly · Answer 1 · 2015-06-03T12:22:53.730

Latent class analysis is one possible approach.

Take the following probability distribution where A, B, and C can take on values of 1 or 0.

$P(A_i, B_j, C_k)$

If these were independent of each other, then we would expect to see:

$P(A_i, B_j, C_k)=P(A_i)P(B_j)P(C_k)$
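
A quick way to see what "independent" would look like is to compare the observed frequency of each response pattern with the product of the marginals. The sketch below does this in plain Python on a small made-up data set; the function name `joint_vs_independent` and the ten rows are illustrative assumptions, not part of the original answer.

```python
from itertools import product


def joint_vs_independent(rows):
    """For each 0/1 pattern, return (observed proportion, proportion expected under independence)."""
    n = len(rows)
    n_vars = len(rows[0])
    # Marginal probability that each variable equals 1
    marginals = [sum(r[j] for r in rows) / n for j in range(n_vars)]
    table = {}
    for pattern in product([0, 1], repeat=n_vars):
        observed = sum(1 for r in rows if tuple(r) == pattern) / n
        expected = 1.0
        for j, value in enumerate(pattern):
            expected *= marginals[j] if value == 1 else 1.0 - marginals[j]
        table[pattern] = (observed, expected)
    return table


# Made-up data in which A, B and C tend to move together
rows = [(1, 1, 1)] * 4 + [(0, 0, 0)] * 4 + [(1, 0, 0), (0, 1, 1)]

table = joint_vs_independent(rows)
for pattern, (observed, expected) in table.items():
    print(pattern, round(observed, 3), round(expected, 3))
```

Here each variable equals 1 half of the time, so independence would put every pattern at 0.125, while the pattern (1, 1, 1) is actually observed in 40% of the rows. A gap of that kind is the dependency that the latent classes are meant to explain.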

Once this possibility is eliminated, we might hypothesize that any observed dependency is due to values clustering within otherwise unobserved subgroups. To test this idea, we can estimate the following model:

$P(A_i, B_j, C_k)=\sum_{n} P(X_n)P(A_i|X_n)P(B_j|X_n)P(C_k|X_n)$

Where $X$ is a latent categorical variable with $n$ levels. You specify $n$, and the model parameters (marginal probabilities of class membership, and class-specific probabilities for each variable) can be estimated via expectation-maximization.
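
To make the estimation step concrete, here is a minimal sketch of expectation-maximization for a latent class model with binary items, written in plain Python. It is an illustration under stated assumptions, not a replacement for a dedicated package: the names `e_step` and `fit_lca`, the random starting values, the fixed iteration count and the small clamp `eps` are all choices made for this example, and the data are made up.

```python
import math
import random


def e_step(rows, prior, theta):
    """Posterior class probabilities for each row, plus the total log-likelihood."""
    post, loglik = [], 0.0
    for r in rows:
        logw = []
        for c, p_class in enumerate(prior):
            lw = math.log(p_class)
            for j, v in enumerate(r):
                p = theta[c][j]
                lw += math.log(p if v else 1.0 - p)
            logw.append(lw)
        m = max(logw)                       # log-sum-exp for numerical stability
        w = [math.exp(x - m) for x in logw]
        s = sum(w)
        post.append([x / s for x in w])
        loglik += m + math.log(s)
    return post, loglik


def fit_lca(rows, n_classes, n_iter=200, seed=0, eps=1e-9):
    """EM for class proportions P(X) and item probabilities P(item = 1 | X)."""
    rng = random.Random(seed)
    n, n_items = len(rows), len(rows[0])
    prior = [1.0 / n_classes] * n_classes
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        post, _ = e_step(rows, prior, theta)
        # M-step: re-estimate parameters from the posterior-weighted counts
        for c in range(n_classes):
            weight = sum(p[c] for p in post)
            prior[c] = max(weight / n, eps)
            for j in range(n_items):
                num = sum(p[c] * r[j] for p, r in zip(post, rows))
                # Clamp away from 0 and 1 so the logs in the E-step stay finite
                theta[c][j] = min(max(num / max(weight, eps), eps), 1.0 - eps)
    post, loglik = e_step(rows, prior, theta)   # evaluate at the final parameters
    return prior, theta, post, loglik


# Made-up data: two behaviour patterns over six tags, plus a few noisy rows
rows = ([[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20
        + [[1, 1, 0, 0, 0, 0]] * 2 + [[0, 0, 0, 1, 1, 0]] * 2)

prior, theta, post, loglik = fit_lca(rows, n_classes=2)
print([round(p, 2) for p in prior])
print([[round(t, 2) for t in row] for row in theta])
```

The rows of `post` are the posterior class membership probabilities, which is what makes the resulting clustering "fuzzy". Which class ends up with which index is arbitrary and depends on the starting values.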

In practice, you could estimate several models, with $5 \le n \le 10$, and "choose" the best model based on theory, likelihood-based fit indices, and classification quality (which can be assessed by calculating posterior probabilities of class membership for the observations).
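
The answer does not name a specific fit index; one widely used likelihood-based choice is the Bayesian information criterion (BIC), so treat its use here as an assumption. The sketch below shows how it would be computed for a latent class model with binary items, and how quickly the parameter count grows with 100 tags. The function names are illustrative, and no log-likelihood values are shown because they would have to come from models actually fitted to the data.

```python
import math


def lca_n_params(n_classes, n_items):
    """Free parameters: (n_classes - 1) class proportions plus one probability per class and item."""
    return (n_classes - 1) + n_classes * n_items


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off of fit and complexity."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Parameter counts for the candidate models with 100 binary tags
param_counts = {n: lca_n_params(n, 100) for n in range(5, 11)}
for n, k in param_counts.items():
    print(n, k)
```

Each candidate model would be fitted separately and its maximized log-likelihood passed to `bic` together with the number of users, keeping in mind that the index is only one input alongside theory and classification quality.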

However, trying to identify meaningful patterns in 100 variables with 5-10 groups will likely require reducing that list down prior to estimating the model, which is a tricky enough topic in its own right (REF).

Great, interesting. What would you say is the benefit of using that technique over any of the others? – wije, Feb 12 '14 at 15:48
One advantage is that clustering is fuzzy, allowing you to account for uncertainty in any subsequent class assignments. Another is that because it is a model-based method, you get likelihood-based fit indices that can help guide model selection. This of course comes at the cost of having to make distributional assumptions... I'm sure other valid methods will have their own tradeoffs. – D L Dahly, Feb 12 '14 at 19:58

score 7 · Answer 2 · answered Feb 14 '14 at 09:22

Actually, frequent itemset mining may be a better choice than clustering on such data.

The usual vector-oriented set of algorithms does not make a lot of sense. K-means for example will produce means that are no longer binary.
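
Both points can be illustrated with a small Python sketch. It counts support by brute force over single tags and pairs, which is not the Apriori algorithm or any other optimized miner and would not scale to 100 tags and 250k users without pruning. The tag names other than `likes_cats`, the four users and the function name `frequent_itemsets` are made up for the example.

```python
from itertools import combinations


def frequent_itemsets(rows, tag_names, min_support, max_size=2):
    """Map each tag combination with enough support to the users who have all of its tags."""
    n = len(rows)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(range(len(tag_names)), size):
            members = [u for u, r in rows.items() if all(r[j] for j in combo)]
            if len(members) / n >= min_support:
                result[tuple(tag_names[j] for j in combo)] = members
    return result


tag_names = ["likes_cats", "likes_dogs", "likes_birds"]
users = {
    "A": [1, 1, 0],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [1, 1, 1],
}

itemsets = frequent_itemsets(users, tag_names, min_support=0.5)
for tags, members in itemsets.items():
    print(tags, members)

# A k-means style centroid is a vector of column means, which is no longer binary
centroid = [sum(r[j] for r in users.values()) / len(users) for j in range(len(tag_names))]
print(centroid)
```

User A appears in the cat group, the dog group and the combined group, so the itemsets describe overlapping groups rather than a partition of the users.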

Has QUIT--Anony-Mousse

Does it make sense to use frequent itemsets even though I wish to cluster the users rather than the tags (columns)? – wije Feb 14 '14 at 11:20
IMHO yes. But for obvious reasons, association rules are not a strict partitioning of the data set. A user may be a member of more than one "frequent itemset". I.e. a user may be both a cat fan and a dog fan; these two groups aren't forced to be disjoint. – Has QUIT--Anony-Mousse Feb 14 '14 at 13:04
Which IMHO is actually good. Assuming that every user is a member of exactly one cluster seems overly naive to me. – Has QUIT--Anony-Mousse Feb 14 '14 at 13:05
