I have a dataset with about 800 observations, each with about 2000 boolean variables. I would like to cluster the observations. Now, I'm pretty new to all of this so I hope you'll bear with me.
My first thought was to use agglomerative hierarchical clustering. After looking into the various linkage methods, I don't think any of them does exactly what I want. At each clustering step, I want the new cluster to contain all the "true" values of the clusters it was formed from.
So let's say we start with the following observations:
V1 V2 V3 V4 V5 V6
O1 X X
O2 X X X
O3 X X X
O4 X X X
O5 X X
The first clusters to be formed should look something like:
V1 V2 V3 V4 V5 V6
Ca1 X X X (containing O1,O2)
Ca2 X X X X (containing O3,O4)
Ca3 X X (containing O5)
Further in the process it could look like:
V1 V2 V3 V4 V5 V6
Cb1 X X X (containing O1,O2)
Cb2 X X X X (containing O3,O4,O5)
As a cluster moves up the hierarchy it should absorb all the "True" values of the clusters merged into it. The top of the hierarchy is a single cluster with all the variables set to "True".
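To make the idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: I picked Jaccard distance as the dissimilarity, the function names are made up, and the toy matrix is hypothetical (it is not the example above). Each merged cluster is represented by the logical OR of its members, and all pairwise dissimilarities are recomputed from those merged profiles at every step.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two boolean vectors (0.0 if both are all-False)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def union_agglomerate(X):
    """Greedy agglomeration where a merged cluster's profile is the logical OR
    of the profiles it was formed from.

    Returns (merges, top_profile), where merges is a list of
    (members_a, members_b, distance) in merge order.
    """
    profiles = [row.copy() for row in np.asarray(X, dtype=bool)]
    members = [(i,) for i in range(len(profiles))]
    merges = []
    while len(profiles) > 1:
        # Recompute every pairwise distance between the current cluster profiles.
        best = None
        for i in range(len(profiles)):
            for j in range(i + 1, len(profiles)):
                d = jaccard_distance(profiles[i], profiles[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((members[i], members[j], d))
        # The new cluster absorbs all the True values of both parents.
        profiles[i] = np.logical_or(profiles[i], profiles[j])
        members[i] = members[i] + members[j]
        del profiles[j], members[j]  # j > i, so index i is unaffected
    return merges, profiles[0]

# Hypothetical toy data: 4 observations, 4 boolean variables.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=bool)

merges, top = union_agglomerate(X)
for a, b, d in merges:
    print(a, b, round(float(d), 3))
```

This naive version rescans all pairs after every merge, so it would be slow in pure Python for 800 observations, but it shows the behaviour I am after.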
Does this mean that a new dissimilarity matrix must be calculated each time a new cluster is formed? Does such a method already exist, and if so, what is it called?
Sorry if I'm being unclear; I'll do my best to answer any questions.
Edit: Changed wording in title (dichotomous to binary, removed word)