Getting started with biclustering

Question

I have been doing some casual internet research on biclusters. (I have read the Wiki article several times.) So far, it seems there are few agreed-upon definitions and little standard terminology.

1. I was wondering if there were any standard papers or books that anybody who is interested in algorithms for finding biclusters should read.
2. Is it possible to say what is the state of the art in the field? I was intrigued by the notion of finding biclusters using genetic algorithms, so I would appreciate comments on that approach in particular in the context of other approaches.
3. Usually in clustering, the goal is to partition the data-set into groups where each element is in some group. Do bicluster algorithms also seek to put all elements in a particular group?

score 16 · Accepted Answer · answered Feb 27 '11 at 11:51

I have never used it directly, so I can only share some papers I have and some general thoughts about the technique (which mainly address your questions 1 and 3).

My general understanding of biclustering comes mainly from genetic studies (references 2-6 in the list below), where we seek to account for both clusters of genes and groupings of individuals. In short, we want to group together samples that share a similar gene expression profile (which might be related to disease state, for instance) and to identify the genes that contribute to that profile. A survey of the state of the art for "massive" biological datasets is available in Pardalos's slides, Biclustering. Note that there is an R package, biclust, with applications to microarray data.

In fact, my initial idea was to apply this methodology to clinical diagnosis, because it allows features or variables to be placed in more than one cluster. This is interesting from a semeiological perspective: symptoms that cluster together define a syndrome, but some symptoms are shared by different diseases. A good discussion may be found in Cramer et al., Comorbidity: A network perspective (Behavioral and Brain Sciences 2010, 33, 137-193).
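To make the overlap idea concrete, here is a minimal sketch in plain Python. It treats each bicluster as a pair (set of patients, set of symptoms). All names and data are made up for illustration; they do not come from any real study or from the papers cited here. The point is only that a symptom can belong to several biclusters while some patients or symptoms belong to none, which is the difference from an ordinary partition.

```python
# Hypothetical toy data: patient IDs, symptom names and bicluster labels
# are invented purely for illustration.
biclusters = {
    "syndrome_A": {
        "patients": {"p1", "p2", "p3"},
        "symptoms": {"fatigue", "insomnia", "low_mood"},
    },
    "syndrome_B": {
        "patients": {"p3", "p4"},
        "symptoms": {"fatigue", "worry", "restlessness"},
    },
}
all_patients = {"p1", "p2", "p3", "p4", "p5"}
all_symptoms = {"fatigue", "insomnia", "low_mood", "worry",
                "restlessness", "headache"}


def membership(biclusters, axis):
    """Map each item on the given axis to the biclusters that contain it."""
    result = {}
    for name, bicluster in biclusters.items():
        for item in bicluster[axis]:
            result.setdefault(item, set()).add(name)
    return result


symptom_membership = membership(biclusters, "symptoms")
patient_membership = membership(biclusters, "patients")

# Symptoms shared by more than one bicluster (overlap).
overlapping_symptoms = sorted(
    s for s, names in symptom_membership.items() if len(names) > 1
)
# Items that no bicluster covers (no requirement to assign everything).
uncovered_patients = sorted(all_patients - set(patient_membership))
uncovered_symptoms = sorted(all_symptoms - set(symptom_membership))

print("overlapping symptoms:", overlapping_symptoms)
print("uncovered patients:", uncovered_patients)
print("uncovered symptoms:", uncovered_symptoms)
```

Whether overlap and uncovered elements are allowed depends on the algorithm; this sketch only shows the most permissive representation.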

A somewhat related technique is collaborative filtering. A good review is Su and Khoshgoftaar, A Survey of Collaborative Filtering Techniques (Advances in Artificial Intelligence, 2009). Other references are listed at the end. Frequent itemset analysis, as exemplified by the market-basket problem, may also be linked to it, but I never investigated this. Another example of co-clustering is the simultaneous clustering of words and documents in text mining, e.g. Dhillon (2001). Co-clustering documents and words using bipartite spectral graph partitioning. Proc. KDD, pp. 269–274.
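For the word/document case, a rough sketch of the spectral idea may help. The code below is my own simplified reading, not the paper's reference implementation. It assumes a nonnegative count matrix with no empty rows or columns, handles only two co-clusters, and splits on the sign of the second singular vector, whereas a general version would run k-means on several singular vectors. The function name `spectral_bipartition` and the toy matrix are hypothetical.

```python
import numpy as np


def spectral_bipartition(A):
    """Split rows and columns of a nonnegative matrix into two co-clusters.

    Simplified sketch: assumes every row and column has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    row_scale = 1.0 / np.sqrt(A.sum(axis=1))  # diagonal of D1^{-1/2}
    col_scale = 1.0 / np.sqrt(A.sum(axis=0))  # diagonal of D2^{-1/2}
    # Normalized matrix D1^{-1/2} A D2^{-1/2}.
    A_n = row_scale[:, None] * A * col_scale[None, :]
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    # The first singular pair is trivial; the second carries the split.
    z_rows = row_scale * U[:, 1]
    z_cols = col_scale * Vt[1, :]
    # Assumption: a sign split stands in for 1-D k-means.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Hypothetical toy counts: 4 documents (rows) by 5 words (columns),
# two topics with a little cross-talk so the bipartite graph is connected.
counts = np.array([
    [3, 2, 3, 0, 0],
    [2, 3, 2, 0, 1],
    [0, 1, 0, 3, 2],
    [0, 0, 0, 2, 3],
])

doc_labels, word_labels = spectral_bipartition(counts)
print("document labels:", doc_labels)
print("word labels:", word_labels)
```

The label values (0 or 1) are arbitrary because the sign of a singular vector is arbitrary; what matters is which documents and words share a label. Note that this variant does assign every row and column to exactly one co-cluster. For more than two co-clusters, scikit-learn provides a `SpectralCoclustering` estimator.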

As for general references, here is a far from exhaustive list that I hope you will find useful:

1. Jain, A.K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.
2. Carmona-Saez et al. (2006). Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinformatics, 7, 78.
3. Prelic et al. (2006). A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics, 22(9), 1122–1129. www.tik.ee.ethz.ch/sop/bimax
4. DiMaggio et al. (2008). Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics, 9, 458.
5. Santamaria et al. (2008). BicOverlapper: A tool for bicluster visualization. Bioinformatics, 24(9), 1212–1213.
6. Madeira, S.C. and Oliveira, A.L. (2004). Bicluster algorithms for biological data analysis: a survey. IEEE Trans. Comput. Biol. Bioinform., 1, 24–45.
7. Badea, L. (2009). Generalized Clustergrams for Overlapping Biclusters. IJCAI.
8. Symeonidis, P. (2006). Nearest-Biclusters Collaborative Filtering. WEBKDD.

Great answer. If I had another vote, I would vote for this answer again. — Henry B., Feb 28 '11 at 19:38
@chl The first link to the Pardalos slides seems to be dead. Does anyone know of an alternative location? — Erik, Mar 15 '13 at 13:47
@Erik Most of the material from the slides can be found in [Consistent Biclustering via Fractional 0–1 Programming](http://www.na.icar.cnr.it/~mariog/slides/Pardalos_biclustering.pdf) by the same author. (I checked the content of the slides with my copy of the dead link.) — chl, Mar 15 '13 at 15:45

score 4 · Answer 2 · answered Dec 20 '11 at 19:37

Here's a good survey/review:

Stanislav Busygin, Oleg Prokopyev, and Panos M. Pardalos. Biclustering in data mining. Computers & Operations Research, 35(9):2964–2987, September 2008.

kc2001