Co-clustering documents and words using bipartite spectral graph partitioning
I. Dhillon. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, page 269--274. New York, NY, USA, ACM, (2001)
DOI: 10.1145/502512.502550
Abstract
Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.
Description
Co-clustering documents and words using bipartite spectral graph partitioning
%0 Conference Paper
%1 Dhillon:2001:CDW:502512.502550
%A Dhillon, Inderjit S.
%B Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
%C New York, NY, USA
%D 2001
%I ACM
%K bipartite co-clustering spectral thema thema:co-clustering
%P 269--274
%R 10.1145/502512.502550
%T Co-clustering documents and words using bipartite spectral graph partitioning
%U http://doi.acm.org/10.1145/502512.502550
%X Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.
%@ 1-58113-391-X
@inproceedings{Dhillon:2001:CDW:502512.502550,
abstract = {Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.},
acmid = {502550},
added-at = {2013-04-08T11:05:52.000+0200},
address = {New York, NY, USA},
author = {Dhillon, Inderjit S.},
biburl = {https://www.bibsonomy.org/bibtex/2683ca0430ea02eff60988a6cda122e6f/becker},
booktitle = {Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining},
description = {Co-clustering documents and words using bipartite spectral graph partitioning},
doi = {10.1145/502512.502550},
interhash = {d112b6041dab4b4677368c7bf101f1ee},
intrahash = {683ca0430ea02eff60988a6cda122e6f},
isbn = {1-58113-391-X},
keywords = {bipartite co-clustering spectral thema thema:co-clustering},
location = {San Francisco, California},
numpages = {6},
pages = {269--274},
publisher = {ACM},
series = {KDD '01},
timestamp = {2013-04-08T11:05:52.000+0200},
title = {Co-clustering documents and words using bipartite spectral graph partitioning},
url = {http://doi.acm.org/10.1145/502512.502550},
year = 2001
}