Clustering of time-series subsequences is meaningless: implications for previous and future research

E. Keogh and J. Lin. Knowledge and Information Systems, 8 (2): 154–177 (Aug 31, 2005)
DOI: 10.1007/s10115-004-0172-7

Abstract

Given the recent explosion of interest in streaming data and online algorithms, clustering of time-series subsequences, extracted via a sliding window, has received much attention. In this work, we make a surprising claim. Clustering of time-series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising because it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method that, based on the concept of time-series motifs, is able to meaningfully cluster subsequences on some time-series datasets.
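As a reader's illustration of the setup the abstract refers to, the sketch below extracts every subsequence of a series with a sliding window that advances one point at a time. It is not taken from the paper: the function name `sliding_window_subsequences` and the parameter `w` are assumptions chosen here, and the abstract itself does not state the constraint it mentions, so the sketch stops at extraction.

```python
from typing import List, Sequence


def sliding_window_subsequences(series: Sequence[float], w: int) -> List[List[float]]:
    """Return every length-w subsequence of `series`, stepping one point at a time.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if w <= 0 or w > len(series):
        raise ValueError("window length must be between 1 and len(series)")
    # A series of length m yields m - w + 1 windows.
    return [list(series[i:i + w]) for i in range(len(series) - w + 1)]


if __name__ == "__main__":
    toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]
    for window in sliding_window_subsequences(toy_series, 3):
        print(window)
```

Consecutive windows share w - 1 points, so the extracted subsequences overlap heavily. These overlapping windows are the input that subsequence-clustering methods operate on.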

Links and resources

BibTeX key: Keogh2005Clustering
entry type: article
booktitle: Knowledge and Information Systems
year: 2005
month: aug
day: 31
journal: Knowledge and Information Systems
number: 2
pages: 154--177
publisher: Springer-Verlag
volume: 8
citeulike-linkout-2: http://link.springer.com/article/10.1007/s10115-004-0172-7
citeulike-linkout-1: http://www.springerlink.com/content/abbw3h04m2a6kp42
citeulike-attachment-1: meaningless.pdf; /pdf/user/pbett/article/1211008/1011491/meaningless.pdf; 9447d0b729b06c8770dc90f318d3ff200189b8a6
citeulike-article-id: 1211008
priority: 2
posted-at: 2015-03-30 07:48:58
citeulike-attachment-2: keogh_05_clustering_1011492.pdf; /pdf/user/pbett/article/1211008/1011492/keogh_05_clustering_1011492.pdf; 8b122724faa74b34894901697fb789767f3f8ceb
issn: 0219-1377
citeulike-linkout-0: http://dx.doi.org/10.1007/s10115-004-0172-7
comment: (private-note) Attached preprint available from http://www.cs.ucr.edu/~eamonn/meaningless.pdf. Attached postprint available from http://cs.gmu.edu/~jessica/publications/meaningless_kais05.pdf
DOI: 10.1007/s10115-004-0172-7
url: http://dx.doi.org/10.1007/s10115-004-0172-7

Cite this publication

@article{Keogh2005Clustering,
  abstract = {Given the recent explosion of interest in streaming data and online algorithms, clustering of time-series subsequences, extracted via a sliding window, has received much attention. In this work, we make a surprising claim. Clustering of time-series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising because it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method that, based on the concept of time-series motifs, is able to meaningfully cluster subsequences on some time-series datasets.},
  added-at = {2018-06-18T21:23:34.000+0200},
  author = {Keogh, Eamonn and Lin, Jessica},
  biburl = {https://www.bibsonomy.org/bibtex/22e17e62767aa98446ff14741651609fc/pbett},
  booktitle = {Knowledge and Information Systems},
  citeulike-article-id = {1211008},
  citeulike-attachment-1 = {meaningless.pdf; /pdf/user/pbett/article/1211008/1011491/meaningless.pdf; 9447d0b729b06c8770dc90f318d3ff200189b8a6},
  citeulike-attachment-2 = {keogh_05_clustering_1011492.pdf; /pdf/user/pbett/article/1211008/1011492/keogh_05_clustering_1011492.pdf; 8b122724faa74b34894901697fb789767f3f8ceb},
  citeulike-linkout-0 = {http://dx.doi.org/10.1007/s10115-004-0172-7},
  citeulike-linkout-1 = {http://www.springerlink.com/content/abbw3h04m2a6kp42},
  citeulike-linkout-2 = {http://link.springer.com/article/10.1007/s10115-004-0172-7},
  comment = {(private-note)Attached preprint available from http://www.cs.ucr.edu/\~{}eamonn/meaningless.pdf Attached postprint available from http://cs.gmu.edu/\~{}jessica/publications/meaningless\_kais05.pdf},
  day = 31,
  doi = {10.1007/s10115-004-0172-7},
  interhash = {a6646462cc59e9f13cbcbb27b9949b47},
  intrahash = {2e17e62767aa98446ff14741651609fc},
  issn = {0219-1377},
  journal = {Knowledge and Information Systems},
  keywords = {statistics clustering},
  month = aug,
  number = 2,
  pages = {154--177},
  posted-at = {2015-03-30 07:48:58},
  priority = {2},
  publisher = {Springer-Verlag},
  timestamp = {2018-06-22T18:34:53.000+0200},
  title = {Clustering of time-series subsequences is meaningless: implications for previous and future research},
  url = {http://dx.doi.org/10.1007/s10115-004-0172-7},
  volume = 8,
  year = 2005
}
