Stream Clustering Algorithms: A Primer

S. Kaur, V. Bhatnagar, and S. Chakravarthy. Big Data in Complex Systems, volume 9 of Studies in Big Data, Springer, Cham, (2015)
DOI: 10.1007/978-3-319-11056-1_4

Abstract

Stream data has become ubiquitous due to advances in acquisition technology and pervades numerous applications. These massive data, gathered as a continuous flow, are often accompanied by a dire need for real-time processing. One aspect of data streams deals with storage management and the processing of continuous queries for aggregation. Another significant aspect pertains to the discovery and understanding of hidden patterns to derive actionable knowledge using mining approaches. This chapter focuses on stream clustering and presents a primer of clustering algorithms in the data stream environment. Clustering of data streams has gained importance because of its ability to capture natural structures from unlabeled, non-stationary data. A single scan of the data, bounded memory usage, and capturing data evolution are the key challenges in clustering streaming data. We elaborate on and compare the algorithms on the basis of these constraints. We also propose a taxonomy of algorithms based on the fundamental approaches used for clustering. For each approach, a systematic description of contemporary, well-known algorithms is presented. We place special emphasis on the synopsis data structure used for consolidating characteristics of streaming data and feature it as an important issue in the design of a stream clustering algorithm. We argue that a number of functional and operational characteristics of a clustering algorithm (e.g., quality of clustering, handling of outliers, number of parameters) are influenced by the choice of synopsis. A summary of clustering features that are supported by different algorithms is given. Finally, research directions for improving the usability of stream clustering algorithms are suggested.
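
The abstract names a single scan, bounded memory, and a synopsis data structure as the central design constraints. As a rough illustration only, and not taken from the chapter itself, the sketch below uses an additive cluster feature summary (count, per-dimension linear sum, per-dimension squared sum) in the spirit of BIRCH-style methods. The names `ClusterFeature`, `summarize`, `max_radius`, and `max_clusters` are hypothetical, and the absorb-into-nearest policy is an assumption chosen to keep the example short.

```python
import math
from typing import Iterable, List, Sequence


class ClusterFeature:
    """Hypothetical synopsis: constant-size summary of all points absorbed so far."""

    def __init__(self, dim: int) -> None:
        self.n = 0                 # number of points absorbed
        self.ls = [0.0] * dim      # per-dimension linear sum
        self.ss = [0.0] * dim      # per-dimension sum of squares

    def insert(self, x: Sequence[float]) -> None:
        # Additive update: the raw point can be discarded afterwards.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


def summarize(stream: Iterable[Sequence[float]], dim: int,
              max_radius: float, max_clusters: int) -> List[ClusterFeature]:
    """Single pass over the stream; memory is bounded by max_clusters summaries."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters: List[ClusterFeature] = []
    for x in stream:
        nearest = min(clusters,
                      key=lambda c: math.dist(c.centroid(), x),
                      default=None)
        if nearest is not None and math.dist(nearest.centroid(), x) <= max_radius:
            nearest.insert(x)
        elif len(clusters) < max_clusters:
            cf = ClusterFeature(dim)
            cf.insert(x)
            clusters.append(cf)
        else:
            # Assumed policy: once the budget is exhausted, absorb into the nearest summary.
            nearest.insert(x)
    return clusters


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
result = summarize(points, dim=2, max_radius=1.0, max_clusters=2)
for cf in result:
    print(cf.n, cf.centroid(), cf.radius())
```

Each point is read once and folded into a fixed-size summary, so memory depends on `max_clusters` and the dimensionality rather than on the stream length. Capturing data evolution, the third challenge the abstract lists, is not handled here.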

Links and resources

BibTeX key: KaurBhatnagarChakravarthy15p105
entry type: incollection
address: Cham
booktitle: Big Data in Complex Systems
year: 2015
pages: 105--145
publisher: Springer
series: Studies in Big Data
volume: 9
crossref: HassanienTaherAzarEtAl2015
file: SpringerLink:2015/KaurBhatnagarChakravarthy15p105.pdf:PDF
groups: public
intrahash: e32933e5a8afacaa77d99f65c4531234
DOI: 10.1007/978-3-319-11056-1_4
timestamp: 2015.03.04
username: flint63

Cite this publication

%0 Book Section
%1 KaurBhatnagarChakravarthy15p105
%A Kaur, Sharanjit
%A Bhatnagar, Vasudha
%A Chakravarthy, Sharma
%B Big Data in Complex Systems
%C Cham
%D 2015
%E Hassanien, Aboul Ella
%E Taher Azar, Ahmad
%E Snasael, Vaclav
%E Kacprzyk, Janusz
%E Abawajy, Jemal H.
%I Springer
%K 01624 springer paper ai adaptive data pattern recognition analysis algorithm zzz.big
%P 105--145
%R 10.1007/978-3-319-11056-1_4
%T Stream Clustering Algorithms: A Primer
%V 9
%X Stream data has become ubiquitous due to advances in acquisition technology and pervades numerous applications. These massive data, gathered as a continuous flow, are often accompanied by a dire need for real-time processing. One aspect of data streams deals with storage management and the processing of continuous queries for aggregation. Another significant aspect pertains to the discovery and understanding of hidden patterns to derive actionable knowledge using mining approaches. This chapter focuses on stream clustering and presents a primer of clustering algorithms in the data stream environment. Clustering of data streams has gained importance because of its ability to capture natural structures from unlabeled, non-stationary data. A single scan of the data, bounded memory usage, and capturing data evolution are the key challenges in clustering streaming data. We elaborate on and compare the algorithms on the basis of these constraints. We also propose a taxonomy of algorithms based on the fundamental approaches used for clustering. For each approach, a systematic description of contemporary, well-known algorithms is presented. We place special emphasis on the synopsis data structure used for consolidating characteristics of streaming data and feature it as an important issue in the design of a stream clustering algorithm. We argue that a number of functional and operational characteristics of a clustering algorithm (e.g., quality of clustering, handling of outliers, number of parameters) are influenced by the choice of synopsis. A summary of clustering features that are supported by different algorithms is given. Finally, research directions for improving the usability of stream clustering algorithms are suggested.

@incollection{KaurBhatnagarChakravarthy15p105,
  abstract = {Stream data has become ubiquitous due to advances in acquisition technology and pervades numerous applications. These massive data, gathered as a continuous flow, are often accompanied by a dire need for real-time processing. One aspect of data streams deals with storage management and the processing of continuous queries for aggregation. Another significant aspect pertains to the discovery and understanding of hidden patterns to derive actionable knowledge using mining approaches. This chapter focuses on stream clustering and presents a primer of clustering algorithms in the data stream environment. Clustering of data streams has gained importance because of its ability to capture natural structures from unlabeled, non-stationary data. A single scan of the data, bounded memory usage, and capturing data evolution are the key challenges in clustering streaming data. We elaborate on and compare the algorithms on the basis of these constraints. We also propose a taxonomy of algorithms based on the fundamental approaches used for clustering. For each approach, a systematic description of contemporary, well-known algorithms is presented. We place special emphasis on the synopsis data structure used for consolidating characteristics of streaming data and feature it as an important issue in the design of a stream clustering algorithm. We argue that a number of functional and operational characteristics of a clustering algorithm (e.g., quality of clustering, handling of outliers, number of parameters) are influenced by the choice of synopsis. A summary of clustering features that are supported by different algorithms is given. Finally, research directions for improving the usability of stream clustering algorithms are suggested.},
  added-at = {2016-06-09T17:12:20.000+0200},
  address = {Cham},
  author = {Kaur, Sharanjit and Bhatnagar, Vasudha and Chakravarthy, Sharma},
  biburl = {https://www.bibsonomy.org/bibtex/2e32933e5a8afacaa77d99f65c4531234/flint63},
  booktitle = {Big Data in Complex Systems},
  crossref = {HassanienTaherAzarEtAl2015},
  doi = {10.1007/978-3-319-11056-1_4},
  editor = {Hassanien, Aboul Ella and Taher Azar, Ahmad and Snasael, Vaclav and Kacprzyk, Janusz and Abawajy, Jemal H.},
  file = {SpringerLink:2015/KaurBhatnagarChakravarthy15p105.pdf:PDF},
  groups = {public},
  interhash = {800bb05bcb3dfaa7e9f84f52db525d6c},
  intrahash = {e32933e5a8afacaa77d99f65c4531234},
  keywords = {01624 springer paper ai adaptive data pattern recognition analysis algorithm zzz.big},
  pages = {105--145},
  publisher = {Springer},
  series = {Studies in Big Data},
  timestamp = {2017-07-13T17:34:37.000+0200},
  title = {Stream Clustering Algorithms: A Primer},
  username = {flint63},
  volume = 9,
  year = 2015
}
