Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs
S. Baluja, D. Ravichandran, and D. Sivakumar. Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009), INSTICC, 6-8 October 2009.
Abstract
One of the fundamental assumptions of machine-learning-based text classification systems is that the underlying distribution from which the labeled text is drawn is identical to the distribution from which the text to be labeled is drawn. However, on live news aggregation sites, this assumption rarely holds. Instead, the events and topics discussed in news stories change dramatically over time. Rather than ignoring this phenomenon, we attempt to explicitly model the transitions of news stories and classifications over time to label stories that may be acquired months after the initial examples are labeled. We test our system, based on efficiently propagating labels in time-based graphs, with recently published news stories collected over an eighty-day period. Experiments presented in this paper range from using training labels for every story gathered within the first several days to using a single labeled story.
Description
CiteULike: Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs
%0 Conference Paper
%1 Baluja09
%A Baluja, Shumeet
%A Ravichandran, Deepak
%A Sivakumar, D.
%B Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009)
%D 2009
%K hashing label-propagation machine-learning
%T Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs
%X One of the fundamental assumptions of machine-learning-based text classification systems is that the underlying distribution from which the labeled text is drawn is identical to the distribution from which the text to be labeled is drawn. However, on live news aggregation sites, this assumption rarely holds. Instead, the events and topics discussed in news stories change dramatically over time. Rather than ignoring this phenomenon, we attempt to explicitly model the transitions of news stories and classifications over time to label stories that may be acquired months after the initial examples are labeled. We test our system, based on efficiently propagating labels in time-based graphs, with recently published news stories collected over an eighty-day period. Experiments presented in this paper range from using training labels for every story gathered within the first several days to using a single labeled story.
@inproceedings{Baluja09,
abstract = {One of the fundamental assumptions of machine-learning-based text classification systems is that the underlying distribution from which the labeled text is drawn is identical to the distribution from which the text to be labeled is drawn. However, on live news aggregation sites, this assumption rarely holds. Instead, the events and topics discussed in news stories change dramatically over time. Rather than ignoring this phenomenon, we attempt to explicitly model the transitions of news stories and classifications over time to label stories that may be acquired months after the initial examples are labeled. We test our system, based on efficiently propagating labels in time-based graphs, with recently published news stories collected over an eighty-day period. Experiments presented in this paper range from using training labels for every story gathered within the first several days to using a single labeled story.},
added-at = {2011-05-17T21:14:34.000+0200},
author = {Baluja, Shumeet and Ravichandran, Deepak and Sivakumar, D.},
biburl = {https://www.bibsonomy.org/bibtex/2b8c355086dd09a91893c7a643e9796f4/gromgull},
booktitle = {Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009)},
citeulike-article-id = {5973472},
day = {6-8},
description = {CiteULike: Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs},
interhash = {f3ea7281afdc41ffeec46f9367a241e6},
intrahash = {b8c355086dd09a91893c7a643e9796f4},
keywords = {hashing label-propagation machine-learning},
location = {Madeira, Portugal},
month = oct,
organization = {INSTICC},
posted-at = {2009-10-20 10:46:40},
priority = {0},
timestamp = {2011-05-17T21:14:34.000+0200},
title = {Text Classification Through Time: Efficient Label Propagation in {Time-Based} Graphs},
year = 2009
}