Extremely fast text feature extraction for classification and indexing

G. Forman and E. Kirshenbaum. CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management, pages 1221--1230. New York, NY, USA, ACM, (2008)
DOI: http://doi.acm.org/10.1145/1458082.1458243

Abstract

Most research in speeding up text mining involves algorithmic improvements to induction algorithms, and yet for many large scale applications, such as classifying or indexing large document repositories, the time spent extracting word features from texts can itself greatly exceed the initial training time. This paper describes a fast method for text feature extraction that folds together Unicode conversion, forced lowercasing, word boundary detection, and string hash computation. We show empirically that our integer hash features result in classifiers with equivalent statistical performance to those built using string word features, but require far less computation and less memory.
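The abstract describes folding lowercasing, word-boundary detection and hash computation into a single scan, so that no intermediate word strings are built. The Python sketch below illustrates that idea under our own assumptions and is not the authors' implementation, which the abstract does not specify. The assumptions: word characters are those for which `str.isalnum()` is true, the hash is 32-bit FNV-1a over the UTF-8 bytes of each lowercased character, and `hashed_words`, `feature_counts` and `n_buckets` are hypothetical names. Unicode decoding is left to Python's `str` type here rather than being fused into the loop.

```python
from collections import Counter
from typing import Iterator

# 32-bit FNV-1a constants (assumed hash choice, not taken from the paper)
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193
MASK32 = 0xFFFFFFFF


def hashed_words(text: str) -> Iterator[int]:
    """Yield one 32-bit hash per word of `text`, in a single pass.

    Lowercasing, boundary detection and hashing happen while scanning,
    so no per-word string objects are created.
    """
    h = FNV_OFFSET
    in_word = False
    for ch in text:
        if ch.isalnum():
            # Fold case, then feed the character's UTF-8 bytes into FNV-1a.
            for b in ch.lower().encode("utf-8"):
                h ^= b
                h = (h * FNV_PRIME) & MASK32
            in_word = True
        elif in_word:
            # A non-word character ends the current word: emit its hash.
            yield h
            h = FNV_OFFSET
            in_word = False
    if in_word:
        # Emit the final word if the text ends inside one.
        yield h


def feature_counts(text: str, n_buckets: int = 1 << 20) -> Counter:
    """Map word hashes into a fixed number of buckets and count them
    (hypothetical helper; n_buckets is an assumed parameter)."""
    return Counter(h % n_buckets for h in hashed_words(text))
```

For example, `list(hashed_words("Hello, hello HELLO"))` yields the same integer three times, because case is folded before hashing; those integers can then serve directly as feature identifiers in place of the word strings.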

Links and resources

BibTeX key: Forman08fastExtraction
entry type: inproceedings
address: New York, NY, USA
booktitle: CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management
year: 2008
pages: 1221--1230
publisher: ACM
location: Napa Valley, California, USA
isbn: 978-1-59593-991-3
DOI: http://doi.acm.org/10.1145/1458082.1458243
url: http://portal.acm.org/citation.cfm?id=1458243