copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Gauging Similarity with n-Grams: Language-Independent Categorization of Text

M. Damashek. Science, 267 (5199): 843--848 (February 1995)
DOI: 10.1126/science.267.5199.843

Abstract

A language-independent means of gauging topical similarity in unrestricted text is described. The method combines information derived from n-grams (consecutive sequences of n characters) with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents. No prior information about document content or language is required. Context, as it applies to document similarity, can be accommodated by a well-defined procedure. When an existing document is used as an exemplar, the completeness and accuracy with which topically related documents are retrieved is comparable to that of the best existing systems. The results of a formal evaluation are discussed, and examples are given using documents in English and Japanese.

@danielt's tags highlighted

Cite this publication

search on

Meta data

Last update 16 years ago
Created 16 years ago

Comments and Reviews
(0)

There is no review or comment yet. You can write one!

BibSonomy

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Gauging Similarity with n-Grams: Language-Independent Categorization of Text

Abstract

Links and resources

Tags

community

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews
(0)

BibSonomy

copydeleteadd this publication to your clipboardcommunity posthistory of this postURLDOIBibTeXEndNoteAPAChicagoDIN 1505HarvardMSOffice XML Gauging Similarity with n-Grams: Language-Independent Categorization of Text

Abstract

Links and resources

Tags

community

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews (0)

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Gauging Similarity with n-Grams: Language-Independent Categorization of Text

Comments and Reviews
(0)