N-Gram-Based Text Categorization

Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175. Las Vegas, US (1994)

Abstract

Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We...
