Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text
W. Wong, W. Liu, and M. Bennamoun. AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics, pages 83--89. Darlinghurst, Australia, Australian Computer Society, Inc., (2006)
Abstract
An increasing number of language and speech applications are gearing towards the use of texts from online sources as input. Despite such rise, not much work can be found in the aspect of integrated approaches for cleaning dirty texts from online sources. This paper presents a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). The idea of ISSAC was first conceived as part of the text preprocessing phase in an ontology engineering project. Evaluations of ISSAC using 400 chat records reveal an improved accuracy of 96.5% over the existing 74.4% based on the use of Aspell only.
%0 Conference Paper
%1 Wong2006Integrated
%A Wong, Wilson
%A Liu, Wei
%A Bennamoun, Mohammed
%B AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics
%C Darlinghurst, Australia
%D 2006
%I Australian Computer Society, Inc.
%K badtext
%P 83--89
%T Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text
%U http://portal.acm.org/citation.cfm?id=1273808.1273820
%X An increasing number of language and speech applications are gearing towards the use of texts from online sources as input. Despite such rise, not much work can be found in the aspect of integrated approaches for cleaning dirty texts from online sources. This paper presents a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). The idea of ISSAC was first conceived as part of the text preprocessing phase in an ontology engineering project. Evaluations of ISSAC using 400 chat records reveal an improved accuracy of 96.5% over the existing 74.4% based on the use of Aspell only.
%@ 1-920682-41-4
@inproceedings{Wong2006Integrated,
abstract = {An increasing number of language and speech applications are gearing towards the use of texts from online sources as input. Despite such rise, not much work can be found in the aspect of integrated approaches for cleaning dirty texts from online sources. This paper presents a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). The idea of ISSAC was first conceived as part of the text preprocessing phase in an ontology engineering project. Evaluations of ISSAC using 400 chat records reveal an improved accuracy of 96.5\% over the existing 74.4\% based on the use of Aspell only.},
added-at = {2008-12-09T03:00:06.000+0100},
address = {Darlinghurst, Australia},
author = {Wong, Wilson and Liu, Wei and Bennamoun, Mohammed},
biburl = {https://www.bibsonomy.org/bibtex/23898571bb45edbc2690bcc663da239d3/jamesh},
booktitle = {AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics},
citeulike-article-id = {3749344},
interhash = {345cc80c639b4e16096e757c7bb64be2},
intrahash = {3898571bb45edbc2690bcc663da239d3},
isbn = {1-920682-41-4},
keywords = {badtext},
location = {Sydney, Australia},
pages = {83--89},
posted-at = {2008-12-05 02:49:17},
priority = {2},
publisher = {Australian Computer Society, Inc.},
timestamp = {2008-12-09T09:59:02.000+0100},
title = {Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text},
url = {http://portal.acm.org/citation.cfm?id=1273808.1273820},
year = 2006
}