Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text

AusDM '06: Proceedings of the fifth Australasian conference on Data mining and analytics, pages 83-89. Darlinghurst, Australia, Australian Computer Society, Inc., (2006)

Abstract

An increasing number of language and speech applications are gearing towards the use of texts from online sources as input. Despite this rise, little work can be found on integrated approaches for cleaning dirty texts from online sources. This paper presents a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). The idea of ISSAC was first conceived as part of the text preprocessing phase in an ontology engineering project. Evaluations of ISSAC using 400 chat records reveal an improved accuracy of 96.5%, compared with the 74.4% obtained using Aspell alone.
