diego_ma's BibTeX entry:  

Orthographic Case Restoration Using Supervised Learning without Manual Annotation

International Journal on Artificial Intelligence Tools, 13(1): 141-156, 2004.
Authors: Cheng Niu and Wei Li and Jihong Ding and Rohini K. Srihari
URL: http://homepage.mac.com/liwei999/WeiLi/Publications.html
Tags: named_entities question_answering speech
Abstract: One challenge in text processing is the treatment of case-insensitive documents such as speech recognition results. The traditional approach is to re-train a language model excluding case-related features. This paper presents an alternative two-step approach whereby a preprocessing module (Step 1) is designed to restore case-sensitive form, which is subsequently processed by the original system (Step 2). Step 1 is mainly implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. It is demonstrated that this approach (i) outperforms the feature exclusion approach for named entity tagging, (ii) leads to limited degradation for parsing, relationship extraction and case-insensitive question answering, (iii) reduces system complexity, and (iv) has wide applicability: the restored text can be used in both statistical models and rule-based systems.
@article{Niu:2004,
title = {Orthographic Case Restoration Using Supervised Learning without Manual Annotation},
author = {Cheng Niu and Wei Li and Jihong Ding and Rohini K. Srihari},
journal = {International Journal on Artificial Intelligence Tools},
number = {1},
pages = {141--156},
url = {http://homepage.mac.com/liwei999/WeiLi/Publications.html},
volume = {13},
year = {2004},
abstract = {One challenge in text processing is the treatment of case-insensitive documents such as speech recognition results. The traditional approach is to re-train a language model excluding case-related features. This paper presents an alternative two-step approach whereby a preprocessing module (Step 1) is designed to restore case-sensitive form, which is subsequently processed by the original system (Step 2). Step 1 is mainly implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. It is demonstrated that this approach (i) outperforms the feature exclusion approach for named entity tagging, (ii) leads to limited degradation for parsing, relationship extraction and case-insensitive question answering, (iii) reduces system complexity, and (iv) has wide applicability: the restored text can be used in both statistical models and rule-based systems.},
keywords = {named_entities question_answering speech}
}