A maximum entropy approach to identifying sentence boundaries
J. Reynar and A. Ratnaparkhi. Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 16--19. Stroudsburg, PA, USA, Association for Computational Linguistics, (1997)
DOI: 10.3115/974557.974561
Abstract
We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
%0 Conference Paper
%1 reynar1997
%A Reynar, Jeffrey C.
%A Ratnaparkhi, Adwait
%B Proceedings of the Fifth Conference on Applied Natural Language Processing
%C Stroudsburg, PA, USA
%D 1997
%I Association for Computational Linguistics
%K boundary detection disambiguation entropy maximum me sentence splitting
%P 16--19
%R 10.3115/974557.974561
%T A maximum entropy approach to identifying sentence boundaries
%U http://dx.doi.org/10.3115/974557.974561
%X We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
@inproceedings{reynar1997,
abstract = {We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.},
acmid = {974561},
added-at = {2012-10-26T17:00:29.000+0200},
address = {Stroudsburg, PA, USA},
author = {Reynar, Jeffrey C. and Ratnaparkhi, Adwait},
biburl = {https://www.bibsonomy.org/bibtex/259366d9e5d332886b5440da7d3b383ce/jil},
booktitle = {Proceedings of the Fifth Conference on Applied Natural Language Processing},
description = {A maximum entropy approach to identifying sentence boundaries},
doi = {10.3115/974557.974561},
interhash = {63e93410bfe518da3cbb3fe114ea85d2},
intrahash = {59366d9e5d332886b5440da7d3b383ce},
keywords = {boundary detection disambiguation entropy maximum me sentence splitting},
location = {Washington, DC},
numpages = {4},
pages = {16--19},
publisher = {Association for Computational Linguistics},
series = {ANLC '97},
timestamp = {2013-11-23T20:11:51.000+0100},
title = {A maximum entropy approach to identifying sentence boundaries},
url = {http://dx.doi.org/10.3115/974557.974561},
year = 1997
}