Abstract
Communities of lesser-resourced languages like North Sámi benefit
from language tools such as spell checkers and grammar checkers to
improve literacy. Accurate error feedback depends on
well-tokenised input, but traditional tokenisation as shallow
preprocessing is inadequate to solve the challenges of real-world
language usage. We present an alternative where tokenisation
remains ambiguous until we have linguistic context information
available. This lets us accurately detect sentence boundaries,
multiwords and compound errors. We describe a North Sámi
grammar checker with such a tokenisation system, and show the
results of its evaluation.
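As a toy illustration of the idea (not the paper's implementation; all names and the multiword lexicon below are hypothetical), a tokeniser can keep every segmentation of a potential multiword as a live candidate and only commit once context information is available:

```python
# Toy sketch: tokenisation stays ambiguous until context disambiguates.
# The input sentence, the multiword lexicon and the context rule are
# invented for illustration only.

def candidate_tokenisations(words, multiwords):
    """Yield every segmentation, keeping multiword readings ambiguous."""
    if not words:
        yield []
        return
    # Reading 1: treat the first word as a single token.
    for rest in candidate_tokenisations(words[1:], multiwords):
        yield [words[0]] + rest
    # Reading 2: treat the first two words as one multiword token.
    if len(words) > 1 and (words[0], words[1]) in multiwords:
        for rest in candidate_tokenisations(words[2:], multiwords):
            yield [" ".join(words[:2])] + rest

def disambiguate(candidates, context_prefers_multiword):
    """Pick one reading once (simulated) context information is known."""
    for cand in candidates:
        has_multiword = any(" " in tok for tok in cand)
        if has_multiword == context_prefers_multiword:
            return cand
    return candidates[0]

words = ["we", "visited", "New", "York"]   # hypothetical input
multiwords = {("New", "York")}             # hypothetical multiword lexicon
cands = list(candidate_tokenisations(words, multiwords))
print(disambiguate(cands, context_prefers_multiword=True))
# → ['we', 'visited', 'New York']
```

The point of the sketch is the ordering: segmentation alternatives are enumerated first and the choice between them is deferred, rather than being fixed by shallow preprocessing before any linguistic analysis runs.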