Developing AI Tools For A Writing Assistant: Automatic Detection of dt-mistakes In Dutch

W. Mercelis.
International Journal of Computational Linguistics (IJCL), 12 (2): 9-23 (June 2021)

Abstract

This paper describes a lightweight, scalable model that predicts whether a Dutch verb ends in -d, -t or -dt. The confusion of these three endings is a common Dutch spelling mistake. If the predicted ending is different from the ending as written by the author, the system will signal the dt-mistake. This paper explores various data sources to use in this classification task, such as the Europarl Corpus, the Dutch Parallel Corpus and a Dutch Wikipedia corpus. Different architectures are tested for the model training, focused on a transfer learning approach with ULMFiT. The trained model can predict the right ending with 99.4% accuracy, and this result is comparable to the current state-of-the-art performance. Adjustments to the training data and the use of other part-of-speech taggers may further improve this performance. As discussed in this paper, the main advantages of the approach are the short training time and the potential to use the same technique with other disambiguation tasks in Dutch or in other languages.
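The detection step described in the abstract (compare the predicted ending with the ending the author wrote, and flag a mismatch) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `written_ending`, `is_dt_mistake`, and `toy_predictor` are hypothetical, and the toy predictor is a hand-written stand-in for the trained ULMFiT classifier that only handles the verb "worden".

```python
from typing import Callable, Optional, Sequence

# "dt" is checked first so that a word ending in -dt is not read as ending in -t.
ENDINGS = ("dt", "d", "t")

# Assumed interface: a predictor takes the tokenized sentence and the index of
# the verb, and returns one of "d", "t" or "dt".
Predictor = Callable[[Sequence[str], int], str]


def written_ending(word: str) -> Optional[str]:
    """Return the -d, -t or -dt ending of a word, or None if it has none."""
    lower = word.lower()
    for ending in ENDINGS:
        if lower.endswith(ending):
            return ending
    return None


def is_dt_mistake(tokens: Sequence[str], verb_index: int,
                  predict_ending: Predictor) -> bool:
    """Flag the verb when the predicted ending differs from the written one."""
    actual = written_ending(tokens[verb_index])
    if actual is None:
        # The word does not end in -d, -t or -dt, so there is nothing to check.
        return False
    return predict_ending(tokens, verb_index) != actual


def toy_predictor(tokens: Sequence[str], verb_index: int) -> str:
    """Hypothetical stand-in for the trained model, valid only for 'worden'.

    Predicts "dt" when the subject "hij" directly precedes the verb,
    and "d" otherwise.
    """
    subject = tokens[verb_index - 1].lower() if verb_index > 0 else ""
    return "dt" if subject == "hij" else "d"


if __name__ == "__main__":
    for sentence in (["Hij", "word", "ziek"], ["Hij", "wordt", "ziek"]):
        flagged = is_dt_mistake(sentence, 1, toy_predictor)
        print(" ".join(sentence), "->", "dt-mistake" if flagged else "ok")
```

In a real system the predictor would be the trained classifier, and the verb positions would come from a part-of-speech tagger, as the abstract suggests.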

BibTeX key: mercelisdeveloping
entry type: article
year: 2021
month: June
journal: International Journal of Computational Linguistics (IJCL)
number: 2
pages: 9-23
volume: 12
language: English
issn: 2180-1266
url: https://www.cscjournals.org/library/manuscriptinfo.php?mc=IJCL-121
