Developing AI Tools For A Writing Assistant: Automatic Detection of dt-mistakes In Dutch
W. Mercelis. International Journal of Computational Linguistics (IJCL) 12, 9-23 (June 2021)
This paper describes a lightweight, scalable model that predicts whether a Dutch verb ends in -d, -t or -dt. Confusing these three endings is a common Dutch spelling mistake. If the predicted ending differs from the ending written by the author, the system flags a dt-mistake. The paper explores several data sources for this classification task, such as the Europarl Corpus, the Dutch Parallel Corpus and a Dutch Wikipedia corpus. Different architectures are tested for training the model, with a focus on transfer learning with ULMFiT. The trained model predicts the correct ending with 99.4% accuracy, which is comparable to current state-of-the-art performance. Adjustments to the training data and the use of other part-of-speech taggers may further improve this performance. As discussed in the paper, the main advantages of the approach are its short training time and the potential to apply the same technique to other disambiguation tasks in Dutch or in other languages.
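The comparison step described in the abstract (predict an ending, compare it with the written one, flag a mismatch) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `split_ending`, `check_verb` and the `predict_ending` callback are hypothetical, the trained ULMFiT classifier is replaced by a caller-supplied function, and the sketch assumes the verb's position in the sentence has already been identified (for example by a part-of-speech tagger).

```python
from typing import Callable, List, Optional, Tuple

# Check "dt" before "t" so that "wordt" is not split as "word" + "t".
ENDINGS = ("dt", "d", "t")

# Hypothetical predictor type: given the tokens and the verb's index,
# return one of "d", "t" or "dt". Stands in for the trained classifier.
Predictor = Callable[[List[str], int], str]


def split_ending(verb: str) -> Optional[Tuple[str, str]]:
    """Split a verb into (stem, ending) if it ends in -dt, -d or -t."""
    lower = verb.lower()
    for ending in ENDINGS:
        if lower.endswith(ending) and len(lower) > len(ending):
            return verb[: -len(ending)], ending
    return None


def check_verb(tokens: List[str], index: int,
               predict_ending: Predictor) -> Optional[str]:
    """Return a suggested spelling if the written ending differs from
    the predicted one, or None if there is nothing to flag."""
    parts = split_ending(tokens[index])
    if parts is None:
        return None  # token does not end in -d, -t or -dt
    stem, written = parts
    predicted = predict_ending(tokens, index)
    if predicted != written:
        return stem + predicted  # dt-mistake: propose the predicted form
    return None
```

With a stand-in predictor that always answers "dt", calling `check_verb(["Hij", "word", "boos"], 1, lambda toks, i: "dt")` yields the suggestion `"wordt"`, while the correctly spelled `["Hij", "wordt", "boos"]` yields `None`.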