Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1]. It was primarily developed for language guessing, a task on which it is known to perform with near-pe
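The Cavnar & Trenkle technique named above can be sketched roughly as follows: build a ranked profile of the most frequent character n-grams for each category, build the same kind of profile for the input text, and pick the category whose profile has the smallest "out-of-place" rank distance. This is a simplified illustration in Python, not libtextcat's API. The function names, the underscore padding, the whitespace tokenization, the 300-entry profile cutoff, and the toy training sentences are all assumptions made for the example.

```python
from collections import Counter

def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    ordered from most to least frequent (ties broken alphabetically)."""
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"  # mark word boundaries (assumed padding symbol)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences between two profiles; an n-gram missing
    from the category profile costs a fixed maximum penalty."""
    rank = {gram: r for r, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )

def classify(text, category_profiles):
    """Return the category name whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(category_profiles,
               key=lambda name: out_of_place(doc, category_profiles[name]))

# Toy training data (hypothetical; real profiles are built from much larger corpora).
samples = {
    "en": "the cat sat on the mat and the dog ran",
    "de": "der hund und die katze sitzen auf der matte",
}
profiles = {name: ngram_profile(text) for name, text in samples.items()}
print(classify("the dog sat on the mat", profiles))
```

With training text this small the result is only illustrative; the accuracy reported for the method depends on profiles built from substantial per-language samples.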
Chomsky bot written in Ruby. A funny little program that generates random paragraphs of text from a set of sentence building blocks. It combines four kinds of phrases (introduction phrases, subject phrases, verb phrases, and object phrases) into a sentence. The sentences this simple construction can create are amazing. They are syntactically correct and hover "on the edge of understandability".
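The four-part construction described above can be sketched in a few lines. The bot itself is written in Ruby; this is an independent Python illustration, and the phrase lists, function names, and the seeded random generator are all assumptions for the example, not taken from the bot's source.

```python
import random

# Hypothetical phrase tables; the bot's actual phrase lists are not given here.
INTRODUCTIONS = [
    "To characterize a linguistic level L,",
    "On the other hand,",
    "It must be emphasized, once again, that",
]
SUBJECTS = [
    "the notion of level of grammaticalness",
    "a descriptively adequate grammar",
    "the earlier discussion of deviance",
]
VERBS = [
    "is not subject to",
    "does not readily tolerate",
    "raises serious doubts about",
]
OBJECTS = [
    "a parasitic gap construction.",
    "the strong generative capacity of the theory.",
    "an abstract underlying order.",
]

def make_sentence(rng):
    """Pick one phrase of each kind and join them in a fixed order."""
    parts = [
        rng.choice(INTRODUCTIONS),
        rng.choice(SUBJECTS),
        rng.choice(VERBS),
        rng.choice(OBJECTS),
    ]
    return " ".join(parts)

def make_paragraph(n_sentences, rng=None):
    """Join n_sentences independently generated sentences into a paragraph."""
    if rng is None:
        rng = random.Random()
    return " ".join(make_sentence(rng) for _ in range(n_sentences))

print(make_paragraph(3, random.Random(0)))
```

Because every slot is filled from a list of phrases of the same grammatical kind, any combination yields a well-formed sentence, which is why such a small table can produce so many plausible-sounding outputs.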
M. Schwab, R. Jäschke, and F. Fischer. Proceedings of the 6th International Conference on Natural Language and Speech Processing, page 99--109. Association for Computational Linguistics, (2023)
P. Xia, S. Wu, and B. Van Durme. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), page 7516--7533. Association for Computational Linguistics, (November 2020)
S. Bordia and S. Bowman. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, page 7--15. Minneapolis, Minnesota, Association for Computational Linguistics, (June 2019)
S. Blodgett, S. Barocas, H. Daumé III, and H. Wallach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, page 5454--5476. Online, Association for Computational Linguistics, (July 2020)