Chomsky bot written in Ruby. A funny little thing which generates random paragraphs of text from a set of sentence building blocks. It combines four kinds of phrases (introduction phrases, subject phrases, verb phrases and object phrases) into a sentence. The sentences this simple construction can create are amazing: the output is syntactically correct and "hovers on the edge on understandability".
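The four-part construction described above can be sketched in a few lines. This is a minimal illustration in Python rather than the bot's Ruby source; the phrase pools and the names `make_sentence` and `make_paragraph` are hypothetical placeholders, not taken from the original program.

```python
import random

# Hypothetical phrase pools, invented for illustration only.
INTRODUCTIONS = ["In this sense,", "By contrast,", "For one thing,"]
SUBJECTS = ["the proposed rule ordering", "this kind of lexical entry", "the underlying structure"]
VERBS = ["does not affect", "is compatible with", "cannot be derived from"]
OBJECTS = ["the analysis given earlier.", "a general theory of syntax.", "the examples in question."]


def make_sentence(rng):
    """Pick one phrase from each pool, in fixed order, and join them."""
    pools = (INTRODUCTIONS, SUBJECTS, VERBS, OBJECTS)
    return " ".join(rng.choice(pool) for pool in pools)


def make_paragraph(n_sentences=4, seed=None):
    """Build a paragraph of independently generated sentences."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    return " ".join(make_sentence(rng) for _ in range(n_sentences))


print(make_paragraph(n_sentences=3, seed=0))
```

Because every pool entry is grammatical in its slot, any combination yields a well-formed sentence, which is why such a simple scheme produces text that reads as plausible.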
Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1]. It was primarily developed for language guessing, a task on which it is known to perform with near-pe
S. Blodgett, S. Barocas, H. Daumé III, and H. Wallach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454-5476. Online, Association for Computational Linguistics, (July 2020)
S. Bordia and S. Bowman. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 7-15. Minneapolis, Minnesota, Association for Computational Linguistics, (June 2019)
P. Xia, S. Wu, and B. Van Durme. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7516-7533. Association for Computational Linguistics, (November 2020)
M. Schwab, R. Jäschke, and F. Fischer. Proceedings of the 6th International Conference on Natural Language and Speech Processing, pages 99-109. Association for Computational Linguistics, (2023)