This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
SecondString is intended primarily for researchers in information integration and other scientists. It includes, or is planned to include, a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
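As a rough illustration of what an approximate string-matching method computes, the sketch below implements Levenshtein edit distance, one classic technique of this kind. It is written in Python for brevity and is not SecondString's Java API; the function names `levenshtein` and `similarity` are hypothetical and chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

The dynamic-programming table is filled row by row, so the cost is proportional to the product of the two string lengths. That quadratic cost per pair is one reason pairwise methods of this kind can become expensive on very large data sets.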
DadaDodo is a program that analyses texts for word probabilities and then generates random sentences based on those probabilities. Sometimes these sentences are nonsense; but sometimes they cut right through to the heart of the matter, and reveal hidden meanings.
The nonsense which follows is generated by a Markov chain built from patterns in some pieces of English text. Word-Unit Nonsense uses patterns of words that tend to follow one another. Character-Unit Nonsense does the same with letters.
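A Markov-chain text generator of this general kind can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the implementation used by DadaDodo or by the nonsense generator described above: the helper names `build_chain` and `generate`, the `order` parameter, and the sample text are all hypothetical. The same two functions cover both modes, since the only difference is whether the input is split into words or into characters.

```python
import random
from collections import defaultdict


def build_chain(units, order=1):
    """Map each run of `order` consecutive units to the units seen after it."""
    chain = defaultdict(list)
    for i in range(len(units) - order):
        state = tuple(units[i:i + order])
        chain[state].append(units[i + order])  # repeats encode frequency
    return chain


def generate(chain, length, rng):
    """Start from a random state and append up to `length` further units."""
    state = rng.choice(sorted(chain))  # sorted so a seeded rng is repeatable
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        nxt = rng.choice(followers)
        output.append(nxt)
        state = state[1:] + (nxt,)  # slide the window forward by one unit
    return output


# Hypothetical sample text, used only for this demonstration.
text = "the cat sat on the mat and the cat ran to the other mat"
rng = random.Random(0)

word_chain = build_chain(text.split(), order=1)  # word-unit mode
char_chain = build_chain(list(text), order=2)    # character-unit mode

print(" ".join(generate(word_chain, 8, rng)))
print("".join(generate(char_chain, 40, rng)))
```

Storing every observed follower in a list, duplicates included, means a uniform random choice from that list already reproduces the observed transition frequencies, so no explicit probability table is needed. A larger `order` makes the output follow the source text more closely, while a smaller one produces looser nonsense.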
S. Mpouli, and J. Ganascia. Proceedings of the Workshop on Resources and Methods for Semantic Processing of Digital Works/Texts, 126, page 21-24. Linköping University Electronic Press, Linköpings universitet, (July 2016)
E. Breck, Y. Choi, and C. Cardie. IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, page 2683-2688. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2007)
A. Hotho, S. Staab, and G. Stumme. Knowledge Discovery in Databases: PKDD 2003, 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 2838 of LNAI, page 217-228. Heidelberg, Springer, (2003)