CiteXplore combines literature search with text mining tools for biology.
Search results are cross-referenced to EBI applications based on publication identifiers.
Links to full text versions are provided where available.
20 Newsgroups
Abstract
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz A subset composed of 100 articles from each newsgroup. (1.9M; 6.2M uncompressed)
The nonsense which follows is a Markov Chain based upon patterns in some pieces of English text. Word-Unit Nonsense uses patterns about words that tend to follow one another. Character-Unit Nonsense uses letters.
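The word-unit and character-unit variants described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the bookmarked page: the function names (`build_chain`, `generate`, `word_nonsense`, `char_nonsense`) and the `order` and `seed` parameters are assumptions introduced here.

```python
import random
from collections import defaultdict


def build_chain(units, order=1):
    """Map each run of `order` consecutive units to the units seen right after it."""
    chain = defaultdict(list)
    for i in range(len(units) - order):
        state = tuple(units[i:i + order])
        chain[state].append(units[i + order])
    return chain


def generate(chain, length, rng):
    """Random walk over the chain; stops early at a state with no recorded follower."""
    if not chain:
        return []
    state = rng.choice(sorted(chain))  # sorted so a fixed seed gives a fixed start
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break
        nxt = rng.choice(followers)  # repeated followers are proportionally more likely
        out.append(nxt)
        state = state[1:] + (nxt,)
    return out


def word_nonsense(text, length=20, order=1, seed=None):
    """Word-unit nonsense: the units are whitespace-separated words."""
    rng = random.Random(seed)
    return " ".join(generate(build_chain(text.split(), order), length, rng))


def char_nonsense(text, length=80, order=3, seed=None):
    """Character-unit nonsense: the units are single characters."""
    rng = random.Random(seed)
    return "".join(generate(build_chain(list(text), order), length, rng))
```

Because followers are stored with repetition, a unit that often follows a given state in the source text is picked correspondingly often. A larger `order` tends to give output that stays closer to the source.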
DadaDodo is a program that analyses texts for word probabilities, and then generates random sentences based on that. Sometimes these sentences are nonsense; but sometimes they cut right through to the heart of the matter, and reveal hidden meanings.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
SecondString is intended primarily for researchers in information integration and other scientists. It does or will include a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
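As a rough illustration of approximate string matching, the sketch below computes Levenshtein edit distance and uses it to pick the closest candidate. It is written in Python and does not reproduce SecondString's Java API; the function names (`levenshtein`, `similarity`, `best_match`) and the length-based normalisation are assumptions made for this example, and edit distance is only one of many possible methods.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical (normalisation is an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest similarity to the query."""
    return max(candidates, key=lambda c: similarity(query, c))
```

For instance, `best_match("Carnegie Melon", ["Carnegie Mellon", "Cornell"])` returns `"Carnegie Mellon"`, since the two strings differ by a single inserted character. The comparison is quadratic in string length and is made once per candidate, so a sketch like this is only practical for small candidate lists.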
TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
markdown-mode is a major mode for editing Markdown-formatted text files in GNU Emacs. markdown-mode is free software, licensed under the GNU GPL.
S. Jänicke, T. Efer, M. Büchler, and G. Scheuermann. Computer Vision, Imaging and Computer Graphics - Theory and Applications, pp. 153-171. Cham, Springer International Publishing, (2015)
F. Arnold, and R. Jäschke. Proceedings of the Workshop on Natural Language Processing for Digital Humanities at ICON 2021, pp. 55-63. NLP Association of India, (2021)
A. Nenkova, and R. Passonneau. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pp. 145-152. Boston, Massachusetts, USA, Association for Computational Linguistics, (2004)
Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480-1489. San Diego, California, Association for Computational Linguistics, (June 2016)
P. Moreira, Y. Bizzoni, K. Nielbo, I. Lassen, and M. Thomsen. Proceedings of the 5th Workshop on Narrative Understanding, pp. 25-35. Toronto, Canada, Association for Computational Linguistics, (July 2023)