CiteXplore combines literature search with text mining tools for biology.
Search results are cross-referenced to EBI applications based on publication identifiers.
Links to full text versions are provided where available.
20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
The nonsense that follows is generated by a Markov chain built from patterns in some pieces of English text. Word-Unit Nonsense uses patterns of words that tend to follow one another. Character-Unit Nonsense uses patterns of letters.
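The word-unit and character-unit idea can be illustrated with a minimal Python sketch. It is not the code behind the linked page; the function names `build_chain` and `generate`, the sample sentence, and the chain orders are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def build_chain(units, order=1):
    """Map each run of `order` consecutive units to the units seen right after it."""
    chain = defaultdict(list)
    for i in range(len(units) - order):
        state = tuple(units[i:i + order])
        chain[state].append(units[i + order])
    return dict(chain)


def generate(chain, length, rng):
    """Walk the chain from a random state, stopping early at a dead end."""
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # this state was only ever seen at the end of the text
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return out


# Hypothetical training text; any English text would do.
text = "the cat sat on the mat and the dog sat on the log"
rng = random.Random(0)  # fixed seed so runs are repeatable

word_chain = build_chain(text.split())          # word-unit: states are single words
char_chain = build_chain(list(text), order=2)   # character-unit: states are letter pairs

word_nonsense = " ".join(generate(word_chain, 8, rng))
char_nonsense = "".join(generate(char_chain, 30, rng))
print(word_nonsense)
print(char_nonsense)
```

Repeated followers are stored as duplicates in each list, so a uniform random choice reproduces the observed frequencies without an explicit probability table.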
DadaDodo is a program that analyses texts for word probabilities and then generates random sentences based on them. Sometimes these sentences are nonsense, but sometimes they cut right through to the heart of the matter and reveal hidden meanings.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
SecondString is intended primarily for researchers in information integration and other scientists. It does or will include a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
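As a rough illustration of what approximate string matching means, here is a small Python sketch of one classic technique, Levenshtein edit distance. It does not use or reproduce the SecondString API (which is Java); the function names and the candidate list are assumptions made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate most similar to the query, ignoring case."""
    return max(candidates, key=lambda c: similarity(query.lower(), c.lower()))


# Hypothetical candidate names for a misspelled query.
names = ["Carnegie Mellon University", "Carnegie Mellon", "Cornell"]
print(levenshtein("kitten", "sitting"))
print(best_match("Carnegie Melon", names))
```

Comparing a query against every candidate takes time proportional to the product of the string lengths for each pair, which is workable for small test sets but not for very large ones.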
A. Maedche and S. Staab. ECAI-2000: Proceedings of the 13th European Conference on Artificial Intelligence, pp. 321--325. IOS Press, Amsterdam, (2000)
S. Bloehdorn and A. Hotho. Proceedings of the Workshop on Text-based Information Retrieval (TIR-04) at the 27th German Conference on Artificial Intelligence, (September 2004)
Y. Zhao and G. Karypis. CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 515--524. New York, NY, USA, ACM Press, (2002)
I. Dhillon, Y. Guan, and J. Kogan. 2nd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and its Applications), (2002)
F. Beil, M. Ester, and X. Xu. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 436--442. ACM Press, (2002)
S. Bloehdorn, P. Cimiano, A. Hotho, and S. Staab. LDV Forum - GLDV Journal for Computational Linguistics and Language Technology, 20 (1): 87--112 (May 2005)
S. Bloehdorn and A. Hotho. Proceedings of the MSW 2004 Workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 70--87. (August 2004)
A. Hotho, A. Maedche, and S. Staab. Proc. of the Workshop "Text Learning: Beyond Supervision" at IJCAI 2001. Seattle, WA, USA, August 6, 2001, (2001)
S. Bloehdorn and A. Hotho. Proceedings of the Fourth IEEE International Conference on Data Mining, pp. 331--334. IEEE Computer Society Press, (November 2004)
F. Beil, M. Ester, and X. Xu. KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 436--442. New York, NY, USA, ACM Press, (2002)
A. Hotho, S. Staab, and G. Stumme. Knowledge Discovery in Databases: PKDD 2003, 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 2838 of LNAI, pp. 217--228. Heidelberg, Springer, (2003)
A. Hotho, S. Staab, and G. Stumme. Proceedings of the 2003 IEEE International Conference on Data Mining, pp. 541--544 (Poster). Melbourne, Florida, IEEE Computer Society, (November 2003)
M. Sanderson and W. Croft. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'99, pp. 206--213. (1999)
S. Staab and A. Hotho. Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'03 Conference held in Zakopane, pp. 451--452. (2003)
A. Hotho, S. Staab, and G. Stumme. Proc. of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD, volume 2838 of LNCS, pp. 217--228. (2003)
B. Lauser and A. Hotho. Proc. of the 7th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2003, volume 2769 of LNCS, pp. 140--151. Springer, (2003)
P. Cimiano, A. Hotho, and S. Staab. Proceedings of the Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, ELRA - European Language Resources Association, (May 2004)
A. Hotho and G. Stumme. Proceedings of FGML Workshop, pp. 37--45. Special Interest Group of German Informatics Society (FGML, Fachgruppe Maschinelles Lernen der GI e.V.), (2002)
A. Hotho, A. Maedche, and S. Staab. ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 607--608. Washington, DC, USA, IEEE Computer Society, (2001)
L. Baker and A. McCallum. Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval, pp. 96--103. Melbourne, AU, ACM Press, New York, US, (1998)
G. Ifrim, M. Theobald, and G. Weikum. Proceedings of the 22nd International Conference on Machine Learning - Learning in Web Search (LWS 2005), pp. 18--26. Bonn, Germany, (2005)
E. Breck, Y. Choi, and C. Cardie. IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2683--2688. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2007)
C. Luo, Y. Li, and S. Chung. Data & Knowledge Engineering, 68 (11): 1271--1288 (2009). Including Special Section: Conference on Privacy in Statistical Databases (PSD 2008) - Six selected and extended papers on Database Privacy.
W. Cavnar and J. Trenkle. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161--175. Las Vegas, US, (1994)
B. Martins, H. Manguinhas, and J. Borbinha. Proceedings of the International Conference on Semantic Computing, pp. 1--9. IEEE Computer Society, (August 2008)
M. Hearst. Proceedings of the 14th Conference on Computational Linguistics, volume 2, pp. 539--545. Stroudsburg, PA, USA, Association for Computational Linguistics, (1992)
R. Nallapati, A. Ahmed, E. Xing, and W. Cohen. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 542--550. New York, NY, USA, ACM, (2008)
S. Dori-Hacohen and J. Allan. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 1845--1848. New York, NY, USA, ACM, (2013)
C. Kohlschütter, P. Fankhauser, and W. Nejdl. Proc. of the 3rd ACM International Conference on Web Search and Data Mining (WSDM 2010), New York City, NY, USA, (2010)
C. Au Yeung and A. Jatowt. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1231--1240. New York, NY, USA, ACM, (2011)
S. Mpouli and J. Ganascia. Proceedings of the Workshop on Resources and Methods for Semantic Processing of Digital Works/Texts, 126, pp. 21--24. Linköping University Electronic Press, Linköpings universitet, (July 2016)
X. Zhang and Y. LeCun. arXiv:1502.01710, (2015). Comment: This technical report is superseded by a paper entitled "Character-level Convolutional Networks for Text Classification", arXiv:1509.01626. It has considerably more experimental results and a rewritten introduction.