Natural Language Corpus Data: Beautiful Data
This directory contains code and data to accompany the chapter Natural Language Corpus Data from the book Beautiful Data (Segaran and Hammerbacher, 2009). If you like this you may also like: How to Write a Spelling Corrector.
The BioScope corpus consists of medical and biological texts annotated for negation, speculation and their linguistic scope. This was done to allow a comparison between the development of systems for negation/hedge detection and scope resolution. The corpus is publicly available for research purposes.
eTBLAST is a unique search engine for searching biomedical literature. it lets you input an entire paragraph and returns MEDLINE abstracts that are similar to it.
Wired Magazine issue 16.07. Data Deluge. Crop predictions. Quark. Data mining. tracking news. watching the skies, scanning skeletons. airfares. voting. epidemics. google events. terrorism. visualizing big data
TAPoR will build a unique human and computing infrastructure for text analysis across the country by establishing six regional centers to form one national text analysis research portal.