Researchers at Google annotated English-language Web pages from the ClueWeb09 and ClueWeb12 corpora. The annotation process was automatic and therefore imperfect. Even so, the annotations are generally of high quality, because the annotators aimed for high precision (and, by necessity, accepted lower recall). For each entity recognized with high confidence, the data provide the beginning and end byte offsets of the entity mention in the input text, its Freebase identifier (mid), and two confidence scores that are computed in different ways.
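Because the mention positions are byte offsets, a consumer should slice the raw page bytes rather than the decoded string. In UTF-8 a character such as "é" takes two bytes, so byte and character positions drift apart. The sketch below is a minimal illustration under stated assumptions. The record type `EntityAnnotation`, its field names, and the exclusive end offset are hypothetical and do not describe the released file layout. The mid value in the example is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    """One annotated mention with the fields described above.

    Hypothetical record: field names are illustrative and the end offset
    is assumed to be exclusive; check the actual release before relying on it.
    """
    begin: int      # byte offset where the mention starts
    end: int        # byte offset just past the mention (assumed exclusive)
    mid: str        # Freebase identifier, e.g. "/m/..."
    conf_a: float   # first confidence score
    conf_b: float   # second confidence score, computed differently


def mention_text(doc: bytes, ann: EntityAnnotation, encoding: str = "utf-8") -> str:
    """Recover the surface string by slicing raw bytes, then decoding."""
    return doc[ann.begin:ann.end].decode(encoding, errors="replace")


def confident(anns, min_conf: float):
    """Keep annotations whose first confidence score meets a caller-chosen threshold."""
    return [a for a in anns if a.conf_a >= min_conf]


if __name__ == "__main__":
    page = "Café owners in Paris met Barack Obama.".encode("utf-8")
    start = page.index(b"Barack Obama")  # byte offset, one more than the character offset
    ann = EntityAnnotation(start, start + len(b"Barack Obama"), "/m/placeholder", 0.99, 0.42)
    print(mention_text(page, ann))
```

Slicing the decoded string with the same offsets would be off by one here, because "é" precedes the mention.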
You might consider using this data in conjunction with the recently released Freebase annotations of several TREC query sets.
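The Freebase mid is the natural join key between query annotations and document annotations. The sketch below ranks documents by how many of their entity mentions refer to entities that also appear in the query. This simple overlap heuristic is chosen only for illustration and is not a method from either data release. The document ids and mids are placeholders.

```python
def mid_overlap_score(query_mids, doc_mids):
    """Count document mentions whose mid also occurs among the query's mids."""
    wanted = set(query_mids)
    return sum(1 for m in doc_mids if m in wanted)


def rank_documents(query_mids, docs):
    """Rank documents by entity overlap with the query.

    docs maps a document id to the list of mids annotated in that document.
    Documents with no overlap are dropped; ties are broken by document id.
    """
    scored = [(mid_overlap_score(query_mids, mids), doc_id) for doc_id, mids in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


if __name__ == "__main__":
    docs = {"d1": ["/m/a", "/m/c"], "d2": ["/m/a", "/m/b", "/m/a"], "d3": ["/m/z"]}
    print(rank_documents(["/m/a", "/m/b"], docs))
```

Counting repeated mentions rewards documents that discuss the query entities at length. Counting distinct mids instead would reward breadth of coverage.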
Concept search, full-text search and annotation structure search in one scalable index: "Mímir is a multi-paradigm information management index and repository which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic meta-data (instance data). It allows queries that arbitrarily mix full-text, structural, linguistic and semantic queries and that can scale to gigabytes of text. A typical semantic annotation project deals with large quantities of data of different kinds. Mímir provides a framework for implementing indexing and search functionality across all these data types."
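Mixing full-text and structural queries means a term match can be constrained by the annotations that cover it. The toy index below stores term positions alongside typed annotation spans and answers "term X inside an annotation of type T". It is a hypothetical sketch of the idea only. It does not reproduce Mímir's API or query language, and its nested-loop join would not scale the way a real positional index does.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory index over token positions and typed annotation spans (hypothetical)."""
    postings: dict = field(default_factory=lambda: defaultdict(list))  # term -> [(doc, pos)]
    spans: dict = field(default_factory=lambda: defaultdict(list))     # type -> [(doc, start, end)]

    def add_document(self, doc_id, tokens, annotations):
        for pos, tok in enumerate(tokens):
            self.postings[tok.lower()].append((doc_id, pos))
        for ann_type, start, end in annotations:  # token positions, end exclusive
            self.spans[ann_type].append((doc_id, start, end))

    def term_docs(self, term):
        """Plain full-text query: documents containing the term anywhere."""
        return {doc_id for doc_id, _ in self.postings.get(term.lower(), [])}

    def term_in_annotation(self, term, ann_type):
        """Mixed query: documents where the term falls inside an annotation of ann_type."""
        hits = set()
        for doc_id, pos in self.postings.get(term.lower(), []):
            for d, start, end in self.spans.get(ann_type, []):
                if d == doc_id and start <= pos < end:
                    hits.add(doc_id)
        return hits


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add_document("d1", ["Paris", "Hilton", "visited", "London"],
                     [("Person", 0, 2), ("Location", 3, 4)])
    idx.add_document("d2", ["She", "flew", "to", "Paris"], [("Location", 3, 4)])
    print(idx.term_docs("paris"), idx.term_in_annotation("paris", "Person"))
```

A full-text search for "paris" matches both documents. The annotation constraint separates the person in d1 from the city in d2.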
thumbtack: collect, organize, share. Use thumbtack to collect a list of your favorite restaurants and share them with your friends; plan a trip by collecting information about places to stay and things to do; research your next purchase by storing, analyzing and sifting through your options in thumbtack; and take notes and share them with your team.
Unstructured Information Management (UIM) applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places and organizations, or relations, such as works-for or located-at.
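A UIM application is typically a pipeline in which each stage adds stand-off annotations (begin and end offsets plus a type) that later stages can read. The sketch below has two stages: a dictionary lookup for entities, then a works-for relation detector that inspects the text between two entity spans. It is a hypothetical illustration. The gazetteer and the fixed " works for " pattern are assumptions standing in for real recognizers, and the code does not use the Java API of any UIM framework.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    begin: int   # character offset where the mention starts
    end: int     # character offset just past the mention
    kind: str    # entity type
    text: str    # surface string


# Hypothetical gazetteer standing in for a trained entity recognizer.
GAZETTEER = {
    "Alice Smith": "Person",
    "Acme Corp": "Organization",
    "Boston": "Place",
}


def annotate_entities(text):
    """Stage 1: mark every gazetteer match as a typed span, sorted by position."""
    spans = []
    for surface, kind in GAZETTEER.items():
        for m in re.finditer(re.escape(surface), text):
            spans.append(Span(m.start(), m.end(), kind, surface))
    return sorted(spans)


def annotate_works_for(text, entities):
    """Stage 2: emit works-for when a Person is followed by ' works for ' and an Organization."""
    relations = []
    for p in entities:
        if p.kind != "Person":
            continue
        for o in entities:
            if o.kind == "Organization" and text[p.end:o.begin] == " works for ":
                relations.append((p.text, "works-for", o.text))
    return relations


if __name__ == "__main__":
    sentence = "Alice Smith works for Acme Corp in Boston."
    ents = annotate_entities(sentence)
    print(annotate_works_for(sentence, ents))
```

Keeping annotations as offsets rather than rewriting the text lets every stage see the original input unchanged.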
Basically, it's an RDF-based web annotation system.
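In an RDF-based annotation system, each annotation is itself a resource described by triples. Anything with a URI (an event, a page, a film) can therefore be its target, and queries become triple-pattern matches. The sketch below keeps triples in a plain list to avoid dependencies. It borrows property names in the style of the W3C Web Annotation vocabulary (`oa:hasTarget`, `oa:hasBody`) purely for illustration. Caboto's actual vocabulary may differ, and all `ex:` identifiers are hypothetical.

```python
def make_annotation(ann_id, target, body, creator):
    """Describe one annotation as RDF-style (subject, predicate, object) triples."""
    return [
        (ann_id, "rdf:type", "oa:Annotation"),
        (ann_id, "oa:hasTarget", target),   # the thing being annotated
        (ann_id, "oa:hasBody", body),       # the note or tag attached to it
        (ann_id, "dcterms:creator", creator),
    ]


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def annotations_on(store, target):
    """All annotation ids whose target is the given resource."""
    return [s for s, _, _ in match(store, p="oa:hasTarget", o=target)]


if __name__ == "__main__":
    store = make_annotation("ex:ann1", "ex:event42", "ex:note1", "ex:alice")
    store += make_annotation("ex:ann2", "ex:event42", "ex:note2", "ex:bob")
    print(annotations_on(store, "ex:event42"))
```

A production system would keep the triples in an RDF store and express the same patterns in SPARQL.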
Three JISC-funded projects need to let people annotate events and other resources. The projects are:
* Collaborative Research on the Web (CREW) - University of Bristol and University of Manchester
* Semantic Tools for Screen Arts Research (STARS) - University of Bristol
* Integration Project (CIP) - University of Bristol
The Caboto project was set up as a collaborative effort to meet the requirements of CREW, STARS and CIP.
The requirements from the JISC projects:
* CREW Events Requirements
* CIP Requirements
* STARS Requirements
The project is at an early stage, but it is possible to obtain and run it.
Wei Wu, Bin Zhang, and Mari Ostendorf. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 689-692. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010.