D2.1.1 - Guidelines and best practices for Linguistic Linked Data-based content analytics - Phase I - v2.0

(October 2014)

Abstract

This deliverable presents guidelines for publishing multilingual data as linked data. It covers the appropriate use of existing vocabularies, the naming and dereferencing of resources, the encoding of textual content, the interlinking of resources, and language identification. It also captures emerging best practices drawn from mapping the meta-data of existing language resource repositories into linked data, and it gives detailed guidelines for mapping major classes of lexical resources and dictionaries into linked data, using the LEMON lexical-semantic vocabulary and the NLP Interchange Format (NIF) as a common base. As a platform for the further development and application of best practice, it critically compares existing linguistic meta-data repositories to indicate a path towards meta-data harmonisation between these major resources. The work is the result of widespread consultation and engagement with the relevant stakeholder communities: the active gathering of requirements and use cases, direct engagement with the communities operating the existing linguistic resource meta-data repositories, and ongoing opportunities to influence the development of technical best practice and linked data vocabulary recommendations through the W3C community groups active in this area. This document therefore provides only a snapshot of many ongoing activities, and readers are encouraged to engage with these directly through the links provided.
