Abstract

This deliverable describes guidelines for publishing multilingual linguistic data as linked data. The guidelines cover the appropriate use of existing vocabularies, the naming of resources, dereferencing resources, encoding textual content, interlinking resources, language identification, and licensing resources. The deliverable also captures developing best practices gathered as a byproduct of mapping metadata in existing language resource repositories into linked data, and it provides detailed guidelines for mapping major classes of lexical resources and dictionaries into linked data, using the lemon lexical-semantic vocabulary and the NLP Interchange Format (NIF) as a common base. The work presented here is the result of widespread consultation and engagement with the relevant stakeholder communities. This engagement includes the active gathering of requirements and use cases; direct engagement with the communities operating the existing linguistic resource metadata repositories; and ongoing opportunities to influence the development of technical best practice and linked data vocabulary recommendations through the W3C community groups active in this area. This document therefore provides only a snapshot of many ongoing activities, and the reader is encouraged to engage with these directly through the links provided. This document extends the previous version, D2.1.1, with updated versions of the guidelines reported there and with new reference cards developed during the second phase of the project.
