D3.2.2 - Roadmap for the use of linguistic linked data for content analytics – Phase II

(2015)

Abstract

Executive Summary

As data is being created at an ever-increasing pace, more and more organizations are seeking to exploit insights generated from that data to optimize processes, improve decision making, and modify existing business models or generate radically new ones. Data has, in fact, been regarded as the oil of the new economy. However, raw data per se is useless if we cannot analyse it automatically and extract key insights from it. The main challenges involved in processing data are: i) extracting key insights by performing semantic analysis on both structured and unstructured data in combination with existing semantic knowledge (i.e., knowledge bases and ontologies), ii) repurposing information so that it can be recombined, restructured and reused for a purpose other than the one originally conceived, and iii) closing the language gap by making information reusable across languages.

Linked Data is the key technology to address the three problems mentioned above, as it allows a wide range of semantic and language resources to be discovered and leveraged in analysing unstructured data. This results in more links between structured and unstructured data; these links themselves can be published as linked data and leveraged for further analysis.

Most data is still unstructured and thus remains inaccessible to machines. Techniques are needed for extracting meaning and key insights, and for aggregating and integrating them across documents. Without such techniques, the insights generated from data will be very limited. By linking unstructured and structured data to existing concepts and entities described in relevant domain ontologies and knowledge bases, semantic normalization can be achieved and data can be analysed at the level of the things it refers to rather than the strings it contains. Semantic analysis also supports repurposing bits of information into new pieces of information that can be reused for a purpose or audience other than the one originally intended.

Closing the language gap is a key challenge within the Digital Single Market (DSM), which is one of the key priorities of the European Commission. The DSM is expected to generate 415 billion EUR in additional growth. While the legal framework supporting the DSM is currently being developed by the European Commission, the benefits of the DSM will not come about if consumers cannot use product and product-related information in their own language. The DSM will fail if there is no technology that supports sellers in providing information about their products not only in multiple languages but, most importantly, in machine-readable form, so that machines can also understand this information and use it as a basis for services such as product comparison sites. While better translation services are an important ingredient in making the DSM happen in practice, they are by no means the most important one. Rather than sticking to textual descriptions of products, a key challenge will be to move to semantic descriptions of products that are linked to agreed-upon vocabularies and to semantic product catalogues that are localized into multiple languages and that vendors can link their product descriptions to. Relevant Semantic Web and Linked Data technologies and best practices will play a key role here. In order to support the implementation of the DSM and to leverage the expected benefits, an investment in research and innovation on the order of 3-4 billion EUR will be needed.

Supporting access to and reuse of public sector information is also a high-priority goal of the DSM. However, this will only be accomplished by machines that can analyse the data semantically, across data sets and languages, by aggregating pieces of relevant information to make overall sense of the entire data. Thus, data linking at the semantic level, also across languages, is key. This can be supported by a layer of linguistic linked data containing terminologies, dictionaries and vocabularies that are linked across languages and domains, such that public sector information and open data can unfold their full potential.

Without language-aware linked data technologies, the gains from content analytics will be very limited. Only data linking will provide the basis of the added value that everyone expects from big data. If we fail to develop a strong industry and intellectual environment that can deploy linked data technologies and boost the data economy, we will not only fail to generate valuable insights from data: Europe will also clearly fall behind global competitors in the US, who will then dominate the field of data analytics.
