Abstract
The archival of content such as publications or web pages is just the first step toward "full" content preservation. It must also be guaranteed that content can be found and interpreted in the long run. The correspondence between the terminology used for querying and the terminology used in the content objects to be retrieved is a crucial prerequisite for effective retrieval technology. However, as terminology evolves over time, a growing gap opens between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure "semantic" accessibility of archived content in the long run. The core of our approach is to derive mappings between terminologies originating from different times by the fusion of term concept graphs. To verify the suitability of our approach, we present first results of experiments conducted on The Times archive, which covers 200 years of documents. In addition, we discuss how our approach can be applied to web archives and the challenges that arise from this.