Terminology Evolution Module for Web Archives in the LiWA Context

, , , and . Proc. of the 10th International Web Archiving Workshop, held in conjunction with iPRES, Vienna, Austria, 2010.

Abstract

More and more national libraries and institutions are archiving the web as part of their cultural heritage. As with all long-term archives, these archives contain text and language that evolve over time. This is particularly true for web archives, as content published online is highly dynamic and changes rapidly. Language evolution causes gaps between the terminology used for querying and the terminology stored in long-term archives. To ensure the accessibility and interpretability of these archives, language evolution must be detected and handled automatically. In this paper we present the LiWA terminology evolution module, TeVo, which takes us one step closer to fully automatic detection of terminology evolution. TeVo is a pipeline, built on the UIMA framework, for finding terminology evolution in web archives. It consists of two main processing chains: the first for WARC file extraction and text processing, and the second for finding terminology evolution. We also present the terminology evolution browser, the TeVo browser, which aids in exploring the evolution of terms present in archives.
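The two-chain structure described above can be illustrated with a minimal Python sketch. This is a simplification under stated assumptions and is neither UIMA-based nor TeVo's detection method.

The following names and choices are hypothetical and introduced only for illustration:

- the record format `(warc_date, text)`;
- the functions `extraction_chain` and `evolution_chain`;
- the "term new since the previous year" signal.

Real WARC parsing (record headers, compression, HTTP payloads) is omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical record type: (WARC-Date string, extracted payload text).
# Actual WARC parsing is deliberately left out of this sketch.
Record = tuple[str, str]


def extraction_chain(records: list[Record]) -> dict[int, Counter]:
    """Chain 1 (sketch): bucket each record by year and count lowercase tokens."""
    by_year: dict[int, Counter] = defaultdict(Counter)
    for warc_date, text in records:
        year = int(warc_date[:4])  # WARC-Date is ISO 8601, e.g. 2008-03-01T10:00:00Z
        tokens = re.findall(r"[a-z]+", text.lower())
        by_year[year].update(tokens)
    return by_year


def evolution_chain(by_year: dict[int, Counter], min_count: int = 1) -> dict:
    """Chain 2 (toy stand-in, NOT TeVo's method): for each pair of consecutive
    years, list terms that occur in the later year but not in the earlier one."""
    years = sorted(by_year)
    changes = {}
    for earlier, later in zip(years, years[1:]):
        new_terms = {
            term
            for term, count in by_year[later].items()
            if count >= min_count and term not in by_year[earlier]
        }
        changes[(earlier, later)] = sorted(new_terms)
    return changes


records = [
    ("2008-03-01T10:00:00Z", "Flights to Bombay"),
    ("2009-06-12T08:30:00Z", "Flights to Mumbai"),
]
print(evolution_chain(extraction_chain(records)))
```

In this sketch, keeping extraction separate from analysis means the second chain works only on per-period term statistics. Note that the toy signal merely flags `mumbai` as new. Linking it to `bombay` across time is the kind of relation terminology evolution detection aims to find, and it is not attempted here.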
