
Towards Extracting Event-Centric Collections from Web Archives

G. Gossen, T. Risse, and E. Demidova. International Journal on Digital Libraries, (2018). Accepted for publication in October 2018, to appear.
DOI: 10.1007/s00799-018-0258-6

Abstract

Web archives constitute an increasingly important source of information for computer scientists, humanities researchers and journalists interested in studying past events. However, currently there are no access methods that help Web archive users to efficiently access event-centric information in large-scale archives that go beyond the retrieval of individual disconnected documents. In this article, we tackle the novel problem of extracting interlinked event-centric document collections from large-scale Web archives to facilitate an efficient and intuitive access to information regarding past events. We address this problem by: 1) facilitating users to define event-centric document collections in an intuitive way through a Collection Specification; 2) development of a specialised extraction method that adapts focused crawling techniques to the Web archive settings; and 3) definition of a function to judge the relevance of the archived documents with respect to the Collection Specification taking into account the topical and temporal relevance of the documents. Our extended experiments on the German Web archive (covering a time period of 19 years) demonstrate that our method enables efficient extraction of event-centric collections for different event types.
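The abstract mentions a relevance function that combines topical and temporal relevance with respect to a Collection Specification, but this record does not give its definition. The following is only an illustrative sketch under stated assumptions, not the authors' method: the `CollectionSpec` fields, the keyword-overlap topical score, the exponential decay outside the event interval, and the multiplicative combination are all hypothetical choices made for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import date


@dataclass
class CollectionSpec:
    """Hypothetical stand-in for a Collection Specification."""
    keywords: set[str]         # assumed: lower-case terms describing the event
    start: date                # assumed: first day of the event
    end: date                  # assumed: last day of the event
    decay_days: float = 30.0   # assumed: decay scale outside the event interval


def topical_relevance(text: str, spec: CollectionSpec) -> float:
    """Fraction of the specification's keywords that occur in the text."""
    if not spec.keywords:
        return 0.0
    tokens = set(text.lower().split())
    return len(tokens & spec.keywords) / len(spec.keywords)


def temporal_relevance(captured: date, spec: CollectionSpec) -> float:
    """1.0 inside the event interval, exponential decay with distance outside it."""
    if spec.start <= captured <= spec.end:
        return 1.0
    if captured < spec.start:
        gap = (spec.start - captured).days
    else:
        gap = (captured - spec.end).days
    return math.exp(-gap / spec.decay_days)


def relevance(text: str, captured: date, spec: CollectionSpec) -> float:
    """Assumed combination: product of the topical and temporal scores."""
    return topical_relevance(text, spec) * temporal_relevance(captured, spec)


spec = CollectionSpec(
    keywords={"election", "bundestag"},
    start=date(2013, 9, 1),
    end=date(2013, 9, 30),
)
score = relevance("Bundestag election results", date(2013, 9, 22), spec)
```

With a product, a document scores high only if it is both on topic and captured close to the event; a weighted sum would be an equally plausible assumption if partial matches should still be collected.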

Links and resources

BibTeX key: gossen2018towards
entry type: article
year: 2018
journal: International Journal on Digital Libraries
issn: 1432-5012
DOI: 10.1007/s00799-018-0258-6
note: accepted for publication in October 2018, to appear.

Cite this publication

@article{gossen2018towards,
  abstract = {Web archives constitute an increasingly important source of information for computer scientists, humanities researchers and journalists interested in studying past events. However, currently there are no access methods that help Web archive users to efficiently access event-centric information in large-scale archives that go beyond the retrieval of individual disconnected documents. In this article, we tackle the novel problem of extracting interlinked event-centric document collections from large-scale Web archives to facilitate an efficient and intuitive access to information regarding past events. We address this problem by: 1) facilitating users to define event-centric document collections in an intuitive way through a Collection Specification; 2) development of a specialised extraction method that adapts focused crawling techniques to the Web archive settings; and 3) definition of a function to judge the relevance of the archived documents with respect to the Collection Specification taking into account the topical and temporal relevance of the documents. Our extended experiments on the German Web archive (covering a time period of 19 years) demonstrate that our method enables efficient extraction of event-centric collections for different event types.},
  added-at = {2018-10-15T16:45:54.000+0200},
  author = {Gossen, Gerhard and Risse, Thomas and Demidova, Elena},
  biburl = {https://www.bibsonomy.org/bibtex/29fb73bfb6fa302e5731a5ea6c546109e/demidova},
  doi = {10.1007/s00799-018-0258-6},
  interhash = {b759ff84a3c23e8a5041b69698cd821b},
  intrahash = {9fb73bfb6fa302e5731a5ea6c546109e},
  issn = {1432-5012},
  journal = {International Journal on Digital Libraries},
  keywords = {alexandria cleopatra data4urbanmobility gossen myown},
  note = {accepted for publication in October 2018, to appear.},
  timestamp = {2018-10-23T09:16:08.000+0200},
  title = {Towards Extracting Event-Centric Collections from Web Archives},
  year = 2018
}
