P. Schaer and M. Neumann. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings, volume 10456 of Lecture Notes in Computer Science (2017)
Abstract
Extending TREC-style test collections by incorporating external resources is
a time-consuming and challenging task. Making use of freely available web data
requires technical skills to work with APIs or to create a web scraping program
specifically tailored to the task at hand. We present a light-weight
alternative that employs the web data extraction language OXPath to harvest
data to be added to an existing test collection from web resources. We
demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with
additional metadata fields harvested via OXPath from the social sciences portal
Sowiport. This allows the re-use of this collection for other evaluation
purposes like bibliometrics-enhanced retrieval. The demonstrated method can be
applied to a variety of similar scenarios and is not limited to extending
existing collections but can also be used to create completely new ones with
little effort.
%0 Conference Paper
%1 schaer2017enriching
%A Schaer, Philipp
%A Neumann, Mandy
%B Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings
%D 2017
%E Jones, Gareth J. F.
%E Lawless, Séamus
%E Gonzalo, Julio
%E Kelly, Liadh
%E Goeuriot, Lorraine
%E Mandl, Thomas
%E Cappellato, Linda
%E Ferro, Nicola
%K myown neumann schaer sh2
%T Enriching Existing Test Collections with OXPath
%U http://arxiv.org/abs/1706.06836
%V 10456
%X Extending TREC-style test collections by incorporating external resources is
a time-consuming and challenging task. Making use of freely available web data
requires technical skills to work with APIs or to create a web scraping program
specifically tailored to the task at hand. We present a light-weight
alternative that employs the web data extraction language OXPath to harvest
data to be added to an existing test collection from web resources. We
demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with
additional metadata fields harvested via OXPath from the social sciences portal
Sowiport. This allows the re-use of this collection for other evaluation
purposes like bibliometrics-enhanced retrieval. The demonstrated method can be
applied to a variety of similar scenarios and is not limited to extending
existing collections but can also be used to create completely new ones with
little effort.
@inproceedings{schaer2017enriching,
abstract = {Extending TREC-style test collections by incorporating external resources is
a time-consuming and challenging task. Making use of freely available web data
requires technical skills to work with APIs or to create a web scraping program
specifically tailored to the task at hand. We present a light-weight
alternative that employs the web data extraction language OXPath to harvest
data to be added to an existing test collection from web resources. We
demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with
additional metadata fields harvested via OXPath from the social sciences portal
Sowiport. This allows the re-use of this collection for other evaluation
purposes like bibliometrics-enhanced retrieval. The demonstrated method can be
applied to a variety of similar scenarios and is not limited to extending
existing collections but can also be used to create completely new ones with
little effort.},
added-at = {2017-06-22T09:40:04.000+0200},
author = {Schaer, Philipp and Neumann, Mandy},
biburl = {https://www.bibsonomy.org/bibtex/2db9281e7acfb71e289757a01bb25fdbe/schaer},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings},
description = {Enriching Existing Test Collections with OXPath},
editor = {Jones, Gareth J. F. and Lawless, Séamus and Gonzalo, Julio and Kelly, Liadh and Goeuriot, Lorraine and Mandl, Thomas and Cappellato, Linda and Ferro, Nicola},
interhash = {0a94df0a4052a5ab566d3394d36a74ff},
intrahash = {db9281e7acfb71e289757a01bb25fdbe},
keywords = {myown neumann schaer sh2},
pdf = {https://arxiv.org/pdf/1706.06836.pdf},
series = {Lecture Notes in Computer Science},
timestamp = {2018-08-13T16:14:09.000+0200},
title = {Enriching Existing Test Collections with OXPath},
url = {http://arxiv.org/abs/1706.06836},
volume = 10456,
year = 2017
}