iCrawl: An integrated focused crawling toolbox for Web Science
G. Gossen. PhD seminar “Web Archiving and Archived Web — a new Research Method, a new Object of Study?”, Danish Digital Humanities Lab/NetLab, (June 2014)
Abstract
Within the scientific community an increasing interest in using Web content for research can be observed. Especially the Social Web is attractive for many humanities disciplines as it provides direct access to thoughts of many people about politics, popular topics and events. Documenting the activities on the Web and Social Web in Web archives facilitates better understanding of the public perception. However, state-of-the-art Web archive crawlers like Heritrix have significant limitations in terms of usability, functionality and maintenance with regard to the needs of the scientific community. The iCrawl project aims to provide an integrated crawling toolbox with an intuitive, flexible and extensible set of Web crawling components.
%0 Conference Paper
%1 gossen2014phdSeminar
%A Gossen, Gerhard
%B PhD seminar “Web Archiving and Archived Web — a new Research Method, a new Object of Study?”
%D 2014
%E Brügger, Niels
%K alexandria crawling icrawl myown
%T iCrawl: An integrated focused crawling toolbox for Web Science
%U http://www.netlab.dk/courses/
%X Within the scientific community an increasing interest in using Web content for research can be observed. Especially the Social Web is attractive for many humanities disciplines as it provides direct access to thoughts of many people about politics, popular topics and events. Documenting the activities on the Web and Social Web in Web archives facilitates better understanding of the public perception. However, state-of-the-art Web archive crawlers like Heritrix have significant limitations in terms of usability, functionality and maintenance with regard to the needs of the scientific community. The iCrawl project aims to provide an integrated crawling toolbox with an intuitive, flexible and extensible set of Web crawling components.
@inproceedings{gossen2014phdSeminar,
abstract = {Within the scientific community an increasing interest in using Web content for research can be observed. Especially the Social Web is attractive for many humanities disciplines as it provides direct access to thoughts of many people about politics, popular topics and events. Documenting the activities on the Web and Social Web in Web archives facilitates better understanding of the public perception. However, state-of-the-art Web archive crawlers like Heritrix have significant limitations in terms of usability, functionality and maintenance with regard to the needs of the scientific community. The iCrawl project aims to provide an integrated crawling toolbox with an intuitive, flexible and extensible set of Web crawling components.},
added-at = {2014-11-28T10:27:58.000+0100},
author = {Gossen, Gerhard},
biburl = {https://www.bibsonomy.org/bibtex/2f93c804b6528306cf90555f7a4fc4515/gerhardgossen},
booktitle = {PhD seminar “Web Archiving and Archived Web — a new Research Method, a new Object of Study?”},
editor = {Brügger, Niels},
interhash = {5db54bde0364d9b192e4dc1b90b2f120},
intrahash = {f93c804b6528306cf90555f7a4fc4515},
keywords = {alexandria crawling icrawl myown},
month = jun,
organization = {Danish Digital Humanities Lab/NetLab},
timestamp = {2014-11-28T10:27:58.000+0100},
title = {iCrawl: An integrated focused crawling toolbox for Web Science},
url = {http://www.netlab.dk/courses/},
year = 2014
}