@gerhardgossen

iCrawl: An integrated focused crawling toolbox for Web Science

PhD seminar “Web Archiving and Archived Web — a new Research Method, a new Object of Study?”, Danish Digital Humanities Lab/NetLab (June 2014)

Abstract

Within the scientific community, an increasing interest in using Web content for research can be observed. The Social Web is especially attractive for many humanities disciplines, as it provides direct access to the thoughts of many people about politics, popular topics and events. Documenting activities on the Web and Social Web in Web archives facilitates a better understanding of public perception. However, state-of-the-art Web archive crawlers like Heritrix have significant limitations in terms of usability, functionality and maintenance with regard to the needs of the scientific community. The iCrawl project aims to provide an integrated crawling toolbox with an intuitive, flexible and extensible set of Web crawling components.
