The NetarchiveSuite is a complete web archiving software package that has been under development since 2004. Its primary function is to plan, schedule and run web harvests of parts of the Internet. It scales to a wide range of tasks, from small, thematic harvests (e.g. related to special events or special domains) to harvesting and archiving the content of an entire national domain. The software has built-in bit preservation functionality. The system architecture allows the software to be distributed across several machines, possibly in more than one geographical location. The NetarchiveSuite is built around the Heritrix web crawler, which it uses to harvest the web.
Linguistic Inquiry and Word Count (LIWC) is a text analysis software program designed by James W. Pennebaker, Roger J. Booth, and Martha E. Francis. LIWC calculates the degree to which people use different categories of words across a wide array of texts, including emails, speeches, poems, or transcribed daily speech. With the click of a button, you can determine the degree to which any text uses positive or negative emotion words, self-references, causal words, and 70 other language dimensions.
A place for information technologists, archivists, engineers, librarians, computer scientists, curators, web developers and others to help each other make the best use of tools, techniques, processes, workflows, practices and approaches to ensuring long-term access to digital information. Broadly speaking, the goal of this site is to create an easy-to-use public knowledge base to support digital preservation. This is a joint project of the Open Planets Foundation and the National Digital Stewardship Alliance (NDSA) Innovation working group.
N. Gray, T. Carozzi, and G. Woan (2012). arXiv:1207.3923. Comment: Project final report, 45 pages; see http://purl.org/nxg/projects/mrd-gw for project details and http://purl.org/nxg/projects/mrd-gw/report for other document versions.