@c.schoech

Improving Burrows’ Delta – An empirical evaluation of text distance measures

Book of Abstracts of the Digital Humanities Conference 2015, ADHO, UWS (2015)

Abstract

Since John Burrows first proposed Delta as a new stylometric measure (Burrows 2002), it has become one of the most robust distance measures for authorship attribution (Juola 2006, Stamatatos 2009, Koppel, Schler and Argamon 2009). It has been shown to yield very useful results across different text genres (Hoover 2004a) and languages (Eder and Rybicki 2013). Today, Delta is widely used, not least because of the free stylo package for R (Eder, Kestemont and Rybicki 2013). Several improvements to Delta have been proposed (Hoover 2004b, Argamon 2008, Eder, Kestemont and Rybicki 2013, Smith and Aldridge 2011). In the following, we report on a series of experiments that test these proposals using collections of novels in three languages. Our results show that one of Hoover’s and one of Argamon’s measures perform well, but are in general outperformed by Burrows’ Delta and by Eder’s Delta. The modification of Delta proposed by Smith and Aldridge, on the other hand, yields a remarkable improvement in all three languages and has the advantage of providing a stable increase in performance up to a specific point, unlike the other measures, which are very sensitive to the number of most frequent words (mfw) used. These results also allow us to discuss some of the theoretical assumptions behind the success of these measures, even if we are still far from providing a “compelling theoretical justification” (Hoover 2005) for it.
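For readers unfamiliar with the measures compared here, the following is a minimal sketch, not the authors' implementation. It assumes the common textbook formulation: Burrows' Delta as the mean absolute difference between z-scored relative frequencies of the most frequent words, and the Smith and Aldridge variant as the cosine distance between the same z-score vectors. All function names are hypothetical. Selecting the mfw from pooled counts and using the population standard deviation are simplifying assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(corpus, n):
    """Return the n most frequent words, pooled over all texts (assumption)."""
    counts = Counter()
    for tokens in corpus.values():
        counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


def z_scores(corpus, vocabulary):
    """Map each text name to its z-scored relative frequencies over vocabulary."""
    freqs = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        freqs[name] = [counts[w] / len(tokens) for w in vocabulary]
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    # Population standard deviation is an assumption of this sketch.
    stds = [pstdev(col) for col in columns]
    return {
        name: [(f - m) / s if s > 0 else 0.0
               for f, m, s in zip(row, means, stds)]
        for name, row in freqs.items()
    }


def burrows_delta(z1, z2):
    """Mean absolute difference of z-scores (Manhattan distance / n)."""
    return sum(abs(a - b) for a, b in zip(z1, z2)) / len(z1)


def cosine_delta(z1, z2):
    """Cosine distance between z-score vectors (1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm = math.sqrt(sum(a * a for a in z1)) * math.sqrt(sum(b * b for b in z2))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


# Toy usage: two tiny "texts" and the 2 most frequent words.
corpus = {"a": ["the", "cat", "the"], "b": ["the", "dog", "dog"]}
vocab = most_frequent_words(corpus, 2)
z = z_scores(corpus, vocab)
print(burrows_delta(z["a"], z["b"]), cosine_delta(z["a"], z["b"]))
```

In practice the number of most frequent words would be in the hundreds or thousands, and the sensitivity to that parameter is exactly what the experiments described above examine.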
