Abstract
Compression models are a promising approach to a variety of
classification tasks and have been widely used across many research fields. We
adapt compression models to the field of authorship verification (AV), a branch
of digital text forensics. The task in AV is to verify whether a questioned
document and a reference document by a known author were written by the same
person. We propose an intrinsic AV method that yields results competitive with
a number of current state-of-the-art approaches based on support vector
machines or neural networks. However, in contrast to these approaches, our method does
not make use of machine learning algorithms, natural language processing
techniques, feature engineering, hyperparameter optimization or external
documents (a common strategy to transform AV from a one-class to a multi-class
classification problem). Instead, our method has only three key components:
a compression algorithm, a dissimilarity measure, and a threshold used to
accept or reject the authorship of the questioned document. Owing to its
compactness, our method runs very fast and can be reimplemented with
minimal effort. In addition, the method can handle difficult AV cases in which
the questioned and the reference document are unrelated to each other
in terms of topic or genre. We evaluated our approach on publicly
available datasets that were used in three international AV competitions.
Furthermore, we constructed our own corpora, on which we compared our method
with state-of-the-art approaches. In both cases, we achieved promising
results.
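The three components named in the abstract (a compression algorithm, a dissimilarity measure, and a threshold) can be illustrated with a minimal sketch. The abstract does not say which compressor or measure the authors use, so this example makes its own choices: `zlib` from the Python standard library as the compressor and the normalized compression distance (NCD) as the dissimilarity measure. The function names and the default threshold of 0.5 are hypothetical. In practice the threshold would have to be calibrated on a training corpus.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Values near 0 suggest similar texts; values near 1 suggest dissimilar ones.
    NCD is an assumed stand-in for the paper's dissimilarity measure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def verify(questioned: str, reference: str, threshold: float = 0.5) -> bool:
    """Accept the authorship if the dissimilarity falls below the threshold.

    The default threshold is a hypothetical placeholder, not a tuned value.
    """
    return ncd(questioned, reference) < threshold
```

A call such as `verify(questioned_text, reference_text)` returns `True` when the two texts compress well together relative to how they compress alone, which this sketch treats as evidence of shared authorship.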
Description
Authorship Verification based on Compression-Models