@wdees

Data Quality of Digital Process Data

, and . KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 74 (1): 407--430 (Jun 1, 2022)
DOI: 10.1007/s11577-022-00840-9

Abstract

Digital process data are becoming increasingly important for social science research, but their quality has been gravely neglected so far. In this article, we adopt a process perspective and argue that data extracted from socio-technical systems are, in principle, subject to the same error-inducing mechanisms as traditional forms of social science data, namely biases that arise before their acquisition (observational design), during their acquisition (data generation), and after their acquisition (data processing). As the lack of access and insight into the actual processes of data production renders key traditional mechanisms of quality assurance largely impossible, it is essential to identify data quality problems in the data available---that is, to focus on the possibilities post-hoc quality assessment offers to us. We advance a post-hoc strategy of data quality assurance, integrating simulation and explorative identification techniques. As a use case, we illustrate this approach with the example of bot activity and the effects this phenomenon can have on digital process data. First, we employ agent-based modelling to simulate datasets containing these data problems. Subsequently, we demonstrate the possibilities and challenges of post-hoc control by mobilizing geometric data analysis, an exemplary technique for identifying data quality issues.

Description

Data Quality of Digital Process Data | SpringerLink

Links and resources

Tags