Inbook,

Applied Large-Scale Data Edition

pages 649–667 (2016)
DOI: 10.1007/978-3-658-11994-2_36

Abstract

The dissemination of a huge collection of empirical data within the complex study framework of the National Educational Panel Study (NEPS) makes the collaborative and systematic preparation of the data indispensable. Both building up a collaborative infrastructure and committing all coworkers to principles that guide the data-preparation process are therefore crucial. In addition to leaving reported data unchanged and organizing the editing process in intermediate steps, the core principle is replicability, which is achieved via a completely syntax-based procedure using Stata®. The syntax elements of all collaborators are systematically linked to each other so that, in the last run, one press of a button generates all the scientific use data. This approach has two major advantages: It forces the staff to extensively document the process in order to make it comprehensible both at later points in time and for colleagues and reviewers. In addition, it facilitates the writing of generalized syntax, which can be reused across multiple editing projects. These guiding principles are supported by a technical framework to carry out data editing collaboratively. We came to organize the collaborative infrastructure by methods originating from software-development environments. The most important part of the infrastructure is a distributed version-control program, which enables us to keep track of any changes in syntax files. The writing of generalized syntax has resulted in an exhaustive library of additional Stata® subroutines for data editing. Due to their generality, these subroutines are shared with the scientific community to a large extent, providing data managers worldwide with convenient tools for their work in several fields of application. Furthermore, we pursue a strategy to involve all NEPS researchers in quality control. 
This is achieved by releasing early versions (comparable to “milestones”) that allow all other NEPS members to evaluate the results of data editing quickly while the process is still under way. An important advantage of this approach is that many researchers carefully examine the data before their final release to the scientific community, which greatly enhances data quality.
