Article

Measuring Information Reduction caused by Anonymization Methods in NEPS Scientific Use Files

NEPS Working Paper (2016)

Abstract

The National Educational Panel Study (NEPS) is a very large panel study that collects data from six Starting Cohorts in different age ranges. NEPS is designed primarily as an infrastructural service that provides the collected information to researchers as Scientific Use Files (SUFs). These SUFs are disseminated via three access modes: researchers may work with the data Onsite at our facility, use our remote access technology RemoteNEPS, or Download the data from our website to their local workstation. This strategy protects more sensitive information through more secure access modes, so each access mode offers its own SUF version containing a different amount of information. To this end, the data provided for Download and Remote usage are modified to reduce the information they contain to a more anonymous level (e.g., by topcoding some variables) appropriate for the respective access mode. In this paper, we try to measure those differences by quantifying the information lost between versions. We follow three approaches: (1) counting the number of variables affected by anonymization, (2) evaluating the applied methods with a heuristic approach, and (3) measuring the difference in the empirical data. Taking the Onsite SUF versions as 100% (i.e., the full information is accessible there), we find that on average between 74% and 87% of the information is preserved in the Download versions and more than 97% in the Remote versions.
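To make approach (1) concrete, the following is a minimal sketch of how a variable-count measure might be computed. It is not the paper's implementation: the toy data, the variable names, the topcoding cap, and the helper functions `topcode` and `share_of_unaffected_variables` are all assumptions introduced for illustration.

```python
def topcode(values, cap):
    """Replace every value above `cap` with `cap` (topcoding)."""
    return [min(v, cap) for v in values]


def share_of_unaffected_variables(onsite, restricted):
    """Fraction of Onsite variables that appear unchanged in the restricted version.

    A variable counts as affected if it is missing from the restricted
    version or if any of its values differ.
    """
    unchanged = sum(
        1 for name, values in onsite.items()
        if restricted.get(name) == values
    )
    return unchanged / len(onsite)


# Hypothetical toy data: three variables, four respondents.
onsite = {
    "age": [23, 35, 47, 61],
    "income": [1800, 2500, 9000, 3200],
    "municipality": ["A", "B", "C", "D"],
}

# Hypothetical Download version: income topcoded at an assumed cap of 5000,
# municipality withheld entirely.
download = {
    "age": onsite["age"],
    "income": topcode(onsite["income"], cap=5000),
}

preserved = share_of_unaffected_variables(onsite, download)
print(f"{preserved:.0%} of variables unaffected")
```

A simple count like this treats every affected variable as fully lost, regardless of how strongly it was modified. That is presumably why the abstract pairs it with a heuristic evaluation of the methods and a comparison of the empirical data.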
