Abstract
The National Educational Panel Study (NEPS) is a very large panel study that collects data from six Starting Cohorts covering different age ranges. NEPS is designed primarily as an infrastructural service that provides the collected information to researchers as Scientific Use Files (SUFs). These SUFs are disseminated via three access modes: researchers may work with the data Onsite at our facility, use our remote access technology RemoteNEPS, or Download the data from our website to their local workstation. Under this strategy, more sensitive information is protected by more secure access modes, so each access mode offers its own SUF version containing a different amount of information. To achieve this, the data provided for Download and Remote use are modified to reduce the information they contain to a more anonymous level appropriate for that access mode (e.g., by top-coding some variables). In this paper, we measure these differences by quantifying the information lost when the SUF versions are compared. We follow three approaches: (1) counting the number of variables affected by anonymization, (2) evaluating the applied anonymization methods with a heuristic approach, and (3) measuring the differences in the empirical data. Taking the Onsite SUF versions as the 100% reference (i.e., the full information is accessible there), on average between 74% and 87% of the information is preserved in the Download versions and more than 97% in the Remote versions.
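The abstract does not define the measures precisely. The following is therefore only a toy sketch of the general idea: top-coding a variable, then expressing the preserved information relative to the Onsite version as 100%. The variable names, thresholds, and data are hypothetical. `share_unaffected_variables` follows the spirit of approach (1). `share_unchanged_values` is an assumed, simplified stand-in for approach (3) and is not the measure used in the paper.

```python
# Hypothetical toy data and thresholds; nothing here is taken from the NEPS SUFs.

def top_code(values, threshold):
    """Top-coding: replace every value above `threshold` with `threshold`."""
    return [min(v, threshold) for v in values]

def share_unaffected_variables(onsite, reduced):
    """Approach (1)-style measure: share of variables left unchanged,
    with the Onsite version counted as 100%."""
    unchanged = sum(1 for name in onsite if reduced.get(name) == onsite[name])
    return unchanged / len(onsite)

def share_unchanged_values(onsite, reduced):
    """Assumed stand-in for approach (3): per variable, the share of
    observations whose value is unchanged, averaged over all variables."""
    per_variable = [
        sum(a == b for a, b in zip(onsite[name], reduced[name])) / len(onsite[name])
        for name in onsite
    ]
    return sum(per_variable) / len(per_variable)

onsite = {
    "age": [23, 35, 41, 67, 88],
    "income": [1200, 2500, 4100, 9800, 15000],
    "household_size": [1, 2, 3, 4, 6],
}

# A hypothetical "Download" version: two variables are top-coded.
download = dict(onsite)
download["income"] = top_code(onsite["income"], 8000)
download["household_size"] = top_code(onsite["household_size"], 5)

print(f"{share_unaffected_variables(onsite, download):.0%}")  # 33%
print(f"{share_unchanged_values(onsite, download):.0%}")      # 80%
```

The two numbers show why different approaches can yield different percentages for the same SUF version. Counting whole variables treats a top-coded variable as lost. The value-level view, by contrast, still credits the observations below the threshold.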