A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data

M. Schneider, and J. Abowd. Journal of the Royal Statistical Society: Series A (Statistics in Society), (2015)
DOI: 10.1111/rssa.12100

Abstract

Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between protection of confidentiality and quality of inference. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The US Census Bureau collects millions of interrelated time series microdata that are hierarchical and contain many 0s and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian generalized linear mixed models with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that, as the prior distributions of the variance components in the Bayesian generalized linear mixed model become more precise towards zero, protection of confidentiality increases and the quality of inference deteriorates. We evaluate our methodology by using a strict privacy measure, empirical differential privacy and a newly defined risk measure, the probability of range identification, which directly measures attribute disclosure risk. We illustrate our results with the US Census Bureau's quarterly workforce indicators.
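The trade-off described in the abstract (tighter priors give more protection but poorer inference) can be illustrated with a much simpler stand-in model. The sketch below is not the authors' zero-inflated generalized linear mixed model; it assumes a conjugate Gamma-Poisson model for a single count, and the names `posterior_mean`, `synthesize`, `prior_mean`, and `prior_strength` are hypothetical. A larger `prior_strength` means a smaller prior variance, which pulls the posterior (and hence the synthetic draw) away from the confidential count and towards the prior mean.

```python
import math
import random

def posterior_params(count, prior_mean, prior_strength):
    """Gamma posterior (shape, rate) for a Poisson rate after one observed count.

    Assumed prior: Gamma(shape=prior_strength, rate=prior_strength / prior_mean),
    which has mean prior_mean and variance prior_mean**2 / prior_strength.
    """
    shape = prior_strength + count
    rate = prior_strength / prior_mean + 1.0
    return shape, rate

def posterior_mean(count, prior_mean, prior_strength):
    """Posterior mean of the Poisson rate under the assumed Gamma prior."""
    shape, rate = posterior_params(count, prior_mean, prior_strength)
    return shape / rate

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def synthesize(count, prior_mean, prior_strength, rng):
    """One synthetic count drawn from the posterior predictive distribution."""
    shape, rate = posterior_params(count, prior_mean, prior_strength)
    lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate expects a scale
    return draw_poisson(lam, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    true_counts = [0, 2, 50]  # made-up confidential counts, for illustration only
    for strength in (0.01, 1000.0):
        means = [round(posterior_mean(c, 5.0, strength), 2) for c in true_counts]
        synthetic = [synthesize(c, 5.0, strength, rng) for c in true_counts]
        print(strength, means, synthetic)
```

With a weak prior the posterior means stay close to the confidential counts, so the synthetic values are informative but revealing. With a strong prior they collapse towards the prior mean, which protects the counts at the cost of inference quality.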

Links and resources

BibTeX key: RSSA:RSSA12100
entry type: article
year: 2015
journal: Journal of the Royal Statistical Society: Series A (Statistics in Society)
pages: n/a--n/a
issn: 1467-985X
DOI: 10.1111/rssa.12100
url: http://dx.doi.org/10.1111/rssa.12100

Cite this publication

%0 Journal Article
%1 RSSA:RSSA12100
%A Schneider, Matthew J.
%A Abowd, John M.
%D 2015
%J Journal of the Royal Statistical Society: Series A (Statistics in Society)
%K Administrative Empirical Informative Statistical Synthetic Zero-inflated data, differential disclosure distributions, limitation, mixed models prior privacy,
%P n/a--n/a
%R 10.1111/rssa.12100
%T A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data
%U http://dx.doi.org/10.1111/rssa.12100

@article{RSSA:RSSA12100,
  added-at = {2016-09-30T21:18:21.000+0200},
  author = {Schneider, Matthew J. and Abowd, John M.},
  biburl = {https://www.bibsonomy.org/bibtex/23cf59153be68812ff7c34700e74bcb65/ncrn-cornell},
  doi = {10.1111/rssa.12100},
  interhash = {603d913c412cdc1394ba08893d75777c},
  intrahash = {3cf59153be68812ff7c34700e74bcb65},
  issn = {1467-985X},
  journal = {Journal of the Royal Statistical Society: Series A (Statistics in Society)},
  keywords = {Administrative Empirical Informative Statistical Synthetic Zero-inflated data, differential disclosure distributions, limitation, mixed models prior privacy,},
  pages = {n/a--n/a},
  timestamp = {2016-09-30T21:18:21.000+0200},
  title = {A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data},
  url = {http://dx.doi.org/10.1111/rssa.12100},
  year = 2015
}
