Bayesian Entropy Estimation for Countable Discrete Distributions
E. Archer, I. Park, and J. Pillow (2013). arXiv:1302.0328. 38 pages, LaTeX. Revised and resubmitted to JMLR.
Abstract
We consider the problem of estimating Shannon's entropy $H$ from discrete
data, in cases where the number of possible symbols is unknown or even
countably infinite. The Pitman-Yor process, a generalization of the Dirichlet
process, provides a tractable prior distribution over the space of countably
infinite discrete distributions, and has found major applications in Bayesian
non-parametric statistics and machine learning. Here we show that it also
provides a natural family of priors for Bayesian entropy estimation, because
moments of the induced posterior distribution over $H$ can be
computed analytically. We derive formulas for the posterior mean (Bayes' least
squares estimate) and variance under Dirichlet and Pitman-Yor process priors.
Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a
narrow prior distribution over $H$, meaning the prior strongly determines the
entropy estimate in the under-sampled regime. We derive a family of continuous
mixing measures such that the resulting mixture of Pitman-Yor processes
produces an approximately flat prior over $H$. We show that the resulting
Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of
distributions. We explore the theoretical properties of the resulting
estimator, and show that it performs well both in simulation and in application
to real data.
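
For orientation (this is not part of the abstract), the analytic tractability claimed above can be illustrated with one recalled identity: under a $PY(d, \alpha)$ prior the first size-biased stick is $Beta(1-d, \alpha+d)$-distributed, which gives the prior mean of the entropy in closed form,

$$E[H \mid d, \alpha] = \psi(\alpha + 1) - \psi(1 - d),$$

where $\psi$ is the digamma function; setting $d = 0$ recovers the Dirichlet-process case. Because this mean is a deterministic function of $(d, \alpha)$ and the prior variance around it is small, a fixed prior effectively dictates the estimate in the under-sampled regime, which is what motivates the flattening mixture described above.

A minimal Monte Carlo sketch of that identity via truncated stick-breaking follows; it assumes NumPy/SciPy, the function names are ours, and the truncation level is a convenience rather than anything from the paper:

import numpy as np
from scipy.special import digamma

def py_prior_entropy_mean(d, alpha):
    # Closed-form prior mean of H (in nats) under PY(d, alpha): psi(alpha+1) - psi(1-d)
    return digamma(alpha + 1.0) - digamma(1.0 - d)

def sample_py_entropy(d, alpha, n_sticks=20000, rng=None):
    # Draw one distribution from PY(d, alpha) by truncated stick-breaking:
    #   V_k ~ Beta(1 - d, alpha + k*d),  pi_k = V_k * prod_{j<k} (1 - V_j)
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_sticks + 1)
    v = rng.beta(1.0 - d, alpha + k * d)
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))  # plug-in entropy of the truncated weights

d, alpha = 0.1, 5.0
rng = np.random.default_rng(0)
mc = np.mean([sample_py_entropy(d, alpha, rng=rng) for _ in range(200)])
print(mc, py_prior_entropy_mean(d, alpha))  # the two should roughly agree, up to truncation bias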
@article{archer2013bayesian,
  author   = {Archer, Evan and Park, Il Memming and Pillow, Jonathan},
  title    = {Bayesian Entropy Estimation for Countable Discrete Distributions},
  year     = {2013},
  note     = {arXiv:1302.0328. 38 pages, LaTeX. Revised and resubmitted to JMLR},
  keywords = {bayesian bias entropy},
  url      = {http://arxiv.org/abs/1302.0328}
}