Abstract
Data clustering, including problems such as finding network communities, can
be put into a systematic framework by means of a Bayesian approach. Applying
Bayesian approaches to real problems can, however, be quite
challenging. In most cases the solution is explored via Monte Carlo sampling or
variational methods. Here we work further on the application of variational
methods to clustering problems. We introduce generative models based on a
hidden group structure and prior distributions. We extend previous attempts by
Jaynes and derive the prior distributions based on symmetry arguments. As a
case study we address the problems of two-sided clustering of real-valued data and
clustering data represented by a hypergraph or bipartite graph. From the
variational calculations, and depending on the starting statistical model for
the data, we derive a variational Bayes algorithm, a generalized version of the
expectation-maximization algorithm with a built-in penalization for model
complexity or bias. We demonstrate the good performance of the variational
Bayes algorithm using test examples.