Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme
S. Syed, A. Bouchard-Côté, G. Deligiannidis, and A. Doucet. (2019)cite arxiv:1905.02939Comment: 61 pages, 18 figures. The method is implemented in an open source probabilistic programming available at https://github.com/UBC-Stat-ML/blangSDK.
Abstract
Parallel tempering (PT) methods are a popular class of Markov chain Monte
Carlo schemes used to explore complex high-dimensional probability
distributions. These algorithms can be highly effective but their performance
is contingent on the selection of a suitable annealing schedule. In this work,
we provide a new perspective on PT algorithms and their tuning, based on two
main insights. First, we identify and formalize a sharp divide in the behaviour
and performance of reversible versus non-reversible PT methods. Second, we
analyze the behaviour of PT algorithms using a novel asymptotic regime in which
the number of parallel compute cores goes to infinity. Based on this approach
we show that a class of non-reversible PT methods dominates its reversible
counterpart and identify distinct scaling limits for the non-reversible and
reversible schemes, the former being a piecewise-deterministic Markov process
(PDMP) and the latter a diffusion. In particular, we identify a class of
non-reversible PT algorithms which is provably scalable to massive parallel
implementation, in contrast to reversible PT algorithms, which are known to
collapse in the massive parallel regime. We then bring these theoretical tools
to bear on the development of novel methodologies. We develop an adaptive
non-reversible PT scheme which estimates the event rate of the limiting PDMP
and uses this estimated rate to approximate the optimal annealing schedule. We
provide a wide range of numerical examples supporting and extending our
theoretical and methodological contributions. Our adaptive non-reversible PT
method outperforms experimentally state-of-the-art PT methods in terms of
taking less time to adapt, as well as providing better target approximations.
Our scheme has no tuning parameters and appears in our simulations robust to
violations of the theoretical assumption used to carry out our analysis.
Description
[1905.02939] Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme
cite arxiv:1905.02939Comment: 61 pages, 18 figures. The method is implemented in an open source probabilistic programming available at https://github.com/UBC-Stat-ML/blangSDK
%0 Journal Article
%1 syed2019nonreversible
%A Syed, Saifuddin
%A Bouchard-Côté, Alexandre
%A Deligiannidis, George
%A Doucet, Arnaud
%D 2019
%K distributed mcmc parallel
%T Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme
%U http://arxiv.org/abs/1905.02939
%X Parallel tempering (PT) methods are a popular class of Markov chain Monte
Carlo schemes used to explore complex high-dimensional probability
distributions. These algorithms can be highly effective but their performance
is contingent on the selection of a suitable annealing schedule. In this work,
we provide a new perspective on PT algorithms and their tuning, based on two
main insights. First, we identify and formalize a sharp divide in the behaviour
and performance of reversible versus non-reversible PT methods. Second, we
analyze the behaviour of PT algorithms using a novel asymptotic regime in which
the number of parallel compute cores goes to infinity. Based on this approach
we show that a class of non-reversible PT methods dominates its reversible
counterpart and identify distinct scaling limits for the non-reversible and
reversible schemes, the former being a piecewise-deterministic Markov process
(PDMP) and the latter a diffusion. In particular, we identify a class of
non-reversible PT algorithms which is provably scalable to massive parallel
implementation, in contrast to reversible PT algorithms, which are known to
collapse in the massive parallel regime. We then bring these theoretical tools
to bear on the development of novel methodologies. We develop an adaptive
non-reversible PT scheme which estimates the event rate of the limiting PDMP
and uses this estimated rate to approximate the optimal annealing schedule. We
provide a wide range of numerical examples supporting and extending our
theoretical and methodological contributions. Our adaptive non-reversible PT
method outperforms experimentally state-of-the-art PT methods in terms of
taking less time to adapt, as well as providing better target approximations.
Our scheme has no tuning parameters and appears in our simulations robust to
violations of the theoretical assumption used to carry out our analysis.
@article{syed2019nonreversible,
abstract = {Parallel tempering (PT) methods are a popular class of Markov chain Monte
Carlo schemes used to explore complex high-dimensional probability
distributions. These algorithms can be highly effective but their performance
is contingent on the selection of a suitable annealing schedule. In this work,
we provide a new perspective on PT algorithms and their tuning, based on two
main insights. First, we identify and formalize a sharp divide in the behaviour
and performance of reversible versus non-reversible PT methods. Second, we
analyze the behaviour of PT algorithms using a novel asymptotic regime in which
the number of parallel compute cores goes to infinity. Based on this approach
we show that a class of non-reversible PT methods dominates its reversible
counterpart and identify distinct scaling limits for the non-reversible and
reversible schemes, the former being a piecewise-deterministic Markov process
(PDMP) and the latter a diffusion. In particular, we identify a class of
non-reversible PT algorithms which is provably scalable to massive parallel
implementation, in contrast to reversible PT algorithms, which are known to
collapse in the massive parallel regime. We then bring these theoretical tools
to bear on the development of novel methodologies. We develop an adaptive
non-reversible PT scheme which estimates the event rate of the limiting PDMP
and uses this estimated rate to approximate the optimal annealing schedule. We
provide a wide range of numerical examples supporting and extending our
theoretical and methodological contributions. Our adaptive non-reversible PT
method outperforms experimentally state-of-the-art PT methods in terms of
taking less time to adapt, as well as providing better target approximations.
Our scheme has no tuning parameters and appears in our simulations robust to
violations of the theoretical assumption used to carry out our analysis.},
added-at = {2019-05-20T13:20:02.000+0200},
author = {Syed, Saifuddin and Bouchard-Côté, Alexandre and Deligiannidis, George and Doucet, Arnaud},
biburl = {https://www.bibsonomy.org/bibtex/2e62ec6b65666d3c2413832af3de486e0/kirk86},
description = {[1905.02939] Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme},
interhash = {79383e41d212748e998dcc7a47e141a6},
intrahash = {e62ec6b65666d3c2413832af3de486e0},
keywords = {distributed mcmc parallel},
note = {cite arxiv:1905.02939Comment: 61 pages, 18 figures. The method is implemented in an open source probabilistic programming available at https://github.com/UBC-Stat-ML/blangSDK},
timestamp = {2019-05-20T13:20:02.000+0200},
title = {Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme},
url = {http://arxiv.org/abs/1905.02939},
year = 2019
}