Rates of Convergence for Sparse Variational Gaussian Process Regression
D. Burt, C. Rasmussen, and M. Van Der Wilk. Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 862--871. Long Beach, California, USA, PMLR, (09--15 Jun 2019)
Abstract
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M \ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
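A minimal sketch of the headline scaling rule (not the authors' code; the proportionality constant c is a hypothetical choice for illustration): with the Squared Exponential kernel and normally distributed inputs, growing the number of inducing points as $M=\mathcal{O}(\log^D N)$ keeps the sparse $\mathcal{O}(NM^2)$ cost far below the exact $\mathcal{O}(N^3)$ cost as $N$ grows.

import numpy as np

# Minimal sketch, assuming M grows like c * log(N)^D.
# `c` is a hypothetical proportionality constant, not specified here by the paper.
def suggested_num_inducing(N, D, c=1.0):
    return int(np.ceil(c * np.log(N) ** D))

# Compare exact O(N^3) cost with the sparse O(N M^2) cost under this growth rule.
for N in [1_000, 10_000, 100_000]:
    M = suggested_num_inducing(N, D=2)
    print(f"N={N:>7}  M={M:>4}  exact O(N^3)={N**3:.2e}  sparse O(NM^2)={N * M**2:.2e}")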
Description
Rates of Convergence for Sparse Variational Gaussian Process Regression
%0 Conference Paper
%1 pmlr-v97-burt19a
%A Burt, David
%A Rasmussen, Carl Edward
%A Van Der Wilk, Mark
%B Proceedings of the 36th International Conference on Machine Learning
%C Long Beach, California, USA
%D 2019
%E Chaudhuri, Kamalika
%E Salakhutdinov, Ruslan
%I PMLR
%K convergence gaussian-processes readings sparsity variational
%P 862--871
%T Rates of Convergence for Sparse Variational Gaussian Process Regression
%U http://proceedings.mlr.press/v97/burt19a.html
%V 97
%X Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M \ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
@inproceedings{pmlr-v97-burt19a,
abstract = {Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of \emph{inducing variables}, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.},
added-at = {2020-07-16T12:13:53.000+0200},
address = {Long Beach, California, USA},
author = {Burt, David and Rasmussen, Carl Edward and Van Der Wilk, Mark},
biburl = {https://www.bibsonomy.org/bibtex/2149185b850836a99ee468b4d53ef0182/kirk86},
booktitle = {Proceedings of the 36th International Conference on Machine Learning},
description = {Rates of Convergence for Sparse Variational Gaussian Process Regression},
editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
interhash = {31df05d6ccbe4ce1f1d0e699832802fb},
intrahash = {149185b850836a99ee468b4d53ef0182},
keywords = {convergence gaussian-processes readings sparsity variational},
month = {09--15 Jun},
pages = {862--871},
pdf = {http://proceedings.mlr.press/v97/burt19a/burt19a.pdf},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
timestamp = {2020-07-16T12:13:53.000+0200},
title = {Rates of Convergence for Sparse Variational {G}aussian Process Regression},
url = {http://proceedings.mlr.press/v97/burt19a.html},
volume = 97,
year = 2019
}