Abstract
Gaussian processes (GPs) exhibit a classic tension of many machine learning methods: they possess desirable modelling capabilities yet suffer from important practical limitations. In many instances, GPs are able to offer well-calibrated uncertainty estimates, interpretable predictions, and the ability to encode prior knowledge. These properties have made them an indispensable tool for black-box optimization, time series forecasting, and high-risk applications like health care. Despite these benefits, GPs are typically not applied to datasets with more than a few thousand data points. This is in part due to an inference procedure that requires matrix inverses, determinants, and other expensive operations. Moreover, specialty models often require significant implementation efforts.
This thesis aims to alleviate these practical concerns through a single simple design decision. Taking inspiration from neural network libraries, we construct GP inference algorithms using only matrix-vector multiplications (MVMs) and other linear operations. This MVM-based approach simultaneously addresses several of these practical concerns: it reduces asymptotic complexity, effectively utilizes GPU hardware, and provides straightforward implementations for many specialty GP models.
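To make the central idea concrete: the key computational primitive in GP inference is solving linear systems against the regularized kernel matrix, and iterative solvers such as conjugate gradients touch that matrix only through matrix-vector products. The following is a minimal illustrative sketch in NumPy, not GPyTorch code; the kernel choice and helper names here are our own:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix (an illustrative kernel choice).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def cg_solve(mvm, b, tol=1e-8, max_iters=200):
    # Conjugate gradients: accesses the matrix ONLY through mvm(v),
    # so the same code works for any structured or dense operator.
    x = np.zeros_like(b)
    r = b - mvm(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Ap = mvm(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = rng.standard_normal(100)
K_hat = rbf_kernel(X) + 0.1 * np.eye(100)  # kernel + noise term

# The solver sees only a matrix-vector multiplication routine.
x = cg_solve(lambda v: K_hat @ v, y)
print(np.allclose(K_hat @ x, y, atol=1e-5))
```

Because the solver only ever calls `mvm(v)`, swapping in a fast structured product (or a GPU batched product) changes nothing else in the algorithm, which is the design decision this thesis builds on.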
The chapters of this thesis each address a different aspect of Gaussian process inference. Chapter 3 introduces an MVM method for training Gaussian process regression models (i.e. optimizing kernel/likelihood hyperparameters).
This approach unifies several existing methods into a highly parallel and stable algorithm. Chapter 4 focuses on making predictions with Gaussian processes. A memory-efficient cache, which can be computed through MVMs, significantly reduces the cost of computing predictive distributions. Chapter 5 introduces a
multi-purpose MVM algorithm that can be used to draw samples from GP posteriors and perform approximate Gaussian process inference. All three of these methods offer speedups ranging from 4× to 40×. Importantly, applying any of these algorithms to specialty models (e.g. multitask GPs and scalable approximations) simply requires a matrix-vector multiplication routine that exploits covariance structure afforded by the model.
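As one example of the kind of structure-exploiting MVM routine described above: a Kronecker-structured covariance K = A ⊗ B (a structure that arises in some multitask and grid-data GP models) admits a fast matrix-vector product via reshaping, without ever forming the full mn × mn matrix. This is a generic linear-algebra identity sketched in NumPy, not GPyTorch's implementation:

```python
import numpy as np

def kron_mvm(A, B, v):
    # (A ⊗ B) v without materializing the Kronecker product:
    # with row-major vectorization, (A ⊗ B) vec(V) = vec(A V B^T).
    # Cost drops from O(m^2 n^2) to O(mn(m + n)).
    m, n = A.shape[0], B.shape[0]
    V = v.reshape(m, n)
    return (A @ V @ B.T).ravel()

rng = np.random.default_rng(1)
m, n = 8, 6
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
v = rng.standard_normal(m * n)

dense = np.kron(A, B) @ v       # naive: builds the full 48 x 48 matrix
structured = kron_mvm(A, B, v)  # structured: never forms it
print(np.allclose(dense, structured))  # True
```

Any inference algorithm written purely in terms of MVMs inherits this speedup automatically, simply by being handed `kron_mvm` instead of a dense product.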
The MVM methods from this thesis form the building blocks of the GPyTorch library, an open-source GP implementation designed for scalability and simple implementations. In the final chapter, we evaluate GPyTorch models on several large-scale regression datasets. Using the proposed MVM methods, we can apply exact Gaussian processes to datasets that are two orders of magnitude larger than what has previously been reported—up to 1 million data points.