Abstract
Gaussian processes (GPs) exhibit a classic tension of many machine learning methods: they possess desirable modelling capabilities yet suffer from important practical limitations. In many instances, GPs are able to offer well-calibrated uncertainty estimates, interpretable predictions, and the ability to encode prior knowledge. These properties have made them an indispensable tool for black-box optimization, time series forecasting, and high-risk applications like health care. Despite these benefits, GPs are typically not applied to datasets with more than a few thousand data points. This is in part due to an inference procedure that requires matrix inverses, determinants, and other expensive operations. Moreover, specialty models often require significant implementation efforts.
This thesis aims to alleviate these practical concerns through a single simple design decision. Taking inspiration from neural network libraries, we construct GP inference algorithms using only matrix-vector multiplications (MVMs) and other linear operations. This MVM-based approach simultaneously addresses several of these practical concerns: it reduces asymptotic complexity, effectively utilizes GPU hardware, and provides straightforward implementations for many specialty GP models.
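To make the central idea concrete: the key computational primitive in GP inference is solving linear systems against the regularized kernel matrix, and iterative solvers such as conjugate gradients touch that matrix only through matrix-vector products. The following is a minimal illustrative sketch in NumPy, not GPyTorch code; the kernel choice and helper names here are our own:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix (an illustrative kernel choice).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def cg_solve(mvm, b, tol=1e-8, max_iters=200):
    # Conjugate gradients: accesses the matrix ONLY through mvm(v),
    # so the same code works for any structured or dense operator.
    x = np.zeros_like(b)
    r = b - mvm(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Ap = mvm(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = rng.standard_normal(100)
K_hat = rbf_kernel(X) + 0.1 * np.eye(100)  # kernel + noise term

# The solver sees only a matrix-vector multiplication routine.
x = cg_solve(lambda v: K_hat @ v, y)
print(np.allclose(K_hat @ x, y, atol=1e-5))
```

Because the solver only ever calls `mvm(v)`, swapping in a fast structured product (or a GPU batched product) changes nothing else in the algorithm, which is the design decision this thesis builds on.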
The chapters of this thesis each address a different aspect of Gaussian process inference. Chapter 3 introduces an MVM method for training Gaussian process regression models (i.e. optimizing kernel/likelihood hyperparameters).
This approach unifies several existing methods into a highly parallel and stable algorithm. Chapter 4 focuses on making predictions with Gaussian processes. A memory-efficient cache, which can be computed through MVMs, significantly reduces the cost of computing predictive distributions. Chapter 5 introduces a
multi-purpose MVM algorithm that can be used to draw samples from GP posteriors and perform approximate Gaussian process inference. All three of these methods offer speedups ranging from 4× to 40×. Importantly, applying any of these algorithms to specialty models (e.g. multitask GPs and scalable approximations) simply requires a matrix-vector multiplication routine that exploits covariance structure afforded by the model.
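As one example of the kind of structure-exploiting MVM routine described above: a Kronecker-structured covariance K = A ⊗ B (a structure that arises in some multitask and grid-data GP models) admits a fast matrix-vector product via reshaping, without ever forming the full mn × mn matrix. This is a generic linear-algebra identity sketched in NumPy, not GPyTorch's implementation:

```python
import numpy as np

def kron_mvm(A, B, v):
    # (A ⊗ B) v without materializing the Kronecker product:
    # with row-major vectorization, (A ⊗ B) vec(V) = vec(A V B^T).
    # Cost drops from O(m^2 n^2) to O(mn(m + n)).
    m, n = A.shape[0], B.shape[0]
    V = v.reshape(m, n)
    return (A @ V @ B.T).ravel()

rng = np.random.default_rng(1)
m, n = 8, 6
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
v = rng.standard_normal(m * n)

dense = np.kron(A, B) @ v       # naive: builds the full 48 x 48 matrix
structured = kron_mvm(A, B, v)  # structured: never forms it
print(np.allclose(dense, structured))  # True
```

Any inference algorithm written purely in terms of MVMs inherits this speedup automatically, simply by being handed `kron_mvm` instead of a dense product.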
The MVM methods from this thesis form the building blocks of the GPyTorch library, an open-source GP implementation designed for scalability and simple implementations. In the final chapter, we evaluate GPyTorch models on several large-scale regression datasets. Using the proposed MVM methods, we can apply exact Gaussian processes to datasets that are two orders of magnitude larger than what has previously been reported—up to 1 million data points.