Bayesian Approximate Kernel Regression with Variable Selection

L. Crawford, K. Wood, X. Zhou, and S. Mukherjee.
(2015). arXiv:1508.01217v2. Comment: 32 pages, 3 figures, 3 tables; Supplementary Information upon request; theory added; new applications presented; references added.

Abstract

Nonlinear kernel regression models are often used in statistics and machine learning because they can achieve greater accuracy than linear models. Variable selection for kernel regression models is challenging, partly because, unlike in the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an analog of the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant (for example, the Gaussian kernel). We use functional analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that (1) captures nonlinear structure and (2) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as the analog of effect sizes. The specific functional analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification, for which one can compute an analog of effect sizes. By adapting some classical results in compressive sensing, we state conditions under which BAKR can recover a sparse set of effect sizes, that is, perform simultaneous variable selection and regression. We illustrate the utility of BAKR by examining, in some detail, two important problems in statistical genetics: genomic selection (predicting phenotype from genotype) and association mapping (inference of significant variables or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.
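
For reference, the standard random Fourier feature construction for the Gaussian kernel (a consequence of Bochner's theorem) is written out below, together with one natural reading of "projection onto the original explanatory variables" as a least-squares projection. The bandwidth σ, the feature count D, the pseudoinverse form of the effect-size analog, and all symbols are illustrative assumptions; BAKR's exact parameterization may differ.

```latex
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\sigma^2}\right)
  = \mathbb{E}_{\boldsymbol{\omega} \sim N(\mathbf{0},\, \sigma^{-2} I),\; b \sim U[0, 2\pi)}
    \left[ 2\cos(\boldsymbol{\omega}^\top \mathbf{x} + b)\, \cos(\boldsymbol{\omega}^\top \mathbf{x}' + b) \right]
  \approx \mathbf{z}(\mathbf{x})^\top \mathbf{z}(\mathbf{x}'),

\mathbf{z}(\mathbf{x}) = \sqrt{\tfrac{2}{D}}
  \left[ \cos(\boldsymbol{\omega}_1^\top \mathbf{x} + b_1), \dots, \cos(\boldsymbol{\omega}_D^\top \mathbf{x} + b_D) \right]^\top,

\mathbf{f} = Z \boldsymbol{\theta}, \qquad
\tilde{\boldsymbol{\beta}} = X^{+} \mathbf{f} = (X^\top X)^{-1} X^\top \mathbf{f}
\quad \text{(when } X^\top X \text{ is invertible)}.
```

Here $Z$ stacks $\mathbf{z}(\mathbf{x}_i)^\top$ row-wise, so the model is linear in the feature weights $\boldsymbol{\theta}$ while nonlinear in $\mathbf{x}$, and $\tilde{\boldsymbol{\beta}}$ has one entry per explanatory variable.
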
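
The two ingredients the abstract names (a random Fourier approximation of a shift-invariant kernel, then a projection of the fitted nonlinear function back onto the explanatory variables) can be sketched in plain Python. This is a minimal, hypothetical sketch and not the authors' implementation: it assumes a Gaussian kernel, substitutes a ridge fit (the posterior mean of a Bayesian linear model with a Gaussian prior on the feature weights) for full posterior inference, and reads the effect-size analog as a least-squares projection. The function names `solve`, `dot`, `random_fourier_features`, and `effect_size_analog`, their parameters, and the synthetic data are all illustrative.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_fourier_features(xs, n_features, bandwidth, rng):
    """Map each row x to sqrt(2/D) * cos(w_j . x + b_j), j = 1..D.

    For the Gaussian kernel exp(-||x - x'||^2 / (2 * bandwidth^2)), the
    frequencies w_j ~ N(0, bandwidth^-2 I) and phases b_j ~ U[0, 2*pi).
    """
    p = len(xs[0])
    ws = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(p)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(dot(w, x) + b) for w, b in zip(ws, bs)] for x in xs]


def effect_size_analog(xs, ys, n_features=300, bandwidth=1.0, ridge=0.1, seed=0):
    """Fit ridge regression on random Fourier features, then project the fit onto X."""
    rng = random.Random(seed)
    z = random_fourier_features(xs, n_features, bandwidth, rng)
    n, p = len(xs), len(xs[0])
    # Dual form: alpha = (Z Z^T + ridge * I)^{-1} y and fitted values f = Z Z^T alpha.
    gram = [[dot(z[i], z[j]) for j in range(n)] for i in range(n)]
    reg = [[gram[i][j] + (ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(reg, ys)
    fitted = [dot(gram[i], alpha) for i in range(n)]
    # Center X and f, then beta = (X^T X)^{-1} X^T f (least-squares projection).
    means = [sum(x[k] for x in xs) / n for k in range(p)]
    xc = [[x[k] - means[k] for k in range(p)] for x in xs]
    f_mean = sum(fitted) / n
    fc = [f - f_mean for f in fitted]
    xtx = [[sum(row[j] * row[k] for row in xc) for k in range(p)] for j in range(p)]
    xtf = [sum(row[j] * f for row, f in zip(xc, fc)) for j in range(p)]
    return solve(xtx, xtf)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(80)]
    # Hypothetical data: x0 enters linearly, x1 nonlinearly, x2 is irrelevant.
    ys = [2.0 * x[0] + math.sin(2.0 * x[1]) + rng.gauss(0.0, 0.1) for x in xs]
    print([round(b, 2) for b in effect_size_analog(xs, ys)])
```

On this synthetic data one would expect the first two coefficients to be clearly nonzero and the third to be near zero; the exact values depend on the seeds. The dual form keeps the linear solve at n by n (number of observations) rather than D by D (number of random features).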

BibTeX key: crawford2015bayesian
entry type: misc
year: 2015
url: http://arxiv.org/abs/1508.01217
note: arXiv:1508.01217v2. Comment: 32 pages, 3 figures, 3 tables; Supplementary Information upon request; theory added; new applications presented; references added

Cite this publication

%0 Generic
%1 crawford2015bayesian
%A Crawford, Lorin
%A Wood, Kris C.
%A Zhou, Xiang
%A Mukherjee, Sayan
%D 2015
%K acreuser bayesian
%T Bayesian Approximate Kernel Regression with Variable Selection
%U http://arxiv.org/abs/1508.01217
%X Nonlinear kernel regression models are often used in statistics and machine learning due to greater accuracy than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an analog of the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant---for example the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that (1) captures nonlinear structure and (2) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as the analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. By adapting some classical results in compressive sensing we state conditions under which BAKR can recover a sparse set of effect sizes, simultaneous variable selection and regression. We illustrate the utility of BAKR by examining, in some detail, two important problems in statistical genetics: genomic selection (predicting phenotype from genotype) and association mapping (inference of significant variables or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.

@misc{crawford2015bayesian,
  abstract = {Nonlinear kernel regression models are often used in statistics and machine learning due to greater accuracy than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an analog of the effect size of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant---for example the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that (1) captures nonlinear structure and (2) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as the analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. By adapting some classical results in compressive sensing we state conditions under which BAKR can recover a sparse set of effect sizes, simultaneous variable selection and regression. We illustrate the utility of BAKR by examining, in some detail, two important problems in statistical genetics: genomic selection (predicting phenotype from genotype) and association mapping (inference of significant variables or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.},
  added-at = {2016-04-24T07:37:32.000+0200},
  author = {Crawford, Lorin and Wood, Kris C. and Zhou, Xiang and Mukherjee, Sayan},
  biburl = {https://www.bibsonomy.org/bibtex/241fa10125ce8ed33159aa019d7a9e39d/pixor},
  description = {1508.01217v2.pdf},
  interhash = {beca8946177c13cd86b3dc259430b81b},
  intrahash = {41fa10125ce8ed33159aa019d7a9e39d},
  keywords = {acreuser bayesian},
  note = {arXiv:1508.01217v2. Comment: 32 pages, 3 figures, 3 tables; Supplementary Information upon request; theory added; new applications presented; references added},
  timestamp = {2016-04-24T07:37:32.000+0200},
  title = {Bayesian Approximate Kernel Regression with Variable Selection},
  url = {http://arxiv.org/abs/1508.01217},
  year = 2015
}
