
A weakly informative default prior distribution for logistic and other regression models

A. Gelman, A. Jakulin, M. Pittau, and Y. Su.
The Annals of Applied Statistics, 2 (4): 1360--1383 (2008)
DOI: 10.1214/08-AOAS191

Abstract

We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Crossvalidation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
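The recipe in the abstract can be illustrated with a minimal sketch: rescale a nonbinary input to mean 0 and standard deviation 0.5, then find the posterior mode of a logistic regression slope under a Cauchy prior with center 0 and scale 2.5. Everything below is an assumption made for illustration: the six data points are made up, the model has a slope and no intercept, and plain gradient ascent stands in for the approximate EM inside iteratively weighted least squares that the paper implements in R. The function names are hypothetical and not taken from the paper or any package.

```python
import math

def rescale(values):
    """Shift to mean 0 and scale to standard deviation 0.5."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / (2.0 * sd) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik_gradient(beta, x, y):
    """Derivative of the logistic log-likelihood with respect to the slope."""
    return sum((yi - sigmoid(beta * xi)) * xi for xi, yi in zip(x, y))

def cauchy_logprior_gradient(beta, scale=2.5):
    """Derivative of the log density of a Cauchy(0, scale) prior."""
    return -2.0 * beta / (scale ** 2 + beta ** 2)

def posterior_mode(x, y, scale=2.5, step=0.5, iterations=5000):
    """Plain gradient ascent from 0 (illustrative, not the paper's EM/IWLS)."""
    beta = 0.0
    for _ in range(iterations):
        beta += step * (loglik_gradient(beta, x, y)
                        + cauchy_logprior_gradient(beta, scale))
    return beta

# Hypothetical, completely separated data: y is 0 below 3.5 and 1 above it.
raw_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
x = rescale(raw_x)
beta_map = posterior_mode(x, y)
print(f"posterior mode of the slope: {beta_map:.3f}")
```

On these separated data the likelihood gradient stays positive for every finite slope, so the maximum likelihood estimate diverges. The prior term decays only like 1/beta while the likelihood term decays exponentially, so their sum reaches zero at a finite slope, which is the sense in which the prior is "always giving answers."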

BibTeX key: gelman_weakly_2008
entry type: article
year: 2008
journal: The Annals of Applied Statistics
number: 4
pages: 1360--1383
volume: 2
issn: 1932-6157
DOI: 10.1214/08-AOAS191
urldate: 2012-05-22
url: http://projecteuclid.org/euclid.aoas/1231424214

Cite this publication

@article{gelman_weakly_2008,
  added-at = {2017-01-09T13:57:26.000+0100},
  author = {Gelman, Andrew and Jakulin, Aleks and Pittau, Maria Grazia and Su, Yu-Sung},
  biburl = {https://www.bibsonomy.org/bibtex/2a72c309f44088cc5851c159c7994aff4/yourwelcome},
  doi = {10.1214/08-AOAS191},
  interhash = {7515ee9c7f4653dce9056df4907bba4f},
  intrahash = {a72c309f44088cc5851c159c7994aff4},
  issn = {1932-6157},
  journal = {The Annals of Applied Statistics},
  keywords = {Bayesian Logistic distribution inference, model, noninformative, prior},
  number = 4,
  pages = {1360--1383},
  timestamp = {2017-01-09T14:01:11.000+0100},
  title = {A weakly informative default prior distribution for logistic and other regression models},
  url = {http://projecteuclid.org/euclid.aoas/1231424214},
  urldate = {2012-05-22},
  volume = 2,
  year = 2008
}
