Abstract
We advocate an optimization-centric view on Bayesian inference and introduce a
novel generalization of it. Our inspiration is the representation of Bayes'
rule as an infinite-dimensional optimization problem (Csiszár, 1975; Donsker
and Varadhan, 1975; Zellner, 1988).
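As a point of reference (in our notation, not quoted from the paper), this
representation expresses the Bayesian posterior as the unique minimizer of an
expected negative log likelihood penalized by the Kullback-Leibler (KL)
divergence to the prior \pi:

\[
q^{\ast}(\theta) \;=\; \operatorname*{arg\,min}_{q \in \mathcal{P}(\Theta)}
\Big\{ \mathbb{E}_{q(\theta)}\Big[ -\sum_{i=1}^{n} \log p(x_i \mid \theta) \Big]
+ \mathrm{KL}(q \,\|\, \pi) \Big\},
\]

where \mathcal{P}(\Theta) denotes the set of all probability measures on the
parameter space \Theta.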
First, we use it to prove an optimality result for standard Variational
Inference (VI): under the proposed view, the posterior produced by maximizing
the standard Evidence Lower Bound (ELBO) is preferable to alternative
approximations of the Bayesian posterior.
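Restricting the feasible set in the representation above to a variational
family \mathcal{Q} \subset \mathcal{P}(\Theta) yields standard VI; maximizing
the ELBO over \mathcal{Q} is equivalent to minimizing the same objective
(a standard identity, sketched here in our notation):

\[
q^{\ast}_{\mathrm{VI}} \;=\; \operatorname*{arg\,max}_{q \in \mathcal{Q}}
\mathrm{ELBO}(q) \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}}
\Big\{ \mathbb{E}_{q(\theta)}\Big[ -\sum_{i=1}^{n} \log p(x_i \mid \theta) \Big]
+ \mathrm{KL}(q \,\|\, \pi) \Big\}.
\]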
Next, we argue for generalizing standard Bayesian inference. The need for this
arises whenever reality is severely misaligned with three assumptions
underlying standard Bayesian inference: (1) well-specified priors, (2)
well-specified likelihoods, and (3) the availability of infinite computing
power.
Our generalization addresses these shortcomings via three arguments and is
called the Rule of Three (RoT). We derive it axiomatically and recover existing
posteriors as special cases, including the Bayesian posterior and its
approximation by standard VI. In contrast, approximations based on alternative
ELBO-like objectives violate the axioms.
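For orientation, the RoT can be sketched (in our rendering, not the paper's
exact notation) as an optimization problem with three arguments: a loss \ell,
a divergence D, and a feasible set \Pi \subseteq \mathcal{P}(\Theta):

\[
P(\ell, D, \Pi) \;=\; \operatorname*{arg\,min}_{q \in \Pi}
\Big\{ \mathbb{E}_{q(\theta)}\Big[ \sum_{i=1}^{n} \ell(\theta, x_i) \Big]
+ D(q \,\|\, \pi) \Big\}.
\]

Choosing \ell(\theta, x_i) = -\log p(x_i \mid \theta), D = \mathrm{KL}, and
\Pi = \mathcal{P}(\Theta) recovers the Bayesian posterior; restricting \Pi to a
variational family recovers standard VI.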
Finally, we study a special case of the RoT that we call Generalized
Variational Inference (GVI). GVI posteriors form a large and tractable family
of belief distributions specified by three arguments: a loss, a divergence, and
a variational family. GVI posteriors have appealing properties, including
consistency and an interpretation as an approximate ELBO.
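In the sketch above, GVI corresponds to fixing \Pi = \mathcal{Q} for a
tractable variational family while leaving \ell and D free:

\[
q^{\ast}_{\mathrm{GVI}} \;=\; P(\ell, D, \mathcal{Q}) \;=\;
\operatorname*{arg\,min}_{q \in \mathcal{Q}}
\Big\{ \mathbb{E}_{q(\theta)}\Big[ \sum_{i=1}^{n} \ell(\theta, x_i) \Big]
+ D(q \,\|\, \pi) \Big\}.
\]

As one illustrative choice on our part, pairing a robust loss with a Rényi
divergence in place of the KL divergence yields posteriors that down-weight
outliers.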
The last part of the paper explores some attractive applications of GVI in
popular machine learning models, including robustness and more appropriate
marginals. After deriving black-box inference schemes for GVI posteriors, we
investigate their predictive performance on Bayesian Neural Networks and Deep
Gaussian Processes, where GVI can comprehensively improve upon existing
methods.
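To make the last step concrete, here is a minimal sketch of what a black-box
GVI scheme can look like, under assumptions we introduce for illustration
(PyTorch, a mean-field Gaussian family, the reparameterization trick, and
hypothetical helper names such as make_gvi_step); it is not the authors'
reference implementation. The three GVI arguments enter as a pluggable loss, a
pluggable divergence, and the fixed variational family:

import torch
from torch.distributions import Normal
from torch.distributions.kl import kl_divergence

def make_gvi_step(loss_fn, div_fn, mu, log_sigma, optimizer, n_mc=16):
    """One optimizer step on a Monte Carlo estimate of the GVI objective
    E_q[sum_i loss(theta, x_i)] + D(q || prior) for q = N(mu, diag(sigma^2))."""
    def step(x):
        optimizer.zero_grad()
        sigma = log_sigma.exp()
        eps = torch.randn(n_mc, *mu.shape)        # reparameterization noise
        theta = mu + sigma * eps                  # (n_mc, dim) samples from q
        exp_loss = loss_fn(theta, x).mean()       # MC estimate of E_q[sum_i loss]
        objective = exp_loss + div_fn(mu, sigma)  # add divergence to the prior
        objective.backward()
        optimizer.step()
        return objective.item()
    return step

# Illustrative instantiation: Gaussian location model, negative log-likelihood
# loss, KL divergence to a N(0, 1) prior. Swapping in a robust loss or another
# divergence changes only these two functions -- that is the GVI design space.
def nll_loss(theta, x):                           # theta: (n_mc, 1), x: (n,)
    return -Normal(theta, 1.0).log_prob(x).sum(dim=-1)

def kl_to_std_normal(mu, sigma):
    return kl_divergence(Normal(mu, sigma), Normal(0.0, 1.0)).sum()

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
step = make_gvi_step(nll_loss, kl_to_std_normal, mu, log_sigma, opt)

x = torch.randn(100) + 2.0                        # synthetic data with true mean 2
for _ in range(1000):
    step(x)
print(mu.item(), log_sigma.exp().item())          # approaches the exact Bayes posterior

With the NLL loss and KL divergence shown, the sketch reduces to standard VI
and recovers the exact Gaussian posterior; replacing either function yields a
genuinely generalized posterior.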