Abstract
Discriminative neural networks offer few or no performance guarantees when
deployed on data not generated by the same process as the training
distribution. On such out-of-distribution (OOD) inputs, the prediction may not
only be erroneous, but confidently so, limiting the safe deployment of
classifiers in real-world applications. One such challenging application is
bacteria identification based on genomic sequences, which holds the promise of
early detection of diseases, but requires a model that can output low-confidence
predictions on OOD genomic sequences from new bacteria that were not
present in the training data. We introduce a genomics dataset for OOD detection
that allows other researchers to benchmark progress on this important problem.
We investigate approaches to OOD detection based on deep generative models and
observe that the likelihood score is heavily affected by population-level
background statistics. We propose a likelihood ratio method for deep generative
models which effectively corrects for these confounding background statistics.
We benchmark the OOD detection performance of the proposed method against
existing approaches on the genomics dataset and show that our method achieves
state-of-the-art performance. We demonstrate the generality of the proposed
method by showing that it significantly improves OOD detection when applied to
deep generative models of images.
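
As a rough illustration of the likelihood ratio idea, the sketch below scores a sequence by the difference between its log-likelihood under an in-distribution model and under a background model. The abstract does not specify the models or how the background model is obtained, so everything here is an assumption: the function names, the toy nucleotide frequencies, and the use of simple i.i.d. categorical models as stand-ins for deep generative models are all hypothetical.

```python
import math


def log_likelihood(seq, probs):
    """Log-likelihood of a sequence under an i.i.d. categorical model.

    Stand-in for a deep generative model's log p(x); hypothetical helper.
    """
    return sum(math.log(probs[ch]) for ch in seq)


def likelihood_ratio_score(seq, model_probs, background_probs):
    """Log-likelihood ratio: log p_model(x) - log p_background(x).

    Subtracting the background term is meant to cancel the contribution
    of statistics shared by in-distribution and OOD inputs.
    """
    return log_likelihood(seq, model_probs) - log_likelihood(seq, background_probs)


# Toy, made-up nucleotide frequencies (assumptions for illustration only).
in_dist = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

score_in = likelihood_ratio_score("ATTATA", in_dist, background)
score_ood = likelihood_ratio_score("GCCGCG", in_dist, background)

# A higher score suggests the input looks in-distribution once the
# background is accounted for; a lower score suggests OOD.
print(score_in > score_ood)
```

In this toy setting, a sequence rich in the nucleotides favored by the in-distribution model receives a positive score, while one dominated by disfavored nucleotides receives a negative score. A detection threshold on the score would have to be chosen on held-out data.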