Abstract
Real-world large-scale datasets are heteroskedastic and imbalanced: labels
have varying levels of uncertainty, and label distributions are long-tailed.
Heteroskedasticity and imbalance challenge deep learning algorithms due to the
difficulty of distinguishing among mislabeled, ambiguous, and rare examples.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic
datasets that regularizes different regions of the input space differently.
Inspired by the theoretical derivation of the optimal regularization strength
in a one-dimensional nonparametric classification setting, our approach
adaptively regularizes the data points in higher-uncertainty, lower-density
regions more heavily. We test our method on several benchmark tasks, including
a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments
corroborate our theory and demonstrate a significant improvement over other
methods in noise-robust deep learning.
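
The idea of regularizing higher-uncertainty, lower-density regions more heavily can be illustrated with a minimal sketch. It is not the formula derived in the paper: the function names, the ratio `uncertainty / density`, and the `base` and `eps` parameters are all hypothetical, chosen only to show per-example regularization strengths that grow with estimated uncertainty and shrink with estimated density.

```python
def adaptive_weights(uncertainty, density, base=1.0, eps=1e-8):
    """Hypothetical per-example regularization strengths.

    Each weight grows with the example's estimated label uncertainty and
    shrinks with its estimated local density. The ratio form is an
    illustrative assumption, not the strength derived in the paper.
    """
    if len(uncertainty) != len(density):
        raise ValueError("uncertainty and density must have the same length")
    return [base * u / (d + eps) for u, d in zip(uncertainty, density)]


def regularized_loss(losses, penalties, weights):
    """Mean over examples of (loss + weight * penalty).

    `penalties` stands in for any per-example regularizer value, such as
    a local smoothness penalty; the choice of regularizer is left open.
    """
    if not (len(losses) == len(penalties) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total = sum(l + w * p for l, p, w in zip(losses, penalties, weights))
    return total / len(losses)


# Example: the second point is more uncertain and lies in a sparser region,
# so it receives the larger regularization weight.
weights = adaptive_weights(uncertainty=[0.1, 0.4], density=[2.0, 0.5])
objective = regularized_loss([1.0, 1.0], [1.0, 1.0], weights)
```

In practice the uncertainty and density estimates would themselves have to be obtained from the data; the sketch takes them as given inputs.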