Abstract
Robust statistics traditionally focuses on outliers, or perturbations in
total variation distance. However, a dataset could be corrupted in many other
ways, such as systematic measurement errors and missing covariates. We
generalize the robust statistics approach to consider perturbations under any
Wasserstein distance, and show that robust estimation is possible whenever a
distribution's population statistics are robust under a certain family of
friendly perturbations. This generalizes a property called resilience
previously employed in the special case of mean estimation with outliers. We
justify the generalized resilience property by showing that it holds under
moment or hypercontractive conditions. Even in the total variation case, these
subsume conditions in the literature for mean estimation, regression, and
covariance estimation; the resulting analysis simplifies and sometimes improves
these known results in both population limit and finite-sample rate. Our robust
estimators are based on minimum distance (MD) functionals (Donoho and Liu,
1988), which project onto a set of distributions under a discrepancy related to
the perturbation. We present two approaches for designing MD estimators with
good finite-sample rates: weakening the discrepancy and expanding the set of
distributions. We also connect our framework to the recent analysis by Gao et
al. (2019) of generative adversarial networks for robust estimation.
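As a toy illustration of the phenomenon the abstract describes (not the paper's MD estimator itself): under an ε-fraction of total-variation corruption, the sample mean can be driven arbitrarily far off, while a classical resilience-style estimator such as the trimmed mean stays close to the truth. All names and parameters below are illustrative assumptions, and the trimming rule is a standard textbook one, not the projection procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean data from N(0, 1); an eps-fraction is replaced by gross outliers,
# a simple stand-in for TV-distance corruption.
n, eps = 1000, 0.05
x = rng.normal(0.0, 1.0, size=n)
x[: int(eps * n)] = 100.0  # adversarial outliers

def trimmed_mean(data, eps):
    """Drop the eps-tails on each side, then average what remains."""
    lo, hi = np.quantile(data, [eps, 1.0 - eps])
    kept = data[(data >= lo) & (data <= hi)]
    return kept.mean()

naive = x.mean()             # pulled far from 0 by the outliers
robust = trimmed_mean(x, eps)  # stays near the true mean 0
print(naive, robust)
```

The trimmed mean succeeds here precisely because the Gaussian population is resilient: deleting any small fraction of its mass moves the mean only slightly, so discarding the suspicious tails costs little while removing the corruption entirely.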