T. Zrnic and M. Hardt (2019). cite arxiv:1901.11143. Comment: 22 pages.
Abstract
Adaptive data analysis is frequently criticized for its pessimistic
generalization guarantees. The source of these pessimistic bounds is a model
that permits arbitrary, possibly adversarial analysts that optimally use
information to bias results. Although this is a central issue in the field, the
literature still lacks notions of natural analysts that allow for more
optimistic bounds, faithful to the reality that typical analysts are not adversarial.
In this work, we propose notions of natural analysts that smoothly
interpolate between the optimal non-adaptive bounds and the best-known adaptive
generalization bounds. To accomplish this, we model the analyst's knowledge as
evolving according to the rules of an unknown dynamical system that takes in
revealed information and outputs new statistical queries to the data. This
allows us to restrict the analyst through different natural control-theoretic
notions. One such notion corresponds to a recency bias, formalizing an
inability to arbitrarily use distant information. Another complementary notion
formalizes an anchoring bias, a tendency to weight initial information more
strongly. Both notions come with quantitative parameters that smoothly
interpolate between the non-adaptive case and the fully adaptive case, allowing
for a rich spectrum of intermediate analysts that are neither non-adaptive nor
adversarial.
Our notions are natural not only from a cognitive perspective: we show that
they also capture standard optimization methods, such as gradient descent in
various settings. This gives a new interpretation to the fact that gradient
descent tends to overfit much less than its adaptive nature might suggest.
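The loop the abstract describes — an analyst whose internal state evolves as a
dynamical system, consuming revealed answers and emitting new statistical
queries — can be sketched as follows. This is an illustrative sketch only: the
state update, the geometric `decay` parameter modeling recency bias, and the
query construction are hypothetical stand-ins, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rounds = 1000, 5, 20
data = rng.normal(size=(n, d))  # the held sample the mechanism answers on

def analyst_update(state, answer, decay=0.5):
    # Recency-biased state update: each past answer is geometrically
    # down-weighted, so the analyst cannot arbitrarily use distant information.
    return decay * state + (1 - decay) * answer

def next_query(state):
    # The analyst's next statistical query is a function of its state only.
    weights = np.tanh(state * np.arange(1, d + 1))
    return lambda x: float(np.mean(x @ weights))

state = 0.0
for t in range(rounds):
    q = next_query(state)        # analyst emits a query...
    answer = q(data)             # ...mechanism reveals the empirical answer...
    state = analyst_update(state, answer)  # ...which drives the next state
```

Shrinking `decay` toward 0 makes the analyst forget faster (closer to
non-adaptive behavior), while `decay` near 1 retains more history; this is the
kind of quantitative knob the abstract's interpolation refers to.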
@article{zrnic2019natural,
abstract = {Adaptive data analysis is frequently criticized for its pessimistic
generalization guarantees. The source of these pessimistic bounds is a model
that permits arbitrary, possibly adversarial analysts that optimally use
information to bias results. Although this is a central issue in the field, the
literature still lacks notions of natural analysts that allow for more
optimistic bounds, faithful to the reality that typical analysts are not adversarial.
In this work, we propose notions of natural analysts that smoothly
interpolate between the optimal non-adaptive bounds and the best-known adaptive
generalization bounds. To accomplish this, we model the analyst's knowledge as
evolving according to the rules of an unknown dynamical system that takes in
revealed information and outputs new statistical queries to the data. This
allows us to restrict the analyst through different natural control-theoretic
notions. One such notion corresponds to a recency bias, formalizing an
inability to arbitrarily use distant information. Another complementary notion
formalizes an anchoring bias, a tendency to weight initial information more
strongly. Both notions come with quantitative parameters that smoothly
interpolate between the non-adaptive case and the fully adaptive case, allowing
for a rich spectrum of intermediate analysts that are neither non-adaptive nor
adversarial.
Our notions are natural not only from a cognitive perspective: we show that
they also capture standard optimization methods, such as gradient descent in
various settings. This gives a new interpretation to the fact that gradient
descent tends to overfit much less than its adaptive nature might suggest.},
added-at = {2019-05-31T16:21:09.000+0200},
author = {Zrnic, Tijana and Hardt, Moritz},
biburl = {https://www.bibsonomy.org/bibtex/237ea2c19c7bc19a0c351772d8cca01d4/kirk86},
description = {[1901.11143] Natural Analysts in Adaptive Data Analysis},
interhash = {2f903bf5bb8cf1d1fd0ce406991645c2},
intrahash = {37ea2c19c7bc19a0c351772d8cca01d4},
keywords = {bounds generalization machine-learning mathematics probability stats theory},
note = {cite arxiv:1901.11143. Comment: 22 pages},
timestamp = {2019-05-31T16:21:09.000+0200},
title = {Natural Analysts in Adaptive Data Analysis},
url = {http://arxiv.org/abs/1901.11143},
year = 2019
}