Article

Which Distribution Distances are Sublinearly Testable?

C. Daskalakis, G. Kamath, and J. Wright.
(2017). arXiv:1708.00002. Comment: To appear in SODA 2018.

Abstract

Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent work, the following questions have been critical to solving problems at the frontiers of distribution testing:

- Alternative Distances: Can we test whether $p$ and $q$ are far in other distances, say Hellinger?
- Tolerance: Can we test when $p$ and $q$ are close, rather than equal? And if so, close in which distances?

Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, $\ell_2$, Hellinger, Kullback-Leibler, and $\chi^2$. For each pair of distances $d_1$ and $d_2$, we study the complexity of testing if $p$ and $q$ are close in $d_1$ versus far in $d_2$, with a focus on identifying which problems allow strongly sublinear testers (i.e., those with complexity $O(n^{1 - \gamma})$ for some $\gamma > 0$, where $n$ is the size of the support of the distributions $p$ and $q$). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from $q$ (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of $\chi^2$-statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.
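The distances listed in the abstract, and a generic $\chi^2$-style statistic of the kind the abstract alludes to, can be sketched as follows. This is a minimal illustration, not code from the paper: all function names are hypothetical, the Hellinger normalization (a factor of 1/2 inside the square) is one common convention that may differ from the paper's, and the statistic assumes Poissonized sampling, where each count $N_i$ is an independent Poisson($m p_i$) variable. Under that assumption $E[Z] = m \cdot \chi^2(p \| q)$, so $Z/m$ is an unbiased estimate of $\chi^2(p \| q)$. The paper's testers modify statistics of this kind for each pair of distances; the sketch does not attempt that.

```python
import math
from typing import Sequence

# Distributions p, q over {0, ..., n-1} are given as equal-length
# sequences of probabilities that each sum to 1 (assumed, not checked).


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """d_TV(p, q) = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def l2(p: Sequence[float], q: Sequence[float]) -> float:
    """||p - q||_2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """d_H(p, q), with d_H^2 = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2.

    The 1/2 factor is an assumed convention; some authors omit it.
    """
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i ln(p_i / q_i); infinite if p_i > 0 where q_i = 0."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # terms with p_i = 0 contribute nothing
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


def chi_square(p: Sequence[float], q: Sequence[float]) -> float:
    """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i; assumes every q_i > 0."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))


def chi_square_statistic(counts: Sequence[int], q: Sequence[float], m: float) -> float:
    """Z = sum_i ((N_i - m q_i)^2 - N_i) / (m q_i).

    If N_i ~ Poisson(m p_i) independently, then
    E[(N_i - m q_i)^2] = m p_i + m^2 (p_i - q_i)^2, and subtracting
    E[N_i] = m p_i removes the variance term, giving E[Z] = m * chi^2(p || q).
    Assumes every q_i > 0.
    """
    return sum(((n - m * b) ** 2 - n) / (m * b) for n, b in zip(counts, q))
```

For example, with $p = (1/2, 1/2)$ and $q = (1/4, 3/4)$, `total_variation(p, q)` returns 0.25 and `chi_square(p, q)` returns about 0.333. Subtracting $N_i$ in each term is what makes the statistic unbiased; without it, the Poisson variance $m p_i$ would bias every term upward.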

BibTeX key: daskalakis2017which
entry type: article
year: 2017
url: http://arxiv.org/abs/1708.00002
note: arXiv:1708.00002. Comment: To appear in SODA 2018


Cite this publication

@article{daskalakis2017which,
  abstract = {Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent work, the following questions have been critical to solving problems at the frontiers of distribution testing: -Alternative Distances: Can we test whether $p$ and $q$ are far in other distances, say Hellinger? -Tolerance: Can we test when $p$ and $q$ are close, rather than equal? And if so, close in which distances? Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, $\ell_2$, Hellinger, Kullback-Leibler, and $\chi^2$. For each pair of distances $d_1$ and $d_2$, we study the complexity of testing if $p$ and $q$ are close in $d_1$ versus far in $d_2$, with a focus on identifying which problems allow strongly sublinear testers (i.e., those with complexity $O(n^{1 - \gamma})$ for some $\gamma > 0$ where $n$ is the size of the support of the distributions $p$ and $q$). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from $q$ (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of $\chi^2$-statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.},
  added-at = {2020-08-07T06:47:11.000+0200},
  author = {Daskalakis, Constantinos and Kamath, Gautam and Wright, John},
  biburl = {https://www.bibsonomy.org/bibtex/248d71b33f6af2e6d86af38a5ff863426/kirk86},
  description = {[1708.00002] Which Distribution Distances are Sublinearly Testable?},
  interhash = {e31ca4ac8c3f841e5809dd186af56774},
  intrahash = {48d71b33f6af2e6d86af38a5ff863426},
  keywords = {probability stats},
  note = {arXiv:1708.00002. Comment: To appear in SODA 2018},
  timestamp = {2020-08-07T06:47:11.000+0200},
  title = {Which Distribution Distances are Sublinearly Testable?},
  url = {http://arxiv.org/abs/1708.00002},
  year = 2017
}
