Optimal Identity Testing with High Probability

I. Diakonikolas, T. Gouleakis, J. Peebles, and E. Price. (2017). cite arxiv:1708.02728.

Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0 < \epsilon, \delta < 1$, we wish to distinguish, {\em with probability at least $1-\delta$}, whether the distributions are identical versus $\varepsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is \[ \Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta) \right)\right) \] for any $n, \varepsilon$, and $\delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{\mathrm{TV}}(p, U_n) \geq \varepsilon$, we simply threshold $d_{\mathrm{TV}}(\widehat{p}, U_n)$, where $\widehat{p}$ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant $\delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $\varepsilon$ and $\delta$.
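
The plug-in uniformity tester described in the abstract can be illustrated with a short Python sketch. It computes the statistic $d_{\mathrm{TV}}(\widehat{p}, U_n)$ from the samples and compares it with a threshold. The abstract does not state the threshold value, so the sketch calibrates it by Monte Carlo simulation under $U_n$; that calibration step is an assumption made here for illustration and is not taken from the paper's analysis. All function names are hypothetical.

```python
import numpy as np


def empirical_tv_to_uniform(samples, n):
    """Return d_TV(p_hat, U_n) = 0.5 * sum_i |count_i / m - 1 / n|.

    `samples` is an integer array with values in {0, ..., n-1}.
    """
    counts = np.bincount(samples, minlength=n)
    p_hat = counts / len(samples)
    return 0.5 * np.abs(p_hat - 1.0 / n).sum()


def calibrate_threshold(n, m, delta, trials=2000, rng=None):
    """Estimate the (1 - delta) quantile of the statistic under U_n.

    Assumption of this sketch: the threshold is found by simulation,
    not by the closed-form analysis in the paper.
    """
    rng = np.random.default_rng(rng)
    stats = [
        empirical_tv_to_uniform(rng.integers(0, n, size=m), n)
        for _ in range(trials)
    ]
    return float(np.quantile(stats, 1.0 - delta))


def plug_in_uniformity_test(samples, n, threshold):
    """Accept (return True) iff the empirical TV distance is at most the threshold."""
    return empirical_tv_to_uniform(samples, n) <= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, delta = 100, 2000, 0.05
    threshold = calibrate_threshold(n, m, delta, rng=rng)

    # Samples from U_n, and from a distribution uniform on half the domain
    # (which is 0.5-far from U_n in total variation distance).
    uniform_samples = rng.integers(0, n, size=m)
    far_samples = rng.integers(0, n // 2, size=m)

    print("threshold:", threshold)
    print("accept uniform samples:", plug_in_uniformity_test(uniform_samples, n, threshold))
    print("accept far samples:", plug_in_uniformity_test(far_samples, n, threshold))
```

Calibrating by simulation keeps the false-rejection rate near $\delta$ for the chosen $n$ and sample size $m$, but it says nothing about how large $m$ must be; that is what the sample complexity bound in the abstract addresses.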

Description

[1708.02728] Optimal Identity Testing with High Probability

Links and resources

BibTeX key: diakonikolas2017optimal
entry type: article
year: 2017
url: http://arxiv.org/abs/1708.02728
note: cite arxiv:1708.02728

Cite this publication

@article{diakonikolas2017optimal,
  abstract = {We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em with probability at least $1-\delta$}, whether the distributions are identical versus $\varepsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is \[ \Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta) \right)\right) \] for any $n, \varepsilon$, and $\delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{\mathrm{TV}}(p, U_n) \geq \varepsilon$, we simply threshold $d_{\mathrm{TV}}(\widehat{p}, U_n)$, where $\widehat{p}$ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant $\delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $\varepsilon$ and $\delta$.},
  added-at = {2020-02-26T13:39:00.000+0100},
  author = {Diakonikolas, Ilias and Gouleakis, Themis and Peebles, John and Price, Eric},
  biburl = {https://www.bibsonomy.org/bibtex/2ad4fe0a2476868d145ebfcf84e322b82/kirk86},
  description = {[1708.02728] Optimal Identity Testing with High Probability},
  interhash = {8d4c83cb29cde2d48ac8ee871fc90c8a},
  intrahash = {ad4fe0a2476868d145ebfcf84e322b82},
  keywords = {stats},
  note = {cite arxiv:1708.02728},
  timestamp = {2020-02-26T13:39:00.000+0100},
  title = {Optimal Identity Testing with High Probability},
  url = {http://arxiv.org/abs/1708.02728},
  year = 2017
}
