Causal Inference With Selectively-Deconfounded Data

K. Gan, A. Li, Z. Lipton, and S. Tayur.
(2020). arXiv:2002.11096.

Abstract

Given only data generated by a standard confounding graph with unobserved confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a (large) confounded observational dataset alongside a (small) deconfounded observational dataset when estimating the ATE. Our theoretical results show that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases---say, genetics---we could imagine retrospectively selecting samples to deconfound. We demonstrate that by strategically selecting these examples based upon the (already observed) treatment and outcome, we can reduce our data dependence further. Our theoretical and empirical results establish that the worst-case relative performance of our approach (vs. a natural benchmark) is bounded while our best-case gains are unbounded. Next, we demonstrate the benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer. Finally, we introduce an online version of the problem, proposing two adaptive heuristics.

BibTeX key: gan2020causal
Entry type: article
Year: 2020
URL: http://arxiv.org/abs/2002.11096
Note: cite arxiv:2002.11096

Tags

causal-analysis

Cite this publication

@article{gan2020causal,
  abstract = {Given only data generated by a standard confounding graph with unobserved confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a (large) confounded observational dataset alongside a (small) deconfounded observational dataset when estimating the ATE. Our theoretical results show that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases---say, genetics---we could imagine retrospectively selecting samples to deconfound. We demonstrate that by strategically selecting these examples based upon the (already observed) treatment and outcome, we can reduce our data dependence further. Our theoretical and empirical results establish that the worst-case relative performance of our approach (vs. a natural benchmark) is bounded while our best-case gains are unbounded. Next, we demonstrate the benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer. Finally, we introduce an online version of the problem, proposing two adaptive heuristics.},
  added-at = {2020-02-28T01:53:50.000+0100},
  author = {Gan, Kyra and Li, Andrew A. and Lipton, Zachary C. and Tayur, Sridhar},
  biburl = {https://www.bibsonomy.org/bibtex/258aaa63e9e8ee787181565418ec6034a/kirk86},
  description = {[2002.11096v1] Causal Inference With Selectively-Deconfounded Data},
  interhash = {2e024415046b0bda61b343de677c4fbf},
  intrahash = {58aaa63e9e8ee787181565418ec6034a},
  keywords = {causal-analysis},
  note = {cite arxiv:2002.11096},
  timestamp = {2020-02-28T01:53:50.000+0100},
  title = {Causal Inference With Selectively-Deconfounded Data},
  url = {http://arxiv.org/abs/2002.11096},
  year = 2020
}
