Abstract
Given only data generated by a standard confounding graph with an unobserved
confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate
the ATE, a practitioner must then either (a) collect deconfounded data, (b) run
a clinical trial, or (c) elucidate further properties of the causal graph that
might render the ATE identifiable. In this paper, we consider the benefit of
incorporating a (large) confounded observational dataset alongside a (small)
deconfounded observational dataset when estimating the ATE. Our theoretical
results show that the inclusion of confounded data can significantly reduce the
quantity of deconfounded data required to estimate the ATE to within a desired
accuracy level. Moreover, in some cases (say, genetics), we could imagine
retrospectively selecting samples to deconfound. We demonstrate that by
strategically selecting these samples based on the (already observed)
treatment and outcome, we can reduce our dependence on deconfounded data even
further. Our theoretical and empirical results establish that the worst-case
relative performance of our approach (vs. a natural benchmark) is bounded,
while its best-case gains are unbounded. Next, we demonstrate the benefits of
selective deconfounding using a large real-world dataset related to genetic
mutation in cancer. Finally, we introduce an online version of the problem and
propose two adaptive heuristics.
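The abstract does not spell out the estimator. As a minimal sketch, assume a binary treatment X, outcome Y and confounder Z in the standard graph (Z -> X, Z -> Y, X -> Y). One natural way to combine the two datasets is to factor P(x, y, z) = P(x, y) P(z | x, y). The confounded data estimate P(x, y), the deconfounded data estimate P(z | x, y), and a backdoor adjustment over Z then gives the ATE. This is an illustrative plug-in estimator, not necessarily the authors' method. The function names (`estimate_ate`, `naive_difference`, `expand`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def estimate_ate(confounded, deconfounded):
    """Plug-in ATE estimate for binary treatment X, outcome Y and confounder Z.

    confounded:   list of (x, y) pairs; Z was not recorded.
    deconfounded: list of (x, y, z) triples; Z was recorded.

    Illustrative assumption: every (x, y) cell appears in the deconfounded
    data and every (x, z) stratum gets positive estimated probability;
    otherwise a ZeroDivisionError is raised.
    """
    # Counts of (x, y) from the large confounded dataset -> P(x, y).
    n = len(confounded)
    xy_counts = Counter(confounded)

    # Within-cell counts from the small deconfounded dataset -> P(z | x, y).
    cell_sizes = Counter((x, y) for x, y, _ in deconfounded)
    xyz_counts = Counter(deconfounded)

    # Joint P(x, y, z) = P(x, y) * P(z | x, y).
    p = {
        (x, y, z): (xy_counts[(x, y)] / n)
        * (xyz_counts[(x, y, z)] / cell_sizes[(x, y)])
        for x, y, z in product((0, 1), repeat=3)
    }

    # Backdoor adjustment: ATE = sum_z P(z) * [P(Y=1|X=1,z) - P(Y=1|X=0,z)].
    ate = 0.0
    for z in (0, 1):
        p_z = sum(p[(x, y, z)] for x, y in product((0, 1), repeat=2))
        p_y1 = [p[(x, 1, z)] / (p[(x, 0, z)] + p[(x, 1, z)]) for x in (0, 1)]
        ate += p_z * (p_y1[1] - p_y1[0])
    return ate


def naive_difference(confounded):
    """Difference in outcome rates ignoring Z (biased when Z confounds X and Y)."""
    counts = Counter(confounded)
    rate = [counts[(x, 1)] / (counts[(x, 0)] + counts[(x, 1)]) for x in (0, 1)]
    return rate[1] - rate[0]


def expand(cell_counts):
    """Turn a {record: count} mapping into a flat list of records."""
    return [rec for rec, k in cell_counts.items() for _ in range(k)]


# Hypothetical toy model: P(z=1) = 0.5, P(x=1|z=0) = 0.2, P(x=1|z=1) = 0.8,
# P(y=1|x,z) = 0.2 + 0.3x + 0.4z, so the true ATE is 0.3.
confounded = expand({(0, 0): 36, (0, 1): 14, (1, 0): 9, (1, 1): 41})

# 66 deconfounded samples: per-cell sizes do NOT follow P(x, y), but the
# within-cell Z proportions match P(z | x, y).
deconfounded = expand({
    (0, 0, 0): 8, (0, 0, 1): 1,
    (0, 1, 0): 4, (0, 1, 1): 3,
    (1, 0, 0): 5, (1, 0, 1): 4,
    (1, 1, 0): 5, (1, 1, 1): 36,
})

print(round(naive_difference(confounded), 3))            # 0.54
print(round(estimate_ate(confounded, deconfounded), 3))  # 0.3
```

Because P(z | x, y) is estimated separately within each (x, y) cell, the deconfounded sample need not follow the natural distribution of (x, y). In the toy data its cell sizes differ from P(x, y), yet the plug-in estimate still recovers the true ATE, while the naive comparison does not. This freedom to choose which (x, y) cells to deconfound appears to be what selective deconfounding exploits. The specific allocation rules studied in the paper are not reproduced here.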