@article{hartford2016counterfactual,
abstract = {We are in the middle of a remarkable rise in the use and capability of
artificial intelligence. Much of this growth has been fueled by the success of
deep learning architectures: models that map from observables to outputs via
multiple layers of latent representations. These deep learning algorithms are
effective tools for unstructured prediction, and they can be combined in AI
systems to solve complex automated reasoning problems. This paper provides a
recipe for combining ML algorithms to solve for causal effects in the presence
of instrumental variables -- sources of treatment randomization that are
conditionally independent from the response. We show that a flexible IV
specification resolves into two prediction tasks that can be solved with deep
neural nets: a first-stage network for treatment prediction and a second-stage
network whose loss function involves integration over the conditional treatment
distribution. This Deep IV framework imposes some specific structure on the
stochastic gradient descent routine used for training, but it is general enough
that we can take advantage of off-the-shelf ML capabilities and avoid extensive
algorithm customization. We outline how to obtain out-of-sample causal
validation in order to avoid over-fit. We also introduce schemes for both
Bayesian and frequentist inference: the former via a novel adaptation of
dropout training, and the latter via a data splitting routine.},
added-at = {2019-05-23T04:11:21.000+0200},
author = {Hartford, Jason and Lewis, Greg and Leyton-Brown, Kevin and Taddy, Matt},
biburl = {https://www.bibsonomy.org/bibtex/23b62017e14811c63a0aa466cf747a808/kirk86},
description = {[1612.09596] Counterfactual Prediction with Deep Instrumental Variables Networks},
interhash = {9e1ae383e078018fa8ac824153a70a40},
intrahash = {3b62017e14811c63a0aa466cf747a808},
keywords = {causal-analysis deep-learning},
note = {cite arxiv:1612.09596},
timestamp = {2019-05-23T04:11:21.000+0200},
title = {Counterfactual Prediction with Deep Instrumental Variables Networks},
url = {http://arxiv.org/abs/1612.09596},
year = 2016
}