Abstract
This work explores the idea of a causal contextual multi-armed bandit
approach to automated marketing, where we estimate and optimize the causal
(incremental) effects. Focusing on the causal effect leads to better return on
investment (ROI) by targeting only the persuadable customers who wouldn't have
taken the action organically. Our approach draws on strengths of causal
inference, uplift modeling, and multi-armed bandits. It optimizes causal
treatment effects rather than raw outcomes, and incorporates counterfactual
generation into data collection. Following uplift modeling results, we
optimize over the incremental business metric. Multi-armed bandit methods allow
us to scale to multiple treatments and to perform off-policy policy evaluation
on logged data. The Thompson sampling strategy in particular enables
exploration of treatments on similar customer contexts and materialization of
counterfactual outcomes. Preliminary offline experiments on a retail fashion
marketing dataset show the merits of our proposal.
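
To make the idea of Thompson sampling over incremental effects concrete, here is a minimal sketch. It is not the authors' implementation: it drops the customer context entirely, assumes binary conversion outcomes with Beta(1, 1) priors, and treats arm 0 as a "no treatment" control. The names `UpliftThompsonSampler`, `simulate`, and the optional `costs` parameter are hypothetical, introduced only for illustration.

```python
import random


class UpliftThompsonSampler:
    """Beta-Bernoulli Thompson sampling that scores arms by uplift over control.

    Simplifying assumptions (not from the source): no context features,
    binary outcomes, arm 0 is the control, optional per-arm treatment costs.
    """

    def __init__(self, n_arms, costs=None, seed=0):
        self.rng = random.Random(seed)
        self.alpha = [1.0] * n_arms  # 1 + observed conversions per arm
        self.beta = [1.0] * n_arms   # 1 + observed non-conversions per arm
        self.costs = list(costs) if costs is not None else [0.0] * n_arms

    def select_arm(self):
        # Draw one plausible conversion rate per arm from its posterior.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        control = samples[0]
        # Score = sampled incremental conversion minus cost; control scores 0
        # when its cost is 0, so a treatment is chosen only if its sampled
        # uplift is positive after cost.
        scores = [s - control - c for s, c in zip(samples, self.costs)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def estimated_uplift(self):
        # Posterior-mean conversion rate of each arm minus that of control.
        means = [a / (a + b) for a, b in zip(self.alpha, self.beta)]
        return [m - means[0] for m in means]


def simulate(true_rates, rounds=3000, seed=0):
    """Run the sampler against synthetic Bernoulli arms; return (pulls, sampler)."""
    env = random.Random(seed + 1)  # separate stream for simulated customers
    sampler = UpliftThompsonSampler(len(true_rates), seed=seed)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = sampler.select_arm()
        converted = env.random() < true_rates[arm]
        sampler.update(arm, converted)
        pulls[arm] += 1
    return pulls, sampler


if __name__ == "__main__":
    # Synthetic rates: control 10%, a weak treatment 12%, a strong one 30%.
    pulls, sampler = simulate([0.10, 0.12, 0.30])
    print("pulls per arm:", pulls)
    print("estimated uplift vs. control:", sampler.estimated_uplift())
```

Because every arm, including control, keeps being sampled while its posterior is uncertain, the logged data contain outcomes for both treated and untreated customers, which is what makes the uplift estimate possible. A contextual version would replace the per-arm Beta posteriors with a model conditioned on customer features.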