Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy.

CoRR (2018)
