Abstract
We present a study in Distributed Deep Reinforcement Learning (DDRL) focused
on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm
known as Batch Asynchronous Advantage Actor-Critic (BA3C). We show that using
the Adam optimization algorithm with a batch size of up to 2048 is a viable
choice for carrying out large-scale machine learning computations. This,
combined with a careful reexamination of the optimizer's hyperparameters, using
synchronous training on the node level (while keeping the local, single-node
part of the algorithm asynchronous), and minimizing the memory footprint of the
model, allowed us to achieve linear scaling for up to 64 CPU nodes. This
corresponds to a training time of 21 minutes on 768 CPU cores, as opposed to 10
hours for a baseline single-node implementation running on 24 cores.