Abstract
We address the problem of speeding up the training of convolutional neural
networks by studying a distributed method adapted to stochastic gradient
descent. Our parallel optimization setup uses several threads, each applying
individual gradient descent on a local variable. We propose a new way of
sharing information between threads, based on gossip algorithms that exhibit
good consensus convergence properties. Our method, called GoSGD, has the
advantage of being fully asynchronous and decentralized.
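To illustrate the consensus primitive the abstract refers to, the following is a minimal sketch of pairwise gossip averaging between workers. This is only an illustration of the general gossip idea, not the paper's exact GoSGD update rule; the worker values, pairing scheme, and iteration count are assumptions for the example.

```python
import random

def gossip_step(params, i, j):
    """Average the local variables of workers i and j in place.

    Repeated random pairwise exchanges like this drive all workers
    toward consensus at the global average, without any central server.
    """
    avg = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
    params[i] = list(avg)
    params[j] = list(avg)

# Each worker holds its own local copy of the model parameter
# (a single scalar here, for clarity).
params = [[0.0], [4.0], [8.0]]

random.seed(0)
for _ in range(200):
    # Pick a random pair of workers and let them exchange/average.
    i, j = random.sample(range(len(params)), 2)
    gossip_step(params, i, j)

# All local variables are now (numerically) at the global mean, 4.0.
print(params)
```

In GoSGD this kind of exchange is interleaved with each thread's local gradient steps, so the threads stay close to consensus while optimizing asynchronously.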