Abstract
This paper introduces multiplicative LSTM, a novel hybrid recurrent neural
network architecture for sequence modelling that combines the long short-term
memory (LSTM) and multiplicative recurrent neural network architectures.
Multiplicative LSTM is motivated by the flexibility of having very different
recurrent transition functions for each possible input, which we argue makes
it more expressive for autoregressive density estimation. We show
empirically that multiplicative LSTM outperforms standard LSTM and its deep
variants on a range of character-level language modelling tasks, and that this
improvement grows as the complexity of the task scales up. This model
achieves a test error of 1.19 bits/character on the last 4 million characters
of the Hutter prize dataset when combined with dynamic evaluation.
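The core idea of multiplicative LSTM is to form an intermediate state as an elementwise product of projections of the current input and the previous hidden state, and to feed that state into the usual LSTM gates, so the effective recurrent transition changes with each input. A minimal NumPy sketch of one such step is given below; biases are omitted and the parameter names (`Wmx`, `Wmh`, etc.) are our own labels for illustration, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(x, h_prev, c_prev, params):
    """One multiplicative LSTM step (illustrative sketch, biases omitted).

    The intermediate state m depends multiplicatively on both the current
    input and the previous hidden state, so the recurrent transition
    applied to h_prev differs for each possible input.
    """
    p = params
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h_prev)       # input-dependent recurrent state
    h_hat = np.tanh(p["Whx"] @ x + p["Whm"] @ m)   # candidate update
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m)       # input gate
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m)       # forget gate
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m)       # output gate
    c = f * c_prev + i * h_hat                     # new cell state
    h = np.tanh(c) * o                             # new hidden state
    return h, c
```

A standard LSTM step would instead compute its gates directly from `h_prev`; replacing `h_prev` with the multiplicative state `m` is what gives the hybrid its per-input flexibility.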