Abstract
Long Short-Term Memory (LSTM) is a recurrent neural network (RNN)
architecture that has been designed to address the vanishing and exploding
gradient problems of conventional RNNs. Unlike feedforward neural networks,
RNNs have cyclic connections making them powerful for modeling sequences. They
have been successfully used for sequence labeling and sequence prediction
tasks, such as handwriting recognition, language modeling, phonetic labeling of
acoustic frames. However, in contrast to the deep neural networks, the use of
RNNs in speech recognition has been limited to phone recognition in small scale
tasks. In this paper, we present novel LSTM based RNN architectures which make
more effective use of model parameters to train acoustic models for large
vocabulary speech recognition. We train and compare LSTM, RNN and DNN models at
various numbers of parameters and configurations. We show that LSTM models
converge quickly and give state of the art speech recognition performance for
relatively small sized models.
Users
Please
log in to take part in the discussion (add own reviews or comments).