Abstract
One advantage of spiking recurrent neural networks (SNNs) is the ability
to categorise data using a synchrony-based latching mechanism. This
is particularly useful in problems where timewarping is encountered,
such as speech recognition. Differentiable recurrent neural networks
(RNNs) by contrast fail at tasks involving difficult timewarping,
despite having sequence learning capabilities superior to SNNs. In
this paper we demonstrate that Long Short-Term Memory (LSTM) is an
RNN capable of robustly categorising timewarped speech data, thus
combining the most useful features of both paradigms. We compare
its performance to SNNs on two variants of a spoken digit identification
task, using data from an international competition. The first task,
described in Nature (Nadis 2003), required the categorisation of
spoken digits with only a single training exemplar, and was specifically
designed to test robustness to timewarping. Here LSTM performed better
than all the SNNs in the competition. The second task was to identify
spoken digits using a larger training set. Here LSTM greatly outperformed
an SNN-like model found in the literature. These results suggest
that LSTM has a place in domains that require the learning of large
timewarped datasets, such as automatic speech recognition.
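As a rough illustration of the setting (this sketch is not taken from the paper: the synthetic signals, network sizes and hyperparameters are all invented), the following PyTorch example trains an LSTM to classify one-dimensional sequences whose durations are randomly warped, which is the essence of the timewarping robustness claimed above.

# Hypothetical sketch: an LSTM classifier on randomly time-warped
# synthetic sequences. Class identity is carried by the shape of a
# 1-D signal; warping changes only its duration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_example(label, base_len=50):
    # Each class is a sinusoid with a class-specific frequency.
    t = torch.linspace(0, 1, base_len)
    x = torch.sin(2 * torch.pi * (label + 1) * t)
    # Random time-warp: resample to a random length (0.5x .. 2x).
    new_len = int(base_len * (0.5 + 1.5 * torch.rand(1).item()))
    x = F.interpolate(x.view(1, 1, -1), size=new_len,
                      mode="linear", align_corners=False)
    return x.view(-1, 1), label  # (new_len, 1) sequence, class index

class LSTMClassifier(nn.Module):
    def __init__(self, n_in=1, n_hidden=32, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):          # x: (1, T, n_in), any length T
        _, (h, _) = self.lstm(x)   # final hidden state summarises the sequence
        return self.out(h[-1])     # (1, n_classes) logits

model = LSTMClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(500):
    label = torch.randint(0, 4, (1,)).item()
    x, y = make_example(label)
    logits = model(x.unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([y]))
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the LSTM's final hidden state summarises the whole input regardless of its length, the classifier sees warped variants of each class without any explicit alignment or resampling step at test time.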