Techreport

Comparing LSTM Recurrent Networks and Spiking Recurrent Networks on the Recognition of Spoken Digits

, , and .
IDSIA-13-03. IDSIA, www.idsia.ch/techrep.html, May 2003.

Abstract

One advantage of spiking recurrent neural networks (SNNs) is an ability to categorise data using a synchrony-based latching mechanism. This is particularly useful in problems where timewarping is encountered, such as speech recognition. Differentiable recurrent neural networks (RNNs) by contrast fail at tasks involving difficult timewarping, despite having sequence learning capabilities superior to SNNs. In this paper we demonstrate that Long Short-Term Memory (LSTM) is an RNN capable of robustly categorising timewarped speech data, thus combining the most useful features of both paradigms. We compare its performance to SNNs on two variants of a spoken digit identification task, using data from an international competition. The first task (described in Nature (Nadis 2003)) required the categorisation of spoken digits with only a single training exemplar, and was specifically designed to test robustness to timewarping. Here LSTM performed better than all the SNNs in the competition. The second task was to predict spoken digits using a larger training set. Here LSTM greatly outperformed an SNN-like model found in the literature. These results suggest that LSTM has a place in domains that require the learning of large timewarped datasets, such as automatic speech recognition.
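The abstract's central claim is that a gated RNN can latch onto evidence for a digit class regardless of how the utterance is locally stretched or compressed. As a purely illustrative sketch (not the authors' implementation; the numpy formulation, layer sizes, and all parameter names are assumptions), the following shows a single-layer LSTM forward pass classifying a sequence of acoustic feature frames into one of ten digit classes:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(frames, params):
    """frames: (T, n_in) array of feature frames; returns digit class probabilities."""
    W, U, b, W_out, b_out = params       # input weights, recurrent weights, biases, readout
    n_hid = U.shape[1]
    h = np.zeros(n_hid)                  # hidden state
    c = np.zeros(n_hid)                  # cell state: the memory that bridges timewarped gaps
    for x in frames:
        z = W @ x + U @ h + b            # pre-activations for all four gates at once
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
        c = f * c + i * np.tanh(g)       # gated cell update
        h = o * np.tanh(c)
    logits = W_out @ h + b_out           # classify from the final hidden state
    e = np.exp(logits - logits.max())
    return e / e.sum()                   # softmax over the 10 digits

# Example with made-up sizes: 12 MFCC-like features, 20 hidden units, 50 frames.
rng = np.random.default_rng(0)
n_in, n_hid, n_cls, T = 12, 20, 10, 50
params = (rng.standard_normal((4 * n_hid, n_in)) * 0.1,
          rng.standard_normal((4 * n_hid, n_hid)) * 0.1,
          np.zeros(4 * n_hid),
          rng.standard_normal((n_cls, n_hid)) * 0.1,
          np.zeros(n_cls))
print(lstm_classify(rng.standard_normal((T, n_in)), params))

The forget-gated cell state c is the mechanism relevant to the paper: it can hold evidence across stretched or compressed segments of the input, which is the timewarping robustness the tasks above were designed to probe.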
