Abstract
This paper presents an algorithm for continuous speech recognition
built from two Long Short-Term Memory (LSTM) recurrent neural networks.
The first LSTM network performs frame-level phone probability estimation;
the second maps these phone predictions onto words. In contrast
to HMMs, this architecture can exploit long-timescale correlations in the speech signal.
Simulation results are presented for a hand-segmented subset of the
"Numbers-95" database. These results include isolated phone prediction,
continuous frame-level phone prediction and continuous word prediction.
We conclude that despite its early stage of development, our new
model is already competitive with existing approaches on certain
aspects of speech recognition and promising on others, warranting
further research.
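The first stage described above, frame-level phone probability estimation, amounts to running an LSTM over acoustic feature frames and applying a softmax over phone classes at each step. The following is an illustrative NumPy sketch of that idea, not the paper's implementation; all function names, weight shapes, and dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; gates are computed jointly from [x; h_prev].

    W has shape (4*H, D+H): rows hold input, forget, and output gates
    plus the cell candidate, stacked in that order.
    """
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # cell candidate
    c = f * c_prev + i * g     # new cell state
    h = o * np.tanh(c)         # new hidden state
    return h, c

def frame_phone_probs(frames, W, b, W_out, b_out, hidden):
    """Run the LSTM over acoustic frames and emit a phone
    probability distribution for every frame."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    probs = []
    for x in frames:
        h, c = lstm_step(x, h, c, W, b)
        probs.append(softmax(W_out @ h + b_out))
    return np.array(probs)  # shape: (num_frames, num_phones)
```

A second network of the same form could then consume these per-frame phone distributions as its input sequence and emit word hypotheses, mirroring the two-stage arrangement the abstract describes.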