P. Gonnet and T. Deselaers. (2019). IndyLSTMs: Independently Recurrent LSTMs. arXiv:1903.08023. Comment: 8 pages, submitted to ICDAR 2019.
Abstract
We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs.
These differ from regular LSTM cells in that the recurrent weights are not
modeled as a full matrix, but as a diagonal matrix, i.e. the output and state
of each LSTM cell depends on the inputs and its own output/state, as opposed to
the input and the outputs/states of all the cells in the layer. The number of
parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is
linear in the number of nodes in the layer, as opposed to quadratic for regular
LSTM layers, resulting in potentially both smaller and faster models. We
evaluate their performance experimentally by training several models on the
popular IAM-OnDB and CASIA online handwriting datasets, as well as on several
of our in-house datasets. We show that IndyLSTMs, despite their smaller size,
consistently outperform regular LSTMs both in terms of accuracy per parameter,
and in best accuracy overall. We attribute this improved performance to the
IndyLSTMs being less prone to overfitting.
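To make the diagonal-recurrence idea concrete, below is a minimal NumPy sketch of a single IndyLSTM time step, written from the abstract's description alone. The function name, shapes, and gate ordering (indy_lstm_step, W, u, b) are illustrative assumptions, not the authors' reference implementation; the point is only that the recurrent term becomes an element-wise product with a per-cell weight vector instead of a full matrix product.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def indy_lstm_step(x, h_prev, c_prev, W, u, b):
    """One time step for a layer of n IndyLSTM cells.

    x      : (d,)      input at time t
    h_prev : (n,)      previous layer output
    c_prev : (n,)      previous cell state
    W      : (4, n, d) input weights for the i, f, o, g gates
    u      : (4, n)    recurrent weights, one scalar per gate and cell
                       (the diagonal of what would be an n x n matrix)
    b      : (4, n)    biases
    """
    # Input contribution: full matrix product, exactly as in a regular LSTM.
    z = W @ x + b                # (4, n)
    # Recurrent contribution: element-wise, so cell j only sees h_prev[j].
    z = z + u * h_prev           # (4, n) times broadcast of (n,)

    i = sigmoid(z[0])            # input gate
    f = sigmoid(z[1])            # forget gate
    o = sigmoid(z[2])            # output gate
    g = np.tanh(z[3])            # candidate cell update

    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new output
    return h, c

# Parameter count per layer with input dimension d and n cells, matching the
# abstract's claim:
#   regular LSTM : 4 * (n*d + n*n + n)  -> quadratic in n (full recurrent matrices)
#   IndyLSTM     : 4 * (n*d + n   + n)  -> linear in n (diagonal recurrence)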
%0 Journal Article
%1 gonnet2019indylstms
%A Gonnet, Pedro
%A Deselaers, Thomas
%D 2019
%K deep-learning memory
%T IndyLSTMs: Independently Recurrent LSTMs
%U http://arxiv.org/abs/1903.08023
%X We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs.
These differ from regular LSTM cells in that the recurrent weights are not
modeled as a full matrix, but as a diagonal matrix, i.e. the output and state
of each LSTM cell depends on the inputs and its own output/state, as opposed to
the input and the outputs/states of all the cells in the layer. The number of
parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is
linear in the number of nodes in the layer, as opposed to quadratic for regular
LSTM layers, resulting in potentially both smaller and faster models. We
evaluate their performance experimentally by training several models on the
popular IAM-OnDB and CASIA online handwriting datasets, as well as on several
of our in-house datasets. We show that IndyLSTMs, despite their smaller size,
consistently outperform regular LSTMs both in terms of accuracy per parameter,
and in best accuracy overall. We attribute this improved performance to the
IndyLSTMs being less prone to overfitting.
@article{gonnet2019indylstms,
abstract = {We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs.
These differ from regular LSTM cells in that the recurrent weights are not
modeled as a full matrix, but as a diagonal matrix, i.e.\ the output and state
of each LSTM cell depends on the inputs and its own output/state, as opposed to
the input and the outputs/states of all the cells in the layer. The number of
parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is
linear in the number of nodes in the layer, as opposed to quadratic for regular
LSTM layers, resulting in potentially both smaller and faster models. We
evaluate their performance experimentally by training several models on the
popular \iamondb and CASIA online handwriting datasets, as well as on several
of our in-house datasets. We show that IndyLSTMs, despite their smaller size,
consistently outperform regular LSTMs both in terms of accuracy per parameter,
and in best accuracy overall. We attribute this improved performance to the
IndyLSTMs being less prone to overfitting.},
added-at = {2019-03-21T00:49:48.000+0100},
author = {Gonnet, Pedro and Deselaers, Thomas},
biburl = {https://www.bibsonomy.org/bibtex/24f29b2003670001bbdafa7689e986f0a/kirk86},
description = {IndyLSTMs: Independently Recurrent LSTMs},
interhash = {cf29590b3ad0a76f6c7d735dc7309c3b},
intrahash = {4f29b2003670001bbdafa7689e986f0a},
keywords = {deep-learning memory},
note = {cite arXiv:1903.08023. Comment: 8 pages, submitted to ICDAR 2019},
timestamp = {2019-03-21T00:49:48.000+0100},
title = {IndyLSTMs: Independently Recurrent LSTMs},
url = {http://arxiv.org/abs/1903.08023},
year = 2019
}