J. Moniz and D. Krueger (2018). arXiv:1801.10308. Comment: Accepted at ACML 2017.
Abstract
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple
levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to
stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell,
which has its own inner memory cell. Specifically, instead of computing the
value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t
\odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t
\odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set
$c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and
single-layer LSTMs with similar numbers of parameters in our experiments on
various character-level language modeling tasks, and the inner memories of an
LSTM learn longer-term dependencies compared with the higher-level units of a
stacked LSTM.
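
To make the cell update concrete, here is a minimal NumPy sketch of a single NLSTM step. It is an illustration built only from the abstract's description, not the authors' code; names such as nlstm_step and init_params are made up for this sketch. The outer gates are computed as in a standard LSTM, but instead of the additive update $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, the pair $(f_t \odot c_{t-1}, i_t \odot g_t)$ is routed into an inner LSTM; one natural reading of "the concatenation as input" is to treat the first term as the inner hidden state and the second as the inner input, after which the inner hidden output becomes the outer cell state, $c^{outer}_t = h^{inner}_t$.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_size, hidden_size, rng):
    # One (weight, bias) pair per gate: input, forget, candidate, output.
    k = input_size + hidden_size
    return {g: (rng.standard_normal((hidden_size, k)) * 0.1,
                np.zeros(hidden_size)) for g in "ifgo"}

def gates(p, x, h):
    z = np.concatenate([x, h])
    i = sigmoid(p["i"][0] @ z + p["i"][1])
    f = sigmoid(p["f"][0] @ z + p["f"][1])
    g = np.tanh(p["g"][0] @ z + p["g"][1])
    o = sigmoid(p["o"][0] @ z + p["o"][1])
    return i, f, g, o

def lstm_step(p, x, h, c):
    # Standard LSTM cell update: c_t = f_t * c_{t-1} + i_t * g_t (elementwise).
    i, f, g, o = gates(p, x, h)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def nlstm_step(outer, inner, x, h_out, c_out, c_in):
    # Outer gates are computed exactly as in a plain LSTM.
    i, f, g, o = gates(outer, x, h_out)
    h_in = f * c_out        # f_t * c_{t-1}: treated as the inner hidden state
    x_in = i * g            # i_t * g_t: treated as the inner input
    h_in_new, c_in_new = lstm_step(inner, x_in, h_in, c_in)
    c_out_new = h_in_new    # c^outer_t = h^inner_t
    return o * np.tanh(c_out_new), c_out_new, c_in_new

rng = np.random.default_rng(0)
D, H = 4, 8                             # toy input and hidden sizes
outer = init_params(D, H, rng)
inner = init_params(H, H, rng)          # inner LSTM operates in cell space
h = c = c_in = np.zeros(H)
for x in rng.standard_normal((5, D)):   # a toy 5-step input sequence
    h, c, c_in = nlstm_step(outer, inner, x, h, c, c_in)
print(h.shape, c.shape, c_in.shape)     # (8,) (8,) (8,)

Note that the only extra state carried across time steps relative to a plain LSTM is the inner cell c_in; nesting deeper, i.e. an NLSTM inside the inner cell as the abstract allows, would replace lstm_step with another nlstm_step.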
@article{moniz2018nested,
abstract = {We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple
levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to
stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell,
which has its own inner memory cell. Specifically, instead of computing the
value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t
\odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t
\odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set
$c^{outer}_t$ = $h^{inner}_t$. Nested LSTMs outperform both stacked and
single-layer LSTMs with similar numbers of parameters in our experiments on
various character-level language modeling tasks, and the inner memories of an
LSTM learn longer-term dependencies compared with the higher-level units of a
stacked LSTM.},
author = {Moniz, Joel Ruben Antony and Krueger, David},
keywords = {memory},
note = {arXiv:1801.10308. Comment: Accepted at ACML 2017},
title = {Nested LSTMs},
url = {http://arxiv.org/abs/1801.10308},
year = 2018
}