We present two approaches that use unlabeled data to improve sequence
learning with recurrent networks. The first approach is to predict what comes
next in a sequence, which is a conventional language model in natural language
processing. The second approach is to use a sequence autoencoder, which reads
the input sequence into a vector and predicts the input sequence again. These
two algorithms can be used as a "pretraining" step for a later supervised
sequence learning algorithm. In other words, the parameters obtained from the
unsupervised step can be used as a starting point for other supervised training
models. In our experiments, we find that long short term memory recurrent
networks after being pretrained with the two approaches are more stable and
generalize better. With pretraining, we are able to train long short term
memory recurrent networks up to a few hundred timesteps, thereby achieving
strong performance in many text classification tasks, such as IMDB, DBpedia and
20 Newsgroups.
%0 Generic
%1 dai2015semisupervised
%A Dai, Andrew M.
%A Le, Quoc V.
%D 2015
%K classification semi-supervised sequence text
%T Semi-supervised Sequence Learning
%U http://arxiv.org/abs/1511.01432
@misc{dai2015semisupervised,
abstract = {We present two approaches that use unlabeled data to improve sequence
learning with recurrent networks. The first approach is to predict what comes
next in a sequence, which is a conventional language model in natural language
processing. The second approach is to use a sequence autoencoder, which reads
the input sequence into a vector and predicts the input sequence again. These
two algorithms can be used as a "pretraining" step for a later supervised
sequence learning algorithm. In other words, the parameters obtained from the
unsupervised step can be used as a starting point for other supervised training
models. In our experiments, we find that long short term memory recurrent
networks after being pretrained with the two approaches are more stable and
generalize better. With pretraining, we are able to train long short term
memory recurrent networks up to a few hundred timesteps, thereby achieving
strong performance in many text classification tasks, such as IMDB, DBpedia and
20 Newsgroups.},
added-at = {2017-10-04T16:31:42.000+0200},
author = {Dai, Andrew M. and Le, Quoc V.},
biburl = {https://www.bibsonomy.org/bibtex/2d16f61a37f1cd9179fdd177abdf3d9ba/daschloer},
description = {Semi-supervised Sequence Learning},
interhash = {64c80f6048e3c96de27b973f0a76d707},
intrahash = {d16f61a37f1cd9179fdd177abdf3d9ba},
keywords = {classification semi-supervised sequence text},
note = {cite arxiv:1511.01432},
timestamp = {2017-10-04T16:31:42.000+0200},
title = {Semi-supervised Sequence Learning},
url = {http://arxiv.org/abs/1511.01432},
year = 2015
}