A. Dai, C. Olah, and Q. Le. (2015). arXiv:1507.07998. Comment: 8 pages.
Abstract
Paragraph Vectors has recently been proposed as an unsupervised method for
learning distributed representations of pieces of text. In their work, the
authors showed that the method can learn an embedding of movie review texts
which can be leveraged for sentiment analysis. That proof of concept, while
encouraging, was rather narrow. Here we consider tasks other than sentiment
analysis, provide a more thorough comparison of Paragraph Vectors to other
document modelling algorithms such as Latent Dirichlet Allocation, and evaluate
performance of the method as we vary the dimensionality of the learned
representation. We benchmarked the models on two document similarity data sets,
one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method
performs significantly better than other methods, and propose a simple
improvement to enhance embedding quality. Somewhat surprisingly, we also show
that much like word embeddings, vector operations on Paragraph Vectors can
produce useful semantic results.
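As context for the document-similarity benchmarks the abstract mentions, below is a minimal sketch of how learned document vectors are typically compared: rank documents by the cosine similarity of their embeddings. The vectors, document names, and helper function here are illustrative assumptions, not taken from the paper; a real system would obtain the vectors from a trained Paragraph Vector model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional embeddings for three documents.
# In practice these would come from a trained Paragraph Vector model.
doc_vectors = {
    "wiki_machine_learning": [0.9, 0.1, 0.2],
    "wiki_deep_learning":    [0.8, 0.2, 0.1],
    "wiki_baroque_music":    [0.1, 0.9, 0.7],
}

# Rank all documents by similarity to a query document.
query = doc_vectors["wiki_machine_learning"]
ranked = sorted(
    ((name, cosine_similarity(query, vec)) for name, vec in doc_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Topically related documents (here, the two machine-learning pages) end up closer in the embedding space than unrelated ones, which is the property the paper's similarity benchmarks measure.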
%0 Generic
%1 dai2015document
%A Dai, Andrew M.
%A Olah, Christopher
%A Le, Quoc V.
%D 2015
%K document embeddings genre2020 nlp paragraph
%T Document Embedding with Paragraph Vectors
%U http://arxiv.org/abs/1507.07998
@misc{dai2015document,
added-at = {2020-07-06T11:22:15.000+0200},
author = {Dai, Andrew M. and Olah, Christopher and Le, Quoc V.},
biburl = {https://www.bibsonomy.org/bibtex/2f37df21cc8d4e46ccbc2ad2a7334c072/schwemmlein},
description = {Document Embedding with Paragraph Vectors},
interhash = {dbf240d8026419bf0be33daf2b754764},
intrahash = {f37df21cc8d4e46ccbc2ad2a7334c072},
keywords = {document embeddings genre2020 nlp paragraph},
note = {arXiv:1507.07998. Comment: 8 pages},
timestamp = {2020-07-06T11:22:15.000+0200},
title = {Document Embedding with Paragraph Vectors},
url = {http://arxiv.org/abs/1507.07998},
year = 2015
}