An Empirical Evaluation of doc2vec with Practical Insights into Document
Embedding Generation
J. Lau, and T. Baldwin. (2016). arXiv:1607.05368. Comment: 1st Workshop on Representation Learning for NLP.
Abstract
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec
(Mikolov et al., 2013a) to learn document-level embeddings. Despite promising
results in the original paper, others have struggled to reproduce those
results. This paper presents a rigorous empirical evaluation of doc2vec over
two tasks. We compare doc2vec to two baselines and two state-of-the-art
document embedding methodologies. We found that doc2vec performs robustly when
using models trained on large external corpora, and can be further improved by
using pre-trained word embeddings. We also provide recommendations on
hyper-parameter settings for general purpose applications, and release source
code to induce document embeddings using our trained doc2vec models.
Description
An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
%0 Generic
%1 lau2016empirical
%A Lau, Jey Han
%A Baldwin, Timothy
%D 2016
%K doc2vec document embeddings genre2020 nlp
%T An Empirical Evaluation of doc2vec with Practical Insights into Document
Embedding Generation
%U http://arxiv.org/abs/1607.05368
%X Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec
(Mikolov et al., 2013a) to learn document-level embeddings. Despite promising
results in the original paper, others have struggled to reproduce those
results. This paper presents a rigorous empirical evaluation of doc2vec over
two tasks. We compare doc2vec to two baselines and two state-of-the-art
document embedding methodologies. We found that doc2vec performs robustly when
using models trained on large external corpora, and can be further improved by
using pre-trained word embeddings. We also provide recommendations on
hyper-parameter settings for general purpose applications, and release source
code to induce document embeddings using our trained doc2vec models.
@misc{lau2016empirical,
abstract = {Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec
(Mikolov et al., 2013a) to learn document-level embeddings. Despite promising
results in the original paper, others have struggled to reproduce those
results. This paper presents a rigorous empirical evaluation of doc2vec over
two tasks. We compare doc2vec to two baselines and two state-of-the-art
document embedding methodologies. We found that doc2vec performs robustly when
using models trained on large external corpora, and can be further improved by
using pre-trained word embeddings. We also provide recommendations on
hyper-parameter settings for general purpose applications, and release source
code to induce document embeddings using our trained doc2vec models.},
added-at = {2020-07-06T10:55:26.000+0200},
author = {Lau, Jey Han and Baldwin, Timothy},
biburl = {https://www.bibsonomy.org/bibtex/2af3c2a80cf9139a541308611ea5b7162/schwemmlein},
description = {An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation},
interhash = {ffa9ea0ab141686bb174a4c165cda0f9},
intrahash = {af3c2a80cf9139a541308611ea5b7162},
keywords = {doc2vec document embeddings genre2020 nlp},
  note = {arXiv:1607.05368; Comment: 1st Workshop on Representation Learning for NLP},
timestamp = {2020-07-06T10:55:26.000+0200},
title = {An Empirical Evaluation of doc2vec with Practical Insights into Document
Embedding Generation},
url = {http://arxiv.org/abs/1607.05368},
year = 2016
}