An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

(2016). arXiv:1607.05368. Comment: 1st Workshop on Representation Learning for NLP.

Abstract

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general purpose applications, and release source code to induce document embeddings using our trained doc2vec models.

Description

Pretrained doc2vec models are available from https://github.com/jhlau/doc2vec
