Analysis of the Paragraph Vector Model for Information Retrieval
Q. Ai, L. Yang, J. Guo, and W. Croft. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 133--142. New York, NY, USA, ACM, (2016)
DOI: 10.1145/2970398.2970409
Abstract
Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.
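For context, a minimal sketch of the setup the abstract describes: training a paragraph vector (PV / doc2vec) model and interpolating its document-query similarity with a traditional language-model retrieval score. This is an illustrative assumption only, using gensim's Doc2Vec (PV-DBOW) as a stand-in for the original PV model; the toy corpus, the interpolation weight lam, and the placeholder lm_score are hypothetical and not the authors' implementation.

# Rough illustration: train PV-DBOW on a toy corpus and interpolate its
# cosine similarity score with a placeholder language-model score.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = {
    "d1": "neural embedding models learn word and text representations",
    "d2": "language model approaches to information retrieval",
}
corpus = [TaggedDocument(words=text.split(), tags=[doc_id])
          for doc_id, text in docs.items()]

# dm=0 selects PV-DBOW, the paragraph vector variant commonly used in retrieval experiments.
model = Doc2Vec(vector_size=50, min_count=1, epochs=40, dm=0)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

def pv_score(query, doc_id):
    # Cosine similarity between the inferred query vector and the stored document vector.
    q = model.infer_vector(query.split())
    d = model.dv[doc_id]
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def lm_score(query, doc_id):
    # Placeholder for a traditional (e.g. query-likelihood) language-model score.
    return 0.0

lam = 0.5  # illustrative interpolation weight, not a value from the paper
query = "embedding models"
for doc_id in docs:
    score = lam * pv_score(query, doc_id) + (1 - lam) * lm_score(query, doc_id)
    print(doc_id, round(score, 3))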
Description
Analysis of the Paragraph Vector Model for Information Retrieval
%0 Conference Paper
%1 Ai:2016:APV:2970398.2970409
%A Ai, Qingyao
%A Yang, Liu
%A Guo, Jiafeng
%A Croft, W. Bruce
%B Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
%C New York, NY, USA
%D 2016
%I ACM
%K doc2vec ma-zehe paragraphvectors
%P 133--142
%R 10.1145/2970398.2970409
%T Analysis of the Paragraph Vector Model for Information Retrieval
%U http://doi.acm.org/10.1145/2970398.2970409
%X Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.
%@ 978-1-4503-4497-5
@inproceedings{Ai:2016:APV:2970398.2970409,
abstract = {Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.},
acmid = {2970409},
added-at = {2016-12-18T14:45:47.000+0100},
address = {New York, NY, USA},
author = {Ai, Qingyao and Yang, Liu and Guo, Jiafeng and Croft, W. Bruce},
biburl = {https://www.bibsonomy.org/bibtex/285d9a1411ddba7ebca06c0ed2b8004d9/albinzehe},
booktitle = {Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval},
description = {Analysis of the Paragraph Vector Model for Information Retrieval},
doi = {10.1145/2970398.2970409},
interhash = {bb8a371ee918861e300d4935d65dd849},
intrahash = {85d9a1411ddba7ebca06c0ed2b8004d9},
isbn = {978-1-4503-4497-5},
keywords = {doc2vec ma-zehe paragraphvectors},
location = {Newark, Delaware, USA},
numpages = {10},
pages = {133--142},
publisher = {ACM},
series = {ICTIR '16},
timestamp = {2016-12-18T14:45:47.000+0100},
title = {Analysis of the Paragraph Vector Model for Information Retrieval},
url = {http://doi.acm.org/10.1145/2970398.2970409},
year = 2016
}