A Minimally Supervised Approach for Synonym Extraction with Word Embeddings

A. Leeuwenberg, M. Vela, J. Dehdari, and J. van Genabith.
The Prague Bulletin of Mathematical Linguistics, (April 2016)

Abstract

In this paper we present a novel approach to minimally supervised synonym extraction. The approach is based on the word embeddings and aims at presenting a method for synonym extraction that is extensible to various languages. We report experiments with word vectors trained by using both the continuous bag-of-words model (CBoW) and the skip-gram model (SG), investigating the effects of different settings with respect to the contextual window size, the number of dimensions and the type of word vectors. We analyze the word categories that are (cosine) similar in the vector space, showing that cosine similarity on its own is a bad indicator to determine if two words are synonymous. In this context, we propose a new measure, relative cosine similarity, for calculating similarity relative to other cosine-similar words in the corpus. We show that calculating similarity relative to other words boosts the precision of the extraction. We also experiment with combining similarity scores from differently-trained vectors and explore the advantages of using a part-of-speech tagger as a way of introducing some light supervision, thus aiding extraction. We perform both intrinsic and extrinsic evaluation on our final system: intrinsic evaluation is carried out manually by two human evaluators and we use the output of our system in a machine translation task for extrinsic evaluation, showing that the extracted synonyms improve the evaluation metric.
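
The abstract names relative cosine similarity but does not spell out a formula, so the following is only a minimal sketch of one plausible reading: the cosine similarity of a word pair divided by the summed cosine similarity of the first word's n most cosine-similar words. The function names, the toy two-dimensional vectors, and the choice n = 2 are illustrative assumptions, not taken from the paper.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_cosine_similarity(word, candidate, vectors, n=10):
    """Assumed definition: cos(word, candidate) divided by the sum of the
    cosine similarities of the n words most cosine-similar to `word`."""
    sims = {
        other: cosine(vectors[word], vec)
        for other, vec in vectors.items()
        if other != word
    }
    top_n = sorted(sims.values(), reverse=True)[:n]
    return sims[candidate] / sum(top_n)


# Hypothetical toy "embeddings", chosen only so the arithmetic is easy to follow.
toy_vectors = {
    "car": [1.0, 0.0],
    "automobile": [1.0, 0.0],  # cosine 1.0 with "car"
    "truck": [3.0, 4.0],       # cosine 0.6 with "car"
    "banana": [0.0, 1.0],      # cosine 0.0 with "car"
}

for other in ("automobile", "truck", "banana"):
    score = relative_cosine_similarity("car", other, toy_vectors, n=2)
    print(other, round(score, 3))
```

With n = 2 the normalizer for "car" is 1.0 + 0.6, so a candidate scores high only when it stands out from the other near neighbours rather than merely having a large raw cosine. Real embeddings can produce negative cosines, which this sketch does not handle specially.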

BibTeX key: noauthororeditor2016minimally
entry type: article
year: 2016
month: April
journal: The Prague Bulletin of Mathematical Linguistics
number: 105
pages: 111-142
language: English
url: https://ufal.mff.cuni.cz/pbml/105/art-leeuwenberg-et-al.pdf

Cite this publication

@article{noauthororeditor2016minimally,
  abstract = {In this paper we present a novel approach to minimally supervised synonym extraction. The approach is based on the word embeddings and aims at presenting a method for synonym extraction that is extensible to various languages. We report experiments with word vectors trained by using both the continuous bag-of-words model (CBoW) and the skip-gram model (SG), investigating the effects of different settings with respect to the contextual window size, the number of dimensions and the type of word vectors. We analyze the word categories that are (cosine) similar in the vector space, showing that cosine similarity on its own is a bad indicator to determine if two words are synonymous. In this context, we propose a new measure, relative cosine similarity, for calculating similarity relative to other cosine-similar words in the corpus. We show that calculating similarity relative to other words boosts the precision of the extraction. We also experiment with combining similarity scores from differently-trained vectors and explore the advantages of using a part-of-speech tagger as a way of introducing some light supervision, thus aiding extraction. We perform both intrinsic and extrinsic evaluation on our final system: intrinsic evaluation is carried out manually by two human evaluators and we use the output of our system in a machine translation task for extrinsic evaluation, showing that the extracted synonyms improve the evaluation metric.},
  added-at = {2017-09-22T10:27:18.000+0200},
  author = {Leeuwenberg, Artuur and Vela, Mihaela and Dehdari, Jon and van Genabith, Josef},
  biburl = {https://www.bibsonomy.org/bibtex/27ec9c6ef72fe81ad22953b1ca8eaa43e/ckiesl},
  interhash = {e5c4ee21fcfb88fba0ace60452a6f2e7},
  intrahash = {7ec9c6ef72fe81ad22953b1ca8eaa43e},
  journal = {The Prague Bulletin of Mathematical Linguistics},
  keywords = {Synonym-Mining Word-Embeddings},
  language = {English},
  month = {April},
  number = 105,
  pages = {111--142},
  timestamp = {2017-09-22T10:27:18.000+0200},
  title = {A Minimally Supervised Approach for Synonym Extraction with Word Embeddings},
  url = {https://ufal.mff.cuni.cz/pbml/105/art-leeuwenberg-et-al.pdf},
  year = 2016
}
