LSX team5 at SemEval-2022 Task 8: Multilingual News Article Similarity Assessment based on Word- and Sentence Mover's Distance
S. Heil, K. Kopp, A. Zehe, K. Kobs, and A. Hotho. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1190--1195. Seattle, United States, Association for Computational Linguistics (July 2022)
DOI: 10.18653/v1/2022.semeval-1.168
Abstract
This paper introduces our submission for the SemEval 2022 Task 8: Multilingual News Article Similarity. The task of the competition consisted of the development of a model capable of determining the similarity between pairs of multilingual news articles. To address this challenge, we evaluated the Word Mover's Distance in conjunction with word embeddings from ConceptNet Numberbatch and term frequencies of WorldLex, as well as the Sentence Mover's Distance based on sentence embeddings generated by pretrained transformer models of Sentence-BERT. To facilitate the comparison of multilingual articles with Sentence-BERT models, we deployed a Neural Machine Translation system. All our models achieve stable results in multilingual similarity estimation without learning parameters.
%0 Conference Paper
%1 heil2022lsxteam5
%A Heil, Stefan
%A Kopp, Karina
%A Zehe, Albin
%A Kobs, Konstantin
%A Hotho, Andreas
%B Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
%C Seattle, United States
%D 2022
%I Association for Computational Linguistics
%K 2022 from:albinzehe mlnlprjak multilingual myown news semeval similarity
%P 1190--1195
%R 10.18653/v1/2022.semeval-1.168
%T LSX team5 at SemEval-2022 Task 8: Multilingual News Article Similarity Assessment based on Word- and Sentence Mover's Distance
%U https://aclanthology.org/2022.semeval-1.168
%X This paper introduces our submission for the SemEval 2022 Task 8: Multilingual News Article Similarity. The task of the competition consisted of the development of a model capable of determining the similarity between pairs of multilingual news articles. To address this challenge, we evaluated the Word Mover's Distance in conjunction with word embeddings from ConceptNet Numberbatch and term frequencies of WorldLex, as well as the Sentence Mover's Distance based on sentence embeddings generated by pretrained transformer models of Sentence-BERT. To facilitate the comparison of multilingual articles with Sentence-BERT models, we deployed a Neural Machine Translation system. All our models achieve stable results in multilingual similarity estimation without learning parameters.
@inproceedings{heil2022lsxteam5,
abstract = {This paper introduces our submission for the SemEval 2022 Task 8: Multilingual News Article Similarity. The task of the competition consisted of the development of a model capable of determining the similarity between pairs of multilingual news articles. To address this challenge, we evaluated the Word Mover{'}s Distance in conjunction with word embeddings from ConceptNet Numberbatch and term frequencies of WorldLex, as well as the Sentence Mover{'}s Distance based on sentence embeddings generated by pretrained transformer models of Sentence-BERT. To facilitate the comparison of multilingual articles with Sentence-BERT models, we deployed a Neural Machine Translation system. All our models achieve stable results in multilingual similarity estimation without learning parameters.},
added-at = {2022-08-09T09:47:08.000+0200},
address = {Seattle, United States},
author = {Heil, Stefan and Kopp, Karina and Zehe, Albin and Kobs, Konstantin and Hotho, Andreas},
biburl = {https://www.bibsonomy.org/bibtex/27ff0e9c8ff222416be5c1277aed4d4ab/hotho},
booktitle = {Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)},
doi = {10.18653/v1/2022.semeval-1.168},
interhash = {aa42530de04debbb227f9cd480dda3cf},
intrahash = {7ff0e9c8ff222416be5c1277aed4d4ab},
keywords = {2022 from:albinzehe mlnlprjak multilingual myown news semeval similarity},
month = jul,
pages = {1190--1195},
publisher = {Association for Computational Linguistics},
timestamp = {2023-01-27T13:13:10.000+0100},
title = {{LSX} team5 at {S}em{E}val-2022 Task 8: Multilingual News Article Similarity Assessment based on Word- and Sentence Mover{'}s Distance},
url = {https://aclanthology.org/2022.semeval-1.168},
year = 2022
}