Inproceedings

Similar Terms Grouping Yields Faster Terminological Saturation

V. Kosa, D. Chaves-Fraga, N. Keberle, and A. Birukou.
Information and Communication Technologies in Education, Research, and Industrial Applications, pages 43--70. Springer International Publishing, Cham (2019).
DOI: https://doi.org/10.1007/978-3-030-13929-2_3

Abstract

This paper reports on the refinement of the algorithm for measuring terminological difference between text datasets (THD). The baseline THD algorithm, developed in the OntoElect project, used exact string matches to compare terms. In this work, it is refined by using appropriately selected string similarity measures (SSMs) to group terms that look similar as text strings and presumably have similar meanings. To determine reasonable term similarity thresholds for several chosen SSMs, the measures were implemented as software functions and evaluated on a purpose-built test set of English term pairs. The refined algorithm implementation was then evaluated against the baseline THD algorithm, using bags of terms extracted from three collections of scientific papers, each belonging to a different subject domain. The experiment revealed that, compared to the baseline, the refined THD algorithm reached terminological saturation faster, on more compact sets of source documents, though at the expense of noticeably higher computation time.
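The abstract contrasts exact string matching of terms with grouping terms by string similarity. The Python sketch below illustrates that contrast under explicit assumptions: `difflib.SequenceMatcher.ratio()` stands in for whichever SSMs the paper actually evaluated, the term scores are made-up toy values, and `term_difference` is a hypothetical normalized difference measure, not the OntoElect THD formula. A threshold of 1.0 reduces it to (case-insensitive) exact matching.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: difflib's ratio() is only a stand-in for the SSMs
    evaluated in the paper.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_difference(old: dict[str, float], new: dict[str, float],
                    threshold: float) -> float:
    """Hypothetical normalized difference between two bags of scored terms.

    Each term in `new` is paired with its most similar term in `old`.
    The pair counts as the same term only if similarity >= threshold;
    otherwise the new term's whole score counts as difference.
    threshold=1.0 reduces this to case-insensitive exact matching.
    """
    total = sum(new.values())
    diff = 0.0
    for term, score in new.items():
        # Best-matching old term and its similarity (None if `old` is empty).
        sim, best = max(((similarity(term, t), t) for t in old),
                        default=(0.0, None))
        if best is not None and sim >= threshold:
            diff += abs(score - old[best])
        else:
            diff += score
    return diff / total if total else 0.0


# Toy bags of terms with made-up scores (not data from the paper).
old = {"ontology": 5.0, "knowledge graph": 3.0}
new = {"ontologies": 5.0, "knowledge graph": 3.0}

print(term_difference(old, new, threshold=1.0))   # exact matching: 0.625
print(term_difference(old, new, threshold=0.75))  # similarity grouping: 0.0
```

In this toy case, the plural "ontologies" counts as new terminology under exact matching but is grouped with "ontology" at threshold 0.75 (their ratio is about 0.78), so the difference drops to zero. This is one plausible mechanism behind the faster saturation the abstract reports, not a reproduction of its experiment; the threshold 0.75 is arbitrary here, whereas the paper derives thresholds per SSM from its test set of term pairs. Comparing every new term against every old term also suggests why similarity grouping costs more computation time than exact lookups.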

BibTeX key: 10.1007/978-3-030-13929-2_3
entry type: inproceedings
address: Cham
booktitle: Information and Communication Technologies in Education, Research, and Industrial Applications
year: 2019
pages: 43--70
publisher: Springer International Publishing
isbn: 978-3-030-13929-2
DOI: https://doi.org/10.1007/978-3-030-13929-2_3

Cite this publication

@inproceedings{10.1007/978-3-030-13929-2_3,
  abstract = {This paper reports on the refinement of the algorithm for measuring terminological difference between text datasets (THD). This baseline THD algorithm, developed in the OntoElect project, used exact string matches for term comparison. In this work, it has been refined by the use of appropriately selected string similarity measures (SSM) for grouping the terms, which look similar as text strings and presumably have similar meanings. To determine rational term similarity thresholds for several chosen SSMs, the measures have been implemented as software functions and evaluated on the developed test set of term pairs in English. Further, the refined algorithm implementation has been evaluated against the baseline THD algorithm. For this evaluation, the bags of terms have been used that had been extracted from the three different document collections of scientific papers, belonging to different subject domains. The experiment revealed that the use of the refined THD algorithm, compared to the baseline, resulted in quicker terminological saturation on more compact sets of source documents, though at an expense of a noticeably higher computation time.},
  added-at = {2019-02-26T12:59:38.000+0100},
  address = {Cham},
  author = {Kosa, Victoria and Chaves-Fraga, David and Keberle, Nataliya and Birukou, Aliaksandr},
  biburl = {https://www.bibsonomy.org/bibtex/2c1f55cc38cfac1e4e5f3eaa0d5baeba0/marianorico},
  booktitle = {Information and Communication Technologies in Education, Research, and Industrial Applications},
  description = {Similar Terms Grouping Yields Faster Terminological Saturation | SpringerLink},
  doi = {10.1007/978-3-030-13929-2_3},
  editor = {Ermolayev, Vadim and Suárez-Figueroa, Mari Carmen and Yakovyna, Vitaliy and Mayr, Heinrich C. and Nikitchenko, Mykola and Spivakovsky, Aleksander},
  interhash = {02033771d864cf63dee7e3ab0e1f48ef},
  intrahash = {c1f55cc38cfac1e4e5f3eaa0d5baeba0},
  isbn = {978-3-030-13929-2},
  keywords = {datos4.0},
  pages = {43--70},
  publisher = {Springer International Publishing},
  timestamp = {2019-02-26T13:39:50.000+0100},
  title = {Similar Terms Grouping Yields Faster Terminological Saturation},
  year = 2019
}
