Document clustering by relevant terms: an approach

C. Reyes-Peña, M. Tovar Vidal, and J. Lavalle Martínez.
Proceedings of the Future Technologies Conference (FTC) 2019, pp. 610--617. Cham, Springer International Publishing, (2020)
DOI: 10.1007/978-3-030-32520-6_44

Abstract

This work presents an approach to document clustering based on relevant terms in an untagged medical text corpus. First, a list of the documents containing each word is created. Then, for relevant term extraction, the frequency of each term is obtained in order to compute the word's weight in the corpus and in each document. Finally, the clusters are built by a mapping that uses the main concepts of an ontology and the relevant terms (subjects only), assuming that two words appearing in the same documents are related. Each resulting cluster has a category corresponding to an ontology concept. The clusters are compared with clusters from K-Means (assuming the K-Means clusters were well formed) using the Overlap Coefficient, obtaining 70% similarity among the clusters.
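Two of the building blocks named in the abstract, the per-word list of documents and the Overlap Coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_inverted_index` and `overlap_coefficient`, the whitespace tokenization, and the toy corpus are assumptions made for illustration, and the paper's term weighting and ontology mapping are not reproduced here because the abstract does not specify them.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it.

    Tokenization by lowercasing and splitting on whitespace is an
    assumption; the abstract does not describe the preprocessing.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def overlap_coefficient(a, b):
    """Overlap coefficient of two sets: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention chosen here for empty sets
    return len(a & b) / min(len(a), len(b))


# Hypothetical toy corpus, not taken from the paper.
docs = {
    "d1": "insulin lowers glucose",
    "d2": "glucose and insulin levels",
    "d3": "fracture of the femur",
}
index = build_inverted_index(docs)

# Words that appear in the same documents are assumed to be related.
related = overlap_coefficient(index["insulin"], index["glucose"])
unrelated = overlap_coefficient(index["insulin"], index["femur"])
```

The same `overlap_coefficient` function could be applied to two clusters represented as sets of document ids, which is one plausible reading of how the obtained clusters are compared with the K-Means clusters.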

BibTeX key: reyes-pena_document_2020
entry type: inproceedings
address: Cham
booktitle: Proceedings of the Future Technologies Conference (FTC) 2019
year: 2020
pages: 610--617
publisher: Springer International Publishing
series: Advances in Intelligent Systems and Computing
shorttitle: Document Clustering by Relevant Terms
isbn: 978-3-030-32520-6
language: en
DOI: 10.1007/978-3-030-32520-6_44

Tags

clustering
