Comparison of semantic and single term similarity measures for clustering Turkish documents
B. Yucesoy and S.G. Oguducu. Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on, pages 393-398. (December 2007)
DOI: 10.1109/ICMLA.2007.52
Abstract
With the rapid growth of the World Wide Web (WWW), it becomes a critical issue to design and organize the vast amounts of on-line documents on the web according to their topic. Even for search engines, it is very important to group similar documents in order to improve their performance when a query is submitted to the system. Clustering is useful for taxonomy design and similarity search of documents on such a domain. Similarity is fundamental to many clustering applications on hypertext. In this paper, we study how measures of similarity are used to cluster a collection of documents on a web site. Most document clustering techniques rely on single term analysis of text, such as the vector space model. To better group related documents, we propose a new semantic similarity measure. We compare our measure with Wu-Palmer similarity and cosine similarity. Experimental results show that cosine similarity performs better than the semantic similarities. We demonstrate our results on Turkish documents. This is a first study that considers the semantic similarities between Turkish documents.
%0 Conference Paper
%1 Yucesoy07semanticSimilarity
%A Yucesoy, B.
%A Oguducu, S.G.
%B Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
%D 2007
%K 07 Yucesoy clustering semantic similarity term turkish
%P 393-398
%R 10.1109/ICMLA.2007.52
%T Comparison of semantic and single term similarity measures for clustering Turkish documents
%U http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=4457262&isnumber=4457184
%X With the rapid growth of the World Wide Web (WWW), it becomes a critical issue to design and organize the vast amounts of on-line documents on the web according to their topic. Even for search engines, it is very important to group similar documents in order to improve their performance when a query is submitted to the system. Clustering is useful for taxonomy design and similarity search of documents on such a domain. Similarity is fundamental to many clustering applications on hypertext. In this paper, we study how measures of similarity are used to cluster a collection of documents on a web site. Most document clustering techniques rely on single term analysis of text, such as the vector space model. To better group related documents, we propose a new semantic similarity measure. We compare our measure with Wu-Palmer similarity and cosine similarity. Experimental results show that cosine similarity performs better than the semantic similarities. We demonstrate our results on Turkish documents. This is a first study that considers the semantic similarities between Turkish documents.
@inproceedings{Yucesoy07semanticSimilarity,
abstract = {With the rapid growth of the World Wide Web (WWW), it becomes a critical issue to design and organize the vast amounts of on-line documents on the web according to their topic. Even for search engines, it is very important to group similar documents in order to improve their performance when a query is submitted to the system. Clustering is useful for taxonomy design and similarity search of documents on such a domain. Similarity is fundamental to many clustering applications on hypertext. In this paper, we study how measures of similarity are used to cluster a collection of documents on a web site. Most document clustering techniques rely on single term analysis of text, such as the vector space model. To better group related documents, we propose a new semantic similarity measure. We compare our measure with Wu-Palmer similarity and cosine similarity. Experimental results show that cosine similarity performs better than the semantic similarities. We demonstrate our results on Turkish documents. This is a first study that considers the semantic similarities between Turkish documents.},
added-at = {2010-01-26T03:41:30.000+0100},
author = {Yucesoy, B. and Oguducu, S.G.},
biburl = {https://www.bibsonomy.org/bibtex/2b87f04dd555f54a9cba00d716ed0902b/lee_peck},
booktitle = {Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on},
doi = {10.1109/ICMLA.2007.52},
interhash = {d5a1574e478339a5de103178d0ac2e1d},
intrahash = {b87f04dd555f54a9cba00d716ed0902b},
keywords = {07 Yucesoy clustering semantic similarity term turkish},
month = dec,
pages = {393--398},
timestamp = {2010-01-26T03:41:30.000+0100},
title = {Comparison of semantic and single term similarity measures for clustering {T}urkish documents},
url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=4457262&isnumber=4457184},
year = 2007
}