Text document information retrieval based on concepts

A. Hamzah. Jurnal Teknologi, 4(1): 45–51 (2011)

Abstract

The huge volume of digital information collected automatically through internet technology has created problems for information retrieval: finding the right information in a large collection is very difficult. In most search engines, the difficulty stems from string-matching algorithms that return a match whenever an exact occurrence of the search term is found. To address this problem, and considering that a document collection is not only a collection of words but also a collection of concepts, we propose a new information retrieval technique based on concepts. The concept-based technique differs from the word-based technique in both indexing and retrieval. During indexing, it assigns documents to concepts extracted from the collection by clustering, building a concept index in addition to the term index. During retrieval, it ranks documents by a combination of term similarity and conceptual similarity, using doc-score = β * conceptScore + (1 - β) * TermScore, where β is the weight of the concept score. The clustering algorithm, Bisecting K-Means, was chosen from the partitional family because its complexity is linear. Two kinds of test collections were used in the experiments: news texts (collections of 1000 and 3000 news documents) and academic articles (1000 abstracts in information technology). Performance was evaluated using average precision and R-precision. The results show that setting β between 0.5 and 0.9 significantly improves precision for the concept-based approach over the word-based approach alone (β = 0): by about 5.2% to 8.3% in average precision and 16.9% to 31.5% in R-precision.
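The ranking rule quoted in the abstract is a linear blend of two scores, and the evaluation reports average precision and R-precision. The Python sketch below shows both, assuming conceptScore and TermScore are already computed for each document and lie on a comparable scale (for example cosine similarities in [0, 1]). The helper names (doc_score, rank, average_precision, r_precision), the document IDs, and the toy scores are hypothetical illustrations, not code or data from the paper. Average precision here is the non-interpolated variant (precision summed at each relevant hit, divided by the number of relevant documents); the paper may use a different variant.

```python
from typing import Dict, List, Set


def doc_score(concept_score: float, term_score: float, beta: float) -> float:
    """doc-score = beta * conceptScore + (1 - beta) * TermScore."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * concept_score + (1.0 - beta) * term_score


def rank(concept: Dict[str, float], term: Dict[str, float], beta: float) -> List[str]:
    """Return document IDs sorted by blended score, highest first (ties broken by ID)."""
    docs = set(concept) | set(term)
    # Assumption: a document missing from one index contributes 0 for that component.
    return sorted(
        docs,
        key=lambda d: (-doc_score(concept.get(d, 0.0), term.get(d, 0.0), beta), d),
    )


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: sum of precision@k at each relevant hit, over |relevant|."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for d in ranking[:r] if d in relevant) / r


# Hypothetical toy scores for three documents; d1 and d3 are relevant.
concept_scores = {"d1": 0.9, "d2": 0.2, "d3": 0.6}
term_scores = {"d1": 0.1, "d2": 0.8, "d3": 0.5}
relevant_docs = {"d1", "d3"}

if __name__ == "__main__":
    for beta in (0.0, 0.7):
        ranking = rank(concept_scores, term_scores, beta)
        print(beta, ranking,
              f"AP={average_precision(ranking, relevant_docs):.3f}",
              f"R-prec={r_precision(ranking, relevant_docs):.3f}")
```

With β = 0 (term score only) the toy ranking puts the irrelevant d2 first; at β = 0.7 the concept score lifts both relevant documents to the top. This only illustrates how β shifts a ranking; it does not reproduce the reported gains.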

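Bisecting K-Means, the clustering step named in the abstract, builds k clusters top-down: it starts with one cluster holding every document and repeatedly splits one cluster in two with ordinary 2-means until k clusters exist. In the abstract, the resulting clusters serve as the concepts behind the concept index. The sketch below is a generic version under assumptions the abstract does not state: documents are dense vectors (for example TF-IDF rows), the largest cluster is split next, each split keeps the best of a few random 2-means trials by sum of squared errors (SSE), and distance is Euclidean. The function names and parameters (bisecting_kmeans, trials, seed) are hypothetical, and the paper's exact choices may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def _two_means(points: List[Vector], rng: random.Random,
               iters: int = 20) -> Optional[Tuple[Tuple[List[int], List[int]], float]]:
    """Lloyd's 2-means on one cluster; returns (groups of local indices, SSE) or None."""
    seeds = rng.sample(range(len(points)), 2)
    centroids = [list(points[s]) for s in seeds]
    for _ in range(iters):
        groups: Tuple[List[int], List[int]] = ([], [])
        for i, p in enumerate(points):
            nearest = 0 if _sq_dist(p, centroids[0]) <= _sq_dist(p, centroids[1]) else 1
            groups[nearest].append(i)
        if not groups[0] or not groups[1]:
            return None  # degenerate split, e.g. both seeds were identical points
        new_centroids = [_mean([points[i] for i in g]) for g in groups]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(_sq_dist(points[i], centroids[c]) for c, g in enumerate(groups) for i in g)
    return groups, sse


def bisecting_kmeans(points: List[Vector], k: int, trials: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Return k clusters, each a list of indices into `points`."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        # Assumption: split the largest cluster (it always has >= 2 members here).
        target = max(clusters, key=len)
        clusters.remove(target)
        sub = [points[i] for i in target]
        best = None
        for _ in range(trials):
            result = _two_means(sub, rng)
            if result is not None and (best is None or result[1] < best[1]):
                best = result
        if best is None:  # every trial degenerated: fall back to halving the cluster
            half = len(target) // 2
            halves = (list(range(half)), list(range(half, len(target))))
        else:
            halves = best[0]
        clusters.extend([[target[i] for i in h] for h in halves])
    return clusters
```

Splitting the largest cluster is one common policy; splitting the cluster with the highest SSE is another. For a fixed k, number of trials, and iteration cap, the total work grows linearly with the number of documents, which matches the abstract's remark about linear complexity.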
Links and resources

BibTeX key: hamzah_text_2011
entry type: article
year: 2011
journal: Jurnal Teknologi
number: 1
pages: 45–51
volume: 4
Document: http://jurtek.akprind.ac.id/sites/default/files/45-51_amir.pdf

Cite this publication

@article{hamzah_text_2011,
  abstract = {The huge volume of digital information collected automatically through internet technology has created problems for information retrieval: finding the right information in a large collection is very difficult. In most search engines, the difficulty stems from string-matching algorithms that return a match whenever an exact occurrence of the search term is found. To address this problem, and considering that a document collection is not only a collection of words but also a collection of concepts, we propose a new information retrieval technique based on concepts. The concept-based technique differs from the word-based technique in both indexing and retrieval. During indexing, it assigns documents to concepts extracted from the collection by clustering, building a concept index in addition to the term index. During retrieval, it ranks documents by a combination of term similarity and conceptual similarity, using doc-score = β * conceptScore + (1 - β) * TermScore, where β is the weight of the concept score. The clustering algorithm, Bisecting K-Means, was chosen from the partitional family because its complexity is linear. Two kinds of test collections were used in the experiments: news texts (collections of 1000 and 3000 news documents) and academic articles (1000 abstracts in information technology). Performance was evaluated using average precision and R-precision. The results show that setting β between 0.5 and 0.9 significantly improves precision for the concept-based approach over the word-based approach alone (β = 0): by about 5.2\% to 8.3\% in average precision and 16.9\% to 31.5\% in R-precision.},
  added-at = {2018-11-04T17:00:37.000+0100},
  author = {Hamzah, Amir},
  biburl = {https://www.bibsonomy.org/bibtex/22dd2440928230730e424c89ee086c99a/lepsky},
  interhash = {5fccf46918a352e2088d0b2895627cdd},
  intrahash = {2dd2440928230730e424c89ee086c99a},
  journal = {Jurnal Teknologi},
  keywords = {information_retrieval},
  number = 1,
  pages = {45--51},
  timestamp = {2018-11-07T09:14:29.000+0100},
  title = {Text document information retrieval based on concepts},
  url = {http://jurtek.akprind.ac.id/sites/default/files/45-51_amir.pdf},
  volume = 4,
  year = 2011
}

BibSonomy