Article,

RAID : robust algorithm for stemming text Document

, and .
International Journal of Computer Information Systems and Industrial Management Applications, (2016)

Abstract

In this work, we propose a robust algorithm for automatic indexing unstructured Document. It can detect the most relevant words in an unstructured document. This algorithm is based on two main modules: the first module ensures the processing of compound words and the second allows the detection of the endings of the words that have not been taken into consideration by the approaches presented in literature. The proposed algorithm allows the detection and removal of suffixes and enriches the basis of suffixes by eliminating the suffixes of compound words. We have experienced our algorithm on two bases of words: a standard collection of terms and a medical corpus. The results show the remarkable effectiveness of our algorithm compared to others presented in related works.

Tags

Users

  • @lepsky

Comments and Reviews