Farsi lexical analysis and stop word list

Library Hi Tech, (2009)

Abstract

Purpose: The purpose of this article is to present an aggregated methodology for constructing a stop word list for the Farsi language and to generate a generic Farsi stop word list. Design/methodology/approach: The stop word list is extracted based on syntactic classes, domain dependence, corpus statistics and expert judgment. Some of the main challenges that arise in automatic Farsi text processing are outlined as well. Findings: Results from the techniques are aggregated and a general Farsi stop word list containing 927 words is generated. Practical implications: The created stop word list can affect the efficiency and effectiveness of the retrieval and indexing processes in Farsi information retrieval systems; moreover, it can play an important role in Farsi text segmentation. Originality/value: Our stop word extraction algorithm is a promising technique; it could be applied to other languages that have ambiguities in automatic text segmentation.
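As a rough illustration of how candidate lists from several sources might be aggregated, the following is a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the aggregation rule, so the simple voting threshold, the function names `frequency_candidates` and `aggregate_stop_words`, the toy corpus, and the hand-picked syntactic and expert lists are all assumptions made for illustration. The input is assumed to be tokenized already, which sidesteps the Farsi segmentation challenges the abstract mentions.

```python
from collections import Counter


def frequency_candidates(documents, top_n):
    """Return the top_n most frequent tokens across a pre-tokenized corpus."""
    counts = Counter(token for doc in documents for token in doc)
    return {token for token, _ in counts.most_common(top_n)}


def aggregate_stop_words(candidate_sets, min_votes):
    """Keep words proposed by at least min_votes candidate sources.

    The voting rule is an assumption; the article may combine sources differently.
    """
    votes = Counter(word for source in candidate_sets for word in set(source))
    return {word for word, n in votes.items() if n >= min_votes}


# Hypothetical toy corpus, already split into tokens.
corpus = [
    ["و", "کتاب", "در", "کتابخانه", "است"],
    ["او", "در", "خانه", "است", "و"],
    ["کتاب", "و", "قلم", "در", "کیف", "است"],
]

# Source 1: corpus statistics (most frequent tokens).
by_frequency = frequency_candidates(corpus, top_n=3)

# Source 2: syntactic classes (hypothetical hand-picked conjunctions and prepositions).
by_syntax = {"و", "در", "از", "به"}

# Source 3: expert judgment (hypothetical list).
by_expert = {"است", "و", "در"}

# A word is kept if at least two of the three sources propose it.
stop_words = aggregate_stop_words([by_frequency, by_syntax, by_expert], min_votes=2)
print(sorted(stop_words))
```

Raising `min_votes` makes the resulting list more conservative, while lowering it to 1 yields the union of all sources.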
