Article,

A novel semantic level text classification by combining NLP and thesaurus concepts

R. Nagaraj, V. Thiagarasu, and P. Vijayakumar.
IOSR Journal of Computer Engineering (IOSR-JCE), 16 (4): 14--26 (2014)

Abstract

Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive or simply not feasible given the time constraints of the application or the number of documents involved. Previous approaches represent documents at the semantic level using only the Wikipedia concepts related to terms at the syntactic level. This paper proposes a new approach that also uses WordNet for the semantic-level representation. The semantic weights of terms related to concepts from Wikipedia and WordNet are used to represent semantic information. The semantic vector space model of terms obtained by combining WordNet and Wikipedia further improves the accuracy of text classification, because two different concept extractors supply the concepts related to the terms at the syntactic level, which helps to find a better concept vector space for documents. We therefore obtain improved classification with this approach. This study also presents the classification framework. In this framework, the primary information is effectively kept and noise is reduced by compressing the original information, so that the framework can guarantee the quality of the input to all classifiers. Introducing Wikipedia together with WordNet can further improve the performance of the classification framework. We find that the proposed approach results in high classification accuracy.
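The abstract describes building a concept vector from two concept extractors. A minimal Python sketch of that idea follows. It is not the authors' implementation: the two lookup tables, their concept names and weights, and the function `concept_vector` are all hypothetical stand-ins, and the weighting (term count times concept weight, summed over extractors) is an assumption chosen for illustration.

```python
from collections import Counter

# Hypothetical toy lookup tables standing in for the two concept extractors.
# Each maps a term to (concept, weight) pairs; names and weights are invented.
WIKIPEDIA_CONCEPTS = {
    "jaguar": [("Jaguar_Cars", 0.6), ("Jaguar_(animal)", 0.4)],
    "engine": [("Internal_combustion_engine", 1.0)],
}
WORDNET_CONCEPTS = {
    "jaguar": [("big_cat.n.01", 1.0)],
    "engine": [("motor.n.01", 0.8), ("machine.n.01", 0.2)],
}


def concept_vector(tokens, *extractors):
    """Map term counts to concept weights, summing over all extractors."""
    term_counts = Counter(tokens)
    vector = Counter()
    for term, count in term_counts.items():
        for extractor in extractors:
            # Terms unknown to an extractor contribute nothing.
            for concept, weight in extractor.get(term, []):
                vector[concept] += count * weight
    return dict(vector)


doc = "jaguar engine jaguar".split()
wiki_only = concept_vector(doc, WIKIPEDIA_CONCEPTS)
combined = concept_vector(doc, WIKIPEDIA_CONCEPTS, WORDNET_CONCEPTS)
```

In this sketch `combined` contains every concept in `wiki_only` plus the WordNet-derived ones, which is one way to read the claim that two extractors yield a richer concept vector space than one.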

BibTeX key: nagaraj_novel_2014
entry type: article
year: 2014
journal: IOSR Journal of Computer Engineering (IOSR-JCE)
number: 4
pages: 14--26
volume: 16
Document: http://www.iosrjournals.org/iosr-jce/papers/Vol16-issue4/Version-6/C016461426.pdf

Cite this publication

%0 Journal Article
%1 nagaraj_novel_2014
%A Nagaraj, R
%A Thiagarasu, V
%A Vijayakumar, P
%D 2014
%J IOSR Journal of Computer Engineering (IOSR-JCE)
%K thesauri automatisches_klassifizieren
%N 4
%P 14--26
%T A novel semantic level text classification by combining NLP and thesaurus concepts
%U http://www.iosrjournals.org/iosr-jce/papers/Vol16-issue4/Version-6/C016461426.pdf
%V 16
%X Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive or simply not feasible given the time constraints of the application or the number of documents involved. Previous approaches represent documents at the semantic level using only the Wikipedia concepts related to terms at the syntactic level. This paper proposes a new approach that also uses WordNet for the semantic-level representation. The semantic weights of terms related to concepts from Wikipedia and WordNet are used to represent semantic information. The semantic vector space model of terms obtained by combining WordNet and Wikipedia further improves the accuracy of text classification, because two different concept extractors supply the concepts related to the terms at the syntactic level, which helps to find a better concept vector space for documents. We therefore obtain improved classification with this approach. This study also presents the classification framework. In this framework, the primary information is effectively kept and noise is reduced by compressing the original information, so that the framework can guarantee the quality of the input to all classifiers. Introducing Wikipedia together with WordNet can further improve the performance of the classification framework. We find that the proposed approach results in high classification accuracy.

@article{nagaraj_novel_2014,
  abstract = {Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive or simply not feasible given the time constraints of the application or the number of documents involved. Previous approaches represent documents at the semantic level using only the Wikipedia concepts related to terms at the syntactic level. This paper proposes a new approach that also uses WordNet for the semantic-level representation. The semantic weights of terms related to concepts from Wikipedia and WordNet are used to represent semantic information. The semantic vector space model of terms obtained by combining WordNet and Wikipedia further improves the accuracy of text classification, because two different concept extractors supply the concepts related to the terms at the syntactic level, which helps to find a better concept vector space for documents. We therefore obtain improved classification with this approach. This study also presents the classification framework. In this framework, the primary information is effectively kept and noise is reduced by compressing the original information, so that the framework can guarantee the quality of the input to all classifiers. Introducing Wikipedia together with WordNet can further improve the performance of the classification framework. We find that the proposed approach results in high classification accuracy.},
  added-at = {2018-11-04T17:02:36.000+0100},
  author = {Nagaraj, R and Thiagarasu, V and Vijayakumar, P},
  biburl = {https://www.bibsonomy.org/bibtex/2244b0dce2c7bd1b1c5dfec5f9123a4f6/lepsky},
  interhash = {32e77586afa4a4d0015f612cc39a86d7},
  intrahash = {244b0dce2c7bd1b1c5dfec5f9123a4f6},
  journal = {IOSR Journal of Computer Engineering (IOSR-JCE)},
  keywords = {thesauri automatisches_klassifizieren},
  number = 4,
  pages = {14--26},
  timestamp = {2018-11-06T18:02:54.000+0100},
  title = {A novel semantic level text classification by combining {NLP} and thesaurus concepts},
  url = {http://www.iosrjournals.org/iosr-jce/papers/Vol16-issue4/Version-6/C016461426.pdf},
  volume = 16,
  year = 2014
}
