Article,

NOVEL TEXT CATEGORIZATION BY AMALGAMATION OF AUGMENTED K-NEAREST NEIGHBOURHOODCLASSIFICATION AND K-MEDOIDS CLUSTERING

.
International Journal of Computational Science and Information Technology (IJCSITY), 1 (4): 1 - 13 (November 20-13)
DOI: 10.5121/ijcsity.2013.1406

Abstract

Machine learning for text classification is the underpinning of document cataloging, news filtering, document steering and exemplification. In text mining realm, effective feature selection is significant to make the learning task more accurate and competent. One of the traditional lazy text classifier k-Nearest Neighborhood (kNN) has a major pitfall in calculating the similarity between all the objects in training and testing sets, there by leads to exaggeration of both computational complexity of the algorithm and massive consumption of main memory. To diminish these shortcomings in viewpoint of a data-mining practitioner an amalgamative technique is proposed in this paper using a novel restructured version of kNN called AugmentedkNN(AkNN) and k-Medoids(kMdd) clustering.The proposed work comprises preprocesses on the initial training set by imposing attribute feature selection for reduction of high dimensionality, also it detects and excludes the high-fliers samples in the initial training set and restructures a constrictedtraining set. The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category and restructures the constricted training set with centroids. This technique is amalgamated with AkNN classifier that was prearranged with text mining similarity measures. Eventually, significantweights and ranks were assigned to each object in the new training set based upon their accessory towards the object in testing set. Experiments conducted on Reuters-21578 a UCI benchmark text mining data set, and comparisons with traditional kNNclassifier designates the referredmethod yieldspreeminentrecitalin both clustering and classification.

Tags

Users

  • @anderson_sam

Comments and Reviews