Abstract
This paper shows how different measures of similarity
derived from the citation information and the
structural content (e.g., title, abstract) of the
collection can be fused to improve classification
effectiveness. To discover the best fusion framework,
we apply Genetic Programming (GP) techniques. Our
experiments with the ACM Computing Classification
Scheme, using documents from the ACM Digital Library,
indicate that GP can discover similarity functions
superior to those based solely on a single type of
evidence. The effectiveness of the discovered similarity
functions, applied through simple majority voting, exceeds
that of both content-based and combination-based
Support Vector Machine classifiers. Experiments were
also conducted to compare the performance of GP
techniques with that of other fusion techniques, such as
Genetic Algorithms (GA) and linear fusion. Empirical
results show that GP discovered better similarity
functions than these other fusion techniques.