Abstract
This paper investigates how citation-based information
and structural content (e.g., title, abstract) can be
combined to improve classification of text documents
into predefined categories. We evaluate eight
similarity measures, five derived from the citation
structure of the collection and three derived from
the structural content, and determine how they can
be fused to improve classification effectiveness. To
discover the best fusion framework, we apply Genetic
Programming (GP) techniques. Our empirical experiments
using documents from the ACM digital library and the
ACM classification scheme show that we can discover
similarity functions that work better than any single
type of evidence in isolation and whose combined
performance under simple majority voting is comparable
to that of Support Vector Machine classifiers.
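
The majority-voting step mentioned above can be illustrated with a minimal sketch. It assumes each classifier built on a discovered similarity function emits one category label per document. The function name majority_vote, the tie-breaking rule, and the example labels are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most classifiers.

    Assumption: ties are broken in favour of the label that appears
    first in `predictions`; the paper does not specify a tie rule.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    # Counter keeps insertion order and max() returns the first maximal
    # item it encounters, so the earliest-seen label wins a tie.
    return max(counts, key=lambda label: counts[label])


# Hypothetical labels from three classifiers for a single document.
votes = ["H.3", "I.2", "H.3"]
winner = majority_vote(votes)
```

Here two of the three hypothetical classifiers agree, so the document is assigned their shared label.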