Abstract
This paper investigates how citation-based information
and structural content (e.g., title, abstract) can be
combined to improve classification of text documents
into predefined categories. We evaluate eight
similarity measures, five derived from the citation
structure of the collection and three derived
from the structural content, and determine how they can
be fused to improve classification effectiveness. To
discover the best fusion framework, we apply Genetic
Programming (GP) techniques. Our empirical experiments
using documents from the ACM digital library and the
ACM classification scheme show that we can discover
similarity functions that outperform any single source of
evidence in isolation and whose combined performance through
simple majority voting is comparable to that of Support
Vector Machine classifiers.
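
As a rough illustration of the final combination step, the sketch below shows simple majority voting over the category labels proposed by several classifiers. It is not the authors' implementation: the function name `majority_vote`, the tie-breaking rule (the first label among those tied wins), and the example labels are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label proposed by the largest number of classifiers.

    `predictions` is a list with one predicted category per classifier.
    Tie-breaking is an assumption of this sketch: among labels with the
    same count, the one that appears first in `predictions` wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so that ties resolve to the earliest label.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical votes from three classifiers, each built on a different
# similarity function; the category labels are illustrative only.
votes = ["H.3", "I.2", "H.3"]
print(majority_vote(votes))
```

Here each vote would come from a classifier built on one discovered similarity function; how those classifiers are trained is outside the scope of this sketch.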