Abstract
This paper addresses the well-known classification
task of data mining, where the goal is to discover
rules predicting the class of examples (records of a
given dataset). In the context of data mining, small
disjuncts are rules that cover a small number of
examples. Hence, these rules tend to be error-prone,
which lowers predictive accuracy.
At first glance, this is not a serious problem, since
the impact on predictive accuracy should be small.
However, although each small disjunct covers few
examples, the set of all small disjuncts can cover a
large number of examples. We present evidence that this
is the case in several datasets. We also address the
problem of small disjuncts with a hybrid
decision-tree/genetic-algorithm approach. In
essence, examples belonging to large disjuncts are
classified by rules produced by a decision-tree
algorithm (C4.5), while examples belonging to small
disjuncts are classified by a genetic algorithm
specifically designed for discovering small-disjunct
rules. We compare the predictive accuracy of this hybrid
system with that of three versions of C4.5 alone on
eight public-domain datasets. Overall, the results show that
our hybrid system achieves better predictive accuracy
than all three versions of C4.5 alone.
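To make the routing idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a binary tree with numeric threshold tests standing in for a C4.5 tree, a hypothetical coverage threshold `s` (a leaf covering at most `s` training examples counts as a small disjunct), and a pre-built list of rules standing in for the genetic algorithm's output. The GA itself is not shown. All names (`Leaf`, `Node`, `Rule`, `small_disjunct_coverage`, `hybrid_predict`) are hypothetical, and the toy tree's coverage counts are illustrative values, not data from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Example = Dict[str, float]


@dataclass
class Leaf:
    """A tree leaf, i.e. one disjunct: its predicted class and training coverage."""
    label: str
    coverage: int  # number of training examples that reached this leaf


@dataclass
class Node:
    """An internal test node: go left if example[attr] <= threshold, else right."""
    attr: str
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


@dataclass
class Rule:
    """A hypothetical small-disjunct rule, standing in for one evolved by a GA."""
    condition: Callable[[Example], bool]
    label: str


def find_leaf(tree: Tree, x: Example) -> Leaf:
    """Follow the tests from the root down to the leaf that covers x."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.attr] <= tree.threshold else tree.right
    return tree


def all_leaves(tree: Tree) -> List[Leaf]:
    """Collect every leaf (disjunct) of the tree."""
    if isinstance(tree, Leaf):
        return [tree]
    return all_leaves(tree.left) + all_leaves(tree.right)


def small_disjunct_coverage(tree: Tree, s: int) -> float:
    """Fraction of training examples that fall into leaves covering <= s examples."""
    leaves = all_leaves(tree)
    total = sum(leaf.coverage for leaf in leaves)
    small = sum(leaf.coverage for leaf in leaves if leaf.coverage <= s)
    return small / total


def hybrid_predict(tree: Tree, small_rules: List[Rule], x: Example, s: int) -> str:
    """Large disjuncts answer with the tree's label; small ones defer to the rules."""
    leaf = find_leaf(tree, x)
    if leaf.coverage > s:
        return leaf.label
    for rule in small_rules:
        if rule.condition(x):
            return rule.label
    return leaf.label  # assumption: fall back to the tree if no rule fires


# Toy tree and rule set (illustrative values, not results from the paper).
tree = Node("x", 5.0,
            Leaf("A", coverage=90),
            Node("y", 2.0, Leaf("B", coverage=3), Leaf("A", coverage=7)))
rules = [Rule(condition=lambda ex: ex["y"] > 3.0, label="B")]

print(small_disjunct_coverage(tree, s=10))  # 0.1
print(hybrid_predict(tree, rules, {"x": 3.0, "y": 0.0}, s=10))  # A (large disjunct)
print(hybrid_predict(tree, rules, {"x": 8.0, "y": 5.0}, s=10))  # B (rule overrides tree)
```

Here the two small leaves together cover 10 of the 100 toy training examples, which illustrates how the share covered by small disjuncts is measured. With real data, that share is what the abstract argues can be large. The fallback to the tree's own label when no rule fires is a design choice of this sketch only.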