Аннотация
One of the basic problems in knowledge discovery in databases (KDD) is
the following: given a data set r, a class L of sentences for defining
subgroups of r, and a selection predicate, find all sentences of L
deemed interesting by the selection predicate. We analyze the simple
levelwise algorithm for finding all such descriptions. We give bounds
for the number of database accesses that the algorithm makes. For
this, we introduce the concept of the border of a theory, a notion
that turns out to be surprisingly powerful in analyzing the algorithm.
We also consider the verification problem of a KDD process: given r
and a set of sentences S ⊆ L, determine whether S is exactly the set
of interesting statements about r. We show strong connections between
the verification problem and the hypergraph transversal problem. The
verification problem arises in a natural way when using sampling to
speed up the pattern discovery step in KDD. Keywords: theory of
knowledge discovery, association rules, episodes, integrity
constraints, hypergraph transversals
Пользователи данного ресурса
Пожалуйста,
войдите в систему, чтобы принять участие в дискуссии (добавить собственные рецензию, или комментарий)