Abstract
We present a new quartet heuristic for hierarchical
clustering from a given distance matrix. We determine a
dendrogram (ternary tree) by a new quartet method and a
fast heuristic to implement it. We do not assume that
there is a true ternary tree that generated the
distances and which we with to recover as closely as
possible. Our aim is to model the distance matrix as
faithfully as possible by the dendrogram. Our algorithm
is essentially randomised hill-climbing, using
parallelised Genetic Programming, where undirected
trees evolve in a random walk driven by a prescribed
fitness function. Our method is capable of handling up
to 60--80 objects in a matter of hours, while no
existing quartet heuristic can directly compute a
quartet tree of more than about 20--30 objects without
running for years. The method is implemented and
available as public software at www.complearn.org. We
present applications in many areas like music,
literature, bird-flu (H5N1) virus clustering, and
automatic meaning discovery using Google.
Users
Please
log in to take part in the discussion (add own reviews or comments).