Abstract
Several learning tasks comprise hierarchies. Comparison with a "goldstandard" is often performed to evaluate the quality of a learned hierarchy. We assembled various similarity metrics that have been proposed in different disciplines and compared them in a unified interdisciplinary framework for hierarchical evaluation which is based on the distinction of three fundamental dimensions. Identifying deficiencies for measuring structural similarity, we suggest three new measures for this purpose, either extending existing ones or based on new ideas. Experiments with an artificial dataset were performed to compare the different measures. As shown by our results, the measures vary greatly in their properties.
Users
Please
log in to take part in the discussion (add own reviews or comments).