Abstract
The curse of dimensionality is a phenomenon frequently observed in machine
learning (ML) and knowledge discovery (KD). There is a large body of literature
investigating its origin and impact, using methods from mathematics as well as
from computer science. Among the mathematical insights into data
dimensionality, there is an intimate link between the dimension curse and the
phenomenon of measure concentration, which makes the former accessible to
methods of geometric analysis. The present work provides a comprehensive study
of the intrinsic geometry of a data set, based on Gromov's metric measure
geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we
define a concept of geometric data set and introduce a metric as well as a
partial order on the set of isomorphism classes of such data sets. Based on
these objects, we propose and investigate an axiomatic approach to the
intrinsic dimension of geometric data sets and establish a concrete dimension
function with the desired properties. Our mathematical model for data sets and
their intrinsic dimension is computationally feasible and, moreover, adaptable
to specific ML/KD algorithms, as illustrated by various experiments.