T. Hanika, F. Schneider, and G. Stumme. Accepted for publication in: Tohoku Mathematical Journal, (2020). cite arxiv:1801.07985. Comment: v2: completely rewritten, 28 pages, 3 figures, 2 tables.
Abstract
The curse of dimensionality is a phenomenon frequently observed in machine
learning (ML) and knowledge discovery (KD). There is a large body of literature
investigating its origin and impact, using methods from mathematics as well as
from computer science. Among the mathematical insights into data
dimensionality, there is an intimate link between the dimension curse and the
phenomenon of measure concentration, which makes the former accessible to
methods of geometric analysis. The present work provides a comprehensive study
of the intrinsic geometry of a data set, based on Gromov's metric measure
geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we
define a concept of geometric data set and introduce a metric as well as a
partial order on the set of isomorphism classes of such data sets. Based on
these objects, we propose and investigate an axiomatic approach to the
intrinsic dimension of geometric data sets and establish a concrete dimension
function with the desired properties. Our mathematical model for data sets and
their intrinsic dimension is computationally feasible and, moreover, adaptable
to specific ML/KD-algorithms, as illustrated by various experiments.
%0 Journal Article
%1 hanika2018intrinsic
%A Hanika, Tom
%A Schneider, Friedrich Martin
%A Stumme, Gerd
%D 2020
%J Accepted for publication in: Tohoku Mathematical Journal
%K 2020 data fca geometry itegpub kde kdepub mm-space myown publist
%T Intrinsic Dimension of Geometric Data Sets
%U http://arxiv.org/abs/1801.07985
%X The curse of dimensionality is a phenomenon frequently observed in machine
learning (ML) and knowledge discovery (KD). There is a large body of literature
investigating its origin and impact, using methods from mathematics as well as
from computer science. Among the mathematical insights into data
dimensionality, there is an intimate link between the dimension curse and the
phenomenon of measure concentration, which makes the former accessible to
methods of geometric analysis. The present work provides a comprehensive study
of the intrinsic geometry of a data set, based on Gromov's metric measure
geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we
define a concept of geometric data set and introduce a metric as well as a
partial order on the set of isomorphism classes of such data sets. Based on
these objects, we propose and investigate an axiomatic approach to the
intrinsic dimension of geometric data sets and establish a concrete dimension
function with the desired properties. Our mathematical model for data sets and
their intrinsic dimension is computationally feasible and, moreover, adaptable
to specific ML/KD-algorithms, as illustrated by various experiments.
@article{hanika2018intrinsic,
abstract = {The curse of dimensionality is a phenomenon frequently observed in machine
learning (ML) and knowledge discovery (KD). There is a large body of literature
investigating its origin and impact, using methods from mathematics as well as
from computer science. Among the mathematical insights into data
dimensionality, there is an intimate link between the dimension curse and the
phenomenon of measure concentration, which makes the former accessible to
methods of geometric analysis. The present work provides a comprehensive study
of the intrinsic geometry of a data set, based on Gromov's metric measure
geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we
define a concept of geometric data set and introduce a metric as well as a
partial order on the set of isomorphism classes of such data sets. Based on
these objects, we propose and investigate an axiomatic approach to the
intrinsic dimension of geometric data sets and establish a concrete dimension
function with the desired properties. Our mathematical model for data sets and
their intrinsic dimension is computationally feasible and, moreover, adaptable
to specific ML/KD-algorithms, as illustrated by various experiments.},
added-at = {2020-12-14T09:36:43.000+0100},
author = {Hanika, Tom and Schneider, Friedrich Martin and Stumme, Gerd},
biburl = {https://www.bibsonomy.org/bibtex/23be366138968b0b1762585fe1aa7aecc/stumme},
description = {Intrinsic Dimension of Geometric Data Sets},
interhash = {db5b3f0b7d5f5851b3b00c1756fc6352},
intrahash = {3be366138968b0b1762585fe1aa7aecc},
journal = {Accepted for publication in: Tohoku Mathematical Journal},
keywords = {2020 data fca geometry itegpub kde kdepub mm-space myown publist},
note = {cite arxiv:1801.07985. Comment: v2: completely rewritten, 28 pages, 3 figures, 2 tables},
timestamp = {2022-02-15T13:05:27.000+0100},
title = {Intrinsic Dimension of Geometric Data Sets},
url = {http://arxiv.org/abs/1801.07985},
year = 2020
}