An algorithm for the principal component analysis of large data sets

Abstract

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
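
To make the out-of-core idea concrete, here is a minimal sketch of a generic two-pass randomized range finder that only ever holds one block of rows in memory at a time. It is an illustration under stated assumptions, not the authors' exact procedure: the function name `randomized_pca_blocks`, the `read_blocks` callback, and the `oversample` parameter are hypothetical, the data are assumed to be already mean-centered, and the number of rows is assumed to be at least `k + oversample`.

```python
import numpy as np

def randomized_pca_blocks(read_blocks, n_cols, k, oversample=10, seed=0):
    """Approximate rank-k SVD of a tall matrix A that is streamed in row blocks.

    read_blocks: hypothetical callable returning a fresh iterator over
                 consecutive row blocks of A (each of shape (rows_i, n_cols)).
    Assumes A is already mean-centered and has at least k + oversample rows.
    """
    rng = np.random.default_rng(seed)
    l = k + oversample
    G = rng.standard_normal((n_cols, l))  # random test matrix

    # Pass 1: Y = A @ G, assembled block by block (Y is only m x l).
    Y = np.vstack([block @ G for block in read_blocks()])
    Q, _ = np.linalg.qr(Y)  # orthonormal basis containing the range of Y

    # Pass 2: B = Q.T @ A, accumulated block by block (B is only l x n_cols).
    B = np.zeros((Q.shape[1], n_cols))
    row = 0
    for block in read_blocks():
        B += Q[row:row + block.shape[0]].T @ block
        row += block.shape[0]

    # Small dense SVD, then map the left factor back through Q.
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :k], s[:k], Vt[:k]
```

In practice `read_blocks` could iterate over slices of an `np.memmap` backed by a file on disk, so that only the thin matrices `Y`, `Q`, and `B` plus one block of rows are resident in RAM; the sketch makes two full passes over the data.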

BibTeX key: halko2010algorithm
entry type: misc
year: 2010
url: http://arxiv.org/abs/1007.5510
Note: cite arxiv:1007.5510. Comment: 17 pages, 3 figures (each with 2 or 3 subfigures), 2 tables (each with 2 subtables)
