We observed that the embedding representation is generally very rich and information-dense. For example, reducing the dimensionality of the inputs with SVD or PCA, even by as little as 10%, generally leads to worse downstream performance on specific tasks.
The basic assumption behind using PCA for cluster analysis and dimensionality reduction is that the directions with the greatest spread (variance) contain the most information.
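The variance assumption can be made concrete with a minimal sketch, assuming NumPy and a synthetic random matrix standing in for real embeddings. The function name `pca_reduce` and the parameter `keep_fraction` are hypothetical, introduced only for illustration. The sketch computes PCA through an SVD of the centred data, keeps the leading 90% of directions, and reports the share of total variance those directions retain.

```python
import numpy as np


def pca_reduce(X, keep_fraction=0.9):
    """Project the rows of X onto the leading principal directions.

    Returns the reduced data and the fraction of total variance retained.
    `keep_fraction` is a hypothetical parameter: the share of the original
    dimensions to keep.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature

    # Thin SVD: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    k = max(1, int(round(keep_fraction * X.shape[1])))
    variance = S ** 2  # proportional to the variance along each direction
    retained = variance[:k].sum() / variance.sum()
    return Xc @ Vt[:k].T, retained


# Synthetic stand-in for embeddings (assumed data, not from the source).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 50))

reduced, retained = pca_reduce(embeddings, keep_fraction=0.9)
print(reduced.shape, round(float(retained), 3))
```

The retained-variance figure is only a proxy for information. A projection can keep most of the variance and still discard directions that a particular downstream task relies on, which is consistent with the performance drop reported above.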
S. Cohen, K. Stratos, M. Collins, D. Foster, and L. Ungar. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 223-231. Jeju Island, Korea, Association for Computational Linguistics, July 2012.