We observed that the embedding representation is generally rich and information-dense. For example, reducing the dimensionality of the inputs with SVD or PCA, even by as little as 10%, generally degrades downstream performance on specific tasks.
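To make the kind of reduction described above concrete, the following is a minimal sketch of PCA computed through an SVD, dropping roughly 10% of the dimensions. The function name `pca_reduce`, the parameter `keep_fraction`, and the random input matrix are assumptions introduced for illustration only. This is not the pipeline used in the experiments, and it does not measure downstream task performance.

```python
import numpy as np


def pca_reduce(embeddings, keep_fraction=0.9):
    """Project embeddings onto their top principal components.

    Hypothetical helper: keep_fraction=0.9 keeps 90% of the original
    dimensions, i.e. a 10% reduction.
    """
    X = np.asarray(embeddings, dtype=np.float64)

    # PCA requires mean-centered data.
    Xc = X - X.mean(axis=0)

    # Thin SVD: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Number of components to keep, clamped to what the SVD provides.
    k = max(1, int(round(keep_fraction * X.shape[1])))
    k = min(k, Vt.shape[0])

    # Project the centered data onto the top-k directions.
    reduced = Xc @ Vt[:k].T

    # Fraction of total variance retained by the kept components.
    explained = float((S[:k] ** 2).sum() / (S ** 2).sum())
    return reduced, explained


# Synthetic stand-in for real embeddings (an assumption, not real data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 50))

reduced, explained = pca_reduce(embeddings, keep_fraction=0.9)
print(reduced.shape, explained)
```

To test the observation on real data, one would train the same downstream model once on the original embeddings and once on `reduced`, then compare task metrics. The retained-variance figure alone does not predict how much task-relevant information is lost.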
M. Ryabinin, S. Popov, L. Prokhorenkova, and E. Voita. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7317--7331. Online, Association for Computational Linguistics (November 2020).