We observed that generally the embedding representation is very rich and information dense. For example, reducing the dimensionality of the inputs using SVD or PCA, even by 10%, generally results in worse downstream performance on specific tasks.
M. Paris, and R. Jäschke. Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, volume 12816 of Lecture Notes in Artificial Intelligence, page 1--14. Springer, (2021)