I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this would generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, that doesn't mean they will be meaningful in terms of cosine distance (since cosine distance assumes a linear space in which all dimensions are weighted equally).
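For concreteness, here is a minimal sketch of the kind of average pooling being described, with random arrays standing in for actual BERT token outputs (the shapes and mask handling are illustrative assumptions, not a specific library's API):

```python
import numpy as np

def mean_pool(token_embs, mask):
    """Average the per-token vectors, ignoring padding positions.

    token_embs: (seq_len, dim) array of encoder token outputs
    mask: (seq_len,) array, 1 for real tokens, 0 for padding
    """
    mask = mask[:, None].astype(token_embs.dtype)
    return (token_embs * mask).sum(axis=0) / mask.sum()

def cosine(a, b):
    # cosine similarity weights every dimension equally
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
toks = rng.normal(size=(6, 8))        # stand-in for BERT token embeddings
mask = np.array([1, 1, 1, 1, 0, 0])   # last two positions are padding
sent_vec = mean_pool(toks, mask)      # one "sentence vector" per sequence
```

Nothing in this pooling step optimizes the vectors for cosine comparisons, which is the point above: the averaged vector may still be useful as input to a trained downstream model while being a poor similarity measure on its own.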