When a word appears in different contexts, its vector is pulled in different directions during training updates. The final vector therefore represents a kind of weighted average over those contexts. Averaging vectors that point in different directions typically yields a shorter result: the more distinct contexts a word appears in, the shorter its vector becomes. And for a word to be usable in many different contexts, it must carry little specific meaning. Prime examples of such insignificant words are high-frequency stop words, which are indeed represented by short vectors despite their high term frequencies ...
If a downstream application cares only about the direction of the word vectors (e.g., it relies solely on the cosine similarity between words), then normalize the vectors and forget about length.
However, if the application can exploit (or needs) more than direction, such as word significance or consistency in word usage (see below), then normalization might not be such a good idea.
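The two effects above can be sketched numerically. The following toy simulation (an illustration, not the actual word2vec update rule) averages unit vectors pointing in random directions, standing in for a word's context updates, and shows that the average shrinks as the number of distinct directions grows, while normalization preserves direction but throws the length signal away:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for context updates: each "context" pulls the word vector in
# a different random unit direction; the final vector is their average.
def averaged_vector(n_contexts, dim=100):
    directions = rng.normal(size=(n_contexts, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return directions.mean(axis=0)

# The more diverse the contexts, the shorter the averaged vector.
for n in (1, 10, 100, 1000):
    print(n, np.linalg.norm(averaged_vector(n)))

# Normalizing to unit length keeps the direction (and hence all cosine
# similarities) but discards the length signal entirely.
v = averaged_vector(50)
v_unit = v / np.linalg.norm(v)
```

For uncorrelated directions the norm of the average decays roughly like 1/√n, which is why a word spread across many unrelated contexts ends up with a short vector.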
Facebook Research recently open-sourced a great project: fastText, a fast (no surprise) and effective method for learning word representations and performing text classification. I was curious how these embeddings compare to other commonly used embeddings, and word2vec seemed like the obvious baseline, especially since fastText embeddings are an extension of word2vec.