Damián H. Zanette, Marcelo A. Montemurro
We investigate the origin of Zipf's law for words in written texts by means of a stochastic dynamical model for text generation. The model incorporates both features related to the general structure of languages and memory effects inherent to the production of long coherent messages in the communication process. It is shown that the multiplicative dynamics of our model leads to rank-frequency distributions in quantitative agreement with empirical data. Our results give support to the linguistic relevance of Zipf's law in human language.