- Zipf's Law and Miller's Random-Monkey Model. JSTOR: The American Journal of Psychology, Vol. 81, No. 2 (Jun., 1968), pp. 269-272
- List of publications
- Nature 179, 595 (16 March 1957); doi:10.1038/179595a0. Distribution of Word Frequencies. I. J. Good, 25 Scott House, Princess Elizabeth Way, Cheltenham. The purpose of this communication is to explain, in terms of the theory of information, the implications of the Zipf distribution of word frequencies [1]. The distribution is formally identical with the Pareto income and Willis taxonomic distributions, but the present discussion is restricted to word frequencies. The discussion resembles that of Mandelbrot [2] but is simpler. The discussion by Parker-Rhodes and Joyce [3] also resembles Mandelbrot's, but is fallacious.
- Letters to Nature. Nature 178, 1308 (08 December 1956); doi:10.1038/1781308a0. A Theory of Word-Frequency Distribution. A. F. Parker-Rhodes & T. Joyce, Cambridge Language Research Unit, 20 Millington Road, Cambridge. The object of this communication is to show that a certain remarkably simple experimental relation governing word-frequencies in language can be explained by a simple model of the process of searching for information, about each word heard or read, in the memory of words employed in the language faculty.
- Cover, T.; King, R. Abstract: In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text ...
- Bibliography on Zipf's law
- Dynamics of Text Generation with Realistic Zipf's Distribution. Journal of Quantitative Linguistics. Damián Zanette, Marcelo Montemurro. Abstract: We investigate the origin of Zipf's law for words in written texts by means of a stochastic dynamic model for text generation. The model incorporates both features related to the general structure of languages and memory effects inherent to the production of long coherent messages in the communication process. It is shown that the multiplicative dynamics of our model lead to rank-frequency distributions in quantitative agreement with empirical data. Our results give support to the linguistic relevance of Zipf's law in human language.
- Abstracts of contents
- Le Quan Ha, E. I. Sicilia-Garcia, Ji Ming, F. J. Smith (Queen's University of Belfast, Belfast, Northern Ireland). Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words. However, when single words and n-gram phrases are combined in one list and put in order of frequency, the combined list follows Zipf's law accurately for all words and phrases, down to the lowest frequencies in both languages. The Zipf curves for the two languages are then almost identical.
- R. Bailón-Moreno, E. Jurado-Alameda, R. Ruiz-Baños and J. P. Courtial. Summary: The bibliometric laws of Zipf, Bradford, and Lotka, in their various mathematical expressions, frequently present difficulties in the fitting of empirical values. The empirical flaws of fit take place in the frequency of the words, in the productivity of the authors and the journals, as well as in econometric and demographic aspects. This indicates that the underlying fractal model should be revised, since, as shown, the inverse power equations (of the Zipf-Mandelbrot type) are not adequate, as they need to include exponential terms. These modifications not only affect Bibliometrics and Scientometrics, but also, for the generality of the fractal model, apply to Economy, Demography, and even Natural Sciences in general.
- Maximum likelihood estimation for constrained parameters of multinomial distributions: Application to Zipf–Mandelbrot models. F. Izsák (ELTE, Institute of Mathematics, P.O. Box 120, 1518 Budapest, Hungary; University of Twente, EWI, P.O. Box 217, 7500 AE Enschede, Netherlands). Received 3 June 2005; revised 10 May 2006; accepted 11 May 2006. Available online 12 June 2006. Abstract: A numerical maximum likelihood (ML) estimation procedure is developed for the constrained parameters of multinomial distributions. The main difficulty involved in computing the likelihood function is the precise and fast determination of the multinomial coefficients. For this the coefficients are rewritten into a telescopic product. The presented method is applied to the ML estimation of the Zipf–Mandelbrot (ZM) distribution, which provides a true model in many real-life cases. The examples discussed arise from ecological and medical observations. Based on the estimates, the hypothesis that the data is ZM distributed is tested using a chi-square test. The computer code of the presented procedure is available on request by the author.
- Journal of Information Science, Vol. 19, No. 4, 247-257 (1993). DOI: 10.1177/016555159301900401. © 1993 Chartered Institute of Library and Information Professionals. An analysis of Zipf-Mandelbrot language measures and their application to artificial languages. Charles T. Meadow, Jiabin Wang, Manal Stamboulie (Faculty of Library and Information Science, University of Toronto, Toronto, Ontario, Canada). Studies of word frequency distributions have been used in linguistics for some time and are frequently used in information science research for such purposes as predicting numbers of key words or computing the significance of a word by its frequency of occurrence. In this paper we provide a historical review of some of the developments in a particular aspect of word frequency analysis known as Zipf's Law but in fact first explicitly formulated by E. U. Condon, and later modified by B. Mandelbrot. We present an exploratory analysis of the use of Mandelbrot's parameters in discriminating among languages and language usage. Some suggestions are made for using these parameters to characterize artificial (command) languages or the manner of use of these languages by different groups, for the purpose of enabling a computer interface to respond to users in a manner suited to their backgrounds or skills.
- by Eytan Adar, Li Zhang, Lada A. Adamic, Rajan M. Lukose
- Proceedings of the International Workshop on Modeling Social Media, page 8:1--8:4. New York, NY, USA, ACM, (2010)
- Mining Social Data MSoDa Workshop Proceedings, page 26-30. ECAI 2008, (July 2008)
- 4th International Workshop on Text-Based Information Retrieval (2007)
- Worth Publishers, New York, 5th edition, (2000)
- Proceedings of the 16th International World Wide Web Conference WWW'07, New York, NY, USA, ACM Press, (2007)
- (1951)
- Journal of Quantitative Linguistics 12(1):29-40 (2005)
- Fractals (2002)
- Biometrika 42(3/4):425 (1955)
- Glottometrics (2002)
- Glottometrics (2002)
- PNAS 100(3):788--791 (2003)
- American Journal of Psychology (1957)
- (2006)
- Journal of Quantitative Linguistics (1996)
- Glottometrics (2002)
- Communication theory (1953)
- Journal of General Psychology 33:251--256 (1945)
- Addison-Wesley, Reading MA USA, (1949)
- Institut Stenographique de France (1916)

