On November 20, 1924, French-American mathematician Benoit B. Mandelbrot was born. Mandelbrot worked on a wide range of mathematical problems, including mathematical physics and quantitative finance, but he is best known as the popularizer of fractal geometry: he coined the term 'fractal' and described the Mandelbrot set named after him.
The original Mandelbrot set is an amazing object that has captured the public's imagination for 30 years with its cascading patterns and hypnotically colourful detail. It's known as a 'fractal' - a type of shape that yields (sometimes elaborate) detail forever, no matter how far you 'zoom' into it (think of the trunk of a tree sprouting branches, which in turn split off into smaller branches, which themselves yield twigs, etc.).
It's found by following a relatively simple math formula. But in the end, it's still only 2D and flat - there's no depth, shadows, perspective, or light sourcing. What we have featured in this article is a potential 3D version of the same fractal.
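That "relatively simple math formula" is just the repeated iteration z → z² + c: a point c belongs to the set if the orbit starting from z = 0 never escapes to infinity. A minimal sketch (function names and the iteration cap are illustrative choices, not anything from the article):

```python
# Minimal sketch of the Mandelbrot set membership test: iterate
# z -> z^2 + c from z = 0 and see whether |z| ever exceeds 2
# (once it does, the orbit is guaranteed to escape to infinity).

def escape_time(c: complex, max_iter: int = 50) -> int:
    """Return the iteration at which |z| exceeds 2, or max_iter if
    the orbit stays bounded (i.e. c appears to be in the set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

print(escape_time(0j))      # 50: the orbit stays at 0, so c = 0 is in the set
print(escape_time(1 + 1j))  # 1: the orbit escapes almost immediately
```

Colouring each pixel by its escape time is what produces the familiar cascading detail; the 2D limitation the article mentions is simply that c ranges over the flat complex plane.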
Nature 179, 595 (16 March 1957); doi:10.1038/179595a0
Distribution of Word Frequencies
I. J. GOOD
25 Scott House, Princess Elizabeth Way, Cheltenham.
THE purpose of this communication is to explain, in terms of the theory of information, the implications of the Zipf distribution of word frequencies [1]. The distribution is formally identical with the Pareto income and Willis taxonomic distributions, but the present discussion is restricted to word frequencies. The discussion resembles that of Mandelbrot [2] but is simpler. The discussion by Parker-Rhodes and Joyce [3] also resembles Mandelbrot's, but is fallacious.
Letters to Nature
Nature 178, 1308 (08 December 1956); doi:10.1038/1781308a0
A Theory of Word-Frequency Distribution
A. F. PARKER-RHODES & T. JOYCE
Cambridge Language Research Unit, 20 Millington Road, Cambridge.
THE object of this communication is to show that a certain remarkably simple experimental relation governing word-frequencies in language can be explained by a simple model of the process of searching for information, about each word heard or read, in the memory of words employed in the language faculty.
T. Cover & R. King
Abstract
In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text …
Damián Zanette & Marcelo Montemurro
Abstract
We investigate the origin of Zipf's law for words in written texts by means of a stochastic dynamic model for text generation. The model incorporates both features related to the general structure of languages and memory effects inherent to the production of long coherent messages in the communication process. It is shown that the multiplicative dynamics of our model lead to rank-frequency distributions in quantitative agreement with empirical data. Our results give support to the linguistic relevance of Zipf's law in human language.
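The paper's multiplicative dynamics are not reproduced here, but the general mechanism - new words occasionally enter the text, while existing words are reused in proportion to how often they have already appeared - can be illustrated with the classic Simon model, a related (but not identical) stochastic text-generation process. All names and parameters below are illustrative assumptions:

```python
# Sketch of a Simon-style multiplicative text generator: with
# probability alpha mint a brand-new word; otherwise reuse a token
# drawn uniformly from the text so far, which selects each existing
# word with probability proportional to its current frequency.
import random

def simon_model(steps: int, alpha: float = 0.1, seed: int = 0) -> list:
    rng = random.Random(seed)
    text = [0]      # start with a single occurrence of word 0
    next_word = 1   # id to assign to the next newly minted word
    for _ in range(steps):
        if rng.random() < alpha:
            text.append(next_word)  # introduce a new word
            next_word += 1
        else:
            text.append(rng.choice(text))  # rich-get-richer reuse
    return text

text = simon_model(10_000)
```

Such rich-get-richer processes are known to yield heavy-tailed, Zipf-like rank-frequency distributions, which is the qualitative point of the model above; the paper's own dynamics add further language-specific structure and memory effects.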
Le Quan Ha, E. I. Sicilia-Garcia, Ji Ming & F. J. Smith
Queen's University of Belfast, Belfast, Northern Ireland
Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to their rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words. However, when single words and n-gram phrases are combined in one list and put in order of frequency, the combined list follows Zipf's law accurately for all words and phrases, down to the lowest frequencies in both languages. The Zipf curves for the two languages are then almost identical.
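The rank-frequency check described in these abstracts is straightforward to sketch: count tokens, sort counts descending, and see whether rank × frequency stays roughly constant, as Zipf's law predicts. The toy corpus below is synthetic and illustrative, not one of the paper's datasets:

```python
# Minimal sketch of a Zipf's-law check: Zipf's law says frequency is
# inversely proportional to rank, so rank * frequency should be
# roughly constant across the list.
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent first (rank 1)."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

# Synthetic corpus built so that word i appears about 1000 / i times,
# i.e. an exactly Zipfian toy distribution.
tokens = []
for i in range(1, 51):
    tokens.extend([f"w{i}"] * (1000 // i))

for rank, freq in rank_frequency(tokens)[:5]:
    print(rank, freq, rank * freq)  # the product stays near 1000
```

The same function works unchanged if the token list mixes single words and n-gram phrases, which is the combination the paper finds restores Zipf's law down to the lowest frequencies.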