Article,

Statistical analysis of the Indus script using n-grams.

N. Yadav, H. Joglekar, R. Rao, M. Vahia, R. Adhikari, and I. Mahadevan.
PLoS ONE, 5 (3): e9506 (January 2010)
DOI: 10.1371/journal.pone.0009506

Abstract

The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilization. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically n-gram Markov chains, to analyze the syntax of the Indus script. We find that unigrams follow a Zipf-Mandelbrot distribution. Text beginner and ender distributions are unequal, providing internal evidence for syntax. We see clear evidence of strong bigram correlations and extract significant pairs and triplets using a log-likelihood measure of association. Highly frequent pairs and triplets are not always highly significant. The model performance is evaluated using information-theoretic measures and cross-validation. The model can restore doubtfully read texts with an accuracy of about 75%. We find that a quadrigram Markov chain saturates information-theoretic measures against a held-out corpus. Our work forms the basis for the development of a stochastic grammar which may be used to explore the syntax of the Indus script in greater detail.
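
Illustrative sketch (not from the paper): the abstract mentions using an n-gram Markov chain to restore doubtfully read signs. The Python below shows the general idea with a bigram model on a toy corpus. The sign labels, the `TEXTS` data, the function names, and the add-one smoothing are all assumptions chosen for brevity; the abstract does not specify the smoothing scheme, and the authors report that a quadrigram model performs best.

```python
from collections import Counter

# Toy corpus of hypothetical sign labels (NOT real Indus data).
TEXTS = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
    ["A", "B", "C"],
]

START, END = "<s>", "</s>"  # text-boundary markers


def train_bigram(texts):
    """Count contexts and bigrams over texts padded with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for text in texts:
        padded = [START, *text, END]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent pairs
    vocab = {sign for text in texts for sign in text} | {END}
    return contexts, bigrams, vocab


def prob(prev, cur, model):
    """Add-one smoothed estimate of P(cur | prev) (an assumed smoothing choice)."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))


def restore(text, position, model):
    """Pick the sign that best fits between its left and right neighbours."""
    _, _, vocab = model
    padded = [START, *text, END]
    i = position + 1  # shift by one for the START marker
    candidates = sorted(vocab - {END})

    def score(sign):
        return prob(padded[i - 1], sign, model) * prob(sign, padded[i + 1], model)

    return max(candidates, key=score)


model = train_bigram(TEXTS)
# The middle sign is treated as unreadable; its current label is ignored.
print(restore(["A", "?", "C"], 1, model))
```

On this toy corpus the sign "B" is chosen for the masked position, since it is the only sign observed both after "A" and before "C". A higher-order model would condition on longer left contexts in the same way.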

BibTeX key: Yadav2010
entry type: article
year: 2010
month: jan
journal: PLoS ONE
number: 3
pages: e9506
volume: 5
pmid: 20333254
issn: 1932-6203
DOI: 10.1371/journal.pone.0009506
url: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2841631&tool=pmcentrez&rendertype=abstract

Cite this publication

@article{Yadav2010,
  author    = {Yadav, Nisha and Joglekar, Hrishikesh and Rao, Rajesh P N and Vahia, Mayank N and Adhikari, Ronojoy and Mahadevan, Iravatham},
  title     = {Statistical analysis of the {Indus} script using n-grams},
  journal   = {PLoS ONE},
  year      = 2010,
  month     = jan,
  volume    = 5,
  number    = 3,
  pages     = {e9506},
  doi       = {10.1371/journal.pone.0009506},
  issn      = {1932-6203},
  pmid      = {20333254},
  url       = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2841631\&tool=pmcentrez\&rendertype=abstract},
  abstract  = {The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilization. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically n-gram Markov chains, to analyze the syntax of the Indus script. We find that unigrams follow a Zipf-Mandelbrot distribution. Text beginner and ender distributions are unequal, providing internal evidence for syntax. We see clear evidence of strong bigram correlations and extract significant pairs and triplets using a log-likelihood measure of association. Highly frequent pairs and triplets are not always highly significant. The model performance is evaluated using information-theoretic measures and cross-validation. The model can restore doubtfully read texts with an accuracy of about 75\%. We find that a quadrigram Markov chain saturates information-theoretic measures against a held-out corpus. Our work forms the basis for the development of a stochastic grammar which may be used to explore the syntax of the Indus script in greater detail.},
  keywords  = {imported},
  added-at  = {2011-03-27T17:20:41.000+0200},
  timestamp = {2011-03-27T17:21:14.000+0200},
  biburl    = {https://www.bibsonomy.org/bibtex/262cee40de56de6a96d825b083a6360f6/yevb0},
  interhash = {7c2787f0401ef803d53d1e421272c7de},
  intrahash = {62cee40de56de6a96d825b083a6360f6}
}
