A Probabilistic Approach to Automatic Keyword Indexing. Part II: An Algorithm for Probabilistic Indexing.

Journal of the American Society for Information Science, 26 (4): 280-289 (1975)

Abstract

In Part I of this study, a mixture of two Poisson distributions was examined as a model of specialty word distribution. Formulas expressing the three parameters of the model in terms of empirical frequency statistics were derived, and a statistical measure intended to identify specialty words, consistent with the model, was proposed. In the present paper, Part II of the study, a probabilistic model of keyword indexing is outlined, and some of the consequences of the model are examined. An algorithm defining a measure of indexability is developed, a measure intended to reflect the relative significance of words in documents. The measure is evaluated and is found to consistently produce indexes superior to those produced by another measure which had previously been identified in the literature as producing the best results.
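
For readers unfamiliar with the model named in the abstract, a mixture of two Poisson distributions can be sketched as below. This is only an illustration of the general mixture form, not the paper's algorithm: the parameter names (`pi`, `lam1`, `lam2`) and the example values are assumptions, and the posterior function shows a generic Bayes computation rather than the indexability measure the paper defines.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson_pmf(k, pi, lam1, lam2):
    """Mixture of two Poissons: weight pi on mean lam1, weight 1 - pi on mean lam2.

    The three parameters (pi, lam1, lam2) are hypothetical names for the
    mixing proportion and the two class means.
    """
    return pi * poisson_pmf(k, lam1) + (1 - pi) * poisson_pmf(k, lam2)


def posterior_class1(k, pi, lam1, lam2):
    """Bayes posterior that a document with k occurrences belongs to class 1.

    Illustrative only; this is not claimed to be the paper's measure.
    """
    return pi * poisson_pmf(k, lam1) / two_poisson_pmf(k, pi, lam1, lam2)


# Example with assumed values: 20% of documents use the word heavily
# (mean 3.0 occurrences), the remainder only incidentally (mean 0.3).
p_zero = posterior_class1(0, 0.2, 3.0, 0.3)
p_five = posterior_class1(5, 0.2, 3.0, 0.3)
```

Under these assumed values, a document containing the word five times is far more likely to belong to the heavy-use class than a document that never contains it, which is the intuition behind treating within-document frequency as evidence of a word's significance.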
