Abstract

Clusters of normalized title-words in two sets of patent data in the food sector (from 1985 and 1989, respectively) are analyzed in terms of their underlying document and word structures. The clusters were generated by using the system LEXIMAPPE of the Paris School of Mines. Both input and output data were kindly made available for validation purposes. Analysis of the data shows that the centrality and the density of the clusters produced by LEXIMAPPE are primarily dependent on the number of word occurrences in the corresponding parts of the input matrix. While the clusters are kept approximately equal in terms of the number of words (with a maximum of 10), they vary widely in terms of the number of word occurrences in the underlying document sets. Centrality and density vary correspondingly. The contribution of the smallest cluster to the reduction of uncertainty in the prediction of the document structure is even smaller than that of 77 (other) single words. In the dynamic analysis, I found significant stability where LEXIMAPPE indicated major changes. However, like every clustering algorithm, LEXIMAPPE is based on specific assumptions, which may lead to specific results that cannot be simulated by using other methods. Researchers who base their results on LEXIMAPPE should be aware of the peculiarities specific to this system.

Description

SpringerLink - Journal article
