Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
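The aggregation idea mentioned above can be illustrated with a minimal sketch. It assumes a folksonomy is a set of (user, tag, resource) triples and that distributional aggregation weights each tag-resource pair by the number of users who made that annotation. Cosine similarity is used here only as a simple stand-in measure, not the mutual information measure the abstract reports as most accurate. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def aggregate(triples):
    """Collapse (user, tag, resource) triples into per-tag resource counts.

    Each tag maps to a Counter whose weights are the number of distinct
    users who applied that tag to the resource (an assumed reading of
    distributional aggregation).
    """
    profiles = defaultdict(Counter)
    for _user, tag, resource in set(triples):  # ignore duplicate annotations
        profiles[tag][resource] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (Counters)."""
    dot = sum(w * b[r] for r, w in a.items() if r in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical annotations, for illustration only.
triples = [
    ("alice", "python", "docs.example/a"),
    ("bob", "python", "docs.example/a"),
    ("alice", "programming", "docs.example/a"),
    ("bob", "programming", "docs.example/b"),
    ("carol", "recipes", "food.example/c"),
]

profiles = aggregate(triples)
sim_related = cosine(profiles["python"], profiles["programming"])
sim_unrelated = cosine(profiles["python"], profiles["recipes"])
```

Because users, tags, and resources are treated symmetrically, the same sketch gives resource similarity if the roles of tag and resource are swapped when building the profiles.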
With the Web serving as a huge worldwide data repository, issues related to data semantics (familiar to database modelers since the 1970s) have again become of paramount importance. As Web data comes from heterogeneous, possibly ...
B. Ganter, G. Stumme, and R. Wille (Eds.) volume 3626 of LNAI, Heidelberg, Springer, (2005) http://www.informatik.uni-trier.de/~ley/db/conf/fca/fca2005.html
G. Stumme. Proc. 3rd Intl. Conf. on Formal Concept Analysis, volume 3403 of Lecture Notes in Computer Science, page 315-328. Heidelberg, Springer, (2005)
P. Cimiano, A. Hotho, G. Stumme, and J. Tane. Concept Lattices, volume 2961 of LNAI, page 189-207. Heidelberg, Second International Conference on Formal Concept Analysis, ICFCA 2004, Springer, (2004)
G. Stumme. Conceptual Structures at Work: 12th International Conference on Conceptual Structures (ICCS 2004), volume 3127 of LNCS, page 109-125. Heidelberg, Springer, (2004)
A. Hotho, S. Staab, and G. Stumme. Proceedings of the 2003 IEEE International Conference on Data Mining, page 541-544 (Poster). Melbourne, Florida, IEEE Computer Society, (November 2003)