Formal Concept Analysis and Tag Recommendations in Collaborative Tagging Systems
R. Jäschke. Dissertationen zur Künstlichen Intelligenz Akademische Verlagsgesellschaft AKA, Heidelberg, Germany, (January 2011)
One of the most noticeable innovation that emerged with the advent of the Web 2.0 and the focal point of this thesis are collaborative tagging systems. They allow users to annotate arbitrary resources with freely chosen keywords, so called tags. The tags are used for navigation, finding resources, and serendipitous browsing and thus provide an immediate benefit for the user. By now, several systems for tagging photos, web links, publication references, videos, etc. have attracted millions of users which in turn annotated countless resources. Tagging gained so much popularity that it spread into other applications like web browsers, software packet managers, and even file systems. Therefore, the relevance of the methods presented in this thesis goes beyond the Web 2.0. The conceptual structure underlying collaborative tagging systems is called folksonomy. It can be represented as a tripartite hypergraph with user, tag, and resource nodes. Each edge of the graph expresses the fact that a user annotated a resource with a tag. This social network constitutes a lightweight conceptual structure that is not formalized, but rather implicit and thus needs to be extracted with knowledge discovery methods. In this thesis a new data mining task – the mining of all frequent tri-concepts – is presented, together with an efficient algorithm for discovering such implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. Extending the theory of triadic Formal Concept Analysis, we provide a formal definition of the problem, and present an efficient algorithm for its solution. We show the applicability of our approach on three large real-world examples and thereby perform a conceptual clustering of two collaborative tagging systems. Finally, we introduce neighborhoods of triadic concepts as basis for a lightweight visualization of tri-lattices. The social bookmark and publication sharing system BibSonomy, which is currently among the three most popular systems of its kind, has been developed by our research group. Besides being a useful tool for many scientists, it provides interested researchers a basis for the evaluation and integration of their knowledge discovery methods. This thesis introduces BibSonomy as an exemplary collaborative tagging system and gives an overview of its architecture and some of its features. Furthermore, BibSonomy is used as foundation for evaluating and integrating some of the discussed approaches. Collaborative tagging systems usually include tag recommendation mechanisms easing the process of finding good tags for a resource, but also consolidating the tag vocabulary across users. In this thesis we evaluate and compare several recommendation algorithms on large-scale real-world datasets: an adaptation of user-based Collaborative Filtering, a graph-based recommender built on top of the FolkRank algorithm, and simple methods based on counting tag co-occurences. We show that both FolkRank and Collaborative Filtering provide better results than non-personalized baseline methods. Moreover, since methods based on counting tag co-occurrences are computationally cheap, and thus usually preferable for real time scenarios, we discuss simple approaches for improving the performance of such methods. We demonstrate how a simple recommender based on counting tags from users and resources can perform almost as good as the best recommender. Furthermore, we show how to integrate recommendation methods into a real tagging system, record and evaluate their performance by describing the tag recommendation framework we developed for BibSonomy. With the intention to develop, test, and evaluate recommendation algorithms and supporting cooperation with researchers, we designed the framework to be easily extensible, open for a variety of methods, and usable independent from BibSonomy. We also present an evaluation of the framework which demonstrates its power. The folksonomy graph shows specific structural properties that explain its growth and the possibility of serendipitous exploration. Clicklogs of web search engines can be represented as a folksonomy in which queries are descriptions of clicked URLs. The resulting network structure, which we will term logsonomy is very similar to the one of folksonomies. In order to find out about its properties, we analyze the topological characteristics of the tripartite hypergraph of queries, users and bookmarks on a large folksonomy snapshot and on query logs of two large search engines. We find that all of the three datasets exhibit similar structural properties and thus conclude that the clicking behaviour of search engine users based on the displayed search results and the tagging behaviour of collaborative tagging users is driven by similar dynamics. In this thesis we further transfer the folksonomy paradigm to the Social Semantic Desktop – a new model of computer desktop that uses Semantic Web technologies to better link information items. There we apply community support methods to the folksonomy found in the network of social semantic desktops. Thus, we connect knowledge discovery for folksonomies with semantic technologies. Alltogether, the research in this thesis is centered around collaborative tagging systems and their underlying datastructure – folksonomies – and thereby paves the way for the further dissemination of this successful knowledge management paradigm.