bookmarks  36

  •  

    With the advent of Web 2.0, Social Computing has emerged as one of the hot research topics recently. Social Computing involves the investigation of collective intelligence by using computational techniques such as machine learning, data mining, natural language processing, etc. on social behavior data collected from blogs, wikis, clickthrough data, query logs, tags, etc. from areas such as social networks, social search, social media, social bookmarks, social news, social knowledge sharing, and social games. In this tutorial, we will introduce Social Computing and elaborate on how the various characteristics and aspects are involved in the social platforms for collective intelligence. Moreover, we will discuss the challenging issues involved in Social Computing in the context of theory and models of social networks, mining of spatial and temporal social information, ways to deal with partial and incomplete information in the social context, scalability and algorithmic issues with social computational techniques, and security & privacy issues. Some potential social computing applications for discussion would include collaborative filtering, query log processing, learning to rank, large graph and link algorithms, etc.
    15 years ago by @dbenz
     
     
  •  

     
  •  

     
  •  

     
  •  

    Fabrizio Silvestri, ISTI - CNR Ricardo Baeza-Yates, Yahoo! Research Abstract Web Search Engines have stored in their logs information about users since they started to operate. This information often serves many purposes. The primary focus of this tutorial is to introduce to the discipline of query mining by showing its foundations and by analyzing the basic algorithms and techniques that could be used to extract and to exploit useful knowledge from this (potentially) infinite source of information. Furthermore, participants to this tutorial will be given a unified view on the literature on query log analysis.
    15 years ago by @dbenz
     
     
  •  

    Rich media social networks promote not only creation and consumption of media, but also communication about the posted media item. What causes a conversation to be interesting, that prompts a user to participate in the discussion on a posted video? We conjecture that people participate in conversations when they find the conversation theme interesting, see comments by people whom they are familiar with, or observe an engaging dialogue between two or more people (absorbing back and forth exchange of comments). Importantly, a conversation that is interesting must be consequential – i.e. it must impact the social network itself. Our framework has three parts. First, we detect conversational themes using a mixture model approach. Second, we determine interestingness of participants and interestingness of conversations based on a random walk model. Third, we measure the consequence of a conversation by measuring how interestingness affects the following three variables – participation in related themes, participant cohesiveness and theme diffusion. We have conducted extensive experiments using a dataset from the popular video sharing site, YouTube. Our results show that our method of interestingness maximizes the mutual information, and is significantly better (twice as large) than three other baseline methods (number of comments, number of new participants and PageRank based assessment). create (e.g. upload photo on Flickr), and consume media (e.g. watch a video on YouTube). These websites also allow for significant communication between the users – such as comments by one user on a media uploaded by another. These comments reveal a rich dialogue structure (user A comments on the upload, user B comments on the upload, A comments in response to B’s comment, B responds to A’s comment etc.) between users, where the discussion is often about themes unrelated to the original video. Example of a conversation from YouTube [1] is shown in Figure 1. In this paper, the sequence of comments on a media object is referred to as a conversation. Note the theme of the conversation is latent and depends on the content of the conversation. The fundamental idea explored in this paper is that analysis of communication activity is crucial to understanding repeated visits to a rich media social networking site. People return to a video post that they have already seen and post further comments (say in YouTube) in response to the communication activity, rather than to watch the video again. Thus it is the content of the communication activity itself that the people want to read (or see, if the response to a video post is another video, as is possible in the case of YouTube). Furthermore, these rich media sites have notification mechanisms that alert users of new comments on a video post / image upload promoting this communication activity.
    15 years ago by @dbenz
     
     
  •  

    Due to the reliance on the textual information associated with an image, image search engines on the Web lack the discriminative power to deliver visually diverse search results. The textual descriptions are key to retrieve relevant results for a given user query, but at the same time provide little information about the rich image content. In this paper we investigate three methods for visual diversification of image search results. The methods deploy lightweight clustering techniques in combination with a dynamic weighting function of the visual features, to best capture the discriminative aspects of the resulting set of images that is retrieved. A representative image is selected from each cluster, which together form a diverse result set. Based on a performance evaluation we find that the outcome of the methods closely resembles human perception of diversity, which was established in an extensive clustering experiment carried out by human assessors. models deployed on the Web and by these photo sharing sites rely heavily on search paradigms developed within the field Information Retrieval. This way, image retrieval can benefit from years of research experience, and the better this textual metadata captures the content of the image, the better the retrieval performance will be. It is also commonly acknowledged that a picture has to be seen to fully understand its meaning, significance, beauty, or context, simply because it conveys information that words can not capture, or at least not in any practical setting. This explains the large number of papers on content-based image retrieval (CBIR) that has been published since 1990, the breathtaking publication rates since 1997 [12], and the continuing interest in the field [4]. Moving on from simple low-level features to more discriminative descriptions, the field has come a long way in narrowing down the semantic gap by using high-level semantics [8]. Unfortunately, CBIR-methods using higher level semantics usually require extensive training, intricate object ontologies or expensive construction of a visual dictionary, and their performance remains unfit for use in large scale online applications such as the aforementioned search engines or websites. Consequently, retrieval models operating in the textual metadata domain are therefore deployed here. In these applications, image search results are usually displayed in a ranked list. This ranking reflects the similarity of the image’s metadata to the textual query, according to the textual retrieval model of choice. There may exist two problems with this ranking. First, it may be lacking visual diversity. For instance, when a specific type or brand of car is issued as query, it may very well be that the top of this ranking displays many times the same picture that was released by the marketing division of the company. Similarly, pictures of a popular holiday destination tend to show the same touristic hot spot, often taken from the same angle and distance. This absence of visual diversity is due to the nature of the image annotation, which does not allow or motivate people to adequately describe the visual content of an image. Second, the query may have several aspects to it that are not sufficiently covered by the ranking. Perhaps the user is interested in a particular aspect of the query, but doesn’t know how to express this explicitly and issues a broader, more general query. It could also be that a query yields so many different results, that it’s hard to get an overview of the collection of relevant images in the database.
    15 years ago by @dbenz
     
     
  •  

     
  •  

    In this paper we present Triplify – a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that the largest part of information on the Web is already stored in structured form, often as data contained in relational databases, but usually published by Web applications only as HTML mixing structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled data source registry. Triplify configurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We will show that despite its light-weight architecture Triplify is usable to publish very large datasets, such as 160GB of geo data from the OpenStreetMap project.
    15 years ago by @dbenz
     
     
  •  

    We analyse the corpus of user relationships of the Slash- dot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other users as friends and foes, providing positive and negative en- dorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we con- sider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level character- istics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links and show that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.
    15 years ago by @dbenz
     
     
  •  

    Tagging has emerged as a powerful mechanism that enables users to find, organize, and understand online entities. Recommender systems similarly enable users to efficiently navigate vast collections of items. Algorithms combining tags with recommenders may deliver both the automation inherent in recommenders, and the flexibility and conceptual comprehensibility inherent in tagging systems. In this paper we explore tagommenders, recommender algorithms that predict users’ preferences for items based on their inferred preferences for tags. We describe tag preference inference algorithms based on users’ interactions with tags and movies, and evaluate these algorithms based on tag preference ratings collected from 995 MovieLens users. We design and evaluate algorithms that predict users’ ratings for movies based on their inferred tag preferences. Our tag-based algorithms generate better recommendation rankings than state-of-the-art algorithms, and they may lead to flexible recommender systems that leverage the characteristics of items users find most important.
    15 years ago by @dbenz
     
     
  •  

    Social media sharing web sites like Flickr allow users to annotate images with free tags, which significantly facilitate Web image search and organization. However, the tags associated with an image generally are in a random order without any importance or relevance information, which limits the effectiveness of these tags in search and other applications. In this paper, we propose a tag ranking scheme, aiming to automatically rank the tags associated with a given image according to their relevance to the image content. We first estimate initial relevance scores for the tags based on probability density estimation, and then perform a random walk over a tag similarity graph to refine the relevance scores. Experimental results on a 50, 000 Flickr photo collection show that the proposed tag ranking method is both effective and efficient. We also apply tag ranking into three applications: (1) tag-based image search, (2) tag recommendation, and (3) group recommendation, which demonstrates that the proposed tag ranking approach really boosts the performances of social-tagging related applications.
    15 years ago by @dbenz
     
     
  •  

    This paper presents SOFIE, a system for automated on- tology extension. SOFIE can parse natural language docu- ments, extract ontological facts from them and link the facts into an ontology. SOFIE uses logical reasoning on the exist- ing knowledge and on the new knowledge in order to disam- biguate words to their most probable meaning, to reason on the meaning of text patterns and to take into account world knowledge axioms. This allows SOFIE to check the plau- sibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE unites the paradigms of pattern matching, word sense disambiguation and ontolog- ical reasoning in one unified model. Our experiments show that SOFIE delivers high-quality output, even from unstruc- tured Internet documents.
    15 years ago by @dbenz
     
     
  •  

     
  •  

     
  •  

     
  •  

     
  •  

    In this paper we give models and algorithms to describe and analyze the collaboration among authors of Wikipedia from a network analytical perspective. The edit network encodes who interacts how with whom when editing an article; it significantly extends previous network models that code author communities in Wikipedia. Several characteristics summarizing some aspects of the organization process and allowing the analyst to identify certain types of authors can be obtained from the edit network. Moreover, we propose several indicators characterizing the global network structure and methods to visualize edit networks. It is shown that the structural network indicators are correlated with quality labels of the associated Wikipedia articles.
    15 years ago by @dbenz
     
     
  •  

    There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. We will show live demos to find relations in the Wikipedia or to improve image search as well as our current research in the topic. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
    15 years ago by @dbenz
     
     
  •  

    The increasing availability of GPS-enabled devices is changing the way people interact with the Web, and brings us a large amount of GPS trajectories representing people’s location histories. In this paper, based on multiple users’ GPS trajectories, we aim to mine interesting locations and classical travel sequences in a given geospatial region. Here, interesting locations mean the culturally important places, such as Tiananmen Square in Beijing, and frequented public areas, like shopping malls and restaurants, etc. Such information can help users understand surrounding locations, and would enable travel recommendation. In this work, we first model multiple individuals’ location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual’s access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users’ travel experiences. 2) Users’ travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users’ travel experiences. We evaluated our system using a large GPS dataset collected by 107 users over a period of one year in the real world. As a result, our HITS-based inference model outperformed baseline approaches like rank-by-count and rank-by-frequency. Meanwhile, when considering the users’ travel experiences and location interests, we achieved a better performance beyond baselines, such as rank-by-count and rank-by-interest, etc.
    15 years ago by @dbenz