Lesezeichen  26

  •  

    Fabrizio Silvestri, ISTI - CNR Ricardo Baeza-Yates, Yahoo! Research Abstract Web Search Engines have stored in their logs information about users since they started to operate. This information often serves many purposes. The primary focus of this tutorial is to introduce to the discipline of query mining by showing its foundations and by analyzing the basic algorithms and techniques that could be used to extract and to exploit useful knowledge from this (potentially) infinite source of information. Furthermore, participants to this tutorial will be given a unified view on the literature on query log analysis.
    vor 15 Jahren von @dbenz
     
     
  •  

    We analyse the corpus of user relationships of the Slash- dot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other users as friends and foes, providing positive and negative en- dorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we con- sider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level character- istics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links and show that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.
    vor 15 Jahren von @dbenz
     
     
  •  

    There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. We will show live demos to find relations in the Wikipedia or to improve image search as well as our current research in the topic. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
    vor 15 Jahren von @dbenz
     
     
  •  

    Social tagging provides valuable and crucial information for large-scale web image retrieval. It is ontology-free and easy to obtain; however, irrelevant tags frequently appear, and users typically will not tag all semantic objects in the image, which is also called semantic loss. To avoid noises and compensate for the semantic loss, tag recommendation is proposed in literature. However, current recommendation simply ranks the related tags based on the single modality of tag co-occurrence on the whole dataset, which ignores other modalities, such as visual correlation. This paper proposes a multi-modality recommendation based on both tag and visual correlation, and formulates the tag recommendation as a learning problem. Each modality is used to generate a ranking feature, and Rankboost algorithm is applied to learn an optimal combination of these ranking features from different modalities. Experiments on Flickr data demonstrate the effectiveness of this learning-based multi-modality recommendation strategy.
    vor 15 Jahren von @dbenz
     
     
  •  

    Traditional machine learning methods only consider relationships between feature values within individual data instances while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
    vor 15 Jahren von @dbenz
     
     
  •  

    Social media sharing web sites like Flickr allow users to annotate images with free tags, which significantly facilitate Web image search and organization. However, the tags associated with an image generally are in a random order without any importance or relevance information, which limits the effectiveness of these tags in search and other applications. In this paper, we propose a tag ranking scheme, aiming to automatically rank the tags associated with a given image according to their relevance to the image content. We first estimate initial relevance scores for the tags based on probability density estimation, and then perform a random walk over a tag similarity graph to refine the relevance scores. Experimental results on a 50, 000 Flickr photo collection show that the proposed tag ranking method is both effective and efficient. We also apply tag ranking into three applications: (1) tag-based image search, (2) tag recommendation, and (3) group recommendation, which demonstrates that the proposed tag ranking approach really boosts the performances of social-tagging related applications.
    vor 15 Jahren von @dbenz
     
     
  •  

     
  •  

    Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. To provide a personalized experience for a user, most existing work relies on search engine log data in which the search activities of that particular user, as well as other users, are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. We present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of 4-6 previous searches on average. Our method exploits the relations of the current search session to previous similarly short sessions of other users in order to predict the user’s intentions and is based on Markov logic, a statistical relational learning model that has been successfully applied to challenging language problems in the past. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected from a commercial general-purpose search engine.
    vor 15 Jahren von @dbenz
     
     
  •  

    Tagging has emerged as a powerful mechanism that enables users to find, organize, and understand online entities. Recommender systems similarly enable users to efficiently navigate vast collections of items. Algorithms combining tags with recommenders may deliver both the automation inherent in recommenders, and the flexibility and conceptual comprehensibility inherent in tagging systems. In this paper we explore tagommenders, recommender algorithms that predict users’ preferences for items based on their inferred preferences for tags. We describe tag preference inference algorithms based on users’ interactions with tags and movies, and evaluate these algorithms based on tag preference ratings collected from 995 MovieLens users. We design and evaluate algorithms that predict users’ ratings for movies based on their inferred tag preferences. Our tag-based algorithms generate better recommendation rankings than state-of-the-art algorithms, and they may lead to flexible recommender systems that leverage the characteristics of items users find most important.
    vor 15 Jahren von @dbenz
     
     
  •  

    The increasing availability of GPS-enabled devices is changing the way people interact with the Web, and brings us a large amount of GPS trajectories representing people’s location histories. In this paper, based on multiple users’ GPS trajectories, we aim to mine interesting locations and classical travel sequences in a given geospatial region. Here, interesting locations mean the culturally important places, such as Tiananmen Square in Beijing, and frequented public areas, like shopping malls and restaurants, etc. Such information can help users understand surrounding locations, and would enable travel recommendation. In this work, we first model multiple individuals’ location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual’s access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users’ travel experiences. 2) Users’ travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users’ travel experiences. We evaluated our system using a large GPS dataset collected by 107 users over a period of one year in the real world. As a result, our HITS-based inference model outperformed baseline approaches like rank-by-count and rank-by-frequency. Meanwhile, when considering the users’ travel experiences and location interests, we achieved a better performance beyond baselines, such as rank-by-count and rank-by-interest, etc.
    vor 15 Jahren von @dbenz