copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Improved search for socially annotated data

N. Sarkas, G. Das, and N. Koudas. Proc. VLDB Endow., (August 2009)

Abstract

Social annotation is an intuitive, on-line, collaborative process through which each element of a collection of resources (e.g., URLs, pictures, videos, etc.) is associated with a group of descriptive keywords, widely known as tags. Each such group is a concise and accurate summary of the relevant resource's content and is obtained via aggregating the opinion of individual users, as expressed in the form of short tag sequences. The availability of this information gives rise to a new searching paradigm where resources are retrieved and ranked based on the similarity of a keyword query to their accompanying tags. In this paper, we present a principled and efficient search and resource ranking methodology that utilizes exclusively the user-assigned tag sequences. Ranking is based on solid probabilistic foundations and our growing understanding of the dynamics and structure of the social annotation process, which we capture by employing powerful interpolated n-gram models on the tag sequences. The efficiency and applicability of the proposed solution to large data sets is guaranteed through the introduction of a novel and highly scalable constrained optimization framework, employed both for training and incrementally maintaining the n-gram models. We experimentally validate the efficiency and effectiveness of our solutions compared to other applicable approaches. Our evaluation is based on a large crawl of del.icio.us, numbering hundreds of thousands of users and millions of resources, thus demonstrating the applicability of our solutions to real-life, large scale systems. In particular, we demonstrate that the use of interpolated n-grams for modeling tag sequences results in superior ranking effectiveness, while the proposed optimization framework is superior in terms of performance both for obtaining ranking parameters and incrementally maintaining them.

Description

Improved search for socially annotated data

Cite this publication

%0 Journal Article %1 sarkas2009improved %A Sarkas, Nikos %A Das, Gautam %A Koudas, Nick %D 2009 %I VLDB Endowment %J Proc. VLDB Endow. %K ranking social-search tagging toread %P 778--789 %T Improved search for socially annotated data %U http://dl.acm.org/citation.cfm?id=1687627.1687715 %V 2 %X Social annotation is an intuitive, on-line, collaborative process through which each element of a collection of resources (e.g., URLs, pictures, videos, etc.) is associated with a group of descriptive keywords, widely known as tags. Each such group is a concise and accurate summary of the relevant resource's content and is obtained via aggregating the opinion of individual users, as expressed in the form of short tag sequences. The availability of this information gives rise to a new searching paradigm where resources are retrieved and ranked based on the similarity of a keyword query to their accompanying tags. In this paper, we present a principled and efficient search and resource ranking methodology that utilizes exclusively the user-assigned tag sequences. Ranking is based on solid probabilistic foundations and our growing understanding of the dynamics and structure of the social annotation process, which we capture by employing powerful interpolated n-gram models on the tag sequences. The efficiency and applicability of the proposed solution to large data sets is guaranteed through the introduction of a novel and highly scalable constrained optimization framework, employed both for training and incrementally maintaining the n-gram models. We experimentally validate the efficiency and effectiveness of our solutions compared to other applicable approaches. Our evaluation is based on a large crawl of del.icio.us, numbering hundreds of thousands of users and millions of resources, thus demonstrating the applicability of our solutions to real-life, large scale systems. In particular, we demonstrate that the use of interpolated n-grams for modeling tag sequences results in superior ranking effectiveness, while the proposed optimization framework is superior in terms of performance both for obtaining ranking parameters and incrementally maintaining them.

@article{sarkas2009improved, abstract = {Social annotation is an intuitive, on-line, collaborative process through which each element of a collection of resources (e.g., URLs, pictures, videos, etc.) is associated with a group of descriptive keywords, widely known as tags. Each such group is a concise and accurate summary of the relevant resource's content and is obtained via aggregating the opinion of individual users, as expressed in the form of short tag sequences. The availability of this information gives rise to a new searching paradigm where resources are retrieved and ranked based on the similarity of a keyword query to their accompanying tags. In this paper, we present a principled and efficient search and resource ranking methodology that utilizes exclusively the user-assigned tag sequences. Ranking is based on solid probabilistic foundations and our growing understanding of the dynamics and structure of the social annotation process, which we capture by employing powerful interpolated n-gram models on the tag sequences. The efficiency and applicability of the proposed solution to large data sets is guaranteed through the introduction of a novel and highly scalable constrained optimization framework, employed both for training and incrementally maintaining the n-gram models. We experimentally validate the efficiency and effectiveness of our solutions compared to other applicable approaches. Our evaluation is based on a large crawl of del.icio.us, numbering hundreds of thousands of users and millions of resources, thus demonstrating the applicability of our solutions to real-life, large scale systems. In particular, we demonstrate that the use of interpolated n-grams for modeling tag sequences results in superior ranking effectiveness, while the proposed optimization framework is superior in terms of performance both for obtaining ranking parameters and incrementally maintaining them.}, acmid = {1687715}, added-at = {2012-02-11T19:59:48.000+0100}, author = {Sarkas, Nikos and Das, Gautam and Koudas, Nick}, biburl = {https://www.bibsonomy.org/bibtex/2eb4af45f7872ffe54dd9d735a8738f77/beate}, description = {Improved search for socially annotated data}, interhash = {fc5eaed3510d129866329750b76d19a4}, intrahash = {eb4af45f7872ffe54dd9d735a8738f77}, issn = {2150-8097}, issue = {1}, issue_date = {August 2009}, journal = {Proc. VLDB Endow.}, keywords = {ranking social-search tagging toread}, month = {August}, numpages = {12}, pages = {778--789}, publisher = {VLDB Endowment}, timestamp = {2012-02-11T19:59:48.000+0100}, title = {Improved search for socially annotated data}, url = {http://dl.acm.org/citation.cfm?id=1687627.1687715}, volume = 2, year = 2009 }

BibSonomy

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Improved search for socially annotated data

Abstract

Description

Links and resources

Tags

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews
(0)

BibSonomy

copydeleteadd this publication to your clipboardcommunity posthistory of this postURLDOIBibTeXEndNoteAPAChicagoDIN 1505HarvardMSOffice XML Improved search for socially annotated data

Abstract

Description

Links and resources

Tags

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews (0)

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

Improved search for socially annotated data

Comments and Reviews
(0)