Query-Sets++: A Scalable Approach for Modeling Web Sites

B. Poblete, M. Spiliopoulou, and M. Mendoza. String Processing and Information Retrieval, volume 7024 of Lecture Notes in Computer Science, pages 129-134. Springer Berlin Heidelberg, 2011.
DOI: 10.1007/978-3-642-24583-1_13

Abstract

We explore an effective approach for modeling and classifying Web sites. The aim of this work is to classify Web sites using features that are independent of size, structure and vocabulary. We establish Web site similarity based on search engine query hits, which convey document relevance and utility in direct relation to users’ needs and interests. To achieve this, we use a generic Web site representation scheme over different feature spaces, built upon the query traffic to the site’s documents. For this task we extend, in a non-trivial way, our prior work using query-sets for single-document representation. We discuss why this previous methodology does not scale to a large set of heterogeneous Web sites. We show that our models achieve very compact Web site representations. Furthermore, our experiments on site classification show excellent performance and a favorable quality/dimensionality trade-off. In particular, we reduce the feature space to 5% of the size of the bag-of-words representation while achieving 99% precision in our classification experiments on DMOZ.
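The abstract does not spell out the Query-Sets++ models themselves. As a rough illustration of the underlying idea of describing a site by the search queries that lead users to its documents, here is a minimal Python sketch under stated assumptions: a hypothetical log of (query, clicked URL) pairs, one site per host name, plain query-term counts as features, and cosine similarity between sites. This is a simple baseline-style query-term representation, not the authors' method or their feature spaces; the functions build_site_models and cosine and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt
from urllib.parse import urlparse


def build_site_models(query_log):
    """Map each site (host name) to a bag of terms from the queries that led to its documents.

    query_log: iterable of (query, clicked_url) pairs; this log format is a
    hypothetical assumption for illustration.
    """
    models = defaultdict(Counter)
    for query, url in query_log:
        host = urlparse(url).netloc  # assumption: one site per host name
        models[host].update(query.lower().split())
    return dict(models)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy log with made-up queries and URLs.
log = [
    ("cheap flights", "http://travel.example/a"),
    ("flights to rome", "http://travel.example/b"),
    ("rome hotels", "http://trips.example/x"),
    ("python tutorial", "http://code.example/docs"),
]
models = build_site_models(log)
print(round(cosine(models["travel.example"], models["trips.example"]), 3))  # → 0.267
print(cosine(models["travel.example"], models["code.example"]))  # → 0.0
```

Raw term counts keep the sketch short. The abstract describes several feature spaces and a representation far more compact than bag-of-words, which this toy example does not attempt to reproduce.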

Cite this publication

@incollection{noKey,
  abstract = {We explore an effective approach for modeling and classifying Web sites in the World Wide Web. The aim of this work is to classify Web sites using features which are independent of size, structure and vocabulary. We establish Web site similarity based on search engine query hits, which convey document relevance and utility in direct relation to users’ needs and interests. To achieve this, we use a generic Web site representation scheme over different feature spaces, built upon query traffic to the site’s documents. For this task we extend, in a non-trivial way, our prior work using query-sets for single document representation. We discuss why this previous methodology is not scalable for a large set of heterogeneous Web sites. We show that our models achieve very compact Web site representations. Furthermore, our experiments on site classification show excellent performance and quality/dimensionality trade-off. In particular, we sustain a reduction in the feature space to 5% of the size of the bag-of-words representation, while achieving 99% precision in our classification experiments on DMOZ.},
  added-at = {2014-06-20T12:34:20.000+0200},
  author = {Poblete, Barbara and Spiliopoulou, Myra and Mendoza, Marcelo},
  biburl = {https://www.bibsonomy.org/bibtex/2c61a1783da11859956e1d1b7aafd187e/kmd-ovgu},
  booktitle = {String Processing and Information Retrieval},
  doi = {10.1007/978-3-642-24583-1_13},
  editor = {Grossi, Roberto and Sebastiani, Fabrizio and Silvestri, Fabrizio},
  interhash = {d918ba496f8181dd4d0e2a6f4b2553bb},
  intrahash = {c61a1783da11859956e1d1b7aafd187e},
  isbn = {978-3-642-24582-4},
  keywords = {kmd},
  language = {English},
  pages = {129-134},
  publisher = {Springer Berlin Heidelberg},
  series = {Lecture Notes in Computer Science},
  timestamp = {2014-06-20T12:34:20.000+0200},
  title = {Query-Sets++: A Scalable Approach for Modeling Web Sites},
  url = {http://dx.doi.org/10.1007/978-3-642-24583-1_13},
  volume = 7024,
  year = 2011
}

BibSonomy