Article

Optimization of Search Results with Duplicate Page Elimination using Usage Data

International Journal on Network Security, 2 (2): 6 (April 2011)

Abstract

The performance and scalability of search engines are greatly affected by the enormous amount of duplicate data on the World Wide Web. Search results flooded with identical or near-identical web pages reduce search efficiency and increase the time users need to find the desired information. As users navigate the results, the only information they leave behind is the trace of the pages they accessed. This data is recorded in query log files and is usually referred to as Web Usage Data. This paper proposes a novel technique that optimizes search efficiency by removing duplicate data from search results, using the usage data stored in the query logs. Duplicate detection is performed by the proposed Duplicate Data Detection (D3) algorithm, which works offline on favored user queries found by pre-mining the logs with query clustering. The proposed result optimization technique is expected to substantially improve search engine efficiency and effectiveness.
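The abstract does not spell out the steps of the D3 algorithm, so the following is only a minimal sketch of the general idea of dropping near-identical pages from a ranked result list. It swaps in a generic technique (word shingling with Jaccard similarity) rather than the authors' method, and it omits the query-log mining and query clustering steps entirely. The names `shingles`, `jaccard`, `remove_duplicates`, and the `threshold` value are assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_duplicates(results, threshold=0.8):
    """Keep the first page of each group of near-identical pages.

    results: list of (url, text) pairs in ranked order.
    threshold: assumed similarity cutoff, not a value from the paper.
    """
    kept = []  # (url, shingle set) of pages retained so far
    for url, text in results:
        sig = shingles(text)
        # Retain the page only if it is not too similar to any kept page.
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((url, sig))
    return [url for url, _ in kept]
```

In a setting like the one the abstract describes, such a comparison would presumably run offline over the result pages of frequently issued queries, so that duplicates are already marked when a query arrives. Comparing every page against every kept page is quadratic in the worst case, which is acceptable for a short result list but not for a whole index.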
