Article

AN EXTENDED MODEL FOR EFFECTIVE MIGRATING PARALLEL WEB CRAWLING WITH DOMAIN SPECIFIC AND INCREMENTAL CRAWLING

M. Farooqui, D. Beg, and D. Rafiq.
International Journal on Web Service Computing (IJWSC), 3 (3): 85-93 (September 2012)
DOI: 10.5121/ijwsc.2012.3308

Abstract

The Internet is large and has grown enormously; search engines are the tools for Web site navigation and search. Search engines maintain indices for web documents and provide search facilities by continuously downloading Web pages for processing. This process of downloading web pages is known as web crawling. In this paper we propose an architecture for an Effective Migrating Parallel Web Crawling approach with a domain-specific and incremental crawling strategy that makes the web crawling system more effective and efficient. The major advantage of a migrating parallel web crawler is that the analysis portion of the crawling process is done locally, where the data resides, rather than inside the Web search engine repository. This significantly reduces network load and traffic, which in turn improves the performance, effectiveness and efficiency of the crawling process. Another advantage of a migrating parallel crawler is that, as the size of the Web grows, it becomes necessary to parallelize the crawling process in order to finish downloading web pages in a comparatively shorter time. Domain-specific crawling will yield high-quality pages. The crawling process will migrate to a host or server within a specific domain and start downloading pages within that domain. Incremental crawling will keep the pages in the local database fresh, thus increasing the quality of the downloaded pages.

BibTeX key: noauthororeditor
entry type: article
year: 2012
month: September
journal: International Journal on Web Service Computing (IJWSC)
number: 3
pages: 85-93
volume: 3
language: English
issn: 0976 - 9811 (Online); 2230 - 7702 (print)
DOI: 10.5121/ijwsc.2012.3308
Document: http://airccse.org/journal/jwsc/papers/3312ijwsc08.pdf

Cite this publication

@article{noauthororeditor,
  abstract = {The Internet is large and has grown enormously; search engines are the tools for Web site navigation and search. Search engines maintain indices for web documents and provide search facilities by continuously downloading Web pages for processing. This process of downloading web pages is known as web crawling. In this paper we propose an architecture for an Effective Migrating Parallel Web Crawling approach with a domain-specific and incremental crawling strategy that makes the web crawling system more effective and efficient. The major advantage of a migrating parallel web crawler is that the analysis portion of the crawling process is done locally, where the data resides, rather than inside the Web search engine repository. This significantly reduces network load and traffic, which in turn improves the performance, effectiveness and efficiency of the crawling process. Another advantage of a migrating parallel crawler is that, as the size of the Web grows, it becomes necessary to parallelize the crawling process in order to finish downloading web pages in a comparatively shorter time. Domain-specific crawling will yield high-quality pages. The crawling process will migrate to a host or server within a specific domain and start downloading pages within that domain. Incremental crawling will keep the pages in the local database fresh, thus increasing the quality of the downloaded pages.},
  added-at = {2020-12-24T06:28:13.000+0100},
  author = {Farooqui, Md. Faizan and Beg, Dr. Md. Rizwan and Rafiq, Dr. Md. Qasim},
  biburl = {https://www.bibsonomy.org/bibtex/2546816ba56fa3112cb1e083debd85546/ijwsc},
  doi = {10.5121/ijwsc.2012.3308},
  interhash = {f78bcc846803a69c42b728e3e35a3311},
  intrahash = {546816ba56fa3112cb1e083debd85546},
  issn = {0976 - 9811 (Online); 2230 - 7702 (print)},
  journal = {International Journal on Web Service Computing (IJWSC)},
  keywords = {Web crawler crawling engine migrating parallel search web},
  language = {English},
  month = {September},
  number = 3,
  pages = {85-93},
  timestamp = {2020-12-24T06:28:13.000+0100},
  title = {AN EXTENDED MODEL FOR EFFECTIVE MIGRATING PARALLEL WEB CRAWLING WITH DOMAIN SPECIFIC AND INCREMENTAL CRAWLING},
  url = {http://airccse.org/journal/jwsc/papers/3312ijwsc08.pdf},
  volume = 3,
  year = 2012
}