PEBL: Positive Example Based Learning for Web Page Classification Using SVM

Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 239–248. New York, NY, USA: ACM, 2002.
DOI: 10.1145/775047.775083

Abstract

Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages that belong to a class of interest to the user is the first step in mining interesting information from the Web. However, constructing a classifier for a class of interest requires laborious preprocessing, such as collecting positive and negative training examples. For instance, to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). Collecting negative training examples is especially arduous and requires special care to avoid biasing them. In this paper we introduce the Positive Example Based Learning (PEBL) framework for Web page classification, which eliminates the need to collect negative training examples manually during preprocessing. We present an algorithm called Mapping-Convergence (M-C) that, trained on positive and unlabeled data, achieves classification accuracy as high as that of a traditional SVM trained on positive and negative data. Our experiments show that when the M-C algorithm uses the same number of positive examples as a traditional SVM, it performs as well as the traditional SVM.
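The abstract names the Mapping-Convergence algorithm without spelling out its two stages. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. As we understand the full paper, a mapping stage first pulls "strong negatives" out of the unlabeled set. It does this with a simple 1-DNF-style rule: a feature counts as positive if it is relatively more frequent among positive pages than among unlabeled pages, and an unlabeled page with no positive feature is treated as a strong negative. A convergence stage then repeatedly trains a classifier on positives versus the negatives found so far, moves newly rejected unlabeled pages into the negative set, and stops when no new negatives appear. The paper uses an SVM in the convergence stage. Here a plain perceptron stands in for it so the sketch stays dependency-free. All names (`mapping_stage`, `train_perceptron`, `mapping_convergence`, `BIAS`) and the toy data are hypothetical.

```python
from collections import Counter

BIAS = "__bias__"  # hypothetical always-on feature that plays the role of the bias term


def mapping_stage(pos, unl):
    """1-DNF-style mapping (sketch): a feature is 'positive' if its document
    frequency is higher in the positive set than in the unlabeled set.
    Unlabeled docs containing no positive feature become strong negatives."""
    pos_df = Counter(f for d in pos for f in d)  # docs are sets, so this is doc frequency
    unl_df = Counter(f for d in unl for f in d)
    pos_feats = {f for f in pos_df if pos_df[f] / len(pos) > unl_df[f] / len(unl)}
    strong = [d for d in unl if not (d & pos_feats)]
    rest = [d for d in unl if d & pos_feats]
    return strong, rest


def train_perceptron(pos, neg, max_epochs=1000):
    """Stand-in for the SVM (an assumption, not the paper's learner): a
    perceptron over binary features, run until it makes no training mistakes."""
    w = Counter()
    data = [(d | {BIAS}, 1) for d in pos] + [(d | {BIAS}, -1) for d in neg]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * sum(w[f] for f in x) <= 0:  # misclassified or on the boundary
                for f in x:
                    w[f] += y
                mistakes += 1
        if mistakes == 0:
            break
    return lambda d: sum(w[f] for f in d | {BIAS})  # score > 0 means positive


def mapping_convergence(pos, unl, train=train_perceptron):
    """Convergence stage (sketch): grow the negative set until no unlabeled
    doc is newly classified negative; return the last trained classifier."""
    strong, remaining = mapping_stage(pos, unl)
    negatives, new_neg, clf = [], strong, None
    while new_neg:
        negatives += new_neg
        clf = train(pos, negatives)
        new_neg = [d for d in remaining if clf(d) <= 0]
        remaining = [d for d in remaining if clf(d) > 0]
    return clf, negatives, remaining


# Hypothetical toy data: pages as sets of words; positives are "homepages".
pos = [frozenset({"welcome", "my", "homepage"}),
       frozenset({"my", "homepage", "cv"}),
       frozenset({"welcome", "my", "cv"})]
unl = [frozenset({"welcome", "my", "homepage", "publications"}),
       frozenset({"news", "sports", "weather"}),
       frozenset({"buy", "cart", "price"}),
       frozenset({"my", "cart", "price"}),
       frozenset({"news", "price", "weather"})]

clf, negatives, still_unlabeled = mapping_convergence(pos, unl)
print(len(negatives), [sorted(d) for d in still_unlabeled])
# → 4 [['homepage', 'my', 'publications', 'welcome']]
```

On this toy data, the mapping stage labels three pages as strong negatives. The first convergence round then also rejects `{"my", "cart", "price"}`, and the second round adds nothing, so the loop stops. Passing a real SVM trainer through the `train` parameter would bring the sketch closer to the setup the paper describes.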
