PEBL: Positive Example Based Learning for Web Page Classification Using SVM
H. Yu, J. Han, and K. Chang. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 239--248. New York, NY, USA, ACM (2002)
DOI: 10.1145/775047.775083
Abstract
Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a user-interesting class is the first step of mining interesting information from the Web. However, constructing a classifier for an interesting class requires laborious pre-processing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and special caution to avoid biasing them. We introduce in this paper the Positive Example Based Learning (PEBL) framework for Web page classification which eliminates the need for manually collecting negative training examples in pre-processing. We present an algorithm called Mapping-Convergence (M-C) that achieves classification accuracy (with positive and unlabeled data) as high as that of traditional SVM (with positive and negative data). Our experiments show that when the M-C algorithm uses the same amount of positive examples as that of traditional SVM, the M-C algorithm performs as well as traditional SVM.
%0 Conference Paper
%1 yu2002positive
%A Yu, Hwanjo
%A Han, Jiawei
%A Chang, Kevin Chen-Chuan
%B Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
%C New York, NY, USA
%D 2002
%I ACM
%K classification example one-class pebl positive svm web
%P 239--248
%R 10.1145/775047.775083
%T PEBL: Positive Example Based Learning for Web Page Classification Using SVM
%U http://doi.acm.org/10.1145/775047.775083
%X Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a user-interesting class is the first step of mining interesting information from the Web. However, constructing a classifier for an interesting class requires laborious pre-processing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and special caution to avoid biasing them. We introduce in this paper the Positive Example Based Learning (PEBL) framework for Web page classification which eliminates the need for manually collecting negative training examples in pre-processing. We present an algorithm called Mapping-Convergence (M-C) that achieves classification accuracy (with positive and unlabeled data) as high as that of traditional SVM (with positive and negative data). Our experiments show that when the M-C algorithm uses the same amount of positive examples as that of traditional SVM, the M-C algorithm performs as well as traditional SVM.
%@ 1-58113-567-X
@inproceedings{yu2002positive,
abstract = {Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a user-interesting class is the first step of mining interesting information from the Web. However, constructing a classifier for an interesting class requires laborious pre-processing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and special caution to avoid biasing them. We introduce in this paper the Positive Example Based Learning (PEBL) framework for Web page classification which eliminates the need for manually collecting negative training examples in pre-processing. We present an algorithm called Mapping-Convergence (M-C) that achieves classification accuracy (with positive and unlabeled data) as high as that of traditional SVM (with positive and negative data). Our experiments show that when the M-C algorithm uses the same amount of positive examples as that of traditional SVM, the M-C algorithm performs as well as traditional SVM.},
acmid = {775083},
added-at = {2014-05-21T11:30:34.000+0200},
address = {New York, NY, USA},
author = {Yu, Hwanjo and Han, Jiawei and Chang, Kevin Chen-Chuan},
biburl = {https://www.bibsonomy.org/bibtex/2599e58b844a32ebf7714e19189356450/jaeschke},
booktitle = {Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
doi = {10.1145/775047.775083},
interhash = {c43980d2a7f4c69075f4cf84c419d4ce},
intrahash = {599e58b844a32ebf7714e19189356450},
isbn = {1-58113-567-X},
keywords = {classification example one-class pebl positive svm web},
location = {Edmonton, Alberta, Canada},
numpages = {10},
pages = {239--248},
publisher = {ACM},
series = {KDD '02},
timestamp = {2014-07-28T15:57:31.000+0200},
title = {{PEBL}: Positive Example Based Learning for {Web} Page Classification Using {SVM}},
url = {http://doi.acm.org/10.1145/775047.775083},
year = {2002}
}