Inproceedings,

On The Effect of Data Set Size on Bias And Variance in Classification Learning

D. Brain and G. I. Webb.
Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW '99), pages 117-128. Sydney, The University of New South Wales, (1999)

Abstract

With the advent of data mining, machine learning has come of age and is now a critical technology in many businesses. However, machine learning evolved in a different research context to that in which it now finds itself employed. A particularly important problem in the data mining world is working effectively with large data sets. However, most machine learning research has been conducted in the context of learning from very small data sets. To date most approaches to scaling up machine learning to large data sets have attempted to modify existing algorithms to deal with large data sets in a more computationally efficient and effective manner. But is this necessarily the best method? This paper explores the possibility of designing algorithms specifically for large data sets. Specifically, the paper looks at how increasing data set size affects bias and variance error decompositions for classification algorithms. Preliminary results of experiments to determine these effects are presented, showing that, as hypothesized, variance can be expected to decrease as training set size increases. No clear effect of training set size on bias was observed. These results have profound implications for data mining from large data sets, indicating that developing effective learning algorithms for large data sets is not simply a matter of finding computationally efficient variants of existing learning algorithms.

BibTeX key: BrainWebb99
entry type: inproceedings
address: Sydney
booktitle: Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW '99)
year: 1999
pages: 117-128
publisher: The University of New South Wales
audit-trail: *
location: Sydney, Australia

Cite this publication

@inproceedings{BrainWebb99,
  abstract = {With the advent of data mining, machine learning has come of age and is now a critical technology in many businesses. However, machine learning evolved in a different research context to that in which it now finds itself employed. A particularly important problem in the data mining world is working effectively with large data sets. However, most machine learning research has been conducted in the context of learning from very small data sets. To date most approaches to scaling up machine learning to large data sets have attempted to modify existing algorithms to deal with large data sets in a more computationally efficient and effective manner. But is this necessarily the best method? This paper explores the possibility of designing algorithms specifically for large data sets. Specifically, the paper looks at how increasing data set size affects bias and variance error decompositions for classification algorithms. Preliminary results of experiments to determine these effects are presented, showing that, as hypothesized, variance can be expected to decrease as training set size increases. No clear effect of training set size on bias was observed. These results have profound implications for data mining from large data sets, indicating that developing effective learning algorithms for large data sets is not simply a matter of finding computationally efficient variants of existing learning algorithms.},
  added-at = {2016-03-20T05:42:04.000+0100},
  address = {Sydney},
  audit-trail = {*},
  author = {Brain, D. and Webb, G. I.},
  biburl = {https://www.bibsonomy.org/bibtex/2eb55c4bdfb45c25cad6b1c613e9ef74f/giwebb},
  booktitle = {Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW '99)},
  editor = {Richards, D. and Beydoun, G. and Hoffmann, A. and Compton, P.},
  interhash = {ddd950aa58b184dcf89bc180a00d8416},
  intrahash = {eb55c4bdfb45c25cad6b1c613e9ef74f},
  keywords = {Learning datasets from large},
  location = {Sydney, Australia},
  pages = {117-128},
  publisher = {The University of New South Wales},
  timestamp = {2016-03-20T05:42:04.000+0100},
  title = {On The Effect of Data Set Size on Bias And Variance in Classification Learning},
  year = 1999
}
