
Proportional K-Interval Discretization for Naive-Bayes Classifiers

Ying Yang and Geoffrey I. Webb. Lecture Notes in Computer Science 2167: Proceedings of the 12th European Conference on Machine Learning (ECML'01), pages 564-575. Berlin/Heidelberg, Springer-Verlag, 2001.

Abstract

This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeking an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
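
The abstract does not spell out how the interval count and interval size are coupled to the training-set size. The sketch below illustrates the idea under one common reading of PKID, in which both the number of intervals and the expected number of instances per interval are set to roughly the square root of the number of training instances; the function name pkid_discretize and the equal-frequency cut-point rule are illustrative assumptions, not details taken from this page.

    import numpy as np

    def pkid_discretize(values, n_train=None):
        """Sketch of proportional k-interval discretization.

        Ties both the number of intervals t and the expected number of
        instances per interval s to the training-set size N, here via
        s ~ t ~ sqrt(N): larger datasets get more (lower-bias) intervals,
        while each interval keeps enough instances to hold down the
        variance of the probability estimates.
        """
        values = np.sort(np.asarray(values, dtype=float))
        n = len(values) if n_train is None else n_train
        t = max(1, int(np.sqrt(n)))          # target number of intervals
        s = int(np.ceil(len(values) / t))    # target instances per interval
        # Cut points placed between consecutive groups of ~s sorted values
        # (an equal-frequency split, used here purely for illustration).
        cut_points = [(values[i * s - 1] + values[i * s]) / 2.0
                      for i in range(1, t) if i * s < len(values)]
        return cut_points

    # Example: 100 training values -> about 10 intervals of ~10 instances each.
    cuts = pkid_discretize(np.random.rand(100))
    print(len(cuts) + 1, "intervals")

With 100 training instances the sketch yields about 10 intervals of about 10 instances each, whereas 10,000 instances would yield about 100 intervals of about 100 instances each, which is the bias-variance trade-off the abstract describes.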
