Abstract
Adding examples of the majority class to the training set can have a detrimental effect on the learner's behavior: noisy or otherwise unreliable examples from the majority class can overwhelm the minority class. The paper discusses criteria for evaluating the utility of classifiers induced from such imbalanced training sets, explains the poor behavior of some learners under these circumstances, and suggests as a solution a simple technique called one-sided selection of examples.

1 Introduction
The general topic of this paper is learning from examples described by pairs (x, c(x)), where x is a vector of attribute values and c(x) is the corresponding concept label. For simplicity, we consider only problems where c(x) is either positive or negative, and all attributes are continuous. Since Fisher (1936), this task has received plenty of attention from statisticians as well as from researchers in artificial neural networks, AI, and ML. A typical scenario assumes the e...