Article

A genetic algorithm for discovering small disjunct rules in data mining

D. Carvalho and A. Freitas.
Applied Soft Computing, 2 (2): 75--88 (December 2002)
DOI: 10.1016/S1568-4946(02)00031-5

Abstract

This paper addresses the well-known classification task of data mining, where the goal is to discover rules predicting the class of examples (records of a given dataset). In the context of data mining, small disjuncts are rules covering a small number of examples. Hence, these rules are usually error-prone, which contributes to a decrease in predictive accuracy. At first glance, this is not a serious problem, since the impact on predictive accuracy should be small. However, although each small-disjunct covers few examples, the set of all small disjuncts can cover a large number of examples. This paper presents evidence that this is the case in several datasets. This paper also addresses the problem of small disjuncts by using a hybrid decision-tree/genetic-algorithm approach. In essence, examples belonging to large disjuncts are classified by rules produced by a decision-tree algorithm (C4.5), while examples belonging to small disjuncts are classified by a genetic-algorithm specifically designed for discovering small-disjunct rules. We present results comparing the predictive accuracy of this hybrid system with the prediction accuracy of three versions of C4.5 alone in eight public domain datasets. Overall, the results show that our hybrid system achieves better predictive accuracy than all three versions of C4.5 alone.
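The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the size threshold `SMALL_DISJUNCT_MAX`, the helper names, and the callables `leaf_of`, `tree_label`, and `small_rule_label` are all hypothetical stand-ins (the abstract states neither a threshold nor the genetic algorithm's details). The sketch only shows how examples could be dispatched by leaf coverage and how the combined share of small disjuncts could be measured.

```python
from collections import Counter

# Hypothetical cutoff: a leaf covering at most this many training
# examples is treated as a small disjunct. The abstract gives no value.
SMALL_DISJUNCT_MAX = 5


def leaf_coverage(leaf_ids):
    """Count how many training examples reach each decision-tree leaf."""
    return Counter(leaf_ids)


def small_disjunct_share(coverage, max_size=SMALL_DISJUNCT_MAX):
    """Fraction of training examples that fall into small disjuncts."""
    total = sum(coverage.values())
    small = sum(n for n in coverage.values() if n <= max_size)
    return small / total if total else 0.0


def classify(example, leaf_of, tree_label, small_rule_label, coverage,
             max_size=SMALL_DISJUNCT_MAX):
    """Route an example to the tree rule or to a small-disjunct rule.

    leaf_of(example)            -> leaf id reached in the decision tree
    tree_label(leaf)            -> class predicted by the tree rule
    small_rule_label(ex, leaf)  -> class predicted by a separately
                                   learned rule (stand-in for the GA)
    """
    leaf = leaf_of(example)
    if coverage[leaf] <= max_size:
        return small_rule_label(example, leaf)
    return tree_label(leaf)


# Toy data: one large leaf and three small ones.
leaf_ids = ["a"] * 50 + ["b"] * 3 + ["c"] * 4 + ["d"] * 3
coverage = leaf_coverage(leaf_ids)
share = small_disjunct_share(coverage)
```

In the toy data each small leaf covers only three or four examples, yet together they account for 10 of the 60 examples, which mirrors the abstract's point that small disjuncts can matter in aggregate.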

BibTeX key: Carvalho:2002:ASC
Entry type: article
Year: 2002
Month: December
Journal: Applied Soft Computing
Number: 2
Pages: 75--88
Volume: 2
DOI: 10.1016/S1568-4946(02)00031-5
Document: http://www.cs.kent.ac.uk/people/staff/aaf/my-publications-ukc.html

Cite this publication

%0 Journal Article
%1 Carvalho:2002:ASC
%A Carvalho, D. R.
%A Freitas, A. A.
%D 2002
%J Applied Soft Computing
%K Rule Small algorithms, classification, data discovery, disjuncts genetic mining,
%N 2
%P 75--88
%R 10.1016/S1568-4946(02)00031-5
%T A genetic algorithm for discovering small disjunct rules in data mining
%U http://www.cs.kent.ac.uk/people/staff/aaf/my-publications-ukc.html
%V 2
%X This paper addresses the well-known classification task of data mining, where the goal is to discover rules predicting the class of examples (records of a given dataset). In the context of data mining, small disjuncts are rules covering a small number of examples. Hence, these rules are usually error-prone, which contributes to a decrease in predictive accuracy. At first glance, this is not a serious problem, since the impact on predictive accuracy should be small. However, although each small-disjunct covers few examples, the set of all small disjuncts can cover a large number of examples. This paper presents evidence that this is the case in several datasets. This paper also addresses the problem of small disjuncts by using a hybrid decision-tree/genetic-algorithm approach. In essence, examples belonging to large disjuncts are classified by rules produced by a decision-tree algorithm (C4.5), while examples belonging to small disjuncts are classified by a genetic-algorithm specifically designed for discovering small-disjunct rules. We present results comparing the predictive accuracy of this hybrid system with the prediction accuracy of three versions of C4.5 alone in eight public domain datasets. Overall, the results show that our hybrid system achieves better predictive accuracy than all three versions of C4.5 alone.

@article{Carvalho:2002:ASC,
  abstract = {This paper addresses the well-known classification task of data mining, where the goal is to discover rules predicting the class of examples (records of a given dataset). In the context of data mining, small disjuncts are rules covering a small number of examples. Hence, these rules are usually error-prone, which contributes to a decrease in predictive accuracy. At first glance, this is not a serious problem, since the impact on predictive accuracy should be small. However, although each small-disjunct covers few examples, the set of all small disjuncts can cover a large number of examples. This paper presents evidence that this is the case in several datasets. This paper also addresses the problem of small disjuncts by using a hybrid decision-tree/genetic-algorithm approach. In essence, examples belonging to large disjuncts are classified by rules produced by a decision-tree algorithm (C4.5), while examples belonging to small disjuncts are classified by a genetic-algorithm specifically designed for discovering small-disjunct rules. We present results comparing the predictive accuracy of this hybrid system with the prediction accuracy of three versions of C4.5 alone in eight public domain datasets. Overall, the results show that our hybrid system achieves better predictive accuracy than all three versions of C4.5 alone.},
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Carvalho, D. R. and Freitas, A. A.},
  biburl = {https://www.bibsonomy.org/bibtex/299a62363ffa15496d9a2d78f2adc1ea6/brazovayeye},
  doi = {10.1016/S1568-4946(02)00031-5},
  interhash = {20c44ae7268282c27c37c5d5f2f2b496},
  intrahash = {99a62363ffa15496d9a2d78f2adc1ea6},
  journal = {Applied Soft Computing},
  keywords = {Rule Small algorithms, classification, data discovery, disjuncts genetic mining,},
  month = {December},
  number = 2,
  pages = {75--88},
  timestamp = {2008-06-19T17:37:24.000+0200},
  title = {A genetic algorithm for discovering small disjunct rules in data mining},
  url = {http://www.cs.kent.ac.uk/people/staff/aaf/my-publications-ukc.html},
  volume = 2,
  year = 2002
}
