Misc

A Statistical Learning Theory Approach of Bloat

S. Gelly, O. Teytaud, N. Bredeche, and M. Schoenauer.
www, (2005)

Abstract

Code bloat, the excessive increase of code size, is an important issue in Genetic Programming (GP). This paper proposes a theoretical analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well-grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending on whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Statistical Learning. Namely, the Vapnik-Chervonenkis dimension of programs is computed, and further results from Statistical Learning allow one to prove that a parsimonious fitness ensures Universal Consistency (the solution minimising the empirical error does converge to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method consisting of choosing a maximal program size depending on the number of examples might still result in programs of infinitely increasing size with their accuracy; a more complicated modification of the fitness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency.
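The idea of a parsimonious fitness (empirical error plus a penalty on program size) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the fitness analysed in the paper: the tuple encoding of programs, the squared-error loss, and the penalty `c * size / sqrt(n)`, which shrinks as the number of examples `n` grows. The sketch only shows that such a penalty separates two programs with identical empirical error but different size, i.e. it disfavours unnecessary bloat.

```python
import math
from typing import Union

# Hypothetical program encoding: the variable "x", a numeric constant,
# or a tuple (op, left, right) with op in {"+", "-", "*"}.
Program = Union[str, float, tuple]

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def size(p: Program) -> int:
    """Number of nodes in the expression tree."""
    if isinstance(p, tuple):
        _, left, right = p
        return 1 + size(left) + size(right)
    return 1


def evaluate(p: Program, x: float) -> float:
    """Evaluate program p at input x."""
    if isinstance(p, tuple):
        op, left, right = p
        return OPS[op](evaluate(left, x), evaluate(right, x))
    if isinstance(p, str):  # the only variable is "x"
        return x
    return float(p)


def empirical_error(p: Program, data) -> float:
    """Mean squared error of p on the sample [(x, y), ...]."""
    return sum((evaluate(p, x) - y) ** 2 for x, y in data) / len(data)


def parsimonious_fitness(p: Program, data, c: float = 1.0) -> float:
    """Empirical error plus a size penalty that vanishes as n grows.

    The form c * size / sqrt(n) is an illustrative assumption (lower is better).
    """
    n = len(data)
    return empirical_error(p, data) + c * size(p) / math.sqrt(n)


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # target y = 2x + 1
    compact = ("+", ("*", 2.0, "x"), 1.0)                  # 2x + 1, 5 nodes
    bloated = ("+", compact, ("*", 0.0, "x"))              # 2x + 1 + 0*x, 9 nodes
    for name, prog in [("compact", compact), ("bloated", bloated)]:
        print(name, size(prog), empirical_error(prog, data),
              parsimonious_fitness(prog, data))
```

Both programs fit the sample exactly, so only the size term distinguishes them; without the penalty, the bloated program would be just as fit as the compact one.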

BibTeX key: gelly:2005:longBloat
entry type: misc
year: 2005
howpublished: www
notes: cited by 1068309; replaced by DBLP:conf/cfap/GellyTBS05; Equipe TAO - INRIA Futurs, LRI, Bat. 490, University Paris-Sud, 91405 Orsay Cedex, France
size: 8 pages
Document: http://www.lri.fr/~gelly/paper/antibloatGecco2005_long_version.pdf

Cite this publication

@misc{gelly:2005:longBloat,
  abstract = {Code bloat, the excessive increase of code size, is an important issue in Genetic Programming (GP). This paper proposes a theoretical analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well-grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending on whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Statistical Learning. Namely, the Vapnik-Chervonenkis dimension of programs is computed, and further results from Statistical Learning allow one to prove that a parsimonious fitness ensures Universal Consistency (the solution minimising the empirical error does converge to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method consisting of choosing a maximal program size depending on the number of examples might still result in programs of infinitely increasing size with their accuracy; a more complicated modification of the fitness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency.},
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Gelly, Sylvain and Teytaud, Olivier and Bredeche, Nicolas and Schoenauer, Marc},
  biburl = {https://www.bibsonomy.org/bibtex/260d9bc5ea80ee988345b21a91a857d6e/brazovayeye},
  howpublished = {www},
  interhash = {df26d7dbf2c2d3511b119b8e614b9d4d},
  intrahash = {60d9bc5ea80ee988345b21a91a857d6e},
  keywords = {VC Vapnik-Chervonenkis, algorithms, bloat dimension, genetic programming,},
  notes = {cited by \cite{1068309}; replaced by \cite{DBLP:conf/cfap/GellyTBS05}; Equipe TAO - INRIA Futurs, LRI, Bat. 490, University Paris-Sud, 91405 Orsay Cedex, France},
  size = {8 pages},
  timestamp = {2008-06-19T17:40:10.000+0200},
  title = {A Statistical Learning Theory Approach of Bloat},
  url = {http://www.lri.fr/~gelly/paper/antibloatGecco2005_long_version.pdf},
  year = 2005
}
