S. Gelly, O. Teytaud, N. Bredeche, and M. Schoenauer. GECCO 2005: Proceedings of the 2005 conference on
Genetic and evolutionary computation, volume 2, pages 1783--1784. Washington DC, USA, ACM Press, (25-29 June 2005)
Abstract
Code bloat, the excessive increase of code size, is an
important issue in Genetic Programming (GP). This paper
proposes a theoretical analysis of code bloat in the
framework of symbolic regression in GP, from the
viewpoint of Statistical Learning Theory, a
well-grounded mathematical toolbox for Machine Learning.
Two kinds of bloat must be distinguished in that
context, depending on whether the target function lies
in the search space or not. Important mathematical
results are then proved using classical results from
Statistical Learning. Namely, the Vapnik-Chervonenkis
dimension of programs is computed, and further results
from Statistical Learning make it possible to prove
that a parsimonious fitness ensures Universal
Consistency (the solution minimising the empirical
error converges to the best possible error as the
number of examples goes to infinity). However, it is
proved that the standard method of choosing a maximal
program size depending on the number of examples may
still yield programs whose size grows without bound as
their accuracy improves; a more elaborate modification
of the fitness is proposed that theoretically avoids
unnecessary bloat while nevertheless preserving
Universal Consistency.
Full paper available at
http://www.lri.fr/~teytaud/longBloat.pdf
gelly:2005:longBloat
GECCO 2005: Proceedings of the 2005 conference on
Genetic and evolutionary computation
year: 2005
month: 25-29 June
pages: 1783--1784
publisher: ACM Press
volume: 2
organisation: ACM SIGEVO (formerly ISGEC)
publisher_address: New York, NY, 10286-1405, USA
size: 2 pages
isbn: 1-59593-010-8
notes: GECCO-2005: A joint meeting of the fourteenth
international conference on genetic algorithms
(ICGA-2005) and the tenth annual genetic programming
conference (GP-2005).
ACM Order Number 910052
eabloat.pdf is substantially more complete than the
poster in the GECCO proceedings
%0 Conference Paper
%1 1068309
%A Gelly, Sylvain
%A Teytaud, Olivier
%A Bredeche, Nicolas
%A Schoenauer, Marc
%B GECCO 2005: Proceedings of the 2005 conference on
Genetic and evolutionary computation
%C Washington DC, USA
%D 2005
%E Beyer, Hans-Georg
%E O'Reilly, Una-May
%E Arnold, Dirk V.
%E Banzhaf, Wolfgang
%E Blum, Christian
%E Bonabeau, Eric W.
%E Cantu-Paz, Erick
%E Dasgupta, Dipankar
%E Deb, Kalyanmoy
%E Foster, James A.
%E de Jong, Edwin D.
%E Lipson, Hod
%E Llora, Xavier
%E Mancoridis, Spiros
%E Pelikan, Martin
%E Raidl, Guenther R.
%E Soule, Terence
%E Tyrrell, Andy M.
%E Watson, Jean-Paul
%E Zitzler, Eckart
%I ACM Press
%K Poster, algorithms, bloat, code growth, genetic programming, reliability, statistical learning theory, theory
%P 1783--1784
%T A statistical learning theory approach of bloat
%U http://doi.acm.org/10.1145/1068009.1068309
%V 2
%@ 1-59593-010-8
@inproceedings{1068309,
abstract = {Code bloat, the excessive increase of code size, is an
important issue in Genetic Programming (GP). This paper
proposes a theoretical analysis of code bloat in the
framework of symbolic regression in GP, from the
viewpoint of Statistical Learning Theory, a
well-grounded mathematical toolbox for Machine Learning.
Two kinds of bloat must be distinguished in that
context, depending on whether the target function lies
in the search space or not. Important mathematical
results are then proved using classical results from
Statistical Learning. Namely, the Vapnik-Chervonenkis
dimension of programs is computed, and further results
from Statistical Learning make it possible to prove
that a parsimonious fitness ensures Universal
Consistency (the solution minimising the empirical
error converges to the best possible error as the
number of examples goes to infinity). However, it is
proved that the standard method of choosing a maximal
program size depending on the number of examples may
still yield programs whose size grows without bound as
their accuracy improves; a more elaborate modification
of the fitness is proposed that theoretically avoids
unnecessary bloat while nevertheless preserving
Universal Consistency.
Full paper available at
http://www.lri.fr/~teytaud/longBloat.pdf
\cite{gelly:2005:longBloat}},
added-at = {2008-06-19T17:35:00.000+0200},
address = {Washington DC, USA},
author = {Gelly, Sylvain and Teytaud, Olivier and Bredeche, Nicolas and Schoenauer, Marc},
biburl = {https://www.bibsonomy.org/bibtex/2bbe6781639cdd0b378dedf6d940adb63/brazovayeye},
booktitle = {{GECCO 2005}: Proceedings of the 2005 conference on
Genetic and evolutionary computation},
editor = {Beyer, Hans-Georg and O'Reilly, Una-May and Arnold, Dirk V. and Banzhaf, Wolfgang and Blum, Christian and Bonabeau, Eric W. and Cantu-Paz, Erick and Dasgupta, Dipankar and Deb, Kalyanmoy and Foster, James A. and {de Jong}, Edwin D. and Lipson, Hod and Llora, Xavier and Mancoridis, Spiros and Pelikan, Martin and Raidl, Guenther R. and Soule, Terence and Tyrrell, Andy M. and Watson, Jean-Paul and Zitzler, Eckart},
interhash = {df26d7dbf2c2d3511b119b8e614b9d4d},
intrahash = {bbe6781639cdd0b378dedf6d940adb63},
isbn = {1-59593-010-8},
keywords = {Poster, algorithms, bloat, code growth, genetic programming, reliability, statistical learning theory, theory},
month = {25-29 June},
notes = {GECCO-2005: A joint meeting of the fourteenth
international conference on genetic algorithms
(ICGA-2005) and the tenth annual genetic programming
conference (GP-2005).
ACM Order Number 910052
eabloat.pdf is substantially more complete than the
poster in the GECCO proceedings},
organisation = {ACM SIGEVO (formerly ISGEC)},
pages = {1783--1784},
publisher = {ACM Press},
publisher_address = {New York, NY, 10286-1405, USA},
size = {2 pages},
timestamp = {2008-06-19T17:40:09.000+0200},
title = {A statistical learning theory approach of bloat},
url = {http://doi.acm.org/10.1145/1068009.1068309},
volume = 2,
year = 2005
}