Abstract
The emerging field of evolutionary computation
(EC) comprises the design and analysis of
probabilistic algorithms inspired by the principles of
natural selection and variation. Genetic Programming
(GP) is one subfield of EC that emphasizes (1) the
capability to discover and exploit intrinsic
characteristics of the problem domain and (2) the
flexibility to adapt the shape and complexity of the
manipulated structures to the specifics of the
application. GP appears to be a robust
stochastic search technique that can be applied to
complex design, control, and knowledge discovery
applications. However, the computational load when
applying it to non-trivial problems is considerable.
The problem of understanding and controlling the GP
technique is notoriously difficult.
This dissertation describes theoretical and
experimental work aimed at (1) understanding the
characteristics and limitations of GP search, and (2)
extending the capabilities of GP in order to cope with
problems of increasing complexity. For the first
challenge, we focus on the properties and the role of
the problem representation and analyze the implicit
biases when manipulating variable length structures
represented as trees. We analyze uniform random
sampling on the space of tree structures to offer
insights into the role of procedural representations.
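Uniform sampling over tree structures can be made concrete with a small sketch. The recursive scheme below draws a binary tree shape with a given number of internal nodes uniformly at random by weighting subtree splits with Catalan numbers; the function names and the restriction to binary shapes are illustrative assumptions, not the dissertation's exact construction:

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def catalan(n):
    # C_0 = 1, C_n = sum_i C_i * C_{n-1-i}: counts binary trees
    # with n internal nodes.
    if n == 0:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))

def random_tree(n, rng):
    """Uniform random binary tree shape with n internal nodes.
    A leaf is None; an internal node is a (left, right) tuple."""
    if n == 0:
        return None
    # Choose the left-subtree size with probability proportional to
    # the number of trees that split admits; recurse on both sides.
    r = rng.randrange(catalan(n))
    for i in range(n):
        weight = catalan(i) * catalan(n - 1 - i)
        if r < weight:
            return (random_tree(i, rng), random_tree(n - 1 - i, rng))
        r -= weight
```

Sampling many such shapes exposes the structural bias of a representation: under the uniform distribution most trees are neither maximally deep nor maximally bushy, which search operators rarely respect.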
We describe the dynamics of a useful structural
property of representations called rooted tree
schemata. This demonstrates the role played by the
adaptive complexity of evolved structures. We also
capture the dynamics of behaviors using the concept of
population entropy.
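Population entropy can be sketched as the Shannon entropy of the empirical distribution of behaviors in the population; the function name and the choice of behavioral signature (any hashable summary, e.g. an individual's outputs on the fitness cases) are assumptions for illustration, not the dissertation's exact definition:

```python
import math
from collections import Counter

def population_entropy(behaviors):
    """Shannon entropy (in bits) of the distribution of behaviors.
    0 when all individuals behave identically; log2(P) when all
    P individuals behave differently."""
    counts = Counter(behaviors)
    total = len(behaviors)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

Tracking this quantity over generations gives a scalar view of behavioral diversity: a collapse toward zero signals premature convergence.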
The solution to the second challenge relies on the
observation that on non-trivial problems GP essentially
assembles and adapts the bits and pieces of a huge
monolithic model. We propose, instead, that the
learning system provide abstraction mechanisms for
adaptively creating and exploiting modularity and
problem decomposition. We evaluate previous steps in
this direction by looking at GP search biases and
complexity measures of solutions, such as expanded and
descriptional complexity, and we characterize the types
of modular structures that would result in a minimum
description length of solutions. Then we describe two
GP extension approaches for creating and exploiting
procedural abstractions in the form of subroutines in
order to facilitate scale-up. The first extension,
called Adaptive Representation (AR), is a heuristic
modular learning approach based on the discovery of
hierarchies of subroutines. The second extension,
called Evolutionary Divide-and-Conquer (EDC), views the
population as a pool of candidates from which a team
that solves the problem is selected. Both techniques extract simple
or "natural" relationships and build modular
representations to explain data. The techniques are
brought to life in several increasingly complex
algorithms.
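The intuition that modular representations shorten the description of a solution can be illustrated with a toy node count; the one-node-per-call cost and the function name below are illustrative assumptions, not the dissertation's encoding:

```python
def total_size(main_size, subroutine_size, n_calls, modular):
    """Node count of a solution that uses a repeated pattern of
    `subroutine_size` nodes at `n_calls` sites. A monolithic tree
    inlines the pattern at every site; a modular representation
    defines it once and pays one node per call."""
    if modular:
        return main_size + subroutine_size + n_calls
    return main_size + subroutine_size * n_calls
```

For a pattern of 20 nodes used at 5 sites in a 10-node main program, the monolithic form costs 110 nodes while the modular form costs 35, which is why solutions with reusable subroutines approach a minimum description length.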
The effects of embedding procedural and data
abstraction mechanisms in the learning process are
evaluated from several perspectives, such as reuse of
code or structure, automatic problem decomposition,
generalization, and automatic discovery of features on
several challenging problems. AR was successfully
applied to the intelligent control of an agent in a
dynamic and non-deterministic environment. These ideas
are further extended to the design of graphical models. EDC
was applied to regression problems characterized by
complex input spaces.