Abstract
Most machine learning theory and practice is concerned with learning a single
task. In this thesis it is argued that, in general, a single task contains
insufficient information for a learner to generalise well, and that good
generalisation requires information about many similar learning
tasks. Similar learning tasks form a body of prior information that can be used
to constrain the learner and make it generalise better. Examples of learning
scenarios in which there are many similar tasks are handwritten character
recognition and spoken word recognition.
The concept of the environment of a learner is introduced as a probability
measure over the set of learning problems the learner might be expected to
learn. It is shown how a sample from the environment may be used to learn a
representation, or recoding of the input space, that is appropriate for the
environment. Learning a representation can equivalently be thought of as
learning the appropriate features of the environment. Bounds are derived on the
sample size required to ensure good generalisation from a representation
learning process. These bounds show that under certain circumstances learning a
representation appropriate for $n$ tasks reduces the number of examples
required of each task by a factor of $n$.
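In schematic form (the notation below is an assumption about the thesis's
setup, not something the abstract fixes): each hypothesis is a composition
$g \circ f$, where the representation $f$ is shared across all $n$ tasks and
drawn from a class $\mathcal{F}$, while each task contributes its own output
function $g$ from a class $\mathcal{G}$. The bounds then behave roughly as
\[
m \;=\; O\!\left(\frac{1}{\varepsilon^{2}}
\left[\,C(\mathcal{G}) \;+\; \frac{C(\mathcal{F})}{n}\,\right]\right)
\]
examples per task, where $C(\cdot)$ stands for a suitable capacity measure;
the term for the (typically much richer) representation class $\mathcal{F}$
is divided by $n$, which is the source of the factor-of-$n$ reduction.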
Once a representation is learnt, it can be used to learn novel tasks from the
same environment, with the result that far fewer examples are required of the
new tasks to ensure good generalisation. Bounds are given on the number of
tasks and the number of samples from each task required to ensure that a
representation will be a good one for learning novel tasks.
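Schematically, in the same placeholder notation: with the representation $f$
fixed, a novel task from the environment requires estimating only an output
function $g \in \mathcal{G}$, so the examples needed drop from roughly
\[
O\!\left(\frac{C(\mathcal{G}\circ\mathcal{F})}{\varepsilon^{2}}\right)
\quad\text{to}\quad
O\!\left(\frac{C(\mathcal{G})}{\varepsilon^{2}}\right),
\]
a substantial saving whenever $\mathcal{F}$ is much richer than $\mathcal{G}$.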
The results on representation learning are generalised to cover any form of
automated hypothesis space bias.
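Read in the same schematic terms (a sketch of the bias-learning formulation,
not a quotation of it): the learner is given a family $\mathbb{H}$ of
hypothesis spaces and uses the $n$ sample tasks to select a single space
$\mathcal{H} \in \mathbb{H}$ with which to learn novel tasks; representation
learning is the special case
\[
\mathbb{H} \;=\; \{\mathcal{H}_f : f \in \mathcal{F}\},
\qquad
\mathcal{H}_f \;=\; \{\,g \circ f : g \in \mathcal{G}\,\},
\]
where choosing the hypothesis space amounts to choosing the representation $f$.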