Abstract
Over the past several years, the explosive growth of biological data
generated by new high-throughput instruments has begun to overwhelm
the biological community. There is no established infrastructure
to deal with these data in a consistent and successful fashion. This
thesis presents a new informatics platform capable of supporting
a large subsection of the experimental methods found in modern biology.
A consistent data definition strategy is outlined that can handle
gel electrophoresis, microarray, fluorescence activated cell sorting,
mass spectrometry, and microscopy within a single coherent set of
information object definitions. A key issue for interoperability
is that common attributes are made truly identical between the different
methods. This dramatically decreases the overhead of separate and
distinct classes for each method, and reserves the uniqueness for
attributes that are different between the methods. Thus, at least
one higher level of integration is obtained. The thesis shows that
rich object-oriented modeling, combined with object-relational database
features and the uniform treatment of data and metadata, is an ideal
candidate for complex experimental information integration tasks.
This claim is substantiated by elaborating on the coherent set of
information object definitions and testing the corresponding database
using real experimental data. A first implementation of this work, ExperiBase, is
an integrated software platform to store and query data generated
by the leading experimental protocols used in biology within a single
database. It provides: comprehensive database features for searching
and classifying; web-based client interfaces; web services; data
import and export capabilities to accommodate other data repositories;
and direct support for metadata produced by analysis programs. Because
it is built on JDBC, Java Servlets and JavaServer Pages, SOAP, XML, and
IIOP/CORBA technologies, the information architecture is portable and
platform independent. The thesis develops an ExperiBase XML according to the
single coherent set of information object definitions, and also presents
a new approach to database federation: translating heterogeneous database
schemas into the common ExperiBase XML schema and then merging the
output XML messages to federate the data. ExperiBase has become
a reference implementation of the I3C Life Science Object Ontologies
group.
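
The federation approach described above (translate each source schema into a common XML vocabulary, then merge the resulting messages) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the source names `lab_a` and `lab_b`, their field names, and the `experiment`/`experiments` elements are hypothetical and are not taken from the actual ExperiBase XML schema. Python is used only for brevity; the thesis itself names Java-based technologies.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings: source-specific field name -> common element name.
# Neither the sources nor the element names come from the real ExperiBase schema.
MAPPINGS = {
    "lab_a": {"exp_id": "id", "technique": "method", "run_date": "date"},
    "lab_b": {"ExperimentKey": "id", "Assay": "method", "Performed": "date"},
}


def to_common_xml(source, row):
    """Translate one source row into an <experiment> element in the common vocabulary."""
    experiment = ET.Element("experiment", {"source": source})
    for source_field, common_name in MAPPINGS[source].items():
        ET.SubElement(experiment, common_name).text = str(row[source_field])
    return experiment


def federate(messages):
    """Merge already-translated messages under a single root element."""
    root = ET.Element("experiments")
    for message in messages:
        root.append(message)
    return root


# Two heterogeneous sources that describe the same kind of record differently.
rows_a = [{"exp_id": "A-1", "technique": "microarray", "run_date": "2003-01-15"}]
rows_b = [{"ExperimentKey": "B-7", "Assay": "mass spectrometry", "Performed": "2003-02-02"}]

messages = [to_common_xml("lab_a", r) for r in rows_a]
messages += [to_common_xml("lab_b", r) for r in rows_b]
federated = federate(messages)

print(ET.tostring(federated, encoding="unicode"))
```

Because both sources are expressed in the same element names after translation, a client can query the merged document (for example, by `method`) without knowing which database a record came from. A real federation layer would presumably also need to reconcile units, identifiers, and controlled vocabularies, which this sketch leaves out.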