PhD thesis

ExperiBase : an integrated software architecture to support modern experimental biology

Massachusetts Institute of Technology, Dept. of Mechanical Engineering (2004)


Over the past several years, the explosive growth of biological data generated by new high-throughput instruments has virtually begun to drown the biological community. There is no established infrastructure to deal with these data in a consistent and successful fashion. This thesis presents a new informatics platform capable of supporting a large subsection of the experimental methods found in modern biology. A consistent data definition strategy is outlined that can handle gel electrophoresis, microarray, fluorescence-activated cell sorting, mass spectrometry, and microscopy within a single coherent set of information object definitions. A key issue for interoperability is that common attributes are made truly identical between the different methods. This dramatically decreases the overhead of separate and distinct classes for each method, and reserves the uniqueness for attributes that are different between the methods. Thus, at least one higher level of integration is obtained. The thesis shows that rich object-oriented modeling, together with object-relational database features and the uniform treatment of data and metadata, is an ideal candidate for complex experimental information integration tasks. This claim is substantiated by elaborating on the coherent set of information object definitions and testing the corresponding database using real experimental data. A first implementation of this work, ExperiBase, is an integrated software platform to store and query data generated by the leading experimental protocols used in biology within a single database. It provides: comprehensive database features for searching and classifying; web-based client interfaces; web services; data import and export capabilities to accommodate other data repositories; and direct support for metadata produced by analysis programs.
Because it is built on JDBC, Java Servlets and JavaServer Pages, SOAP, XML, and IIOP/CORBA, the information architecture is portable and platform independent. The thesis develops an ExperiBase XML schema based on the single coherent set of information object definitions, and also presents a new approach to database federation: heterogeneous database schemas are translated into the common ExperiBase XML schema, and the resulting XML messages are then merged to federate the data. ExperiBase has become a reference implementation of the I3C Life Science Object Ontologies group.


