
Inference of gene-regulatory networks: A statistical physics approach

Abstract Book of the XXIII IUPAP International Conference on Statistical Physics, Genova, Italy, (9-13 July 2007)

Abstract

Gene regulation is the basic mechanism driving cell development and differentiation: all cells in a multicellular organism carry the same genetic information, but the expression levels of many genes vary strongly between tissues. The most fundamental mechanism here is transcriptional regulation, i.e., the activation or repression of a gene's expression through the binding of regulatory proteins, so-called transcription factors (TFs), to the DNA near the start of the gene. These TFs are themselves encoded in the genome, and their expression is in turn regulated. The resulting complex gene-regulatory network (GRN) represents the set of all such regulatory interactions. Since the experimental determination of a GRN is a very complicated and time-consuming task, the inverse problem of inferring a GRN from gene-expression patterns for different tissues and/or environmental conditions has become one of the most challenging tasks in systems biology. It is complicated by the following facts: in general there are relatively few expression patterns (on the order of 100 microarrays) of very high dimension (on the order of 10,000 genes); the data are noisy due to intrinsic biological and measurement noise; the contained information is sparse (on the order of 1-10 TFs per gene); and gene regulation is combinatorially controlled. The computational extraction of this sparse information is an NP-hard problem and therefore hardly feasible for exact algorithms on a genome-wide scale. In our work, we propose a new algorithm based on a message-passing procedure (belief propagation), which is equivalent to a generalized Bethe-Peierls approximation in statistical physics. We first show on well-controlled artificial data that our algorithm efficiently infers combinatorial control mechanisms and clearly outperforms pair-correlation-based tools such as co-expression or relevance networks.
The case of artificial data can be characterized completely analytically using the replica method from spin-glass theory. Finally, we apply the algorithm to genome-wide expression data of baker's yeast under 173 environmental conditions. In this context we also integrate further biological knowledge in the form of known and potential regulatory proteins (460 TFs, signaling molecules, and structurally similar proteins) and of TF binding sites known from ChIP experiments. We find that the algorithm suggests sparse combinatorial control mechanisms which show a statistically significant enrichment in gene ontology annotations. We critically discuss the limitations of statistical inference due to the relatively small number of expression patterns and their high noise level.
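
The contrast between pair-correlation tools and combinatorial control can be illustrated with a toy example. The sketch below is not the belief-propagation algorithm of the abstract; it is a hypothetical illustration that assumes binary (+1/-1) expression levels, a target gene controlled by the product (an XOR-like rule) of two TFs, and noise-free data. All function names and parameters are invented for the illustration. Under these assumptions, each single TF is nearly uncorrelated with the target, while the correct TF pair explains it exactly.

```python
import itertools

import numpy as np


def make_xor_data(n_samples=200, n_tfs=6, seed=0):
    """Toy data (assumption): +/-1 TF levels, target = TF0 * TF1 (XOR-like)."""
    rng = np.random.default_rng(seed)
    tfs = rng.choice([-1, 1], size=(n_samples, n_tfs))
    target = tfs[:, 0] * tfs[:, 1]
    return tfs, target


def pairwise_scores(tfs, target):
    """Absolute correlation of each single TF with the target,
    in the spirit of a co-expression / relevance-network score."""
    return np.abs(tfs.T @ target) / len(target)


def best_pair(tfs, target):
    """Exhaustive search over TF pairs for the product rule that
    best matches the target (feasible only for tiny toy problems)."""
    best, best_score = None, -1.0
    for i, j in itertools.combinations(range(tfs.shape[1]), 2):
        score = abs(float(np.mean(tfs[:, i] * tfs[:, j] * target)))
        if score > best_score:
            best, best_score = (i, j), score
    return best, best_score


if __name__ == "__main__":
    tfs, target = make_xor_data()
    print("single-TF scores:", pairwise_scores(tfs, target))
    print("best TF pair and score:", best_pair(tfs, target))
```

The exhaustive pair search grows combinatorially with the number of candidate regulators and inputs per gene, which is why it cannot stand in for an approximate method on genome-wide data.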
