Article

A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers

PLoS Comput Biol, 12 (2): e1004765 (Feb 29, 2016)
DOI: 10.1371/journal.pcbi.1004765

Abstract

Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using antibody-based reverse phase protein array (RPPA) technology. The resulting proteomic data can be used to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method performs best in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently ranks highly among the tested methods. We use the high-performing methods to obtain a consensus network and identify four robust, densely connected modules that reveal biological processes and suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair, and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.

Pan-cancer proteomic datasets from The Cancer Genome Atlas provide a unique opportunity to study the functions of proteins in human cancers. Such datasets, in which proteins are measured under different conditions and correlations are informative, can enable the discovery of potentially causal protein-protein interactions, which may in turn shed light on protein function. However, it has been shown that the dominant correlations in a system can result from parallel transitive (i.e., indirect) interactions. A wide range of computational methods has been proposed to discriminate between direct and transitive interactions. Because mRNA data are widely available, these methods have been tested extensively for gene regulatory network inference; however, their performance and limitations in retrieving curated pathway interactions are less well understood. Here, we use a high-throughput proteomic dataset from The Cancer Genome Atlas to systematically test different families of network inference methods. We observe that most methods achieve a similar level of performance provided their parameter space is sufficiently explored, but a group of six methods consistently ranks highly among those tested. The protein-protein interactions inferred by the high-performing methods reveal the pathways that are shared by, or specific to, different cancer types.
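The difference between a dominant but transitive correlation and a direct interaction, and the idea of combining several inference methods into a consensus, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses plain Pearson correlation and partial correlation (computed from the inverse covariance matrix) as two stand-in scoring methods, simulated data for a hypothetical chain A → B → C in place of RPPA measurements, and a simple mean-rank aggregation as one possible way to build a consensus. The function names are hypothetical.

```python
import numpy as np


def correlation_scores(x):
    """Absolute Pearson correlation between columns (proteins) of x (samples x proteins)."""
    return np.abs(np.corrcoef(x, rowvar=False))


def partial_correlation_scores(x):
    """Absolute partial correlation of each protein pair, conditioned on all other proteins.

    Uses the precision matrix P = inv(cov): rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    precision = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return np.abs(pcor)


def consensus_ranking(score_matrices):
    """Average each protein pair's rank across methods (rank 0 = strongest score).

    Returns (i, j, mean_rank) tuples sorted from strongest to weakest consensus edge.
    """
    n = score_matrices[0].shape[0]
    iu = np.triu_indices(n, k=1)           # each unordered pair counted once
    ranks = []
    for scores in score_matrices:
        order = np.argsort(-scores[iu])    # indices of pairs, best first
        rank = np.empty_like(order)
        rank[order] = np.arange(len(order))
        ranks.append(rank)
    mean_rank = np.mean(ranks, axis=0)
    return [(int(iu[0][k]), int(iu[1][k]), float(mean_rank[k]))
            for k in np.argsort(mean_rank)]


if __name__ == "__main__":
    # Hypothetical chain A -> B -> C: A and C are correlated only through B.
    rng = np.random.default_rng(0)
    n = 5000
    a = rng.normal(size=n)
    b = a + rng.normal(size=n)
    c = b + rng.normal(size=n)
    x = np.column_stack([a, b, c])

    corr = correlation_scores(x)
    pcor = partial_correlation_scores(x)
    print("corr(A, C)  =", round(corr[0, 2], 2))
    print("pcor(A, C)  =", round(pcor[0, 2], 2))
    print("consensus   =", consensus_ranking([corr, pcor]))
```

For this simulated chain, theory gives a Pearson correlation between A and C of about 1/√3 ≈ 0.58, while their partial correlation given B is close to 0, so conditioning on the intermediate protein suppresses the transitive edge. Averaging ranks is only one simple aggregation choice; it rewards pairs that several methods agree on without requiring the methods' raw scores to share a common scale.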
