Abstract
Motivation: A major goal of biomedical research in personalized
medicine is to find relationships between mutations and their
corresponding disease phenotypes. However, most of the disease-related
mutational data are currently buried in the biomedical
literature in textual form and lack the necessary structure to allow
easy retrieval and visualization. We introduce a high-throughput
computational method for the identification of relevant disease
mutations in PubMed abstracts and apply it to prostate cancer (PCa) and breast
cancer (BCa).
Results: We developed the extractor of mutations (EMU)
tool to identify mutations and their associated genes. We
benchmarked EMU against MutationFinder, a tool that extracts point
mutations from text. Our results show that both methods achieve
comparable performance on two manually curated datasets. We
also benchmarked EMU’s performance for extracting the complete
mutational information and phenotype. Remarkably, we show that
one of the steps in our approach, a filter based on sequence analysis,
increases the precision for that task from 0.34 to 0.59 (PCa) and from
0.39 to 0.61 (BCa). We also show that this high-throughput approach
can be extended to other diseases.
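
As an illustration only, the idea behind a sequence-based filter can be sketched as follows. This is not EMU's implementation: the regular expression, the function names and the toy sequence are all assumptions made for this example. The sketch extracts single-letter point mutations such as "A123T" from text, then keeps only those whose wild-type residue agrees with a candidate protein sequence at the stated position, which is one plausible way such a filter could raise precision.

```python
import re

# Hypothetical pattern (an assumption, not EMU's): wild-type residue,
# 1-based position, mutant residue, e.g. "A123T".
AMINO = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")


def extract_point_mutations(text):
    """Return (wild_type, position, mutant) tuples found in the text."""
    return [(wt, int(pos), mt) for wt, pos, mt in POINT_MUTATION.findall(text)]


def passes_sequence_filter(mutation, protein_sequence):
    """Keep a mutation only if the wild-type residue matches the
    protein sequence at the given 1-based position."""
    wild_type, position, _ = mutation
    in_range = 1 <= position <= len(protein_sequence)
    return in_range and protein_sequence[position - 1] == wild_type


def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was retrieved."""
    retrieved = true_positives + false_positives
    return true_positives / retrieved if retrieved else 0.0


# Toy usage with an invented five-residue sequence.
candidates = extract_point_mutations("The T4A, K3R and G2D variants were reported.")
kept = [m for m in candidates if passes_sequence_filter(m, "MAKTL")]
```

In this toy case "G2D" is discarded because position 2 of the invented sequence is A rather than G; mentions that contradict the gene's sequence are the kind of false positive such a filter is meant to remove.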
Discussion: Our method improves the current status of disease mutation
databases by significantly increasing the number of
annotated mutations. We found 51 and 128 mutations manually
verified to be related to PCa and BCa, respectively, that are not
currently annotated for these cancer types in the OMIM or Swiss-
Prot databases. EMU’s retrieval performance represents a 2-fold
improvement in the number of annotated mutations for PCa and
BCa. We further show that our method can benefit from full-text
analysis once there is an increase in Open Access availability of
full-text articles.
Availability: Freely available at: http://bioinf.umbc.edu/EMU/ftp.
Contact: mkann@umbc.edu
Supplementary information: Supplementary data are available at
Bioinformatics online.
Received on July 29, 2010; revised on November 17, 2010; accepted
on November 23, 2010