Abstract
Discovering significant relationships between biological entities in text documents is an
important task for biologists who develop biological models for research and discovery,
especially given the enormous volume of biomedical documents and the rate at which it
grows every day. We propose a new text mining method to extract associations between biological
entities from text documents, and in our experiments we apply it to discovering
protein-to-disease associations. The proposed method uses two sets of documents on the topic of
interest, a negative set and a positive (or relevant) set, and uses the concepts of expectation (ex),
evidence (ev) and Z-scores to combine positive and negative evidence when determining the significant
associations. Moreover, the method offers an efficient way to handle protein names, aliases and
abbreviations and to disambiguate them from common abbreviations, gene symbols and similar terms. We
evaluated the method on discovering protein-to-disease associations from Medline abstracts, and the
results are very encouraging. In each experiment, we confirmed the correctness of the results against
articles from Medline. The method discovered associations between certain proteins and
various diseases, including Alzheimer, Creutzfeldt-Jakob, Crohn Disease, Dengue, Jaundice and Lung cancer.
For example, in the Alzheimer test, the method ran on 83,933 abstracts and discovered that
Alzheimer has a significant association with 6 proteins, among them Amyloid beta A4 protein precursor,
Apolipoprotein E precursor and Presenilin 1 (PMIDs: 8596911, 1465129, 8346443, 12614323,
8766720 and 8878479). We also tested the method on previously published
relationships between genes and diseases, and it successfully supported those
discoveries.
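
The abstract does not give the formulas behind expectation, evidence and Z-scores, so the following is only an illustrative sketch of the general idea, not the authors' definitions. It assumes that the expectation for a protein is the number of positive-set documents that would mention it at the background rate seen in the negative set, that the evidence is the observed number of positive-set mentions, and that the Z-score is a standard binomial one. The function names, the add-one smoothing and the threshold of 3.0 are all hypothetical.

```python
import math


def z_score(pos_hits, pos_total, neg_hits, neg_total):
    """Binomial Z-score of a protein's mention count in the positive set.

    Assumption (not from the paper): the background mention rate is
    estimated from the negative set, with add-one smoothing so that a
    protein never seen in the negative set still has a non-zero rate.
    """
    rate = (neg_hits + 1) / (neg_total + 2)
    expectation = pos_total * rate                      # "ex" in this sketch
    std_dev = math.sqrt(pos_total * rate * (1 - rate))
    return (pos_hits - expectation) / std_dev           # pos_hits plays "ev"


def significant_associations(counts, pos_total, neg_total, threshold=3.0):
    """Return {protein: z} for proteins whose Z-score meets the threshold.

    counts maps a protein name to (pos_hits, neg_hits); the threshold
    value is an arbitrary illustrative choice.
    """
    scores = {}
    for protein, (pos_hits, neg_hits) in counts.items():
        z = z_score(pos_hits, pos_total, neg_hits, neg_total)
        if z >= threshold:
            scores[protein] = z
    return scores


# Made-up counts for two hypothetical proteins, 1000 documents per set.
example = {"protein_a": (50, 1), "protein_b": (5, 50)}
hits = significant_associations(example, pos_total=1000, neg_total=1000)
print(sorted(hits))
```

In this toy input, the first protein is mentioned far more often in the positive set than the negative-set rate predicts, so it passes the threshold; the second is mentioned less often than predicted and is dropped.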