Article

A New Text Mining Approach for Finding Protein-to-Disease Associations

H. Al-Mubaid and R. Singh.
American Journal of Biochemistry and Biotechnology, 1 (3): 145--152 (2005)

Abstract

Discovering significant relationships between biological entities in text documents is an important task for biologists developing biological models for research and discovery, especially given the enormous and rapidly growing volume of biomedical literature. We propose a new text mining method for extracting associations between biological entities from text documents, and in our experiments we apply it to discovering protein-to-disease associations. The method uses two sets of documents on the topic of interest [a negative set and a positive (or relevant) set] and uses the concepts of expectation (ex), evidence (ev) and Z-scores to combine positive and negative evidence when determining which associations are significant. Moreover, the method offers an efficient way to handle protein names, aliases and abbreviations and to disambiguate them from common abbreviations, gene symbols and similar terms. We evaluated the method on discovering protein-to-disease associations from Medline abstracts, and the results are very encouraging. In each experiment, we confirmed the correctness of the results against articles in Medline. Our method discovered associations between certain proteins and various diseases, including Alzheimer disease, Creutzfeldt-Jakob disease, Crohn disease, dengue, jaundice and lung cancer. For example, in the Alzheimer test, the method was run on 83,933 abstracts and found that Alzheimer disease has a significant association with 6 proteins, among them Amyloid beta A4 protein precursor, Apolipoprotein E precursor and Presenilin 1 [PMIDs: 8596911, 1465129, 8346443, 12614323, 8766720 and 8878479]. We further tested our method on some previously discovered and published gene-disease relationships, and the method also supported those findings.
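The abstract names the ingredients (a positive and a negative document set, expectation, evidence, Z-scores, alias handling) but not the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that ev is the number of positive-set abstracts mentioning a protein, that ex is the count expected from the protein's smoothed frequency in the negative set, and that the Z-score is a standard binomial standardization of ev against ex. The alias table `ALIASES`, the `AMBIGUOUS` set, the helper names, the pseudocount of 0.5 and the threshold of 3.0 are all hypothetical illustrations.

```python
from __future__ import annotations

import math
import re

# HYPOTHETICAL alias table: lowercase surface form -> canonical protein name.
# A real system would build this from a protein database; entries are illustrative.
ALIASES = {
    "apolipoprotein e": "Apolipoprotein E precursor",
    "apoe": "Apolipoprotein E precursor",
    "presenilin 1": "Presenilin 1",
    "ps1": "Presenilin 1",
}

# HYPOTHETICAL short forms that also occur as common abbreviations or symbols.
# They count only when an unambiguous alias of the same protein is in the same text.
AMBIGUOUS = {"ps1"}


def proteins_in(text: str) -> set[str]:
    """Return the canonical protein names mentioned in one abstract."""
    lowered = text.lower()
    hits: dict[str, set[str]] = {}
    for alias, canonical in ALIASES.items():
        # Whole-word match so that e.g. "apoe" does not fire inside longer tokens.
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            hits.setdefault(canonical, set()).add(alias)
    # Drop a protein whose only matches are ambiguous short forms.
    return {name for name, aliases in hits.items() if not aliases <= AMBIGUOUS}


def expectation_and_z(ev: int, n_pos: int, neg_hits: int, n_neg: int,
                      pseudocount: float = 0.5) -> tuple[float, float]:
    """ASSUMED scoring: binomial Z-score of ev against a negative-set baseline."""
    # Smoothed background rate, so a protein absent from the negative set
    # still gets 0 < p < 1 and the standard deviation is never zero.
    p = (neg_hits + pseudocount) / (n_neg + 2 * pseudocount)
    ex = n_pos * p  # expected number of positive abstracts mentioning the protein
    z = (ev - ex) / math.sqrt(n_pos * p * (1 - p))
    return ex, z


def rank_associations(positive: list[str], negative: list[str],
                      threshold: float = 3.0) -> list[tuple[str, int, float, float]]:
    """Return (protein, ev, ex, z) for proteins with z >= threshold, best first."""
    pos_sets = [proteins_in(t) for t in positive]
    neg_sets = [proteins_in(t) for t in negative]
    results = []
    for protein in set().union(*pos_sets):
        ev = sum(protein in s for s in pos_sets)  # evidence in the positive set
        neg_hits = sum(protein in s for s in neg_sets)
        ex, z = expectation_and_z(ev, len(pos_sets), neg_hits, len(neg_sets))
        if z >= threshold:
            results.append((protein, ev, ex, z))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

For instance, with four positive abstracts of which three mention ApoE, and a four-abstract negative set in which it never occurs, the sketch gives ex = 0.4 and z = (3 - 0.4) / 0.6 ≈ 4.33. An abstract whose only protein mention is the ambiguous short form "PS1" contributes nothing, which is one simple way to keep common abbreviations from being counted as protein names.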

BibTeX key: protein-disease2005
entry type: article
year: 2005
journal: American Journal of Biochemistry and Biotechnology
number: 3
pages: 145--152
volume: 1
Document: http://www.scipub.org/fulltext/ajbb/ajbb13145-152.pdf

Cite this publication

%0 Journal Article
%1 protein-disease2005
%A Al-Mubaid, Hisham
%A Singh, Rajit K
%D 2005
%@ 1553-3668
%J American Journal of Biochemistry and Biotechnology
%K CAT CAT-REL-COOR CAT-REL-STAT disease mining protein text
%N 3
%P 145--152
%T A New Text Mining Approach for Finding Protein-to-Disease Associations
%U http://www.scipub.org/fulltext/ajbb/ajbb13145-152.pdf
%V 1
%X Discovering significant relationships between biological entities from text documents is an important task for biologists in order to develop biological models for research and discovery, especially with the existing gigantic amounts of biomedical documents and the rate at which they are increasing everyday. We propose a new text mining method to extract associations between biological entities from text documents; and we focus and apply the method in our experiments on discovering proteins-to-diseases associations. The proposed method uses two sets of documents on the topic of interest [a negative set and positive (or relevant) set] and utilizes the concepts of expectation (ex), evidence (ev) and Z-scores in combining positive and negative evidences in determining the significant associations. Moreover, the method offers an efficient way to handle protein names, aliases and abbreviations and to disambiguate them from common abbreviations, gene symbols and such. We evaluated the method in discovering protein-to-disease associations from Medline abstracts and the results are very encouraging. We confirmed the correctness of the results, in each experiment, through articles from Medline. Our method was able to discover associations between certain proteins and various diseases like Alzheimer, Creutzfeldt-Jakob, Crohn Disease, Dengue, Jaundice, Lung cancer and more. For example, in Alzheimer test, the method ran on 83,933 abstracts and discovered that Alzheimer has significant association with 6 proteins, among them, Amyloid beta A4 protein precursor, Apolipoprotein E precursor and Presenilin 1 [PMIDs: 8596911, 1465129, 8346443, 12614323, 8766720 and 8878479]. We further tested our method on some already discovered and published relationships between genes and diseases and the method was also successful in supporting those discoveries.

@article{protein-disease2005,
  abstract = {Discovering significant relationships between biological entities from text documents is an important task for biologists in order to develop biological models for research and discovery, especially with the existing gigantic amounts of biomedical documents and the rate at which they are increasing everyday. We propose a new text mining method to extract associations between biological entities from text documents; and we focus and apply the method in our experiments on discovering proteins-to-diseases associations. The proposed method uses two sets of documents on the topic of interest [a negative set and positive (or relevant) set] and utilizes the concepts of expectation (ex), evidence (ev) and Z-scores in combining positive and negative evidences in determining the significant associations. Moreover, the method offers an efficient way to handle protein names, aliases and abbreviations and to disambiguate them from common abbreviations, gene symbols and such. We evaluated the method in discovering protein-to-disease associations from Medline abstracts and the results are very encouraging. We confirmed the correctness of the results, in each experiment, through articles from Medline. Our method was able to discover associations between certain proteins and various diseases like Alzheimer, Creutzfeldt-Jakob, Crohn Disease, Dengue, Jaundice, Lung cancer and more. For example, in Alzheimer test, the method ran on 83,933 abstracts and discovered that Alzheimer has significant association with 6 proteins, among them, Amyloid beta A4 protein precursor, Apolipoprotein E precursor and Presenilin 1 [PMIDs: 8596911, 1465129, 8346443, 12614323, 8766720 and 8878479]. We further tested our method on some already discovered and published relationships between genes and diseases and the method was also successful in supporting those discoveries.},
  added-at = {2009-02-13T01:37:03.000+0100},
  author = {Al-Mubaid, Hisham and Singh, Rajit K},
  biburl = {https://www.bibsonomy.org/bibtex/2f5e76a46e14430af8f2722d97d90fb07/huiyangsfsu},
  interhash = {46dd336837a06ec127b7df7b7aec0e76},
  intrahash = {f5e76a46e14430af8f2722d97d90fb07},
  issn = {1553-3668},
  journal = {American Journal of Biochemistry and Biotechnology},
  keywords = {CAT CAT-REL-COOR CAT-REL-STAT disease mining protein text},
  number = 3,
  pages = {145--152},
  timestamp = {2010-11-12T05:08:01.000+0100},
  title = {A New Text Mining Approach for Finding Protein-to-Disease Associations},
  url = {http://www.scipub.org/fulltext/ajbb/ajbb13145-152.pdf},
  volume = 1,
  year = 2005
}
