Proceedings,

On building a quantitative food-disease-gene network

, , , and .
The 2nd International Conference on Bioinformatics and Computational Biology (BICoB 2010), March 2010

Abstract

Nutritional genomics is a new science that studies the relationships among foods (or nutrients), diseases, and genes. A large number of scientific findings have been published in this area, primarily as unstructured text. Moreover, for a given pair of entities, different studies can report different findings. It is hence important to obtain a holistic view of the reported relationships. In this article, we describe an information extraction system that aims to reach this goal. The system integrates natural language processing techniques, a domain ontology, and statistical and machine learning methods. It consists of four main modules: (1) entity extraction, which recognizes and extracts five types of entities: foods, chemicals (or nutrients), diseases, proteins, and genes; (2) relationship extraction, which extracts binary relationships between entities; (3) relationship polarity analysis, which categorizes relationships into three groups: positive, negative, and neutral; and (4) strength analysis, which rates a relationship as weak, medium, or strong. To the best of our knowledge, we are the first to propose analyzing the polarity and strength of a binary relationship. We have evaluated our system using the GENIA corpus and datasets drawn from the MEDLINE database. The first two modules outperform the best reported results, with average F-scores of 0.89 and 0.82, respectively, while the last two also achieve promising results, with accuracies of 0.75-0.84 and ~0.90, respectively.

Key words: nutritional genomics, text mining, relationship extraction, relationship polarity, relationship strength
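The four-module pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract says the real system combines natural language processing, a domain ontology, and statistical and machine learning methods, whereas the sketch below substitutes simple dictionary and cue-word lookups. The lexicon entries, cue-word sets, class names, and function names are all hypothetical, and "polarity" here merely follows the direction of the cue verb, which is an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Hypothetical mini-lexicon covering the five entity types named in the abstract.
ENTITY_LEXICON = {
    "green tea": "food",
    "soy": "food",
    "catechin": "chemical",
    "breast cancer": "disease",
    "p53": "protein",
    "brca1": "gene",
}
# Hypothetical cue words; a stand-in for the paper's learned models.
POSITIVE_CUES = {"increases", "promotes", "induces", "elevates"}
NEGATIVE_CUES = {"reduces", "inhibits", "prevents", "decreases"}
STRONG_CUES = {"significantly", "strongly", "markedly"}
WEAK_CUES = {"may", "might", "slightly", "possibly"}


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    polarity: str  # "positive", "negative", or "neutral"
    strength: str  # "weak", "medium", or "strong"


def extract_entities(sentence):
    """Module 1 (toy): find lexicon terms, ordered by position in the sentence."""
    lowered = sentence.lower()
    found = []
    for term, kind in ENTITY_LEXICON.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            found.append((match.start(), Entity(term, kind)))
    return [entity for _, entity in sorted(found, key=lambda pair: pair[0])]


def classify_polarity(words):
    """Module 3 (toy): direction of the cue verb; mixed or no cues give neutral."""
    has_positive = bool(words & POSITIVE_CUES)
    has_negative = bool(words & NEGATIVE_CUES)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return "neutral"


def classify_strength(words):
    """Module 4 (toy): strong cues win over weak cues; the default is medium."""
    if words & STRONG_CUES:
        return "strong"
    if words & WEAK_CUES:
        return "weak"
    return "medium"


def analyze(sentence):
    """Module 2 (toy): pair co-occurring entities, then attach polarity and strength."""
    entities = extract_entities(sentence)
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    polarity = classify_polarity(words)
    strength = classify_strength(words)
    return [
        Relationship(a, b, polarity, strength)
        for a, b in combinations(entities, 2)
    ]


if __name__ == "__main__":
    for rel in analyze("Green tea significantly reduces the risk of breast cancer."):
        print(rel.source.text, "->", rel.target.text, rel.polarity, rel.strength)
```

Sentence-level co-occurrence is the crudest possible relationship extractor; it is used here only to show how the outputs of the four modules fit together into labeled edges of a food-disease-gene network.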
