Inproceedings,

Opinion Mining for Biomedical Text Data: Feature Space Design and Feature Selection

The Ninth International Workshop on Data Mining in Bioinformatics (BIOKDD 2010), (July 2010)

Abstract

Unstructured text (e.g., journal articles) remains the primary means of publishing biomedical research results. To extract and integrate knowledge from such data, text mining has been routinely applied. One important task is extracting relationships between bio-entities such as foods and diseases. Most existing studies, however, stop short of further analyzing the extracted relationships, such as their polarity and the level of certainty with which the authors reported a given relationship. The latter is termed the relationship strength and is marked at three levels: weak, medium, and strong. We have previously reported a preliminary study on this issue [22], and here we detail our studies on constructing a novel feature space for effectively predicting the polarity and strength of a relationship. Unlike previous work, four types of polarity instead of three are considered, namely positive, negative, neutral, and no-relationship. Another contribution is that, in addition to the commonly accepted lexicon-based features, we have identified a set of novel features that capture both the semantic and structural aspects of a relationship. Our intensive evaluations demonstrate that combining these new features with the lexicon-based ones achieves the best accuracy for polarity prediction (~0.91). This, however, is not the case for strength prediction, where lexicon-based features alone are sufficient (~0.96).
