Article

Mining metabolites: extracting the yeast metabolome from the literature

C. Nobata, P. Dobson, S. Iqbal, P. Mendes, J. Tsujii, D. Kell, and S. Ananiadou.
Metabolomics, 7 (1): 94--101 (Mar 1, 2011)
DOI: 10.1007/s11306-010-0251-6

Abstract

Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross validation on a manually annotated corpus, our recognition tool generates an f-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-010-0251-6) contains supplementary material, which is available to authorized users.
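The abstract reports an F-score and a precision but not the recall. As a rough illustration only, the sketch below back-calculates the recall those two figures would imply, assuming the reported F-score is the balanced F1 (the harmonic mean of precision and recall). The abstract does not state which F-measure was used, and the function name `implied_recall` is hypothetical. Under that assumption, the implied recall comes out at roughly 74.4.

```python
def implied_recall(f_score: float, precision: float) -> float:
    """Recall implied by an F1 score and a precision value.

    Assumes F1 = 2 * P * R / (P + R), which rearranges to
    R = F * P / (2 * P - F). All values are on the same scale
    (percentages here, as in the abstract).
    """
    denominator = 2 * precision - f_score
    if denominator <= 0:
        raise ValueError("F-score and precision are inconsistent for F1")
    return f_score * precision / denominator


if __name__ == "__main__":
    # Figures quoted in the abstract: F-score 78.49, precision 83.02.
    recall = implied_recall(78.49, 83.02)
    print(f"implied recall: {recall:.2f}")
```

This is a derived estimate, not a figure taken from the abstract; the paper itself is the authority on the measured recall.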

BibTeX key: Nobata2011
entry type: article
year: 2011
month: mar
day: 1
journal: Metabolomics
number: 1
pages: 94--101
publisher: Springer Boston
volume: 7
citeulike-linkout-2: http://view.ncbi.nlm.nih.gov/pubmed/21687783
citeulike-linkout-1: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111869/
citeulike-linkout-4: http://www.springerlink.com/content/e1727327007hx663
citeulike-linkout-3: http://www.hubmed.org/display.cgi?uids=21687783
citeulike-article-id: 8208819
pmid: 21687783
priority: 2
posted-at: 2011-12-22 18:50:59
issn: 1573-3890
citeulike-linkout-0: http://dx.doi.org/10.1007/s11306-010-0251-6
comment: 1573-3890
pmcid: PMC3111869
DOI: 10.1007/s11306-010-0251-6
url: http://dx.doi.org/10.1007/s11306-010-0251-6

Cite this publication

%0 Journal Article
%1 Nobata2011
%A Nobata, Chikashi
%A Dobson, Paul
%A Iqbal, Syed
%A Mendes, Pedro
%A Tsujii, Jun'ichi
%A Kell, Douglas
%A Ananiadou, Sophia
%D 2011
%I Springer Boston
%J Metabolomics
%K *cul cul metabolomics oclcml text-mining yeast
%N 1
%P 94--101
%R 10.1007/s11306-010-0251-6
%T Mining metabolites: extracting the yeast metabolome from the literature
%U http://dx.doi.org/10.1007/s11306-010-0251-6
%V 7
