Abstract
It is now apparent that the rate-limiting step in high-throughput experimentation is neither data acquisition nor
analysis, but rather our ability to interpret data on a genome-wide scale. Indeed, the explosion of data sampling
capacity combined with increasing publication rates greatly impairs our ability to find meaning in vast
collections of data. In order to support data interpretation, bioinformatic tools are needed to identify critical
information contained in large bodies of literature. However, extracting knowledge embedded in free text is an
arduous task, compounded in the biomedical field by inconsistent gene nomenclature, domain-specific
language and restricted access to full-text articles.
This paper presents a selection of currently available biomedical literature mining software. These tools rely
on statistical and, more recently, semantic analyses (Natural Language Processing) to automatically extract
information from the literature. In addition, a literature mining strategy has been developed to explore patterns of
term occurrences in abstracts. This method automatically identifies relevant keywords in collections of abstracts,
and uses a pattern discovery algorithm to generate a visual interface for exploring functional associations among
genes. Term occurrence heatmaps can also be combined with gene expression profiles to provide valuable
functional annotations. Furthermore, as demonstrated with tumor cell line literature profiling results, this
approach can be applied to a variety of themes beyond genomic data analysis. Altogether, these examples
illustrate how literature analysis can be employed to support knowledge discovery in biomedical research.
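
The term-occurrence strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the keyword filter or the pattern discovery algorithm, so the sketch assumes a simple frequency gain over a baseline set of abstracts for keyword selection and uses cosine similarity between gene profiles as a stand-in for pattern discovery. The gene names, the abstracts, and the `min_gain` parameter are hypothetical.

```python
import re
from math import sqrt


def term_frequencies(abstracts):
    """Return, for each term, the fraction of abstracts that contain it."""
    counts = {}
    for text in abstracts:
        # Count each term at most once per abstract (occurrence, not raw count).
        for term in set(re.findall(r"[a-z]+", text.lower())):
            counts[term] = counts.get(term, 0) + 1
    n = len(abstracts)
    return {term: c / n for term, c in counts.items()}


def occurrence_matrix(corpus, baseline, min_gain=0.75):
    """Build a gene-by-keyword matrix of term occurrence frequencies.

    corpus:   dict mapping a gene name to its list of abstracts
    baseline: list of unrelated abstracts used to discount common words
    min_gain: assumed threshold; a term is kept as a keyword if, for at
              least one gene, its frequency exceeds the baseline by this much
    """
    base = term_frequencies(baseline)
    per_gene = {gene: term_frequencies(texts) for gene, texts in corpus.items()}
    keywords = sorted({
        term
        for freqs in per_gene.values()
        for term, freq in freqs.items()
        if freq - base.get(term, 0.0) >= min_gain
    })
    matrix = {
        gene: [freqs.get(term, 0.0) for term in keywords]
        for gene, freqs in per_gene.items()
    }
    return keywords, matrix


def cosine(u, v):
    """Cosine similarity between two term-occurrence profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy data, for illustration only.
corpus = {
    "GENE_A": ["apoptosis is induced by caspase activation",
               "caspase cleavage drives apoptosis"],
    "GENE_B": ["caspase activity marks apoptosis in cells",
               "apoptosis requires caspase signaling"],
    "GENE_C": ["interferon signaling induces antiviral response",
               "antiviral interferon response in cells"],
}
baseline = [
    "cells were cultured in medium",
    "signaling in cells is complex",
    "the response is induced by treatment",
    "activation of cells by treatment",
]

keywords, matrix = occurrence_matrix(corpus, baseline)
for gene, row in matrix.items():
    print(gene, dict(zip(keywords, row)))
print("A vs B:", cosine(matrix["GENE_A"], matrix["GENE_B"]))
print("A vs C:", cosine(matrix["GENE_A"], matrix["GENE_C"]))
```

Each row of `matrix` corresponds to one row of a term occurrence heatmap. Genes with similar profiles are candidates for a functional association, and the same rows could be placed alongside expression profiles for annotation.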