ChemicalTagger: A tool for semantic text-mining in chemistry.

L. Hawizy, D. Jessop, N. Adams, and P. Murray-Rust. Journal of Cheminformatics, 3 (1): 17+ (May 16, 2011)
DOI: 10.1186/1758-2946-3-17

Abstract

BACKGROUND: The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches.

RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names).

CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
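The tag-then-structure idea in the abstract can be illustrated with a toy sketch. This is not ChemicalTagger's code and does not use OSCAR or ANTLR: the tag names, word lists, and the functions `tag_tokens`, `group_phrases`, and `find_solvents` are all hypothetical, chosen only to show how regex tagging followed by rule-based grouping might pick out action phrases and a solvent from its linguistic context.

```python
import re

# Hypothetical tag rules (not ChemicalTagger's); the first matching rule wins.
TAG_RULES = [
    ("QUANTITY", re.compile(r"^\d+(\.\d+)?$")),
    ("UNIT", re.compile(r"^(mL|g|mg|mmol|h|min)$")),
    ("ACTION", re.compile(r"^(added|stirred|heated|dissolved|filtered)$", re.I)),
    ("PREP", re.compile(r"^(in|to|with|for|at)$", re.I)),
    ("CHEMICAL", re.compile(r"^(ethanol|water|toluene|methanol|benzaldehyde)$", re.I)),
]


def tag_tokens(sentence):
    """Split a sentence into word/number tokens and tag each one."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", sentence)
    tagged = []
    for tok in tokens:
        tag = next((name for name, pat in TAG_RULES if pat.match(tok)), "OTHER")
        tagged.append((tag, tok))
    return tagged


def group_phrases(tagged):
    """Group tagged tokens into phrases, starting a new phrase at each ACTION.

    A stand-in for a grammar-based parser: tokens before the first ACTION
    form a phrase whose action is None.
    """
    phrases = []
    current = {"action": None, "tokens": []}
    for tag, tok in tagged:
        if tag == "ACTION":
            if current["tokens"]:
                phrases.append(current)
            current = {"action": tok.lower(), "tokens": []}
        current["tokens"].append((tag, tok))
    if current["tokens"]:
        phrases.append(current)
    return phrases


def find_solvents(phrases):
    """Flag a CHEMICAL as a solvent when the last preposition before it is 'in'."""
    solvents = []
    for phrase in phrases:
        after_in = False
        for tag, tok in phrase["tokens"]:
            if tag == "PREP":
                after_in = tok.lower() == "in"
            elif tag == "CHEMICAL" and after_in:
                solvents.append(tok)
    return solvents


sentence = "The benzaldehyde was dissolved in 20 mL ethanol and stirred for 2 h."
phrases = group_phrases(tag_tokens(sentence))
for phrase in phrases:
    print(phrase["action"], [tok for _, tok in phrase["tokens"]])
print("solvents:", find_solvents(phrases))
```

In this sketch the reactant (benzaldehyde) and the solvent (ethanol) carry the same CHEMICAL tag; only the preceding "in" distinguishes them, which is the kind of contextual cue the abstract refers to. A real grammar would handle far more constructions than this single rule.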

Links and resources

BibTeX key: Hawizy2011
entry type: article
year: 2011
month: may
day: 16
journal: Journal of Cheminformatics
number: 1
pages: 17+
volume: 3
citeulike-linkout-2: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117806/
citeulike-linkout-1: http://dx.doi.org/10.1186/1758-2946-3-17
citeulike-linkout-4: http://www.hubmed.org/display.cgi?uids=21575201
citeulike-linkout-3: http://view.ncbi.nlm.nih.gov/pubmed/21575201
citeulike-article-id: 9303115
pmid: 21575201
priority: 2
posted-at: 2011-11-08 10:25:38
issn: 1758-2946
citeulike-linkout-0: http://www.jcheminf.com/content/3/1/17
pmcid: PMC3117806
DOI: 10.1186/1758-2946-3-17
url: http://www.jcheminf.com/content/3/1/17
