Abstract

OSCAR3 is a tool for shallow, chemistry-specific parsing of chemical documents. It identifies (or attempts to identify):

- Chemical names: singular nouns, plurals, verbs etc., as well as formulae and acronyms, some enzymes and reaction names.
- Ontology terms: if you can do it by string-matching, you can get OSCAR to do it.
- Chemical data: spectra, melting/boiling point, yield etc. in experimental sections.

In addition, where possible, the chemical names that are detected are annotated with structures, either via lookup or name-to-structure parsing ("OPSIN"), and with identifiers from the chemical ontology ChEBI.

OSCAR3 also includes the OSCAR Server, a Jetty-powered set of servlets. These provide the following services:

- Parsing of text/HTML by OSCAR.
- Text/InChI/SMILES/SMILES substructure/SMILES similarity search of papers, coupled with keyword and ontology-based search, using Lucene and the CDK.
- List of all names found / all names that co-occur with a search term or terms.
- Online management of a chemical/stopword lexicon.
- Manual editing of SciXML fragments containing named entities, for the creation of gold standards and training data.

OSCAR3 files may be downloaded from the SourceForge download page. In addition to OSCAR3 itself, two OSCAR3 components have been separated out and made available as separate modules: OPSIN, a chemical name-to-structure converter, and ChemTok, a tokeniser optimised for text containing chemical names.
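The abstract describes ontology-term detection as something that can be done "by string-matching". As a rough illustration of that general idea only, the Python sketch below matches a small lexicon against text and returns character offsets. It is not OSCAR3's code or API: the function name `find_ontology_terms`, the lexicon contents, and the `EX:` identifiers are all hypothetical placeholders.

```python
import re

# Hypothetical mini-lexicon mapping surface strings to made-up identifiers.
# These are NOT real ontology IDs; a real lexicon would be loaded from a file.
ONTOLOGY_TERMS = {
    "oxidation": "EX:0001",
    "nuclear magnetic resonance": "EX:0002",
    "catalyst": "EX:0003",
}


def find_ontology_terms(text, lexicon=ONTOLOGY_TERMS):
    """Return (start, end, matched_text, term_id) for each lexicon hit in text."""
    # Try longer terms first so multi-word terms win over shorter overlaps.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The catalyst was characterised by nuclear magnetic resonance."
    for hit in find_ontology_terms(sample):
        print(hit)
```

Matching is case-insensitive and restricted to word boundaries, which is a design choice of this sketch; whether OSCAR3 behaves the same way is not stated in the abstract.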
