Abstract
We have produced an open-source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature quickly and precisely. This has been achieved using an approach based on a regular grammar, which is used to guide tokenization, a potentially difficult problem for chemical names. From the parsed chemical name, an XML parse tree is constructed and then operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8% or higher) and, for general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open-source developments in chemical name interpretation.
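To illustrate what "grammar-guided tokenization" can mean, the following is a toy sketch in Python, not OPSIN's actual grammar, vocabulary, or code. It assumes a small hypothetical lexicon (`LEXICON`) and a hypothetical finite-state machine (`TRANSITIONS`) standing in for a regular grammar. At each position only the token classes legal in the current grammar state are tried, and any split that cannot reach an accepting state is discarded by backtracking.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon; real nomenclature needs far more token classes.
LEXICON: Dict[str, List[str]] = {
    "multiplier": ["di", "tri", "tetra"],
    "substituent": ["methyl", "ethyl", "chloro", "bromo"],
    "chain": ["meth", "eth", "prop", "but", "pent", "hex"],
    "unsaturation": ["an", "en", "yn"],
    "suffix": ["e", "ol", "al", "one"],
}
# Locants such as "2-" (prefix) or "-2-" (inside the name).
LOCANT = re.compile(r"-?\d+(?:,\d+)*-")

# Hypothetical regular grammar as a finite-state machine:
# state -> list of (token class, next state).
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "start": [("locant", "prefix_locant"), ("multiplier", "multiplied"),
              ("substituent", "start"), ("chain", "chain")],
    "prefix_locant": [("multiplier", "multiplied"), ("substituent", "start")],
    "multiplied": [("substituent", "start")],
    "chain": [("unsaturation", "saturation"), ("locant", "chain_locant")],
    "chain_locant": [("unsaturation", "saturation")],
    "saturation": [("suffix", "end"), ("locant", "suffix_locant")],
    "suffix_locant": [("suffix", "end")],
    "end": [],
}
ACCEPTING = {"end"}

Token = Tuple[str, str]  # (token class, matched text)


def candidates(name: str, pos: int, token_class: str) -> List[str]:
    """Return every string of the given class that matches name at pos."""
    if token_class == "locant":
        m = LOCANT.match(name, pos)
        return [m.group()] if m else []
    return [t for t in LEXICON[token_class] if name.startswith(t, pos)]


def tokenize(name: str, state: str = "start", pos: int = 0) -> Optional[List[Token]]:
    """Depth-first search for a split of name that the grammar accepts."""
    if pos == len(name):
        return [] if state in ACCEPTING else None
    for token_class, next_state in TRANSITIONS[state]:
        # Try longer matches first, but backtrack if they lead to a dead end.
        for text in sorted(candidates(name, pos, token_class), key=len, reverse=True):
            rest = tokenize(name, next_state, pos + len(text))
            if rest is not None:
                return [(token_class, text)] + rest
    return None  # no grammatical split from this state and position


if __name__ == "__main__":
    for n in ["2-methylpropane", "dichloromethane", "butan-2-ol", "propan"]:
        print(n, tokenize(n))
```

For example, `tokenize("2-methylpropane")` splits the name into a locant, a substituent, a chain, an unsaturation marker and a suffix, while `tokenize("propan")` returns `None` because the grammar never reaches an accepting state. Encoding the grammar as states means ambiguous fragments (such as `meth` versus `methyl`) are only accepted where they fit the rest of the name; a realistic grammar would need many more states and token classes.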