Improving NER Performance by Applying Text Summarization on Pharmaceutical Articles

J. Dobreva, N. Jofche, M. Jovanovik, and D. Trajanov. ICT Innovations 2020. Machine Learning and Applications, pages 87–97. Cham, Springer International Publishing, 2020.


Analyzing long text articles in the pharmaceutical domain, for the purpose of knowledge extraction and recognizing entities of interest, is a tedious task. In our previous research efforts, we developed a platform which successfully extracts entities and facts from pharmaceutical texts and populates a knowledge graph with the extracted knowledge. However, one drawback of our approach was the processing time: the analysis of a single text source was not interactive enough, and the batch processing of entire article datasets took too long. In this paper, we propose a modified pipeline where the texts are summarized before the analysis begins. With this, the source articles are reduced significantly, to compact versions which contain only the most commonly encountered entities. We show that by reducing the text size, we get knowledge extraction results comparable to the full-text analysis approach and, at the same time, we significantly reduce the processing time, which is essential for getting both real-time results on single text sources and faster results when analyzing entire batches of collected articles from the domain.
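The summarize-then-extract idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which summarizer or NER model the authors use, so a simple word-frequency extractive summarizer and a lexicon lookup stand in for them here, and the names `summarize` and `find_entities` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Frequency-based extractive summary (a stand-in, not the authors' summarizer).

    Each sentence is scored by the average document-wide frequency of its
    words; the top-scoring sentences are kept in their original order.
    """
    sentences = split_sentences(text)
    words = re.findall(r"[a-z]+", text.lower())
    # Ignore very short words as a crude substitute for a stop-word list.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def find_entities(text, lexicon):
    """Lexicon lookup standing in for a real NER model (hypothetical helper)."""
    lowered = text.lower()
    return {term for term in lexicon if term.lower() in lowered}
```

In such a pipeline, the entity extraction step would run on `summarize(article)` instead of on `article`, so it processes fewer sentences; comparing `find_entities` on both inputs gives a rough view of how many entities survive the summarization.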
