This project aims to develop an efficient rule-based extractor of reference entries found in English-language scientific articles. The application takes a PDF file or a directory of PDF files and returns an HTML file containing the list of all entries with their respective titles. In addition, the title of each cited article is searched through the Google Web Service to obtain the URL that identifies the article on the web. If the page at that URL provides a BibTeX entry, it appears in the HTML output under the corresponding reference entry; such entries are taken from typical sites such as CiteSeer, IEEE Xplore, etc. The application does not search image-based PDF files.
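To make the rule-based idea concrete, here is a minimal Python sketch of how reference entries might be split and their titles guessed. It is not the project's actual implementation. It assumes the text of the reference section has already been extracted from the PDF, that entries are numbered as `[n]` or `n.`, and that the title is either quoted or is the segment following the author list. The names `split_entries`, `guess_title`, and `to_html` are hypothetical, and the web search and BibTeX lookup steps are left out.

```python
import re
from html import escape

# Assumed rule: an entry starts on a line beginning with "[n]" or "n."
ENTRY_START = re.compile(r"^\s*(?:\[\d+\]|\d+\.)\s+", re.MULTILINE)


def split_entries(references_text):
    """Split the text of a reference section into individual entries."""
    starts = [m.start() for m in ENTRY_START.finditer(references_text)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(references_text)]):
        raw = references_text[begin:end]
        raw = ENTRY_START.sub("", raw, count=1)  # drop the "[n]" marker
        entries.append(" ".join(raw.split()))    # undo line wrapping
    return entries


def guess_title(entry):
    """Guess the title of an entry with two simple rules; None if both fail."""
    # Rule 1: a quoted span is taken to be the title.
    quoted = re.search(r'["“](.+?)["”]', entry)
    if quoted:
        return quoted.group(1).strip(" ,.")
    # Rule 2: "Authors. Title. Venue": split on periods that end a word
    # (two lowercase letters before the period, so initials are skipped).
    parts = re.split(r"(?<=[a-z]{2})\.\s+(?=[A-Z])", entry)
    return parts[1].strip(" .") if len(parts) > 1 else None


def to_html(entries):
    """Render the entries and their guessed titles as an HTML list."""
    items = []
    for entry in entries:
        title = guess_title(entry) or "unknown"
        items.append(
            f"<li>{escape(entry)}<br><b>Title:</b> {escape(title)}</li>"
        )
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = (
        "[1] J. Smith and A. Jones. A rule-based extractor for references.\n"
        "In Proceedings of Something, 2001.\n"
        '[2] B. Lee, "Parsing citations with rules," Journal of Examples, 2005.\n'
    )
    print(to_html(split_entries(sample)))
```

Rules like these are easy to read and extend, but they are brittle: a wrapped line that happens to begin with a number followed by a period would be mistaken for a new entry, so a real extractor would need more rules than shown here.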