Text mining and web scraping involve chunk parsing and recognition of named entities (institutions, dates, titles)... The extraction of named entities is mostly based on a strategy that combines lookup in gazetteers (lists of companies, cities, etc.) wit
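A minimal sketch of the gazetteer half of that strategy, assuming small hand-written lists. The `GAZETTEERS` contents and the `find_entities` helper are hypothetical names introduced for illustration only; the second component of the strategy is cut off in the snippet above and is not covered here.

```python
import re

# Hypothetical toy gazetteers; a real system would load much larger lists.
GAZETTEERS = {
    "ORG": {"CEUR", "Association for Computational Linguistics"},
    "CITY": {"Berlin", "York", "New York"},
}


def find_entities(text, gazetteers=GAZETTEERS):
    """Return sorted (start, end, label, surface) tuples for gazetteer hits.

    When two hits overlap, the longer one wins (e.g. "New York" over "York").
    """
    candidates = []
    for label, names in gazetteers.items():
        for name in names:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text):
                candidates.append((m.start(), m.end(), label, m.group()))

    # Consider longer matches first, then earlier positions.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))

    chosen = []
    for cand in candidates:
        # Keep a candidate only if it does not overlap an already chosen one.
        if all(cand[1] <= c[0] or cand[0] >= c[1] for c in chosen):
            chosen.append(cand)
    return sorted(chosen)


if __name__ == "__main__":
    for hit in find_entities("The workshop in Berlin was organised by CEUR."):
        print(hit)
```

Exact string lookup like this misses spelling variants and ambiguous names, which is presumably why the strategy described above combines it with a further component.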
Anything To Triples (any23) is a library, a web service, and a command-line tool that extracts structured data in RDF format from a variety of Web documents.
NYT10 was originally released with the paper: Sebastian Riedel, Limin Yao, and Andrew McCallum, "Modeling relations and their mentions without labeled text."
M. Schwab, R. Jäschke, and F. Fischer. Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 110--115. Association for Computational Linguistics, (2023)
F. Arnold and R. Jäschke. Proceedings of the Workshop Understanding LIterature references in academic full TExt at JCDL 2022, volume 3220 of ULITE-ws '22, pp. 7--15. CEUR Workshop Proceedings, (2022)
M. Schwab, R. Jäschke, and F. Fischer. Proceedings of the 5th International Conference on Natural Language and Speech Processing, pp. 282--287. Association for Computational Linguistics, (2022)