This post is meant as a summary of many of the concepts that I learned in Marti Hearst's Natural Language Processing class at the UC Berkeley School of Information.
If you use the code, please cite the following paper:
Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. Learning Entity and Relation Embeddings for Knowledge Graph Completion. The 29th AAAI Conference on Artificial Intelligence (AAAI'15).
ConceptNet Numberbatch consists of state-of-the-art semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning.
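To show what "using the vectors directly as a representation of word meaning" can look like, here is a minimal Python sketch. It assumes a word2vec-style text file: a header line with the vector count and dimensionality, then one term per line followed by its values. Check the actual Numberbatch release notes for the exact format, since real files may, for example, prefix terms with a language tag such as `/c/en/cat`. The three 4-dimensional vectors below are made up for illustration and are not real Numberbatch data. Word similarity is measured with cosine similarity.

```python
import math

# Made-up toy vectors in word2vec-style text format (NOT real Numberbatch data):
# header "<count> <dims>", then "<term> <v1> <v2> ...".
TOY_VECTORS = """3 4
cat 0.9 0.1 0.0 0.2
dog 0.8 0.2 0.1 0.3
car 0.0 0.9 0.8 0.1
"""


def load_vectors(text):
    """Parse word2vec-style text into a dict mapping term -> list of floats."""
    lines = text.strip().splitlines()
    count, dims = map(int, lines[0].split())
    vectors = {}
    for line in lines[1:]:
        term, *values = line.split()
        vec = [float(v) for v in values]
        if len(vec) != dims:
            raise ValueError(f"{term}: expected {dims} values, got {len(vec)}")
        vectors[term] = vec
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return vectors


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(vectors, term, k=1):
    """Return the k terms (with scores) closest to `term`, excluding itself."""
    query = vectors[term]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != term]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


vectors = load_vectors(TOY_VECTORS)
print(most_similar(vectors, "cat", k=2))
```

With these toy values, "dog" scores far closer to "cat" (about 0.98) than "car" does (about 0.10). The same lookup-plus-cosine pattern applies to a full vector file, although for large vocabularies a matrix library such as NumPy would be the more practical choice.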
Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech; recognize whether words are names of companies, people, and so on; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and syntactic dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract particular or open-class relations between entity mentions; and extract quotations and who said them.
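The sketch below is a rough illustration of how several of these capabilities can be requested in practice. It assumes a CoreNLP server is already running locally on its default port (9000) and accepts raw text via HTTP POST, with the pipeline configured through a JSON `properties` URL parameter. It uses only a subset of annotators (`tokenize`, `ssplit`, `pos`, `lemma`, `ner`, `depparse`). The helper names `build_url`, `annotate`, and `summarize` are hypothetical, and the sample response used to exercise `summarize` is hand-written rather than real server output.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is running locally on its default port.
SERVER = "http://localhost:9000/"

# A subset of CoreNLP annotators, listed so that prerequisites come first.
ANNOTATORS = [
    "tokenize",  # split text into tokens
    "ssplit",    # split tokens into sentences
    "pos",       # parts of speech
    "lemma",     # base forms of words
    "ner",       # named entities, plus normalized dates, times, and numbers
    "depparse",  # syntactic dependencies
]


def build_url(annotators, output_format="json"):
    """Encode the pipeline configuration as the `properties` URL parameter."""
    props = {"annotators": ",".join(annotators), "outputFormat": output_format}
    return SERVER + "?" + urllib.parse.urlencode({"properties": json.dumps(props)})


def annotate(text, annotators=ANNOTATORS):
    """POST raw text to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        build_url(annotators), data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(doc):
    """Flatten a JSON response into (word, lemma, pos, ner) tuples."""
    return [(tok["word"], tok["lemma"], tok["pos"], tok["ner"])
            for sentence in doc["sentences"]
            for tok in sentence["tokens"]]


# Example (requires a running server):
# print(summarize(annotate("Marti Hearst teaches at UC Berkeley.")))
```

Building the URL separately from sending the request makes the configuration easy to inspect without a server. Other capabilities listed above (constituency parsing, coreference, sentiment, relation extraction, quotes) correspond to further annotators with their own prerequisites, which the CoreNLP documentation lists.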
A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between "head" words and words which modify those heads.
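To make the head/modifier idea concrete, the following Python sketch shows the kind of structure a dependency parser outputs, not a parser itself. Each word stores the 1-based index of its head, with 0 standing for an artificial ROOT, following the CoNLL-U convention, plus a relation label in Universal Dependencies style. The analysis of "She ate the red apple" is written by hand for illustration. Helper names such as `dependents` and `is_tree` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int      # 1-based index of the head word; 0 means ROOT
    relation: str  # dependency label, Universal Dependencies style


# Hand-written analysis of "She ate the red apple" (not parser output).
SENTENCE = [
    Token("She", 2, "nsubj"),   # subject of "ate"
    Token("ate", 0, "root"),    # main verb attaches to ROOT
    Token("the", 5, "det"),     # determiner of "apple"
    Token("red", 5, "amod"),    # adjective modifying "apple"
    Token("apple", 2, "obj"),   # object of "ate"
]


def dependents(tokens, head_index):
    """Words whose head is the token at 1-based position head_index."""
    return [t.word for t in tokens if t.head == head_index]


def arcs(tokens):
    """(head word, relation, dependent word) triples; index 0 is ROOT."""
    words = ["ROOT"] + [t.word for t in tokens]
    return [(words[t.head], t.relation, t.word) for t in tokens]


def is_tree(tokens):
    """True if exactly one word attaches to ROOT and every word reaches ROOT without a cycle."""
    if sum(1 for t in tokens if t.head == 0) != 1:
        return False
    for start in range(1, len(tokens) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen or not 1 <= node <= len(tokens):
                return False  # cycle, or head index out of range
            seen.add(node)
            node = tokens[node - 1].head
    return True


for head, rel, dep in arcs(SENTENCE):
    print(f"{head} --{rel}--> {dep}")
```

Storing one head per word guarantees that each word has exactly one head. The `is_tree` check covers the remaining constraints (a single root, no cycles) that make the analysis a well-formed dependency tree.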
S. Bordia and S. Bowman. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 7–15. Minneapolis, Minnesota, Association for Computational Linguistics, (June 2019)
S. Blodgett, S. Barocas, H. Daumé III, and H. Wallach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5454–5476. Online, Association for Computational Linguistics, (July 2020)