auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator, freeing the machine learning user from algorithm selection and hyperparameter tuning.
Babelfy is a unified graph-based approach to multilingual Entity Linking and Word Sense Disambiguation, based on a loose identification of candidate meanings coupled with a densest-subgraph heuristic that selects high-coherence semantic interpretations.
LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from over 650K datasets in a single self-indexed file. This corpus can be queried online through a sustainable Linked Data Fragments interface, or downloaded and consumed locally: LOD-a-lot is easy to deploy and requires only limited resources (524 GB of disk space and 15.7 GB of RAM), enabling web-scale, repeatable experimentation and research from a high-end laptop.
In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g., building a linear SVM trained with stochastic gradient descent) using Scikit-Learn.
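A minimal sketch of that workflow: a tokenizer feeding a TF-IDF vectorizer and an `SGDClassifier` with hinge loss, which is a linear SVM trained by stochastic gradient descent. The toy corpus is illustrative, and the simple regex tokenizer is an assumed stand-in for NLTK's `word_tokenize` (which needs a one-time `punkt` download); the post's actual data and preprocessing may differ.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Stand-in tokenizer; in the post's setup this role is played by nltk.word_tokenize
def tokenize(text):
    return re.findall(r"\w+", text.lower())

# Toy training data (illustrative only)
train_texts = [
    "great movie, loved it",
    "terrible film, hated it",
    "wonderful acting and plot",
    "awful script, boring plot",
]
train_labels = ["pos", "neg", "pos", "neg"]

# hinge loss makes SGDClassifier a linear SVM fitted by stochastic gradient descent
model = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
    ("svm", SGDClassifier(loss="hinge", random_state=0)),
])
model.fit(train_texts, train_labels)
print(model.predict(["loved the acting"]))
```

Passing a callable `tokenizer` to `TfidfVectorizer` (with `token_pattern=None` to silence the unused-pattern warning) is the standard way to slot an external tokenizer such as NLTK's into a scikit-learn pipeline.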
S. Cordeiro, C. Ramisch, M. Idiart, and A. Villavicencio. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1, pages 1986--1997. Association for Computational Linguistics, (2016)
M. Hartung, F. Kaupmann, S. Jebbara, and P. Cimiano. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 1, Association for Computational Linguistics, (2017)