The paper describes a named entity recognition system for Amharic, an under-resourced language, built on a recurrent neural network: a bi-directional long short-term memory (Bi-LSTM) model that identifies and classifies tokens into six predefined classes: Person, Location, Organization, Time, Title, and Other (non-named-entity tokens). Word vectors capturing semantic information are built for all tokens with an unsupervised learning algorithm, word2vec. These word vectors are merged with a set of specifically developed, language-independent features, and together they are fed to the neural network model to predict the class of each word. Evaluated by 10-fold cross-validation, the Amharic named entity recogniser achieved good average precision (77.2%) but lower recall (63.4%), for an F1-score of 69.7%.
Named entity recognition for Amharic using deep learning - IEEE Conference Publication
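The pipeline the abstract outlines can be sketched end to end: word2vec-style vectors concatenated with handcrafted, language-independent features, passed through a forward and a backward LSTM, with a linear layer scoring each token over the six classes. This is a minimal illustrative sketch, not the authors' implementation; all dimensions, the random stand-in "embeddings", and the binary feature vectors are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) sizes, not the paper's hyperparameters.
EMB_DIM, FEAT_DIM, HIDDEN, N_CLASSES = 8, 3, 5, 6
IN_DIM = EMB_DIM + FEAT_DIM

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates (input, forget, cell, output) stacked row-wise."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def run_lstm(xs, params):
    """Run an LSTM over a list of input vectors, returning hidden states."""
    W, U, b = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        out.append(h)
    return out

def init_params(H, D):
    return (rng.normal(0, 0.1, (4 * H, D)),   # input weights
            rng.normal(0, 0.1, (4 * H, H)),   # recurrent weights
            np.zeros(4 * H))                  # gate biases

fwd_params = init_params(HIDDEN, IN_DIM)
bwd_params = init_params(HIDDEN, IN_DIM)
W_out = rng.normal(0, 0.1, (N_CLASSES, 2 * HIDDEN))

# Toy 4-token "sentence": each token gets a word2vec-style vector plus a
# handcrafted feature vector (e.g. capitalisation or digit cues -- assumed).
seq_len = 4
word_vecs = rng.normal(size=(seq_len, EMB_DIM))        # stands in for word2vec
hand_feats = rng.integers(0, 2, (seq_len, FEAT_DIM))   # binary features
inputs = [np.concatenate([w, f]) for w, f in zip(word_vecs, hand_feats)]

# Bi-directionality: forward pass, plus a backward pass over the reversed
# sequence, re-reversed so states align token by token.
h_fwd = run_lstm(inputs, fwd_params)
h_bwd = run_lstm(inputs[::-1], bwd_params)[::-1]

# Score each token over the six classes (Person, Location, Organization,
# Time, Title, Other) from the concatenated directional states.
logits = np.stack([W_out @ np.concatenate([hf, hb])
                   for hf, hb in zip(h_fwd, h_bwd)])
pred = logits.argmax(axis=1)  # one class index per token
print(logits.shape, pred.shape)
```

In a trained system the output layer would typically be followed by a softmax and trained with cross-entropy over labelled tokens; here the untrained weights only demonstrate the data flow and tensor shapes.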