Inbook,

Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages

Volume 18 of International Series in Intelligent Technologies, pages 193-210. Springer Berlin / Heidelberg (January 2002). http://www.springer.com/mathematics/book/978-0-7923-7645-3

Abstract

This paper compares two alternative approaches to the problem of acquiring named-entity recognition and classification systems from training corpora, in two different languages. Named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, effective methods for acquiring such systems automatically from data are highly desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is placed on the selection of the appropriate data representation for each method and on the extraction of training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. In addition to the good performance of both methods, one very interesting result is that a simple representation of the data, which ignores the order of the words within a named entity, leads to improved results over a more complex approach that preserves word order.
