
Adaptive information extraction from text by rule induction and generalisation

Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, pages 1251-1256. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2001)


(LP)<sup>2</sup> is a covering algorithm for adaptive Information Extraction from text (IE). It induces symbolic rules that insert SGML tags into texts by learning from examples found in a user-defined tagged corpus. Training is performed in two steps: initially a set of tagging rules is learned; then additional rules are induced to correct mistakes and imprecision in tagging. Induction is performed by bottom-up generalization of examples in the training corpus. Shallow knowledge about Natural Language Processing (NLP) is used in the generalization process. The algorithm has a considerable success story. From a scientific point of view, experiments report excellent results with respect to the current state of the art on two publicly available corpora. From an application point of view, a successful industrial IE tool has been based on (LP)<sup>2</sup>. Real world applications have been developed and licenses have been released to external companies for building other applications. This paper presents (LP)<sup>2</sup>, experimental results and applications, and discusses the role of shallow NLP in rule induction.
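The abstract's core idea, a covering loop that induces tagging rules by bottom-up generalisation of training examples, can be illustrated with a toy sketch. This is not the (LP)<sup>2</sup> implementation: the window size, the padding symbol, the zero-error acceptance threshold, and all function names (`context`, `matches`, `induce_rule`, `learn`, `tag`) are assumptions made for illustration. The sketch uses only word identity as a constraint, so it omits both the shallow NLP information and the second step that learns correction rules.

```python
from itertools import combinations

WINDOW = 2   # assumed: tokens of context on each side of the insertion point
PAD = "#"    # assumed: padding symbol for positions outside the sentence


def context(tokens, i):
    """Words around position i; a tag would be inserted just before tokens[i]."""
    padded = [PAD] * WINDOW + list(tokens) + [PAD] * WINDOW
    return tuple(padded[i:i + 2 * WINDOW])


def matches(rule, ctx):
    """A rule is a tuple of word constraints; None is a wildcard."""
    return all(r is None or r == c for r, c in zip(rule, ctx))


def induce_rule(seed, positives, negatives, max_error=0.0):
    """Bottom-up generalisation: relax constraints of the fully specific seed
    and keep the acceptable rule that covers the most positive examples."""
    best, best_cover = None, -1
    n = len(seed)
    for keep in range(1, n + 1):            # most general candidates first
        for idx in combinations(range(n), keep):
            rule = tuple(seed[j] if j in idx else None for j in range(n))
            pos = sum(matches(rule, c) for c in positives)
            neg = sum(matches(rule, c) for c in negatives)
            if neg / (pos + neg) <= max_error and pos > best_cover:
                best, best_cover = rule, pos
    return best


def learn(corpus):
    """corpus: list of (tokens, set of indices where the tag is inserted)."""
    positives, negatives = [], []
    for tokens, tag_positions in corpus:
        for i in range(len(tokens)):
            target = positives if i in tag_positions else negatives
            target.append(context(tokens, i))
    rules, uncovered = [], list(positives)
    while uncovered:                        # covering loop
        rule = induce_rule(uncovered[0], positives, negatives)
        if rule is None:                    # seed cannot be separated; skip it
            uncovered.pop(0)
            continue
        rules.append(rule)
        uncovered = [c for c in uncovered if not matches(rule, c)]
    return rules


def tag(tokens, rules, label="<speaker>"):
    """Insert the label before every position matched by some rule."""
    out = []
    for i, tok in enumerate(tokens):
        if any(matches(r, context(tokens, i)) for r in rules):
            out.append(label)
        out.append(tok)
    return out


corpus = [
    (["the", "speaker", "is", "John", "Smith"], {3}),
    (["our", "speaker", "is", "Mary", "Jones"], {3}),
    (["the", "room", "is", "large"], set()),
]
rules = learn(corpus)
tagged = tag(["today", "speaker", "is", "Ann", "Lee"], rules)
```

On this tiny corpus a single generalised rule (the word "speaker" two positions before the insertion point) covers both positive examples without matching any negative one, so the covering loop stops after one iteration.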


